I need to extract paragraphs from a PDF in C# (Visual Studio) using a free library. If a paragraph continues onto the next page, it should be returned as one paragraph, not two. Do you have an example in C#? I'm trying to find out whether iText 8 does this, but I can't find the answer.

What I have tried:

(Nothing, apparently; typing the word "itext" 28 times doesn't count.)
Posted
Updated 1-Aug-24 21:24pm
v2

Try iTextSharp 5.5.13.4 from the NuGet Gallery; it's free (under the AGPL license) and pretty much the standard.
It's a C# port of the Java iText library, and there is plenty of documentation online.
 
 
When you look at a PDF, it is tempting to think that what you see on the screen matches what you would understand from a printed page, and so to assume that a PDF file understands concepts such as paragraphs. Unfortunately, this is not the case: PDFs do not store layout structures such as paragraphs. Instead, they contain text streams laid out on individual pages as a series of text runs.
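To make the "series of text runs" point concrete, here is a minimal C# sketch of how a reader can only guess at paragraph breaks on a single page. It assumes you already have each run's text and baseline position (in iTextSharp 5, a custom ITextExtractionStrategy receives each run as a TextRenderInfo); the TextRun record, the PageParagraphs.Group method, and both thresholds are hypothetical, chosen for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input type: one run of text plus its baseline position in PDF
// user space, where y grows upward (larger Y = higher on the page).
public record TextRun(float X, float Y, string Text);

public static class PageParagraphs
{
    // Guesses paragraphs on ONE page. Runs whose baselines differ by at most
    // yTolerance are treated as one line; a vertical gap larger than
    // gapFactor * (median line gap) is treated as a paragraph break.
    // Both thresholds are assumptions, not information stored in the PDF.
    public static List<string> Group(IEnumerable<TextRun> runs,
                                     float yTolerance = 2f, float gapFactor = 1.5f)
    {
        // 1. Rebuild lines: top-to-bottom, then left-to-right.
        var lines = new List<(float Y, List<TextRun> Runs)>();
        foreach (var run in runs.OrderByDescending(r => r.Y).ThenBy(r => r.X))
        {
            if (lines.Count > 0 && Math.Abs(lines[lines.Count - 1].Y - run.Y) <= yTolerance)
                lines[lines.Count - 1].Runs.Add(run);
            else
                lines.Add((run.Y, new List<TextRun> { run }));
        }

        var paragraphs = new List<string>();
        if (lines.Count == 0) return paragraphs;

        // 2. Vertical distance between consecutive lines; the median is taken
        //    as the normal line spacing.
        var gaps = new List<float>();
        for (int i = 1; i < lines.Count; i++)
            gaps.Add(lines[i - 1].Y - lines[i].Y);
        float typical = gaps.Count == 0 ? 0f : gaps.OrderBy(g => g).ElementAt(gaps.Count / 2);

        // 3. Walk the lines, starting a new paragraph after an unusually large gap.
        var current = new List<string>();
        for (int i = 0; i < lines.Count; i++)
        {
            if (i > 0 && gaps[i - 1] > typical * gapFactor)
            {
                paragraphs.Add(string.Join(" ", current));
                current.Clear();
            }
            current.Add(string.Join(" ", lines[i].Runs.OrderBy(r => r.X).Select(r => r.Text)));
        }
        paragraphs.Add(string.Join(" ", current));
        return paragraphs;
    }
}
```

Note how fragile this is: on a page made mostly of one-line paragraphs, the median gap is itself a paragraph gap, so no break is detected at all.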

One confusing thing is that iTextSharp (like other libraries) uses the Paragraph class as an abstraction when writing documents, which gives the impression that a paragraph is an actual property of a PDF. It isn't, so, sorry, there's no foolproof way to work out paragraphs when reading one.
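If a heuristic is acceptable, the page-break case from the question can be approximated. The minimal sketch below (the CrossPageMerger class is hypothetical, not part of iTextSharp or iText 8) assumes you already have each page's paragraphs in reading order, with running headers, footers, and page numbers stripped. It joins the last paragraph of one page with the first paragraph of the next when the former does not end in sentence-final punctuation or the latter starts with a lowercase letter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CrossPageMerger
{
    // Characters assumed to end a paragraph. This is a guess; the PDF does
    // not record where paragraphs end.
    private static readonly char[] Terminators = { '.', '!', '?', ':', '"', '\u201D' };

    // pages[i] holds the paragraphs found on page i, in reading order.
    public static List<string> Merge(IEnumerable<IEnumerable<string>> pages)
    {
        var result = new List<string>();
        foreach (var page in pages)
        {
            bool firstOnPage = true;
            foreach (var raw in page)
            {
                string para = raw.Trim();
                if (para.Length == 0) continue;

                if (firstOnPage && result.Count > 0 && LooksUnfinished(result[result.Count - 1], para))
                {
                    // Treat the page break as a soft wrap and glue the halves together.
                    string prev = result[result.Count - 1];
                    result[result.Count - 1] = prev.EndsWith("-")
                        ? prev.Substring(0, prev.Length - 1) + para // "para-" + "graph." -> "paragraph."
                        : prev + " " + para;
                }
                else
                {
                    result.Add(para);
                }
                firstOnPage = false;
            }
        }
        return result;
    }

    // True when the text before the page break does not look like a finished paragraph.
    private static bool LooksUnfinished(string previous, string next)
    {
        bool prevEndsSentence = Terminators.Contains(previous[previous.Length - 1]);
        bool nextStartsLower = char.IsLower(next[0]);
        return !prevEndsSentence || nextStartsLower;
    }
}
```

This will misfire on unpunctuated list items and headings, and on genuinely hyphenated words split across a page break (such as "well-" followed by "known"), which is the sense in which there is no foolproof answer.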
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900