Introduction
PDF is arguably one of the most influential and widely used file formats in the world. It is no surprise that software developers go to great lengths to provide solutions that support PDF and its many features.
Developers using the LEADTOOLS Document and Medical Imaging SDKs can add LEAD's Advanced PDF Plug-in to bring robust PDF support to their .NET applications. In addition to loading and saving text-searchable and image-based PDF files, LEADTOOLS can extract and edit text (without requiring OCR), merge and split pages, and read and update bookmarks, links, jumps, metadata and much more.
In this article, we will walk through several of the core features included with the new LEADTOOLS Advanced PDF Plug-in.
Key Features in the LEADTOOLS Advanced PDF Plug-in
PDF Document Features
- Load and view any PDF document
- Extract text (characters, words and lines), fonts, images, rectangles and hyperlinks with location and size
- Full Unicode support including Chinese, Japanese, Arabic and Hebrew
- Parse the document structure by reading PDF bookmarks (Table of Contents) and internal links (jumps)
- Generate a raster image or thumbnail of any page
PDF File Features
- Comprehensive multipage support, including:
  - Merge existing PDF files into a single PDF
  - Split a single PDF into multiple PDF files
  - Extract, delete, insert or replace any page in existing PDF files
- Read, write and update the Table of Contents (TOC) of existing PDF files
- Convert any existing PDF to PDF/A
- Linearize (optimize for web viewing) any existing PDF
- Encrypt/decrypt documents and convert to and from any PDF version
- Read, write and update all PDF metadata such as author, title, subject and keywords
- Convert (distill) PostScript to PDF with optimization for eBook, Screen and Prepress
SDK Products That Support the Advanced PDF Plug-in
Using the Code
LEADTOOLS Advanced PDF features are built upon two classes in the Leadtools.Pdf namespace: PDFFile and PDFDocument. The PDFFile class is used for modifying metadata, manipulating pages and converting files. The PDFDocument class handles parsing and modifying the document object structure of PDF files.
In the example below, we use the PDFFile and PDFDocumentProperties classes to load a PDF and modify its metadata.
using System;
using Leadtools.Pdf;

string fileName = @"C:\Document.pdf";
PDFFile file = new PDFFile(fileName);

// Start from a fresh set of document properties and fill in the fields to write
file.DocumentProperties = new PDFDocumentProperties();
file.DocumentProperties.Author = "Me";
file.DocumentProperties.Title = "My Title";
file.DocumentProperties.Subject = "My Subject";
file.DocumentProperties.Creator = "My Application";
file.DocumentProperties.Modified = DateTime.Now;

// Passing null as the destination updates the source file in place
file.SetDocumentProperties(null);
Similarly, the PDFFile class exposes several high-level methods for inserting, deleting and merging pages and for performing document conversions such as linearization (optimizing for web viewing) and PDF/A. The following example merges three files into one and converts the result to PDF/A.
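using Leadtools.Pdf;

string fileName1 = @"C:\File1.pdf";
string fileName2 = @"C:\File2.pdf";
string fileName3 = @"C:\File3.pdf";
string finalFileName = @"C:\Final.pdf";

// Append File2 and File3 to File1 and save the combined document as Final.pdf
PDFFile file = new PDFFile(fileName1);
file.MergeWith(new string[] { fileName2, fileName3 }, finalFileName);

// Open the merged file and convert it to PDF/A in place (null destination)
file = new PDFFile(finalFileName);
file.ConvertToPDFA(null);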
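Splitting, listed among the file features earlier, is roughly the reverse of merging: choose page ranges, then write each range to its own file. The sketch below only computes those ranges in plain C#, so it can be tested without the SDK. The class and method names (`PdfSplitPlanner`, `ChunkRanges`) are hypothetical. The LEADTOOLS call that would write each range is referenced only in a comment as an assumption, and you should confirm it in the Leadtools.Pdf reference before using it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of LEADTOOLS): plans how to split a PDF
// into files of at most pagesPerFile pages. Page numbers are 1-based.
public static class PdfSplitPlanner
{
    // Returns inclusive (First, Last) page ranges covering pages 1..pageCount.
    // Every range has pagesPerFile pages except possibly the last one.
    public static List<(int First, int Last)> ChunkRanges(int pageCount, int pagesPerFile)
    {
        if (pageCount < 0) throw new ArgumentOutOfRangeException(nameof(pageCount));
        if (pagesPerFile < 1) throw new ArgumentOutOfRangeException(nameof(pagesPerFile));

        var ranges = new List<(int First, int Last)>();
        for (int first = 1; first <= pageCount; first += pagesPerFile)
        {
            int last = Math.Min(first + pagesPerFile - 1, pageCount);
            ranges.Add((first, last));
        }
        return ranges;
    }
}

// Assumed usage: for each (First, Last) range, write those pages to a new
// file with a PDFFile page-extraction method such as ExtractPages
// (assumption; check its exact signature in the LEADTOOLS documentation).
```

For a 25-page document split into 10-page files, this plan yields the ranges 1-10, 11-20 and 21-25. Keeping the range arithmetic separate from the SDK calls lets you unit-test the split plan on its own.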
Probably the most important feature of a PDF is its searchable text, which is where the PDFDocument class comes in. Using PDFParsePagesOptions, you can choose what to parse from the PDF, including objects, fonts, hyperlinks and more. In the following example, we load a PDF and display its searchable text in a MessageBox.
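using System.Text;
using System.Windows.Forms;
using Leadtools.Pdf;

string fileName = @"C:\Document.pdf";
PDFDocument document = new PDFDocument(fileName);

// Parse the objects on page 1 only. ParsePages takes 1-based page numbers,
// while the Pages collection is indexed from 0.
document.ParsePages(PDFParsePagesOptions.Objects, 1, 1);
PDFDocumentPage page = document.Pages[0];

StringBuilder text = new StringBuilder();
foreach (PDFObject obj in page.Objects)
{
    switch (obj.ObjectType)
    {
        case PDFObjectType.Text:
            // Append the character carried by this text object
            text.Append(obj.Code);
            // Start a new line where the parser marks the end of a text line
            if (obj.TextProperties.IsEndOfLine)
                text.AppendLine();
            break;
        case PDFObjectType.Image:
        case PDFObjectType.Rectangle:
        default:
            // Images, rectangles and other objects are ignored here
            break;
    }
}

MessageBox.Show(text.ToString());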
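The loop above relies on two pieces of information from each text object: its character (`Code`) and whether it ends a line (`TextProperties.IsEndOfLine`). To test that line-assembly logic without the SDK, the sketch below replaces `PDFObject` with a hypothetical `TextChar` record and returns the lines as a list instead of showing them. `TextChar` and `TextAssembler` are illustrative names and are not part of LEADTOOLS.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for a parsed text object: one character plus an
// end-of-line flag (mirroring PDFObject.Code and TextProperties.IsEndOfLine).
public record TextChar(char Code, bool IsEndOfLine);

public static class TextAssembler
{
    // Rebuilds lines from characters in reading order. A line is closed when
    // a character is flagged as end-of-line. Any trailing characters without
    // the flag form a final line.
    public static List<string> AssembleLines(IEnumerable<TextChar> chars)
    {
        var lines = new List<string>();
        var current = new StringBuilder();
        foreach (TextChar c in chars)
        {
            current.Append(c.Code);
            if (c.IsEndOfLine)
            {
                lines.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0)
            lines.Add(current.ToString());
        return lines;
    }
}
```

In an application, you would build the `TextChar` sequence from the `PDFObjectType.Text` objects of a parsed page, then join the returned lines for display.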
Conclusion
LEADTOOLS gives developers high-performance, stable imaging libraries behind an easy-to-use, high-level programming interface, enabling rapid development of business-critical applications.
PDF is only one of the many technologies LEADTOOLS has to offer. For more information on our other products, be sure to visit our home page, download a free fully functioning evaluation SDK, and take advantage of our free technical support during your evaluation.
Download the Full Example
The demo from which the screenshots and code snippets were taken is available within the main LEADTOOLS evaluation. To run this example you will need the following:
Support
Need help getting this sample up and running? Contact our support team for free technical support! For pricing or licensing questions, contact our sales team (sales@leadtools.com).