Extract PDF file contents to a text file.

1 Jun 2008
This application extracts the contents of a PDF file into a text file.

Introduction

This application extracts the contents of a PDF file and writes the content into a text file.

Background

I needed to extract the contents of a PDF file and had to work through many articles to get it done. I have uploaded the result to CodeProject so that anyone can reuse it easily.

Using the code

First step: Add a reference to itextsharp.dll.

Second step: Copy PDFParser.cs into your application.

Third step: Add the namespace directive "using PdfToText" to your code file.

Fourth step: Add the appropriate controls to your form. In the code-behind, create an instance of "PDFParser" and call its "ExtractText" method, passing the input PDF file name and the output text file name (the file the extracted text is written to) as parameters.

Fifth step: Build and run your application.

/// <summary>
/// Extracts the text from a PDF file.
/// </summary>
public bool ExtractText(string inFileName, string outFileName) { .... }
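As a rough illustration of the fourth step, the call could be wrapped as shown here. This is a minimal sketch, not part of the download: "ExtractionRunner" and its "Run" method are hypothetical names, and the extractor is passed in as a delegate so that the snippet compiles without PDFParser.cs. In a real project the delegate would be the "ExtractText" method of a "PDFParser" instance. The sketch also assumes that the returned bool signals success, which the article does not state.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of PDFParser.cs.
public static class ExtractionRunner
{
    // 'extract' stands in for PDFParser.ExtractText(inFileName, outFileName).
    // Assumption: the returned bool means "extraction succeeded".
    public static bool Run(Func<string, string, bool> extract, string pdfPath, string txtPath)
    {
        if (extract == null) throw new ArgumentNullException(nameof(extract));

        // Fail early instead of handing a missing file to the parser.
        if (!File.Exists(pdfPath))
        {
            return false;
        }

        return extract(pdfPath, txtPath);
    }
}
```

From a button click handler this might be called as `ExtractionRunner.Run(new PDFParser().ExtractText, inputPath, outputPath)`, where the two paths come from controls on the form.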

The method "ExtractTextFromPDFBytes(byte[] input)" processes an uncompressed PDF text object and extracts the text from it.
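The article does not show the body of that method, so the following is not the author's code. It is a simplified sketch of the general idea, assuming an uncompressed content stream in which text is shown with literal strings (in parentheses) between the BT and ET operators. "TextObjectSketch" and "ExtractShownText" are hypothetical names. Hex strings, octal escapes, font encodings and text positioning are deliberately ignored.

```csharp
using System;
using System.Text;

// Hypothetical sketch, not the PDFParser.cs implementation.
public static class TextObjectSketch
{
    // Collects the literal strings found between BT and ET operators.
    // Each text object contributes one line to the result.
    public static string ExtractShownText(byte[] input)
    {
        var result = new StringBuilder();
        bool inTextObject = false;
        int i = 0;

        while (i < input.Length)
        {
            if (input[i] == '(')
            {
                // Strings outside a text object are parsed but discarded.
                i = ReadLiteral(input, i, inTextObject ? result : null);
            }
            else if (IsOperatorAt(input, i, "BT"))
            {
                inTextObject = true;
                i += 2;
            }
            else if (IsOperatorAt(input, i, "ET"))
            {
                inTextObject = false;
                result.Append('\n');
                i += 2;
            }
            else
            {
                i++;
            }
        }
        return result.ToString();
    }

    // Reads a literal string starting at '(' and returns the index just past
    // its closing ')'. Handles nested parentheses and simple backslash escapes.
    private static int ReadLiteral(byte[] data, int start, StringBuilder sink)
    {
        int depth = 0;
        int i = start;
        while (i < data.Length)
        {
            char c = (char)data[i];
            if (c == '\\' && i + 1 < data.Length)
            {
                char next = (char)data[i + 1];
                // Simplification: only \n, \r and \t are decoded; any other
                // escaped character is kept as-is (octal codes are not decoded).
                char decoded = next == 'n' ? '\n' : next == 'r' ? '\r' : next == 't' ? '\t' : next;
                sink?.Append(decoded);
                i += 2;
                continue;
            }
            if (c == '(')
            {
                depth++;
                if (depth > 1) sink?.Append(c);
            }
            else if (c == ')')
            {
                depth--;
                if (depth == 0) return i + 1;
                sink?.Append(c);
            }
            else
            {
                sink?.Append(c);
            }
            i++;
        }
        return i; // unterminated string: stop at end of input
    }

    // True if 'op' appears at index i as a standalone, whitespace-delimited token.
    private static bool IsOperatorAt(byte[] data, int i, string op)
    {
        if (i + op.Length > data.Length) return false;
        for (int k = 0; k < op.Length; k++)
        {
            if (data[i + k] != (byte)op[k]) return false;
        }
        int end = i + op.Length;
        bool startOk = i == 0 || IsWhite(data[i - 1]);
        bool endOk = end == data.Length || IsWhite(data[end]);
        return startOk && endOk;
    }

    private static bool IsWhite(byte b)
    {
        return b == ' ' || b == '\n' || b == '\r' || b == '\t';
    }
}
```

Because the strings inside a TJ array are concatenated without separators, kerned fragments such as "[(Hel) -20 (lo)] TJ" come out as a single word. A real extractor would also need to decide when a positioning operator implies a space or a line break.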

Both "itextsharp.dll" and "PDFParser.cs" are distributed under GNU public licenses, so you can use them in your applications, subject to the terms of those licenses.

Note: This application extracts text only from PDF files that were created using Adobe Reader.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
