Introduction
This application extracts the text content of a PDF file and writes it to a text file.
Background
I needed to extract the contents of a PDF file, and I had to read through many articles to get it working. I've posted the result on CodeProject so that others can reuse it easily.
Using the code
First step: Add a reference to itextsharp.dll.
Second step: Copy PDFParser.cs into your application.
Third step: Add the directive "using PdfToText;" to your code file.
Fourth step: Add the appropriate controls to your form. In the code-behind, create an instance of PDFParser and call its ExtractText method, passing the input PDF file name and the output text file name (the file the extracted text is written to) as parameters.
Fifth step: Build and run your application.
public bool ExtractText(string inFileName, string outFileName) {.... }
The method ExtractTextFromPDFBytes(byte[] input) processes an uncompressed PDF text object (the content between the BT and ET operators of a page's content stream) and extracts the text from it.
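The idea behind ExtractTextFromPDFBytes can be illustrated with a short, self-contained sketch. This is not the code from PDFParser.cs: the class ContentStreamText and its Extract method are hypothetical names, and the sketch assumes the content stream is already uncompressed and that the fonts use a simple single-byte encoding. It collects the literal strings (the (...) operands of Tj and TJ) that appear between BT and ET and ends each text object with a newline. Hex strings, font encodings and ToUnicode maps, inline images, and text positioning are all ignored.

```csharp
using System.Text;

// HYPOTHETICAL helper for illustration only; not the code in PDFParser.cs.
// Assumes an UNCOMPRESSED content stream whose fonts use single-byte encodings.
public static class ContentStreamText
{
    public static string Extract(byte[] input)
    {
        var text = new StringBuilder();
        bool inText = false; // true between the BT and ET operators
        int i = 0;
        while (i < input.Length)
        {
            char c = (char)input[i];
            if (c == '%')
            {
                // Comment: skip to the end of the line.
                while (i < input.Length && input[i] != '\n' && input[i] != '\r') i++;
            }
            else if (c == '(')
            {
                // Literal string operand (e.g. of Tj, or an element of a TJ array).
                string s = ReadLiteralString(input, ref i);
                if (inText) text.Append(s);
            }
            else if (IsWhitespace(c) || IsDelimiter(c))
            {
                // Skip; hex strings such as <48656C6C6F> are NOT decoded by this sketch.
                i++;
            }
            else
            {
                // Regular token: a number, a name body, or an operator.
                int start = i;
                while (i < input.Length && !IsWhitespace((char)input[i]) && !IsDelimiter((char)input[i])) i++;
                string token = Encoding.ASCII.GetString(input, start, i - start);
                if (token == "BT") inText = true;
                else if (token == "ET") { inText = false; text.Append('\n'); }
            }
        }
        return text.ToString();
    }

    // Reads a balanced literal string starting at input[i] == '(' and leaves
    // i just past the matching ')'. Handles the escapes defined for PDF strings.
    private static string ReadLiteralString(byte[] input, ref int i)
    {
        var sb = new StringBuilder();
        int depth = 0;
        while (i < input.Length)
        {
            char c = (char)input[i++];
            if (c == '\\' && i < input.Length)
            {
                char e = (char)input[i++];
                switch (e)
                {
                    case 'n': sb.Append('\n'); break;
                    case 'r': sb.Append('\r'); break;
                    case 't': sb.Append('\t'); break;
                    case 'b': sb.Append('\b'); break;
                    case 'f': sb.Append('\f'); break;
                    case '\r': // backslash + end of line = line continuation
                        if (i < input.Length && input[i] == '\n') i++;
                        break;
                    case '\n': break;
                    default:
                        if (e >= '0' && e <= '7')
                        {
                            // Octal escape \d, \dd or \ddd; high-order overflow is dropped.
                            int code = e - '0';
                            for (int k = 0; k < 2 && i < input.Length && input[i] >= '0' && input[i] <= '7'; k++)
                                code = code * 8 + (input[i++] - '0');
                            sb.Append((char)(code & 0xFF));
                        }
                        else sb.Append(e); // \( \) \\ (and unknown escapes) keep the character
                        break;
                }
            }
            else if (c == '(')
            {
                if (depth++ > 0) sb.Append(c); // the outermost '(' only opens the string
            }
            else if (c == ')')
            {
                if (--depth == 0) break;       // matching ')' closes the string
                sb.Append(c);
            }
            else sb.Append(c);
        }
        return sb.ToString();
    }

    private static bool IsWhitespace(char c) =>
        c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f' || c == '\0';

    private static bool IsDelimiter(char c) => "()<>[]{}/%".IndexOf(c) >= 0;
}
```

For example, passing the ASCII bytes of `BT /F1 12 Tf 72 712 Td (Hello) Tj ET` returns "Hello" followed by a newline, while strings outside any BT/ET pair are skipped. Real-world PDFs usually store page content Flate-compressed and often use custom font encodings, which is why an approach like this only works on a subset of files.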
itextsharp.dll and PDFParser.cs were released under GNU public licenses, so you can use them in your own applications, provided you follow the terms of those licenses.
Note: This application extracts text only from PDF files that were created using Adobe Reader.