Introduction
Hello friends, this is my first article on CodeProject.com. It shows how to read the content of
a PDF file and convert it into a string using C#.
Background
This was assigned to me as a task. I Googled the problem and finally solved it
with a small amount of code. I hope this code will be helpful for beginners.
Using the code
The following steps will guide you to read content from a PDF file:
- To start, download itextsharp-all-5.2.1, which can be downloaded from here.
- Extract the whole archive (including the archives inside the itextsharp-all-5.2.1 folder) to a local directory.
You have successfully completed the initial step in the process. Hurrah!
Now open Microsoft Visual Studio. I used Microsoft Visual C# 2010 Express.
- New project --> WindowsFormsApplication --> Give the project a name (I named
mine PDF_To_Text).
- Add itextsharp.dll as a reference.
Select the Project menu --> Add Reference --> Select the Browse tab --> Select itextsharp.dll from
the local directory.
- Place a "richTextBox1" control in the Form work space.
- Now paste the following code in Form1.cs.
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// The namespace must match the one in your project's Form1.Designer.cs.
namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
            // Verbatim string (@) so the backslash is not read as an escape sequence.
            ExtractTextFromPDFPage(@"c:\sample.pdf", 1);
        }

        // Reads the text of one page (page numbers start at 1) and shows it in richTextBox1.
        public void ExtractTextFromPDFPage(string pdfFile, int pageNumber)
        {
            PdfReader reader = new PdfReader(pdfFile);
            try
            {
                richTextBox1.Text = PdfTextExtractor.GetTextFromPage(reader, pageNumber);
            }
            finally
            {
                // Always release the file, even if extraction fails.
                reader.Close();
            }
        }
    }
}
Look how simple it is!
- Now build the solution using Ctrl+Shift+B, or by selecting Build Solution from
the Build menu in the menu bar.
- Once the build succeeds, run the application by pressing F5.
- You will find the file content is converted into text and displayed in the
RichTextBox control.
That's it, you have successfully converted a PDF file into text.
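The code above converts a single page. If you need the text of the whole document, one possible approach is to loop over every page and join the results. The sketch below is only an illustration: the class name PdfTextHelper and the method name ExtractTextFromAllPages are made up for this example and are not part of iTextSharp. It relies on the same PdfReader and PdfTextExtractor classes used above, plus the reader's NumberOfPages property.

```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper class, named only for this illustration.
public static class PdfTextHelper
{
    // Returns the text of every page, one page after another.
    public static string ExtractTextFromAllPages(string pdfFile)
    {
        PdfReader reader = new PdfReader(pdfFile);
        try
        {
            StringBuilder text = new StringBuilder();

            // iTextSharp page numbers start at 1, not 0.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page));
            }

            return text.ToString();
        }
        finally
        {
            // Release the file even if one of the pages fails to parse.
            reader.Close();
        }
    }
}
```

From the form you could then write richTextBox1.Text = PdfTextHelper.ExtractTextFromAllPages(@"c:\sample.pdf"); instead of calling the single-page method.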
Note
Here c:\sample.pdf is where I kept my PDF file, so you should update the path
to point to your file. The second parameter specifies which page to convert; page numbers start at 1.
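Since both the path and the page number come from you, it may help to check them before handing them to iTextSharp. The following is a minimal sketch under a few assumptions: PdfInputCheck, IsReadableFile, and IsValidPage are hypothetical names invented for this example, and the page count is expected to come from the reader's NumberOfPages property. Also remember that in C# a backslash inside a regular string literal starts an escape sequence, so a Windows path is normally written as a verbatim string (@"c:\sample.pdf") or with doubled backslashes ("c:\\sample.pdf").

```csharp
using System.IO;

// Hypothetical helper class, named only for this illustration.
public static class PdfInputCheck
{
    // True when a path was given and a file exists at that location.
    public static bool IsReadableFile(string pdfFile)
    {
        return !string.IsNullOrEmpty(pdfFile) && File.Exists(pdfFile);
    }

    // True when pageNumber lies between 1 and pageCount (inclusive).
    public static bool IsValidPage(int pageNumber, int pageCount)
    {
        return pageNumber >= 1 && pageNumber <= pageCount;
    }
}
```

A possible use is to call IsReadableFile before creating the PdfReader, then call IsValidPage(pageNumber, reader.NumberOfPages) before extracting the text, and show a message in the RichTextBox control when either check fails.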
There are only 10 types of people in this programming world:
those who know binary and those who don't.