Introduction
Hello friends, this is my first article on CodeProject.com. It shows how to read the content of
a PDF file and convert it into a string using C#.
Background
This was assigned to me as a task. I Googled the problem and ended up solving it with
a few lines of simple code. I'm sure this code will be very helpful for beginners.
Using the code
The following steps will guide you to read content from a PDF file:
- To start, you need to download itextsharp-all-5.2.1, which can be downloaded from here.
- Extract the whole archive, including the archives nested inside the itextsharp-all-5.2.1 folder, to your local directory.
You have successfully completed the initial step in the process. Hurrah!
Now open Microsoft Visual Studio. I used Microsoft Visual C# 2010 Express.
- New Project --> Windows Forms Application --> Give the project a name (I named
mine PDF_To_Text).
- Add itextsharp.dll as a reference.
Select the Project menu --> Add Reference --> Select the Browse tab --> Select itextsharp.dll from
the local directory.
- Place a "richTextBox1" control in the Form work space.
- Now paste the following code in Form1.cs.
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// The namespace must match the one in Form1.Designer.cs
// (by default, the project name).
namespace PDF_To_Text
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
            // Verbatim string (@) so the backslash is not treated as an escape sequence.
            ExtractTextFromPDFPage(@"c:\sample.pdf", 1);
        }

        // Reads the text of one page (numbered from 1) and shows it in richTextBox1.
        public void ExtractTextFromPDFPage(string pdfFile, int pageNumber)
        {
            PdfReader reader = new PdfReader(pdfFile);
            try
            {
                richTextBox1.Text = PdfTextExtractor.GetTextFromPage(reader, pageNumber);
            }
            finally
            {
                // Release the file even if the extraction throws.
                reader.Close();
            }
        }
    }
}
Look how simple it is!
- Now build the solution using Ctrl+Shift+B, or by selecting Build Solution from
the Build menu in the menu bar.
- Once the build succeeds, run the application by pressing F5.
- You will find the file content converted into text and displayed in the
RichTextBox control.
That's it, you have successfully converted a PDF file into text.
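ExtractTextFromPDFPage reads a single page. If you want the whole document as one string, one possible approach is to loop over every page and append the results. The following is only a minimal sketch: the class name PdfTextHelper and the method name ExtractAllText are hypothetical names chosen for illustration and are not part of iTextSharp. It uses the same PdfReader and PdfTextExtractor calls as the code above, plus the reader's NumberOfPages property.

```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper class for illustration; not part of iTextSharp.
public static class PdfTextHelper
{
    // Returns the text of every page, in page order, as a single string.
    public static string ExtractAllText(string pdfFile)
    {
        PdfReader reader = new PdfReader(pdfFile);
        try
        {
            StringBuilder text = new StringBuilder();

            // iTextSharp numbers pages from 1 up to NumberOfPages.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page));
            }

            return text.ToString();
        }
        finally
        {
            // Release the file even if a page fails to parse.
            reader.Close();
        }
    }
}
```

Assuming the helper is added to the same project, it could be called from the form as richTextBox1.Text = PdfTextHelper.ExtractAllText(@"c:\sample.pdf");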
Note
Here c:\sample.pdf is where I kept my PDF file, so you should update the path
to point to your own file. The second parameter is the number of the page you want to convert; page numbers start at 1.
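Since both the path and the page number are supplied by you, it can help to check them before extracting. The sketch below is one way this might be done, under stated assumptions: the class name PdfPageReader and the method name ReadPage are hypothetical and chosen only for this example. It assumes that a missing file or an out-of-range page should be reported with a descriptive exception.

```csharp
using System;
using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper class for illustration; not part of iTextSharp.
public static class PdfPageReader
{
    // Returns the text of one page, or throws a descriptive exception
    // when the path or the page number is not valid.
    public static string ReadPage(string pdfFile, int pageNumber)
    {
        if (!File.Exists(pdfFile))
        {
            throw new FileNotFoundException("PDF file not found.", pdfFile);
        }

        PdfReader reader = new PdfReader(pdfFile);
        try
        {
            // Valid page numbers run from 1 to NumberOfPages inclusive.
            if (pageNumber < 1 || pageNumber > reader.NumberOfPages)
            {
                throw new ArgumentOutOfRangeException(
                    "pageNumber",
                    "Page must be between 1 and " + reader.NumberOfPages + ".");
            }

            return PdfTextExtractor.GetTextFromPage(reader, pageNumber);
        }
        finally
        {
            // Release the file whether or not extraction succeeded.
            reader.Close();
        }
    }
}
```

In the form, you could wrap the call in a try/catch block and show the exception message in a MessageBox, so the application does not crash on a wrong path or page number.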