Introduction
This article is about converting the pages of a PDF file to image files. I was looking for a simple, free way to convert .pdf files to images and could not find one, so I experimented until I arrived at a solution that uses the "Adobe Acrobat COM component" and its CAcroPDDoc class.
Pre-requisites
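You must have Adobe Acrobat installed on your system; my system had Adobe Acrobat 9.0 installed. As far as I know, the COM interface used here (CAcroPDDoc) is exposed by the full Acrobat product and not by the free Adobe Reader alone.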
Using the code and description
First, I added a reference to the Adobe Acrobat COM component.
I then wrote the PDFConvertor class, which has a method named Convert(....) that does the conversion. Here is the code in my class used for accessing a PDF document:
Acrobat.CAcroPDDoc pdfDoc = new Acrobat.AcroPDDoc();
Acrobat.CAcroPDPage pdfPage = null;
Acrobat.CAcroRect pdfRect = new Acrobat.AcroRect();
Acrobat.AcroPoint pdfPoint = new Acrobat.AcroPoint();
The CAcroPDDoc class is for working with the PDF file.
The CAcroPDPage class is for working with the pages in the PDF file.
The CAcroRect and AcroPoint classes are for defining the dimensions of a page.
Here is how I open a PDF document:
if (pdfDoc.Open(sourceFileName))
To open the specified PDF file, I use the Open() method of the pdfDoc object; it returns false in the case of an error.
pageCount = pdfDoc.GetNumPages();
After reading the page count with pdfDoc.GetNumPages(), I then read a page.
pdfPage = (Acrobat.CAcroPDPage)pdfDoc.AcquirePage(i);
Then, I acquire a page of the PDF file using pdfDoc.AcquirePage(i) and assign it to the pdfPage object; the variable i is the zero-based index of the current page.
pdfPoint = (Acrobat.AcroPoint)pdfPage.GetSize();
pdfRect.Left = 0;
pdfRect.right = pdfPoint.x;
pdfRect.Top = 0;
pdfRect.bottom = pdfPoint.y;
I then get the page size with pdfPage.GetSize() and put it into the pdfPoint object. The size is needed to specify the region of the PDF page that is copied to the Clipboard.
pdfPage.CopyToClipboard(pdfRect, 0, 0, 100);
Note that pdfPage doesn't have any method for saving the page directly as an image. pdfPage.CopyToClipboard(pdfRect, 0, 0, 100) can help us there. It takes four arguments: the first is the rect of the page discussed above, the second and third are the x and y offsets into the page, which are usually 0, and the fourth is the zoom percentage.
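If a zoom other than 100 is wanted, the bounding rectangle presumably has to grow with it, since the rectangle describes the copied area while GetSize() reports the unscaled page size. The following is only a minimal sketch of that arithmetic. ZoomMath and ScaledBounds are hypothetical names that belong neither to the Acrobat API nor to the PDFConvertor class, and the assumption that one page unit maps to one pixel at 100% zoom should be checked against your Acrobat version.

```csharp
using System;

// Hypothetical helper, not part of the Acrobat API or of PDFConvertor.
public static class ZoomMath
{
    // Returns the right and bottom edges of a bounding rectangle that
    // covers the whole page at the given zoom percentage.
    // Assumption: at 100% zoom one page unit maps to one pixel.
    public static (int Right, int Bottom) ScaledBounds(
        int pageWidth, int pageHeight, int zoomPercent)
    {
        if (zoomPercent <= 0)
            throw new ArgumentOutOfRangeException(nameof(zoomPercent));

        // Multiply before dividing so integer math keeps its precision.
        int right = pageWidth * zoomPercent / 100;
        int bottom = pageHeight * zoomPercent / 100;
        return (right, bottom);
    }
}
```

For a 612 x 792 page at a zoom of 200, ScaledBounds gives 1224 and 1584. Those values would go into pdfRect.right and pdfRect.bottom (with a cast if the interop declares them as a narrower type), and 200 would be passed as the fourth argument of CopyToClipboard.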
Clipboard.GetImage().Save(outimg, outPutImageFormat);
Finally, with Clipboard.GetImage().Save(...), we can save the output image.
Here is the code for the Convert method of my class:
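#region Convert
// Renders every page of the PDF to an image file and returns the page count.
// Clipboard access requires the calling thread to be an STA thread.
public int Convert(string sourceFileName,
    string DestinationPath, ImageFormat outPutImageFormat)
{
    if (pdfDoc.Open(sourceFileName))
    {
        pageCount = pdfDoc.GetNumPages();
        // Source file name without its directory and ".pdf" extension
        string filename =
            System.IO.Path.GetFileNameWithoutExtension(sourceFileName);
        for (int i = 0; i < pageCount; i++)
        {
            // Page indices are zero-based
            pdfPage = (Acrobat.CAcroPDPage)pdfDoc.AcquirePage(i);
            pdfPoint = (Acrobat.AcroPoint)pdfPage.GetSize();
            // Bounding rectangle that covers the whole page
            pdfRect.Left = 0;
            pdfRect.right = pdfPoint.x;
            pdfRect.Top = 0;
            pdfRect.bottom = pdfPoint.y;
            // Offsets of 0, 0 start at the top-left corner; zoom is 100%
            pdfPage.CopyToClipboard(pdfRect, 0, 0, 100);
            string outimg;
            if (pageCount == 1)
                outimg = System.IO.Path.Combine(DestinationPath,
                    filename + "." + outPutImageFormat.ToString());
            else
                outimg = System.IO.Path.Combine(DestinationPath,
                    filename + "_" + i.ToString() + "." +
                    outPutImageFormat.ToString());
            // GetImage() returns null when the clipboard holds no image
            using (System.Drawing.Image pageImage = Clipboard.GetImage())
            {
                if (pageImage == null)
                {
                    Dispose();
                    throw new System.InvalidOperationException(
                        "No image on the Clipboard for page " + i);
                }
                pageImage.Save(outimg, outPutImageFormat);
            }
            OnExportProgressChanging(outimg);
        }
        Dispose();
    }
    else
    {
        Dispose();
        throw new System.IO.FileNotFoundException(sourceFileName +
            " Not Found!");
    }
    return pageCount;
}
#endregion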
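The output file name is derived from the source file name, the page index, and the image format. Here is a minimal sketch of that naming rule as a standalone helper, using System.IO.Path instead of manual string slicing so that neither the directory part nor the ".pdf" extension leaks into the result. OutputNaming and BuildOutputPath are hypothetical names, not part of the PDFConvertor class, and the extension is passed as a plain string (for example, the result of outPutImageFormat.ToString()).

```csharp
using System;
using System.IO;

// Hypothetical helper; the name and signature are illustrative only.
public static class OutputNaming
{
    // Builds "<dest>/<name>.<ext>" for single-page documents and
    // "<dest>/<name>_<pageIndex>.<ext>" for multi-page documents.
    public static string BuildOutputPath(string sourceFileName,
        string destinationPath, int pageIndex, int pageCount, string extension)
    {
        // Drops both the directory part and the source extension.
        string baseName = Path.GetFileNameWithoutExtension(sourceFileName);

        // Only multi-page documents get a page suffix.
        string suffix = pageCount == 1 ? "" : "_" + pageIndex;

        // Path.Combine adds a directory separator only when one is needed.
        return Path.Combine(destinationPath, baseName + suffix + "." + extension);
    }
}
```

Calling BuildOutputPath with a source file named report.pdf, page index 2, a page count of 5, and the extension "Png" yields a file named report_2.Png inside the destination folder.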
I have also implemented progress reporting through the ProgressChangingEventHandler event handler, which is included in the source of my project.
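For readers without the project source at hand, here is a rough sketch of how such a progress event could be wired up. The delegate signature, the ExportNotifier class, and the event name are assumptions made for illustration; only the names ProgressChangingEventHandler and OnExportProgressChanging come from the article, and the real declarations in the project may differ.

```csharp
using System;

// Assumed signature: the actual delegate is declared in the project source.
public delegate void ProgressChangingEventHandler(object sender, string fileName);

// Hypothetical stand-in for the part of the converter that raises progress.
public class ExportNotifier
{
    // Raised after each page image has been written to disk.
    public event ProgressChangingEventHandler ExportProgressChanging;

    // Mirrors the OnExportProgressChanging(outimg) call in Convert.
    public void OnExportProgressChanging(string fileName)
    {
        // Copy to a local so a concurrent unsubscribe cannot cause a null call.
        ProgressChangingEventHandler handler = ExportProgressChanging;
        if (handler != null)
            handler(this, fileName);
    }
}
```

A caller would subscribe with something like notifier.ExportProgressChanging += (sender, file) => Console.WriteLine(file); before starting the conversion, and would then receive one callback per exported page.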
Enjoy!