
Counting PDF Pages using Regular Expressions

11 Jul 2006
Explains how to count PDF pages using regular expressions in C#

Introduction

During one of my .NET projects working with Adobe PDF files, I needed to retrieve the page count of a specific file. I did not need to manipulate the PDF at all, so buying a .NET component for this one task seemed like overkill.

After a few hours of searching for an easy solution, I found that good old regular expressions might hold the answer.

Opening the PDF in Notepad, I noticed that each page in the file is marked by a specific character sequence: "/Type /Page" (with or without the space between the two words, depending on the PDF version). So all we need to do is count how many times this sequence occurs in the file.
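
As a quick illustration of the counting idea, the sketch below runs a regular expression over a small hand-written string rather than a real file. The sample text and the names MarkerDemo and CountPageMarkers are invented for this example; the pattern is the one used in the next section.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MarkerDemo
{
    // Hypothetical helper: counts "/Type /Page" markers in already-loaded text.
    public static int CountPageMarkers(string text)
    {
        // \s* allows the marker with or without whitespace between the two names.
        return Regex.Matches(text, @"/Type\s*/Page[^s]").Count;
    }

    public static void Main()
    {
        // Made-up fragment imitating two page objects, one in each spelling.
        string sample = "<< /Type /Page /Parent 1 0 R >>\n<</Type/Page/Parent 1 0 R>>\n";
        Console.WriteLine(CountPageMarkers(sample)); // prints 2
    }
}
```

Both spellings of the marker are counted, which is why the same pattern can be used across PDF versions.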

Getting It Done!

First, we open the PDF file with a FileStream and read its contents into a string with a StreamReader.

// Requires: using System.IO;
string pdfText;
using (FileStream fs = new FileStream(@"c:\a.pdf", FileMode.Open, FileAccess.Read))
using (StreamReader r = new StreamReader(fs))
    pdfText = r.ReadToEnd(); // read the whole file into one string

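Once we have the PDF text, all that remains is to create the regular expression and count the matches. The trailing [^s] stops the pattern from matching "/Type /Pages", which marks a page tree node rather than an individual page.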

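// Requires: using System.Text.RegularExpressions; and using System.Windows.Forms;
// Find every page object marker and report how many there are.
Regex rx1 = new Regex(@"/Type\s*/Page[^s]");
MatchCollection matches = rx1.Matches(pdfText);
MessageBox.Show("The PDF file has " + matches.Count.ToString() + " page(s).");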

Voila!
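
For reuse outside a Windows Forms application, the two steps can be wrapped in a single method. What follows is a minimal sketch, not a general PDF parser: the names PdfPageCounter and CountPages are hypothetical, and reading the file as ISO-8859-1 is an assumption made here so that each byte of the binary file maps to exactly one character. The sketch also inherits the limits of the approach. It only finds page objects stored as plain text, so a file that keeps them in compressed object streams (possible since PDF 1.5) would report too few pages.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class PdfPageCounter
{
    // Same pattern as in the article; constructed once and reused across calls.
    private static readonly Regex PageMarker = new Regex(@"/Type\s*/Page[^s]");

    // Hypothetical helper: returns the number of "/Type /Page" markers in the file.
    public static int CountPages(string path)
    {
        // Assumption: ISO-8859-1 maps every byte to one character, so binary
        // stream data cannot disturb the ASCII markers we are looking for.
        string pdfText = File.ReadAllText(path, Encoding.GetEncoding("ISO-8859-1"));
        return PageMarker.Matches(pdfText).Count;
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.Error.WriteLine("usage: PdfPageCounter <file.pdf>");
            return;
        }
        Console.WriteLine("The PDF file has " + CountPages(args[0]) + " page(s).");
    }
}
```

Keeping the count in a method that returns an int leaves it up to the caller whether to show a message box, log the value, or compare it against an expected page count.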

History

  • 11th July, 2006: Initial post

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
