Recursively Create Thumbnails from PDF Files

7 Oct 2007 | CPOL | 7 min read
Create thumbnails for your PDF files, recursively from a starting dir, with a separate config file, using VB.NET
Screenshot - thumbs-new3.jpg

December 2006 Version (Old Style Thumbs)

October 2007 Version (New Style Thumbs)

Screenshot - cmd-new.jpg

Introduction

This project is based on the excellent project by Jonathan Hodgson, found here. Most of the tougher parts of this lump of code are his work, and the full credit goes to him for that. Reading his article will help enormously in understanding the workings and limitations of this project.

His version hasn't been updated in 2.5 years and unfortunately it lacked a few important options, mainly in configurability. This version adds a few extra options, which make it more widely usable and more flexible.

Note after October 2007 update: In the original version, only simple thumbnails were generated. Since then, the codebase has moved on to generate a more elaborate thumbnail style. To generate the old style thumbnails, please use the codebase from December 2006, as the changes are incompatible with the old version. This description will focus on the elaborate thumbs, and will, where needed, add references to the old codebase.

Sample screenshot

The old style thumbnails...

Screenshot - thumbs-new.jpg

And the new style thumbnails (default using a white background though)

Using the Code

The code is mainly based on a few fairly simple scripting examples, but the combinations render a decent result.

The most important change to Jonathan's code is the added recursiveness of the script. It is now able to work from a starting dir, and from there make thumbnails for all PDF files in that dir and its subdirs.

In addition, the most important variables are now configurable from the config file, so the project no longer has to be rebuilt for every change to a path or thumbnail size. The startdir is also available as a command-line parameter: just add the start path after the executable, e.g. pdfthumbnail.exe c:\pdf.

Lastly the few minor bugs that the original script contained have been fixed.

The full Adobe Acrobat (not the free reader) must be installed on the machine to run this application. This code has been tested with Acrobat versions 6.0 to 8.0. For notes on which version to use, see the Points of Interest section.

Luckily, at the time of writing, the Acrobat SDK is no longer needed for the application to function, which is a good thing considering the fact that the SDK is really hard to find nowadays.

For detailed descriptions of the main Acrobat functions, please read Jonathan's article; it'll be a great help. Here I will focus on the main changes and most important functions.

Part 1 - The Main Sub

First we start off with the starting code to process the given starting dir. The starting dir itself is configured using either the config file included with the binary or as a parameter in the command line.

VB.NET
Sub Main()
    ' Declaring startdir from commandline if passed on, else using
    ' the settings from the configuration file (no trailing slash)

    Dim appSettings As AppSettingsReader = New AppSettingsReader
    Dim sDir As String

    If Command() = "" Then
        With appSettings
            sDir = .GetValue("startDir", GetType(String))
        End With
    Else
        sDir = Command()
    End If

    ' Passing startdir on to the recursive sub dirsearch
    DirSearch(sDir)
End Sub
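
The "no trailing slash" convention mentioned in the comment above is easy to break when the path is typed on the command line. The following is only a minimal sketch of how the start dir could be normalised before it is passed on. ResolveStartDir is a hypothetical helper that is not part of the original project.

```vbnet
Imports System

Module StartDirSketch

    ' Hypothetical helper (not in the original project): prefers the
    ' command-line value, falls back to the configured value, and strips
    ' surrounding quotes plus any trailing slash or backslash.
    Function ResolveStartDir(ByVal commandLine As String, _
                             ByVal configured As String) As String
        Dim candidate As String = commandLine
        If candidate Is Nothing OrElse candidate.Trim() = "" Then
            candidate = configured
        End If
        If candidate Is Nothing Then
            Return ""
        End If

        candidate = candidate.Trim().Trim(""""c)

        ' Keep a bare drive root such as "c:\" intact
        If candidate.Length > 3 Then
            candidate = candidate.TrimEnd("\"c, "/"c)
        End If
        Return candidate
    End Function

    Sub Main()
        Console.WriteLine(ResolveStartDir("c:\pdf\", "d:\docs"))   ' c:\pdf
        Console.WriteLine(ResolveStartDir("", "d:\docs"))          ' d:\docs
    End Sub

End Module
```

In the Main sub shown earlier, such a helper would be called with Command() and the startDir value from the config file.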

Part 2 - The Config File

The config file currently has several options. Not only is the startdir defined (which is used above), but also the outputpath, the thumbnailsizes and several other variables are defined.

In the recursive sub that does the actual work, we therefore start with requesting the first part of the variables from the config file. The second part is requested inside the codeblock that checks for the orientation of the file. For this code, see part 4.

Added here is the rotateLandscapeFiles parameter. It is enabled by default, and is used to rotate landscape-oriented files so that they still produce a portrait thumbnail without losing too much detail in the thumbnail itself.

VB.NET
Sub DirSearch(ByVal sDir As String)

' Requesting variables from appsettings file
Dim appSettings As AppSettingsReader = New AppSettingsReader
Dim jpgOutputPath As String
Dim rotateLandscapeFiles As Boolean

With appSettings
    jpgOutputPath = .GetValue("jpgOutputPath", GetType(String))
    rotateLandscapeFiles = .GetValue("rotateLandscapeFiles", GetType(Boolean))
End With
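
For orientation, this is roughly what the appSettings section of the config file could look like. The key names are the ones requested by the code in this article. All values are illustrative assumptions and not the defaults shipped with the project.

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <appSettings>
    <!-- All values below are placeholders, not the project's defaults -->
    <add key="startDir" value="c:\pdf" />
    <!-- Trailing backslash needed: the code concatenates path and file name -->
    <add key="jpgOutputPath" value="c:\pdf\thumbs\" />
    <add key="rotateLandscapeFiles" value="true" />
    <add key="templateWidth" value="200" />
    <add key="templateHeight" value="260" />
    <add key="thumbnailWidthPortrait" value="120" />
    <add key="thumbnailHeightPortrait" value="170" />
    <add key="thumbnailWidthLandscape" value="170" />
    <add key="thumbnailHeightLandscape" value="120" />
    <add key="centerHorizontal" value="true" />
    <add key="centerVertical" value="true" />
    <add key="quality" value="90" />
  </appSettings>
</configuration>
```

AppSettingsReader.GetValue converts each string to the requested type, so a value that cannot be parsed as a Boolean or Integer results in an exception at startup rather than a silent default.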

Part 3 - The Recursive Part

To make the project actually recursive, the function that checks the files needs to check not only the starting dir, but all dirs below that. To achieve this, we use the IO.SearchOption.AllDirectories option.

VB.NET
Try

    ' Get the full list of files to process from the startdir.
    ' SearchOption.AllDirectories makes the search recursive.
    ' The extension replacement below is case sensitive, so currently
    ' only lowercase .pdf extensions are handled properly.
    Dim dirInf As New IO.DirectoryInfo(sDir)
    For Each fileInf As IO.FileInfo In dirInf.GetFiles( _
            "*.pdf", IO.SearchOption.AllDirectories)

        ' Replacing pdf extension with jpg and adding the outputpath
        Dim inputfilename As String = fileInf.Name
        Dim inputfile As String = fileInf.FullName

        Dim outputFile As String = _
            (jpgOutputPath & inputfilename.Replace(".pdf", ".jpg"))

        ' For all files found, a thumbnail is rendered.
        ' Details of this code are found in part 4

    Next

Catch ex As Exception

    Console.WriteLine(ex)

End Try
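
The comments above note that only lowercase extensions are handled properly. One possible way around that, shown here purely as a sketch, is to let the framework swap the extension instead of doing a string replace. BuildOutputPath is a hypothetical helper name, and the paths are made up.

```vbnet
Imports System
Imports System.IO

Module OutputNameSketch

    ' Hypothetical helper: builds the thumbnail path for a PDF file name.
    Function BuildOutputPath(ByVal outputDir As String, _
                             ByVal pdfFileName As String) As String
        ' ChangeExtension swaps whatever extension is present,
        ' regardless of upper or lower case
        Dim jpgName As String = Path.ChangeExtension(pdfFileName, ".jpg")

        ' Combine only inserts a directory separator when one is missing
        Return Path.Combine(outputDir, jpgName)
    End Function

    Sub Main()
        Console.WriteLine(BuildOutputPath("c:\thumbs", "Manual.PDF"))
        Console.WriteLine(BuildOutputPath("c:\thumbs\", "report.pdf"))
    End Sub

End Module
```

With this approach, the output path in the config file would work with or without a trailing backslash.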

Part 4 - The PDF Code

This part of the code has been changed quite a lot from Jonathan's original code. We will take a look at the main blocks needed to generate the thumbnail.

In short, the following steps can be defined:

  • Opening the PDF and acquiring the first page
  • Copying the page to the clipboard and retrieving it for processing
  • Checking for the orientation of the PDF and choosing the right overlay
  • Preparing the overlay
  • Preparing the PDF page
  • Overlaying the PDF page with the overlay image
  • Saving the thumbnail

We'll start with opening the PDF, acquiring the first page and copying it to the clipboard.

VB.NET
' Skipping if thumbnail already exists in output path
Dim thumbnail As New FileInfo(outputFile)
If Not thumbnail.Exists() Then

    ' Create the document
    Dim pdfDoc As New Acrobat.AcroPDDoc

    ' Open the document
    Dim ret As Boolean
    ret = pdfDoc.Open(inputfile)

    ' if opening document fails, give error
    If ret = False Then
        Throw New FileNotFoundException
    End If

    ' Get the first page
    Dim pdfPage As Acrobat.AcroPDPage = pdfDoc.AcquirePage(0)

    ' Get the size of the page
    Dim pdfRectTemp As Object = pdfPage.GetSize

    ' Declare PDFRect to hold dimensions of the page and assign them
    ' The PDFRect you get back from GetSize has properties
    ' x and y, but the PDFRect you have to supply CopyToClipboard
    ' has left, right, top, bottom
    Dim pdfRect As New Acrobat.AcroRect

    pdfRect.Left = 0
    pdfRect.right = pdfRectTemp.x
    pdfRect.Top = 0
    pdfRect.bottom = pdfRectTemp.y

    ' Render to clipboard, scaled by 100 percent, with 0,0 as origin
    Call pdfPage.CopyToClipboard(pdfRect, 0, 0, 100)

After this, we'll retrieve it again from the clipboard to process it. For that, we start by checking the orientation of the file. Here we also request the next part of variables from the config file, and use them to set the width and height of both the portrait and landscape scenarios. Here we also fix the relative sizes for odd-shaped PDFs.

The routine is fairly simple, though due to the many variables it might seem complex.
We start by detecting the size of the original PDF page and checking whether its width is smaller than its height (which makes it a portrait file). Next, to make the thumbnail as large as possible, we check which of the two dimensions (height or width) is the limiting one. To do this, we emulate the 'constrain proportions' function found in e.g. Photoshop. This gives us the optimal thumbnail height and width without deforming the page.

After this, we do the same for the landscape files, where depending on the value of the rotate-variable, we either rotate and process a portrait thumb or generate a landscape thumb.

VB.NET
' Retrieving data back from clipboard.
Dim clipboardData As IDataObject = Clipboard.GetDataObject()

If (clipboardData.GetDataPresent(DataFormats.Bitmap)) Then

Dim pdfBitmap As Bitmap = clipboardData.GetData(DataFormats.Bitmap)

' Declaring locations of the overlay images and used variables
Dim templatePortraitFile As String = Application.StartupPath & "\boek.jpg"
Dim templateLandscapeFile As String = _
    Application.StartupPath & "\thumb-landscape.gif"
Dim thumbnailWidth As Integer
Dim thumbnailWidthRelative As Integer
Dim thumbnailHeight As Integer
Dim thumbnailHeightRelative As Integer
Dim templateFile As String
Dim templateWidth As Integer
Dim templateHeight As Integer

' Retrieve width and height of the templatefile
With appSettings
    templateWidth = .GetValue("templateWidth", GetType(Integer))
    templateHeight = .GetValue("templateHeight", GetType(Integer))
End With

' Switch between portrait and landscape.
If (pdfRectTemp.x < pdfRectTemp.y) Then
    templateFile = templatePortraitFile
    Console.WriteLine("Using Portrait template")
    Console.WriteLine("PDF width: {0}", pdfRectTemp.x)
    Console.WriteLine("PDF height: {0}", pdfRectTemp.y)

    With appSettings
        thumbnailWidth = .GetValue("thumbnailWidthPortrait", GetType(Integer))
        thumbnailHeight = .GetValue("thumbnailHeightPortrait", GetType(Integer))
        Dim relative As Decimal = (pdfRectTemp.y / thumbnailHeight)
        thumbnailWidthRelative = (pdfRectTemp.x / relative)

        ' check if the relative thumbnail width exceeds the actual thumbnail size
        ' if so, we set the width at the maximum, and recalculate the height.
        If (thumbnailWidthRelative > thumbnailWidth) Then
            Dim relativelarge As Decimal = thumbnailWidthRelative / thumbnailWidth
            thumbnailWidthRelative = thumbnailWidth
            thumbnailHeightRelative = thumbnailHeight / relativelarge
        Else
            thumbnailHeightRelative = thumbnailHeight
        End If
        Console.WriteLine("Relative height factor is: {0}", relative)
        Console.WriteLine("Relative Thumbnail Height set to: {0}", _
                thumbnailHeightRelative)
        Console.WriteLine("Relative Thumbnail Width set to: {0}", _
                thumbnailWidthRelative)
    End With

Else

    ' If rotate option is enabled, use portrait file with rotate settings
    If (rotateLandscapeFiles) Then
        pdfBitmap.RotateFlip(RotateFlipType.Rotate270FlipNone)
        templateFile = templatePortraitFile
        Console.WriteLine("Using Portrait template")
        Console.WriteLine("PDF width: {0}", pdfRectTemp.x)
        Console.WriteLine("PDF height: {0}", pdfRectTemp.y)

        With appSettings
            thumbnailWidth = .GetValue("thumbnailWidthPortrait", GetType(Integer))
            thumbnailHeight = .GetValue("thumbnailHeightPortrait", GetType(Integer))
            Dim relative As Decimal = (pdfRectTemp.x / thumbnailHeight)
            thumbnailWidthRelative = (pdfRectTemp.y / relative)
            If (thumbnailWidthRelative > thumbnailWidth) Then
                Dim relativelarge As Decimal = _
                    thumbnailWidthRelative / thumbnailWidth
                thumbnailWidthRelative = thumbnailWidth
                thumbnailHeightRelative = thumbnailHeight / relativelarge
            Else
                thumbnailHeightRelative = thumbnailHeight
            End If
            Console.WriteLine("Relative height factor is: {0}", relative)
            Console.WriteLine("Relative Thumbnail Height set to: {0}", _
                thumbnailHeightRelative)
            Console.WriteLine("Relative Thumbnail Width set to: {0}", _
                thumbnailWidthRelative)
        End With

    'Else use the landscape settings
    Else
        templateFile = templateLandscapeFile
        Console.WriteLine("Using Landscape template")
        With appSettings
            thumbnailWidth = .GetValue("thumbnailWidthLandscape", GetType(Integer))
            thumbnailHeight = .GetValue("thumbnailHeightLandscape", GetType(Integer))
            Dim relative As Decimal = (pdfRectTemp.x / thumbnailWidth)
            thumbnailHeightRelative = (pdfRectTemp.y / relative)
            If (thumbnailHeightRelative > thumbnailHeight) Then
                Dim relativelarge As Decimal = _
                    thumbnailHeightRelative / thumbnailHeight
                thumbnailHeightRelative = thumbnailHeight
                thumbnailWidthRelative = thumbnailWidth / relativelarge
            Else
                thumbnailWidthRelative = thumbnailWidth
            End If
            Console.WriteLine("Relative width factor is: {0}", relative)
            Console.WriteLine("Relative Thumbnail Height set to: {0}", _
                thumbnailHeightRelative)
            Console.WriteLine("Relative Thumbnail Width set to: {0}", _
                thumbnailWidthRelative)
        End With
    End If
End If
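
The three branches above all apply the same "constrain" idea: pick the scale factor that keeps the page inside the available box without changing its proportions. As a compact illustration of that calculation (not the code used in the project), it could be written as one function. FitWithin is a hypothetical name, and the box sizes in the example are made up.

```vbnet
Imports System

Module FitSketch

    ' Hypothetical helper: scales (srcWidth x srcHeight) so that it fits
    ' inside (maxWidth x maxHeight) while keeping the aspect ratio.
    ' Returns an array holding {width, height}.
    Function FitWithin(ByVal srcWidth As Double, ByVal srcHeight As Double, _
                       ByVal maxWidth As Integer, _
                       ByVal maxHeight As Integer) As Integer()
        ' The smaller of the two factors belongs to the limiting dimension
        Dim factor As Double = Math.Min(maxWidth / srcWidth, _
                                        maxHeight / srcHeight)

        Dim w As Integer = CInt(Math.Round(srcWidth * factor))
        Dim h As Integer = CInt(Math.Round(srcHeight * factor))

        ' Never return a zero-sized thumbnail for extreme page shapes
        Return New Integer() {Math.Max(1, w), Math.Max(1, h)}
    End Function

    Sub Main()
        ' A4 page in points, fitted into a 100 x 140 box
        Dim a4 As Integer() = FitWithin(595, 842, 100, 140)
        Console.WriteLine("{0} x {1}", a4(0), a4(1))           ' 99 x 140

        ' Square page: here the width is the limiting dimension
        Dim square As Integer() = FitWithin(500, 500, 100, 140)
        Console.WriteLine("{0} x {1}", square(0), square(1))   ' 100 x 100
    End Sub

End Module
```

For the rotated landscape case, the same function would be called with the page width and height swapped.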

Next we use the overlay image to generate the final thumbnail. To do this, we first render the large page to a small bitmap. After this, we center the image according to the settings in the config file, and translate it to the right position to fit the mould.

Lastly, we transform the image using a matrix function to give it the impression of depth needed to match up with the background image, and paste the layers on top of each other.

VB.NET
' Load the template graphic
Dim templateBitmap As Bitmap = New Bitmap(templateFile)
Dim templateImage As Image = Image.FromFile(templateFile)

' Render to small image using the bitmap class
Dim pdfImage As Image = pdfBitmap.GetThumbnailImage( _
    thumbnailWidthRelative, thumbnailHeightRelative, Nothing, Nothing)

' Create new blank bitmap
Dim thumbnailBitmap As Bitmap = New Bitmap( _
    templateWidth, templateHeight, Imaging.PixelFormat.Format32bppArgb)


' To overlay the template with the image, we need to set the transparency
templateBitmap.MakeTransparent()

Dim thumbnailGraphics As Graphics = Graphics.FromImage(thumbnailBitmap)
Dim c As System.Drawing.Color = System.Drawing.ColorTranslator.FromHtml("#ced7de")
thumbnailGraphics.Clear(c)

Dim centerHorizontal As Boolean
Dim centerVertical As Boolean
With appSettings
    centerHorizontal = .GetValue("centerHorizontal", GetType(Boolean))
    centerVertical = .GetValue("centerVertical", GetType(Boolean))
End With

'Calculating positioning to center the thumbnail if the options are enabled
Dim horizontalLocation As Integer = 0
Dim verticalLocation As Integer = 0
If (centerHorizontal) Then
    horizontalLocation = ((thumbnailWidth - thumbnailWidthRelative) / 2) + 22
    Console.WriteLine("Centering thumbnail horizontally")
End If

If (centerVertical) Then
    verticalLocation = ((thumbnailHeight - thumbnailHeightRelative) / 2) + 63
    Console.WriteLine("Centering thumbnail vertically")
End If

' Transforming the image to fit the new style of thumbnail
Dim matrix As Matrix = New Matrix()
matrix.Translate(0, 0)
matrix.Shear(0.0F, -0.42F)

' Draw rendered pdf image to new blank bitmap
thumbnailGraphics.DrawImage(templateImage, 0, 0)
thumbnailGraphics.InterpolationMode = Drawing2D.InterpolationMode.HighQualityBicubic
thumbnailGraphics.Transform = matrix
thumbnailGraphics.DrawImage(pdfImage, horizontalLocation, _
    verticalLocation, thumbnailWidthRelative, thumbnailHeightRelative)
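
To get a feel for what the shear does to the page image, it can help to look at where the corners of a rectangle end up. The sketch below applies the same Shear(0, -0.42) call to the corners of a rectangle at the origin. The 100 x 140 size is an arbitrary example value and ShearedCorners is a hypothetical helper name.

```vbnet
Imports System
Imports System.Drawing
Imports System.Drawing.Drawing2D

Module ShearSketch

    ' Returns where the four corners of a width x height rectangle at the
    ' origin end up after a vertical shear with the given factor.
    Function ShearedCorners(ByVal width As Single, ByVal height As Single, _
                            ByVal shearY As Single) As PointF()
        Dim corners As PointF() = New PointF() { _
            New PointF(0, 0), _
            New PointF(width, 0), _
            New PointF(width, height), _
            New PointF(0, height)}

        Using m As New Matrix()
            ' With shearX = 0, each point moves vertically by shearY * x
            m.Shear(0.0F, shearY)
            m.TransformPoints(corners)
        End Using

        Return corners
    End Function

    Sub Main()
        For Each p As PointF In ShearedCorners(100.0F, 140.0F, -0.42F)
            Console.WriteLine("({0:F1}, {1:F1})", p.X, p.Y)
        Next
    End Sub

End Module
```

The left edge stays where it is, while the right edge moves up by about 0.42 times the width, which produces the slanted look. Because the transform is set on the Graphics object, it also applies to the drawing position, so any horizontal offset shifts the drawn image upward as well.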

Next we output the result to an image using the encoding parameters, of which the quality of the output image is configurable in the config file. The encoding parameters use a function called GetEncoderInfo, which assembles all the parameters and uses them as input for the output image.

VB.NET
Dim myImageCodecInfo As ImageCodecInfo
Dim myEncoder As Encoder
Dim myEncoderParameter As EncoderParameter
Dim myEncoderParameters As EncoderParameters
Dim quality As Int32
With appSettings
    quality = .GetValue("quality", GetType(Int32))
End With

Console.WriteLine("Using jpeg compression quality: {0}", quality)

' Get an ImageCodecInfo object that represents the JPEG codec.
myImageCodecInfo = GetEncoderInfo(ImageFormat.Jpeg)

' Create an Encoder object based on the GUID
' for the Quality parameter category.
myEncoder = Encoder.Quality

' Create an EncoderParameters object.
' An EncoderParameters object has an array of EncoderParameter
' objects. In this case, there is only one
' EncoderParameter object in the array.
myEncoderParameters = New EncoderParameters(1)

' Save the bitmap as a JPEG file with quality level from configfile.
myEncoderParameter = New EncoderParameter(myEncoder, quality)
myEncoderParameters.Param(0) = myEncoderParameter

thumbnailBitmap.Save(outputFile, myImageCodecInfo, myEncoderParameters)

' Writing to console if thumbnail is created properly
Console.WriteLine("Generated thumbnail... {0}", outputFile)
Console.WriteLine("--------------------------------------")
Console.WriteLine("")

The function that takes care of the encoder parameters is stated as a separate private function:

VB.NET
Private Function GetEncoderInfo(ByVal format As ImageFormat) As ImageCodecInfo
    Dim j As Integer
    Dim encoders() As ImageCodecInfo
    encoders = ImageCodecInfo.GetImageEncoders()

    j = 0
    While j < encoders.Length
        If encoders(j).FormatID = format.Guid Then
            Return encoders(j)
        End If
        j += 1
    End While
    Return Nothing

End Function

Finally, we'll release all created objects. After this, the next statement starts to process the next file found.

VB.NET
thumbnailGraphics.Dispose()

End If
pdfDoc.Close()
Marshal.ReleaseComObject(pdfPage)
Marshal.ReleaseComObject(pdfRect)
Marshal.ReleaseComObject(pdfDoc)

End If

Points of Interest

After all these code snippets, it's time to take a look at what this project cannot do. Sadly it isn't perfect, and, like all software, it has its limitations.

First of all, it still requires a full Acrobat version. As commented before, the usage of the Acrobat libraries is not allowed for server-based solutions. Since even Adobe isn't really clear about what is and what is not allowed, we'll consider this a grey area. It would be great if someone were able to remove the dependency on the Acrobat libraries. On the other hand, this way it works like a charm.

Secondly, there's still one strange bug. Somehow the application fails after 256 consecutive files have been processed if Acrobat 7.0 is used. Versions 6 and 8 work fine, and both continue processing even after more than 2000 files. It will work with Acrobat 7 in blocks of 256 files, but using 6 or 8 is strongly recommended.
Note: This bug has been fixed in the 7th October 2007 update. You'll have to make sure, though, that Acrobat isn't waiting for updates or similar processes. To do this, open Acrobat and check that automatic updates are disabled. Also make sure that no popups or informational windows show up when Acrobat is opened.

History

Updated on 13th December, 2006

  • Implemented several code fixes
  • Added the config file in the source files package

Updated on 7th October 2007

  • Added code to control the quality of the image files
  • Changed to 'deluxe' thumbnail style
  • Several small code fixes (including a fix for the 256-file limit with Acrobat 7)
  • Fixed deformation of the thumbnails of PDF-docs that use a non-standard page size
  • Updated console output

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)