Introduction
For one of my projects, I needed to extract pages from .pdf files as images. The images had to be as sharp as possible, which means keeping the original resolution (DPI). A lot of software claims to extract images from PDFs, and I tried several of these tools, but they do a poor job of saving an image at its original DPI. One might assume that the original size equals the 100% zoom level view in Adobe Reader, but often that is not true. And when the image is saved in a lossy format such as .jpg, it loses even more detail. So I decided to write my own tool.
Background
I was looking for a free solution. Adobe Reader is free and comes with an ActiveX control that can be embedded in VB6. However, that control exposes very few functions (unlike the ActiveX control that comes with Adobe Acrobat Pro). Extracting the pages was therefore a challenge, and I had to turn to API calls to find (child) windows and send them messages. I also wanted to hide the Reader ActiveX window, because it is unpleasant to watch while it is repeatedly selected, deselected and resized. Sending keystrokes and mouse clicks to a hidden window, or getting it to repaint, requires extra code.
Using the Code
The code addresses several topics:
- Find the handle of any window within the application by its class name and text
- Find the number of pages in a .pdf
- Get the DPI of any page in a .pdf
- Two methods to extract a page from a .pdf:
  - Send a mouse click to a hidden window
  - Simulate a Ctrl+C input with the API function keybd_event
  - Get data from the Clipboard with API functions
- Two methods to get a DPI other than the original:
  - Resize a PictureBox.Picture to a high-quality image in another PictureBox
  - Paint a hidden window's content to a PictureBox
- Save as an image in various formats (bmp, gif, jpg, png, tif) using GDI+
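One of the extraction steps listed above simulates Ctrl+C with keybd_event. A minimal sketch of that keystroke sequence follows. Keep in mind that keybd_event injects input at the system level, so the keys reach whichever window currently has keyboard focus; it cannot address a specific (hidden) window handle. The sketch therefore assumes the Reader's page view already has focus. SendCtrlC and VK_C are hypothetical names, and the download may do this differently (Microsoft also documents SendInput as the successor of keybd_event).

```vb
' Win32 declaration for keybd_event (user32)
Private Declare Sub keybd_event Lib "user32" (ByVal bVk As Byte, ByVal bScan As Byte, _
    ByVal dwFlags As Long, ByVal dwExtraInfo As Long)

Private Const VK_CONTROL As Byte = &H11
Private Const VK_C As Byte = &H43            ' virtual-key code of the C key
Private Const KEYEVENTF_KEYUP As Long = &H2

' Hypothetical helper: press Ctrl, press C, release C, release Ctrl.
' The keystrokes go to the window that has keyboard focus at this moment.
Private Sub SendCtrlC()
    keybd_event VK_CONTROL, 0, 0, 0
    keybd_event VK_C, 0, 0, 0
    keybd_event VK_C, 0, KEYEVENTF_KEYUP, 0
    keybd_event VK_CONTROL, 0, KEYEVENTF_KEYUP, 0
    DoEvents     ' yield so pending messages are processed before the clipboard is read
End Sub
```

Releasing the keys in reverse order matters: if the Ctrl key-up were skipped, Windows would keep treating Ctrl as held down.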
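Reading the copied page back with API functions (rather than VB's Clipboard object) could look like the sketch below. It assumes the copy placed a CF_BITMAP on the clipboard and that the project references OLE Automation (stdole), as VB6 projects do by default; PictureFromClipboard is a hypothetical name. Because the clipboard owns the bitmap handle it returns, the sketch copies it with CopyImage before closing the clipboard and then wraps the copy in a StdPicture with OleCreatePictureIndirect.

```vb
Private Type GUID
    Data1 As Long
    Data2 As Integer
    Data3 As Integer
    Data4(0 To 7) As Byte
End Type

Private Type PICTDESC                  ' bitmap variant of the OLE PICTDESC structure
    cbSizeOfStruct As Long
    picType As Long
    hImage As Long
    hPal As Long
End Type

Private Declare Function OpenClipboard Lib "user32" (ByVal hwnd As Long) As Long
Private Declare Function CloseClipboard Lib "user32" () As Long
Private Declare Function IsClipboardFormatAvailable Lib "user32" (ByVal wFormat As Long) As Long
Private Declare Function GetClipboardData Lib "user32" (ByVal wFormat As Long) As Long
Private Declare Function CopyImage Lib "user32" (ByVal hImage As Long, ByVal uType As Long, _
    ByVal cx As Long, ByVal cy As Long, ByVal fuFlags As Long) As Long
Private Declare Function DeleteObject Lib "gdi32" (ByVal hObject As Long) As Long
Private Declare Function OleCreatePictureIndirect Lib "oleaut32" (lpPictDesc As PICTDESC, _
    riid As GUID, ByVal fOwn As Long, lplpvObj As IPicture) As Long

Private Const CF_BITMAP As Long = 2
Private Const IMAGE_BITMAP As Long = 0
Private Const PICTYPE_BITMAP As Long = 1

' Hypothetical helper: returns a copy of the clipboard bitmap as a StdPicture,
' or Nothing if the clipboard holds no bitmap.
Private Function PictureFromClipboard(ByVal hwndOwner As Long) As StdPicture
    Dim hClip As Long, hCopy As Long
    Dim pd As PICTDESC, IID_IPicture As GUID, pic As IPicture

    If IsClipboardFormatAvailable(CF_BITMAP) = 0 Then Exit Function
    If OpenClipboard(hwndOwner) = 0 Then Exit Function
    hClip = GetClipboardData(CF_BITMAP)
    ' The clipboard owns hClip, so take a private copy before closing it
    If hClip <> 0 Then hCopy = CopyImage(hClip, IMAGE_BITMAP, 0, 0, 0)
    CloseClipboard
    If hCopy = 0 Then Exit Function

    ' IID_IPicture = {7BF80980-BF32-101A-8BBB-00AA00300CAB}
    With IID_IPicture
        .Data1 = &H7BF80980
        .Data2 = &HBF32
        .Data3 = &H101A
        .Data4(0) = &H8B: .Data4(1) = &HBB: .Data4(2) = &H0: .Data4(3) = &HAA
        .Data4(4) = &H0: .Data4(5) = &H30: .Data4(6) = &HC: .Data4(7) = &HAB
    End With

    pd.cbSizeOfStruct = Len(pd)
    pd.picType = PICTYPE_BITMAP
    pd.hImage = hCopy
    ' fOwn = 1: the picture object deletes hCopy when it is released
    If OleCreatePictureIndirect(pd, IID_IPicture, 1, pic) = 0 Then
        Set PictureFromClipboard = pic
    Else
        DeleteObject hCopy                 ' creation failed, so free the copy ourselves
    End If
End Function
```

In a form, a call such as `Set picSrc.Picture = PictureFromClipboard(Me.hWnd)` would then show the copied page.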
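For the first way of getting a different DPI (resizing a picture into another PictureBox), one high-quality option is StretchBlt with the HALFTONE stretch mode, which averages source pixels instead of dropping them. Whether the download uses this exact mode is my assumption. The sketch assumes the source is a bitmap StdPicture, and that the destination PictureBox has AutoRedraw = True and is already sized to the target pixel dimensions. ScaledSize and ResizePicture are hypothetical names.

```vb
Private Declare Function CreateCompatibleDC Lib "gdi32" (ByVal hdc As Long) As Long
Private Declare Function DeleteDC Lib "gdi32" (ByVal hdc As Long) As Long
Private Declare Function SelectObject Lib "gdi32" (ByVal hdc As Long, ByVal hObject As Long) As Long
Private Declare Function SetStretchBltMode Lib "gdi32" (ByVal hdc As Long, ByVal nStretchMode As Long) As Long
Private Declare Function SetBrushOrgEx Lib "gdi32" (ByVal hdc As Long, ByVal nXOrg As Long, _
    ByVal nYOrg As Long, ByVal lppt As Long) As Long
Private Declare Function StretchBlt Lib "gdi32" (ByVal hdc As Long, ByVal x As Long, ByVal y As Long, _
    ByVal nWidth As Long, ByVal nHeight As Long, ByVal hSrcDC As Long, ByVal xSrc As Long, _
    ByVal ySrc As Long, ByVal nSrcWidth As Long, ByVal nSrcHeight As Long, ByVal dwRop As Long) As Long

Private Const HALFTONE As Long = 4
Private Const SRCCOPY As Long = &HCC0020

' Pixel count at dstDpi of something that measures srcPx pixels at srcDpi (rounded)
Private Function ScaledSize(ByVal srcPx As Long, ByVal srcDpi As Long, ByVal dstDpi As Long) As Long
    ScaledSize = (srcPx * dstDpi + srcDpi \ 2) \ srcDpi
End Function

' Hypothetical helper: draws pic into picDst at dstW x dstH pixels with HALFTONE resampling
Private Sub ResizePicture(pic As StdPicture, picDst As PictureBox, ByVal dstW As Long, ByVal dstH As Long)
    Dim hMemDC As Long, hOld As Long, srcW As Long, srcH As Long

    ' StdPicture sizes are in HiMetric; convert them to pixels
    srcW = picDst.ScaleX(pic.Width, vbHimetric, vbPixels)
    srcH = picDst.ScaleY(pic.Height, vbHimetric, vbPixels)

    hMemDC = CreateCompatibleDC(picDst.hDC)
    ' SelectObject returns 0 if the bitmap is already selected into another DC
    hOld = SelectObject(hMemDC, pic.Handle)
    If hOld <> 0 Then
        SetStretchBltMode picDst.hDC, HALFTONE
        SetBrushOrgEx picDst.hDC, 0, 0, 0      ' required after selecting HALFTONE
        StretchBlt picDst.hDC, 0, 0, dstW, dstH, hMemDC, 0, 0, srcW, srcH, SRCCOPY
        SelectObject hMemDC, hOld
    End If
    DeleteDC hMemDC
    Set picDst.Picture = picDst.Image          ' keep the result as the box's Picture
End Sub
```

For example, a page captured at 150 DPI that is 1275 pixels wide (8.5 inches) needs ScaledSize(1275, 150, 300) = 2550 pixels at 300 DPI.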
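Saving through GDI+ can be done with its flat API from VB6. The sketch below is an assumption about how this might be done, not the download's routine: it creates a GDI+ bitmap from the picture's HBITMAP and saves it with the encoder CLSID of the requested format. The CLSIDs are those of the built-in GDI+ encoders; the GDI+ documentation recommends looking them up with GetImageEncoders rather than hard-coding them. SavePictureAs and EncoderClsidString are hypothetical names.

```vb
Private Type GdiplusStartupInput
    GdiplusVersion As Long
    DebugEventCallback As Long
    SuppressBackgroundThread As Long
    SuppressExternalCodecs As Long
End Type

Private Type CLSID
    Data1 As Long
    Data2 As Integer
    Data3 As Integer
    Data4(0 To 7) As Byte
End Type

Private Declare Function GdiplusStartup Lib "gdiplus" (token As Long, inputbuf As GdiplusStartupInput, _
    ByVal outputbuf As Long) As Long
Private Declare Function GdiplusShutdown Lib "gdiplus" (ByVal token As Long) As Long
Private Declare Function GdipCreateBitmapFromHBITMAP Lib "gdiplus" (ByVal hbm As Long, _
    ByVal hpal As Long, hBitmap As Long) As Long
Private Declare Function GdipSaveImageToFile Lib "gdiplus" (ByVal hImage As Long, _
    ByVal pFileName As Long, clsidEncoder As CLSID, ByVal encoderParams As Long) As Long
Private Declare Function GdipDisposeImage Lib "gdiplus" (ByVal hImage As Long) As Long
Private Declare Function CLSIDFromString Lib "ole32" (ByVal lpsz As Long, pclsid As CLSID) As Long

' Encoder CLSIDs of the built-in GDI+ codecs; returns "" for unsupported formats
Private Function EncoderClsidString(ByVal ext As String) As String
    Select Case LCase$(ext)
        Case "bmp":         EncoderClsidString = "{557CF400-1A04-11D3-9A73-0000F81EF32E}"
        Case "jpg", "jpeg": EncoderClsidString = "{557CF401-1A04-11D3-9A73-0000F81EF32E}"
        Case "gif":         EncoderClsidString = "{557CF402-1A04-11D3-9A73-0000F81EF32E}"
        Case "tif", "tiff": EncoderClsidString = "{557CF405-1A04-11D3-9A73-0000F81EF32E}"
        Case "png":         EncoderClsidString = "{557CF406-1A04-11D3-9A73-0000F81EF32E}"
    End Select
End Function

' Hypothetical helper: saves a bitmap StdPicture to FileName in the format named by ext
Private Function SavePictureAs(pic As StdPicture, ByVal FileName As String, ByVal ext As String) As Boolean
    Dim gsi As GdiplusStartupInput, token As Long, hImg As Long
    Dim enc As CLSID, sClsid As String

    sClsid = EncoderClsidString(ext)
    If sClsid = "" Then Exit Function
    If CLSIDFromString(StrPtr(sClsid), enc) <> 0 Then Exit Function

    gsi.GdiplusVersion = 1
    If GdiplusStartup(token, gsi, 0) <> 0 Then Exit Function
    If GdipCreateBitmapFromHBITMAP(pic.Handle, pic.hPal, hImg) = 0 Then
        ' Status 0 means Ok; encoderParams = 0 uses the codec's defaults
        SavePictureAs = (GdipSaveImageToFile(hImg, StrPtr(FileName), enc, 0) = 0)
        GdipDisposeImage hImg
    End If
    GdiplusShutdown token
End Function
```

A call might look like `ok = SavePictureAs(pic, App.Path & "\page1.png", "png")`, with a hypothetical file name. Starting and shutting down GDI+ inside the function keeps the sketch self-contained; an application saving many pages would rather start GDI+ once. The GDI+ documentation also warns against creating a Bitmap from a GDI bitmap that is selected into a device context, so pass a standalone picture rather than a PictureBox's Image.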
Download the source code to see all these topics with explanatory comments. The following snippets cover two of them:
- Find the handle of any window within the application by its class name and text:
Private Function FindWindowHandle(ByVal hwnd As Long, _
        SelectClass As String, SelectText As String, bSelect As Boolean) As Long
    ' Recursively searches hwnd and all its descendants for a window whose class
    ' name equals SelectClass (case-insensitive). If SelectText is not empty, the
    ' window text must contain it (bSelect = True) or must not contain it (bSelect = False).
    Dim sClass As String, sText As String
    Dim sLen As Long
    Dim ParentHwnd As Long
    Dim FoundHwnd As Long

    FoundHwnd = 0
    ' Compare the class name of the current window
    sClass = Space(64)
    sLen = GetClassName(hwnd, sClass, 63)
    sClass = Left(sClass, sLen)
    If StrComp(sClass, SelectClass, vbTextCompare) = 0 Then
        If SelectText <> "" Then
            ' Read the window text and match it against SelectText
            sText = Space(256)
            sLen = SendMessageS(hwnd, WM_GETTEXT, 255, sText)
            sText = Left(sText, sLen)
            If bSelect = True Then
                If InStr(sText, SelectText) > 0 Then
                    FoundHwnd = hwnd
                End If
            Else
                If InStr(sText, SelectText) = 0 Then
                    FoundHwnd = hwnd
                End If
            End If
        Else
            FoundHwnd = hwnd
        End If
    End If
    If FoundHwnd <> 0 Then
        FindWindowHandle = FoundHwnd
        Exit Function
    End If
    ' No match here: search each child window in turn
    ParentHwnd = hwnd
    hwnd = FindWindowX(hwnd, 0, 0, 0)
    Do While hwnd
        FoundHwnd = FindWindowHandle(hwnd, SelectClass, SelectText, bSelect)
        If FoundHwnd <> 0 Then
            Exit Do
        End If
        hwnd = FindWindowX(ParentHwnd, hwnd, 0, 0)
    Loop
    FindWindowHandle = FoundHwnd
End Function
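FindWindowHandle depends on API declarations that the snippet does not show. A plausible set follows. FindWindowX and SendMessageS are the snippet's own aliases; I am assuming they map to FindWindowExA and SendMessageA, with the class/title pointers declared ByVal Long (so 0 means "any") and the WM_GETTEXT buffer declared ByVal String. The usage function, including the Adobe class name AVL_AVView and window text AVPageView, is also an assumption about the Reader's window hierarchy; verify it with a window inspector such as Spy++. FindPageView is a hypothetical name and must live in the form that hosts the Reader control, since it uses Me.hWnd.

```vb
Private Declare Function GetClassName Lib "user32" Alias "GetClassNameA" (ByVal hwnd As Long, _
    ByVal lpClassName As String, ByVal nMaxCount As Long) As Long
' FindWindowX: FindWindowExA with class/title pointers as Long, so 0 matches any window
Private Declare Function FindWindowX Lib "user32" Alias "FindWindowExA" (ByVal hWndParent As Long, _
    ByVal hWndChildAfter As Long, ByVal lpszClass As Long, ByVal lpszWindow As Long) As Long
' SendMessageS: SendMessageA with lParam as a ByVal String buffer (used for WM_GETTEXT)
Private Declare Function SendMessageS Lib "user32" Alias "SendMessageA" (ByVal hwnd As Long, _
    ByVal wMsg As Long, ByVal wParam As Long, ByVal lParam As String) As Long

Private Const WM_GETTEXT As Long = &HD

' Hypothetical usage: find the Reader's page view anywhere inside this form.
' "AVL_AVView" / "AVPageView" are assumed names, not confirmed by the article.
Private Function FindPageView() As Long
    FindPageView = FindWindowHandle(Me.hWnd, "AVL_AVView", "AVPageView", True)
End Function
```

Because the search is depth-first and stops at the first match, passing a more specific SelectText (or bSelect = False to exclude a title) is how you disambiguate between several windows of the same class.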
The second snippet shows how to send a mouse click to a hidden window:
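Private Sub SendLeftClick(ByVal hwnd As Long, ByVal hwnd2 As Long, x As Long, y As Long)
    ' Simulates a left click at client coordinates (x, y) of hwnd by sending the
    ' messages Windows would normally generate; hwnd2 is passed as the wParam of
    ' WM_MOUSEACTIVATE (the top-level parent window).
    Dim position As Long
    Call SetActiveWindow(hwnd)
    ' Mouse-message lParam: y in the high word, x in the low word
    position = y * &H10000 + x
    ' &H2010001: high word = WM_LBUTTONDOWN (&H201), low word = HTCLIENT (1)
    Call SendMessage(hwnd, WM_MOUSEACTIVATE, ByVal hwnd2, _
        ByVal CLng(&H2010001))
    Call SendMessage(hwnd, WM_SETCURSOR, ByVal CLng(0), ByVal CLng(&H2010001))
    ' wParam 1 = MK_LBUTTON (left button held down)
    Call SendMessage(hwnd, WM_LBUTTONDOWN, ByVal CLng(1), ByVal position)
    Call SendMessage(hwnd, WM_LBUTTONUP, ByVal CLng(0), ByVal position)
End Sub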
Points of Interest
The application was written in VB6. I still prefer it to .NET, and IMHO it shows that anything can be done with VB6 and a few API calls. If you prefer another programming language, the source can be rewritten; that should be fairly easy if you are familiar with the Windows API, because the API functions are the core of this application.
History
Update: Adapted the code for Acrobat Reader DC. The DC ActiveX control does not work with VB6 (nor with Visual Basic 2015, for that matter). Solution: rename C:\Program Files (x86)\Common Files\Adobe\Acrobat\ActiveX\ and add a new directory (... )\ActiveX\ containing the Acropdf.dll from the download. (The code still works with previous versions of the Reader.) Also added: saving images in various image formats, and a routine to make PrintWindow() work with Adobe. The API function PrintWindow is notorious for returning black images with some applications, Adobe among them. So I needed to add a check for black results, but more can be done to optimize the result.
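' Force the page view and all its child windows to erase and repaint immediately
Call RedrawWindow(PageViewhWnd, ByVal 0&, ByVal 0&, _
    RDW_ERASE Or RDW_INVALIDATE Or RDW_FRAME Or RDW_ALLCHILDREN Or RDW_UPDATENOW)
' Capture the window into PicTemp five times; after each capture, post WM_PAINT
' messages (with PicTemp's DC as wParam) to the page view
For i = 1 To 5
    PrintWindow picSrc.hWnd, PicTemp.hDC, 0&
    For j = 1 To 1000
        retVal = PostMessage(PageViewhWnd, WM_PAINT, PicTemp.hDC, 0&)
    Next j
Next i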
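The check for black results is not part of the excerpt. One simple way to do it, sketched here as an assumption rather than the author's routine, is to sample a grid of pixels in the capture with GetPixel and treat the capture as failed if every sample is black (RGB 0). IsBlackCapture and SampleCoord are hypothetical names, and PicTemp is assumed to have AutoRedraw = True so that its hDC holds the captured bitmap.

```vb
Private Declare Function GetPixel Lib "gdi32" (ByVal hdc As Long, ByVal x As Long, ByVal y As Long) As Long

' Centre of the i-th of n equal cells along an extent of the given length (pixels)
Private Function SampleCoord(ByVal extent As Long, ByVal i As Long, ByVal n As Long) As Long
    SampleCoord = (extent * (2 * i + 1)) \ (2 * n)
End Function

' Hypothetical helper: True if all n x n sampled pixels of pic are black
Private Function IsBlackCapture(pic As PictureBox, ByVal n As Long) As Boolean
    Dim w As Long, h As Long, i As Long, j As Long
    w = pic.ScaleX(pic.ScaleWidth, pic.ScaleMode, vbPixels)
    h = pic.ScaleY(pic.ScaleHeight, pic.ScaleMode, vbPixels)
    For i = 0 To n - 1
        For j = 0 To n - 1
            ' Any non-zero COLORREF (including CLR_INVALID) counts as "not black"
            If GetPixel(pic.hDC, SampleCoord(w, i, n), SampleCoord(h, j, n)) <> 0 Then Exit Function
        Next j
    Next i
    IsBlackCapture = True
End Function
```

Calling IsBlackCapture(PicTemp, 8) after each PrintWindow call checks 64 points; with n = 4 on a 100-pixel-wide capture, the sampled x coordinates are 12, 37, 62 and 87. A sparse grid keeps the check cheap, at the cost of possibly missing a capture that is black everywhere except for a few small features.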