Introduction
My application allows limited editing of HTML pages using MSHTML. Each HTML
page is based on a template file and the range of things the end user can do to
that template file is limited. At no time is the user able to create an empty
HTML page.
So obviously there has to be a mechanism in my application to allow the user
to select which template a new page should be based on.
I wanted to present the user with a list of thumbnail images, each representing
a template page. In order to do that I had to devise a way of taking an HTML page and
converting it to an image. The alternative of presenting the user with a simple
listbox with the names of the templates is a tad too early 90's.
This article is the result.
A false start
Fortunately for me my application sets a specific size limit on page size. The entire
page must fit into an 800 by 600 frame without scrollbars.
My initial approach was to render the page using MSHTML, create a memory bitmap, get a handle to the
MSHTML display window and do a BitBlt from the display window to my memory bitmap, then scale and save
the results.
It worked well but for one minor detail.
In order to render an HTML page into an image file using the BitBlt method the page has to be visible on
the screen. BitBlt can only grab bits from a device context that's had something drawn on it, and if the
device context represents something that's not actually visible on the screen the Windows WM_PAINT
optimisations kick in and exclude those areas from the update region. The result is that MSHTML doesn't
paint onto those portions of a device context that aren't visible.
If you want to create images of something that's already on the screen, well and good.
Otherwise, to create an image, you have to present that something on the screen. This makes
for an awful lot of flashing as one renders HTML pages to the screen for just long enough to grab their
bits via BitBlt.
Even so, I was almost happy with the result. The flashing didn't look too awful. I even ran it past
a few people, showing them what it looked like as it updated images and they didn't seem to mind it
too much. But it irked me. There had to be a better way.
A second approach
Some digging around in MSDN revealed the IHTMLElementRender interface. Sounds hopeful. It has a member
function called DrawToDC() that sounds like a perfect fit. Which it is indeed.
Once you obtain an IHTMLElementRender interface you can supply your own device context and
get MSHTML to render the element to it. And once you've done that it's trivial to scale and save to a
file.
As you've probably guessed, it wasn't quite as simple as that.
I'm going to present the class a little differently this time. We'll start with a simple version of the
class (not present in the download) and add complexity to it as we encounter issues.
The simple version of CCreateHTMLImage looks like this.
class CCreateHTMLImage
{
public:
enum eOutputImageFormat
{
eBMP = 0,
eJPG,
eGIF,
eTIFF,
ePNG,
eImgSize
};
CCreateHTMLImage();
virtual ~CCreateHTMLImage();
BOOL SetSaveImageFormat(eOutputImageFormat format);
BOOL CreateImage(
IHTMLDocument2 *pDoc,
LPCTSTR szDestFilename,
CSize srcSize,
CSize outputSize);
protected:
int GetEncoderClsid(const WCHAR* format, CLSID* pClsid);
private:
static LPCTSTR m_ImageFormats[eImgSize];
CLSID m_encoderClsid;
};
This version of the class creates an image from an existing HTML document. The constructor initialises the saved image
format to JPEG (you can override this by calling SetSaveImageFormat(), passing one of the eOutputImageFormat
constants). The guts of the work is done in the CreateImage() member function, which looks like this.
BOOL CCreateHTMLImage::CreateImage(
IHTMLDocument2 *pDoc,
LPCTSTR szDestFilename,
CSize srcSize,
CSize outputSize)
{
USES_CONVERSION;
ASSERT(szDestFilename);
ASSERT(AfxIsValidString(szDestFilename));
ASSERT(pDoc);
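IHTMLElement *pElement = (IHTMLElement *) NULL;
IHTMLElementRender *pRender = (IHTMLElementRender *) NULL;
if (pDoc == (IHTMLDocument2 *) NULL)
return FALSE;
// The body element is the root of everything visible in the document.
pDoc->get_body(&pElement);
if (pElement == (IHTMLElement *) NULL)
return FALSE;
// The render interface is exposed by the element, not by the document.
pElement->QueryInterface(IID_IHTMLElementRender, (void **) &pRender);
pElement->Release(); // pRender, if obtained, keeps the element alive
if (pRender == (IHTMLElementRender *) NULL)
return FALSE;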
CFileSpec fsDest(szDestFilename);
CBitmapDC destDC(srcSize.cx, srcSize.cy);
pRender->DrawToDC(destDC);
pRender->Release();
CBitmap *pBM = destDC.Close();
Bitmap *gdiBMP = Bitmap::FromHBITMAP(HBITMAP(pBM->GetSafeHandle()), NULL);
// GDI+ scales the full size bitmap down to the requested output size.
Image *gdiThumb = gdiBMP->GetThumbnailImage(outputSize.cx, outputSize.cy);
gdiThumb->Save(T2W(fsDest.GetFullSpec()), &m_encoderClsid);
delete gdiBMP;
delete gdiThumb;
delete pBM;
return TRUE;
}
This takes a pointer to an IHTMLDocument2 interface, an output filename and a couple of CSize objects.
You'd have obtained the IHTMLDocument2 interface from a loaded HTML document in an instance of MSHTML somewhere in
your program. For example, if you wanted to create an image of the document in an app that used CHtmlView you'd
obtain the interface by calling GetHTMLDocument() on that view.
We do my usual bunch of ASSERTs on the parameters. Then we get a pointer to an IHTMLElement
interface that represents the body of the HTML document. Once we've got that we can do a QueryInterface()
for an IHTMLElementRender interface, which represents all the visual aspects of the document. We can't get the
interface directly from the document because the document isn't an element, it contains elements.
If we got this far without encountering an error it's time to create the device context we want to paint the document into.
For this I used Anneke Sicherer-Roetman's excellent CBitmapDC class, which is available on CodeProject.
The srcSize object is used to set the size of the destination device context. The IHTMLElementRender::DrawToDC()
function doesn't do scaling. If the source HTML needs 1000 pixels of width to draw the entire horizontal extent but you
pass it a device context only 500 pixels wide you'll get only the left half of the HTML.
Once MSHTML has rendered our IHTMLElement into the device context we create a GDI+ Bitmap object
using the contents of the CBitmapDC and then create a scaled image from the Bitmap object, using the
outputSize object to specify the image dimensions. GDI+ takes care of scaling the full size image to the size
we want. A save, a bit of cleanup and we're done.
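GDI+ scales to whatever outputSize asks for, so it's up to the caller to pick output dimensions with the same shape as srcSize if the thumbnail shouldn't look squashed. The following is a minimal, portable sketch of one way to work that out. The Size struct and the FitWithin() helper are hypothetical names invented for this illustration; they are not part of the class, MFC or GDI+.

```cpp
#include <cassert>

// Stand-in for MFC's CSize so the sketch compiles anywhere.
struct Size {
    int cx;
    int cy;
};

// Returns the largest size with the same aspect ratio as 'src' that
// fits inside 'box'. Integer arithmetic only; results round down,
// but never below one pixel.
Size FitWithin(Size src, Size box)
{
    assert(src.cx > 0 && src.cy > 0 && box.cx > 0 && box.cy > 0);

    // Compare src.cx/src.cy with box.cx/box.cy without dividing.
    long long lhs = static_cast<long long>(src.cx) * box.cy;
    long long rhs = static_cast<long long>(box.cx) * src.cy;

    Size out;
    if (lhs >= rhs) {
        // Source is relatively wider than the box: width is the limit.
        out.cx = box.cx;
        out.cy = static_cast<int>(static_cast<long long>(src.cy) * box.cx / src.cx);
    } else {
        // Source is relatively taller than the box: height is the limit.
        out.cy = box.cy;
        out.cx = static_cast<int>(static_cast<long long>(src.cx) * box.cy / src.cy);
    }
    if (out.cx < 1) out.cx = 1;
    if (out.cy < 1) out.cy = 1;
    return out;
}
```

With an 800 by 600 source and an 80 by 80 box this yields 80 by 60, which is the pair of sizes used in the usage examples later in the article.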
The other members of this class take care of the details of the saved image format and, since they're protected,
they're of little interest unless we plan to derive a new class from this one.
The GetEncoderClsid() function is taken from the MSDN documentation and is used to get the correct image codec for
the image format we want.
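The MSDN GetEncoderClsid() sample looks a codec up by its MIME type string, so something has to turn an eOutputImageFormat constant into that string. The article doesn't show the contents of m_ImageFormats; the sketch below assumes it is a table of MIME strings in enum order. The names kImageFormats and MimeTypeFor() are hypothetical, and the code is written to be portable rather than copied from the download.

```cpp
#include <cassert>
#include <string>

enum eOutputImageFormat { eBMP = 0, eJPG, eGIF, eTIFF, ePNG, eImgSize };

// MIME types reported by the built-in GDI+ encoders, in enum order.
// Assumption: the class's m_ImageFormats table holds the same strings.
static const wchar_t *const kImageFormats[eImgSize] = {
    L"image/bmp", L"image/jpeg", L"image/gif", L"image/tiff", L"image/png"
};

// Returns the MIME string for 'format', or an empty string when the
// value is outside the enum's range.
std::wstring MimeTypeFor(int format)
{
    if (format < 0 || format >= eImgSize)
        return std::wstring();
    return kImageFormats[format];
}
```

On Windows the returned string would be what gets passed as the first argument of GetEncoderClsid(), for instance MimeTypeFor(ePNG).c_str().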
Hang on a moment!
Surely this class presents exactly the same problems as the false start approach discussed above? It can only create an image
from an existing HTML document already on the screen. That's right. But this is the simple version of the class.
If we want to create an image from a document stored somewhere else (hard disk or intranet or internet) we have to do a
little more work. We have to load the document using MSHTML, get an IHTMLDocument2 interface on the document and
then call our class to create the image.
The full version of CCreateHTMLImage, which is included in the download, looks like this.
class CCreateHTMLImage : public CWnd
{
protected:
DECLARE_DYNCREATE(CCreateHTMLImage)
DECLARE_EVENTSINK_MAP()
enum eEnums
{
CHILDBROWSER = 100,
};
public:
enum eOutputImageFormat
{
eBMP = 0,
eJPG,
eGIF,
eTIFF,
ePNG,
eImgSize
};
CCreateHTMLImage();
virtual ~CCreateHTMLImage();
BOOL Create(CWnd *pParent);
BOOL SetSaveImageFormat(eOutputImageFormat format);
BOOL CreateImage(
IHTMLDocument2 *pDoc,
LPCTSTR szDestFilename,
CSize srcSize,
CSize outputSize);
BOOL CreateImage(
LPCTSTR szSrcFilename,
LPCTSTR szDestFilename,
CSize srcSize,
CSize outputSize);
protected:
CComPtr<IWebBrowser2> m_pBrowser;
CWnd m_pBrowserWnd;
virtual BOOL CreateControlSite(
COleControlContainer* pContainer,
COleControlSite** ppSite,
UINT nID,
REFCLSID clsid);
virtual void DocumentComplete(LPDISPATCH pDisp, VARIANT* URL);
int GetEncoderClsid(const WCHAR* format, CLSID* pClsid);
private:
static LPCTSTR m_ImageFormats[eImgSize];
CLSID m_encoderClsid;
};
A few changes should jump out at you. The first is that the full version of the class is derived from CWnd whereas
the simple version wasn't. This indicates that at least some of the changes I made to allow the conversion of an HTML document
to an image somehow involve the creation of a window. You don't yet know the half of it!
All functions that were present in the simple version of the class are unchanged in the full version. You'll see that I
added another CreateImage() overload. This one takes a source document name instead of an IHTMLDocument2
interface pointer.
This new function is the reason I added all the new stuff to the full version of the class, so let's start with it and
work outwards.
Loading an external document
Initially I started out trying to use the IHTMLDocument2 interface directly. Something like this.
IHTMLDocument2 *pDoc = (IHTMLDocument2 *) NULL;
if (CoCreateInstance(
CLSID_HTMLDocument,
NULL,
CLSCTX_INPROC_SERVER,
IID_IHTMLDocument2,
(void**) &pDoc) == S_OK)
{
if (pDoc != (IHTMLDocument2 *) NULL)
{
}
}
This works and we get a document interface we can work with. There's one small problem. There's no way to load a document
directly. We can call IHTMLDocument2::write() to render a string containing HTML but that means we have to load
our document contents into a string. That'll work just fine with local files but what if you want to image a website on the
net? All I want is to create images - not write a full blown http: protocol handler.
Ok, scratch that approach. What about using an IWebBrowser2 interface? The code to instantiate one is almost
identical to the preceding code snippet so I won't repeat it, just substitute IWebBrowser2 wherever you see
IHTMLDocument2. To load a document we simply navigate to it using either IWebBrowser2::Navigate()
or IWebBrowser2::Navigate2().
So I coded it up and tested. The Navigate2() call returned success but the document didn't load. Or at least,
if it did, the interface's ReadyState never changed to let me know it had finished. Obviously we can't go rendering
the document into a device context until we know it's loaded and indeed, querying the IWebBrowser2 interface for
the IHTMLDocument2 interface we need always returned a NULL interface pointer, indicating that the document doesn't
yet exist.
Repeating the test on a dummy application based on CHtmlView reveals what we already knew. The
IWebBrowser2::ReadyState does change as the document loads and, once the document has finished
loading, we can query the IWebBrowser2 interface for an IHTMLDocument2 interface and get back a
valid interface pointer.
Hmm, so what are we doing differently? Well, the first and most obvious difference is that we're instantiating
IWebBrowser2 without a matching display window. As we'll see a little later, this window is quite
important to the IWebBrowser2 interface even though this isn't stated anywhere in the documentation.
It was time to investigate how CHtmlView does things.
CHtmlView is an MFC class. Fortunately we have the source code to MFC. That means we can go look at a working example of
something and figure out what we're doing wrong or not doing at all.
The first thing we find (in afxhtml.h) is the class definition. There's a lot of stuff in there, most of which
doesn't concern us. What's of interest is a CWnd member variable called m_wndBrowser. Aha. We
know that CHtmlView is ultimately derived from CWnd and is therefore already a window. So why does it
need a member variable of type CWnd? Let's have a look at the relevant code in CHtmlView::Create()
(in viewhtml.cpp) to see what's going on.
BOOL CHtmlView::Create(LPCTSTR lpszClassName, LPCTSTR lpszWindowName,
DWORD dwStyle, const RECT& rect, CWnd* pParentWnd,
UINT nID, CCreateContext* pContext)
{
m_pCreateContext = pContext;
if (!CView::Create(lpszClassName, lpszWindowName,
dwStyle, rect, pParentWnd, nID, pContext))
{
return FALSE;
}
AfxEnableControlContainer();
RECT rectClient;
GetClientRect(&rectClient);
if (!m_wndBrowser.CreateControl(CLSID_WebBrowser, lpszWindowName,
WS_VISIBLE | WS_CHILD, rectClient, this, AFX_IDW_PANE_FIRST))
{
DestroyWindow();
return FALSE;
}
LPUNKNOWN lpUnk = m_wndBrowser.GetControlUnknown();
HRESULT hr = lpUnk->QueryInterface(IID_IWebBrowser2, (void**) &m_pBrowserApp);
if (!SUCCEEDED(hr))
{
m_pBrowserApp = NULL;
m_wndBrowser.DestroyWindow();
DestroyWindow();
return FALSE;
}
return TRUE;
}
The view window creates itself and then creates a child control as an ActiveX object using the CLSID_WebBrowser
identifier. If that succeeds it queries the child for the Web Browser's IUnknown interface and uses that interface
to get an IWebBrowser2 interface which it caches away for later use.
Ok, things are starting to fall into place. Instead of blindly creating an IWebBrowser2 interface out of
thin air we should create an instance of the Web Browser control and get our IWebBrowser2 interface from it.
First lesson from CHtmlView
Let's duplicate what CHtmlView does and create our Web Browser control as a child control of our class.
We'll discuss why the extra level of indirection is needed a little later in the article.
Our creation sequence is (if we want to create images for pages we haven't already got loaded in some instance of MSHTML
somewhere):
- Create an instance of the CCreateHTMLImage class.
- Call the Create() method on the class.
- Call the CreateImage() method once for each image we want to create.
Once this is done we can call Navigate2() on the Web Browser child window and expect the document to load. Which
it does. Let's have a look at the function.
BOOL CCreateHTMLImage::CreateImage(
LPCTSTR szSrcFilename,
LPCTSTR szDestFilename,
CSize srcSize,
CSize outputSize)
{
ASSERT(GetSafeHwnd());
ASSERT(IsWindow(GetSafeHwnd()));
ASSERT(szSrcFilename);
ASSERT(AfxIsValidString(szSrcFilename));
ASSERT(szDestFilename);
ASSERT(AfxIsValidString(szDestFilename));
CRect rect(CPoint(0, 0), srcSize);
MoveWindow(&rect);
m_pBrowserWnd.MoveWindow(&rect);
COleVariant vUrl(szSrcFilename, VT_BSTR),
vFlags(long(navNoHistory |
navNoReadFromCache |
navNoWriteToCache), VT_I4),
vNull(LPCTSTR(NULL), VT_BSTR);
COleSafeArray vPostData;
if (m_pBrowser->Navigate2(&vUrl, &vFlags, &vNull, &vPostData, &vNull) == S_OK)
RunModalLoop();
else
return FALSE;
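IDispatch *pDisp = (IDispatch *) NULL;
HRESULT hr = m_pBrowser->get_Document(&pDisp);
if (FAILED(hr) || pDisp == (IDispatch *) NULL)
return FALSE;
// Ask for the document interface rather than casting the IDispatch pointer.
IHTMLDocument2 *pDoc = (IHTMLDocument2 *) NULL;
hr = pDisp->QueryInterface(IID_IHTMLDocument2, (void **) &pDoc);
pDisp->Release();
if (FAILED(hr) || pDoc == (IHTMLDocument2 *) NULL)
return FALSE;
BOOL bResult = CreateImage(pDoc, szDestFilename, srcSize, outputSize);
pDoc->Release();
return bResult;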
}
If we get through the gauntlet of my usual ASSERT checks on the input parameters we create a rectangle
with the dimensions implied by the srcSize parameter and set the Web Browser to those dimensions. If we don't
set the Web Browser size correctly we won't get an image that accurately reflects the contents of the HTML document.
Then we set up a bunch of COleVariant objects with our source document name and some flags, and call the
Navigate2() method. If that method succeeds we fall into a call to the CWnd::RunModalLoop() function.
This is very important. My first stabs at this solution used a combination of Sleep() and some polling to
try and determine when the document had finished loading. The result was deadlock. It turns out that once you've initiated a
Navigate2() operation (and many other operations on the Web Browser control) you have to let the message pump
run. The message pump can be the application's main pump but if you use that one you have no way to synchronously interact with
the CCreateHTMLImage class. This won't matter if all you want to create is one image. But if you have a
list of images you want to create, one after the other, you have to wait for the first one to complete before you can even
start on the second one.
So we start the navigation and then drop into a RunModalLoop(). Some time later the document will finish
loading and fire the DocumentComplete() event. That event handler in our class does nothing much except call
ExitModalLoop(), which lets us fall out of the RunModalLoop() function and allows processing in the
CCreateHTMLImage::CreateImage() function to continue. That processing consists of obtaining an
IHTMLDocument2 interface and calling code we've already discussed.
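To see why pumping messages matters, here is a small portable simulation of the pattern. It is not MFC code: ToyPump and LoadSynchronously() are hypothetical stand-ins, and the toy loop returns when its queue is empty, whereas the real CWnd::RunModalLoop() would block waiting for the next message. The point it illustrates is that the completion notification is itself a queued message, so code that sleeps and polls on the same thread never gets to see it.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Toy single-threaded message queue standing in for the Windows one.
class ToyPump {
public:
    void Post(std::function<void()> msg) { m_queue.push_back(std::move(msg)); }

    // Dispatches queued messages until ExitModalLoop() is called or the
    // queue runs dry. Returns the number of messages dispatched.
    int RunModalLoop()
    {
        m_exit = false;
        int dispatched = 0;
        while (!m_exit && !m_queue.empty()) {
            std::function<void()> msg = std::move(m_queue.front());
            m_queue.pop_front();
            msg();
            ++dispatched;
        }
        return dispatched;
    }

    void ExitModalLoop() { m_exit = true; }

private:
    std::deque<std::function<void()>> m_queue;
    bool m_exit = false;
};

// Simulates one navigation: progress arrives as three queued messages,
// and the last one plays the role of the DocumentComplete event.
bool LoadSynchronously(ToyPump &pump)
{
    bool loaded = false;
    pump.Post([] { /* download progress */ });
    pump.Post([] { /* parsing */ });
    pump.Post([&loaded, &pump] { loaded = true; pump.ExitModalLoop(); });

    // Polling 'loaded' here without pumping would spin forever,
    // because nothing would ever dispatch the completion message.
    pump.RunModalLoop();
    return loaded;
}
```

Because LoadSynchronously() only returns once the completion message has been dispatched, a caller can process a list of documents one after another with a plain loop, which is the behaviour the nested modal loop gives CreateImage().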
So why do we have an embedded Browser Window instead of being a Web Browser ourselves?
If you've made it this far you might be wondering why our class mimics CHtmlView to the
extent of having an embedded CWnd variable that is the real Web Browser control.
It wasn't obvious to me as I wrote the code that this would be necessary. I wrote the class so that it, itself, became
an instance of the Web Browser and was able to create images of HTML documents without those documents ever flashing up
on the screen. It all looked good.
Trouble in paradise
But on closer examination of the images I suddenly realised there were a couple of artifacts that shouldn't
have been there. Scrollbars!
If you've used CHtmlView you know there's a function, OnGetHostInfo(), that gives you the
opportunity to modify visual aspects of the Web Browser, including whether the browser shows scrollbars or not. So now it was
time to dive back into the implementation of CHtmlView, once again, to see how I could duplicate that
functionality.
It turns out that the reason CHtmlView embeds an instance of the Web Browser, instead of being an instance itself,
is so that it can act as the Web Browser's parent. This is important because it means that CHtmlView (or our class)
can create a COleControlSite to host the Web Browser control and respond to interface queries. If our class was
directly an instance of the Web Browser that would mean that queries from the Web Browser would be directed to our parent,
to code we don't necessarily control, and code that almost certainly doesn't know that we want to respond to the
GetHostInfo query with the answer that scrollbars oughtn't to be displayed.
I'll concede that you probably do have control over the parent object and could implement a COleControlSite in
the parent. But that's definitely the wrong place to do it. Why should the parent have to know that some arbitrary class
it uses needs a COleControlSite, let alone the specifics of a response to a particular query?
Without going into an extended discussion of how CWnd derived classes host ActiveX controls let's look at what
we have to implement in our class to make it all work. Our Create() call creates the child ActiveX control by
calling CreateControl(), specifying the CLSID for the specific ActiveX control we want.
CreateControl() does various things including calling the virtual function CreateControlSite(). The
default implementation in CWnd does nothing except return a NULL site pointer. When we look at
CHtmlView's implementation we see that it sets the site pointer to an instance of CHtmlControlSite.
Ok, let's go look at that class. It turns out to be derived from COleControlSite (which shouldn't be a surprise)
but it also implements all the member functions of the IDocHostUIHandler interface. This matters because the way
the Web Browser control interrogates its container to determine UI states, such as whether or not to show scroll bars, is via a
QueryInterface() on its parent's automation interface, requesting an IDocHostUIHandler interface.
Unfortunately we can't use CHtmlControlSite in our class for two reasons. The first is that the class definition
appears in viewhtml.cpp rather than in a header file we can include. We could work around that easily enough by
cutting and pasting the definition into our own header file. But it still won't work because the class has built-in knowledge
of CHtmlView virtual methods. We could mess around in our class trying to duplicate the vtable
structure to reuse CHtmlControlSite but it's just not worth the trouble (to say nothing of being a maintenance
nightmare). Instead I cut and pasted the entire class definition and implementation, changed the name and modified the member
functions to do what I needed. In fact, the only member function I want doing anything is the GetHostInfo()
function. All the others do nothing (but must be implemented anyway).
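For readers who haven't met GetHostInfo() before, the following portable sketch shows the shape of the answer such a handler gives. It does not reproduce the code in the download. HostUiInfo and FillHostInfo() are hypothetical stand-ins for DOCHOSTUIINFO and the real handler, defined here so the example compiles without the Windows SDK; the two flag values are the ones the SDK header assigns to DOCHOSTUIFLAG_NO3DBORDER and DOCHOSTUIFLAG_SCROLL_NO. Turning off the 3D border as well as the scrollbars is my assumption about what a thumbnail generator would want.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for declarations in <mshtmhst.h>, so the sketch compiles
// without the Windows SDK. The flag values match the SDK header.
const std::uint32_t kNo3DBorder = 0x00000004; // DOCHOSTUIFLAG_NO3DBORDER
const std::uint32_t kScrollNo   = 0x00000008; // DOCHOSTUIFLAG_SCROLL_NO

// Mirrors the first three fields of DOCHOSTUIINFO.
struct HostUiInfo {
    std::uint32_t cbSize;
    std::uint32_t dwFlags;
    std::uint32_t dwDoubleClick;
};

// Hypothetical body for a GetHostInfo() handler: ask the browser to
// draw neither scrollbars nor a 3D border. Returns false where a real
// handler would return E_POINTER, true where it would return S_OK.
bool FillHostInfo(HostUiInfo *pInfo)
{
    if (pInfo == nullptr)
        return false;
    pInfo->dwFlags = kScrollNo | kNo3DBorder;
    pInfo->dwDoubleClick = 0; // DOCHOSTUIDBLCLK_DEFAULT
    return true;
}
```

In the real class the equivalent logic would live in the GetHostInfo() member of the copied control site, with every other IDocHostUIHandler member left as a stub.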
Phew! That's a lot of work just to get rid of some scrollbars on a tiny thumbnail image. But remember that the class
can be used to capture full sized images simply by specifying the output image size. On a full sized image the scrollbars are
probably undesirable.
Using the class
Using the class is almost trivial. Include the header file, declare an instance of CCreateHTMLImage and use it.
Remember that there are two ways to use it. The first is when you want to capture an image of an existing page
already rendered somewhere in your application.
CCreateHTMLImage cht;
cht.CreateImage(m_pDoc, csOutputFile, CSize(800, 600), CSize(80, 60));
which assumes that m_pDoc is a pointer to an IHTMLDocument2 interface. This example captures the image
at 800 by 600 but saves a thumbnail of 80 by 60.
The second way to use the class is when all you have is a filename or URL to the page you want to capture.
CCreateHTMLImage cht;
cht.Create(this);
cht.CreateImage(csSourceFile, csOutputFile, CSize(800, 600), CSize(80, 60));
which does the same except that it takes care of loading the source file (or URL) and then captures the image output to a
file. The Create() function needs a pointer to a CWnd derived object which must be a top-level window.
I use my CMainFrame window.
Oh, don't forget to initialise GDI+ - you can find out how in this excellent article[^]
Dependencies
The class uses some other code found on CodeProject.
History
4 April 2004 - Initial Version.
4 April 2004 - Updated the download to include the required header files.
9 April 2004 - Added a demo project written by Jubjub[^]
19 May 2004 - Updated the demo project.