Introduction
The purpose of this article is to show how to automate the full-fledged Save As HTML feature of Internet Explorer, which is normally hidden from those using the Internet Explorer API. Saving the current document in MHTML format is just one of the available options, which include:
- Save As MHTML (whole web page, images, ... in a single file)
- Save As Full HTML (additional folder for images, ...)
- Save HTML code only
- Save As Text
Saving Silently as HTML Using the Internet Explorer API
In fact, the ability to save the current web page to disk without showing a single dialog box is already available to everyone in C++, using the following code, with one important restriction:
LPDISPATCH lpDispatch = m_ctrl.get_Document();
if (lpDispatch != NULL)
{
    IPersistFile *lpPersistFile = NULL;
    // Ask the document for IPersistFile and check the result before using it.
    if (SUCCEEDED(lpDispatch->QueryInterface(IID_IPersistFile, (void**)&lpPersistFile)))
    {
        lpPersistFile->Save(L"c:\\htmlpage.html", FALSE);
        lpPersistFile->Release();
    }
    lpDispatch->Release();
}
(caption for code above) Saving HTML code only, without dialog boxes
The restriction is that this saves the HTML code only, not the complete web page. Of course, what is really interesting is to gain access to full HTML archives with images and so on.
Because there is no "public" or known way to ask for this feature without showing one or more dialog boxes from Internet Explorer, what we are going to do is hook the operating system to listen for all window creations, including those of dialog boxes. Then we'll ask Internet Explorer for the feature and override the file path in the dialog box without it being seen. Finally, we'll mimic the user clicking the Save button to validate the dialog box, and unhook ourselves. That's it!
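To make the workflow concrete, here is a minimal, portable sketch of the bookkeeping the hook has to do. It is only a model: SaveAsHookModel, WindowId, OnCreate and OnActivate are hypothetical names that do not appear in the article's source, and the real code works with HWND values inside a WH_CBT hook rather than with plain integers.

```cpp
#include <cstdint>

// Hypothetical stand-in for HWND, so that the sketch compiles anywhere.
using WindowId = std::uintptr_t;

// Models the two hook events the technique relies on:
// a dialog is created (remember it), then activated (fill it in, once).
class SaveAsHookModel {
public:
    // Called for every window-creation event; only dialogs are remembered.
    void OnCreate(WindowId id, bool isDialogClass) {
        if (isDialogClass)
            pending_ = id;
    }

    // Called for every activation event. Returns true exactly once,
    // for the dialog remembered by OnCreate; this is the moment where
    // the real code overrides the file path and clicks Save.
    bool OnActivate(WindowId id) {
        if (id == 0 || id != pending_)
            return false;
        pending_ = 0;  // forget the dialog so it is handled only once
        return true;
    }

private:
    WindowId pending_ = 0;  // 0 means "no dialog is being tracked"
};
```

Resetting the remembered handle on the first activation is what keeps the dialog from being filled in twice if it is activated again.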
Hooking Internet Explorer to Save As HTML without popping the dialog boxes
This was the short workflow, but there are a few tricks along the way, and this article is a good opportunity to go into detail. By the way, the code is based on an article from Microsoft about how to customize Internet Explorer printing by hooking the Print dialog boxes; see here or here. In our app, we have our own Save As feature:
m_wbSaveAs.Config( CString("c:\\htmlpage.mhtml"), SAVETYPE_ARCHIVE );
m_wbSaveAs.SaveAs();
typedef enum _SaveType
{
SAVETYPE_HTMLPAGE = 0,
SAVETYPE_ARCHIVE,
SAVETYPE_HTMLONLY,
SAVETYPE_TXTONLY
} SaveType;
We start the SaveAs() implementation by installing the hook:
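// Install a CBT hook on the current thread only, so that we are notified
// when the Save As dialog is created and activated.
g_hHook = SetWindowsHookEx(WH_CBT, CbtProc, NULL, GetCurrentThreadId());
if (!g_hHook)
    return false;
g_bSuccess = false;
g_pWebBrowserSaveAs = this;
// Ask Internet Explorer for its Save As feature; the hook handles the dialog.
HRESULT hr = m_pWebBrowser->ExecWB(OLECMDID_SAVEAS,
    OLECMDEXECOPT_PROMPTUSER, NULL, NULL);
// Remove the hook and reset the globals once the command has returned.
UnhookWindowsHookEx(g_hHook);
g_pWebBrowserSaveAs = NULL;
g_hHook = NULL;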
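Since a hook that is left installed keeps intercepting window events, it can help to tie the cleanup to scope exit. The following is a minimal, portable sketch of that idea and not the article's implementation: ScopeGuard and RunWithHook are hypothetical names, and a boolean flag stands in for the real SetWindowsHookEx / UnhookWindowsHookEx pair.

```cpp
#include <functional>
#include <utility>

// Runs the given cleanup function when the guard goes out of scope.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> cleanup)
        : cleanup_(std::move(cleanup)) {}
    ~ScopeGuard() {
        if (cleanup_)
            cleanup_();
    }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    std::function<void()> cleanup_;
};

// Hypothetical stand-in for SaveAs(): hookInstalled models g_hHook.
bool RunWithHook(bool installSucceeds, bool& hookInstalled) {
    if (!installSucceeds)
        return false;          // nothing was installed, nothing to undo
    hookInstalled = true;      // models SetWindowsHookEx succeeding
    ScopeGuard unhook([&hookInstalled] { hookInstalled = false; });
    // ... the real code would call ExecWB here ...
    return true;               // the guard "unhooks" on every exit path
}
```

With this shape, adding an early return between installing the hook and the end of the function cannot leave the hook behind.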
The hook callback procedure is fairly low-level code; see for yourself:
LRESULT CALLBACK CSaveAsWebbrowser::CbtProc(int nCode,
    WPARAM wParam, LPARAM lParam)
{
    switch (nCode)
    {
    case HCBT_CREATEWND:
        {
            HWND hWnd = (HWND)wParam;
            LPCBT_CREATEWND pcbt = (LPCBT_CREATEWND)lParam;
            LPCREATESTRUCT pcs = pcbt->lpcs;
            // 0x8002 is the class atom of the system dialog class (#32770).
            // Compare as ULONG_PTR so the pointer is not truncated on 64-bit builds.
            if ((ULONG_PTR)pcs->lpszClass == 0x00008002)
            {
                g_hWnd = hWnd;
                // Move the dialog off-screen so that the user never sees it.
                pcs->x = -2 * pcs->cx;
            }
            break;
        }
    case HCBT_ACTIVATE:
        {
            HWND hwnd = (HWND)wParam;
            if (hwnd == g_hWnd)
            {
                g_hWnd = NULL;
                g_bSuccess = true;
                if (g_pWebBrowserSaveAs->IsSaveAsEnabled())
                {
                    g_pWebBrowserSaveAs->SaveAsDisable();
                    // The thread fills in the dialog once it is ready,
                    // and deletes itself when it is done.
                    CSaveAsThread *newthread = new CSaveAsThread();
                    newthread->SetKeyWnd(hwnd);
                    newthread->Config( g_pWebBrowserSaveAs->GetFilename(),
                        g_pWebBrowserSaveAs->GetSaveAsType() );
                    newthread->StartThread();
                }
            }
            break;
        }
    }
    return CallNextHookEx(g_hHook, nCode, wParam, lParam);
}
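The class test in HCBT_CREATEWND deserves a word. The lpszClass field holds either a pointer to a class-name string or a 16-bit class atom stored in the low word of the pointer, and 0x8002 (32770 in decimal) is the atom of the system dialog class. The sketch below shows that distinction in portable C++; IsDialogClassAtom and kDialogClassAtom are hypothetical names, and std::uintptr_t stands in for the Windows ULONG_PTR type.

```cpp
#include <cstdint>

// Atom of the system dialog class, also known by the name "#32770".
constexpr std::uintptr_t kDialogClassAtom = 0x8002;

// Returns true when the "class name" is really the dialog class atom.
// A value whose bits above the low 16 are all zero is an atom, not a
// pointer to a string, so we test both halves of the value explicitly.
inline bool IsDialogClassAtom(const void* lpszClass) {
    const std::uintptr_t value = reinterpret_cast<std::uintptr_t>(lpszClass);
    const bool isAtom = (value >> 16) == 0;
    return isAtom && (value & 0xFFFF) == kDialogClassAtom;
}
```

Comparing the whole pointer-sized value, rather than a value cast to a 32-bit type, avoids mistaking a 64-bit string pointer whose low 32 bits happen to be 0x00008002 for the dialog atom.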
In our thread, we wait until the Internet Explorer Save As dialog is ready and its fields are filled in:
switch( ::WaitForSingleObject( m_hComponentReadyEvent, m_WaitTime) )
{
    ...
    if ( ::IsWindowVisible(m_keyhwnd) )
    {
        bSignaled = TRUE;
        bContinue = FALSE;
    }
    MSG msg ;
    while( PeekMessage(&msg, NULL, 0, 0, PM_REMOVE) )
    {
        if (msg.message == WM_QUIT)
        {
            bContinue = FALSE ;
            break ;
        }
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
    ...
}
if (bSignaled)
{
    CSaveAsWebbrowser surrenderNow;
    surrenderNow.Config( GetFilename(), GetSaveAsType() );
    surrenderNow.UpdateSaveAs( m_keyhwnd );
}
delete this;
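Stripped of the Windows specifics, the wait above is a polling loop: test a readiness condition, give up after a deadline, and pause between attempts. Here is a minimal, portable sketch of that pattern. WaitUntil is a hypothetical helper, not part of the article's source; in the real thread the condition is IsWindowVisible on the dialog handle, and the pause is spent pumping messages rather than sleeping.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Polls `ready` until it returns true or `timeout` has elapsed.
// Returns true if the condition was met, false on timeout.
inline bool WaitUntil(const std::function<bool()>& ready,
                      std::chrono::milliseconds timeout,
                      std::chrono::milliseconds step) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        if (ready())
            return true;                      // condition met
        if (std::chrono::steady_clock::now() >= deadline)
            return false;                     // gave up waiting
        std::this_thread::sleep_for(step);    // pause before the next check
    }
}
```

Checking the condition before the deadline means it is always tested at least once, even with a zero timeout.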
We can now override the appropriate data:
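void CSaveAsWebbrowser::UpdateSaveAs(HWND hwnd)
{
    // 0x0470 is the "Save as type" combo-box (cmb1 in dlgs.h): select the format.
    SendMessage(GetDlgItem(hwnd, 0x0470), CB_SETCURSEL,
        (WPARAM) m_nSaveType, 0);
    // Notify the dialog that the combo-box closed up, so the selection is taken into account.
    SendMessage(hwnd, WM_COMMAND, MAKEWPARAM(0x0470,CBN_CLOSEUP),
        (LPARAM) GetDlgItem(hwnd, 0x0470));
    // 0x047c is the file name field (cmb13 in dlgs.h): override the path.
    SetWindowText(GetDlgItem(hwnd, 0x047c), m_szFilename);
    // 0x0001 is IDOK, the Save button: click it to validate the dialog.
    SendMessage(GetDlgItem(hwnd, 0x0001), BM_CLICK, 0, 0);
}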
In the code above, it is worth noting that to select the kind of HTML we want (full HTML, archive, code only, or text format), we not only select the appropriate entry in the combo-box, we also send Internet Explorer a combo-box CloseUp notification. This is because that is the notification Internet Explorer listens for to learn which kind of HTML we want. This behavior was found by trial and error.
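For readers unfamiliar with WM_COMMAND, the notification is encoded in the wParam argument: the low word carries the control identifier and the high word carries the notification code, which is what MAKEWPARAM assembles. The portable sketch below reproduces that packing without windows.h. PackCommand, ControlIdOf and NotifyCodeOf are hypothetical helper names, and the two constants mirror the values used in the code above (0x0470 for the combo-box, 8 for CBN_CLOSEUP).

```cpp
#include <cstdint>

constexpr std::uint16_t kFileTypeCombo = 0x0470;  // combo-box ID used above
constexpr std::uint16_t kCbnCloseUp    = 8;       // value of CBN_CLOSEUP

// Same packing as MAKEWPARAM(controlId, notifyCode):
// low word = control ID, high word = notification code.
constexpr std::uint32_t PackCommand(std::uint16_t controlId,
                                    std::uint16_t notifyCode) {
    return static_cast<std::uint32_t>(controlId) |
           (static_cast<std::uint32_t>(notifyCode) << 16);
}

// What the receiving dialog procedure does to decode the message.
constexpr std::uint16_t ControlIdOf(std::uint32_t wParam) {
    return static_cast<std::uint16_t>(wParam & 0xFFFF);
}
constexpr std::uint16_t NotifyCodeOf(std::uint32_t wParam) {
    return static_cast<std::uint16_t>((wParam >> 16) & 0xFFFF);
}

// The value sent as wParam in UpdateSaveAs().
static_assert(PackCommand(kFileTypeCombo, kCbnCloseUp) == 0x00080470u,
              "CloseUp notification for the file type combo-box");
```

The lParam of the same message carries the window handle of the control, which is why the code above passes GetDlgItem(hwnd, 0x0470) there.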
Conclusion
This article describes a technique to gain access to the full-fledged Save As HTML feature exposed by Internet Explorer. I have never seen an article about this topic on the net, even though it is clearly a compelling feature for developers building web applications. The files you may want to reuse from the provided source code are:
- SaveAsWebBrowser.h, *.cpp: hook procedure; fill the dialog box data
- SaveAsThread.h, *.cpp: auxiliary thread for synchronization with Internet Explorer
The application is just a simple MFC-based CHtmlView application embedding the web browser control.