Introduction
An app I was writing needed to store the full HTML of a web page. I searched the web and the MSDN library for a way to get the complete HTML from a CHtmlView. I found out how to get the <BODY></BODY> data, but not how to get the <HTML></HTML> data. After lots of stumbling, I hit on the following very simple technique.
Examples of getting the outer HTML of the <BODY> tag abound. While exploring the IHTMLDocument2 and IHTMLElement interfaces, I noticed the get_parentElement method of IHTMLElement. I realized that the parent of <BODY> is <HTML>.
This function took care of my problem:
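// Retrieves the complete <HTML>...</HTML> markup of the loaded document.
// Returns false if the document or any required element is unavailable.
bool CMyHtmlView::GetDocumentHTML(CString &str)
{
    LPDISPATCH lpDispatch = GetHtmlDocument();
    if (!lpDispatch)
        return false;

    // Ask the document object for its IHTMLDocument2 interface.
    IHTMLDocument2 *lpHtmlDocument = NULL;
    HRESULT hr = lpDispatch->QueryInterface(IID_IHTMLDocument2, (void**)&lpHtmlDocument);
    lpDispatch->Release();
    if (FAILED(hr) || !lpHtmlDocument)
        return false;

    // Get the <BODY> element, which may be NULL if no document body is available yet.
    IHTMLElement *lpBodyElm = NULL;
    hr = lpHtmlDocument->get_body(&lpBodyElm);
    lpHtmlDocument->Release();
    if (FAILED(hr) || !lpBodyElm)
        return false;

    // The parent of <BODY> is <HTML>.
    IHTMLElement *lpParentElm = NULL;
    hr = lpBodyElm->get_parentElement(&lpParentElm);
    lpBodyElm->Release();
    if (FAILED(hr) || !lpParentElm)
        return false;

    // Copy the outer HTML, then free the BSTR that get_outerHTML allocated.
    BSTR bstr = NULL;
    hr = lpParentElm->get_outerHTML(&bstr);
    lpParentElm->Release();
    if (FAILED(hr))
        return false;

    str = bstr;
    ::SysFreeString(bstr);
    return true;
}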
Points of Interest
There is bound to be a better way of doing this. If you know it, please share it with me.