|
Great suggestion! I implemented what you suggested and found that it's not enough just to call attachEvent. This call (or put_onxxxxx) registers a dispatch pointer for a particular event, but it does not enable the event. Events are enabled by adding this code:
HRESULT hr;
IConnectionPointContainer* pCPC = NULL;
IConnectionPoint* pCP = NULL;
DWORD dwCookie;
// Check that this is a connectable object.
hr = pElem->QueryInterface(IID_IConnectionPointContainer, (void**)&pCPC);
// Find the connection point.
hr = pCPC->FindConnectionPoint(DIID_HTMLElementEvents2, &pCP);
// Advise the connection point.
// pUnk is the dispatch pointer you used in attachEvent
hr = pCP->Advise(pUnk, &dwCookie);
When you are finished with events, disable the events by using a call to pCP->Unadvise(dwCookie);
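As a rough sketch of how the Advise/Unadvise pairing above could be wrapped so that the Unadvise call is never forgotten: AdviseGuard below is a hypothetical helper, not part of MSHTML or ATL. It is written as a template so that it compiles without the Windows SDK, and FakeConnectionPoint and FakeSink are stand-ins invented only for the demo. With the real IConnectionPoint the same shape should apply, assuming the caller keeps pCP alive while the guard exists and releases pCP and pCPC afterwards.

```cpp
#include <cassert>

// RAII guard: Advise() in the constructor, Unadvise() in the destructor.
// ConnectionPoint is any type offering the two calls used in the post above:
//   long Advise(Sink*, unsigned long* cookie)   (HRESULT / DWORD* in real COM)
//   long Unadvise(unsigned long cookie)
template <class ConnectionPoint, class Sink>
class AdviseGuard {
public:
    AdviseGuard(ConnectionPoint* cp, Sink* sink) : cp_(cp) {
        long hr = cp_->Advise(sink, &cookie_);
        advised_ = (hr >= 0);  // same check as SUCCEEDED(hr)
    }
    ~AdviseGuard() {
        if (advised_) cp_->Unadvise(cookie_);  // disable the events again
    }
    AdviseGuard(const AdviseGuard&) = delete;
    AdviseGuard& operator=(const AdviseGuard&) = delete;
    bool advised() const { return advised_; }

private:
    ConnectionPoint* cp_;       // not owned; the caller keeps it alive
    unsigned long cookie_ = 0;  // cookie handed back by Advise
    bool advised_ = false;
};

// Hypothetical stand-ins so the sketch builds without the Windows SDK.
struct FakeSink {};
struct FakeConnectionPoint {
    int active = 0;          // number of live subscriptions
    unsigned long next = 1;  // next cookie to hand out
    long Advise(FakeSink*, unsigned long* cookie) { *cookie = next++; ++active; return 0; }
    long Unadvise(unsigned long) { --active; return 0; }
};

// Returns the number of subscriptions left after the guard goes out of scope.
int demo_advise_guard() {
    FakeConnectionPoint cp;
    FakeSink sink;
    {
        AdviseGuard<FakeConnectionPoint, FakeSink> guard(&cp, &sink);
        assert(guard.advised() && cp.active == 1);
    }  // guard destroyed here, Unadvise is called
    return cp.active;
}
```

The point of the design is only that the cookie and the Unadvise call live in one object, so an early return cannot leave the sink connected.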
Thanks for your help! I will wrap this code up and submit it for others to use.
|
|
|
|
|
I know it only in theory, I mean only from reading MSDN and stuff, but I will be glad to implement it sometime
TBiker wrote:
I will wrap this code up and submit it for others to use.
If you remember me when you submit your article, please post about it here so I don't miss it. I'd be glad to see it working and to use it too
lol
Philip Patrick
"Two beer or not two beer?" (Shakesbeer)
Web-site: www.saintopatrick.com
|
|
|
|
|
Well, not so great...
After testing some more, it turns out that the IHTMLElement2 interface does not control ALL elements as Microsoft may claim (or there is a real possibility that it does, but I haven't a clue how it's done). The problem is that each element has its own set of connection interfaces (IDispatch, HTMLElementEvents2, HTMLInputElementEvents2, etc.), and HTMLElementEvents2 is not always available for certain element types. I discussed this with Microsoft (yep, used up one of my precious support calls) and they don't seem too knowledgeable either. They suggest creating separate event sinks for each ActiveX control. The problem with this is managing a huge number of event class instances and determining which event belongs to which element. So I'm still researching this problem. Any ideas would be appreciated.
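To make the "which event belongs to which element" part concrete, here is a minimal sketch of one way the separate-sinks idea might be organised. It is not Microsoft's suggestion in code form and not tested against MSHTML: ElementSink, EventHandler, kClickEvent and the element tags are all hypothetical names, and in real code OnEvent would be reached from IDispatch::Invoke with the DISPID of the event. Each small sink just remembers a tag for its element and forwards to one shared handler.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Shared handler: receives the element tag plus the event id.
using EventHandler = std::function<void(const std::string& elementTag, long eventId)>;

// Hypothetical event id used only in this sketch (a real sink would see a DISPID).
const long kClickEvent = 1;

// One small sink per element. It stores which element it was attached to,
// so the shared handler never has to guess where an event came from.
class ElementSink {
public:
    ElementSink(std::string tag, EventHandler handler)
        : tag_(std::move(tag)), handler_(std::move(handler)) {}
    void OnEvent(long eventId) const { handler_(tag_, eventId); }

private:
    std::string tag_;
    EventHandler handler_;
};

// Fires one event on each of two sinks and returns what the shared handler saw.
std::vector<std::string> demo_element_sinks() {
    std::vector<std::string> log;
    EventHandler shared = [&log](const std::string& tag, long id) {
        log.push_back(tag + ":" + std::to_string(id));
    };
    std::vector<std::unique_ptr<ElementSink>> sinks;
    sinks.push_back(std::make_unique<ElementSink>("input#name", shared));
    sinks.push_back(std::make_unique<ElementSink>("a#home", shared));
    sinks[1]->OnEvent(kClickEvent);  // event arrives on the second element first
    sinks[0]->OnEvent(kClickEvent);
    return log;
}
```

This does not reduce the number of sink instances; it only keeps each one tiny and moves the real logic into a single place.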
|
|
|
|
|
Wow! You spent one of those 4-per-year calls to Microsoft? Cool
Well, yeah, I heard about it, about separate class/instances, etc. for every element. Can't try this by myself, I have a big project on my neck right now, LMAO
Philip Patrick
|
|
|
|
|
I wrote the same code and ran it under two different versions of IE, ie5.01P2 and ie6.02. Everything is okay under v5.01, but it fails under 6.02, and I don't know why.
My code is as follows:
MSHTML::IHTMLElementCollection * pAllLink=NULL;
pDoc->get_links(&pAllLink);
if (pAllLink!=NULL)
{
    LONG lLinkLen;
    pAllLink->get_length(&lLinkLen);
    VARIANT varIndex;
    varIndex.vt = VT_UINT;
    IDispatch* pDisp;
    IHTMLElement *pLink;
    BSTR bstrLinkAddress,bstrLinkTitle;
    int iAddType=-1;
    for (int i=0; i<lLinkLen;i++)
    {
        iAddType=-1;
        varIndex.lVal = i;
        pDisp = pAllLink->item(varIndex,(long)0);
        pDisp->QueryInterface(IID_IHTMLElement,(void **)&pLink);
        pLink->toString(&bstrLinkAddress);
        CString cstrLinkAddress(bstrLinkAddress);
        CString cstrTempLinkAddress;
        cstrTempLinkAddress = m_cstrURL + CString("#");
        if ((cstrLinkAddress.CompareNoCase(cstrTempLinkAddress) == 0) || (cstrLinkAddress.CompareNoCase(CString("about:blank#")) == 0))
        {
            VARIANT pLinkVariant;
            pLink->get_onclick(&pLinkVariant);
            if (pLinkVariant.vt != VT_NULL)
            {
                bstrLinkAddress = pLinkVariant.bstrVal;
                cstrLinkAddress = CString(bstrLinkAddress);
                iAddType = 0;
            }
            else
            {
                iAddType = 6;
            }
        }
        if (iAddType == -1)
        {
            ParseURL(cstrLinkAddress,&iAddType);
        }
        if (iAddType == 4) //Text page and same directories only...
        {
            CObject *cObj;
            if (g_LinkList.Lookup(LPCTSTR(cstrLinkAddress),(CObject *&)cObj)==0)
            {
                g_LinkList.SetAt(LPCTSTR(cstrLinkAddress),NULL);
                pLink->get_innerText(&bstrLinkTitle);
                m_link.csaAddress.Add(cstrLinkAddress);
                m_link.csaTitle.Add(CString(bstrLinkTitle));
                m_link.cbaAddressType.Add((BYTE)iAddType);
            }
        }
        SysFreeString(bstrLinkTitle);
        SysFreeString(bstrLinkAddress);
    }
    pAllLink->Release();
}
It throws an exception saying that the memory can't be accessed...
Can anyone help me? Thanks a lot.
|
|
|
|
|
On which line are you getting the exception?
Also, you have some leaks. If you are not using smart pointers, you have to release all interfaces yourself.
I suggest using smart pointers; this will also simplify the code.
Also, as I see it, you want to get the links in the document. According to MSDN, the function get_links() will return only links that HAVE a name and/or id.
Philip Patrick
|
|
|
|
|
Oh, thanks for your reply. First, I'm not sure which line throws the exception. When I debug line by line, sometimes it is thrown at a "SysFreeString" line and sometimes inside my function "ParseURL". It never happens if I parse in the "ParseString" way shown at http://www.codeguru.com/ieprogram/HTMLParsing.html, but that way causes a memory leak.
Second, I don't know how to use smart pointers. I try to release every interface myself; if the Release call throws an exception, does that mean it's a smart pointer? And if I use MSHTML::IHTMLElementCollection2Ptr, I don't know how to call its get_links function, so I have to use MSHTML::IHTMLElementCollection. Can you give me some more suggestions, please?
Last, as you saw, I want to get the links in the document. If get_links can only return links that have a name and/or id, how can I get the others? Do you know?
Oh, yes, there's one more important question I want to ask. When I parse HTML in the ParseString way shown at http://www.codeguru.com/ieprogram/HTMLParsing.html, it never opens new IE windows. But when I try the way you prefer, in a multithreaded subroutine, it opens more and more IE windows, and I have to close them one by one with the mouse. I can't stand it. Can you solve this problem too? Thanks for your help.
|
|
|
|
|
Well. About smart pointers in MSHTML.
When you use a smart pointer, it manages memory and interface releasing by itself. As a bonus, you get a QueryInterface call built into the smart pointer. So the code that you have:
IDispatch* pDisp;
IHTMLElement *pLink;
pDisp = pAllLink->item(varIndex,(long)0);
pDisp->QueryInterface(IID_IHTMLElement,(void **)&pLink);
will look like this:
MSHTML::IHTMLElementPtr pLink;
pLink = pAllLink->item(varIndex, (long)0);
What is going on here? Well, the item() function still returns an IDispatch , but when the code assigns it to a smart pointer (pLink here), the smart pointer calls QueryInterface() internally and obtains a pointer to its own interface. So if you see that pLink is not NULL, QueryInterface() succeeded.
Also, a smart pointer is a simple class (that wraps an interface), so when it goes out of scope (the end of the function here), it calls Release() in its destructor. Do not call Release() yourself; if you want to release the interface held by a smart pointer, just assign NULL to it - pLink = NULL
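The release-on-scope-exit behaviour described above can be pictured with a toy class. This is only a sketch under stated assumptions: SketchPtr and FakeInterface are hypothetical names made up here, not the real MSHTML smart pointer types, and the sketch leaves out the QueryInterface step entirely. It only shows where AddRef() and Release() get called.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reference-counted object standing in for a COM interface.
// It only counts; it never deletes itself, so the demo can inspect refs.
struct FakeInterface {
    int refs = 0;
    void AddRef() { ++refs; }
    void Release() { --refs; }
};

// Toy smart pointer: AddRef on acquire, Release on destruction or on = nullptr.
template <class T>
class SketchPtr {
public:
    SketchPtr() = default;
    explicit SketchPtr(T* p) : p_(p) { if (p_) p_->AddRef(); }
    SketchPtr(const SketchPtr& o) : p_(o.p_) { if (p_) p_->AddRef(); }
    ~SketchPtr() { reset(); }  // the automatic Release() at end of scope

    SketchPtr& operator=(const SketchPtr& o) {
        if (this != &o) { reset(); p_ = o.p_; if (p_) p_->AddRef(); }
        return *this;
    }
    // Assigning null releases early, like pLink = NULL in the post.
    SketchPtr& operator=(std::nullptr_t) { reset(); return *this; }

    T* operator->() const { return p_; }
    explicit operator bool() const { return p_ != nullptr; }

private:
    void reset() { if (p_) { p_->Release(); p_ = nullptr; } }
    T* p_ = nullptr;
};

struct Counts { int inside; int after_null; int after_scope; };

// Records the reference count at three points in a pointer's life.
Counts demo_sketch_ptr() {
    FakeInterface obj;
    Counts c{};
    {
        SketchPtr<FakeInterface> a(&obj);
        SketchPtr<FakeInterface> b = a;  // copy takes a second reference
        c.inside = obj.refs;
        b = nullptr;                     // explicit early release
        c.after_null = obj.refs;
    }                                    // a's destructor releases the last one
    c.after_scope = obj.refs;
    return c;
}
```

Calling Release() by hand on top of this would drop the count one time too many, which is why the advice is to assign NULL instead.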
That's it about smart pointers. I personally don't like long messages, so I'll continue in the next one
Philip Patrick
|
|
|
|
|
Well done. With your help, I have changed all of my source code to use smart pointers, and it's much thinner than before. Thanks a lot. I'm waiting for your further reply; after that, I will run a full test of everything...
|
|
|
|
|
Ok, about functions.
As I remember, all MSHTML smart pointers have the same function names as the raw interfaces, plus their own functions, which I suggest using (but you don't have to, of course). The naming convention for such functions is usually like this:
If you have a function get_links() , the smart pointer's one will be called Getlinks() and so on. I'm not quite sure it is documented in MSDN; I used VisualAssist to get their names
Also, in smart pointers all properties are already declared (as _bstr_t ), so you don't have to use the toString() method or anything else. In the case of a link, you can freely get its href attribute like this:
CString cstrLinkAddress = (LPCTSTR)pLink->href;
First, this class, _bstr_t , is very useful: it is like a smart pointer and releases the memory it uses by itself, so you don't have to use SysFreeString() . Second, it has its own conversion routines from ANSI to UNICODE and vice versa; just remember to apply the (LPCTSTR) cast when you want to assign it to a CString
Now about links.
If you are working with IE5+, it will be quite easy to get links. Convert your MSHTML::IHTMLDocument2Ptr to MSHTML::IHTMLDocument3Ptr and you'll have a getElementsByTagName() function, which returns an MSHTML::IHTMLElementCollectionPtr of the specified tag. The code will look like this (pDoc2 is IHTMLDocument2Ptr here):
CString csLinkHref;
MSHTML::IHTMLAnchorElementPtr pAnchor;
MSHTML::IHTMLDocument3Ptr pDoc3 = pDoc2;
MSHTML::IHTMLElementCollectionPtr pCollection = pDoc3->getElementsByTagName(L"A");
for(long i=0; i<pCollection->length; i++){
    pAnchor = pCollection->item(i, (long)0);
    csLinkHref = (LPCTSTR)pAnchor->href;
}
Look in the demo for this article. What I'm doing there is getting the list of links; I guess this is exactly what you want
-----------
Now about new windows. This depends on how you are opening the document. Post the code where you get the pointer to pDoc and also where you write to it in the way I used in the article.
Philip Patrick
|
|
|
|
|
Thanks again. I read your reply twice and understood what you said. Indeed, I had tried to use MSHTML::IHTMLAnchorElementPtr to get all links, as shown in your demo project, but there are two problems I can't solve that way. First, I want to get the inner text of each link, as I showed in the message above, like this:
pLink->get_innerText(&bstrLinkTitle),
and in your way I'm afraid I can't get it.
Second, there are many links that look like this:
<a href="#" onclick="javaScript:location.href=...">...</a>
In my raw way I can get it by using the get_onclick function, but there's no get_onclick function if I use MSHTML::IHTMLAnchorElementPtr.
As for the code where I get the pointer to pDoc, I will show you in the next message. Thank you...
|
|
|
|
|
Hmm, well, so you can use MSHTML::IHTMLElementPtr instead of the Anchor one. Just as you do now, but with a smart pointer instead of the raw interface. Remember that smart pointers have the same functions as the interfaces, plus more. So if the interface has get_onclick() , the smart pointer will have it too.
You can combine both Anchor and Element. Simply assign Anchor = Element or vice versa and you'll have both interfaces, so you won't have to parse the tag manually.
Of course, this is your choice. I just hate to parse strings by myself; that's why I like MSHTML so much
Philip Patrick
|
|
|
|
|
Thank you, I will try it tomorrow, and if I still have any questions, I will post them all here. Thanks a lot...
|
|
|
|
|
I ran into the same problem too. (translated from Chinese)
MSHTML::IHTMLAnchorElementPtr pAnchor;
...
CString strTitle = pAnchor->name;
ASSERT(strTitle != _T(""));
...
I can't get the link title. Why?
I can't figure out what is going on.
Could someone explain it?
thanks.
|
|
|
|
|
You are trying to get the name attribute, not the title. To get the title you should use MSHTML::IHTMLElementPtr
BTW, why do I see Chinese? lol
Philip Patrick
|
|
|
|
|
Thank you, Philip Patrick.
I hope to get IHTMLAnchorElementPtr->name.
I am Chinese. My English is poor. Sorry.
Actually, I want to use the IE SDK to parse the links out of a page, but unfortunately it's too hard.
|
|
|
|
|
You can try the following:
MSHTML::IHTMLAnchorElementPtr pAnchor;
_bstr_t bstrName = pAnchor->Getname();
This should get you its name. If you want its inner text instead, you can try:
MSHTML::IHTMLElementPtr pElem;
pElem = pAnchor;
_bstr_t bstrInnerText = pElem->GetinnerText();
//............
|
|
|
|
|
Let me continue to show you my code. First, I want to say that I had tried my program before on the same website addresses, and it never opened new IE windows. But in that version I didn't use smart pointers, as I showed you in the messages above, and I also didn't include the following files:
#include <comdef.h>
#include <mshtml.h>
#pragma warning(disable : 4146) //see Q231931 for explanation
#import <mshtml.tlb> no_auto_exclude
but only the file <mshtml.h> instead.
All the download code is the same. First I call InternetOpen to create a session, then AfxParseURLEx to parse and check the URL I want to download, then InternetConnect to connect to that URL, and then HttpOpenRequest and HttpSendRequest if every function so far has succeeded. Finally, I call HttpQueryInfo to check whether the returned status code is 200; if it is, I call InternetReadFile to get the content and save it to a variable, and I close everything I opened so that it won't cause any memory leak. These steps work fine when I run them without any smart pointers and include only <mshtml.h>, so I don't think they should open any new IE windows in the way you prefer either.
For the second step, let's assume that I have downloaded one URL into a CString variable called m_cstrContent. I parse this content as follows:
MSHTML::IHTMLDocument2Ptr pDoc;
hr = CoCreateInstance(CLSID_HTMLDocument, NULL, CLSCTX_INPROC_SERVER,
    IID_IHTMLDocument2, (void**)&pDoc);
if (FAILED(hr))
{
    return iRet;
}
SAFEARRAY* psa = SafeArrayCreateVector(VT_VARIANT, 0, 1);
VARIANT *param;
_bstr_t bsData = (LPCTSTR)m_cstrContent;
hr = SafeArrayAccessData(psa, (LPVOID*)&param);
param->vt = VT_BSTR;
param->bstrVal = (BSTR)bsData;
hr = pDoc->write(psa);
if (FAILED(hr))
{
    return iRet;
}
hr = pDoc->close();
if (FAILED(hr))
{
    return iRet;
}
SafeArrayDestroy(psa);
iRet = SUCCESS;
//get title
BSTR bstrTitle;
pDoc->get_title(&bstrTitle);
m_cstrTitle = CString(bstrTitle);
SysFreeString(bstrTitle);
//get content text
MSHTML::IHTMLElementCollectionPtr pAll;
pDoc->get_all(&pAll);
if (pAll!=NULL)
{
    VARIANT varIndexAll;
    varIndexAll.vt = VT_UINT;
    varIndexAll.lVal = 0;
    MSHTML::IHTMLElementPtr pElemText;
    pElemText = pAll->item(varIndexAll,(long)0);
    BSTR bstrContentText;
    pElemText->get_outerText(&bstrContentText);
    m_cstrText = CString(bstrContentText);
    SysFreeString(bstrContentText);
}
//get links
MSHTML::IHTMLElementCollectionPtr pAllLink;
pDoc->get_links(&pAllLink);
if (pAllLink!=NULL)
{
    LONG lLinkLen;
    pAllLink->get_length(&lLinkLen);
    VARIANT varIndex;
    varIndex.vt = VT_UINT;
    MSHTML::IHTMLElementPtr pLink;
    BSTR bstrLinkAddress,bstrLinkTitle;
    int iAddType=-1;
    for (int i=0; i<lLinkLen;i++)
    {
        iAddType=-1;
        varIndex.lVal = i;
        pLink = pAllLink->item(varIndex,(long)0);
        bstrLinkAddress = pLink->toString();
        CString cstrLinkAddress(bstrLinkAddress);
        CString cstrTempLinkAddress;
        cstrTempLinkAddress = m_cstrURL + CString("#");
        if ((cstrLinkAddress.CompareNoCase(cstrTempLinkAddress) == 0) || (cstrLinkAddress.CompareNoCase(CString("about:blank#")) == 0))
        {
            VARIANT pLinkVariant;
            pLink->get_onclick(&pLinkVariant);
            if (pLinkVariant.vt != VT_NULL)
            {
                bstrLinkAddress = pLinkVariant.bstrVal;
                cstrLinkAddress = CString(bstrLinkAddress);
                iAddType = 0;
            }
            else
            {
                iAddType = 6;
            }
        }
        if (iAddType == -1)
        {
            ParseURL(cstrLinkAddress,&iAddType);
        }
        if (iAddType == 4) //Text page and same directories only...
        {
            CObject *cObj;
            WaitForSingleObject(ghLinkEvent,INFINITE);
            if (g_LinkList.Lookup(LPCTSTR(cstrLinkAddress),(CObject *&)cObj)==0)
            {
                g_LinkList.SetAt(LPCTSTR(cstrLinkAddress),NULL);
                pLink->get_innerText(&bstrLinkTitle);
                m_link.csaAddress.Add(cstrLinkAddress);
                m_link.csaTitle.Add(CString(bstrLinkTitle));
                m_link.cbaAddressType.Add((BYTE)iAddType);
            }
            SetEvent(ghLinkEvent);
        }
        SysFreeString(bstrLinkTitle);
        SysFreeString(bstrLinkAddress);
    }
}
With your help, I will take up your good ideas and change some of my code, such as using the class _bstr_t and so on, and I need some time to debug. It's too late here, so I'm afraid I can only do that tomorrow.
So far, the most important thing is to stop every new IE window from opening when I download or parse pages. Thanks...
|
|
|
|
|
Great discovery. I failed to do this 3 months ago.
|
|
|
|
|
I created a demo project, which is a simple dialog application. All it does is take an HTML file and display all of its links in a ListBox.
Remember that you need the Platform SDK in order to compile the demo
Philip Patrick
|
|
|
|
|
If you have any questions, or maybe found a bug, just let me know
Philip Patrick
|
|
|
|
|
It always reports that memory can't be accessed...
I tried my subroutine in that so-called 3rd way, but it always throws an exception saying that the memory referenced at addresses 0X004015cf and 0X010812d0 can't be read. I don't have any idea why. Can anyone give me a runnable demo program for the 3rd way? Thanks a lot...
|
|
|
|
|
OK, the demo project was added. Feel free to ask if you have any problems
Philip Patrick
|
|
|
|
|
I'm sorry, I can't find any demo project for this topic. When I try the following address:
http://www.codeproject.com/useritems/parse_html_demo.zip, my browser only shows:
The page you requested cannot be found.
Click here to go to the Code Project home page, or click here to return to the previous page.
Is there anything wrong with my browser? I'm hoping for your help. By the way, if there is anything wrong with the codeguru website, please just mail superrg@163.net. I'm waiting for your demo project online. Thanks a lot...
|
|
|
|
|
Who knows about codeguru, but CodeProject is fine. Patrick had the link a little short of a subdirectory, but it's all fixed now.
cheers,
Chris Maunder
|
|
|
|
|