|
Oh, then try MSHTML::IHTMLElement2Ptr; its GetclientHeight function and the related ones may help. I'm not sure about it, but it's worth a try...
|
|
|
|
|
My source code is based on
http://www.codeproject.com/useritems/parse_html.asp?df=100&forumid=3219&select=108390#xx108390xx
The only difference is that I have changed all the raw pointers to smart pointers. But when I parse the content of this URL:
http://www.163.net
an error box pops up saying that a run-time error has occurred. I don't want any message boxes or IE windows to pop up at run time. What should I do? Thanks for your help...
What's more, this run-time error occurs on this very line:
hr = pDoc->write(psa);
Thanks again...
Supplement:
When I try the demo project with the source of this web page, http://www.163.net, it pops up the same error box too. In addition, I wrapped the pDoc->write call in a try block:
try
{
    hr = pDoc->write(psa);
}
catch(...)
{
    //...........
}
But nothing is ever caught. Why, why, why???
|
|
|
|
|
I'm longing for your reply, thanks a lot...
|
|
|
|
|
Yep, one sec, I'm building an app just like yours to debug it by myself
Philip Patrick
"Two beer or not two beer?" (Shakesbeer)
Web-site: www.saintopatrick.com
|
|
|
|
|
Ok, I see what the problem is... and I see what is going on. The problem is not the smart pointers, of course, but the different way the document does its job. When you use IMarkupServices (as in the article you mentioned), it only parses the given HTML code and does not process it, whereas write() actually executes all the script inside it while parsing the HTML.
I'll figure out now how to eliminate this and post it here.
BTW, you can use smart pointers with Markup Services too. The only reason I wrote the article was the "BODY" tag bug; I needed that tag in my application...
Well... I'm working, lol
Philip Patrick
|
|
|
|
|
Yes, as you said above, there are many document.write scripts on that page, and that's why the error box pops up. But I'm afraid I can't use ParseString and smart pointers together, because smart pointers do not run under IE v6.02, and ParseString leaks memory under IE v5.01 P2. That's really a big bug.
I'm waiting for your solution now, thank you...
|
|
|
|
|
What's more, I found the reason for the new IE windows. It is almost the same as the reason for the run-time error message box: when we call pDoc->write, all scripts are executed, and some of them open new IE windows. I don't know whether we can solve this problem or not. Do you have any more suggestions? Thank you...
|
|
|
|
|
So far the only way I have found to eliminate popups is to replace "window.open" throughout the HTML with something like "javascript.void" (or whatever you want). I'm doing this just before pDoc->write(), using the Replace() function of CString.
This is a fragile approach, though, since "window" is the default object in JavaScript and can be omitted, so a bare open() call slips through.
Still looking for a better way.
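For what it's worth, here is a minimal sketch of that replacement in portable C++. It uses std::string instead of CString so it can be tried outside MFC; ReplaceAll and StripWindowOpen are names made up for this illustration, not part of any library.

```cpp
#include <cassert>
#include <string>

// Replaces every occurrence of `from` in `text` with `to`.
// Portable stand-in for CString::Replace(); the name is illustrative only.
std::string ReplaceAll(std::string text, const std::string& from, const std::string& to)
{
    assert(!from.empty());  // an empty search string would never advance
    std::string::size_type pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();   // continue after the inserted text
    }
    return text;
}

// Neutralises explicit window.open calls before the HTML is handed to write().
// Hypothetical helper: it only catches the literal text "window.open".
std::string StripWindowOpen(const std::string& html)
{
    return ReplaceAll(html, "window.open", "javascript.void");
}
```

Calling StripWindowOpen on a page that uses a bare open('x') leaves the call untouched, which is exactly the weakness mentioned above.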
Philip Patrick
|
|
|
|
|
Wow, I have tried this approach before, but I found it is not a good one. There are so many ways for a script to open a new IE window: window.open is one, document.location.href (target = _blank) is another, then there is <META HTTP-EQUIV="Refresh" CONTENT="[secs];URL=[RefreshUrl]">, and lots of others. I can't list them all.
Secondly, some scripts depend on other documents, such as .js files or the parent document. These scripts cannot execute successfully, so I'm afraid more and more message boxes will pop up, and if I run my program in multithreaded mode it will drive me mad...
By and large, we can't eliminate popups by replacing strings in the HTML. I suspect there is a better way to solve this problem; maybe Microsoft knows one...
Anyway, thanks for your help. I would like to find another solution, and maybe we can work on it together. If you have a better idea, please let me know. Thank you...
|
|
|
|
|
|
I know this question may be out of the scope of this article, but I'll ask it anyway. I have scanned for all elements in an HTML document. I then automatically handle all relevant events for input type controls (text boxes, radio and checkbox buttons, pulldown option lists, etc.). Events are on-click, on-keypress, on-select. Now I want to get events for an ActiveX control (an HTML object). Does anyone know how to do this?
|
|
|
|
|
I haven't done this myself, but there is an IHTMLObjectElement interface, and from what I read in MSDN you can get the IDispatch of the nested ActiveX control by calling its get_object(IDispatch** p) function. That gives you access to the ActiveX control.
If you find a better or more correct way, please post it here; I'd be glad to know too.
Philip Patrick
|
|
|
|
|
Here is the code that I use to register event handling for any <object> tag in an HTML document (ActiveX controls):
IDispatch* pDisp = NULL;
HRESULT hr = m_pIE->get_Document(&pDisp);
IHTMLDocument3* pDoc = NULL;
IHTMLElement* pElem = NULL;
IHTMLElementCollection* pColl = NULL;
hr = pDisp->QueryInterface(IID_IHTMLDocument3, (void**)&pDoc);
pDisp->Release();
BSTR str = ::SysAllocString(L"OBJECT");
hr = pDoc->getElementsByTagName(str, &pColl);
::SysFreeString(str);
long len = 0;
hr = pColl->get_length(&len);
for (long x = 0; x < len; x++)
{
    COleVariant index(x);
    COleVariant index2((long)0);
    IDispatch* spDispatch = NULL;
    hr = pColl->item(index, index2, &spDispatch);
    IHTMLObjectElement* spTempObjEl = NULL;
    hr = spDispatch->QueryInterface(IID_IHTMLObjectElement, (void**)&spTempObjEl);
    spDispatch->Release();
    // Get the IDispatch of the ActiveX control hosted by the <object> tag.
    spTempObjEl->get_object(&spDispatch);
    spTempObjEl->Release();
    LPUNKNOWN pUnkSink = m_pEvent->GetIDispatch(TRUE);
    // Note: m_cookie is overwritten on every iteration.
    BOOL bAdvised = AfxConnectionAdvise(spDispatch, DIID__ISliderEvents, pUnkSink, FALSE, &m_cookie);
    spDispatch->Release();
}
pColl->Release();
pDoc->Release();
In the code, I assume that all <object> tags represent Slider ActiveX controls (there are lots of others like UpDown, etc.). What I don't understand is how to declare the code to trap certain events (on_mousedrag, on_mouseclick, etc.) to callbacks that will figure out which of the many slider controls in the document is generating the event. Any ideas?
Also, how would you rewrite the code above using smart pointers?
Thanks,
|
|
|
|
|
... in theory, 'cause I've never done this before.
The OBJECT element is also a regular HTML element, so you can get its IHTMLElement interface by calling QueryInterface. IHTMLElement has methods like put_onclick(), to which you can pass a VARIANT with your IDispatch inside.
Another way is to get IHTMLElement2 and use its attachEvent(BSTR eventName, IDispatch*, VARIANT_BOOL* pResult). It looks the same to me, just another route.
Now, how to determine who actually fired the event. I guess it goes like this:
When your function is called for some event, you can get an IHTMLEventObj interface from IHTMLWindow2 (the get_event() function).
This event object has a function get_srcElement(IHTMLElement** p). Call it and you'll have your element and can read any of its attributes.
TBiker wrote:
Also, how would you rewrite the code above using smart pointers?
See this post, I talked there about smart pointers. And your code will look like:
IDispatchPtr pDocDisp;
HRESULT hr = m_pIE->get_Document(&pDocDisp);
MSHTML::IHTMLDocument3Ptr pDoc = pDocDisp; // QueryInterface happens in this assignment
MSHTML::IHTMLElementPtr pElem;
MSHTML::IHTMLObjectElementPtr spTempObjEl;
MSHTML::IHTMLElementCollectionPtr pColl = pDoc->getElementsByTagName(L"OBJECT");
IDispatchPtr pDisp;
for (long x = 0; x < pColl->length; x++)
{
    spTempObjEl = pColl->item(x, (long)0);
    spTempObjEl->get_object(&pDisp);
    pElem = spTempObjEl; // the same <object> tag, seen as a plain IHTMLElement
    LPUNKNOWN pUnkSink = m_pEvent->GetIDispatch(TRUE);
    BOOL bAdvised = AfxConnectionAdvise(pDisp, DIID__ISliderEvents, pUnkSink, FALSE, &m_cookie);
}
Philip Patrick
|
|
|
|
|
Great suggestion! I implemented what you suggested and found out that it's not enough just to call attachEvent. This call (or put_onxxxxx) registers a dispatch pointer for a particular event, but it does not enable the event. Events are enabled by adding this code:
HRESULT hr;
IConnectionPointContainer* pCPC = NULL;
IConnectionPoint* pCP = NULL;
DWORD dwCookie = 0;
// Check that this is a connectable object.
hr = pElem->QueryInterface(IID_IConnectionPointContainer, (void**)&pCPC);
if (SUCCEEDED(hr))
{
    // Find the connection point.
    hr = pCPC->FindConnectionPoint(DIID_HTMLElementEvents2, &pCP);
    pCPC->Release(); // the container is no longer needed
}
if (SUCCEEDED(hr))
{
    // Advise the connection point.
    // pUnk is the dispatch pointer you used in attachEvent
    hr = pCP->Advise(pUnk, &dwCookie);
}
When you are finished with events, disable them by calling pCP->Unadvise(dwCookie); and then release pCP.
Thanks for your help! I will wrap this code up and submit it for others to use.
|
|
|
|
|
I know it only in theory, I mean only from reading MSDN and such, but I'll be glad to implement it sometime.
TBiker wrote:
I will wrap this code up and submit it for others to use.
If you remember me when you submit your article, please post about it here so I won't miss it. I'd be glad to see it working and to use it too.
lol
Philip Patrick
|
|
|
|
|
Well, not so great...
After testing some more, it turns out that the IHTMLElement2 interface does not control ALL elements as Microsoft may claim (or, quite possibly, it does and I haven't a clue how it's done). The problem is that each element has its own set of connection interfaces (IDispatch, HTMLElementEvents2, HTMLInputElementEvents2, etc.), and HTMLElementEvents2 is not always available for certain element types. I discussed this with Microsoft (yep, used up one of my precious support calls) and they don't seem too knowledgeable either. They suggest creating a separate event sink for each ActiveX control. The problem with this is managing a huge number of event class instances and determining which event belongs to which element. So I'm still researching this problem. Any ideas would be appreciated.
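One way the separate-sink idea might be organised, purely as a sketch in portable C++ with no COM in it: every sink remembers the id of the element it was created for, and a registry keeps the sinks keyed by cookie. ElementSink, SinkRegistry and the cookie numbering are all hypothetical names and assumptions for illustration; in a real program the cookie would come from Advise and OnEvent would be reached from IDispatch::Invoke.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Shared callback: receives the id of the element that fired and the event name.
using EventHandler = std::function<void(const std::string& elementId,
                                        const std::string& eventName)>;

// One sink per control. It stores which element it belongs to,
// so the shared handler never has to guess the source.
class ElementSink {
public:
    ElementSink(std::string elementId, EventHandler handler)
        : elementId_(std::move(elementId)), handler_(std::move(handler))
    {
        assert(handler_ != nullptr);  // a sink without a handler is useless
    }
    // Stand-in for the code path that IDispatch::Invoke would take.
    void OnEvent(const std::string& eventName) const { handler_(elementId_, eventName); }
private:
    std::string elementId_;
    EventHandler handler_;
};

// Owns all sinks, keyed by a cookie (simulated here with a counter).
class SinkRegistry {
public:
    explicit SinkRegistry(EventHandler handler) : handler_(std::move(handler)) {}
    // Creates a sink for elementId and returns its simulated cookie.
    unsigned long Add(const std::string& elementId) {
        unsigned long cookie = nextCookie_++;
        sinks_[cookie] = std::make_unique<ElementSink>(elementId, handler_);
        return cookie;
    }
    // Simulates a control firing an event into the sink with this cookie.
    bool Fire(unsigned long cookie, const std::string& eventName) const {
        auto it = sinks_.find(cookie);
        if (it == sinks_.end()) return false;
        it->second->OnEvent(eventName);
        return true;
    }
    // Drops the sink; real code would call Unadvise(cookie) first.
    void Remove(unsigned long cookie) { sinks_.erase(cookie); }
    std::size_t Size() const { return sinks_.size(); }
private:
    EventHandler handler_;
    std::map<unsigned long, std::unique_ptr<ElementSink>> sinks_;
    unsigned long nextCookie_ = 1;
};
```

With this layout there is still one sink instance per control, but they are all managed in one place and the handler is told directly which element fired.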
|
|
|
|
|
Wow! You spent one of those 4-per-year calls to Microsoft? Cool
Well, yeah, I heard about it, about separate class/instances, etc. for every element. Can't try this by myself, I have a big project on my neck right now, LMAO
Philip Patrick
|
|
|
|
|
I wrote the same code and ran it under two different versions of IE, IE 5.01 P2 and IE 6.02. Everything is fine under v5.01, but it fails under 6.02, and I don't know why.
My code is as follows:
MSHTML::IHTMLElementCollection * pAllLink=NULL;
pDoc->get_links(&pAllLink);
if (pAllLink!=NULL)
{
    LONG lLinkLen;
    pAllLink->get_length(&lLinkLen);
    VARIANT varIndex;
    varIndex.vt = VT_UINT;
    IDispatch* pDisp;
    IHTMLElement *pLink;
    BSTR bstrLinkAddress,bstrLinkTitle;
    int iAddType=-1;
    for (int i=0; i<lLinkLen;i++)
    {
        iAddType=-1;
        varIndex.lVal = i;
        pDisp = pAllLink->item(varIndex,(long)0);
        pDisp->QueryInterface(IID_IHTMLElement,(void **)&pLink);
        pLink->toString(&bstrLinkAddress);
        CString cstrLinkAddress(bstrLinkAddress);
        CString cstrTempLinkAddress;
        cstrTempLinkAddress = m_cstrURL + CString("#");
        if ((cstrLinkAddress.CompareNoCase(cstrTempLinkAddress) == 0) || (cstrLinkAddress.CompareNoCase(CString("about:blank#")) == 0))
        {
            VARIANT pLinkVariant;
            pLink->get_onclick(&pLinkVariant);
            if (pLinkVariant.vt != VT_NULL)
            {
                bstrLinkAddress = pLinkVariant.bstrVal;
                cstrLinkAddress = CString(bstrLinkAddress);
                iAddType = 0;
            }
            else
            {
                iAddType = 6;
            }
        }
        if (iAddType == -1)
        {
            ParseURL(cstrLinkAddress,&iAddType);
        }
        if (iAddType == 4) //Text page and same directorys only...
        {
            CObject *cObj;
            if (g_LinkList.Lookup(LPCTSTR(cstrLinkAddress),(CObject *&)cObj)==0)
            {
                g_LinkList.SetAt(LPCTSTR(cstrLinkAddress),NULL);
                pLink->get_innerText(&bstrLinkTitle);
                m_link.csaAddress.Add(cstrLinkAddress);
                m_link.csaTitle.Add(CString(bstrLinkTitle));
                m_link.cbaAddressType.Add((BYTE)iAddType);
            }
        }
        SysFreeString(bstrLinkTitle);
        SysFreeString(bstrLinkAddress);
    }
    pAllLink->Release();
}
It throws an exception saying that the memory can't be accessed...
Can anyone help me? Thanks a lot...
|
|
|
|
|
On which line are you getting the exception?
Also, you have some leaks. If you are not using smart pointers, you have to release all interfaces yourself.
I suggest using smart pointers; this will also simplify the code.
Also, as I see it, you want to get the links in the document. According to MSDN, the function get_links() will return only links that HAVE a name and/or id.
Philip Patrick
|
|
|
|
|
Oh, thanks for your reply... First, I'm not sure which line raises the exception. When I debug line by line, sometimes it is thrown on the "SysFreeString" line and sometimes inside my function "ParseURL". It never happens if I parse with the "ParseString" approach shown at http://www.codeguru.com/ieprogram/HTMLParsing.html, but that approach leaks memory.
Second, I don't know how to use smart pointers. I try to release every interface myself; if Release throws an exception, does that mean the pointer is a smart pointer? And if I try MSHTML::IHTMLElementCollection2Ptr, I don't know how to call its get_links function, so I have to use MSHTML::IHTMLElementCollection. Can you give me some more suggestions, please?
Last, as you see, I want to get the links in the document. If get_links can only return links that have a name and/or id, how can I get the others? Do you know?
Oh, yes, there's one more important question. When I parse HTML with the ParseString approach shown at http://www.codeguru.com/ieprogram/HTMLParsing.html, it never opens new IE windows. But when I use the approach you prefer, in a multithreaded routine, it opens more and more IE windows, and I have to close them one by one with the mouse. I can't stand it. Can you solve this problem too? Thanks for your help...
|
|
|
|
|
Well. About smart pointers in MSHTML.
When you use a smart pointer, it manages memory and releases the interface by itself. As a bonus, you get QueryInterface built into the smart pointer. So the code that you have:
IDispatch* pDisp;
IHTMLElement *pLink;
pDisp = pAllLink->item(varIndex,(long)0);
pDisp->QueryInterface(IID_IHTMLElement,(void **)&pLink);
will look like this:
MSHTML::IHTMLElementPtr pLink;
pLink = pAllLink->item(varIndex, (long)0);
What is going on here? Well, the item() function still returns an IDispatch, but when you assign it to a smart pointer (pLink here), the smart pointer calls QueryInterface() internally and obtains a pointer to its own interface. So if you see that pLink is not NULL, QueryInterface() succeeded.
Also, a smart pointer is a simple class that wraps an interface, so when it goes out of scope (the end of the function here), it calls Release() in its destructor. Do not call Release() yourself; if you want to release the interface held by a smart pointer, just assign NULL to it - pLink = NULL
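To make the release rules concrete, here is a toy model in plain C++. It is not the real MSHTML smart pointer class; FakeInterface and ToyPtr are invented for this illustration and only mimic the reference counting, with no QueryInterface at all.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for a COM interface: only the reference count is modelled.
struct FakeInterface {
    long refs = 1;                // starts at 1, as if just handed out by a getter
    void AddRef() { ++refs; }
    void Release() {
        assert(refs > 0);         // releasing twice is exactly the bug to avoid
        --refs;                   // a real object would delete itself at 0
    }
};

// Heavily simplified model of a smart pointer such as IHTMLElementPtr.
class ToyPtr {
public:
    ToyPtr() = default;
    // Takes over a reference the caller already owns (no extra AddRef).
    explicit ToyPtr(FakeInterface* p) : p_(p) {}
    ToyPtr(const ToyPtr& other) : p_(other.p_) { if (p_) p_->AddRef(); }
    ToyPtr& operator=(const ToyPtr& other) {
        if (this != &other) {
            if (other.p_) other.p_->AddRef();
            Reset();
            p_ = other.p_;
        }
        return *this;
    }
    // The "pLink = NULL" case: release right now instead of at scope exit.
    ToyPtr& operator=(std::nullptr_t) { Reset(); return *this; }
    ~ToyPtr() { Reset(); }        // leaving scope releases automatically
    explicit operator bool() const { return p_ != nullptr; }
private:
    void Reset() { if (p_) { p_->Release(); p_ = nullptr; } }
    FakeInterface* p_ = nullptr;
};
```

The point of the model: every copy adds a reference, every destructor or NULL assignment removes one, and calling Release() by hand on top of that would drop the count once too often.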
That's it about smart pointers. I personally don't like long messages, so I'll continue in the next one.
Philip Patrick
|
|
|
|
|
Well done. With your help, I have now changed all of my source code to smart pointers, and it is much leaner than before. Thanks a lot. I'm waiting for your further reply; after that, I will run a full test of everything...
|
|
|
|
|
Ok, about functions.
As I remember, all MSHTML smart pointers have the same function names as the raw interfaces, plus their own wrapper functions, which I suggest using (but you don't have to, of course). The naming convention for these functions is usually like this:
If you have a function get_links(), the smart pointer's one will be called Getlinks(), and so on. I'm not quite sure it is documented in MSDN; I used VisualAssist to get their names.
Also, in smart pointers all properties are already declared (as bstr_t), so you don't have to use the toString() method or anything else. For a link, you can simply read its href attribute like this:
CString cstrLinkAddress = (LPCTSTR)pLink->href;
First, this class, bstr_t, is very useful: like a smart pointer, it releases the memory it uses by itself, so you don't have to call SysFreeString(). Second, it has its own conversion routines from ANSI to UNICODE and vice versa; just remember to apply the (LPCTSTR) cast when you want to assign it to a CString.
Now about links.
If you are working with IE5+, it is quite easy to get the links. Convert your MSHTML::IHTMLDocument2Ptr to MSHTML::IHTMLDocument3Ptr and you'll have a getElementsByTagName() function, which returns an MSHTML::IHTMLElementCollectionPtr for the specified tag. The code will look like this (pDoc2 is an IHTMLDocument2Ptr here):
CString csLinkHref;
MSHTML::IHTMLAnchorElementPtr pAnchor;
MSHTML::IHTMLDocument3Ptr pDoc3 = pDoc2;
MSHTML::IHTMLElementCollectionPtr pCollection = pDoc3->getElementsByTagName(L"A");
for(long i=0; i<pCollection->length; i++){
    pAnchor = pCollection->item(i, (long)0);
    csLinkHref = (LPCTSTR)pAnchor->href;
}
Look at the demo for this article. What I'm doing there is getting the list of links; I guess this is exactly what you want.
-----------
Now about new windows. This depends on how you are opening the document. Post the code where you get the pDoc pointer and also where you write to it in the way I used in the article.
Philip Patrick
|
|
|
|
|
Thanks again. I read your reply twice and I understand what you said. Indeed, I had already tried using MSHTML::IHTMLAnchorElementPtr to get all the links, as shown in your demo project, but there are two problems I can't solve that way. First, I want to get the inner text of each link, as I showed in the message above:
pLink->get_innerText(&bstrLinkTitle),
and with your approach I'm afraid I can't get it.
Second, there are many links that look like this:
<a href="#" onclick="javaScript:location.href=...">...</a>
With my raw-pointer approach I can read these using the get_onclick function, but there is no get_onclick function when I use MSHTML::IHTMLAnchorElementPtr.
As for the code where I get the pDoc pointer, I will show it to you in the next message. Thank you...
|
|
|
|
|