The BHO running in Internet Explorer 8
Introduction
Browser Helper Objects (also called BHOs) are COM components that act as Internet Explorer plug-ins. BHOs can be used for customizing Internet Explorer to any extent: from user-interface modifications to web filters to download managers.
In this article, we will learn how to write and install a simple BHO in C++ without the use of extraneous frameworks like ATL or MFC. This example BHO can be used as the starting point for your own BHO.
Background
COM — or Component Object Model — is a language-neutral technology used extensively in Windows to enable the componentization and re-use of software modules.
Most COM code written in C++ (and hence in examples on the Web) uses the Active Template Library (ATL) and/or Microsoft Foundation Classes (MFC) frameworks to help get the job done. However, learning to use ATL and MFC effectively can be yet another barrier to creating COM objects, especially simple ones like BHOs.
This article will teach you what you need to know about COM and BHOs to start writing your own BHOs, and do it using just C++ and the Windows API without being forced into a more complex framework like ATL or MFC.
Getting Started
The best way to understand this article is to download the source code from the link above and follow along. The source code is well-documented, and should be easy to understand.
Understanding the COM Code
COM Terms
First, let's get some COM terminology out of the way:
- interface: A set of methods that is visible to other objects. It is the equivalent of public methods in a C++ pure-virtual class.
- coclass: Derives from one or more interfaces, and implements all their methods with concrete functionality. It is the equivalent of an instantiable C++ class derived from one or more pure-virtual classes.
- object: An instance of a coclass.
- GUID (Globally Unique IDentifier): A 128-bit unique number. It can be generated via the guidgen.exe utility.
- IID (Interface IDentifier): A GUID that identifies an interface.
- CLSID (CLaSs IDentifier): A GUID that identifies a COM component. You can find the example BHO's CLSID in the common.h file as CLSID_IEPlugin_Str. Every COM component has a different identifier, and you should generate a new one if you decide to build your own BHO from the example BHO code.
The IUnknown Interface
In order for us to create COM objects, we must write coclasses which implement interfaces. All COM objects must implement an interface known as the IUnknown interface. This interface has three very basic methods which allow other objects to memory-manage objects of a coclass as well as ask objects for other interfaces. These three methods are known as QueryInterface, AddRef, and Release. Since all the various coclasses we'll be implementing derive from IUnknown, it makes sense to create a coclass of IUnknown which implements the three IUnknown methods. We'll call this coclass CUnknown and have all our other coclasses derive from it so we don't have to write the implementation of the IUnknown methods separately for every coclass.
Note: You can find the definition and implementation of CUnknown in the unknown.h file.
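The CUnknown idea can be sketched in portable C++ as follows. This is only an illustration, not the code in unknown.h: UnknownSketch, InterfaceId, Result and g_dllRefCount are hypothetical stand-ins for IUnknown, REFIID, HRESULT and DllRefCount, and starting the reference count at 1 is an assumption.

```cpp
// Portable sketch of a shared IUnknown implementation. Real code derives from
// IUnknown (unknwn.h) and uses HRESULT, REFIID and ULONG; these are stand-ins.
enum InterfaceId { IID_Unknown_Sketch, IID_ObjectWithSite_Sketch, IID_Other_Sketch };
enum Result { Result_Ok, Result_NoInterface };

static long g_dllRefCount = 0;   // plays the role of DllRefCount

class UnknownSketch {
public:
    explicit UnknownSketch(InterfaceId supported)
        : refCount_(1), supported_(supported) { ++g_dllRefCount; }

    // Hand out a pointer only for interfaces this object supports.
    Result QueryInterface(InterfaceId iid, void** out) {
        if (iid == IID_Unknown_Sketch || iid == supported_) {
            *out = this;
            AddRef();            // every pointer handed out owns one reference
            return Result_Ok;
        }
        *out = nullptr;
        return Result_NoInterface;
    }

    unsigned long AddRef() { return ++refCount_; }

    // The object deletes itself when the last reference is released.
    unsigned long Release() {
        unsigned long remaining = --refCount_;
        if (remaining == 0) delete this;
        return remaining;
    }

protected:
    // Protected so that callers must go through Release().
    virtual ~UnknownSketch() { --g_dllRefCount; }

private:
    unsigned long refCount_;
    InterfaceId supported_;
};
```

A derived coclass would only need to say which interface it supports; the three methods above are inherited unchanged.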
COM DLL Exports
Every COM DLL exports four functions that are used by the COM system to create and manage COM objects from the DLL as well as to install and uninstall the DLL. These functions are:
- DllGetClassObject
- DllCanUnloadNow
- DllRegisterServer
- DllUnregisterServer
Note: You can find these functions in the main.cpp file, and they are declared as exports in the dll.def file.
Our DLL must have a coclass of the IClassFactory interface. We'll call this coclass CClassFactory. The DllGetClassObject function creates CClassFactory objects and returns interface pointers to them. The IClassFactory interface is explained in more detail shortly.
The DllCanUnloadNow function is called by COM to determine if our DLL can be unloaded from a process. All we have to do is check whether our DLL is currently managing any object, and return S_OK if we aren't, or S_FALSE if we are. We can do this by incrementing a DLL-global reference counter DllRefCount in the constructors of our coclasses and decrementing the counter in their destructors. If the reference counter is non-zero, it means that instances of our coclasses still exist and the DLL should not be unloaded at the moment.
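The counting scheme behind DllCanUnloadNow can be sketched without the Windows headers. The names CountedObject, CanUnloadNowSketch and g_liveObjects are hypothetical; only the values S_OK == 0 and S_FALSE == 1 are taken from the Windows SDK.

```cpp
// Stand-ins for the HRESULT values returned by the real DllCanUnloadNow.
const long kS_OK = 0;      // S_OK: the DLL may be unloaded
const long kS_FALSE = 1;   // S_FALSE: objects still exist, keep the DLL loaded

static long g_liveObjects = 0;   // plays the role of DllRefCount

// Any coclass bumps the counter while an instance is alive.
struct CountedObject {
    CountedObject()  { ++g_liveObjects; }
    ~CountedObject() { --g_liveObjects; }
};

// Report whether the DLL is still managing any object.
long CanUnloadNowSketch() {
    return g_liveObjects == 0 ? kS_OK : kS_FALSE;
}
```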
DllRegisterServer is called by a program that wants our DLL to install itself. We have to register our COM component in the system and also as a BHO. We do this by creating the following Registry entries:
- HKEY_CLASSES_ROOT\CLSID\<CLSID_IEPlugin_Str>: The default value of this key should be set to a human-readable description of the COM component. In this case, it's "CodeProject Example BHO".
- HKEY_CLASSES_ROOT\CLSID\<CLSID_IEPlugin_Str>\InProcServer32: The existence of this key identifies that this COM component can be loaded as a DLL into the process that wants to use it. There are two values we need to set for this key:
  - (default): A REG_SZ or REG_EXPAND_SZ value specifying the path to the DLL which contains the COM component.
  - ThreadingModel: This specifies the threading model of the COM component. It's a more advanced concept, and we don't need to worry about it. We just set it to "Apartment".
- HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Explorer\Browser Helper Objects\<CLSID_IEPlugin_Str>: The existence of this key registers our COM component as a BHO. We create one value under this key named NoExplorer and set it as a REG_DWORD with a value of 1. Normally, BHOs are also loaded by explorer.exe, but this value prevents explorer.exe from unnecessarily loading our BHO.
DllUnregisterServer is called to do the exact opposite of DllRegisterServer: it unregisters our COM component and also removes it as a BHO. In order to do this, we just need to delete the Registry keys we created in DllRegisterServer.
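As a rough illustration of the entries described above, the three key paths can be assembled from the CLSID string. BhoRegistryKeys and MakeBhoRegistryKeys are hypothetical helpers, not part of the example BHO; the sketch only builds strings and does not touch the Registry.

```cpp
#include <string>

// The key paths a BHO registers, relative to their root hives.
struct BhoRegistryKeys {
    std::string classKey;    // under HKEY_CLASSES_ROOT
    std::string inprocKey;   // under HKEY_CLASSES_ROOT
    std::string bhoKey;      // under HKEY_LOCAL_MACHINE
};

// clsid is the braced CLSID string, i.e. the value of CLSID_IEPlugin_Str.
BhoRegistryKeys MakeBhoRegistryKeys(const std::string& clsid) {
    BhoRegistryKeys keys;
    keys.classKey  = "CLSID\\" + clsid;
    keys.inprocKey = keys.classKey + "\\InProcServer32";
    keys.bhoKey    = "Software\\Microsoft\\Windows\\CurrentVersion\\Explorer\\"
                     "Browser Helper Objects\\" + clsid;
    return keys;
}
```

Registration would create these keys and their values; unregistration would delete the same three paths.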
COM uses an IClassFactory object created by our DLL's DllGetClassObject function to get instances of other interface-implementations supported by the DLL. The IClassFactory interface defines the methods CreateInstance and LockServer. We call our coclass implementing the IClassFactory interface CClassFactory.
Note: You can find the definition of CClassFactory in ClassFactory.h, and the implementation in ClassFactory.cpp.
CreateInstance does exactly what it says: it creates an instance of a coclass in our DLL that supports a given IID. Since we are a BHO, we only need to create instances of a coclass that supports the IObjectWithSite interface.
LockServer is called to either lock or unlock our DLL in memory. Depending on whether it is locking or unlocking, our implementation of LockServer simply increments or decrements the DLL-global DllRefCount variable.
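The two factory methods can be sketched in portable C++ under some assumptions. ClassFactorySketch, SiteObjectSketch and FactoryIid are hypothetical stand-ins for CClassFactory, CObjectWithSite and REFIID, and the create, query, release sequence is a common COM pattern rather than a copy of ClassFactory.cpp.

```cpp
// Stand-in identifiers; real code compares REFIID values such as IID_IObjectWithSite.
enum FactoryIid { FIID_Unknown, FIID_ObjectWithSite, FIID_SomethingElse };

static long g_factoryDllRefCount = 0;   // plays the role of DllRefCount

struct SiteObjectSketch {
    long refs;
    SiteObjectSketch() : refs(1) { ++g_factoryDllRefCount; }
    ~SiteObjectSketch() { --g_factoryDllRefCount; }

    bool QueryInterface(FactoryIid iid, void** out) {
        if (iid == FIID_Unknown || iid == FIID_ObjectWithSite) {
            ++refs;
            *out = this;
            return true;
        }
        *out = nullptr;
        return false;
    }
    void Release() { if (--refs == 0) delete this; }
};

struct ClassFactorySketch {
    // Create the object, ask it for the requested interface, then drop the
    // factory's own reference so that on success the caller holds the only one.
    bool CreateInstance(FactoryIid iid, void** out) {
        SiteObjectSketch* obj = new SiteObjectSketch();
        bool ok = obj->QueryInterface(iid, out);
        obj->Release();   // destroys the object if the query failed
        return ok;
    }

    // Locking keeps the DLL in memory even when no objects exist.
    void LockServer(bool lock) {
        if (lock) ++g_factoryDllRefCount; else --g_factoryDllRefCount;
    }
};
```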
Understanding the BHO Code
How Internet Explorer Loads the BHO
When Internet Explorer loads a BHO, it calls a COM function called CoCreateInstance, passing it the CLSID of our BHO and the IID of an interface called IObjectWithSite. IObjectWithSite is explained in more detail below. COM in turn loads our DLL into the Internet Explorer process by looking up our CLSID in the Registry, and then calls our exported DllGetClassObject function to get an instance of our CClassFactory coclass. Once COM has a pointer to our CClassFactory object, COM calls its CreateInstance method, passing it the IID that Internet Explorer supplied. Our implementation of CreateInstance creates an instance of our implementation of IObjectWithSite, which is called CObjectWithSite, gets the interface pointer for the requested IID from it, and gives it back to COM, which passes it on to Internet Explorer. Internet Explorer then uses the IObjectWithSite interface pointer to interact with our BHO.
The IObjectWithSite Interface
BHOs are required to implement the IObjectWithSite interface, which is how Internet Explorer communicates with a BHO. This interface has two methods, SetSite and GetSite. The coclass in our DLL which implements the IObjectWithSite interface is known as CObjectWithSite.
Note: You can find the definition of CObjectWithSite in ObjectWithSite.h, and the implementation in ObjectWithSite.cpp.
SetSite is called by Internet Explorer to give us a pointer to a site object. A site object is merely a COM object which is created by Internet Explorer and can be used by our BHO to communicate with Internet Explorer. Our implementation of SetSite gets an interface pointer to the IConnectionPointContainer interface from the site object. We then use a method FindConnectionPoint in IConnectionPointContainer to get an IConnectionPoint interface pointer to an object that supports the DWebBrowserEvents2 dispatch interface. A dispatch interface is a special type of interface which is derived from IDispatch and is used to receive event notifications through its Invoke method. Our implementation of DWebBrowserEvents2 is known as CEventSink, and is explained in more detail in the next section. We use the Advise method of the IConnectionPoint interface to tell Internet Explorer to pass events on to our CEventSink object.
Our implementation of SetSite also gets an interface pointer to the IWebBrowser2 interface from the site object. The IWebBrowser2 interface is implemented by Internet Explorer's site object, and has methods which we can use to interact with Internet Explorer.
Note: Since this example BHO only receives event notifications from Internet Explorer, and does not actually control Internet Explorer, we don't make use of any of the methods in IWebBrowser2. However, I have included the code to get the IWebBrowser2 interface so that you can use it in your own BHO if needed. IWebBrowser2 is described in Microsoft's documentation.
GetSite is called by Internet Explorer to find out which object we have currently set as the site object. Internet Explorer passes us an IID for an interface it wants from our currently set site object, and we simply get that interface from our site object and give it back to Internet Explorer.
The CEventSink Coclass
The CEventSink coclass is our implementation of the DWebBrowserEvents2 dispinterface. DWebBrowserEvents2 derives from IDispatch but does not add any methods of its own. Instead, DWebBrowserEvents2 exists only so that its unique DIID (dispatch IID) can exist. This DIID identifies the events that a coclass of DWebBrowserEvents2 can receive on its implementation of the IDispatch method Invoke.
Note: You can find the definition of CEventSink in EventSink.h, and the implementation in EventSink.cpp.
When Internet Explorer wants to notify us of an event, it will call the Invoke method of CEventSink, passing the event ID in the dispIdMember parameter and other information about the event in the pDispParams parameter. IDispatch also has three other methods besides Invoke (GetTypeInfoCount, GetTypeInfo, and GetIDsOfNames), but we don't need to implement any functionality for them because we only receive events.
Another difference that you might notice about CEventSink from our other coclasses is that it does not derive from CUnknown. This is because we only need one DLL-global statically-allocated instance of CEventSink, called EventSink. Because of this, we don't need to implement any reference counting, and hence don't need the reference counting and memory-management functionality of CUnknown.
Handling Events
Internet Explorer calls CEventSink::Invoke to notify us of events. The dispIdMember parameter contains an ID identifying which event is being fired, and the pDispParams->rgvarg[] array contains the arguments of the event itself as an array of VARIANTs. You can see what arguments an event takes by looking it up in Microsoft's documentation of DWebBrowserEvents2. The arguments are passed in the pDispParams->rgvarg[] array in the opposite order to the one in which they are listed in the event's documentation. In order to convert these arguments from a VARIANT to a more usable type, we first declare a VARIANT for every argument we will use and initialize it using the VariantInit API function. Then, we can use the VariantChangeType API function to convert the VARIANTs in the pDispParams->rgvarg[] array to a VARIANT of a more usable type. Once we have done this for all the arguments we will need, we can pass the value of the arguments to our own Event_* method by using an appropriate member of our converted VARIANT variables. After the event handler method returns, we free any resources used by our converted VARIANTs by calling the VariantClear function on each of them. A concrete example of how to do this is given below.
Our example BHO just handles one event, the BeforeNavigate2 event, which is described in Microsoft's documentation of DWebBrowserEvents2. This event is fired just before Internet Explorer navigates to a new location.
Taking a look at the documentation, we see that BeforeNavigate2 gives us seven arguments. We will not concern ourselves with the pDisp argument, which is just an IDispatch interface pointer to the site object.
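Because the arguments arrive in reverse order, it can help to compute each rgvarg index from the argument's documented position. The sketch below is an illustration only: BeforeNavigate2Arg, RgvargIndex and IsBeforeNavigate2 are hypothetical names, while the value 250 for DISPID_BEFORENAVIGATE2 is the one defined in exdispid.h.

```cpp
// DISPID_BEFORENAVIGATE2 is defined as 250 in exdispid.h; restated here so the
// sketch compiles without the Windows SDK.
const long kDispIdBeforeNavigate2 = 250;

// Zero-based positions of the BeforeNavigate2 arguments in documented order.
enum BeforeNavigate2Arg {
    Arg_pDisp = 0,
    Arg_url,
    Arg_Flags,
    Arg_TargetFrameName,
    Arg_PostData,
    Arg_Headers,
    Arg_Cancel,
    Arg_Count          // seven arguments in total
};

// rgvarg holds the arguments last-to-first, so the index is mirrored.
constexpr unsigned RgvargIndex(unsigned argCount, unsigned documentedPosition) {
    return argCount - 1 - documentedPosition;
}

// True when the dispIdMember passed to Invoke identifies BeforeNavigate2.
constexpr bool IsBeforeNavigate2(long dispIdMember) {
    return dispIdMember == kDispIdBeforeNavigate2;
}
```

With seven arguments, url maps to index 5 and Cancel to index 0, matching the indices used in the rest of this section.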
The event arguments are stored in the pDispParams->rgvarg[] array as variants. Before we can use these arguments, though, we need to convert the variants to different variant types which are easier to use. In order to do this, we have an array of variants in the Invoke method which stores the converted variants. We first need to initialize each of these variants using the VariantInit function, and when we are done, we free any used resources by using the VariantClear function.
The variant array in the Invoke method
VARIANT v[5];
...
for(n=0;n<5;n++) VariantInit(&v[n]);
...
for(n=0;n<5;n++) VariantClear(&v[n]);
The url argument contains the URL that is being navigated to. It is the fifth argument from last (the last one being the 0th), so we access it as pDispParams->rgvarg[5]. We convert this variant to a variant type of VT_BSTR as it may not be in that format, and store the converted variant into v[0]. We can then access the URL as a BSTR string by using v[0].bstrVal. A BSTR string is commonly used in COM to pass string data. It consists of a 4-byte prefix indicating the length of the string, followed by the string data as a double-byte character NULL-terminated Unicode string. A BSTR variable always points to the string data, and not to the 4-byte prefix before it, which conveniently allows us to use it as a C-style string. A double-byte character string type is declared in the Windows API headers as LPOLESTR, regardless of whether the program is using the wide-character set by default. So, we can pass the string v[0].bstrVal as an LPOLESTR parameter to the Event_BeforeNavigate2 event-handling method.
How to access the url argument
VariantChangeType(&v[0],&pDispParams->rgvarg[5],0,VT_BSTR);
...
Event_BeforeNavigate2( (LPOLESTR) v[0].bstrVal , ... );
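To make the BSTR layout concrete, here is a portable emulation of it in a plain byte buffer. It is a sketch for illustration, not how BSTRs are allocated in practice (real code uses SysAllocString and SysFreeString); MakeBstrLikeBuffer and BstrLikeByteLength are hypothetical names, and the prefix is assumed to hold the length in bytes, excluding the terminator.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Lay out a string the way a BSTR is laid out: a 4-byte length prefix
// (in bytes, terminator excluded), the UTF-16 code units, then a terminator.
std::vector<unsigned char> MakeBstrLikeBuffer(const std::u16string& text) {
    const std::uint32_t byteLength =
        static_cast<std::uint32_t>(text.size() * sizeof(char16_t));
    std::vector<unsigned char> buffer(4 + byteLength + sizeof(char16_t), 0);
    std::memcpy(buffer.data(), &byteLength, 4);
    std::memcpy(buffer.data() + 4, text.data(), byteLength);
    return buffer;   // the last two bytes stay zero: the NUL terminator
}

// A BSTR points at the character data, so the prefix sits 4 bytes before it.
std::uint32_t BstrLikeByteLength(const unsigned char* dataPointer) {
    std::uint32_t byteLength = 0;
    std::memcpy(&byteLength, dataPointer - 4, 4);
    return byteLength;
}
```

A pointer to `buffer.data() + 4` plays the role of the BSTR itself: code that expects a NUL-terminated wide string can read from it directly, while length-aware code can look at the prefix.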
The Flags argument contains information about whether the navigation is a result of an external window or tab. It is the fourth argument from last, so we access it as pDispParams->rgvarg[4]. We convert this variant to a variant type of VT_I4, and store the converted variant into v[1]. VT_I4 means a 4-byte signed integer, so we can then access the value as a long through v[1].lVal, and pass it on to Event_BeforeNavigate2.
How to access the Flags argument
VariantChangeType(&v[1],&pDispParams->rgvarg[4],0,VT_I4);
...
Event_BeforeNavigate2( ... , v[1].lVal , ... );
The TargetFrameName argument contains the name of the frame in which the navigation is happening. It is the third argument from last, so we access it as pDispParams->rgvarg[3]. We convert this variant to a variant type of VT_BSTR and store the converted variant into v[2]. We can then access the string value as an LPOLESTR through v[2].bstrVal, just like with the url argument, and similarly pass it on to Event_BeforeNavigate2.
How to access the TargetFrameName argument
VariantChangeType(&v[2],&pDispParams->rgvarg[3],0,VT_BSTR);
...
Event_BeforeNavigate2( ... , (LPOLESTR) v[2].bstrVal , ... );
The PostData argument contains POST data if the navigation is due to an HTTP POST request. It is the second argument from last, so we access it as pDispParams->rgvarg[2]. The documentation states that PostData is of the variant type VT_BYREF|VT_VARIANT. This means that PostData is actually a pointer to another variant. Reading further in the Remarks section of the documentation, we can see that the variant that is pointed to contains a SAFEARRAY. A SAFEARRAY is often used in COM to contain array data. We convert the PostData variant to a variant of type VT_UI1|VT_ARRAY and store the converted variant into v[3]. VT_UI1|VT_ARRAY means that after the conversion, v[3] is a variant that points to a SAFEARRAY which has a 1-dimensional array of 1-byte unsigned integers. Before accessing the SAFEARRAY data in v[3], we first need to check if the conversion to VT_UI1|VT_ARRAY was successful. If the navigation did not contain any POST data, then the conversion would not have been successful and the variant type of v[3] would be VT_EMPTY. On the other hand, if the data exists, we can access the SAFEARRAY the variant points to by using v[3].parray, and we can access the data within that SAFEARRAY using the SafeArray*() API functions.
First, we get the size of the data in the array using the functions SafeArrayGetLBound and SafeArrayGetUBound. These functions retrieve the lower and upper bounds of the array, respectively. Subtracting the lower bound from the upper bound and adding 1 gives us the number of elements in the array. We then access the actual data by using the function SafeArrayAccessData, which gives us a pointer to the data and also locks the array. Since the array's elements are of type 1-byte unsigned integer, we can access the data as a C-style array of unsigned chars. We pass the data pointer and data size to Event_BeforeNavigate2. Afterwards, we unlock the array by calling SafeArrayUnaccessData.
How to access the PostData argument
PVOID pv;
LONG lbound,ubound,sz;
...
VariantChangeType(&v[3],&pDispParams->rgvarg[2],0,VT_UI1|VT_ARRAY);
if(v[3].vt!=VT_EMPTY) {
    // SAFEARRAY dimensions are numbered from 1, so the only dimension is 1
    SafeArrayGetLBound(v[3].parray,1,&lbound);
    SafeArrayGetUBound(v[3].parray,1,&ubound);
    sz=ubound-lbound+1;
    SafeArrayAccessData(v[3].parray,&pv);
} else {
    sz=0;
    pv=NULL;
}
...
Event_BeforeNavigate2( ... , (PUCHAR) pv , sz , ... );
...
if(v[3].vt!=VT_EMPTY) SafeArrayUnaccessData(v[3].parray);
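The size calculation is easy to get wrong by one, so here it is in isolation. ByteArraySketch is a hypothetical stand-in for a one-dimensional SAFEARRAY of bytes and ElementCount is a hypothetical helper; neither comes from the example BHO.

```cpp
#include <vector>

// Stand-in for a one-dimensional SAFEARRAY of bytes.
struct ByteArraySketch {
    long lowerBound;                    // what SafeArrayGetLBound would report
    std::vector<unsigned char> bytes;   // what SafeArrayAccessData would expose

    // What SafeArrayGetUBound would report: the index of the last element.
    long UpperBound() const {
        return lowerBound + static_cast<long>(bytes.size()) - 1;
    }
};

// Both bounds are inclusive, hence the + 1.
long ElementCount(long lowerBound, long upperBound) {
    return upperBound - lowerBound + 1;
}
```

The formula does not depend on where the lower bound starts, and an empty array, whose upper bound is one less than its lower bound, yields a count of zero.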
The Headers argument contains any additional HTTP headers that were sent for the navigation. It is the next-to-last argument, so we access it as pDispParams->rgvarg[1]. We convert this variant to a variant of type VT_BSTR and store the converted variant into v[4]. We pass the data on to Event_BeforeNavigate2 in the same fashion as the url and TargetFrameName arguments.
How to access the Headers argument
VariantChangeType(&v[4],&pDispParams->rgvarg[1],0,VT_BSTR);
...
Event_BeforeNavigate2( ... , (LPOLESTR) v[4].bstrVal , ... );
The Cancel argument is a return value that indicates to Internet Explorer whether it should continue or cancel the navigation. It is a variant of type VT_BYREF|VT_BOOL, which means it contains a pointer to a VARIANT_BOOL type, which takes a value of either VARIANT_TRUE or VARIANT_FALSE. It is the last argument, so we access it as pDispParams->rgvarg[0]. We can access the VARIANT_BOOL that this argument points to via *(pDispParams->rgvarg[0].pboolVal). If we set the VARIANT_BOOL pointed to by the Cancel argument to VARIANT_TRUE, Internet Explorer cancels the navigation. If we set it to VARIANT_FALSE, Internet Explorer continues the navigation as normal. Because more than one BHO can be handling the BeforeNavigate2 event, the pre-existing value of the VARIANT_BOOL pointed to by the Cancel argument can correspond to a value set by a BHO that handled BeforeNavigate2 before our BHO. We pass the existing value to Event_BeforeNavigate2 as a bool value, so Event_BeforeNavigate2 can decide whether to override the existing value or keep it. The return value of Event_BeforeNavigate2 is a bool that indicates the new value of the Cancel argument.
How to access the Cancel argument
bool b;
...
b = Event_BeforeNavigate2( ... ,
    ( (*(pDispParams->rgvarg[0].pboolVal)) != VARIANT_FALSE ) );
...
if(b) *(pDispParams->rgvarg[0].pboolVal)=VARIANT_TRUE;
else *(pDispParams->rgvarg[0].pboolVal)=VARIANT_FALSE;
Note: You can find the code for the BeforeNavigate2 event handler, Event_BeforeNavigate2, in the EventSink.cpp file.
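The round trip between VARIANT_BOOL and bool can be sketched without the Windows headers. VariantBoolSketch mirrors VARIANT_BOOL (a 16-bit short with VARIANT_TRUE == -1 and VARIANT_FALSE == 0); DecideCancel is a hypothetical policy that keeps an earlier handler's cancellation, which is only one of the choices the handler could make.

```cpp
typedef short VariantBoolSketch;             // VARIANT_BOOL is a 16-bit short
const VariantBoolSketch kVariantTrue = -1;   // VARIANT_TRUE
const VariantBoolSketch kVariantFalse = 0;   // VARIANT_FALSE

// Anything other than VARIANT_FALSE counts as true, as in the code above.
bool ToBool(VariantBoolSketch value) { return value != kVariantFalse; }

VariantBoolSketch ToVariantBool(bool value) {
    return value ? kVariantTrue : kVariantFalse;
}

// Hypothetical policy: never undo a cancellation requested by an earlier
// handler, and add our own when the URL is one we want to block.
bool DecideCancel(bool alreadyCancelled, bool urlIsBlocked) {
    return alreadyCancelled || urlIsBlocked;
}

// Read the incoming value, decide, and write the result back through the pointer.
void ApplyCancel(VariantBoolSketch* cancel, bool urlIsBlocked) {
    *cancel = ToVariantBool(DecideCancel(ToBool(*cancel), urlIsBlocked));
}
```

Note that VARIANT_TRUE is -1 rather than 1, which is why the code compares against VARIANT_FALSE instead of testing for equality with VARIANT_TRUE.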
Installing the BHO
Installing the BHO merely requires that a process calls our BHO's DllRegisterServer function. This is made simple via the regsvr32.exe utility. Simply run the command regsvr32.exe <path to the BHO dll>, and the BHO should be registered. To uninstall the BHO, a process needs to call our BHO's DllUnregisterServer function. This can also be accomplished via regsvr32.exe by running the command regsvr32.exe /u <path to the BHO dll>.
Using the Code
Use the source code for this example BHO as a starting point for your own BHO.
Note: Don't forget to generate a new CLSID_IEPlugin and CLSID_IEPlugin_Str for your own BHO! You can generate a new CLSID by using the guidgen.exe utility. You can find CLSID_IEPlugin defined in main.cpp and CLSID_IEPlugin_Str in common.h.
You can start by customizing the Event_BeforeNavigate2 event handler for your own needs. You can also handle more events by adding new event handler methods to the CEventSink class and calling them from CEventSink's Invoke method.
Note: You can find the definition of CEventSink in the EventSink.h file, and the implementation of CEventSink and Event_BeforeNavigate2 in the EventSink.cpp file.
Conclusion
I hope you have enjoyed reading this article and learned some new stuff. Feel free to leave comments with questions, corrections, and suggestions!
Revision History