Introduction
Thunks are a very useful technique. In this article I will discuss three typical uses of thunks:
- Turning callbacks into member functions of classes.
- Providing an interface proxy.
- Supporting virtual functions under multiple inheritance in C++.
Before we start, let's get a general idea of what a thunk is. A thunk is generally a small piece of machine code that intercepts a client call and modifies the call stack before jumping to the real implementation of the call.
Turning callbacks into member functions of classes
Libraries often require callbacks. The problem with callbacks is that they usually have to be implemented as global or static functions, which can be inconvenient in an OO development environment. For instance, a Win32 program requires us to write a WNDPROC callback function, which usually contains a big switch/case block. Worse, we also need to define static variables inside the WNDPROC to keep track of state between calls. If we could turn callbacks into member functions of classes, we could use member functions instead of a big switch/case block, and member variables instead of function-level static variables to keep track of state.
A thunk can do this magic. The problem with callbacks is that they do not have the this pointer, so the main job of the thunk is to place a this pointer on the call stack (in the sample below, by overwriting the HWND argument) and then call the callback. Once inside the callback, we can fetch the this pointer from the call stack and call member functions through it. But wait: here the callback is called by the thunk, not by the library, so we still need to give the library an address to call whenever it wants to invoke the callback. That address is the address of our thunk. So, the procedure can be summarized as:
- The library calls the thunk.
- The thunk adds a this pointer to the call stack.
- The thunk forwards the call to the actual callback.
- The callback fetches the this pointer from the call stack and calls member functions using the this pointer.
This technique is used in the implementation of ATL's CWindowImpl, so I borrowed the code from ATL and modified/simplified it. By the way, you don't need any knowledge of ATL to read this sample. But if you are already familiar with ATL, you can skip it.
Using the code: The WindowWithThunk project
The code relating to step 1 is as follows:
LRESULT CALLBACK StartWindowProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
    // Excerpt: from now on the library calls the thunk instead of StartWindowProc.
    WNDPROC pProc = (WNDPROC)pThis->m_pThunk;
    ::SetWindowLong(hWnd, GWL_WNDPROC, (LONG)pProc);
}
The code relating to step 2 is as follows:
LRESULT CALLBACK StartWindowProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
    CWindowWithThunk* pThis = (CWindowWithThunk*)g_ModuleData.ExtractWindowObj();
    pThis->m_pThunk->Init((DWORD)TurnCallbackIntoMember, pThis);
}
void _stdcallthunk::Init(DWORD_PTR proc, void* pThis)
{
    // Bytes C7 44 24 04 <imm32>: mov dword ptr [esp+0x4], pThis
    m_mov = 0x042444C7;
    m_this = PtrToUlong(pThis);
}
Figure 1: The left side shows the original call stack, the right side shows the call stack after "mov dword ptr[esp+0x4], pThis"
The code relating to step 3 is as follows:
LRESULT CALLBACK StartWindowProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
    pThis->m_pThunk->Init((DWORD)TurnCallbackIntoMember, pThis);
}
void _stdcallthunk::Init(DWORD_PTR proc, void* pThis)
{
    // Byte E9 <rel32>: jmp to proc, with the offset measured from the end of the thunk
    m_jmp = 0xe9;
    m_relproc = DWORD((INT_PTR)proc - ((INT_PTR)this+sizeof(_stdcallthunk)));
}
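Taken together, the two halves of Init() lay out 13 bytes of x86 code: a mov that stores pThis at [esp+4], followed by a relative jmp. The following is a minimal, portable sketch that only builds those bytes and does not execute them. EncodeStdcallThunk, PutLE32, and the use of plain 32-bit integers in place of real addresses are assumptions made for illustration; they are not part of ATL or of the sample project.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: stores a 32-bit value in little-endian byte order.
static void PutLE32(std::uint8_t* p, std::uint32_t v)
{
    for (int i = 0; i < 4; ++i)
        p[i] = static_cast<std::uint8_t>(v >> (8 * i));
}

// Hypothetical helper: encodes
//   mov dword ptr [esp+4], pThis   ; C7 44 24 04 <imm32>
//   jmp proc                       ; E9 <rel32>
// thunkAddr is the simulated 32-bit address at which the bytes would be placed.
std::array<std::uint8_t, 13> EncodeStdcallThunk(std::uint32_t thunkAddr,
                                                std::uint32_t proc,
                                                std::uint32_t pThis)
{
    std::array<std::uint8_t, 13> code{};
    PutLE32(&code[0], 0x042444C7u);   // same constant as m_mov
    PutLE32(&code[4], pThis);         // same role as m_this
    code[8] = 0xE9;                   // same role as m_jmp
    // rel32 is measured from the first byte after the jmp, i.e. the end of the thunk
    const std::uint32_t rel =
        proc - (thunkAddr + static_cast<std::uint32_t>(code.size()));
    PutLE32(&code[9], rel);           // same role as m_relproc
    return code;
}
```

For example, a thunk placed at 0x1000 that forwards to a procedure at 0x2000 gets a relative offset of 0x2000 - 0x100D = 0xFF3.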
The code relating to the final step is as follows:
LRESULT CALLBACK TurnCallbackIntoMember(HWND hWnd, UINT message,
                                        WPARAM wParam, LPARAM lParam)
{
    // The thunk has overwritten the hWnd slot with the this pointer.
    CWindowWithThunk* pThis = (CWindowWithThunk*)hWnd;
    pThis->OnPaint();
}
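The final step can be tried out without any machine code by calling the callback directly with an object address in the first argument, which is what the thunk arranges at run time. This is only a sketch: WindowSketch, CallbackSketch, and FakeHwnd are hypothetical stand-ins, and the two message constants simply reuse the numeric values of WM_DESTROY and WM_PAINT.

```cpp
#include <cassert>

// Hypothetical stand-ins for the Win32 handle type and message IDs.
using FakeHwnd = void*;
constexpr unsigned kMsgDestroy = 0x0002;   // numeric value of WM_DESTROY
constexpr unsigned kMsgPaint   = 0x000F;   // numeric value of WM_PAINT

class WindowSketch
{
public:
    long OnPaint()   { ++paintCount; return 0; }
    long OnDestroy() { destroyed = true; return 0; }

    int  paintCount = 0;    // state lives in members, not in function statics
    bool destroyed  = false;
};

// Plays the role of TurnCallbackIntoMember: by the time it runs, the first
// argument no longer holds a window handle but the address of the object.
long CallbackSketch(FakeHwnd firstArg, unsigned message)
{
    WindowSketch* self = static_cast<WindowSketch*>(firstArg);
    switch (message)
    {
    case kMsgPaint:   return self->OnPaint();
    case kMsgDestroy: return self->OnDestroy();
    default:          return -1;   // unhandled in this sketch
    }
}
```

Each message still passes through one small switch, but the handlers and their state are now ordinary members of the class.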
Providing an interface proxy
In C++, an interface is a collection of method declarations without implementations. An interface pointer is a pointer to a vptr, which in turn points to an array of pointers to the functions that implement the methods declared in the interface. As far as the client is concerned, an interface proxy (hereinafter referred to as a proxy) looks exactly like the interface: when a client calls a method through a proxy pointer, it ends up calling the proxy's implementation. That implementation can do anything it wants, such as fetching the arguments that the client pushed onto the call stack and forwarding them to the real implementation of the interface. For instance, in COM marshaling, or any other RPC environment, when a client requests an interface pointer from COM, COM just returns a proxy pointer. The client then calls methods through the proxy pointer, which ends up calling the proxy's implementation. The proxy fetches the arguments from the call stack, packs them, and sends them to a remote machine or another apartment, where the real implementation of the interface is called.
Now, the question is how we should write proxies for interfaces. One answer is to write a separate proxy for each interface, but this is tedious. A better solution is to have a single proxy for all interfaces. COM has such a Universal Marshaler (or Type Library Marshaler), which provides a single proxy for all interfaces with the help of type libraries.
A single proxy for all interfaces means that one proxy implementation (one method definition) handles every method call the client makes on any interface. So this single proxy implementation must know which method of which interface is being called, because only then can it determine what arguments to expect on the call stack. When a client requests an interface pointer using an IID (interface ID), the client is requesting a pointer to a vptr, which in turn points to a vtable. So we can create a vtable and associate it with the IID (provided by the client) and method indexes. A method index is the index of the method within an interface, and it is also the index of the method within the vtable. The valid method indexes follow from the total number of methods in the interface, which we can obtain by querying a type library using the IID. But a vtable is just an array of DWORDs on the x86 platform, so we can't simply fill the vtable with both the IID and method indexes. Here, we can use thunks again. We prepare a thunk for each method, initialize each thunk with the IID and a method index, then fill the vtable with the address of each thunk. The main job of the thunk is to push both the IID and the method index onto the call stack, and then forward the call to the single proxy implementation. The single proxy implementation can now determine which method of which interface is being called using the IID and the method index. The procedure can be summarized as follows:
- The client requests an interface pointer using an IID.
- We initialize each thunk with the IID and a method index; fill the vtable with the address of each thunk. See figure 2.
- Return the proxy pointer (a pointer to the vptr which points to the vtable created in step 2) to the client.
- The client calls a method using the proxy pointer (the client ends up calling a thunk initialized in step 2).
- The thunk pushes both the IID and the method index onto the stack and calls the single proxy implementation. See figure 3.
- The single proxy implementation determines what arguments to expect by querying a type library using the IID and the method index. Now, the single proxy implementation can do whatever it wants.
Figure 2: The relationship between a vtable and its associated thunks. Here the interface ID is 1234.
Figure 3: The left side shows the original call stack, the right side shows the call stack after the IID and the method index are pushed.
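The idea behind these steps, one small stub per method that remembers its own IID and method index and forwards to a single implementation, can be modelled portably with closures instead of machine code. The sketch below is only an analogy: SingleProxy, ThunkSketch, and BuildVtableSketch are hypothetical names, and a std::function is of course not something a real vtable slot could hold.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// The one implementation that every stub forwards to. It is told which
// method of which interface was called.
std::string SingleProxy(std::uint32_t iid, std::uint32_t midx)
{
    return "iid=" + std::to_string(iid) + " method=" + std::to_string(midx);
}

// Stand-in for a thunk: a callable that carries its own (iid, midx) pair.
using ThunkSketch = std::function<std::string()>;

// Stand-in for steps 2 and 3: one entry per method, indexed like the vtable.
std::vector<ThunkSketch> BuildVtableSketch(std::uint32_t iid,
                                           std::uint32_t methodCount)
{
    std::vector<ThunkSketch> vtable;
    for (std::uint32_t midx = 0; midx < methodCount; ++midx)
        vtable.push_back([iid, midx] { return SingleProxy(iid, midx); });
    return vtable;
}
```

Calling entry 2 of a table built for interface 1234 reaches SingleProxy with iid 1234 and method index 2, which mirrors what the pushed values tell the real proxy implementation.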
Using the code: The UniversalProxy project
The code relating to step 1 is as follows:
int _tmain(int argc, _TCHAR* argv[])
{
    IInterface_Zero* pI0;
    ProxyProvider(0, (void**)&pI0);
}
The code relating to steps 2 and 3 is as follows:
void ProxyProvider(DWORD iid, void** ppv)
{
    DWORD methods = FakeTypeLibrary::GetNumOfMethods(iid);
    DWORD** vptr = new DWORD*;
    DWORD* vtable = new DWORD[methods];
    // One thunk per method; each vtable slot holds the address of its thunk.
    for(DWORD midx = 0; midx < methods; ++midx)
    {
        thunk* pThunk = new thunk();
        WORD bytes_to_pop = FakeTypeLibrary::GetAugumentStackSize(iid, midx) + 4;
        pThunk->Init(iid, midx, bytes_to_pop);
        vtable[midx] = (DWORD)pThunk;
    }
    (*vptr) = vtable;
    *ppv = vptr;
}
The code relating to step 4 is as follows:
int _tmain(int argc, _TCHAR* argv[])
{
    pI0->DoSomething(3,'a');
}
The code relating to step 5 is as follows:
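void Init(DWORD iid, DWORD midx, WORD bytes)
{
    push_interface = 0x68;   // push imm32
    interface_id = iid;
    push_method = 0x68;      // push imm32
    method_idx = midx;
    call = 0xe8;             // call rel32
    // The offset is relative to the instruction that follows the call (add_esp).
    func_offset = (DWORD)&ProxyImplementation - (DWORD)&add_esp;
}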
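As with the window thunk, the resulting byte layout can be sketched portably. The excerpt above only shows the two pushes and the call; the tail used here (add esp, 8 to drop the two pushed values, then ret n to pop the client's arguments) is an assumption inferred from the add_esp field and the bytes_to_pop value, not something the excerpt confirms. EncodeProxyThunk and WriteLE are hypothetical helpers, and addresses are simulated as 32-bit integers.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: stores the low 'count' bytes of v in little-endian order.
static void WriteLE(std::uint8_t* p, std::uint32_t v, int count)
{
    for (int i = 0; i < count; ++i)
        p[i] = static_cast<std::uint8_t>(v >> (8 * i));
}

// Hypothetical encoder for the assumed 21-byte layout:
//   push iid        ; 68 <imm32>
//   push midx       ; 68 <imm32>
//   call proxyImpl  ; E8 <rel32>
//   add  esp, 8     ; 83 C4 08   (assumed)
//   ret  bytesToPop ; C2 <imm16> (assumed)
std::array<std::uint8_t, 21> EncodeProxyThunk(std::uint32_t thunkAddr,
                                              std::uint32_t proxyImpl,
                                              std::uint32_t iid,
                                              std::uint32_t midx,
                                              std::uint16_t bytesToPop)
{
    std::array<std::uint8_t, 21> code{};
    code[0] = 0x68;  WriteLE(&code[1], iid, 4);
    code[5] = 0x68;  WriteLE(&code[6], midx, 4);
    code[10] = 0xE8;
    // rel32 is measured from the instruction after the call, at offset 15
    WriteLE(&code[11], proxyImpl - (thunkAddr + 15u), 4);
    code[15] = 0x83; code[16] = 0xC4; code[17] = 0x08;
    code[18] = 0xC2; WriteLE(&code[19], bytesToPop, 2);
    return code;
}
```

Because the IID is pushed first and the method index second, the method index ends up closest to the top of the stack, which matches the parameter order of ProxyImplementation.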
The code relating to the final step is as follows:
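// The parameters mirror the stack: the two values pushed by the thunk, the
// client's return address, and the this pointer pushed by the client.
static void _cdecl ProxyImplementation(DWORD midx, DWORD iid, DWORD client_site_addr,
                                       void* pThis )
{
    if(iid == 0)
    {
        if(midx == 0)
        {
            // The client's arguments follow pThis on the stack, 4 bytes each.
            BYTE* arg_addr = (BYTE*)&pThis + 4;
            int arg0 = *(int*)arg_addr;
            arg_addr += 4;
            char arg1 = *(char*)arg_addr;
            pRealImpl_Zero->DoSomething(arg0,arg1);
        }
    }
}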
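The pointer arithmetic in ProxyImplementation can be tried on a simulated stack. In the sketch below a byte buffer stands in for the part of the x86 stack that starts at the client's this slot, with each argument occupying a 4-byte slot. DoSomethingArgs, MakeStackSketch, and FetchArgsSketch are hypothetical names introduced for illustration, and the layout assumes a little-endian machine, as on x86.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

struct DoSomethingArgs
{
    std::int32_t arg0;
    char         arg1;
};

// Builds the simulated slots [this][arg0][arg1], 4 bytes each.
std::vector<std::uint8_t> MakeStackSketch(std::int32_t arg0, char arg1)
{
    std::vector<std::uint8_t> stack(12, 0);
    std::memcpy(&stack[4], &arg0, sizeof arg0);
    stack[8] = static_cast<std::uint8_t>(arg1);   // a pushed char sits in the low byte of its slot
    return stack;
}

// Mirrors the walk in ProxyImplementation: skip the this slot, then read
// one argument per 4-byte slot.
DoSomethingArgs FetchArgsSketch(const std::uint8_t* thisSlot)
{
    const std::uint8_t* argAddr = thisSlot + 4;
    DoSomethingArgs out{};
    std::memcpy(&out.arg0, argAddr, sizeof out.arg0);
    argAddr += 4;
    std::memcpy(&out.arg1, argAddr, sizeof out.arg1);
    return out;
}
```

In a real universal proxy the slot sizes and types would come from the type library rather than being hard-coded per method.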
Supporting virtual functions under multiple inheritance in C++
Consider the following code:
#include <iostream>
using namespace std;

class Base1
{
public:
    virtual ~Base1(){}
private:
    int Base1Data;
};
class Base2
{
public:
    virtual ~Base2()
    {
        cout << this->Base2Data;
    }
private:
    int Base2Data;
};
class Derived : public Base1, public Base2
{
public:
    virtual ~Derived()
    {
        cout << this->DerivedData;
    }
private:
    int DerivedData;
};
void DeleteObj(Base2* pObj)
{
    delete pObj;
}
int main()
{
    Base2* pB2 = new Derived();
    DeleteObj(pB2);
    return 0;
}
The Base2 pointer pB2 is assigned the address of a Derived object. But the address of the new Derived object must be adjusted to address its Base2 subobject before it can be saved to pB2. The code to do this is generated by the compiler:
// Pseudocode: the offset is in bytes, so the arithmetic is done on a char*
Derived* temp = new Derived;
Base2* pB2 = temp ? (Base2*)((char*)temp + sizeof(Base1)) : 0;
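The adjustment can be observed directly by comparing the two addresses. The following sketch uses simplified copies of the classes above (public data, no output in the destructors); OffsetOfBase2 is a hypothetical helper. The exact offset depends on the compiler and platform, so the sketch only relies on it being nonzero, which holds on the common ABIs.

```cpp
#include <cassert>
#include <cstddef>

class Base1 { public: virtual ~Base1() {} int Base1Data = 0; };
class Base2 { public: virtual ~Base2() {} int Base2Data = 0; };
class Derived : public Base1, public Base2 { public: int DerivedData = 0; };

// Byte distance from the start of the complete object to its Base2 subobject.
std::ptrdiff_t OffsetOfBase2(Derived* d)
{
    Base2* asBase2 = d;   // implicit conversion; the compiler inserts the adjustment
    return reinterpret_cast<char*>(asBase2) - reinterpret_cast<char*>(d);
}

// The reverse conversion undoes the adjustment.
Derived* BackToDerived(Base2* p)
{
    return static_cast<Derived*>(p);
}
```

Note that the compiler-generated conversion also keeps a null pointer null, which is why the pseudocode above tests temp before adding the offset.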
Now, let's take a look at the statement "delete pObj" inside the DeleteObj() function. At this point, the compiler has no idea what object pObj points to. If pObj points to a Base2 object, pObj (as the this pointer) should be pushed onto the call stack and Base2::~Base2() should be called. If pObj points to a Derived object, pObj should be readjusted to address the beginning of the complete Derived object before it is pushed onto the call stack and Derived::~Derived() is called. But because the compiler does not know what object pObj points to, it cannot determine whether to readjust pObj or not. So, this decision and readjustment can only be made at runtime.
Here, thunks can help again. We can create a thunk for each virtual function that requires adjustment/readjustment of the this pointer, and then fill the vtable slot with the address of the thunk. The main job of the thunk is to adjust the this pointer and then jump to the actual virtual function. The thunk looks like:
Base2_destructor_thunk:
    this -= sizeof(Base1);      // pseudocode: byte offset back to the complete object
    Derived::~Derived(this);
Now, let's look at the DeleteObj() function again. When pObj points to a Base2 object, the vtable slot for the destructor contains the address of Base2::~Base2(), so "delete pObj" simply calls Base2::~Base2(). When pObj points to a Derived object, the vtable slot for the destructor contains the address of the thunk (Base2_destructor_thunk, in this case), so "delete pObj" calls the thunk, which adjusts the this pointer and then jumps to Derived::~Derived().
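The effect of this readjustment is visible from ordinary C++ code, whatever mechanism the compiler uses to implement it. In the sketch below, which again uses simplified copies of the classes, the Derived destructor records the this pointer it receives; DeleteResult and DeleteThroughBase2 are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Address received as 'this' by the most recent Derived destructor call.
static std::uintptr_t g_thisSeenByDerivedDtor = 0;

class Base1 { public: virtual ~Base1() {} int Base1Data = 0; };
class Base2 { public: virtual ~Base2() {} int Base2Data = 0; };
class Derived : public Base1, public Base2
{
public:
    ~Derived() override
    {
        g_thisSeenByDerivedDtor = reinterpret_cast<std::uintptr_t>(this);
    }
    int DerivedData = 0;
};

struct DeleteResult
{
    std::uintptr_t completeObject;   // address returned by new Derived()
    std::uintptr_t base2Subobject;   // address stored in the Base2 pointer
    std::uintptr_t seenInDtor;       // 'this' inside Derived::~Derived()
};

DeleteResult DeleteThroughBase2()
{
    Derived* d = new Derived();
    Base2* pB2 = d;                  // adjusted to the Base2 subobject
    DeleteResult r{ reinterpret_cast<std::uintptr_t>(d),
                    reinterpret_cast<std::uintptr_t>(pB2), 0 };
    delete pB2;                      // dispatched through Base2's vtable
    r.seenInDtor = g_thisSeenByDerivedDtor;
    return r;
}
```

Although the delete expression starts from the adjusted Base2 address, the destructor runs with the address of the complete object, which is exactly the job described for Base2_destructor_thunk.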
In conclusion
There are other uses of thunks, such as API hooking, message filtering, and so on. But the idea behind them is the same: intercepting the call and modifying the call stack. The WindowWithThunk sample inserts the this pointer into the call stack; the UniversalProxy sample pushes two extra arguments onto the call stack; the MultipleInheritance sample modifies the this pointer already on the call stack.
Acknowledgements and references
- ATL Internals: Working with ATL 8, Second Edition by Christopher Tavares, Kirk Fertitta, Brent Rector, Chris Sells. Published by Addison Wesley Professional.
- Inside the C++ Object Model by Stanley B. Lippman. Published by Addison Wesley.