Introduction
A useful technique for getting rid of ATL while keeping a core similar to ATL's. Once you grasp this new thunk technique, you can quickly start from a lightweight framework written by yourself, instead of writing procedural C++ code directly against the Windows SDK, and your code becomes clearer and better organized into classes.
Background
Most Windows developers know how to work with the ATL framework, but fewer know the core mechanism behind it: ATL uses a thunk technique to pass the This pointer of a class instance into the WindowProc callback. A small piece of machine code substitutes the This pointer for the first parameter, hWnd, so WindowProc can fetch This from that parameter. Unlike MFC, which looks up the This pointer in a mapping table, this needs no lookup, so it performs better than MFC.
Well, I'm not a C++ guru, let alone an ATL one. In my spare time I like to develop small Windows apps, and I want them to perform well, so I chose ATL as the base for my Windows apps. But that base is still too big; in other words, it carries complex encapsulation built on C++ templates, which is not a good fit for a small Windows app. The other choice is to code directly against the Windows SDK, but then you cannot effectively use classes to wrap methods into components.
Caught in this dilemma, I did some investigation and dived into the core of the ATL framework. Now I know how ATL grabs the This pointer of a class instance and passes it to WindowProc: it is the thunk technique mentioned above.
The principle is as follows. Before a window is created, ATL pushes pThis together with the ID of the creating thread onto a global list maintained by the ATL module (in recent ATL versions, _AtlWinModule); when the window's first message arrives, WindowProc pops pThis from that list, taking the entry that belongs to the current thread. From my understanding and some searching on Google, this should be reliable: a thread cannot create more than one window at the same time, and the first messages for a new window are delivered on the creating thread before CreateWindowEx returns, so the entry taken is the one for the window just created.
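For comparison with what follows, the list-based approach can be modeled in portable C++. This is a hypothetical sketch, not ATL's code: the names CreateWndData and CreateWndList are made up, std::thread::id stands in for the Win32 thread ID, and a std::mutex stands in for whatever lock ATL uses. Add would run just before CreateWindowEx, and Extract in the first WindowProc call.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical model of ATL's "create window data" list (not ATL's real declarations).
struct CreateWndData {
    void* pThis = nullptr;          // object that will own the window
    std::thread::id threadId;       // thread that is creating the window
    CreateWndData* next = nullptr;
};

class CreateWndList {
public:
    // Run just before CreateWindowEx: record which object this thread is creating.
    void Add(CreateWndData* data, void* pThis) {
        std::lock_guard<std::mutex> lock(m_mutex);
        data->pThis = pThis;
        data->threadId = std::this_thread::get_id();
        data->next = m_head;        // push at the head of the list
        m_head = data;
    }

    // Run in the first WindowProc call: unlink and return this thread's newest entry.
    void* Extract() {
        std::lock_guard<std::mutex> lock(m_mutex);
        const std::thread::id me = std::this_thread::get_id();
        for (CreateWndData** link = &m_head; *link != nullptr; link = &(*link)->next) {
            if ((*link)->threadId == me) {
                CreateWndData* found = *link;
                *link = found->next;
                return found->pThis;
            }
        }
        return nullptr;             // nothing pending on this thread
    }

private:
    std::mutex m_mutex;
    CreateWndData* m_head = nullptr;
};
```

The lock and the list walk are exactly the overhead and the global state that the technique below avoids.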
In my case, I plan to get rid of ATL but still want to use its thunk technique, so that my code is wrapped in classes more effectively. The question is: if I don't use ATL, how do I get the thunk mechanism? As you know, the thunk bookkeeping is maintained by the ATL module, so it looks as if we cannot get rid of it; we must find a good way around this.
Using the code
After fighting with it for a night, I found the key point: the thunk data structure whose job is to pass the This pointer of a class into WindowProc. I made some changes to the original (you can find it in <atlstdthunk.h>); the structure below is my modified version. Keep it in mind for now and think of it as a new thunk technique.
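#pragma pack(push,1)
struct _stdcallthunk
{
    DWORD   m_mov;      // mov dword ptr [esp+disp8], imm32 (set in the constructor, not in Init)
    DWORD   m_this;     // imm32 operand: the This pointer
    BYTE    m_jmp;      // 0xE9: jmp rel32
    DWORD   m_relproc;  // rel32 offset to the target procedure
    BOOL Init(DWORD_PTR proc, void* pThis)
    {
        m_this = PtrToUlong(pThis);
        m_jmp = 0xe9;
        // rel32 is relative to the next instruction, i.e. the end of the thunk
        m_relproc = DWORD((INT_PTR)proc - ((INT_PTR)this + sizeof(_stdcallthunk)));
        FlushInstructionCache(GetCurrentProcess(), this, sizeof(_stdcallthunk));
        return TRUE;
    }
    void* GetCodeAddress()
    {
        return this;
    }
};
#pragma pack(pop)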
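To see what the 13 packed bytes of _stdcallthunk contain once Init has run, the following portable sketch writes the same layout into a byte array. It is an illustration under assumptions: addresses are plain 32-bit integers, and EncodeX86Thunk is a hypothetical helper, not part of ATL or the demo. Unsigned 32-bit subtraction wraps the same way the CPU reads a negative rel32, so backward jumps are encoded correctly too.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: lay out an x86 _stdcallthunk as raw little-endian bytes.
std::array<std::uint8_t, 13> EncodeX86Thunk(std::uint32_t movOpcode, std::uint32_t pThis,
                                             std::uint32_t thunkAddr, std::uint32_t procAddr) {
    std::array<std::uint8_t, 13> b{};
    auto put32 = [&b](std::size_t off, std::uint32_t v) {   // little-endian store
        for (std::size_t i = 0; i < 4; ++i)
            b[off + i] = static_cast<std::uint8_t>(v >> (8 * i));
    };
    put32(0, movOpcode);   // m_mov: C7 44 24 disp8 = mov dword ptr [esp+disp8], imm32
    put32(4, pThis);       // m_this: the imm32 operand
    b[8] = 0xE9;           // m_jmp: jmp rel32
    // m_relproc: measured from the end of the jmp, which is the end of the 13-byte thunk
    put32(9, procAddr - (thunkAddr + 13));
    return b;
}
```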
If you compare the changed code with the original in <atlstdthunk.h>, you will notice that the assignment to m_mov is missing from the Init function.
Yes, I removed it on purpose; I will explain why later.
For convenience, I created a dialog demo project that uses the new thunk technique. In the CTestDlg class, I placed two instances of the new thunk structure, m_thunk and m_thunk2. See the code below:
Note: the following sections involve a little assembly language. It isn't required; you can try to read it anyway, or skip it and go straight to the dialog demo project.
Unless stated otherwise, we only discuss x86 thunks, i.e., the thunk code under the #if defined(_M_IX86) directive (see Remark 1).
HANDLE CTestDlg::s_hPrivHeap = NULL;

CTestDlg::CTestDlg(void)
{
    // One executable private heap shared by all CTestDlg instances
    if (!s_hPrivHeap)
    {
        s_hPrivHeap = ::HeapCreate(HEAP_CREATE_ENABLE_EXECUTE, 0, 0);
        if (!s_hPrivHeap) throw "error: failed to create private heap!";
    }

    // First thunk: hands This to StartDialogProc
    m_thunk = (_stdcallthunk*)::HeapAlloc(s_hPrivHeap, HEAP_ZERO_MEMORY, sizeof(_stdcallthunk));
    if (!m_thunk) throw "error: m_thunk cannot be allocated by HeapAlloc";
#if defined(_M_IX86)
    m_thunk->m_mov = 0x142444C7;    // mov dword ptr [esp+0x14], imm32
#elif defined(_M_AMD64)
    m_thunk->m_mov = 0xb949;        // mov r9, imm64 (This goes into lParam; see Remark 1)
#endif

    // Second thunk: replaces hwndDlg with This before jumping to DialogProc
    m_thunk2 = (_stdcallthunk*)::HeapAlloc(s_hPrivHeap, HEAP_ZERO_MEMORY, sizeof(_stdcallthunk));
    if (!m_thunk2) throw "error: m_thunk2 cannot be allocated by HeapAlloc";
#if defined(_M_IX86)
    m_thunk2->m_mov = 0x042444C7;   // mov dword ptr [esp+0x4], imm32
#elif defined(_M_AMD64)
    m_thunk2->m_mov = 0xb948;       // mov rcx, imm64
#endif
}
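The m_mov constants are easier to read as the bytes they become in memory: x86 and x64 are little-endian, so the lowest byte of the constant is the first byte executed. The sketch below decodes them. DecodeMovX86 and DecodeMovX64 are hypothetical helpers, and the x64 variant assumes m_mov is a 16-bit field on that platform, as it is in ATL's own x64 thunk, since the demo's x64 structure is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// x86: C7 44 24 disp8 = mov dword ptr [esp+disp8], imm32 (ModRM 0x44 + SIB 0x24).
std::string DecodeMovX86(std::uint32_t m_mov) {
    const unsigned b0 = m_mov & 0xFF, b1 = (m_mov >> 8) & 0xFF,
                   b2 = (m_mov >> 16) & 0xFF, disp = (m_mov >> 24) & 0xFF;
    if (b0 != 0xC7 || b1 != 0x44 || b2 != 0x24) return "unknown";
    char buf[48];
    std::snprintf(buf, sizeof buf, "mov dword ptr [esp+0x%02X], imm32", disp);
    return buf;
}

// x64: REX.W (+B) then B8+r = mov r64, imm64. REX.B selects r8..r15.
std::string DecodeMovX64(std::uint16_t m_mov) {
    const unsigned rex = m_mov & 0xFF, op = m_mov >> 8;
    if (op < 0xB8 || op > 0xBF || (rex & 0xF8) != 0x48) return "unknown";
    static const char* const low[8] = {"rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"};
    static const char* const high[8] = {"r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"};
    const unsigned reg = op - 0xB8;
    return std::string("mov ") + ((rex & 1) ? high[reg] : low[reg]) + ", imm64";
}
```

Decoded this way, 0x142444C7 is mov dword ptr [esp+0x14], 0x042444C7 is mov dword ptr [esp+0x04], 0xb949 is mov r9 and 0xb948 is mov rcx.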
In the constructor of CTestDlg, I used HeapCreate to create a private heap and HeapAlloc to allocate the two _stdcallthunk instances from it. The heap is created with HEAP_CREATE_ENABLE_EXECUTE (Remark 2), which avoids DEP (Data Execution Prevention) problems: if the thunks lived in ordinary data memory, for example as plain class members or on the default heap, their pages would not be marked executable, and with DEP enabled in the system's advanced settings the program would crash as soon as it jumped into a thunk.
Take your time to understand the DEP issue. Now let's focus on the key lines, where the two thunk instances get different assembly instructions in m_mov.
With the first instance, m_thunk, we get rid of ATL's global list for retrieving the This pointer; instead, the thunk appends an extra parameter to the stack frame with a mov instruction:
m_thunk->m_mov = 0x142444C7
This corresponds to:
mov dword ptr [esp+0x14], pThis
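A small model of the stack at the moment the thunk runs makes the two displacements concrete. [esp] holds the return address and [esp+0x4] to [esp+0x10] hold the four arguments the system pushed, so 0x14 addresses the slot just above them, which StartDialogProc sees as its fifth parameter, while m_thunk2's 0x04 overwrites hwndDlg. Note that this fifth slot belongs to the caller's frame, and a five-parameter CALLBACK (__stdcall) function pops 20 bytes on return although only 16 were pushed, so the approach relies on the system caller tolerating both. X86EntryStack and ApplyMovEsp are hypothetical names for this model.

```cpp
#include <cassert>
#include <cstdint>

// Model of the x86 stack on entry to a 4-argument DLGPROC. Index i is address esp + 4*i.
struct X86EntryStack {
    // [0]=return address, [1]=hwndDlg, [2]=uMsg, [3]=wParam, [4]=lParam,
    // [5]=first DWORD above the caller's arguments (owned by the caller)
    std::uint32_t slot[6];
};

// Effect of "mov dword ptr [esp+disp8], imm32" on that stack.
void ApplyMovEsp(X86EntryStack& s, std::uint8_t disp8, std::uint32_t imm32) {
    assert(disp8 % 4 == 0 && disp8 / 4 < 6);
    s.slot[disp8 / 4] = imm32;
}
```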
Now we can retrieve This in StartDialogProc:
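// Declared static in CTestDlg; the fifth parameter is the slot written by m_thunk
INT_PTR CALLBACK CTestDlg::StartDialogProc(HWND hwndDlg, UINT uMsg, WPARAM wParam, LPARAM lParam, DWORD_PTR This)
{
    CTestDlg* pThis = (CTestDlg*)This;
    pThis->m_hDlg = hwndDlg;    // keep the real window handle

    // Route all later messages through m_thunk2, which substitutes This for hwndDlg
    pThis->m_thunk2->Init((DWORD_PTR)pThis->DialogProc, pThis);
    DLGPROC pProc = (DLGPROC)pThis->m_thunk2->GetCodeAddress();
    DLGPROC pOldProc = (DLGPROC)::SetWindowLongPtr(hwndDlg, DWLP_DLGPROC, (LONG_PTR)pProc);

    // Forward the current (first) message through the new thunk as well
    return pProc(hwndDlg, uMsg, wParam, lParam);
}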
Once we have retrieved the This pointer, we store the current HWND in pThis->m_hDlg and then initialize the second thunk, m_thunk2. It forwards calls to the real dialog procedure, DialogProc, with the first parameter, hwndDlg, replaced by the This pointer.
INT_PTR CALLBACK CTestDlg::DialogProc(HWND hwndDlg, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
    CTestDlg* pThis = (CTestDlg*)hwndDlg;   // m_thunk2 has replaced hwndDlg with This
    ...
All of the above is similar to the original ATL thunk mechanism. The difference is that we use two different thunks (i.e., two different mov instructions) instead of depending on a global list to get the This pointer back, so we get rid of ATL completely!
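The two-stage hand-off can be simulated without any machine code. In this hypothetical model, Thunk is plain data that rewrites the arguments before calling its target, FakeDialogWindow stands in for the DWLP_DLGPROC slot, and TestDlg stands in for CTestDlg; the real technique generates executable code instead. The model only shows the order of events: the first message goes through m_thunk to StartDialogProc, which installs m_thunk2, and every later message reaches DialogProc with This in place of the handle.

```cpp
#include <cassert>
#include <cstdint>

using Handle = std::uintptr_t;   // stand-in for HWND
class TestDlg;                   // hypothetical stand-in for CTestDlg

// Stand-in for a thunk: data that rewrites the arguments before calling its target.
struct Thunk {
    bool appendThis;                                  // true: like m_thunk; false: like m_thunk2
    TestDlg* pThis;
    long (*startProc)(Handle, unsigned, TestDlg*);    // target when appendThis is true
    long (*dlgProc)(Handle, unsigned);                // target when appendThis is false
    long Call(Handle hwnd, unsigned msg) const {
        if (appendThis)
            return startProc(hwnd, msg, pThis);                   // This as an extra argument
        return dlgProc(reinterpret_cast<Handle>(pThis), msg);     // handle replaced by This
    }
};

// Stand-in for the dialog's DWLP_DLGPROC slot; the "system" always calls through it.
struct FakeDialogWindow {
    Handle hwnd;
    const Thunk* proc;
    long Send(unsigned msg) const { return proc->Call(hwnd, msg); }
};

class TestDlg {
public:
    Thunk m_thunk{true, nullptr, nullptr, nullptr};
    Thunk m_thunk2{false, nullptr, nullptr, nullptr};
    Handle m_hDlg = 0;
    FakeDialogWindow* m_wnd = nullptr;
    unsigned m_lastMsg = 0;

    static long StartDialogProc(Handle hwnd, unsigned msg, TestDlg* pThis) {
        pThis->m_hDlg = hwnd;                         // keep the real handle
        pThis->m_thunk2.pThis = pThis;                // like m_thunk2->Init(...)
        pThis->m_thunk2.dlgProc = &TestDlg::DialogProc;
        pThis->m_wnd->proc = &pThis->m_thunk2;        // like SetWindowLongPtr(DWLP_DLGPROC)
        return pThis->m_thunk2.Call(hwnd, msg);       // forward the first message too
    }

    static long DialogProc(Handle hwndOrThis, unsigned msg) {
        TestDlg* pThis = reinterpret_cast<TestDlg*>(hwndOrThis);  // really This, not a handle
        pThis->m_lastMsg = msg;
        return 1;
    }
};
```

Usage: point the fake window at m_thunk, send a message, and from then on the window calls m_thunk2 directly.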
Points of Interest
This new thunk technique is not really novel; I simply applied a different way of retrieving the This pointer. As a result, you not only get rid of ATL but also keep your code tidy, as mentioned at the beginning. It also helps performance when you are only writing a small Windows app.
By the way, the walkthrough above covers only x86; the x64 variant used by the demo project is described in Remark 1.
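On x64 the thunk cannot simply reuse the x86 bytes. The sketch below encodes one plausible x64 thunk that puts This in r9, the register holding the fourth argument (lParam), and then jumps to the target through rax. The layout is modeled on ATL's x64 thunk (mov rcx, imm64; mov rax, imm64; jmp rax); with 0x49 instead of 0x48 as the first byte, the first mov targets r9 instead of rcx. EncodeX64Thunk is hypothetical, and the demo's real x64 structure may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical x64 thunk: mov <r64>, pThis ; mov rax, proc ; jmp rax  (22 bytes)
std::vector<std::uint8_t> EncodeX64Thunk(std::uint8_t rexForMov, std::uint64_t pThis,
                                         std::uint64_t proc) {
    std::vector<std::uint8_t> code;
    auto put64 = [&code](std::uint64_t v) {   // little-endian imm64
        for (int i = 0; i < 8; ++i) code.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    };
    code.push_back(rexForMov);  // 0x49 -> r9 (like m_thunk), 0x48 -> rcx (like m_thunk2)
    code.push_back(0xB9);       // B8+1: mov r64, imm64
    put64(pThis);
    code.push_back(0x48);       // REX.W
    code.push_back(0xB8);       // mov rax, imm64
    put64(proc);
    code.push_back(0xFF);       // FF /4 with ModRM E0: jmp rax
    code.push_back(0xE0);
    return code;
}
```

An absolute jump through rax is used because a rel32 jmp cannot reach every target in a 64-bit address space.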
Remarks
1. The demo project also supports x64 thunks. The x64 thunk code differs a little from the x86 one: instead of adding an extra parameter, it passes the This pointer in the fourth parameter, lParam, of StartDialogProc (on x64 the first four integer arguments travel in RCX, RDX, R8 and R9, so m_mov = 0xb949 loads This into R9). On x64 the caller, not the callee, manages the stack frame, so placing the This pointer on the stack at [rsp+xxh] would require the thunk to set up a new frame as if it were the caller, with considerably more hand-written assembly. That was too complex for me, so I simply pass This in lParam. As a consequence, the original lParam of the first message that reaches StartDialogProc is overwritten; I chose this parameter because, as far as I can tell, it carries nothing useful when StartDialogProc is first called. I'm not entirely sure my understanding is correct.
2. Earlier, I used VirtualAlloc to create thunk instances dynamically, but that wasted a lot of memory, because each thunk got its own memory page (4 KB). To fit several thunks into the same page, you would have to write extra code to maintain a list and track which thunks are free. HeapCreate / HeapAlloc / HeapFree manage the memory behind a heap automatically, so that bookkeeping is already done for you.
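To illustrate the bookkeeping Remark 2 refers to, here is a hypothetical sketch of one page carved into fixed-size thunk slots with a free list. The page is an ordinary array here; on Windows it would come from VirtualAlloc with PAGE_EXECUTE_READWRITE. The class name ThunkPage and the 16-byte slot size (the 13-byte x86 thunk rounded up) are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-slot allocator for thunks inside a single 4 KB page.
class ThunkPage {
public:
    static constexpr std::size_t kPageSize = 4096;
    static constexpr std::size_t kSlotSize = 16;                // 13-byte thunk rounded up
    static constexpr std::size_t kSlots = kPageSize / kSlotSize;

    ThunkPage() : m_free(0) {
        for (std::size_t i = 0; i < kSlots; ++i) m_next[i] = i + 1;  // chain every slot
    }

    void* Alloc() {
        if (m_free == kSlots) return nullptr;    // page exhausted
        const std::size_t i = m_free;
        m_free = m_next[i];                      // pop the head of the free list
        return &m_page[i * kSlotSize];
    }

    void Free(void* p) {
        const std::size_t i =
            static_cast<std::size_t>(static_cast<std::uint8_t*>(p) - m_page.data()) / kSlotSize;
        m_next[i] = m_free;                      // push the slot back on the free list
        m_free = i;
    }

private:
    alignas(16) std::array<std::uint8_t, kPageSize> m_page{};
    std::array<std::size_t, kSlots> m_next{};
    std::size_t m_free;
};
```

With a private executable heap, HeapAlloc and HeapFree take over this role, and the heap can grow beyond one page without extra code.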
History
2012-7-28: Fixed the DEP (Data Execution Prevention) issue.
2012-7-29: Thunks are now managed automatically by a private heap in the process, and the new demo project supports X64 thunks.