Introduction
This article introduces detours through an example, Memory Analyzer: a simple tool that detects memory leaks and mismatched memory deallocations (for example, free called where delete was required).
Detours are a step up from API hooking: they support trapping recursive functions and are thread safe. They are more complicated because two JMP instructions have to be placed carefully, and they introduce the concept of a return function. This function is returned by the method MyAttach (refer to the code) and is also kept in the member variable m_Return; it may be referred to as a trampoline function.
(I call them detours since I am trying to do the same thing as MS Detours.)
As with my previous articles, the aim here is to introduce detours, not memchecker.
OutputDebugString is used to communicate between the process and memchecker (we build a debugger), and debug symbol files are used to show the call stack and the source line number for each memory operation.
Background
My previous articles are a must read: http://www.codeproject.com/Articles/163408/APIHooking and http://www.codeproject.com/Articles/189711/Write-your-own-Debugger-to-handle-Breakpoints.
Basic Windows programming is also required, along with a little assembly (as mentioned in my previous article on API hooking).
Using the code
The attached code is built with VS2012 and should be kept open while reading this article. It supports only 32-bit detours and injects a DLL into a running process; a 32-bit DLL can only be injected into a 32-bit process.
Since we are going to analyze the memory, we must trap the following APIs:
HeapAlloc
HeapReAlloc (not implemented)
HeapFree
VirtualAlloc and its associated APIs are not considered; readers are free to post their own implementations.
These APIs are chosen because malloc, new, free, delete and so on end up calling one of the APIs listed above.
Let's Hook
Most of the code is similar to API hooking, i.e. we write a 5-byte JMP at the start of the original function to redirect the call to the trap. An additional return function (m_Return) must be built, since we are not going to re-patch the original function when calling it from our trap as we did in API hooking. Notice (refer to the code below) that I add the JMP instruction from m_Return[5] onwards; the bytes m_Return[0] to m_Return[4] hold the original instructions (3 complete instructions).
memcpy(m_Return,m_Original,sizeof(m_Original));
DWORD JmpDiff1 = ((DWORD)func - (DWORD)m_Return-5);
memcpy(&TrapJmp[1], &JmpDiff1, 4);
memcpy(&m_Return[5],TrapJmp,sizeof(TrapJmp));
m_pReturnedFunc=m_Return;
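The layout of the return function can be sketched on a plain byte buffer, without touching any real code pages. This is only an illustration: build_trampoline32 and its parameters are hypothetical names, the addresses are passed in as numbers, and it assumes (as the snippet above appears to) that func is the address of the original function and that exactly 5 bytes are displaced.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: lays out a 10-byte 32-bit trampoline in `out`.
//   prologue        : the 5 bytes saved from the start of the original function
//   original_addr   : address of the hooked function (e.g. HeapAlloc)
//   trampoline_addr : address at which `out` will live (m_Return in the article)
void build_trampoline32(const std::uint8_t prologue[5],
                        std::uint32_t original_addr,
                        std::uint32_t trampoline_addr,
                        std::uint8_t out[10]) {
    assert(prologue != nullptr && out != nullptr);

    // Bytes 0..4: the displaced original instructions.
    std::memcpy(out, prologue, 5);

    // Bytes 5..9: JMP rel32 back to original_addr + 5.
    // rel32 is measured from the end of the JMP (trampoline_addr + 10), so
    // rel = (original_addr + 5) - (trampoline_addr + 10)
    //     = original_addr - trampoline_addr - 5, the expression used above.
    const std::uint32_t rel = original_addr - trampoline_addr - 5u;
    out[5] = 0xE9;
    for (int i = 0; i < 4; ++i)
        out[6 + i] = static_cast<std::uint8_t>(rel >> (8 * i));  // little-endian
}
```

In the real DLL the buffer must also be made executable (for example with VirtualProtect) before it is called.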
We will call m_Return instead of re-patching as we did in API hooking (refer to the code below).
LPVOID WINAPI MyHeapAlloc(
__in HANDLE hHeap,
__in DWORD dwFlags,
__in SIZE_T dwBytes
)
{
void* pRet=((void* (__stdcall*)( __in HANDLE hHeap, __in DWORD dwFlags, __in SIZE_T dwBytes))g_myDetour_HeapAlloc.m_pReturnedFunc)(hHeap,dwFlags,dwBytes);
::OutputDebugStringA("you are trapped");
return (void*)pRet;
}
Notice that the calling convention and parameters are the same as those of the original API (please refer to my previous article on API hooking).
Maintain the JMP
Now that we are not restoring the original function, we must make sure that the instructions are executed in the correct sequence and, more importantly, that we never jump into the middle of an instruction, because the processor would decode the remaining bytes as some other instruction. Notice the variable m_Return: it holds the start of the original function (at least the first 5 bytes, which here make up the first 3 complete instructions) followed by a jump to HeapAlloc+5. We skip these 5 bytes so that execution does not hit the JMP we wrote over the first 5 bytes of the original function.
The attached code hardcodes the +5 byte offset, which is fine for HeapAlloc, HeapFree and MessageBoxA but may not be for other APIs.
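One way to avoid the hardcoded offset is to displace whole instructions until at least 5 bytes are free. The sketch below shows only the arithmetic; bytes_to_displace is a hypothetical helper that is not part of the attached code, and the instruction lengths are assumed to come from a disassembler, which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns how many prologue bytes must be moved into the trampoline so that
// at least `patch_size` bytes are free and no instruction is split.
// `lengths` holds the lengths of the successive instructions at the start of
// the function, as a disassembler would report them (not computed here).
std::size_t bytes_to_displace(const std::vector<std::size_t>& lengths,
                              std::size_t patch_size) {
    std::size_t total = 0;
    for (std::size_t len : lengths) {
        if (total >= patch_size) break;  // enough room, stop on a boundary
        total += len;                    // always take a whole instruction
    }
    assert(total >= patch_size && "prologue too short for the patch");
    return total;
}
```

For the MessageBoxA prologue (mov edi,edi; push ebp; mov ebp,esp, i.e. lengths 2, 1, 2) the answer is exactly 5, which is why the hardcoded offset works there. A prologue with lengths 2, 4, 3 would need 6 bytes displaced, and the trampoline would have to jump back to the function start +6 instead.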
Your code calls HeapAlloc, which lands in MyHeapAlloc (the trap) via an immediate JMP; the trap then uses m_Return to reach HeapAlloc+5 and returns as usual. m_Return executes the first 5 original bytes before it jumps to HeapAlloc+5.
To understand this using opcodes, I have trapped MessageBoxA.
Without Detours
MessageBoxA@16
7526FD1E 8B FF mov edi,edi
7526FD20 55 push ebp
7526FD21 8B EC mov ebp,esp
7526FD23 6A 00 push 0
7526FD25 FF 75 14 push dword ptr [ebp+14h]
7526FD28 FF 75 10 push dword ptr [ebp+10h]
7526FD2B FF 75 0C push dword ptr [ebp+0Ch]
7526FD2E FF 75 08 push dword ptr [ebp+8]
7526FD31 E8 A0 FF FF FF call MessageBoxExA
With Detours
7526FD1E E9 A5 13 10 8B jmp MyMessageBoxA (03710C8h)
7526FD23 6A 00 push 0
7526FD25 FF 75 14 push dword ptr [ebp+14h]
7526FD28 FF 75 10 push dword ptr [ebp+10h]
7526FD2B FF 75 0C push dword ptr [ebp+0Ch]
7526FD2E FF 75 08 push dword ptr [ebp+8]
7526FD31 E8 A0 FF FF FF call MessageBoxExA@20
Notice the JMP instruction in the first 5 bytes of the original function MessageBoxA. m_Return has to execute the 5 bytes it replaced (3 instructions) before it jumps back to MessageBoxA+5.
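The 5 patch bytes in the listing can be checked by hand: an E9 jump stores the distance from the end of the jump to its destination. A minimal sketch of that encoding follows, using hypothetical helper names and treating the addresses as plain 32-bit numbers:

```cpp
#include <cassert>
#include <cstdint>

// Writes "E9 rel32" into out[0..4], where rel32 = target - (patch_addr + 5).
void encode_jmp_rel32(std::uint32_t patch_addr, std::uint32_t target,
                      std::uint8_t out[5]) {
    const std::uint32_t rel = target - (patch_addr + 5u);  // wraps modulo 2^32
    out[0] = 0xE9;
    for (int i = 0; i < 4; ++i)
        out[1 + i] = static_cast<std::uint8_t>(rel >> (8 * i));  // little-endian
}

// Recovers the destination of an E9 jump that sits at patch_addr.
std::uint32_t decode_jmp_rel32(std::uint32_t patch_addr, const std::uint8_t in[5]) {
    assert(in[0] == 0xE9);
    std::uint32_t rel = 0;
    for (int i = 0; i < 4; ++i)
        rel |= static_cast<std::uint32_t>(in[1 + i]) << (8 * i);
    return patch_addr + 5u + rel;
}
```

With the addresses from the listing (patch at 7526FD1E, trap at 003710C8) the encoder yields E9 A5 13 10 8B, the bytes shown at 7526FD1E.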
The execution flow through m_Return will be:
00379171 8B FF mov edi,edi
00379173 55 push ebp
00379174 8B EC mov ebp,esp
back to MessageBoxA
7526FD23 6A 00 push 0
7526FD25 FF 75 14 push dword ptr [ebp+14h]
7526FD28 FF 75 10 push dword ptr [ebp+10h]
7526FD2B FF 75 0C push dword ptr [ebp+0Ch]
7526FD2E FF 75 08 push dword ptr [ebp+8]
7526FD31 E8 A0 FF FF FF call MessageBoxExA@20
As you can see, this completes the flow exactly the way it would without detours.
Do not print anything
Inside the trap you cannot use any function that calls HeapAlloc, such as printf, since that call would be trapped again and recurse endlessly. I have chosen to use OutputDebugStringA.
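The attached code sidesteps the problem by using OutputDebugStringA. An alternative is a per-thread re-entrancy guard, so that allocations made while logging are passed through without being logged again. The sketch below simulates this with malloc standing in for the trampoline call; every name in it is hypothetical and it is not taken from the attached code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

static thread_local bool g_inside_trap = false;  // hypothetical per-thread guard
static int g_log_calls = 0;                      // counts how often the logger ran

void* trapped_alloc(std::size_t bytes);          // stands in for MyHeapAlloc

// Stand-in for a printf-style logger that allocates internally.
void allocating_logger(std::size_t bytes) {
    assert(g_inside_trap);                 // must only run under the guard
    void* scratch = trapped_alloc(bytes);  // re-enters the trap
    ++g_log_calls;
    std::free(scratch);
}

void* trapped_alloc(std::size_t bytes) {
    void* p = std::malloc(bytes);          // stands in for the trampoline call
    if (!g_inside_trap) {                  // only the outermost call logs
        g_inside_trap = true;
        allocating_logger(bytes);
        g_inside_trap = false;
    }
    return p;
}
```

Without the guard, trapped_alloc and allocating_logger would call each other until the stack overflowed. In a real trap the guard itself must not allocate either, which is an extra assumption to verify for thread-local storage inside a DLL.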
Since you are writing a debugger (in the attached code, CrashAnalyzer_v2 serves as the debugger), I recommend you go through my previous article on writing a debugger.
You can use the debug symbol table (http://en.wikipedia.org/wiki/Debug_symbol) to determine the source line and the call stack (the ordered list of function names that shows the flow of calls) from which the allocation function was called, as a proper memory analyzer should.
A new function, GetStack, has been added to get the call stack. It calls StackWalk64 (look it up on MSDN) in a loop to collect the call stack as addresses only. We can convert these addresses, one at a time, to function names using the PDB (debug symbol) file by calling SymFromAddr; make sure you call SymInitialize first to initialize the symbol handler.
BOOL b=SymInitialize(hProcess,NULL,TRUE);
SymFromAddr(hProcess,cc.Rip,&d64,s);
SymCleanup (hProcess);
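Conceptually, turning an address into a name means finding the last symbol that starts at or before the address and reporting the distance from its start; SymFromAddr hands back that distance through its displacement argument (&d64 above). The toy lookup below illustrates the idea on a hand-made table. It is not how DbgHelp is implemented, and the types and names are invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Symbol   { std::uint64_t start; std::string name; };
struct Resolved { std::string name; std::uint64_t displacement; };

static bool starts_before(const Symbol& a, const Symbol& b) {
    return a.start < b.start;
}

// `table` must be sorted by start address.
Resolved resolve_address(const std::vector<Symbol>& table, std::uint64_t addr) {
    const bool sorted = std::is_sorted(table.begin(), table.end(), starts_before);
    assert(sorted);
    (void)sorted;  // silences the unused warning when asserts are disabled

    // First symbol that starts strictly after addr ...
    auto it = std::upper_bound(
        table.begin(), table.end(), addr,
        [](std::uint64_t a, const Symbol& s) { return a < s.start; });
    if (it == table.begin())
        return {"<unknown>", 0};  // addr lies below every known symbol
    --it;                         // ... so the one before it contains addr
    return {it->name, addr - it->start};
}
```

Running every address returned by the stack walk through such a lookup gives the list of function names that makes up the call stack.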
All this is fine, but how do we inject our code?
Unlike in my previous article on API hooking, we cannot use a Windows hook: our application may not have a message loop, so SetWindowsHookEx will not work.
We must use CreateRemoteThread.
{
String strPath;
GetCurrentDirectoryA(sizeof(strPath),strPath.string);
strcat(strPath,"<a href="file:
HANDLE hProcess=::OpenProcess(PROCESS_ALL_ACCESS ,false,pid);
if(hProcess)
{
void *p=VirtualAllocEx (hProcess,NULL,strlen(strPath.string)+10,MEM_COMMIT,PAGE_EXECUTE_READWRITE);
SIZE_T size=0;
WriteProcessMemory(hProcess,p,strPath.string,strlen(strPath.string)+1,&size); // copy the path plus its terminating NUL
CreateRemoteThread(hProcess,0,0,( LPTHREAD_START_ROUTINE)LoadLibraryA,p,0,0);
CloseHandle(hProcess);
}
else
{
printf("PID invalid / access denied");
return 1;
}
}
We must allocate memory in the remote process by calling VirtualAllocEx and fill it with the argument for LoadLibraryA (the DLL path) by calling WriteProcessMemory (debuggers use this same API to patch the target's code when setting a breakpoint).
We then call CreateRemoteThread and pass the address of LoadLibraryA as the function to run on the new thread.
LoadLibraryA is available in every process, since its module (kernel32.dll) is loaded at process startup.
The DLL_PROCESS_ATTACH handling in DllMain will do the rest; refer to the attached code.
Our custom 64-bit detour will be a bit different and is not implemented in the current project
The code below will help you understand the 64-bit detour; it is also included in the attached code (MemoryCheckerModule_with32_&_64Bit.zip).
It was written some time back and is not thoroughly tested, so feel free to get in touch with me about any problems.
64-bit code passes parameters in registers, so it is best not to tamper with them.
Here we DO NOT use the JMP instruction; instead we push the target address onto the stack and execute RET.
#include <Windows.h>
#include <process.h>
#include <stdio.h>   // printf
#include <tchar.h>   // _tmain, _TCHAR
BYTE TrapJmp[] = {0x68,0x78,0x56,0x34,0x12,0xc7,0x44,0x24,0x04,0x54,0x63,0x72,0x81,0xc3};
BYTE TrapJmp_FromTrampoline[] = {0x68,0x78,0x56,0x34,0x12,0xc7,0x44,0x24,0x04,0x54,0x63,0x72,0x81,0xc3};
BYTE StoreOriginal[sizeof(TrapJmp)];
BYTE OriginalOpCode[sizeof(TrapJmp)*2];
typedef int (WINAPI *pMessageBox)(HWND, LPCWSTR, LPCWSTR, UINT);
pMessageBox pOriginal = NULL;
int
WINAPI
myMessageBoxW(
__in_opt HWND hWnd,
__in_opt LPCWSTR lpText,
__in_opt LPCWSTR lpCaption,
__in UINT uType)
{
printf("you are trapped\n");
pMessageBox org=(pMessageBox)(void*)OriginalOpCode;
org(hWnd,L"asif",L"asiasdas",uType);
return 0;
}
int _tmain(int argc, _TCHAR* argv[])
{
MessageBoxW((HWND)123,L"asif",L"asif",789);
{
pOriginal = MessageBoxW;
DWORD dPermission=0;VirtualProtect(TrapJmp,sizeof(TrapJmp),PAGE_EXECUTE_READWRITE,&dPermission);
VirtualProtect(pOriginal,sizeof(TrapJmp),PAGE_EXECUTE_READWRITE,&dPermission);
memcpy(StoreOriginal,pOriginal,sizeof(TrapJmp));
DWORD64 JmpDiff =(DWORD64)myMessageBoxW;
memcpy(&TrapJmp[1],&JmpDiff,sizeof(DWORD));
memcpy(&TrapJmp[9],(char*)&JmpDiff+4,sizeof(DWORD));
memcpy(pOriginal,TrapJmp,sizeof(TrapJmp));
FlushInstructionCache(GetCurrentProcess(),pOriginal,sizeof(TrapJmp));
memcpy(OriginalOpCode,StoreOriginal,sizeof(StoreOriginal));
VirtualProtect(OriginalOpCode,sizeof(OriginalOpCode),PAGE_EXECUTE_READWRITE,&dPermission);
DWORD64 JmpDiff_trampoline =(DWORD64)pOriginal;
JmpDiff_trampoline+=sizeof(TrapJmp);
memcpy(&TrapJmp_FromTrampoline[1],&JmpDiff_trampoline,sizeof(DWORD));
memcpy(&TrapJmp_FromTrampoline[9],(char*)&JmpDiff_trampoline+4,sizeof(DWORD));
memcpy(&OriginalOpCode[sizeof(TrapJmp_FromTrampoline)],TrapJmp_FromTrampoline,sizeof(TrapJmp_FromTrampoline));
memset(&OriginalOpCode[7],0x90,7);
}
MessageBoxW(0,L"asif",L"asif",0);
MessageBoxW(0,L"asif",L"asif",0);
MessageBoxW(0,L"asif",L"asif",0);
return 0;
}
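The 14-byte patch used above is push imm32, then mov dword ptr [rsp+4], imm32, then ret. In 64-bit mode the push stores a full 8-byte slot (the 32-bit immediate is sign-extended), the mov then overwrites the upper 4 bytes of that slot, and ret pops the assembled 64-bit address into RIP. No register is modified. A sketch of filling in the template for an arbitrary target follows; encode_abs_jump64 is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Writes the push / mov / ret sequence that transfers control to `target`.
void encode_abs_jump64(std::uint64_t target, std::uint8_t out[14]) {
    assert(out != nullptr);
    const std::uint32_t low  = static_cast<std::uint32_t>(target);
    const std::uint32_t high = static_cast<std::uint32_t>(target >> 32);

    out[0] = 0x68;                                    // push imm32 (low half)
    for (int i = 0; i < 4; ++i)
        out[1 + i] = static_cast<std::uint8_t>(low >> (8 * i));

    out[5] = 0xC7; out[6] = 0x44;                     // mov dword ptr [rsp+4], imm32
    out[7] = 0x24; out[8] = 0x04;
    for (int i = 0; i < 4; ++i)
        out[9 + i] = static_cast<std::uint8_t>(high >> (8 * i));

    out[13] = 0xC3;                                   // ret
}
```

For the target 0x8172635412345678 this reproduces the placeholder bytes of TrapJmp exactly, which shows where the two memcpy calls at offsets 1 and 9 put the low and high halves of the address.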
If you use our 32/64-bit detour implementation, please ensure that the instruction pointer (EIP/RIP) never lands in the middle of an instruction, as the processor will decode the bytes differently.
For example, the instruction
sub rsp,38h
is encoded as
48 83 EC 38 at address 00007FF73378A180.
Jumping to address 00007FF73378A181 skips the 48 (REX.W) prefix and executes
83 EC 38
which decodes as sub esp,38h.
Points of Interest
Apart from memory analysis, detours let you log API calls or reverse engineer applications (the hacker in me says he can).
You may use CrashAnalyzer_v2 to attach to any running process (in our case Debuggee_TestApp) to detect memory leaks and study memory allocations made by other APIs.