
Memory Analyzer x86, 32/64-bit & a Free Detour

29 Aug 2014, CPOL
Detect memory leaks

 

Introduction

This article introduces readers to detours through an example, Memory Analyzer: a simple tool that detects memory leaks and mismatched deallocations (for example, free called where delete was required).

Detours are a step up from API hooking: they support trapping recursive functions and are thread safe. They are more complicated, since two JMP instructions have to be placed carefully, and they introduce the concept of a return function. This function is returned by the method MyAttach (refer to the code) and is also kept in the member variable m_Return; it is often called a trampoline function.

(I call them detours because I am trying to do the same thing as Microsoft Detours.)

As with my previous articles, the aim here is to introduce detours, not memchecker itself.

OutputDebugString is used to communicate between the target process and memchecker (which is built as a debugger). Debug symbol files are used to show the call stack and the source line number for each memory operation.
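To make the debugger side more concrete, here is a minimal sketch of how memchecker could turn the debug strings it receives into a leak report. The message format ("ALLOC hex-address size" and "FREE hex-address") and the AllocationLedger class are assumptions made for illustration only; the attached code may format its messages differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <map>
#include <string>

// Debugger-side ledger of live allocations.
// ASSUMPTION: the hooked process reports "ALLOC <hex addr> <size>" and
// "FREE <hex addr>"; this format is hypothetical, not the attached code's.
class AllocationLedger {
public:
    // Feed one debug string; returns false if the line is not recognised.
    bool OnDebugString(const std::string& msg) {
        unsigned long long addr = 0, size = 0;
        if (std::sscanf(msg.c_str(), "ALLOC %llx %llu", &addr, &size) == 2) {
            live_[addr] = size;              // remember the live block
            return true;
        }
        if (std::sscanf(msg.c_str(), "FREE %llx", &addr) == 1) {
            if (live_.erase(addr) == 0)      // free of a block we never saw
                ++unknownFrees_;
            return true;
        }
        return false;
    }

    // Blocks still live when the target exits are the suspected leaks.
    std::size_t LeakCount() const { return live_.size(); }

    unsigned long long LeakedBytes() const {
        unsigned long long total = 0;
        for (const auto& kv : live_) total += kv.second;
        return total;
    }

    std::size_t UnknownFrees() const { return unknownFrees_; }

private:
    std::map<unsigned long long, unsigned long long> live_;
    std::size_t unknownFrees_ = 0;
};
```

Because the ledger lives in the debugger process, it is free to use containers that allocate; the same code inside the trap would call HeapAlloc again.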

Background

My previous articles are required reading: http://www.codeproject.com/Articles/163408/APIHooking and http://www.codeproject.com/Articles/189711/Write-your-own-Debugger-to-handle-Breakpoints .

Basic Windows programming is also required, along with a little assembly (as covered in my previous article, APIHooking).

Using the code

The attached code is built with VS2012 and should be kept at hand while reading this article. It supports only 32-bit detours and injects a DLL into a running process; a 32-bit DLL can only be injected into a 32-bit process.

Since we are going to analyze the memory, we must trap the following APIs:

  • HeapAlloc
  • HeapReAlloc (not implemented)
  • HeapFree
  • VirtualAlloc and associated APIs (not considered here; readers are free to post their own implementations)

These APIs were chosen because malloc, new, free, delete and so on end up calling one of the APIs listed above.

Let's Hook

Most of the code is similar to API hooking: we write a 5-byte JMP at the start of the original function to redirect calls to the trap. An additional return function (m_Return) must also be built, because we no longer restore and re-patch the original function every time the trap calls it, as we did in API hooking. Notice in the code below that the JMP instruction is written from m_Return[5] onwards; the bytes m_Return[0] to m_Return[4] hold the original function's first 5 bytes (3 complete instructions).

memcpy(m_Return,m_Original,sizeof(m_Original)); //build the return function
DWORD JmpDiff1 = ((DWORD)func - (DWORD)m_Return-5);
memcpy(&TrapJmp[1], &JmpDiff1, 4);
memcpy(&m_Return[5],TrapJmp,sizeof(TrapJmp));

m_pReturnedFunc=m_Return;
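The displacement arithmetic in the snippet above can be checked in isolation. The sketch below encodes a 5-byte relative JMP (opcode E9) from plain 32-bit addresses; MakeRelJmp is a hypothetical helper that is not part of the attached code, and it only builds the bytes without patching any memory.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encode "JMP rel32" (opcode E9) for an instruction located at `from`
// that should land on `to`. MakeRelJmp is a hypothetical helper; the
// addresses are plain integers so nothing is actually patched.
std::array<std::uint8_t, 5> MakeRelJmp(std::uint32_t from, std::uint32_t to) {
    // The displacement is measured from the END of the 5-byte instruction.
    // Unsigned arithmetic wraps modulo 2^32, which gives the correct
    // two's-complement value for backward jumps.
    const std::uint32_t disp = to - (from + 5);

    std::array<std::uint8_t, 5> code{};
    code[0] = 0xE9;
    for (int i = 0; i < 4; ++i) {
        // rel32 is stored little-endian, lowest byte first
        code[1 + i] = static_cast<std::uint8_t>(disp >> (8 * i));
    }
    return code;
}
```

For the MessageBoxA listing further down (a JMP placed at 7526FD1E that targets 003710C8), this produces E9 A5 13 10 8B, the same bytes shown there. In the article's snippet the JMP sits at m_Return+5 and targets the original function +5, which is why the two +5 offsets cancel and the expression reduces to func - m_Return - 5.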

We will call m_Return instead of re-patching as we did in API hooking (refer to the code below).

 LPVOID WINAPI MyHeapAlloc(
  __in  HANDLE hHeap,
  __in  DWORD dwFlags,
  __in  SIZE_T dwBytes
)
{
 //strictly do not use any function that may internally call HeapAlloc
 typedef LPVOID (WINAPI *PFN_HeapAlloc)(HANDLE, DWORD, SIZE_T);
 //call the original HeapAlloc through the trampoline
 void* pRet=((PFN_HeapAlloc)g_myDetour_HeapAlloc.m_pReturnedFunc)(hHeap,dwFlags,dwBytes);
 ::OutputDebugStringA("you are trapped");
 return pRet;
}

Notice that the calling convention and parameters are the same as those of the original function (please refer to my previous article on APIHooking).

Maintain the JMP

Since we no longer restore the original function, we must make sure that its instructions execute in the correct sequence and, more importantly, that we never jump into the middle of an instruction, because the processor would decode the remaining bytes as some other instruction. Notice the variable m_Return: it holds the first 5 bytes of the original function (which here make up its first 3 complete instructions), followed by a jump to HeapAlloc+5. The +5 offset skips the JMP that now occupies the first 5 bytes of the original function, so the trap is not entered again.

The attached code hardcodes this +5 byte offset, which is fine for HeapAlloc, HeapFree and MessageBoxA but may not be for other APIs, whose leading instructions need not end exactly at byte 5.
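One way to lift the hardcoded offset is to copy whole instructions until at least 5 bytes are covered. The sketch below assumes the instruction lengths are already known (in practice they would come from a disassembler or length decoder, which is out of scope here); BytesToRelocate is a hypothetical helper and is not part of the attached code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Given the lengths of the instructions at the start of a function, return
// how many bytes must be moved to the trampoline so that at least
// `patchSize` bytes are covered AND the cut falls on an instruction boundary.
// Returns 0 if the listed instructions do not cover `patchSize` bytes.
// ASSUMPTION: `lengths` comes from a disassembler; this helper is hypothetical.
std::size_t BytesToRelocate(const std::vector<std::size_t>& lengths,
                            std::size_t patchSize) {
    std::size_t total = 0;
    for (std::size_t len : lengths) {
        assert(len > 0);            // an instruction is at least one byte long
        total += len;
        if (total >= patchSize) {
            return total;           // first boundary at or past the patch
        }
    }
    return 0;                       // not enough decoded instructions
}
```

For the MessageBoxA prologue shown below (instruction lengths 2, 1 and 2) the answer is exactly 5, which is why the hardcoded offset works there. A function that starts with a 1-byte and then a 6-byte instruction would need 7 bytes copied, and the jump back would have to target the function +7.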

Your code calls HeapAlloc, which immediately jumps to MyHeapAlloc (the trap). The trap calls m_Return, which executes the original first 5 bytes and then jumps to HeapAlloc+5; the original function then runs and returns as usual.

To understand this at the opcode level, I have trapped MessageBoxA.

Without Detours

MessageBoxA@16:
7526FD1E 8B FF                mov         edi,edi  
7526FD20 55                   push        ebp  
7526FD21 8B EC                mov         ebp,esp  
7526FD23 6A 00                push        0  
7526FD25 FF 75 14             push        dword ptr [ebp+14h]  
7526FD28 FF 75 10             push        dword ptr [ebp+10h]  
7526FD2B FF 75 0C             push        dword ptr [ebp+0Ch]  
7526FD2E FF 75 08             push        dword ptr [ebp+8]  
7526FD31 E8 A0 FF FF FF       call        MessageBoxExA

With Detours

7526FD1E E9 A5 13 10 8B       jmp         MyMessageBoxA (03710C8h)  
7526FD23 6A 00                push        0  
7526FD25 FF 75 14             push        dword ptr [ebp+14h]  
7526FD28 FF 75 10             push        dword ptr [ebp+10h]  
7526FD2B FF 75 0C             push        dword ptr [ebp+0Ch]  
7526FD2E FF 75 08             push        dword ptr [ebp+8]  
7526FD31 E8 A0 FF FF FF       call        MessageBoxExA@20

Notice the JMP instruction in the first 5 bytes of the original function MessageBoxA. When called, m_Return has to execute the 5 bytes that the JMP overwrote (3 instructions) before it jumps back to MessageBoxA+5.

The execution flow will be:

00379171 8B FF                mov         edi,edi  
00379173 55                   push        ebp  
00379174 8B EC                mov         ebp,esp  
back to MessageBoxA
7526FD23 6A 00                push        0  
7526FD25 FF 75 14             push        dword ptr [ebp+14h]  
7526FD28 FF 75 10             push        dword ptr [ebp+10h]  
7526FD2B FF 75 0C             push        dword ptr [ebp+0Ch]  
7526FD2E FF 75 08             push        dword ptr [ebp+8]  
7526FD31 E8 A0 FF FF FF       call        MessageBoxExA@20

As you can see, this completes the flow exactly as it would run without detours.
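The trampoline layout described above (the displaced bytes followed by a JMP back into the original function) can be sketched without touching real code memory. BuildTrampoline is a hypothetical helper that works on plain 32-bit addresses; it mirrors what m_Return holds but is not the attached implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Build the bytes of a 32-bit trampoline: the instructions displaced from
// the original function, followed by "JMP rel32" back to the first
// untouched instruction. BuildTrampoline is a hypothetical helper.
std::vector<std::uint8_t> BuildTrampoline(const std::vector<std::uint8_t>& stolen,
                                          std::uint32_t originalAddr,
                                          std::uint32_t trampolineAddr) {
    // The displaced bytes must cover the 5-byte JMP written into the original.
    assert(stolen.size() >= 5);
    const std::uint32_t n = static_cast<std::uint32_t>(stolen.size());

    std::vector<std::uint8_t> code(stolen);          // 1) original instructions

    const std::uint32_t jmpAt  = trampolineAddr + n; // where the JMP itself sits
    const std::uint32_t target = originalAddr + n;   // first untouched instruction
    const std::uint32_t disp   = target - (jmpAt + 5);

    code.push_back(0xE9);                            // 2) JMP rel32
    for (int i = 0; i < 4; ++i) {
        code.push_back(static_cast<std::uint8_t>(disp >> (8 * i)));
    }
    return code;
}
```

With the addresses from the listings (trampoline at 00379171, MessageBoxA at 7526FD1E, displaced bytes 8B FF 55 8B EC), the appended jump sits at 00379176 and lands on 7526FD23, the push 0 instruction.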

Do not print anything

You cannot call any function that itself calls HeapAlloc, such as printf, from inside the trap, since the call would recurse endlessly. I have chosen to use OutputDebugStringA instead.

Since you are writing a debugger (in the attached code, CrashAnalyzer_v2 serves as the debugger), I recommend that you go through my previous article on writing a debugger.

You can use the debug symbol table (http://en.wikipedia.org/wiki/Debug_symbol) to determine the source line and the call stack (the ordered list of function names that shows the flow of calls) from which the allocation function was called, as a proper memory analyzer should.

A new function, GetStack, has been added to get the call stack. It calls StackWalk64 (look it up on MSDN) in a loop to collect the call stack as addresses only. Each address can then be converted to a function name using the PDB (debug symbol) file by calling SymFromAddr; make sure you call SymInitialize first to initialize the symbol handler.
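If a trap ever has to call something that might allocate, a per-thread re-entrancy flag is a common alternative to avoiding such calls altogether. This is a different technique from the one used in the attached code, shown here only as a portable sketch: malloc stands in for the call through the trampoline, LogThatAllocates stands in for a function like printf, and every name below is hypothetical. Whether thread_local is safe to use inside an injected DLL depends on the platform and toolchain, so treat that as an assumption as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Per-thread flag: true while this thread is already inside the trap.
static thread_local bool t_insideTrap = false;
static int g_logged = 0;   // counts log calls, for demonstration only

void* TrappedAlloc(std::size_t bytes);

// Stand-in for a logging call that allocates internally (as printf may).
void LogThatAllocates() {
    void* scratch = TrappedAlloc(16);   // re-enters the trap
    ++g_logged;
    std::free(scratch);
}

// Stand-in for MyHeapAlloc.
void* TrappedAlloc(std::size_t bytes) {
    void* p = std::malloc(bytes);       // stand-in for the trampoline call
    if (!t_insideTrap) {                // outermost call on this thread only
        t_insideTrap = true;
        LogThatAllocates();             // nested allocations skip the logging
        t_insideTrap = false;
    }
    return p;
}
```

The nested allocation made by the logger still goes through the trap, but the flag stops it from logging again, so the recursion ends after one level.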

BOOL b=SymInitialize(hProcess,NULL,TRUE);  
 
// make sure that PDB file exists, it is generated by the compiler in the same folder as the exe

SymFromAddr(hProcess,cc.Rip,&d64,s);
SymCleanup (hProcess);

All this is fine, but how do we inject our code?

Unlike in my previous article on API hooking, we cannot use a Windows hook: the target application may not have a message loop, in which case SetWindowsHookEx will not work.

We must use CreateRemoteThread instead:

//injecting DLL
{
 String strPath;
 GetCurrentDirectoryA(sizeof(strPath),strPath.string);  //get the directory
 strcat(strPath,"\\MemoryCheckerModule.dll");
 HANDLE hProcess=::OpenProcess(PROCESS_ALL_ACCESS ,false,pid);
 if(hProcess)
 {
  SIZE_T len=strlen(strPath.string)+1; //path length including the terminating NUL
  //the remote buffer only holds the DLL path, so it does not need to be executable
  void *p=VirtualAllocEx (hProcess,NULL,len,MEM_COMMIT,PAGE_READWRITE);
  SIZE_T size=0;
  WriteProcessMemory(hProcess,p,strPath.string,len,&size);
  //LoadLibraryA runs on a new thread in the target, with p as its argument
  HANDLE hThread=CreateRemoteThread(hProcess,0,0,( LPTHREAD_START_ROUTINE)LoadLibraryA,p,0,0);
  if(hThread) CloseHandle(hThread);
  CloseHandle(hProcess);
 }
 else
 {
  printf("PID invalid / access  denied");
  return 1; //cannot find process
 }
}

We must allocate memory in the remote process by calling VirtualAllocEx and fill it with the argument for LoadLibraryA (the DLL path) by calling WriteProcessMemory (the same API that debuggers use to patch object code when setting a breakpoint).

We then call CreateRemoteThread, passing the address of LoadLibraryA as the function to run on a new thread in the target.

LoadLibraryA is available in every process, since its module (kernel32.dll) is loaded at process startup. The address taken in our own process is assumed to be valid in the target too; in practice this holds because kernel32.dll is mapped at the same base address in processes of the same bitness.

The DLL_PROCESS_ATTACH handling in DllMain does the rest; refer to the attached code.

 

Our custom 64-bit detour is a bit different and is not implemented in the main project

The code below will help you understand the 64-bit detour; it is also included in the attached MemoryCheckerModule_with32_&_64Bit.zip.
The code below was written some time back and is not thoroughly tested, so feel free to get in touch with me about any problems.
64-bit code passes parameters in registers, so it is best not to tamper with them.
Here we DO NOT use the JMP instruction: we place the target address on the stack and execute RET.

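#include<Windows.h>
#include<process.h>
#include<stdio.h>   //printf
#include<tchar.h>   //_tmain, _TCHAR

//68 78 56 34 12 c7 44 24 04 54 63 72 81 c3
// push 12345678h                    ; low 4 bytes of the target (pushed as 8 bytes on x64)
// mov dword ptr [rsp+4],81726354h   ; high 4 bytes of the target
// ret                               ; pops the 8-byte address and jumps to it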

BYTE TrapJmp[] = {0x68,0x78,0x56,0x34,0x12,0xc7,0x44,0x24,0x04,0x54,0x63,0x72,0x81,0xc3};
BYTE TrapJmp_FromTrampoline[] = {0x68,0x78,0x56,0x34,0x12,0xc7,0x44,0x24,0x04,0x54,0x63,0x72,0x81,0xc3};

BYTE StoreOriginal[sizeof(TrapJmp)];
BYTE OriginalOpCode[sizeof(TrapJmp)*2]; //this will maintain the trampoline of the missing opcode;

typedef int (WINAPI *pMessageBox)(HWND, LPCWSTR, LPCWSTR, UINT);

pMessageBox pOriginal = NULL;

int //this function is only used for testing the hook concept
WINAPI
myMessageBoxW(
__in_opt HWND hWnd,
__in_opt LPCWSTR lpText,
__in_opt LPCWSTR lpCaption,
__in UINT uType)
{
printf("you are trapped\n");

//restore it (for API hooking, not thread safe/recursive safe)
/*memcpy(pOriginal,StoreOriginal,sizeof(StoreOriginal));
FlushInstructionCache(GetCurrentProcess(),pOriginal,sizeof(TrapJmp));
MessageBoxW(hWnd,lpText,lpCaption,uType);
memcpy(pOriginal, TrapJmp, sizeof(TrapJmp)); //repatch
FlushInstructionCache(GetCurrentProcess(),pOriginal,sizeof(TrapJmp));
*/

//let's call OriginalOpCode (the trampoline)... this is more like Detours, except it's free :-))
pMessageBox org=(pMessageBox)(void*)OriginalOpCode;
org(hWnd,L"asif",L"asiasdas",uType);

return 0;
}

int _tmain(int argc, _TCHAR* argv[])
{
MessageBoxW((HWND)123,L"asif",L"asif",789);

{
pOriginal = MessageBoxW;
DWORD dPermission=0;VirtualProtect(TrapJmp,sizeof(TrapJmp),PAGE_EXECUTE_READWRITE,&dPermission);
VirtualProtect(pOriginal,sizeof(TrapJmp),PAGE_EXECUTE_READWRITE,&dPermission);
memcpy(StoreOriginal,pOriginal,sizeof(TrapJmp)); //copy the original to be called later

DWORD64 JmpDiff =(DWORD64)myMessageBoxW;
memcpy(&TrapJmp[1],&JmpDiff,sizeof(DWORD)); //put the low 4 bytes of the 8-byte address to jump to
memcpy(&TrapJmp[9],(char*)&JmpDiff+4,sizeof(DWORD)); //put the remaining (high) 4 bytes of the address

memcpy(pOriginal,TrapJmp,sizeof(TrapJmp)); //set the hook
FlushInstructionCache(GetCurrentProcess(),pOriginal,sizeof(TrapJmp));

//lets set up the trampoline to the original function
memcpy(OriginalOpCode,StoreOriginal,sizeof(StoreOriginal));
VirtualProtect(OriginalOpCode,sizeof(OriginalOpCode),PAGE_EXECUTE_READWRITE,&dPermission);

DWORD64 JmpDiff_trampoline =(DWORD64)pOriginal;
JmpDiff_trampoline+=sizeof(TrapJmp);
memcpy(&TrapJmp_FromTrampoline[1],&JmpDiff_trampoline,sizeof(DWORD)); //put the low 4 bytes of the 8-byte address to jump to
memcpy(&TrapJmp_FromTrampoline[9],(char*)&JmpDiff_trampoline+4,sizeof(DWORD)); //put the remaining (high) 4 bytes of the address

memcpy(&OriginalOpCode[sizeof(TrapJmp_FromTrampoline)],TrapJmp_FromTrampoline,sizeof(TrapJmp_FromTrampoline));

/*
Change
00007FF73378A180 48 83 EC 38 sub rsp,38h
00007FF73378A184 45 33 DB xor r11d,r11d
00007FF73378A187 44 39 1D 36 5B 01 00 cmp dword ptr [7FF73379FCC4h],r11d //we change certain instructions to NOP (this is specific to some APIs only)

To
00007FF73378A180 48 83 EC 38 sub rsp,38h
00007FF73378A184 45 33 DB xor r11d,r11d
00007FF73378A187 90 90 90 90 90 90 90 NOP (seven times, in the trampoline copy only)
*/

memset(&OriginalOpCode[7],0x90,7);
}

MessageBoxW(0,L"asif",L"asif",0);
MessageBoxW(0,L"asif",L"asif",0);
MessageBoxW(0,L"asif",L"asif",0);

return 0;
}
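The 14-byte PUSH / MOV / RET patch used in the code above can also be generated for an arbitrary 64-bit target. The sketch below only builds the byte sequence; MakePushRetJump is a hypothetical helper, and writing the bytes into a live function (VirtualProtect, memcpy, FlushInstructionCache) is left exactly as in the article's code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Build the 14-byte "push low32 / mov dword ptr [rsp+4], high32 / ret"
// sequence that transfers control to `target` without modifying any
// register other than RSP. MakePushRetJump is a hypothetical helper.
std::array<std::uint8_t, 14> MakePushRetJump(std::uint64_t target) {
    const std::uint32_t low  = static_cast<std::uint32_t>(target);
    const std::uint32_t high = static_cast<std::uint32_t>(target >> 32);

    std::array<std::uint8_t, 14> code = {{
        0x68, 0, 0, 0, 0,                    // push imm32 (an 8-byte slot on x64)
        0xC7, 0x44, 0x24, 0x04, 0, 0, 0, 0,  // mov dword ptr [rsp+4], imm32
        0xC3                                 // ret: pop the address and jump
    }};

    for (int i = 0; i < 4; ++i) {
        code[1 + i] = static_cast<std::uint8_t>(low  >> (8 * i));  // little-endian
        code[9 + i] = static_cast<std::uint8_t>(high >> (8 * i));
    }
    return code;
}
```

Passing 0x8172635412345678 reproduces the template bytes listed at the top of the code above (68 78 56 34 12 c7 44 24 04 54 63 72 81 c3). The MOV overwrites the upper half of the pushed slot, so the sign extension performed by PUSH does not affect the final address.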

If you do use our 32/64-bit detour implementation, please ensure that the instruction pointer (EIP/RIP) never lands in the middle of an instruction, as the processor will decode the bytes differently.

For example:
sub         rsp,38h  
is encoded as
48 83 EC 38 at address 00007FF73378A180

Jumping to address 00007FF73378A181 will start decoding at
83 EC 38 //this will be interpreted as a different instruction (sub esp,38h, without the REX.W prefix)

 

Points of Interest

Apart from memory analysis, detours let you log API calls or reverse engineer applications (the hacker in me says he can).

You may use CrashAnalyzer_v2 to attach to any running process (in our case, Debuggee_TestApp) to detect memory leaks and to study memory allocations made by other APIs.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)