PolyHook - The C++11 x86/x64 Hooking Library

stevemk14ebr

17 May 2016, MIT License

A modern, universal C++ hooking library.

https://github.com/stevemk14ebr/PolyHook

Background

It's often useful to modify the behavior of an application at runtime when its source code is not available. To do this, people have traditionally relied on libraries such as Microsoft Detours, MinHook, and a few others. Each of these libraries has significant drawbacks, however. Detours is x86-only unless a 'Professional' license is purchased, which costs US$10,000, and even then the Pro version maps a DLL into a new code section, making the application very bloated. MinHook is pretty good, but it relies on pre-crafted trampoline routines, sometimes fails to hook, and its source code is again bloated. To me there was only one real solution: write my own library, on my own terms, with the goal of being the smallest, cleanest, easiest hooking library in existence!

Features

PolyHook exposes 6 separate ways to hook a function, all of them x86/x64 compatible. Every method has the same interface: SetupHook(), Hook() and Unhook(), plus GetOriginal() to retrieve a callable pointer to the original function. In the following sections I'll describe what each hooking method does, how it works and when to use it, and I'll provide a code example for each. It *should* be thread-safe, although I may have missed something.

1) Standard Detour

This is the de facto standard way to hook a function, and it is what both Microsoft Detours and MinHook implement. It works by writing a JMP instruction into the prologue of a function, which redirects the code flow to a custom handler. In x86 mode the instruction used is:

C++

0x00000000 0xE9DEADBEEF JMP 0xDEADBEF4 (EIP+DEADBEEF)

This is a 32-bit relative instruction, meaning its destination depends on where the instruction itself is located in memory. In my example the instruction is located at address 0x00000000 and the offset is 0xDEADBEEF, which is stored little-endian in memory as EF BE AD DE. The offset is measured from the end of the instruction, so you add the instruction's size of 5 bytes (E9 plus the four displacement bytes) to get the final destination: 0x00000000 + 5 + 0xDEADBEEF = 0xDEADBEF4.
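The following minimal sketch makes the displacement arithmetic concrete. It is not PolyHook's code. It encodes and decodes a 32-bit E9 jump. The function names EncodeRelJmp32 and RelJmp32Target are hypothetical, and addresses are modeled as plain 32-bit integers rather than real code pointers.

```cpp
#include <array>
#include <cstdint>

// Encodes "jmp rel32" (opcode E9) located at 'from' that lands on 'to'.
// Addresses are 32-bit, so unsigned arithmetic wraps modulo 2^32 just like EIP.
std::array<std::uint8_t, 5> EncodeRelJmp32(std::uint32_t from, std::uint32_t to)
{
    // The displacement is measured from the end of the 5-byte instruction
    const std::uint32_t disp = to - (from + 5u);
    std::array<std::uint8_t, 5> bytes = {{
        0xE9,
        static_cast<std::uint8_t>(disp & 0xFF),          // little-endian: lowest byte first
        static_cast<std::uint8_t>((disp >> 8) & 0xFF),
        static_cast<std::uint8_t>((disp >> 16) & 0xFF),
        static_cast<std::uint8_t>((disp >> 24) & 0xFF)
    }};
    return bytes;
}

// Decodes the destination of an E9 jump located at 'from'
std::uint32_t RelJmp32Target(std::uint32_t from, const std::array<std::uint8_t, 5>& bytes)
{
    const std::uint32_t disp = static_cast<std::uint32_t>(bytes[1]) |
                               (static_cast<std::uint32_t>(bytes[2]) << 8) |
                               (static_cast<std::uint32_t>(bytes[3]) << 16) |
                               (static_cast<std::uint32_t>(bytes[4]) << 24);
    return from + 5u + disp;   // end of the instruction plus the displacement
}
```

With the example values, EncodeRelJmp32(0x00000000, 0xDEADBEF4) produces the bytes E9 EF BE AD DE. Decoding those bytes at 0x00000000 gives back 0xDEADBEF4.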

In x64 it's a bit more complicated because there is no single instruction that can jump the entire x64 address range. So instead I use two different assembly snippets, and choose which to use based on the size of the prologue:

C++

0xFF25DEADBEEF JMP qword ptr [RIP+0xDEADBEEF] //6 bytes total, RIP = address of the next instruction

or, when the prologue is large enough to hold it (the sequence is 16 bytes long):

push rax
mov rax, 0xDEADBEEFDEADBEEF
xchg qword ptr ss:[rsp], rax
ret

The first snippet is special because it jumps to the address stored AT (RIP+DEADBEEF). It DOES NOT jump to (RIP+DEADBEEF) itself. In my implementation I point this jump at the end of a trampoline that I allocate within ±2 GB of the prologue. At that location I write the address of the handler, which is where the jump actually goes. The trampoline has to be within ±2 GB because the displacement is only 32 bits. The instruction is 6 bytes long, and after the 2 opcode bytes (0xFF, 0x25) only 4 bytes remain to encode a signed displacement.
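The ±2 GB limit can be expressed directly in code. The sketch below uses a hypothetical helper, EncodeRipIndirectJmp, which is not part of PolyHook. It encodes FF 25 disp32 for a jump at "from" whose 8-byte destination cell lives at "slot", and it refuses when the displacement would not fit in a signed 32-bit value. Addresses are plain integers. Actually allocating memory near the prologue is outside the scope of this sketch.

```cpp
#include <array>
#include <cstdint>
#include <limits>

// Encodes "jmp qword ptr [rip+disp32]" (FF 25 disp32) placed at 'from'.
// 'slot' is the address of the 8-byte cell holding the real destination.
// Returns false if 'slot' is outside the signed 32-bit (+-2 GB) range.
bool EncodeRipIndirectJmp(std::uint64_t from, std::uint64_t slot,
                          std::array<std::uint8_t, 6>& out)
{
    const std::uint64_t next = from + 6;   // RIP points past the 6-byte instruction
    const std::int64_t delta = static_cast<std::int64_t>(slot - next);
    if (delta < std::numeric_limits<std::int32_t>::min() ||
        delta > std::numeric_limits<std::int32_t>::max())
        return false;                      // cannot be encoded in 4 bytes

    const std::uint32_t disp = static_cast<std::uint32_t>(delta);
    out[0] = 0xFF;                         // opcode
    out[1] = 0x25;                         // ModRM: jmp [rip+disp32]
    for (int i = 0; i < 4; ++i)
        out[2 + i] = static_cast<std::uint8_t>((disp >> (8 * i)) & 0xFF);  // little-endian disp32
    return true;
}
```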

The second jump type is preferred because the trampoline can then be allocated anywhere in the x64 address range. It works by saving rax on the stack, moving a full 64-bit address into rax, and then exchanging the top of the stack with rax. That single xchg both restores rax's original value and leaves the destination address on top of the stack. The final ret pops the top of the stack into RIP, which effectively performs an absolute jump!
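The sketch below covers the second snippet and assumes only the standard encodings of these four instructions. A hypothetical helper emits the 16 bytes for a given destination. The byte pattern matches the one in the trampoline listings later in this article.

```cpp
#include <array>
#include <cstdint>

// Encodes the 16-byte absolute jump:
//   50              push rax
//   48 B8 <imm64>   mov rax, imm64
//   48 87 04 24     xchg qword ptr [rsp], rax
//   C3              ret
std::array<std::uint8_t, 16> EncodeAbsJmp64(std::uint64_t destination)
{
    std::array<std::uint8_t, 16> code = {{
        0x50,                                   // push rax: save rax
        0x48, 0xB8, 0, 0, 0, 0, 0, 0, 0, 0,     // mov rax, destination (filled below)
        0x48, 0x87, 0x04, 0x24,                 // xchg: restore rax, leave destination on the stack
        0xC3                                    // ret: pop destination into rip
    }};
    for (int i = 0; i < 8; ++i)                 // imm64 is little-endian
        code[3 + i] = static_cast<std::uint8_t>((destination >> (8 * i)) & 0xFF);
    return code;
}
```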

There are a lot of other VERY important nitty-gritty details to implementing detours correctly. For brevity, I'll explain those later in the section titled "Detours Tricky Bits". The rest of the logic is simple: the jump we wrote takes us to our handler, where we do whatever we want. To call back into the original function, we execute what's known as a trampoline. The trampoline runs the bytes we overwrote with our jump and then jumps back to the address directly after the jump we placed in the prologue. Complete code sample:

C++

#include <windows.h>
#include <memory>
#include "PolyHook.hpp" //Assumed header name; adjust to match your PolyHook checkout

typedef int(__stdcall* tMessageBoxA)(HWND hWnd,LPCSTR lpText,LPCSTR lpCaption,UINT uType);
tMessageBoxA oMessageBoxA; //Receives the trampoline after Hook()

int __stdcall hkMessageBoxA(HWND hWnd, LPCSTR lpText, LPCSTR lpCaption, UINT uType)
{
   //You can modify any parameters you want here
   return oMessageBoxA(hWnd,"Hooked", lpCaption, uType);
}

int main()
{
    std::shared_ptr<PLH::Detour> Detour_Ex(new PLH::Detour);
    Detour_Ex->SetupHook((BYTE*)&MessageBoxA,(BYTE*) &hkMessageBoxA);
    Detour_Ex->Hook();
    oMessageBoxA = Detour_Ex->GetOriginal<tMessageBoxA>();

    MessageBoxA(NULL, "Original text", "PolyHook", MB_OK); //Should now show "Hooked"
}

2) Virtual Function Detour

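In C++, virtual methods are typically placed into a table of function pointers (the vtable), and a pointer to that table is stored at offset 0x00 of the object in memory. MSVC, GCC and Clang use this layout for simple single-inheritance classes, although the C++ standard does not require it.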

C++

//In source code you would have this class
class AClass
{
public:
   virtual void Method1();
   virtual void Method2();
   int m_ExampleMemberOne;
   int m_ExampleMemberTwo;
};

//In memory it gets laid out like this (illustration only)
class VTABLE
{
public:
   void* m_FunctionPointer1; //First virtual method
   void* m_FunctionPointer2; //Second virtual method
   //etc
};

class AClass_InMemory
{
public:
   VTABLE* m_VirtualFunctions; //Hidden pointer at offset 0x00
   int m_ExampleMemberOne;
   int m_ExampleMemberTwo;
};

This hooking method reads the function pointer stored in the vtable, then hooks the function it points to with the standard detour described above.

C++

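#include <windows.h>
#include <memory>
#include "PolyHook.hpp" //Assumed header name; adjust to match your PolyHook checkout

class VirtualTest
{
public:
    virtual int NoParamVirt()
    {
        return 4;
    }
    virtual int NoParamVirt2()
    {
        return 1;
    }
};

typedef int(__thiscall* tVirtNoParams)(DWORD_PTR pThis);
tVirtNoParams oVirtNoParams;

//On x86, __fastcall receives its first argument in ECX, the register __thiscall uses for 'this'
int __fastcall hkVirtNoParams(DWORD_PTR pThis)
{
    return oVirtNoParams(pThis);
}

int main()
{
    std::shared_ptr<VirtualTest> ClassToHook(new VirtualTest);
    std::shared_ptr<PLH::VFuncDetour> VFuncDetour_Ex(new PLH::VFuncDetour);

    //Pass the object's vtable (read from offset 0x00) and the index of the slot to hook
    VFuncDetour_Ex->SetupHook(*(BYTE***)ClassToHook.get(), 0, (BYTE*)&hkVirtNoParams);
    VFuncDetour_Ex->Hook();
    oVirtNoParams = VFuncDetour_Ex->GetOriginal<tVirtNoParams>();
}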

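Note the use of DWORD_PTR pThis. All non-static member functions, virtual ones included, receive a hidden "this" parameter that points to the object, and the typedef must account for it. On x86 the handler is declared __fastcall because MSVC does not allow a free function to be declared __thiscall. __fastcall passes its first argument in ECX, which is where __thiscall puts "this". For methods that take further parameters, add a dummy second parameter to the handler to absorb EDX. On x64 both keywords are ignored.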
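The following minimal sketch shows where the pointer that VFuncDetour detours comes from by reading one vtable slot. It assumes that the vtable pointer sits at offset 0x00 of the object, as described above. That holds for MSVC, GCC and Clang with simple single-inheritance classes, but the standard does not guarantee it. GetVirtualFunction is a hypothetical name. The hidden pointer is read with std::memcpy rather than by casting the object.

```cpp
#include <cstddef>
#include <cstring>

// Reads slot 'index' of an object's vtable.
// Assumption: the vtable pointer is stored at offset 0x00 of the object.
inline void* GetVirtualFunction(const void* object, std::size_t index)
{
    void* const* vtable = nullptr;
    std::memcpy(&vtable, object, sizeof(vtable));   // copy the hidden pointer out of the object
    return vtable[index];                           // the function pointer a VFuncDetour would hook
}

class VirtualTest
{
public:
    virtual int NoParamVirt() { return 4; }
    virtual int NoParamVirt2() { return 1; }
};
```

Two instances of the same class return the same pointer for a given slot because they share one vtable. This is also why detouring the function it points to affects every instance.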

3) Virtual Table Pointer Swap

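This method replaces the object's VTABLE* with a pointer to a copy of the original vtable that we create. We then change one of the function pointers in the copy to point to our hook handler. This is a very stealthy hook method: neither the original vtable nor the hooked function's code is modified, and only the one object whose pointer we swapped is affected.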

C++

//Original
class AClass
{
public:
    VTABLE* m_VirtualFunctions;
    int m_ExampleMemberOne;
    int m_ExampleMemberTwo;
};

class VTABLE
{
public:
    void* m_FunctionPointer1;
    void* m_FunctionPointer2;
};

//Our hook changes it to (same object, after the hook)
class AClass
{
public:
    VTABLECOPY* m_VirtualFunctions; //This pointer value is changed
    int m_ExampleMemberOne;
    int m_ExampleMemberTwo;
};

class VTABLECOPY
{
public:
    void* m_FunctionPointer1; //Changed to point to our handler
    void* m_FunctionPointer2; //Unchanged pointer copied from the original VTABLE
};

Complete code:

C++

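//VirtualTest, tVirtNoParams, oVirtNoParams and hkVirtNoParams are defined in the previous example
tVirtNoParams oVirtNoParams2;
int __fastcall hkVirtNoParams2(DWORD_PTR pThis)
{
    return oVirtNoParams2(pThis);
}

int main()
{
    std::shared_ptr<VirtualTest> ClassToHook(new VirtualTest);
    std::shared_ptr<PLH::VTableSwap> VTableSwap_Ex(new PLH::VTableSwap);
    //VTableSwap takes the object itself, since it replaces the object's vtable pointer
    VTableSwap_Ex->SetupHook((BYTE*)ClassToHook.get(), 0, (BYTE*)&hkVirtNoParams);
    VTableSwap_Ex->Hook();
    oVirtNoParams = VTableSwap_Ex->GetOriginal<tVirtNoParams>();

    //Once Hook() is called, you can optionally hook additional virtual functions in the swapped vtable
    oVirtNoParams2 = VTableSwap_Ex->HookAdditional<tVirtNoParams>(1, (BYTE*)&hkVirtNoParams2);
}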
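The mechanics of a vtable pointer swap can be sketched portably by modeling the hidden pointer with an explicit field, which avoids depending on any compiler's real vtable layout. All names in this sketch are hypothetical, and it is not PolyHook's implementation. Object::vtable stands in for the hidden pointer at offset 0x00. VTableSwapSketch copies the table, patches one slot and repoints the object at the copy. A real implementation also has to know how many slots to copy and may need to preserve data stored just before the table, such as the RTTI pointer.

```cpp
#include <cstddef>
#include <vector>

struct Object;
typedef int (*Method)(Object* self);   // a "virtual method" receives the object as 'this'

struct Object
{
    const Method* vtable;   // models the hidden vtable pointer at offset 0x00
    int value;
};

// What a compiled virtual call does: load the table pointer, load the slot, call it
inline int CallVirtual(Object* obj, std::size_t index)
{
    return obj->vtable[index](obj);
}

class VTableSwapSketch
{
public:
    VTableSwapSketch() : m_Object(nullptr), m_Original(nullptr) {}

    // Copies the table, redirects one slot in the copy and swaps the object's pointer.
    // Returns the original function so the handler can still call it.
    Method Hook(Object* obj, std::size_t slotCount, std::size_t index, Method handler)
    {
        m_Object = obj;
        m_Original = obj->vtable;
        m_Copy.assign(m_Original, m_Original + slotCount);  // copy every slot
        m_Copy[index] = handler;                            // redirect one slot
        obj->vtable = m_Copy.data();                        // the original table is never written
        return m_Original[index];
    }

    void Unhook()
    {
        if (m_Object)
            m_Object->vtable = m_Original;                  // put the original pointer back
    }

private:
    Object* m_Object;
    const Method* m_Original;
    std::vector<Method> m_Copy;
};
```

Because only the hooked object's own pointer changes, other instances keep using the untouched original table.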

4) Virtual Function Pointer Swap

This is a simplification of the above. Instead of copying the vtable, we just change the value of the function pointer inside the original vtable to point to our handler. This method is easier to detect than a VTABLE swap because the original table is modified in place, which also affects every object of that class. It is, however, much simpler.

C++

//Original
class AClass
{
public:
    VTABLE* m_VirtualFunctions;
    int m_ExampleMemberOne;
    int m_ExampleMemberTwo;
};

class VTABLE
{
public:
    void* m_FunctionPointer1; //Hook changes this value
    void* m_FunctionPointer2;
};

Full code:

C++

//VirtualTest, ClassToHook, tVirtNoParams, oVirtNoParams and hkVirtNoParams as in the previous examples
std::shared_ptr<PLH::VFuncSwap> VFuncSwap_Ex(new PLH::VFuncSwap);
VFuncSwap_Ex->SetupHook(*(BYTE***)ClassToHook.get(), 0, (BYTE*)&hkVirtNoParams);
VFuncSwap_Ex->Hook();
oVirtNoParams = VFuncSwap_Ex->GetOriginal<tVirtNoParams>();
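A portable model makes the difference from the vtable swap clear. Here the slot is overwritten in place, so every object that shares the table sees the hook. As in the previous sketch, Object::vtable models the hidden pointer. SwapVirtualFunction is a hypothetical helper, not PolyHook's code. The model's table is ordinary writable memory.

```cpp
#include <cstddef>

struct Object;
typedef int (*Method)(Object* self);   // a "virtual method" receives the object as 'this'

struct Object
{
    Method* vtable;   // models the hidden vtable pointer; the table itself is writable here
    int value;
};

// Overwrites one slot of the shared table in place and returns the previous pointer.
// A real compiler-generated vtable usually lives in read-only memory, so its page
// protection would have to be changed first (on Windows, e.g. with VirtualProtect).
inline Method SwapVirtualFunction(Object* obj, std::size_t index, Method handler)
{
    Method original = obj->vtable[index];
    obj->vtable[index] = handler;
    return original;
}
```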

5) Import Address Table Hook

When a C or C++ program on Windows calls a function imported from a DLL, the call goes through a table called the Import Address Table (IAT). On disk, the module's import directory lists the DLLs and function names it needs, and the IAT entries initially mirror that name table. At load time, the Windows loader resolves each import and overwrites the corresponding IAT entry with the function's actual address. From then on, every call to that import goes through its IAT slot. This hook method swaps the pointer in that slot so it points to our own handler, and when the target calls the API, our handler is called instead. The IAT is a deep topic, and better write-ups than mine exist: http://sandsprite.com/CodeStuff/Understanding_imports.html
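At its core, an IAT hook is a pointer swap in a table that every call goes through. The following sketch models a single module's already-resolved import table as an array. ImportEntry and HookImport are hypothetical. A real implementation would instead walk the module's PE import directory (the IMAGE_IMPORT_DESCRIPTOR entries and their thunk arrays) and change the page protection before writing.

```cpp
#include <cstddef>
#include <cstring>

typedef unsigned long (*ImportFn)();

// One resolved import: the loader has already written 'address' (the IAT slot)
struct ImportEntry
{
    const char* dll;
    const char* name;
    ImportFn address;
};

// Finds dll!name and points its slot at 'handler'; returns the original or nullptr
inline ImportFn HookImport(ImportEntry* table, std::size_t count,
                           const char* dll, const char* name, ImportFn handler)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        // Exact, case-sensitive match for brevity; Windows DLL names are case-insensitive
        if (std::strcmp(table[i].dll, dll) == 0 && std::strcmp(table[i].name, name) == 0)
        {
            ImportFn original = table[i].address;
            table[i].address = handler;   // every later call through this slot reaches the handler
            return original;
        }
    }
    return nullptr;                        // not imported by this module
}
```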

Full code example:

C++

typedef DWORD(__stdcall* tGetCurrentThreadId)();
tGetCurrentThreadId oGetCurrentThreadID;

DWORD __stdcall hkGetCurrentThreadId()
{
    return oGetCurrentThreadID();
}

int main()
{
    std::shared_ptr<PLH::IATHook> IATHook_Ex(new PLH::IATHook);
    IATHook_Ex->SetupHook("kernel32.dll", "GetCurrentThreadId", (BYTE*)&hkGetCurrentThreadId);
    IATHook_Ex->Hook();
    oGetCurrentThreadID = IATHook_Ex->GetOriginal<tGetCurrentThreadId>();
}

If you want to hook the import inside a DLL that your target application has loaded, rather than in the main executable's own IAT, pass that DLL's name as an additional argument:

C++

IATHook_Ex->SetupHook("kernel32.dll", "GetCurrentThreadId", (BYTE*)&hkGetCurrentThreadId,"Dependency.dll");

This would hook the GetCurrentThreadId import of the DLL named "Dependency.dll".

6) Vectored Exception Handler Hooks

The final hooking method is a neat one, and one of the stealthiest. By generating an exception, we can trap into an exception handler. Inside that exception handler, we change the value of RIP/EIP (the instruction pointer) to the location of our hook handler. We then remove our exception-generating mechanism and return EXCEPTION_CONTINUE_EXECUTION from the exception handler. At that point our hook handler begins executing, and we have effectively performed a jump to it. The exception-generating mechanism can be a hardware breakpoint, a software breakpoint (INT3), or a guard page.

There is an interesting quirk, however. To execute the original function, we have to remove the exception-generating mechanism, because otherwise calling the original would trigger our exception handler again. That leaves the problem of restoring the mechanism after the original has finished executing, since we want to be able to intercept the function more than once!

The solution, as it turns out, is simple: C++ destructors! An object's destructor is guaranteed to run when the object goes out of scope, and for a local in our handler that happens after the return value has been computed. We can leverage this to have the compiler automatically place a stub that restores the protection once we are done executing the original in our handler! This code example from Stack Overflow invokes a stored callable when the object is destroyed:

C++

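#include <utility>

template<typename Func>
class FinalAction {
public:
    explicit FinalAction(Func f) :FinalActionFunc(std::move(f)), m_Active(true) {}

    //Moving transfers the action, so it runs exactly once even if the copy isn't elided
    FinalAction(FinalAction&& other) :FinalActionFunc(std::move(other.FinalActionFunc)), m_Active(other.m_Active)
    {
        other.m_Active = false;
    }
    FinalAction(const FinalAction&) = delete;
    FinalAction& operator=(const FinalAction&) = delete;

    ~FinalAction()
    {
        if (m_Active)
            FinalActionFunc();
    }
private:
    Func FinalActionFunc;
    bool m_Active;

    /*Uses RAII to call a final function on destruction
    C++ 11 version of java's finally (kindof)*/
};

template <typename F>
FinalAction<F> finally(F f) {
    return FinalAction<F>(std::move(f));
}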

Internally, I use it like this:

C++

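//Member function of PLH::VEHHook (excerpt); a deduced 'auto' return type like this needs C++14
auto GetProtectionObject()
{
    //Capture 'this' explicitly: the returned lambda outlives this call
    return finally([this]() {
        //Some fancy code to restore the exception generating methods
    });
}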

Used like this, we get a fully working, auto-restoring exception hook:

C++

#include <windows.h>
#include <memory>
#include "PolyHook.hpp" //Assumed header name; adjust to match your PolyHook checkout

typedef int(__stdcall* tVEH)(int intparam);
tVEH oVEHTest;
int __stdcall VEHTest(int param)
{
    return 3;
}

std::shared_ptr<PLH::VEHHook> VEHHook_Ex(new PLH::VEHHook); //Must be constructed before use
int __stdcall hkVEHTest(int param)
{
    //Protection object auto-magically restores protection once our handler exits!
    auto ProtectionObject = VEHHook_Ex->GetProtectionObject();

    return oVEHTest(param);//Original is unprotected so we can call it
    //ProtectionObject's destructor runs here, after the return value is computed, and restores protection!
}

int main()
{
    VEHHook_Ex->SetupHook((BYTE*)&VEHTest, (BYTE*)&hkVEHTest, PLH::VEHHook::VEHMethod::INT3_BP);
    VEHHook_Ex->Hook();
    oVEHTest = VEHHook_Ex->GetOriginal<tVEH>();
}
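The restore-on-exit idea can be tested without any exceptions by modeling the breakpoint as a flag. In this sketch, g_TrapArmed is a hypothetical stand-in for the INT3, hardware breakpoint or guard page. ScopeExit is a minimal C++11 scope guard in the spirit of the FinalAction class above. The sketch shows that the original runs with the trap removed and that the trap is back in place once the handler returns.

```cpp
#include <utility>

static bool g_TrapArmed = true;            // true: calling the target would trap again
static bool g_OriginalRanDisarmed = false;

template <typename F>
class ScopeExit
{
public:
    explicit ScopeExit(F f) : m_F(std::move(f)), m_Active(true) {}
    ScopeExit(ScopeExit&& other) : m_F(std::move(other.m_F)), m_Active(other.m_Active)
    {
        other.m_Active = false;            // only the surviving object runs the action
    }
    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;
    ~ScopeExit()
    {
        if (m_Active)
            m_F();
    }
private:
    F m_F;
    bool m_Active;
};

template <typename F>
ScopeExit<F> MakeScopeExit(F f)
{
    return ScopeExit<F>(std::move(f));
}

int OriginalTarget(int x)
{
    g_OriginalRanDisarmed = !g_TrapArmed;  // the original must run with the trap removed
    return x * 2;
}

int HookHandler(int x)
{
    g_TrapArmed = false;                   // in the real flow the exception handler does this
    auto restore = MakeScopeExit([]() { g_TrapArmed = true; });
    return OriginalTarget(x) + 1;          // 'restore' re-arms after this value is computed
}
```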

Detours Tricky Bits

Instruction Splitting:

When writing our jump instruction into the prologue, we have to make sure we do not split any instructions. x86/x64 instructions vary in length, while the jmp we use is always 5 bytes, so its end does not necessarily fall on an instruction boundary. For example, hooking the following function would likely crash with an exception. This is just an example prologue; it would never be found in real compiler-generated code:

//Before our hook
0x50   push eax
0xFFD0 call eax
0x51   push ecx
0xFFD1 call ecx

//After our hook
0xE9DEADBEEF JMP 0xDEADBEF4
0xD1         //JUNK (leftover byte of call ecx)

As you can see, the 0xD1 byte would be left over. That junk code will most likely cause undefined behavior down the line, leading to hard-to-find crashes.

The solution is to use a disassembly engine to make sure we never split instructions. I use Capstone for the PolyHook project, as it's well maintained and very powerful. Internally, I measure the size of each instruction I am forced to overwrite and then NOP out the leftover bytes of the last one, so we would be left with:

C++

0xE9DEADBEEF JMP 0xDEADBEF4
0x90         NOP

The 0xD1 byte is overwritten with a NOP and we have fixed our problem.
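This bookkeeping can be sketched without a disassembler by passing in the instruction lengths it would report. The helpers below, BytesToOverwrite and BuildPatch, are both hypothetical. They round the overwrite size up to a whole-instruction boundary and pad the remainder with NOPs. With the example prologue above (lengths 1, 2, 1, 2 and a 5-byte jmp), they produce 6 bytes ending in a single 0x90.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Given instruction lengths (as a disassembler such as Capstone would report them),
// returns how many bytes must be overwritten to fit a 'jmpSize'-byte jump without
// splitting an instruction, or 0 if the prologue is too short.
std::size_t BytesToOverwrite(const std::vector<std::size_t>& instrLengths, std::size_t jmpSize)
{
    std::size_t covered = 0;
    for (std::size_t len : instrLengths)
    {
        covered += len;              // always consume whole instructions
        if (covered >= jmpSize)
            return covered;
    }
    return 0;
}

// Builds the bytes written over the prologue: the jump followed by NOP (0x90) padding
std::vector<std::uint8_t> BuildPatch(const std::vector<std::uint8_t>& jmp, std::size_t overwrite)
{
    assert(overwrite >= jmp.size());
    std::vector<std::uint8_t> patch(jmp);
    patch.resize(overwrite, 0x90);   // NOP out the leftover bytes of the last instruction
    return patch;
}
```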

Code relocation:

As mentioned before, some instructions are relative to their location in memory. When we write the jump instruction into the function prologue, we overwrite the code that was there before. So we first copy the original code into our trampoline, where it gets executed when it's time to call the original. The trampoline, however, is at a different memory location than the prologue. Some of the copied instructions may be position-relative, so moving them would change their meaning! A real-world example of this is the x64 MessageBox prologue:

7FFC0EC2E190: 40 53             push rbx
7FFC0EC2E192: 48 83 EC 30       sub rsp, 0x30
7FFC0EC2E196: 33 DB             xor ebx, ebx
7FFC0EC2E198: 39 1D 7A B7 02 00 cmp dword ptr [rip + 0x2b77a], ebx //This is relative
7FFC0EC2E19E: 74 31             je 0x7ffc0ec2e1d1                  //And so is this

The first relocation is easy to fix: we simply recalculate the displacement required to make it point to the original location. RIP-relative displacements are measured from the end of the instruction. The cmp starts at 0x7FFC0EC2E198 and is 6 bytes long, so it originally points to 0x7FFC0EC2E19E + 0x2B77A = 0x7FFC0EC59918. The trampoline with the first relocation fixed is:

7FFC0EB60000: 40 53                         push rbx
7FFC0EB60002: 48 83 EC 30                   sub rsp, 0x30
7FFC0EB60006: 33 DB                         xor ebx, ebx
7FFC0EB60008: 39 1D 0A 99 0F 00             cmp dword ptr [rip + 0xf990a], ebx //The encoded displacement is changed
7FFC0EB6000E: 74 31                         je 0x7ffc0eb60041                  //Wrong target: the rel8 0x31 was copied unchanged
7FFC0EB60010: 50                            push rax                           //This is just part of the trampoline, ignore it
7FFC0EB60011: 48 B8 A0 E1 C2 0E FC 7F 00 00 movabs rax, 0x7ffc0ec2e1a0
7FFC0EB6001B: 48 87 04 24                   xchg qword ptr [rsp], rax
7FFC0EB6001F: C3                            ret

The relocated cmp starts at 0x7FFC0EB60008, so it now points to 0x7FFC0EB6000E + 0xF990A = 0x7FFC0EC59918. As you can see, we moved the instruction while preserving what it means!
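Here is the same calculation as a small sketch. The hypothetical helper RelocateRipDisp32 recomputes the disp32 of a RIP-relative instruction that was copied elsewhere, measuring from the end of the instruction. It fails if the result would not fit in 32 bits. Plugging in the values from the listings above reproduces the 0xF990A displacement.

```cpp
#include <cstdint>
#include <limits>

// Recomputes the disp32 of a RIP-relative instruction of 'length' bytes copied from
// 'oldAddr' to 'newAddr' so that it still references the same absolute address.
bool RelocateRipDisp32(std::uint64_t oldAddr, std::uint64_t newAddr, std::uint32_t length,
                       std::int32_t oldDisp, std::int32_t& newDisp)
{
    // Absolute address referenced at the original location (end of instruction + disp)
    const std::uint64_t target = oldAddr + length + static_cast<std::uint64_t>(static_cast<std::int64_t>(oldDisp));
    // Displacement needed from the end of the instruction at its new location
    const std::int64_t delta = static_cast<std::int64_t>(target - (newAddr + length));
    if (delta < std::numeric_limits<std::int32_t>::min() ||
        delta > std::numeric_limits<std::int32_t>::max())
        return false;                 // too far away for a 32-bit displacement
    newDisp = static_cast<std::int32_t>(delta);
    return true;
}
```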

The second relocation, however, is hard. The je instruction has only a single byte to encode its displacement (0x74 means je, and the next byte is the displacement). That byte is signed, so the target must lie within -128 to +127 bytes of the end of the instruction. The distance between our function prologue and our trampoline is far bigger than that: 0x7FFC0EC2E190 - 0x7FFC0EB60000 = 0xCE190. So instead we redirect the je to an absolute x64 jmp (credit to MinHook for this). The fixed trampoline then becomes:

C++

7FFC0EB60000: 40 53                         push rbx 
7FFC0EB60002: 48 83 EC 30                   sub rsp, 0x30 
7FFC0EB60006: 33 DB                         xor ebx, ebx 
7FFC0EB60008: 39 1D 0A 99 0F 00             cmp dword ptr [rip + 0xf990a], ebx //The encoded displacement is changed 
7FFC0EB6000E: 74 10                         je 0x7ffc0eb60020
7FFC0EB60010: 50                            push rax                           //This is just part of the trampoline, ignore it
7FFC0EB60011: 48 B8 A0 E1 C2 0E FC 7F 00 00 movabs rax, 0x7ffc0ec2e1a0
7FFC0EB6001B: 48 87 04 24                   xchg qword ptr [rsp], rax
7FFC0EB6001F: C3                            ret
7FFC0EB60020: 50                            push rax                          //JE POINTS HERE
7FFC0EB60021: 48 B8 D1 E1 C2 0E FC 7F 00 00 movabs rax, 0x7ffc0ec2e1d1
7FFC0EB6002B: 48 87 04 24                   xchg qword ptr [rsp], rax
7FFC0EB6002F: C3                            ret

From this we see that when the je branch is taken, it now lands on a second copy of the push/mov/xchg/ret absolute jump shown earlier, which continues at the original target, 0x7FFC0EC2E1D1.
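The following sketch illustrates that fix and assumes the 16-byte absolute jump from the Standard Detour section as the stub. FitsRel8 checks the signed 8-bit range. RedirectShortJcc re-targets a copied short conditional jump at a stub placed later in the trampoline. Both names are hypothetical. With the addresses from the listing, the sketch reproduces the 74 10 je and the bytes at 0x7FFC0EB60020.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// True if a 2-byte short jump at 'at' can reach 'target' (rel8 is signed and
// measured from the end of the instruction)
bool FitsRel8(std::uint64_t at, std::uint64_t target)
{
    const std::int64_t delta = static_cast<std::int64_t>(target - (at + 2));
    return delta >= -128 && delta <= 127;
}

// Rewrites a copied short Jcc at 'newAt' to target a stub at 'stubAt'; the stub is
// an absolute push/mov/xchg/ret jump to the Jcc's original destination.
void RedirectShortJcc(std::uint8_t opcode, std::uint64_t newAt, std::uint64_t stubAt,
                      std::uint64_t originalTarget,
                      std::array<std::uint8_t, 2>& jcc, std::array<std::uint8_t, 16>& stub)
{
    assert(opcode >= 0x70 && opcode <= 0x7F);   // short Jcc opcodes (0x74 = je)
    assert(FitsRel8(newAt, stubAt));            // the stub itself must be reachable

    jcc[0] = opcode;                                           // keep the same condition
    jcc[1] = static_cast<std::uint8_t>(stubAt - (newAt + 2));  // new rel8, pointing at the stub

    const std::uint8_t head[3] = {0x50, 0x48, 0xB8};              // push rax; mov rax, imm64
    const std::uint8_t tail[5] = {0x48, 0x87, 0x04, 0x24, 0xC3};  // xchg [rsp], rax; ret
    for (int i = 0; i < 3; ++i)
        stub[i] = head[i];
    for (int i = 0; i < 8; ++i)                                   // imm64, little-endian
        stub[3 + i] = static_cast<std::uint8_t>(originalTarget >> (8 * i));
    for (int i = 0; i < 5; ++i)
        stub[11 + i] = tail[i];
}
```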

After Thoughts

All of the hooking methods require you to create a typedef for the function. If you get a crash, it's more likely that your typedef is incorrect (for example, a wrong calling convention or parameter list) than that there is a bug in the library. Please post your issue in the comments and I'll look at it. If you have any suggestions, just message me and I'll take a look!

As a bonus, you can use decltype to derive the function pointer type for APIs and other existing functions instead of writing the typedef by hand. An example for MessageBoxA:

C++

decltype(&MessageBoxA) oMessageBoxA;

//Other code

oMessageBoxA = Detour_Ex->GetOriginal<decltype(&MessageBoxA)>();

Future Development

In the not-so-distant future I plan to make this library cross-platform.

License

This article, along with any associated source code and files, is licensed under The MIT License