https://github.com/stevemk14ebr/PolyHook
Background
It's often useful to modify the behavior of an application at runtime when access to the source code is not available. To do this people have traditionally relied on libraries such as Microsoft Detours, MinHook, and a few others. Each of these libraries has significant drawbacks, however. Detours is x86 only unless a 'Professional' license is used, but that costs USD $10000, and even then the pro version maps a DLL into a new code section, making the application very bloated. MinHook is pretty good, but it relies on pre-crafted trampoline routines, sometimes fails to hook, and the source code is again bloated. To me there was only one real solution: write my own library, on my own terms, with the goal of being the smallest, cleanest, easiest hooking library in existence!
Features
PolyHook exposes 6 separate ways to hook a function (all of them are x86/x64 compatible). Every method exposes the same interface: SetupHook(), Hook(), and Unhook() methods. I'll describe what each hooking method does, how it works, when to use it, and provide a code example in the following sections. It *should* be thread-safe, although I may have missed something.
1) Standard Detour
This is the pseudo-standard way to hook a function; it is what both Microsoft Detours and MinHook implement. It works by writing a JMP assembly instruction to the prologue of a function that redirects the code flow to a custom handler. In x86 mode the instruction used is:
0x00000000 0xE9DEADBEEF JMP 0xDEADBEF4 (EIP+DEADBEEF)
This is a 32-bit relative instruction, meaning where it jumps to depends on where the instruction itself is located in memory. In my example the instruction is located at 0x00000000 and the offset is 0xDEADBEEF; adding the size of the instruction, which is 5 bytes (E9+DE+AD+BE+EF), gives its final location, which is 0xDEADBEF4.
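The arithmetic above can be sketched in a few lines. This is only an illustration under my own naming (CalcRel32, JmpTarget and EncodeJmpRel32 are hypothetical helpers, not PolyHook functions). One detail the simplified listing glosses over: the displacement is stored little-endian, so the bytes in memory appear in reverse order.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// An x86 "JMP rel32" is 1 opcode byte (0xE9) followed by a 4-byte displacement.
constexpr std::uint32_t kJmpRel32Size = 5;

// Displacement to encode so that a jump located at `from` lands on `to`.
// Unsigned arithmetic wraps modulo 2^32, like addresses in a 32-bit process.
constexpr std::uint32_t CalcRel32(std::uint32_t from, std::uint32_t to) {
    return to - (from + kJmpRel32Size);
}

// Address reached by a jump located at `from` carrying displacement `disp`.
constexpr std::uint32_t JmpTarget(std::uint32_t from, std::uint32_t disp) {
    return from + kJmpRel32Size + disp;
}

// Produce the 5 instruction bytes; the displacement is written little-endian.
inline std::array<std::uint8_t, 5> EncodeJmpRel32(std::uint32_t from, std::uint32_t to) {
    const std::uint32_t disp = CalcRel32(from, to);
    assert(JmpTarget(from, disp) == to); // round-trip sanity check
    std::array<std::uint8_t, 5> bytes{};
    bytes[0] = 0xE9;
    for (int i = 0; i < 4; ++i) {
        bytes[1 + i] = static_cast<std::uint8_t>((disp >> (8 * i)) & 0xFF);
    }
    return bytes;
}
```

For the example in the text, JmpTarget(0x00000000, 0xDEADBEEF) gives 0xDEADBEF4, and the encoded bytes come out as E9 EF BE AD DE.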
In x64 it's a bit more complicated because there is no single instruction that can jump the entire x64 address range. So instead I use two different assembly snippets, and choose which to use based on the size of the prologue:
0xFF25DEADBEEF JMP [DEADBEF4] ([RIP+DEADBEEF])
or when the prologue is greater than 6 bytes in size:
push rax
mov rax, 0xDEADBEEFDEADBEEF
xchg qword ptr ss:[rsp], rax
ret
The first snippet is special because it actually jumps to the location pointed TO by (RIP+DEADBEEF); it DOES NOT go to (RIP+DEADBEEF) itself. In my implementation I write this jump to point to the end of a trampoline that I allocate within +-2GB of the prologue, then at this location I write the memory location of the handler, which is where the jump actually goes to. You might be wondering why the trampoline is allocated within +-2GB, and that's because the instruction can only encode an offset up to 32 bits in size: the instruction is 6 bytes in size, minus 2 bytes for the 0xFF, 0x25, which leaves us with 4 bytes to write in a displacement value.
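To make the +-2GB restriction concrete, here is a rough sketch of the range check and the encoding of the 6-byte indirect jump. The helper names (FitsInDisp32, EncodeJmpIndirect) and the idea of passing plain addresses around are my own assumptions for illustration, not how PolyHook is necessarily structured internally.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <limits>

// "JMP [RIP+disp32]" is FF 25 followed by a 4-byte signed displacement.
constexpr std::uint64_t kJmpIndirectSize = 6;

// True when the 8-byte slot holding the destination address is reachable,
// i.e. the distance from the END of the jump to the slot fits in 32 signed bits.
inline bool FitsInDisp32(std::uint64_t jmpAddr, std::uint64_t slotAddr) {
    const std::int64_t delta =
        static_cast<std::int64_t>(slotAddr - (jmpAddr + kJmpIndirectSize));
    return delta >= std::numeric_limits<std::int32_t>::min() &&
           delta <= std::numeric_limits<std::int32_t>::max();
}

// Encode the jump located at `jmpAddr` that reads its destination from `slotAddr`.
inline std::array<std::uint8_t, 6> EncodeJmpIndirect(std::uint64_t jmpAddr,
                                                     std::uint64_t slotAddr) {
    assert(FitsInDisp32(jmpAddr, slotAddr));
    const std::uint32_t disp =
        static_cast<std::uint32_t>(slotAddr - (jmpAddr + kJmpIndirectSize));
    std::array<std::uint8_t, 6> bytes{};
    bytes[0] = 0xFF;
    bytes[1] = 0x25;
    for (int i = 0; i < 4; ++i) {
        bytes[2 + i] = static_cast<std::uint8_t>((disp >> (8 * i)) & 0xFF);
    }
    return bytes;
}
```

If FitsInDisp32 returns false for a candidate trampoline address, a different allocation closer to the prologue is needed before this jump type can be used.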
The second jump type is preferred, as the trampoline can be allocated anywhere in the entire x64 address range. It works by saving the value of the rax register on the stack, moving a full x64 address into rax, then exchanging rax with the value on top of the stack (so the stack now holds the address and rax is restored to its original value), then ret-ing, which just jumps to the first value on the stack, effectively doing a jump!
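As a sketch, the second snippet assembles to 16 bytes, which can be generated like this. EncodeAbsJmp64 is a hypothetical name of mine; the opcode bytes are the standard encodings of the four instructions listed above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Build: push rax / mov rax, imm64 / xchg [rsp], rax / ret
// Total size is 1 + 10 + 4 + 1 = 16 bytes, and rax is preserved.
inline std::array<std::uint8_t, 16> EncodeAbsJmp64(std::uint64_t target) {
    std::array<std::uint8_t, 16> code{};
    std::size_t i = 0;
    code[i++] = 0x50;                 // push rax
    code[i++] = 0x48;                 // REX.W prefix
    code[i++] = 0xB8;                 // mov rax, imm64
    for (int b = 0; b < 8; ++b) {     // 64-bit immediate, little-endian
        code[i++] = static_cast<std::uint8_t>((target >> (8 * b)) & 0xFF);
    }
    code[i++] = 0x48;                 // xchg qword ptr [rsp], rax
    code[i++] = 0x87;
    code[i++] = 0x04;
    code[i++] = 0x24;
    code[i++] = 0xC3;                 // ret
    assert(i == code.size());
    return code;
}
```

Because the stub is 16 bytes long, the prologue being overwritten has to contain at least 16 bytes of whole instructions for this jump type to fit.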
There are a lot of other VERY important nitty-gritty details to correctly implementing detours, but for brevity I'll explain those later in the section titled "Detours Tricky Bits". The rest of the logic is very simple: the jump we wrote takes us to our handler and we do what we want in our handler. We then return to the original function by first executing what's known as a trampoline, which simply executes the bytes we overwrote with our jump; the trampoline then jumps back to the memory location directly after the jmp we placed in the prologue. Complete code sample:
typedef int(__stdcall* tMessageBoxA)(HWND hWnd,LPCSTR lpText,LPCSTR lpCaption,UINT uType);
tMessageBoxA oMessageBoxA;
int __stdcall hkMessageBoxA(HWND hWnd, LPCSTR lpText, LPCSTR lpCaption, UINT uType)
{
return oMessageBoxA(hWnd,"Hooked", lpCaption, uType);
}
void main()
{
std::shared_ptr<PLH::Detour> Detour_Ex(new PLH::Detour);
Detour_Ex->SetupHook((BYTE*)&MessageBoxA,(BYTE*) &hkMessageBoxA);
Detour_Ex->Hook();
oMessageBoxA = Detour_Ex->GetOriginal<tMessageBoxA>();
}
2) Virtual Function Detour
In C++, virtual methods are placed into a table pointed to by a pointer at offset 0x00 of that class in memory.
class AClass
{
public:
virtual void Method1();
virtual void Method2();
int m_ExampleMemberOne;
int m_ExampleMemberTwo;
};
In memory the compiler lays this class out roughly like so:
class AClass
{
public:
VTABLE* m_VirtualFunctions;
int m_ExampleMemberOne;
int m_ExampleMemberTwo;
};
class VTABLE
{
public:
void* m_FunctionPointer; // Method1
void* m_FunctionPointer; // Method2
};
This hooking method dereferences the function pointer in the Vtable, then hooks the function it points to with the detour above.
class VirtualTest
{
public:
virtual int NoParamVirt()
{
return 4;
}
virtual int NoParamVirt2()
{
return 1;
}
};
typedef int(__thiscall* tVirtNoParams)(DWORD_PTR pThis);
tVirtNoParams oVirtNoParams;
int __fastcall hkVirtNoParams(DWORD_PTR pThis)
{
return oVirtNoParams(pThis);
}
void main()
{
std::shared_ptr<VirtualTest> ClassToHook(new VirtualTest);
std::shared_ptr<PLH::VFuncDetour> VFuncDetour_Ex(new PLH::VFuncDetour);
VFuncDetour_Ex->SetupHook(*(BYTE***)ClassToHook.get(), 0, (BYTE*)&hkVirtNoParams);
VFuncDetour_Ex->Hook();
oVirtNoParams = VFuncDetour_Ex->GetOriginal<tVirtNoParams>();
}
Note the usage of DWORD_PTR pThis. All virtual methods have a hidden "this" parameter, which is a pointer to the class instance; the typedef must take this into account.
3) Virtual Table Pointer Swap
This method swaps the actual VTABLE* to a deep copy we create of the original. We then swap one of the function pointers in the copy to point to our hook handler. This is a very stealthy hook method.
// Before the swap: the object points at the original vtable
class AClass
{
public:
VTABLE* m_VirtualFunctions;
int m_ExampleMemberOne;
int m_ExampleMemberTwo;
};
class VTABLE
{
public:
void* m_FunctionPointer;
void* m_FunctionPointer;
};
// After the swap: the object points at our copy, in which one entry is our handler
class AClass
{
public:
VTABLECOPY* m_VirtualFunctions;
int m_ExampleMemberOne;
int m_ExampleMemberTwo;
};
class VTABLECOPY
{
public:
void* m_FunctionPointer;
void* m_FunctionPointer;
};
Complete code:
void main()
{
std::shared_ptr<VirtualTest> ClassToHook(new VirtualTest);
std::shared_ptr<PLH::VTableSwap> VTableSwap_Ex(new PLH::VTableSwap);
VTableSwap_Ex->SetupHook((BYTE*)ClassToHook.get(), 0, (BYTE*)&hkVirtNoParams);
VTableSwap_Ex->Hook();
oVirtNoParams = VTableSwap_Ex->GetOriginal<tVirtNoParams>();
oVirtNoParams2 = VTableSwap_Ex->HookAdditional<tVirtNoParams>(1, (BYTE*)&hkVirtNoParams2);
}
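The mechanics of the pointer swap can be shown without touching a real compiler-generated vtable. The sketch below models the hidden pointer explicitly; Object, Method and VTableSwapSketch are hypothetical names for illustration and say nothing about how PLH::VTableSwap is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Method = int (*)(void* self);

// Stand-in for a polymorphic object: the vtable pointer sits at offset 0.
struct Object {
    Method* vtable;
    int member;
};

int OriginalMethod0(void*) { return 4; }
int OriginalMethod1(void*) { return 1; }
int HookedMethod0(void*) { return 40; }

class VTableSwapSketch {
public:
    // Deep-copy `methodCount` entries of the object's current table.
    VTableSwapSketch(Object& obj, std::size_t methodCount)
        : m_obj(obj), m_original(obj.vtable),
          m_copy(obj.vtable, obj.vtable + methodCount) {}

    // Patch one entry in the COPY, then point the object at the copy.
    // Returns the original function so the handler can call it.
    Method Hook(std::size_t index, Method handler) {
        assert(index < m_copy.size());
        Method original = m_original[index];
        m_copy[index] = handler;
        m_obj.vtable = m_copy.data();
        return original;
    }

    // Restore the object's original table pointer.
    void Unhook() { m_obj.vtable = m_original; }

private:
    Object& m_obj;
    Method* m_original;
    std::vector<Method> m_copy;
};
```

The original table is never written to, so only the one object whose pointer was swapped sees the hook.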
4) Virtual Function Pointer Swap
This is a simplification of the above. Instead of copying the vtable we instead just change the value of the virtual function inside the original vtable to point to our handler. This method is easier to detect than a VTABLE swap but it's also much simpler.
class AClass
{
public:
VTABLE* m_VirtualFunctions;
int m_ExampleMemberOne;
int m_ExampleMemberTwo;
};
class VTABLE
{
public:
void* m_FunctionPointer;
void* m_FunctionPointer;
};
Full code:
std::shared_ptr<PLH::VFuncSwap> VFuncSwap_Ex(new PLH::VFuncSwap);
VFuncSwap_Ex->SetupHook(*(BYTE***)ClassToHook.get(), 0, (BYTE*)&hkVirtNoParams);
VFuncSwap_Ex->Hook();
oVirtNoParams = VFuncSwap_Ex->GetOriginal<tVirtNoParams>();
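A consequence worth seeing in code: because the original table is edited in place, every object that shares that table is affected. The sketch below uses an explicit, writable table with hypothetical names (Object, SwapVFunc). A real vtable usually lives in read-only memory, so a real implementation would also have to change the page protection first; that step is omitted here.

```cpp
#include <cassert>
#include <cstddef>

using Method = int (*)(void* self);

// Stand-in for a polymorphic object holding only its vtable pointer.
struct Object {
    Method* vtable;
};

int Original(void*) { return 4; }
int Hooked(void*) { return 40; }

// Overwrite one slot of the table in place and hand back the old value.
inline Method SwapVFunc(Method* vtable, std::size_t index, Method handler) {
    assert(vtable != nullptr);
    Method original = vtable[index];
    vtable[index] = handler;
    return original;
}
```

With two objects sharing one table, a single SwapVFunc call redirects calls made through either of them, and calling it again with the returned pointer undoes the hook.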
5) Import Address Table Hook
When any API is called in a C or C++ program on Windows, the location of that API is placed into a table called the IMPORT_ADDRESS_TABLE (IAT). At compile time this is simply a table of API names; at runtime the Windows loader finds the memory locations of the APIs and writes them into an identical table next to the name table. All further calls to any API are first looked up in this table, and then the function pointer in the table is called. This hook method swaps the pointer value in this table to point to our own handler, so that when the target calls the API our handler is called instead. The IAT is an advanced topic and better writeups than my own exist: http://sandsprite.com/CodeStuff/Understanding_imports.html
Full code example:
typedef DWORD(__stdcall* tGetCurrentThreadId)();
tGetCurrentThreadId oGetCurrentThreadID;
DWORD __stdcall hkGetCurrentThreadId()
{
return oGetCurrentThreadID();
}
void main()
{
std::shared_ptr<PLH::IATHook> IATHook_Ex(new PLH::IATHook);
IATHook_Ex->SetupHook("kernel32.dll", "GetCurrentThreadId", (BYTE*)&hkGetCurrentThreadId);
IATHook_Ex->Hook();
oGetCurrentThreadID = IATHook_Ex->GetOriginal<tGetCurrentThreadId>();
}
If you want to hook an API located in a DLL that is linked to your target application, you can write this instead:
IATHook_Ex->SetupHook("kernel32.dll", "GetCurrentThreadId", (BYTE*)&hkGetCurrentThreadId,"Dependancy.dll");
This would hook GetCurrentThreadId for the DLL named "Dependancy.dll".
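The reason a module name is needed is that each loaded module carries its own import table. The following is a simplified model of that idea only: it merges the name table and the address table into one array of hypothetical ImportEntry records and does not parse a real PE image.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

using ApiFn = unsigned long (*)();

// Simplified import record: real PE files keep names and addresses in
// two parallel tables rather than one struct.
struct ImportEntry {
    const char* name;
    ApiFn address;
};

unsigned long FakeGetCurrentThreadId() { return 1234; }
unsigned long HookedGetCurrentThreadId() { return 99; }

// Find `apiName` in ONE module's table and swap its address.
// Returns the previous address, or nullptr if the import is not present.
inline ApiFn SwapImport(ImportEntry* table, std::size_t count,
                        const char* apiName, ApiFn handler) {
    assert(table != nullptr && apiName != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        if (std::strcmp(table[i].name, apiName) == 0) {
            ApiFn original = table[i].address;
            table[i].address = handler;
            return original;
        }
    }
    return nullptr;
}
```

Swapping the entry in one module's table leaves calls made from other modules untouched, which is why the executable and each dependency have to be hooked separately.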
6) Vectored Exception Handler Hooks
The final hooking method is a neat one, and one of the stealthiest. By generating an exception we can trap into an exception handler. Inside of that exception handler we can then change the value of RIP/EIP (the instruction pointer) to the location of our hook handler. Once we remove our exception generating method and return EXCEPTION_CONTINUE_EXECUTION from the exception handler, our hook handler will begin executing and we have effectively performed a jump to our handler. The exception generating method can be a hardware breakpoint, a software breakpoint, or a guard page.
There is an interesting quirk however. In order to execute our original function we have to remove the exception generating mechanism to avoid calling our exception handler again. This leaves us stuck in figuring out how to restore the exception generating mechanism after we are done executing the original, as we want to be able to intercept the function more than once!
The solution is simple as it turns out: C++ destructors! Since the destructor of an object is guaranteed to run when the object goes out of scope, we can leverage it to have the compiler automatically place a stub that restores the protection after we are done executing the original in our handler! This code example from Stack Overflow will execute a callable on object destruction:
template<typename Func>
class FinalAction {
public:
FinalAction(Func f) :FinalActionFunc(std::move(f)) {}
~FinalAction()
{
FinalActionFunc();
}
private:
Func FinalActionFunc;
};
template <typename F>
FinalAction<F> finally(F f) {
return FinalAction<F>(f);
}
Then internally I call this object like so:
auto GetProtectionObject()
{
return finally([&]() {
});
}
And then when used like so, we get a fully working auto-restoring exception hook:
typedef int(__stdcall* tVEH)(int intparam);
tVEH oVEHTest;
int __stdcall VEHTest(int param)
{
return 3;
}
std::shared_ptr<PLH::VEHHook> VEHHook_Ex;
int __stdcall hkVEHTest(int param)
{
auto ProtectionObject = VEHHook_Ex->GetProtectionObject();
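return oVEHTest(param);
}
void main()
{
// the global pointer must own an object before it is used
VEHHook_Ex = std::make_shared<PLH::VEHHook>();
VEHHook_Ex->SetupHook((BYTE*)&VEHTest, (BYTE*)&hkVEHTest, PLH::VEHHook::VEHMethod::INT3_BP);
VEHHook_Ex->Hook();
oVEHTest = VEHHook_Ex->GetOriginal<tVEH>();
}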
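The ordering that makes the destructor trick work can be seen in isolation with a plain flag standing in for the breakpoint. Everything here (g_breakpointArmed, RearmGuard, CallThroughHook) is a hypothetical stand-in; no real exception handling is involved.

```cpp
#include <cassert>

bool g_breakpointArmed = false;  // stand-in for an INT3, guard page, or debug register
int g_armedDuringOriginal = -1;  // records what the original function observed

// The "original" function: notes whether the breakpoint was armed while it ran.
int Original(int x) {
    g_armedDuringOriginal = g_breakpointArmed ? 1 : 0;
    return x * 2;
}

// Re-arms the breakpoint when it goes out of scope.
class RearmGuard {
public:
    ~RearmGuard() { g_breakpointArmed = true; }
};

// The hook handler: the guard's destructor runs after Original has returned
// and the return value has been computed, but before control leaves Handler.
int Handler(int x) {
    RearmGuard guard;
    return Original(x) + 1;
}

// Models the exception handler: disarm, then redirect execution to Handler.
int CallThroughHook(int x) {
    assert(g_breakpointArmed);
    g_breakpointArmed = false;
    return Handler(x);
}
```

After each call the flag is armed again, so the next call is intercepted too, while the original function itself always runs with the breakpoint removed.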
Detours Tricky Bits
Instruction Splitting:
When writing our jump instruction into the prologue we have to make sure we do not split any instructions. x86/x64 instructions vary in length, while the jmp we use is always 5 bytes in size, so the jump will not necessarily end on an instruction boundary. If we were to, for example, attempt to hook the following function we would corrupt it (just an example prologue, this would never be found in real compiler generated code). The first four lines are the original prologue; the last two are what is left after the jmp is written over it:
0x50 push eax
0xFFD0 call eax
0x51 push ecx
0xFFD1 call ecx
0xE9DEADBEEF JMP 0xDEADBEF4
0xD1 'JUNK
As you can see, the 0xD1 byte would be left over, creating junk code that will most likely cause undefined behavior down the line, leading to hard-to-find crashes.
The solution is to use a disassembly engine to make sure we never split instructions. I use Capstone for the PolyHook project as it's well maintained and very powerful. Internally I measure the size of each assembly instruction I am forced to overwrite and then NOP out any extra bytes of the last instruction, so we would be left with:
0xE9DEADBEEF JMP 0xDEADBEF4
0x90 NOP
The 0xD1 byte is overwritten with a NOP and we have fixed our problem.
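The bookkeeping behind this is small once the instruction lengths are known. In the sketch below the lengths are simply passed in as a list (in practice they would come from a disassembler such as Capstone); OverwritePlan and PlanOverwrite are hypothetical names of mine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct OverwritePlan {
    std::size_t bytesToRelocate; // whole instructions that must move to the trampoline
    std::size_t nopCount;        // padding bytes to write after the jump
    bool ok;                     // false if the function is too short to hook
};

// Walk whole instructions until the jump fits, never stopping mid-instruction.
inline OverwritePlan PlanOverwrite(const std::vector<std::size_t>& instructionLengths,
                                   std::size_t jumpSize) {
    assert(jumpSize > 0);
    std::size_t covered = 0;
    for (std::size_t len : instructionLengths) {
        if (covered >= jumpSize) {
            break;
        }
        covered += len;
    }
    if (covered < jumpSize) {
        return {covered, 0, false};
    }
    return {covered, covered - jumpSize, true};
}
```

For the example prologue the lengths are 1, 2, 1, 2 and the jump is 5 bytes, so 6 bytes are relocated and 1 NOP is written, matching the listing above.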
Code relocation:
As mentioned before, some instructions are relative to their location in memory. When we write the jump instruction into the function prologue we are actually overwriting the code that was there before. So what we do is first copy the original code into our trampoline so that it gets executed when it's time to call the original again. But this trampoline is in a different memory location than the prologue, and it's possible some instructions are of the relative type, so moving them would change their meaning! A real world example of this is the MessageBox prologue:
7FFC0EC2E190: 40 53 push rbx
7FFC0EC2E192: 48 83 EC 30 sub rsp, 0x30
7FFC0EC2E196: 33 DB xor ebx, ebx
7FFC0EC2E198: 39 1D 7A B7 02 00 cmp dword ptr [rip + 0x2b77a], ebx
7FFC0EC2E19E: 74 31 je 0x7ffc0ec2e1d1
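The first relocation is really easy to fix: we simply recalculate the displacement required to get it to point to the original location. It originally points to 0x7FFC0EC2E198+0x2B77A = 0x7FFC0EC59912 (strictly speaking, a RIP-relative displacement is measured from the end of the instruction, so the real target is 6 bytes higher; the instruction length is the same in both places, so it cancels out and I ignore it here). The trampoline with the first relocation fixed is: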
7FFC0EB60000: 40 53 push rbx
7FFC0EB60002: 48 83 EC 30 sub rsp, 0x30
7FFC0EB60006: 33 DB xor ebx, ebx
7FFC0EB60008: 39 1D 0A 99 0F 00 cmp dword ptr [rip + 0xf990a], ebx
7FFC0EB6000E: 74 31 je
7FFC0EB60010: 50 push rax
7FFC0EB60011: 48 B8 A0 E1 C2 0E FC 7F 00 00 movabs rax, 0x7ffc0ec2e1a0
7FFC0EB6001B: 48 87 04 24 xchg qword ptr [rsp], rax
7FFC0EB6001F: C3 ret
0x7FFC0EB60008+0xF990A = 0x7FFC0EC59912; as you can see, we changed the instruction's address while preserving what it meant!
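The displacement fix-up boils down to adding the distance the instruction was moved. A minimal sketch, assuming the instruction keeps the same length after the move (RelocateDisp32 is a hypothetical helper name):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// target = oldAddr + length + oldDisp = newAddr + length + newDisp
// so newDisp = oldDisp + (oldAddr - newAddr); the length cancels out.
// Returns false when the new displacement no longer fits in 32 signed bits.
inline bool RelocateDisp32(std::uint64_t oldInstrAddr, std::uint64_t newInstrAddr,
                           std::int32_t oldDisp, std::int32_t& newDisp) {
    const std::int64_t shift = static_cast<std::int64_t>(oldInstrAddr - newInstrAddr);
    const std::int64_t wide = static_cast<std::int64_t>(oldDisp) + shift;
    if (wide < std::numeric_limits<std::int32_t>::min() ||
        wide > std::numeric_limits<std::int32_t>::max()) {
        return false;
    }
    newDisp = static_cast<std::int32_t>(wide);
    return true;
}
```

Feeding in the cmp instruction from the listing (old address 0x7FFC0EC2E198, new address 0x7FFC0EB60008, displacement 0x2B77A) yields 0xF990A. The false return is the case where the trampoline is too far away for this simple fix.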
The second relocation, however, is hard. The je instruction only has a single byte to encode its displacement, and that byte is signed, meaning it can only reach from -128 to +127 bytes (0x74 means je, the next byte is the displacement). The difference between our function prologue and our trampoline is way bigger than that, however: 7FFC0EC2E190 - 7FFC0EB60000 = CE190. So we instead have to redirect the je to an absolute x64 jmp (credit to MinHook for this). The fixed trampoline then becomes:
7FFC0EB60000: 40 53 push rbx
7FFC0EB60002: 48 83 EC 30 sub rsp, 0x30
7FFC0EB60006: 33 DB xor ebx, ebx
7FFC0EB60008: 39 1D 0A 99 0F 00 cmp dword ptr [rip + 0xf990a], ebx
7FFC0EB6000E: 74 10 je 0x7ffc0eb60020
7FFC0EB60010: 50 push rax
7FFC0EB60011: 48 B8 A0 E1 C2 0E FC 7F 00 00 movabs rax, 0x7ffc0ec2e1a0
7FFC0EB6001B: 48 87 04 24 xchg qword ptr [rsp], rax
7FFC0EB6001F: C3 ret
7FFC0EB60020: 50 push rax
7FFC0EB60021: 48 B8 D1 E1 C2 0E FC 7F 00 00 movabs rax, 0x7ffc0ec2e1d1
7FFC0EB6002B: 48 87 04 24 xchg qword ptr [rsp], rax
7FFC0EB6002F: C3 ret
From this we see that when the je path is taken, it is redirected to the fancy jmp I showed earlier.
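The numbers in the listing can be checked with a short sketch of the short-jump arithmetic. A 2-byte je carries a signed 8-bit displacement measured from the end of the instruction; CalcRel8 and Rel8Target are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Size of a short conditional jump: opcode byte + signed 8-bit displacement.
constexpr std::uint64_t kShortJccSize = 2;

// Where a short jump at `jccAddr` with displacement `rel8` lands.
inline std::uint64_t Rel8Target(std::uint64_t jccAddr, std::int8_t rel8) {
    return jccAddr + kShortJccSize +
           static_cast<std::uint64_t>(static_cast<std::int64_t>(rel8));
}

// Compute the displacement needed to reach `target`.
// Returns false when the target is outside the -128..+127 byte window.
inline bool CalcRel8(std::uint64_t jccAddr, std::uint64_t target, std::int8_t& rel8) {
    const std::int64_t delta =
        static_cast<std::int64_t>(target - (jccAddr + kShortJccSize));
    if (delta < -128 || delta > 127) {
        return false;
    }
    rel8 = static_cast<std::int8_t>(delta);
    return true;
}
```

The original je at 0x7FFC0EC2E19E with displacement 0x31 lands on 0x7FFC0EC2E1D1. From the trampoline that address is out of range, but the absolute jump stub placed at 0x7FFC0EB60020 is reachable with a displacement of 0x10, which is the 74 10 seen in the fixed trampoline.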
After Thoughts
All of the hooking methods require that you create a typedef for the function. If you get a crash, it's more likely that you have an incorrect typedef than a bug in the library; please post your issue in the comments and I'll look at it. If you have any suggestions just message me and I'll take a look!
As a bonus you can use decltype to define the typedef for APIs and such. An example for MessageBoxA:
decltype(&MessageBoxA) oMessageBoxA;
oMessageBoxA = Detour_Ex->GetOriginal<decltype(&MessageBoxA)>();
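A tiny portable illustration of why this helps: decltype copies the full signature straight from the declaration, so it cannot drift from the real function the way a hand-written typedef can. Add is a hypothetical stand-in for an API such as MessageBoxA.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for an API whose exact signature we want to match.
int Add(int a, int b) { return a + b; }

// Hand-written typedef: correct only as long as someone keeps it in sync.
typedef int (*tAdd)(int, int);

// Deduced pointer type: always matches the declaration of Add.
using tAddDeduced = decltype(&Add);

static_assert(std::is_same<tAdd, tAddDeduced>::value,
              "the hand-written typedef no longer matches the function");

// Usable exactly like the typedef-based pointer.
tAddDeduced oAdd = &Add;
```

On compilers where the calling convention is part of the function type, the deduced type picks that up as well.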
Future Development
In the not so distant future I will make this library cross-platform.