API Hooking Revealed Part 2 - Useful tips

xryl669

10 Mar 2005

Some useful tips and tricks that are needed before we can detect a deadlock.

Introduction

This is the second part of a series on building a thread deadlock detector. Please see the first article to understand what is going on: A (working) implementation of API hooking (Part I). The next part (with a working deadlock detector) is also available: Thread Deadlock detector.

The hooked API functions for a thread deadlock detector

To detect a deadlock, the following calls should be intercepted:

Thread functions (Create[Remote]Thread, [Suspend/Resume]Thread, ExitThread, TerminateThread, OpenThread).
Synchronization functions (WaitFor[Single/Multiple]Object[Ex], SignalObjectAndWait, [Set/Reset/Pulse]Event).
Synchronization object creation (Create/Open[Mutex/Semaphore/Event], DuplicateHandle).
Synchronization object deletion (CloseHandle).

To intercept these calls, I simply added the required functions to the previous code. Please have a look at the previous article to see how to add functions to be hooked. The main idea is to declare a hook structure like this:

C++

typedef struct
{
    char   szDLLName[MAX_PATH];  // The DLL name
    char   szFuncName[MAX_PATH]; // The function name
    void * pNewFunc;             // The new function pointer
    void * pPrevFunc;            // The previous function pointer
    Flags  flags;                // The flags (hooked or not, etc...)
} HookStruct;

When hooking the function in all modules, the previous (and true) function pointer is saved in pPrevFunc, while the new function pointer (set to our hooking function) replaces the corresponding entry in each module's IAT (import address table).

Then, in our function, we can simply call the previous function by casting the pPrevFunc pointer to the correct function pointer type. I defined a useful macro for that (the source code will be in part III).

C++

#define CallFunction(X) \
    ((Signature_##X)GetPreviousFunctionAddress(Index_##X))
// with signature defined like this
#define Signature_CreateThread                   HANDLE (FAR PASCAL *)\
(LPSECURITY_ATTRIBUTES lpThreadAttributes, DWORD dwStackSize, \
LPTHREAD_START_ROUTINE lpStartAddress, LPVOID lpParameter, DWORD \
dwCreationFlags, LPDWORD lpThreadId)
// And index like this
#define Index_CreateThread             3
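
To make the mechanism concrete, here is a toy, portable model of it. This is only a sketch under assumptions: a single global pointer plays the role of an IAT entry, RealAdd plays the role of the API function, and ToyHook, InstallHook, MyAdd and CallThroughSlot are names made up for this illustration (the real code patches the import tables of loaded modules, as described in Part I). Only the shape of CallFunction, Signature_X and Index_X follows the macro above.

```cpp
#include <cassert>
#include <cstddef>

using AnyFunc = void (*)();   // generic function pointer (stands in for void *)

struct ToyHook {
    AnyFunc* pSlot;      // where the module keeps the import pointer (toy IAT entry)
    AnyFunc  pNewFunc;   // our replacement function
    AnyFunc  pPrevFunc;  // the true function, saved when the hook is installed
};

// The "real" API and the slot the program calls it through.
int RealAdd(int a, int b) { return a + b; }
AnyFunc g_addSlot = reinterpret_cast<AnyFunc>(&RealAdd);

int g_callsSeen = 0;          // what our hook "reports"
int MyAdd(int a, int b);      // the hook, defined below

#define Index_Add 0
const std::size_t kHookCount = 1;
ToyHook g_hooks[kHookCount] = {
    { &g_addSlot, reinterpret_cast<AnyFunc>(&MyAdd), nullptr }
};

AnyFunc GetPreviousFunctionAddress(std::size_t index) {
    assert(index < kHookCount && g_hooks[index].pPrevFunc != nullptr);
    return g_hooks[index].pPrevFunc;
}

// Same shape as the article's macro: cast the saved pointer to the right type.
#define Signature_Add int (*)(int, int)
#define CallFunction(X) \
    ((Signature_##X)GetPreviousFunctionAddress(Index_##X))

void InstallHook(ToyHook& hook) {
    hook.pPrevFunc = *hook.pSlot;   // remember the true function
    *hook.pSlot    = hook.pNewFunc; // redirect callers to the hook
}

int MyAdd(int a, int b) {
    ++g_callsSeen;                    // the monitoring work would go here
    return CallFunction(Add)(a, b);   // forward to the original function
}

// What the hooked program does: call through its import slot.
int CallThroughSlot(int a, int b) {
    return reinterpret_cast<int (*)(int, int)>(g_addSlot)(a, b);
}
```

After InstallHook(g_hooks[Index_Add]), every call made through the slot lands in MyAdd first, and the caller still gets the result of the original function.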

Okay, we can now call the original function, so let's start the real work.

The communication process

So, how do we inform the server that the "hooked" application is currently performing a monitored action? The simple answer is to fill a structure with the needed information and send it to the server through a WM_COPYDATA message. The needed information is application-dependent; in our case, it consists of:

The current thread handle (who is calling this function).
The current thread ID (the handle is not enough, see below).
The manipulated object ID/handle (what are we touching).
An additional argument (if any).
The current command (what function are we calling).
The object name, if any (useful for debugging only).
The call stack (required to find where the call came from).
The current timestamp (needed to check for the deadlocks).

The structure is then defined as:

C++

typedef struct
{
    void *        lAddress;      // The address pointer in the stack
    unsigned int  lFlags;        // Some flags
} StackPointer;

typedef struct
{
    HANDLE        hObjectID;     // The manipulated object ID/handle
    HANDLE        hThread;       // The current thread handle
    DWORD         dwThreadID;    // The current thread ID
    unsigned int  lState;        // The current state, count, etc...
    HANDLE        hFlag;         // The stack here
    Commands      Command;       // The current command
    char          sName[256];    // The object name (if any)
    LARGE_INTEGER llTimestamp;   // The message timestamp
    StackPointer  xPointers[10]; // The stack pointers
                                 // (only 10 pointers are saved)
} CommunicationObject;
// with Commands being an enum like 
// CmdCreateThread = 0, CmdExitThread = 1, etc...
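
Since the Communicate function itself only comes with the source code in Part III, here is a hedged, portable sketch of what it has to do: stamp the message, then hand a flat block of bytes to the transport. Everything in it is an assumption made for illustration: ToyMessage is a cut-down stand-in for CommunicationObject, ToyCopyData mirrors the three fields of Win32's COPYDATASTRUCT, and a plain callback replaces the SendMessage call with WM_COPYDATA. The point it tries to show is that the payload is copied byte for byte, so it must be plain data, and the handles inside it can only be used as identifiers by the server.

```cpp
#include <chrono>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Cut-down, portable stand-in for the article's CommunicationObject.
struct ToyMessage {
    std::uint64_t objectId;   // manipulated object (handle value, used as an ID only)
    std::uint32_t threadId;   // calling thread ID
    std::uint32_t command;    // e.g. 0 for CmdCreateThread
    std::int64_t  timestamp;  // filled in just before sending
    char          name[32];   // object name, if any
};

// The transport copies raw bytes, so the payload must be trivially copyable.
static_assert(std::is_trivially_copyable<ToyMessage>::value,
              "payload must be plain data");

// Mirrors the fields of Win32's COPYDATASTRUCT (dwData, cbData, lpData).
struct ToyCopyData {
    std::uintptr_t dwData;    // application-defined tag
    std::uint32_t  cbData;    // payload size in bytes
    const void*    lpData;    // payload address in the sender
};

// Stands in for SendMessage(hwndServer, WM_COPYDATA, ...).
using SendFn = void (*)(const ToyCopyData&);

std::int64_t NowTicks() {
    using namespace std::chrono;
    return duration_cast<microseconds>(
        steady_clock::now().time_since_epoch()).count();
}

// Stamp the message, then hand it to the transport.
void Communicate(ToyMessage* msg, SendFn send) {
    msg->timestamp = NowTicks();
    ToyCopyData cds = { 0x54535059u /* arbitrary tag */,
                        static_cast<std::uint32_t>(sizeof(*msg)), msg };
    send(cds);
}

// Server side: copy the bytes out before returning, because the buffer
// is only valid while the message is being processed.
std::vector<ToyMessage> g_log;
void ToyServerReceive(const ToyCopyData& cds) {
    if (cds.cbData != sizeof(ToyMessage)) return;   // reject unexpected sizes
    ToyMessage copy;
    std::memcpy(&copy, cds.lpData, sizeof(copy));
    g_log.push_back(copy);
}
```

A hook would fill a ToyMessage and call Communicate(&msg, &ToyServerReceive). In the real client, the callback would be replaced by the WM_COPYDATA send to the server window.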

A hooking function will then look like:

C++

// Hook for CreateThread (same signature as the original function)
THREADSPY_API HANDLE WINAPI MyCreateThread(
    LPSECURITY_ATTRIBUTES lpThreadAttributes, DWORD dwStackSize,
    LPTHREAD_START_ROUTINE lpStartAddress, LPVOID lpParameter,
    DWORD dwCreationFlags, LPDWORD lpThreadId)
{
    DWORD dwID;
    // Call the original function, always asking for the thread ID ourselves
    HANDLE hHandle = CallFunction(CreateThread)(lpThreadAttributes,
        dwStackSize, lpStartAddress, lpParameter, dwCreationFlags, &dwID);
    if (hHandle != NULL)
    {
        CommunicationObject xObj;
        memset(&xObj, 0, sizeof(xObj));
        // Get the true current thread handle
        xObj.hThread    = GetTrueCurrentThread();
        // Get the manipulated handle (the new thread's handle here)
        xObj.hObjectID  = hHandle;
        // The new thread's ID goes in the state field
        xObj.lState     = dwID;
        // Save the caller thread ID
        xObj.dwThreadID = GetCurrentThreadId();
        // Give the ID back to the caller if it asked for it
        if (lpThreadId != NULL)
        {
            *lpThreadId = dwID;
        }
        // Save the current command
        xObj.Command = CmdCreateThread;
        // Then send the structure (but fill it
        // with timestamp and call stack before)
        Communicate(&xObj, sizeof(xObj));
        // Send another command if the thread is suspended
        if (dwCreationFlags & CREATE_SUSPENDED)
        {
            xObj.lState = 1;
            xObj.Command = CmdSuspendThread;
            CommunicateWithoutTime(&xObj, sizeof(xObj));
        }

        // We don't need the handle anymore
        if (xObj.hThread != NULL) CallFunction(CloseHandle)(xObj.hThread);
    }

    return hHandle;
}

The tricky part

Okay, now we simply have to fill in the structure. It looks easy, but it is not, because of some gaps in the Win32 API. For example, the GetCurrentThread function returns a special pseudo-handle (a constant that always means "the calling thread"), which is of course not useful to the server. This is a consequence of how Windows treats a HANDLE: several handles, in different processes, can refer to the same kernel object. A HANDLE therefore cannot uniquely identify a thread; we need its ID.

While it is easy for any program to store a thread's handle together with its ID, there is no guarantee that the debuggee program keeps such a mapping. That's why we need a way to get the thread ID from its HANDLE. For example, when a thread calls TerminateThread to kill another thread, it passes only the killed thread's handle, not its ID, so the server would never be able to tell which thread was killed (or it would require some kind of matching algorithm). Windows Server 2003 provides a function called GetThreadId to get the thread ID, but because it is only available on Win2K3, it is not useful here.

The solution to this issue is to use the undocumented NtQueryInformationThread function from NTDLL.DLL (NTDLL.DLL is mapped into every process), as the Visual Studio debugger does. When asked for the thread's basic information, this function fills in a structure that contains a CLIENT_ID with the thread ID in it. For more information, please see the Undocumented NT Internal.
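
A minimal sketch of that lookup could look like the following. Please treat it as an illustration under assumptions, not as the code from Part III: the structure layouts are the ones commonly published for these undocumented NT types, and the names ClientId, ThreadBasicInfo, QueryThreadFn, ThreadIdFromHandle and LoadQueryThreadFn are invented here. The query function is passed in as a parameter so that the logic can also be exercised with a fake outside Windows.

```cpp
#include <cstdint>

#if defined(_WIN32)
#define NT_CALL __stdcall
#else
#define NT_CALL   /* no special calling convention outside Windows */
#endif

// Layouts as commonly published for the undocumented NT structures.
struct ClientId {
    void* UniqueProcess;   // process ID stored in a pointer-sized field
    void* UniqueThread;    // thread ID stored in a pointer-sized field
};
struct ThreadBasicInfo {
    std::int32_t   ExitStatus;
    void*          TebBaseAddress;
    ClientId       Client;
    std::uintptr_t AffinityMask;
    std::int32_t   Priority;
    std::int32_t   BasePriority;
};

const int kThreadBasicInformation = 0;   // information class for this query

// Signature of NtQueryInformationThread, written with portable types.
typedef std::int32_t (NT_CALL *QueryThreadFn)(void* hThread, int infoClass,
                                              void* buffer, std::uint32_t length,
                                              std::uint32_t* returnLength);

// Returns the thread ID behind hThread, or 0 on failure.
std::uint32_t ThreadIdFromHandle(QueryThreadFn query, void* hThread) {
    if (query == nullptr) return 0;
    ThreadBasicInfo info = {};
    std::int32_t status = query(hThread, kThreadBasicInformation, &info,
                                static_cast<std::uint32_t>(sizeof(info)), nullptr);
    if (status < 0) return 0;   // success means status >= 0
    return static_cast<std::uint32_t>(
        reinterpret_cast<std::uintptr_t>(info.Client.UniqueThread));
}

#if defined(_WIN32)
#include <windows.h>
// Look the function up in NTDLL.DLL, which is already mapped in the process.
QueryThreadFn LoadQueryThreadFn() {
    HMODULE ntdll = GetModuleHandleA("ntdll.dll");
    if (ntdll == NULL) return nullptr;
    return reinterpret_cast<QueryThreadFn>(
        GetProcAddress(ntdll, "NtQueryInformationThread"));
}
#endif
```

On Windows, a hook would call ThreadIdFromHandle(LoadQueryThreadFn(), hThread). The handle typically needs query access to the thread for the call to succeed, so a return value of 0 should be treated as "unknown thread".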

Now that we can identify the object being manipulated, we need to get the stack trace. This can be done with the StackWalk function from the Windows debugging help library. Usually, this function is used by stopping the debuggee thread, retrieving its context, and then resuming the thread. As we don't want to stop the thread (because that would change the scheduling order while being debugged), we need to fill in a context structure ourselves. The trick is to read the EIP and EBP registers before calling StackWalk, which can be done easily with a few ASM instructions like:

C++

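CONTEXT c;

_EnterCriticalSection(&mhSection);
// This is ugly code to get the starting point of the stack trace:
// "call" pushes the address of the next instruction, and "pop" reads
// it back, which gives us the current EIP
__asm
{
    call GetEIP
    GetEIP:
    pop eax
    mov c.Eip, eax
    mov c.Ebp, ebp
};
_LeaveCriticalSection(&mhSection);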
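
Why are EIP and EBP enough to start from? On x86, code compiled with frame pointers keeps a linked list on the stack: EBP points at the caller's saved EBP, and the return address sits right after it. The toy model below walks such a chain. It is only a sketch of the idea: ToyFrame, WalkFrames and kMaxPointers are names invented for this example, and the real StackWalk does more than this (for instance, it can cope with functions that do not set up a frame pointer).

```cpp
#include <cstddef>

// Models the pair found at the address held in EBP.
struct ToyFrame {
    const ToyFrame* callerFrame;    // the caller's saved EBP ([EBP])
    const void*     returnAddress;  // where the caller resumes ([EBP+4])
};

const std::size_t kMaxPointers = 10;   // the article saves only 10 stack pointers

// Follow the chain of saved frame pointers and collect the return addresses.
// Returns the number of entries written to out.
std::size_t WalkFrames(const ToyFrame* frame, const void* out[],
                       std::size_t capacity) {
    std::size_t count = 0;
    while (frame != nullptr && count < capacity) {
        out[count++] = frame->returnAddress;
        frame = frame->callerFrame;
    }
    return count;
}
```

The innermost call comes first in the output, and the walk stops either at the end of the chain or when the buffer is full, which matches the fixed xPointers[10] array of the communication structure.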

The last trick is to detect when a thread has stopped. Because the injected DLL cannot create a thread of its own without disturbing the program, it supplies a function called CheckRunningThread that has the same signature as a thread start routine: it takes a thread handle (valid in the client's memory space) and returns a DWORD describing the thread state. Using the same method that injected the DLL (CreateRemoteThread), the server can stop the process being debugged, create a remote thread that starts in the CheckRunningThread function, read the thread status, and then resume the debuggee. This way, the server can tell when a thread finishes and detect objects that are never released.

To conclude

This article gives answers to several things that are supposedly impossible (according to MSDN), such as:

Get a thread ID from its thread handle (Google around, and you'll see it is a real issue).
Get the stack trace of a running thread (again it is not a usual practice).
Get a real thread handle instead of the default generic value (using DuplicateHandle).
Detect when a thread in a remote process has finished.
Communicate with a server about any action.
Hook any Win32 API function.

The drawback is that it requires an NT-based version of Windows (such as 2K, XP or 2K3), but I'm sure this is not an issue nowadays.

That is all for the client part. I will provide the source code for both the server and the client in the next part (Part III). We will then see how to get the entire log of the synchronization function calls made in a process, and how to map the call stack values to function names.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
