Rabbit Threads: Making Threads Jump

14 Nov 2007, CPOL
Compel threads to execute out of context code using inline assembly.

Introduction

A firm understanding of multithreaded programming is crucial for successful Windows programming. While Microsoft provides information for standard threaded programming through MSDN topics such as Processes and Threads [1], at times, more information is desired so that one may take advantage of non-procedural programming techniques.

Jeffrey Richter discusses when and why to create a thread, and when not to create a thread [2]. This article will discuss how one can misuse threads while exploring Operating System data structures associated with threads. The misuse will be coined 'Rabbit Threads' (a neologism introduced by Peter Szor) in spirit with the use of the term rabbit (which hops around) to describe a computer virus which lives on the host in favor of another.

Definitions

The following is a brief summary of the terms used throughout this article. Anyone who has attended a Mathematics conference should appreciate the brevity.

Program

A program is a static on-disk representation of a process. When a program is executed, it becomes a process [3]. Russinovich and Solomon provide a detailed discussion of process creation in Microsoft Windows Internals, Fourth Edition: Windows Server 2003, Windows XP, and Windows 2000 [4].

Process

Jeffrey Richter describes a process as an instance of a running program [5]. Each process provides the resources needed to execute a program. A process has a virtual address space, executable code, open handles to system objects, a security context, a unique process identifier, environment variables, a priority class, minimum and maximum working set sizes, and at least one thread of execution [6].

Thread

A thread is the entity within a process that can be scheduled for execution by the Operating System [7]. Each thread maintains user and kernel stack areas, exception handlers, a scheduling priority, thread local storage, a unique thread identifier, and a set of structures the system will use to save the thread context until it is scheduled [8]. On x86 Windows NT, the kernel stack size is 12 KB, while x64 platforms enjoy a 24 KB kernel stack [9].

Fiber

A fiber is a unit of execution that must be manually scheduled by the application. Fibers run in the context of the threads that schedule them [10]. Fibers were added to Windows to assist in porting UNIX server applications to Windows [21].

PCB

Process Control Block or Kernel Process Block. The PCB is a KPROCESS structure, which is part of the EPROCESS data structure. The structure is used by the kernel to schedule threads.

EPROCESS

Executive Process Block is the Windows kernel data structure which represents a process [11]. This structure is accessible through the PsLookupProcessByProcessId function [12].

ETHREAD

Executive Thread Block is the Windows kernel data structure which represents a thread [13]. This structure is accessible through the PsLookupThreadByThreadId function [14].

PEB

The Process Environment Block contains information such as environment variables [15] and the TLS (Thread Local Storage) array [16]. However, because the PEB contains information which may be modified by the program, it lies in the process address space [17]. A pointer to the PEB may be obtained through the EPROCESS block.

TEB

The Thread Environment Block is part of the ETHREAD data structure [15]. Because the TEB contains information which may be modified by the program, it lies in the process address space [16].

WinDbg

Visual Studio 8.0 exception while attempting to trace a rabbit thread

WinDbg and kd will be used to investigate the kernel's management of threads. A kernel debugger such as WinDbg or kd is required because the objects and structures of interest are not available in the user area. Because these objects are kernel objects, Visual Studio does not know how to interpret their data. In addition, Visual Studio 2005 does not cope well with the techniques that follow. Finally, the method results in an "sp-analysis failed" message from IDA Pro when building a call graph for analysis.

Basic WinDbg use was introduced in An Analysis of the Windows PE Checksum Algorithm [19]. The following will extend the discussion. To examine the kernel object associated with processes and threads, start WinDbg. Accept the default parameters as they do not affect the local debug session. Before clicking OK, select the Local tab. Otherwise, WinDbg will attempt to establish a connection on a COM port. Refer to figures 2 and 3. Note that initiating a Kernel Debug session will allow the use of the debugger. Otherwise, one would have to load an executable to have access to the Command window.

Figure 2: Starting a Kernel Debug Session
Figure 3: Local Debugging

The structures of interest for this article are EPROCESS and ETHREAD. To view the ETHREAD structure, issue the dt _ethread command. dt is the Display Types command. Refer to Figure 4.

Figure 4: ETHREAD Structure

The values to the left of the members are the hexadecimal offsets of the members. Some members, such as the Tcb, are structures. To recurse a substructure, issue the same command with recursion: dt -r _ethread. To specify levels of recursion, provide an argument with the switch. For example, -r1 or -r3. Refer to Figure 5.

Figure 5: Substructure Enumeration
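
To make the offset column concrete, the sketch below prints member offsets for a pair of invented structures in the same +0x notation. DemoTcb and DemoThread are hypothetical stand-ins and do not reflect the real _KTHREAD or _ETHREAD layouts; only the offsetof mechanics are the point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical stand-ins; these are NOT the real kernel structure layouts.
struct DemoTcb {
    std::uint32_t Header;      // offset 0x000 within DemoTcb
    std::uint32_t State;       // offset 0x004 within DemoTcb
};

struct DemoThread {
    DemoTcb       Tcb;         // nested structure, playing the role of Tcb
    std::uint64_t CreateTime;  // placed after the 8-byte nested block
    std::uint32_t ExitStatus;
};

// Print each member with its hexadecimal offset. Members of the nested
// structure are printed relative to the start of that structure.
inline void dump_demo_thread_layout() {
    // The nested block is the first member, so it starts at offset 0.
    assert(offsetof(DemoThread, Tcb) == 0);
    std::printf("+0x%03zx Tcb\n", offsetof(DemoThread, Tcb));
    std::printf("   +0x%03zx Header\n", offsetof(DemoTcb, Header));
    std::printf("   +0x%03zx State\n", offsetof(DemoTcb, State));
    std::printf("+0x%03zx CreateTime\n", offsetof(DemoThread, CreateTime));
    std::printf("+0x%03zx ExitStatus\n", offsetof(DemoThread, ExitStatus));
}
```

Calling dump_demo_thread_layout() from any test program prints one line per member, with the nested members indented in the way a recursive dump indents a substructure.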

Multithreaded Programming

The following example is a typical multithreaded program. The program creates a worker thread which simply exits. It also demonstrates basic synchronization and the proper release of the acquired thread handle. This will provide a baseline for investigating threading behavior. The CreateThread() documentation may be found in MSDN [17].

For those who are not interested in WinDbg exploration or already understand the basic example, please skip to 'Rabbit Threads 2' below.

C++
#include <windows.h>

// Worker thread procedure, defined below
DWORD WINAPI ThreadProc( LPVOID lpParameter );

int main( ) {
    HANDLE hWorkerThread = NULL;
    DWORD dwWorkerThreadID = 0;

    // Create the worker suspended so it does not run until resumed
    hWorkerThread = CreateThread( NULL, 0,
        reinterpret_cast< LPTHREAD_START_ROUTINE >( ThreadProc ),
        NULL, CREATE_SUSPENDED, &dwWorkerThreadID );

    // Sanity Check
    if( hWorkerThread == NULL ) { return -1; }

    // Start Worker
    ResumeThread( hWorkerThread );

    // Synchronization: block until the worker exits
    WaitForSingleObject( hWorkerThread, INFINITE );

    // Cleanup
    CloseHandle( hWorkerThread );

    return 0;
}

And the corresponding worker thread procedure:

C++
DWORD WINAPI ThreadProc( LPVOID lpParameter ) {

    return 0;
}

Conceptually, the program flow can be depicted as in Figure 6.

Figure 6: Program Flow

Rabbit Threads 1

Though example one is trivial, it will be used to get familiar with a live target under WinDbg, and provide background for later examples.

To examine the program flow in WinDbg, open the executable from the File menu. Next, set a breakpoint on main to skip nonessential initializations. To determine the proper function, issue dt RabbitThreads!*main*. From Figure 7, it is apparent the breakpoint should be set at wmain, so issue bp RabbitThreads!wmain. Alternately, one can issue either bp 0x00401000 or bp wmain.

Figure 7: Determining Breakpoint Location for main()

Finally, breakpoints can be set as follows on thread entry points:

0:000> bp wmain
0:000> bp ThreadProc
0:000> bl
0 e 00401000 0001 (0001) 0:**** RabbitThreads!wmain
1 e 00401540 0001 (0001) 0:**** RabbitThreads!ThreadProc

Pressing F5 or typing 'g' in the Command Window will cause the program to run until the breakpoint at wmain is reached. Once the breakpoint fires, WinDbg will also display the source listing, if available. Refer to Figure 8.

Figure 8: wmain() Breakpoint and Source Code

PEB

With the program execution stopped at wmain(), issue !peb to view the Process Environment Block. Figure 9 is the output of the command issued from the Command Browser.

Figure 9: Process Environment Block

For a verbose output with offsets, one can issue dt -r _PEB 0x7FFDC000 to dump the memory at 0x7FFDC000 as the Process Environment Block (see Figure 10).

Figure 10: Raw Process Environment Block

TEB

The Thread Environment Block can be examined using !teb. Notice the Self field specifying 0x7FFDF000. Refer to Figure 11.

Figure 11: Thread Environment Block

To view the raw memory, issue dt -r _TEB 0x7FFDF000. Refer to Figure 12.

Figure 12: Raw Thread Environment Block

Restarting the program and breaking after thread creation, two threads exist: the primary thread (executing wmain()), and the worker thread (executing ThreadProc()). The Thread Status Command ('~') displays information related to the two threads which are suspended due to breakpoints. Note that Thread 1 has a period ('.') to the left. This signifies the active thread, so commands issued which affect a thread will use the thread 0x84C. This is known as the thread context. Additionally, the context will be displayed at the Command Window prompt: 0:000> signifies the thread 0, while 0:001> signifies thread 1. Refer to Figure 13.

Figure 13: WinDbg Thread Status Command

0x8F4 (2292) is the Process ID. The first thread's ID is 0xDE4, and the second thread's ID is 0x84C. This is confirmed by using the Process Explorer as below (Figure 14). Note that 0x84C = 2124 while 0xDE4 = 3556. When observing the PID and TID in the various kernel structures, they are internally referred to as a ClientID (CID). According to Russinovich and Solomon, this is because the same namespace is used to generate both PIDs and TIDs [20].
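
Because WinDbg reports IDs in hexadecimal while Process Explorer shows decimal, a small helper can save some mental arithmetic when matching the two. The sketch below is only an illustration; id_from_hex and print_id are hypothetical names that do not appear in the sample project.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Convert an ID written in hexadecimal (with or without a leading "0x")
// to its numeric value.
inline unsigned long id_from_hex(const std::string& text) {
    assert(!text.empty());
    return std::stoul(text, nullptr, 16);
}

// Print an ID in both notations, for example "TID 1: 0x84C = 2124".
inline void print_id(const char* label, const std::string& hex) {
    const unsigned long value = id_from_hex(hex);
    std::printf("%s: 0x%lX = %lu\n", label, value, value);
}
```

For the session above, print_id("PID", "0x8F4"), print_id("TID 0", "0xDE4") and print_id("TID 1", "0x84C") reproduce the decimal values 2292, 3556 and 2124 quoted in the text.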

Figure 14: Process Explorer View

At this point, issuing !handle shows the handle table of the process (Figure 15).

Figure 15: Process Handle Table

After the thread which is executing main() calls CloseHandle( hWorkerThread ), the usage count on the worker thread is decremented to 0, allowing the Operating System to reclaim the resources. Refer to Figure 16.

Figure 16: Process Handle Table

Rabbit Threads 2

The second sample will demonstrate a single threaded application accessing the next instruction by way of a push/ret pair. The push/ret pair is depicted by the loop below. The pair keeps the stack balanced so that an additional adjustment is not required. Conceptually, Figure 17 depicts the program flow.

Figure 17: Program Flow

Below is the C++ source code with inline assembly which accomplishes the task. LOCATION is resolved to an address. When the compiler encounters the label LOCATION, its name (LOCATION) and address are added to a table for code generation.

C++
int main( )
{
    DWORD dwProcessID = 0;
    DWORD dwPrimaryThreadID = 0;
    DWORD dwLocation = NULL;
 
    dwProcessID = GetCurrentProcessId();
    cout << _T("Process ID = 0x");
    HEXADECIMAL_OUTPUT(4);
    std::tcout << dwProcessID << endl;
 
    dwPrimaryThreadID = GetCurrentThreadId();
    cout << _T("Primary thread ID = 0x");
    HEXADECIMAL_OUTPUT(4);
    cout << dwPrimaryThreadID << endl;
 
    __asm {
 
        push eax
        mov eax, LOCATION
        mov dwLocation, eax
        pop eax
    }
 
    cout << _T("Target return address = 0x");
    HEXADECIMAL_OUTPUT(8);
    cout << dwLocation << endl;
 
    __asm {
 
        push LOCATION
        ret
    }
 
LOCATION:
 
    return 1;
}

Using a CONTEXT structure with GetThreadContext() and SetThreadContext() proved to be more difficult than inline assembly using labels. When fetching or setting a context, the thread must be suspended. In addition, there was no comfort in using the functions since portability was abandoned implicitly by using the CONTEXT structure.

The assembler code below is not required for the jump itself. It is included only because the compiler issues a C2451 error when attempting to send LOCATION directly to cout, so the address is first copied into dwLocation.

ASM
__asm {
 
    push eax
    mov eax, LOCATION
    mov dwLocation, eax
    pop eax
}

Verifying correct program execution using WinDbg is shown in Figure 18. The upper right window is the Source Code window. Under the debugger, the ret instruction of the inline assembly will be the next instruction executed. The lower Command window displays the address at which the program will begin executing upon transfer of control (0x004011C7). The final window is the Disassembly window. Note from the Disassembly window that the program begins preparing the return value of 1 at location 0x004011C7.

Figure 18: Program Execution with WinDbg

IDA Pro successfully graphs this thread when analyzing the program flow. However, IDA Pro incorrectly labels the code segment loc_4011C9, rather than loc_4011C7. In Figure 19, the larger node is main(), and the smaller node is the code equating to return 1. The right hand figure is the improper labeling of the code.

Figure 19: IDA Pro Analysis

Rabbit Threads 3

Example 3

The third example combines the first two examples. The sample will demonstrate a worker thread which rabbits out to execute the primary thread's code. It will also exit by way of the primary thread's code (through main()), rather than its own ThreadProc() shown to the right. Conceptually, this is presented in Figure 20.

Figure 20: Program Flow

C++
int main( )
{
    HANDLE hWorkerThread = NULL;
    DWORD dwProcessID = 0;
    DWORD dwPrimaryThreadID = 0;
    DWORD dwWorkerThreadID = 0;
    DWORD dwExitArea = NULL;
 
    __asm {
 
        push eax
        mov eax, EXITAREA
        mov dwExitArea, eax
        pop eax
    }
 
    hWorkerThread = CreateThread( NULL, 0,
        reinterpret_cast< LPTHREAD_START_ROUTINE >( ThreadProc ),
        reinterpret_cast< LPVOID> ( &dwExitArea ),
        CREATE_SUSPENDED, &dwWorkerThreadID );
 
    // Sanity Check

    if( hWorkerThread == NULL ) { return -1; }
 
    // Start Worker

    ResumeThread( hWorkerThread );
 
    // Synchronization

    WaitForSingleObject( hWorkerThread, INFINITE );
 
    // Cleanup

    CloseHandle( hWorkerThread );
 
EXITAREA:
 
    return 1;
}

The noteworthy addition to this code is the inline assembly to determine the location for execution before the call to CreateThread(). This was required since the location is being passed as a parameter to the worker thread's ThreadProc. In keeping with the prototype of ThreadProc, a pointer to the location is being passed to ThreadProc, rather than the location itself. Shown below is the worker thread's code:

C++
DWORD WINAPI ThreadProc( LPVOID lpParameter )
{
    DWORD dwLocation = NULL;
  
    if( NULL == lpParameter ) { return -2; }
 
    dwLocation = * ( reinterpret_cast< DWORD* > ( lpParameter ) );
 
    __asm {
 
        push dwLocation
        ret
    }
 
    return 1;
}

Execution of the code is as expected - the worker thread begins executing code located in main() after the ret. Below is the graph of main() from IDA. The additional labels were added to aid in visualization. Again, a slight mislabeling occurred. Refer to Figure 21.

Figure 21: IDA Pro Analysis of main()

IDA Pro claims SP-Analysis failed when generating the graph of ThreadProc (Figure 22).

Figure 22: IDA Pro Analysis of ThreadProc()

Rabbit Threads 4

Demonstration four will build upon three so that the worker thread exits using ThreadProc after being mildly promiscuous. While the worker is in main(), it will print a message stating such, and then hop back into its code context. Its depiction is shown in Figure 22.

Figure 22: Program Flow

Below is the primary thread's code. There is one addition: a test to determine which thread is executing. If the currently executing thread is the worker thread, a ret is executed.

C++
if( GetCurrentThreadId() == dwWorkerThreadID )
{
    _asm ret
}

The return is examined in detail in ThreadProc.

C++
int main( )
{
    HANDLE hWorkerThread = NULL;
    DWORD dwProcessID = 0;
    DWORD dwPrimaryThreadID = 0;
    DWORD dwWorkerThreadID = 0;
    DWORD dwCurrentThreadID = 0;
    DWORD dwLandingArea = NULL;
 
    __asm {
 
        push eax
        mov eax, LANDINGAREA
        mov dwLandingArea, eax
        pop eax
    }
 
    hWorkerThread = CreateThread( NULL, 0,
        reinterpret_cast< LPTHREAD_START_ROUTINE >( ThreadProc ),
        reinterpret_cast< LPVOID> ( &dwLandingArea ),
        CREATE_SUSPENDED, &dwWorkerThreadID );
 
    // Sanity Check

    if( hWorkerThread == NULL ) { return -1; }
 
    // Start Worker

    ResumeThread( hWorkerThread );
 
    // Synchronization

    WaitForSingleObject( hWorkerThread, INFINITE );
 
    // Cleanup

    CloseHandle( hWorkerThread );
 
LANDINGAREA:
 
    dwCurrentThreadID = GetCurrentThreadId();
    cout << _T("Executing main function thread ID = 0x");
    HEXADECIMAL_OUTPUT(4);
    cout << dwCurrentThreadID << endl;
 
    if( GetCurrentThreadId() == dwWorkerThreadID )
    {
        _asm ret
    }
 
    return 1;
}

And, the corresponding ThreadProc function is shown below. To prepare for the ret to be executed in main(), the desired return address is placed on the worker thread's stack while still in ThreadProc.

C++
DWORD WINAPI ThreadProc( LPVOID lpParameter )
{
    DWORD dwExitAddress = NULL;
 
    if( NULL == lpParameter ) { return -2; }
 
    dwExitAddress = * ( reinterpret_cast< DWORD* > ( lpParameter ) );
 
    __asm {
 
        push RETURNAREA
        push dwExitAddress
 
        ret
    }
 
RETURNAREA:
    DWORD dwCurrentThreadID = GetCurrentThreadId();
    cout << _T("Exiting worker function thread ID = 0x");
    HEXADECIMAL_OUTPUT(4);
    cout << dwCurrentThreadID << endl;
 
    return 2;
}
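
The two pushes followed by two rets can be followed on paper with a toy model of the worker's stack. The sketch below is a simulation only: ToyThread, kLandingArea and kReturnArea are invented for illustration, the addresses are hypothetical, and the model assumes that whatever main() pushes between LANDINGAREA and its ret is popped again before the ret executes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of the two pieces of thread state that matter here:
// a stack of 32-bit values and an instruction pointer.
struct ToyThread {
    std::vector<std::uint32_t> stack;
    std::uint32_t eip = 0;

    void push(std::uint32_t value) { stack.push_back(value); }

    // ret: pop the top of the stack into the instruction pointer.
    void ret() {
        assert(!stack.empty());
        eip = stack.back();
        stack.pop_back();
    }
};

// Hypothetical addresses standing in for the two labels.
constexpr std::uint32_t kLandingArea = 0x00401300; // LANDINGAREA in main()
constexpr std::uint32_t kReturnArea  = 0x00401560; // RETURNAREA in ThreadProc()

// Replay the worker's hops and return the addresses it lands on, in order.
inline std::vector<std::uint32_t> replay_worker_hops() {
    ToyThread worker;
    std::vector<std::uint32_t> visited;

    worker.push(kReturnArea);    // push RETURNAREA
    worker.push(kLandingArea);   // push dwExitAddress
    worker.ret();                // ret in ThreadProc: hop into main()
    visited.push_back(worker.eip);

    worker.ret();                // ret in main(): hop back to ThreadProc
    visited.push_back(worker.eip);

    assert(worker.stack.empty()); // both pushes have been consumed
    return visited;
}
```

In this model the first ret consumes the value pushed last, so the address pushed first is still waiting on the stack when the ret in main() runs.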

Figure 23 is the result of running sample four. Notice that both threads exited through main(). The problem lies with dwWorkerThreadID. dwWorkerThreadID is a memory location. The code generated by the compiler implicitly expects the location to be at a certain address relative to EBP in the context of the primary thread. The worker thread retrieved the value of dwWorkerThreadID using the worker's stack, which proved to be an incorrect value.

Figure 23: Program Results
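
The failure can be reduced to a model in which the compiled test always reads the same slot of whichever frame is current. The sketch below is a simplification under stated assumptions: ToyFrame, kWorkerIdSlot and the helper names are hypothetical, slot 1 merely stands in for the EBP-relative location of the variable, and the IDs 0xDE4 and 0x84C are borrowed from the first example purely as sample values.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// A frame is modelled as a handful of 32-bit slots addressed by a fixed index.
using ToyFrame = std::array<std::uint32_t, 4>;

// Stand-in for the fixed EBP-relative location chosen by the compiler.
constexpr std::size_t kWorkerIdSlot = 1;

// What the generated comparison effectively does: read the fixed slot from
// the frame that is current for the executing thread.
inline std::uint32_t read_worker_id(const ToyFrame& current_frame) {
    return current_frame[kWorkerIdSlot];
}

inline bool looks_like_worker(const ToyFrame& current_frame,
                              std::uint32_t current_thread_id) {
    return current_thread_id == read_worker_id(current_frame);
}

// Primary thread's frame: the slot really holds the worker's ID.
inline ToyFrame make_primary_frame() { return ToyFrame{0, 0x84C, 0, 0}; }

// Worker thread's frame: the same slot holds unrelated data.
inline ToyFrame make_worker_frame() { return ToyFrame{0, 0x0050FFAC, 0, 0}; }
```

With these frames, looks_like_worker(make_worker_frame(), 0x84C) is false even though the caller is the worker, which matches the observed behaviour of both threads falling through to return 1.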

Figure 24 reveals the generated code which is causing the incorrect program results.

Figure 24: Relative Base Address of dwWorkerThread

According to WinDbg, dwWorkerThreadID is located at the Relative Base Address of EBP-8. This would be true if the stack were that of the primary thread. Dumping the memory of the worker thread at EBP-8 reveals a value of 0x0050FFAC. Dereferencing offers a value of 0x004012D8. This is clearly not a thread ID - it appears to be an address in the process space. For completeness, a second dereference was performed on 0x004012D8, resulting in 0x300015FF. Refer to Figure 25.

Figure 25: Worker Thread Stack

Further investigation showed that 0x004012D8 is the first byte of the instruction sequence FF1500304000 - call dword ptr [GetCurrentThreadId].
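
The connection between the byte sequence FF1500304000 and the dereferenced value 300015FF is simply x86 little-endian byte order. As a quick check, the sketch below assembles 32-bit values from the quoted bytes; dword_from_le and the other names are hypothetical helpers, not part of the sample project.

```cpp
#include <cstdint>

// Assemble a 32-bit value from four bytes stored least significant byte
// first, which is how x86 lays out a DWORD in memory.
inline std::uint32_t dword_from_le(const unsigned char* bytes) {
    return  static_cast<std::uint32_t>(bytes[0])
         | (static_cast<std::uint32_t>(bytes[1]) << 8)
         | (static_cast<std::uint32_t>(bytes[2]) << 16)
         | (static_cast<std::uint32_t>(bytes[3]) << 24);
}

// The six instruction bytes quoted in the text.
constexpr unsigned char kCallBytes[6] = {0xFF, 0x15, 0x00, 0x30, 0x40, 0x00};

// Reading a DWORD at the first instruction byte, as the debugger did.
inline std::uint32_t first_dword_of_call() {
    return dword_from_le(kCallBytes);
}

// The four bytes after the FF 15 opcode pair form the operand address.
inline std::uint32_t call_operand() {
    return dword_from_le(kCallBytes + 2);
}
```

first_dword_of_call() yields 0x300015FF, the value seen in the debugger, while call_operand() yields 0x00403000, the memory location through which the indirect call is made.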

To resolve this issue, the test will be changed to the following. There is a small chance this too will produce an incorrect result, but far less often than the previous test, which returned an incorrect result nearly every time.

C++
if( GetCurrentThreadId() != dwPrimaryThreadID )

An additional solution could have been obtained by making dwWorkerThreadID a global variable. However, as a Computer Science freshman knows, a global variable is an inappropriate solution.

For the same reason that dwWorkerThreadID was incorrect, the following would incorrectly overwrite a value in the worker thread's stack. This is because dwCurrentThreadID would be based upon code generated for the primary thread (relative to EBP of the primary thread):

C++
dwCurrentThreadID = GetCurrentThreadId();
if( dwCurrentThreadID != dwPrimaryThreadID )
...

The modified code results are shown in Figure 26. Notice how each thread claims to be "Executing main function", and that the worker thread exits from its procedure, and the primary thread exits through main.

Figure 26: Proper Program Execution

As with previous examples, IDA did not complete the visualization of main. In addition, the label should have been loc_4013F9 rather than loc_4013FE. A similar situation occurred with respect to ThreadProc.

Figure 27: IDA Pro Visualization of main()

Rabbit Threads 5

Example 5 removes the ambiguity of Example 4 by providing the correct value for the thread's ID. It accomplishes this by mirroring the stack frame of the primary thread. Note that there will be differences in the values of the variables when comparing the frames (such as return addresses). What is of interest are the local variable allocations.

C++
DWORD WINAPI ThreadProc( LPVOID lpParameter )
{
    HANDLE hWorkerThread = NULL;    // Mirroring Stack Frame Structure
    DWORD dwProcessID = 0;          //   of the code which executes main()
    DWORD dwPrimaryThreadID = 0;
    DWORD dwWorkerThreadID = 0;
    DWORD dwCurrentThreadID = 0;
    DWORD dwLandingArea = NULL;

    // The only variables of interest
    //   for this thread
    DWORD dwExitAddress = NULL;

    dwWorkerThreadID = GetCurrentThreadId();

    ...
}

With the mirrored frames in place, one can perform the previous test with certainty of the results:

C++
if( GetCurrentThreadId() == dwWorkerThreadID )
{
     _asm ret
}

There is one caveat to using this method: the compiler does not reserve stack space in the order in which variables are declared. Figure 28 below demonstrates the warning. Notice that the variables are initialized in the order in which they were declared, but the addresses are out of order: 0x12FF64, 0x12FF70, 0x12FF68.

Figure 28: Stack Layout of Variables

Figure 29 represents the executable under WinDbg. WinDbg shows relative based addressing, which fully demonstrates the issue. Again, variables are initialized as declared, but are not in the declared order on the thread's stack.

Figure 29: Stack Layout of Variables (Relative to EBP)

The compiler allocation issue means the programmer must be aware of allocation strategies, and verify the results under a debugger or a symbol tool.
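
One inexpensive way to see what a particular compiler did, short of a debugger, is to print the addresses of the locals at run time. The sketch below is an assumption-laden illustration: LocalAddresses, capture_local_addresses and print_local_layout are hypothetical names, and the order printed depends entirely on the compiler, its version and its optimization settings.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>

// Addresses of three locals, recorded in declaration order.
struct LocalAddresses {
    std::uintptr_t first;
    std::uintptr_t second;
    std::uintptr_t third;
};

// Declare three locals and record where the compiler placed each one.
// volatile keeps the variables in memory rather than in registers.
inline LocalAddresses capture_local_addresses() {
    volatile unsigned long first = 1;
    volatile unsigned long second = 2;
    volatile unsigned long third = 3;

    LocalAddresses out;
    out.first  = reinterpret_cast<std::uintptr_t>(&first);
    out.second = reinterpret_cast<std::uintptr_t>(&second);
    out.third  = reinterpret_cast<std::uintptr_t>(&third);
    return out;
}

// Print the addresses; nothing guarantees they ascend or descend in
// declaration order.
inline void print_local_layout() {
    const LocalAddresses a = capture_local_addresses();
    std::printf("first  at 0x%" PRIxPTR "\n", a.first);
    std::printf("second at 0x%" PRIxPTR "\n", a.second);
    std::printf("third  at 0x%" PRIxPTR "\n", a.third);
}
```

The only property that can be relied upon is that the three addresses are distinct; any ordering observed in one build should be re-verified after changing compiler options.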

Acknowledgements

  • Ken Johnson, Microsoft MVP

Revisions

  • 11.14.2007 - Expanded PEB information.
  • 11.07.2007 - Added Example 5.
  • 11.01.2007 - Initial release.

References

  1. Microsoft website, Processes and Threads, accessed October 2007.
  2. J. Richter, Programming Applications for Microsoft Windows, 4ed., Microsoft Press, 1999, pp. 182-184, ISBN 1-5723-1996-8.
  3. M. Russinovich and D. Solomon, Microsoft Windows Internals, Fourth Edition: Windows Server 2003, Windows XP and Windows 2000, Microsoft Press, 2005, p. 6, ISBN 0-7356-1917-4.
  4. M. Russinovich and D. Solomon, Microsoft Windows Internals, Fourth Edition: Windows Server 2003, Windows XP and Windows 2000, Microsoft Press, 2005, p. 6, ISBN 0-7356-1917-4.
  5. J. Richter, Programming Applications for Microsoft Windows, 4ed., Microsoft Press, 1999, p. 69, ISBN 1-5723-1996-8.
  6. Microsoft website, About Processes and Threads, accessed October 2007.
  7. Microsoft website, Multiple Threads, accessed October 2007.
  8. Microsoft website, About Processes and Threads, accessed October 2007.
  9. Microsoft website, How do I keep my driver from running out of kernel-mode stack?, accessed October 2007.
  10. Microsoft website, Fibers, accessed October 2007.
  11. M. Russinovich and D. Solomon, Microsoft Windows Internals, Fourth Edition: Windows Server 2003, Windows XP and Windows 2000, Microsoft Press, 2005, p. 289, ISBN 0-7356-1917-4.
  12. Microsoft website, PsLookupProcessByProcessId, accessed October 2007.
  13. M. Russinovich and D. Solomon, Microsoft Windows Internals, Fourth Edition: Windows Server 2003, Windows XP and Windows 2000, Microsoft Press, 2005, p. 289, ISBN 0-7356-1917-4.
  14. Microsoft website, PsLookupThreadByThreadId, accessed October 2007.
  15. Microsoft website, Environment Variables, accessed October 2007.
  16. M. Russinovich and D. Solomon, Microsoft Windows Internals, Fourth Edition: Windows Server 2003, Windows XP and Windows 2000, Microsoft Press, 2005, p. 291, ISBN 0-7356-1917-4.
  17. Microsoft website, CreateThread Function, accessed October 2007.
  18. M. Russinovich and D. Solomon, Microsoft Windows Internals, Fourth Edition: Windows Server 2003, Windows XP and Windows 2000, Microsoft Press, 2005, p. 289, ISBN 0-7356-1917-4.
  19. J. Walton, An Analysis of the Windows PE Checksum Algorithm, accessed October 2007.
  20. M. Russinovich and D. Solomon, Microsoft Windows Internals, Fourth Edition: Windows Server 2003, Windows XP and Windows 2000, Microsoft Press, 2005, p. 12, ISBN 0-7356-1917-4.
  21. J. Richter, Programming Applications for Microsoft Windows, 4ed., Microsoft Press, 1999, p. 417, ISBN 1-5723-1996-8.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)