Grafting Compiled Code: Unlimited Code Reuse

7 Mar 2008 | CPOL | 17 min read
Add functionality to a project using existing compiled machine code.

Introduction

This article will demonstrate techniques to incorporate compiled machine code into an existing project using ASM source files. The assembly source file will be created from the compiled machine code. In addition, the article will remove the single thread safety limitation of Imagehlp.dll, and show techniques for converting a compiled STDCALL procedure to a C-CALL assembly language routine.

Example one will present a standard console application. The program adds two numbers and then returns the result. The example will introduce a few of the obstacles which will be encountered. Sample two will incorporate the graft using the compiled machine code of Add() from the first.

Finally, example three will use foreign code from imagehlp.dll to supplement the PEChecksum program. This will remove the requirement of the imagehlp library dependency from a project and fully demonstrate the techniques. The PEChecksum program was presented in An Analysis of the PE Checksum Algorithm. Those who are familiar with x86 assembly, WinDbg, and IDA Pro should begin at example three.

The samples will use standard C as the language. For those interested in reusing compiled C++, please see Paul Vincent Sabanal's and Mark Vincent Yason's BlackHat 2007 presentation, Reversing C++.

Optimizations

To keep the first two examples academic, optimizations will be disabled. In general, release code built with optimizations lacks the structure desired for the samples presented in this article, so release samples are built with optimizations disabled (/Od). Refer to Figure 1.

Figure 1: Disable Optimizations

With optimizations enabled, some functions were inlined. For example, Figure 2 shows no call corresponding to a function that is present in the source file. wmain is entered at 0x00401000. The first function call encountered is at 0x0040101D, which is a call to cout. In this case, Add() was optimized away.

Figure 2: Missing Function Call Due to Optimization

Another undesired optimization is Frame Pointer Omission (FPO). FPO was disabled to keep the examples easier to follow. The Frame Pointer is generally created with the instruction pair push EBP followed by mov EBP, ESP. The lack of a Frame Pointer adds a small wrinkle to an otherwise academic exercise. Frame Pointers are addressed in the 'Stack, ESP, and EBP' section.

Global Variables

Variables which are global to the process are placed in either the .data section or (if present) the .bss section. The sections correspond to the initialized and uninitialized data sections, respectively. Usually, a program refers to a global variable by absolute address rather than by base-relative addressing (such as the EBP-relative addressing used for local variables). For example, notice the code generated for access to the global scratch variable in Figure 3.

Figure 3: Global Variable Storage Access

The address used is 0x00417000. When examining the program in PE Browse, the variable is listed under the .data section at address 0x00417000. Refer to Figure 4.

Figure 4: Allocation of a Global Variable

The disassembly and PE header of an uninitialized global variable are shown in Figure 5. The four byte values at address 0x00403374 are garbage: CR, LF, [SPACE], and [SPACE].

Figure 5: Allocation of an Uninitialized Global Variable

The meaning of push ECX will be examined in the section Local Variables.

Local Variables

Local variables are stored on the thread's stack. Figure 5 shows a lone push of ECX, even though ECX is not used in main() per se. This is a compiler technique for creating local storage for the variable i. Rather than issuing sub ESP, 4 (a three byte instruction, 0x83 0xEC 0x04), the compiler uses push ECX (a one byte instruction, 0x51).

Figure 6: Local Allocation of a Single Variable

When a greater number of variables require storage allocations, a sub ESP, n is used (where n is the number of bytes required). For example, Figure 7 shows the allocation of five DWORD variables. Rather than issuing a series of five push ECX, the compiler issues one sub ESP, 0x14.

Figure 7: Local Allocation of Multiple Variables

Stack, ESP, and EBP

The stack is an area in memory which a thread uses as a 'scratch pad' during program execution. Each thread in a process has its own stack. One typically envisions memory for the stack as a contiguous region starting at a low address and moving sequentially to a large address. This is similar to addressing in a heap or an array - A[0] resides at a lower address than A[63]. However, unlike most memory access operations, stacks grow down.

ESP is the stack pointer, and is maintained by the processor. EBP (if used) is maintained by the thread. When the processor encounters a push n instruction, two actions occur in the order listed below:

  • the processor decrements ESP by the machine's word size
  • the value n is placed on the stack at ESP

This implies ESP always points to the last value placed on the stack.
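The two steps can be modeled in a few lines of C++. This is only a sketch: the ToyStack type, its 16-slot size, and the starting value of esp are illustrative assumptions and are not taken from the samples.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a 32-bit, downward-growing stack (hypothetical type;
// the 16-slot size is chosen only for illustration, with no bounds checks).
struct ToyStack {
    std::uint32_t memory[16] = {};   // backing store, indexed by (address / 4)
    std::uint32_t esp = 16 * 4;      // starts one slot past the highest address

    void push(std::uint32_t value) {
        esp -= 4;                    // step 1: decrement ESP by the word size
        memory[esp / 4] = value;     // step 2: store the value at the new ESP
    }

    std::uint32_t pop() {
        std::uint32_t value = memory[esp / 4];  // read what ESP points to
        esp += 4;                               // then release the slot
        return value;
    }

    // Last value pushed (only meaningful when the stack is not empty).
    std::uint32_t top() const { return memory[esp / 4]; }
};
```

After each push, esp is four bytes lower and top() returns the value just pushed, which is the property stated above.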

Figure 8: Argument Size and Call Stack

Because the push of a value is always of machine word size, pushing five consecutive bytes consumes 20 bytes (0x14) on the stack, even though the local allocation could use only two DWORDs (8 bytes). Finally, there is no zero extension for the truncated push of a byte. Whatever occupied the upper three bytes of the register surfaces as part of the function's parameters, even though only the low order byte is of interest. This is embodied in the instruction sequence mov al, byte ptr [a]; push eax. Refer to Figure 8.
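The lack of zero extension can be illustrated in C++. The two helper functions below are hypothetical models of the instructions named in their comments; the value 0xDEADBEEF stands in for whatever the register held beforehand.

```cpp
#include <cassert>
#include <cstdint>

// Models `mov al, byte ptr [a]`: only the low 8 bits of the register change.
constexpr std::uint32_t write_low_byte(std::uint32_t reg, std::uint8_t value) {
    return (reg & 0xFFFFFF00u) | value;
}

// Models `movzx eax, byte ptr [a]`: the upper 24 bits are cleared.
constexpr std::uint32_t zero_extend(std::uint8_t value) {
    return value;
}

// The stale upper bytes survive the byte move and would be pushed with it.
static_assert(write_low_byte(0xDEADBEEFu, 0x41) == 0xDEADBE41u, "upper bytes survive");
static_assert(zero_extend(0x41) == 0x00000041u, "upper bytes cleared");
```

A callee that reads only the low byte of the pushed DWORD is unaffected, which is why the compiler can skip the extension.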

The final comment to make with regard to code generation is the compiler's awareness of multiple pipelines in the processor. Rather than reusing eax by issuing:

ASM
mov  al, byte ptr [a]
push eax
...
mov  al, byte ptr [d]
push eax
...

the compiler will rotate register usage (eax, ecx, edx) so that the execution pipe remains full. This optimization is critical to performance since there are no branches which might otherwise stall execution due to a branch prediction miss. So we could expect to see the following:

ASM
mov  al, byte ptr [a]
push eax
...
mov  dl, byte ptr [d]
push edx
...

When creating a stack frame for based address referencing, the compiler will issue a standard pair of instructions. The frame creates a well defined "function context". The typical prologue for stack based operations is the sequence:

ASM
push EBP
mov  EBP, ESP

The above sequence is emitted in each function invoked so that the function does not accidentally destroy the stack pointer (ESP). Conversely, when a function exits, one typically encounters the restoration of the stack and of EBP. This is required since the calling function works from a different frame reference:

ASM
pop EBP
ret

One should conclude that EBP (if used) is relative to the function, while ESP is relative to the thread. This implies that when one sees EBP-0xn, the function is referring to its own local storage. EBP+0xn signals that the function is accessing a value placed on the stack by the calling function (typically an argument), or by a function further up the call chain.

OR and XOR

When viewing a disassembly, it is not uncommon to encounter xor eax, eax, and or eax, 0xFFFFFFFF. The first instruction is equivalent to mov eax, 0, while the second is equivalent to mov eax, -1. They are optimized versions of the generated code. Usually, the instruction sequences use less space than their equivalent cousins.
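The space saving can be checked by writing out the encodings. The byte sequences below are the standard 32-bit encodings of the four instructions (an assembler may choose the alternate 0x31 0xC0 form for xor); the array and function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Machine code for each form (32-bit mode).
constexpr std::uint8_t xor_eax_eax[]  = {0x33, 0xC0};                    // xor eax, eax
constexpr std::uint8_t mov_eax_zero[] = {0xB8, 0x00, 0x00, 0x00, 0x00};  // mov eax, 0
constexpr std::uint8_t or_eax_m1[]    = {0x83, 0xC8, 0xFF};              // or  eax, 0FFFFFFFFh
constexpr std::uint8_t mov_eax_m1[]   = {0xB8, 0xFF, 0xFF, 0xFF, 0xFF};  // mov eax, -1

// The arithmetic identities the two idioms rely on.
constexpr std::uint32_t xor_self(std::uint32_t x)    { return x ^ x; }
constexpr std::uint32_t or_all_ones(std::uint32_t x) { return x | 0xFFFFFFFFu; }

static_assert(sizeof(xor_eax_eax) < sizeof(mov_eax_zero), "2 bytes versus 5");
static_assert(sizeof(or_eax_m1)   < sizeof(mov_eax_m1),   "3 bytes versus 5");
```

Whatever the register held before, xor_self() yields 0 and or_all_ones() yields 0xFFFFFFFF, so the shorter forms are safe replacements for the moves.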

Code Graft 1

Code Graft 1 is the baseline on which the remaining examples expand. Due to the compiler and linker's behavior, the first example is somewhat more complex than desired. The following will detail the issues encountered and outline the general workarounds used in this article.

The source code to the first sample is listed below. main() calls Add(), which adds two numbers. The result is then displayed on standard output.

C++
DWORD Add( DWORD Augend, DWORD Addend );   // forward declaration for main()
 
int main( )
{
    DWORD Augend = 32437;   // 0x7EB5
    DWORD Addend = 15369;   // 0x3C09
    DWORD Sum = 0;          // becomes 0xBABE after Add()
    
    Sum = Add ( Augend, Addend );
 
    cout << _T("Augend: ") << Augend << endl;
    cout << _T("Addend: ") << Addend << endl;
    cout << _T("   Sum: ") << Sum << endl;
 
    return 0;
}
 
DWORD Add( DWORD Augend, DWORD Addend )
{
    DWORD result = Augend + Addend;
 
    return result;
}

The first issue encountered is that storage layout does not honor source code declarations. Refer to Figure 9.

Figure 9: Storage Layout vs. Source Code Declaration

The variables are declared and initialized in the following order:

  • Augend
  • Addend
  • Sum

However, the layout in memory is:

  • Addend (EBP-0x04) - high memory
  • Augend (EBP-0x08)
  • Sum (EBP-0x0C) - low memory

The second issue encountered is the Add() function's use of variables. Add() creates a scratch variable (result) and adds the two values. The result is then returned to main(). Since Add() accepts two arguments - Augend and Addend, one would expect the function to operate on EBP+0x04 and EBP+0x08. EBP+0x04 and EBP+0x08 are the expected relative base addresses since it is presumed they have been pushed on the stack. For the temporary result, it is expected the value would be returned in one of two ways:

  • through the use of the Sum variable at EBP+0x00
  • through the use of EAX

However, with execution halted in Add(), a different scenario is observed. Refer to Figure 10.

Figure 10: Storage Layout vs. Source Code Declaration

Before executing the instructions of Add() (but after entering the function), the stack appears as below. Refer to Figure 11.

Figure 11: Stack Layout After Calling Add()

Once the function has been executed (but before the execution of the return), the stack layout is as shown in Figure 12.

Figure 12: Stack Layout

Table 1 explains the values with respect to their stack addresses. These are virtual addresses on the thread's stack, not RVAs.

| Item | Address  | Value    | Comment                      |
|------|----------|----------|------------------------------|
| 1    | 0x12FF5C | BABE     | Add::result (created by Add) |
| 2    | 0x12FF60 | 0012FF7C | EBP of main                  |
| 3    | 0x12FF64 | 00401028 | Return address               |
| 4    | 0x12FF68 | 7EB5     | Add::Augend (pushed by main) |
| 5    | 0x12FF6C | 3C09     | Add::Addend (pushed by main) |
| 6    | 0x12FF70 | 0        | main::Sum                    |
| 7    | 0x12FF74 | 7EB5     | main::Augend                 |
| 8    | 0x12FF78 | 3C09     | main::Addend                 |

Table 1: Stack Layout
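The addresses in Table 1 follow from the two frame pointers. The sketch below recomputes them; slot() is a hypothetical helper, and the value used for Add()'s EBP is an inference from the table (push ebp stored main's EBP at 0x12FF60, and mov ebp, esp then made that address Add()'s frame pointer).

```cpp
#include <cassert>
#include <cstdint>

// Address of a stack slot at a signed byte offset from a frame pointer.
constexpr std::uint32_t slot(std::uint32_t ebp, std::int32_t offset) {
    return ebp + static_cast<std::uint32_t>(offset);
}

constexpr std::uint32_t kAddEbp  = 0x12FF60;  // EBP inside Add() (inferred, see above)
constexpr std::uint32_t kMainEbp = 0x12FF7C;  // EBP of main (value stored in item 2)

// Add(): the local sits below EBP, the arguments start at EBP+8.
static_assert(slot(kAddEbp, -0x04) == 0x12FF5C, "Add::result");
static_assert(slot(kAddEbp, +0x04) == 0x12FF64, "return address");
static_assert(slot(kAddEbp, +0x08) == 0x12FF68, "Add::Augend");
static_assert(slot(kAddEbp, +0x0C) == 0x12FF6C, "Add::Addend");

// main(): all three locals sit below main's EBP.
static_assert(slot(kMainEbp, -0x04) == 0x12FF78, "main::Addend");
static_assert(slot(kMainEbp, -0x08) == 0x12FF74, "main::Augend");
static_assert(slot(kMainEbp, -0x0C) == 0x12FF70, "main::Sum");
```

Note that the arguments are found at EBP+0x08 and EBP+0x0C rather than EBP+0x04 and EBP+0x08, because the saved EBP and the return address occupy the first two slots.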

Code Graft 2

Example two is Code Graft 1 less the function for Add() in the C++ source file. Code Graft 1's generated code for Add() under WinDbg is shown below.

ASM
00401cc0 55           push    ebp
00401cc1 8bec         mov     ebp, esp
00401cc3 51           push    ecx
00401cc4 8b4508       mov     eax, dword ptr [ebp+8]
00401cc7 03450c       add     eax, dword ptr [ebp+0Ch]
00401cca 8945fc       mov     dword ptr [ebp-4], eax
00401ccd 8b45fc       mov     eax, dword ptr [ebp-4]
00401cd0 8be5         mov     esp, ebp
00401cd2 5d           pop     ebp
00401cd3 c3           ret

The Graft

At this point, the donor (Code Graft 1) provides 20 bytes of code for the recipient (Code Graft 2). The easiest way to incorporate the functionality is through an assembly file (with a custom build step) added to the project. This method has the added benefit of supporting both x86 and x64 routines, since inline assembly is not used.
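The 20 byte figure can be confirmed by tallying the opcode bytes from the WinDbg listing above. The array name and constants are illustrative; the bytes themselves are transcribed from the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Opcode bytes of Add(), transcribed from the WinDbg listing.
constexpr std::uint8_t kAddCode[] = {
    0x55,              // push ebp
    0x8B, 0xEC,        // mov  ebp, esp
    0x51,              // push ecx
    0x8B, 0x45, 0x08,  // mov  eax, dword ptr [ebp+8]
    0x03, 0x45, 0x0C,  // add  eax, dword ptr [ebp+0Ch]
    0x89, 0x45, 0xFC,  // mov  dword ptr [ebp-4], eax
    0x8B, 0x45, 0xFC,  // mov  eax, dword ptr [ebp-4]
    0x8B, 0xE5,        // mov  esp, ebp
    0x5D,              // pop  ebp
    0xC3               // ret
};

constexpr std::uint32_t kFirstByte = 0x00401CC0;  // address of push ebp
constexpr std::uint32_t kLastByte  = kFirstByte + sizeof(kAddCode) - 1;

static_assert(sizeof(kAddCode) == 20, "donor code is 20 bytes");
static_assert(kLastByte == 0x00401CD3, "last byte is the ret");
```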

There are other methods available to incorporate the graft. The first is to edit the memory directly in WinDbg, which creates only a temporary exhibit. Second, inline assembly could be used to emit the instruction sequence; the downside is that inline assembly is not supported on x64 platforms. The third alternative is patching. Patching an executable usually falls under the purview of viruses and crackers, and is left as an exercise to the reader.

Two changes were required to successfully link the executable:

  • Change Add() to Addition()
  • Function prototype of Addition() was changed to extern "C"

Add() was changed to Addition() because add is a reserved word in the assembler (MASM). extern "C" was added due to the name mangling and link error LNK2001: unresolved external symbol "unsigned long __cdecl Addition(unsigned long,unsigned long)" (?Addition@@YAKKK@Z). The nearly unchanged C++ file is shown below:

C++
extern "C" DWORD Addition( DWORD, DWORD );
 
int main( int argc, char* argv[] )
{
    DWORD Augend = 32437;   // 0x7EB5
    DWORD Addend = 15369;   // 0x3C09
    DWORD Sum = 0;
    
    Sum = Addition ( Augend, Addend );
 
    cout << _T("Augend: ") << Augend << endl;
    cout << _T("Addend: ") << Addend << endl;
    cout << _T("   Sum: ") << Sum << endl;
 
    return 0;
}
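The effect of extern "C" can be tried before the assembly file exists. The sketch below is an assumption-laden stand-in: the C++ body of Addition() is hypothetical and is not part of the article's project (the real sample supplies the symbol from Addition.asm), and std::uint32_t is used in place of DWORD so the snippet does not depend on the Windows headers.

```cpp
#include <cassert>
#include <cstdint>

// C linkage: the symbol is emitted without C++ name mangling, so a
// procedure defined in an assembly module can satisfy it at link time.
extern "C" std::uint32_t Addition(std::uint32_t augend, std::uint32_t addend);

// Hypothetical stand-in body, for experimentation only. Remove it once
// Addition.asm is part of the build, or the symbol will be defined twice.
extern "C" std::uint32_t Addition(std::uint32_t augend, std::uint32_t addend) {
    return augend + addend;
}
```

Calling Addition(32437, 15369) returns 47806 (0xBABE), the same sum the grafted routine is expected to produce.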

To begin, create a file named Addition.asm in the project directory. Next, add the file to the project. Refer to Figure 13.

Figure 13: Adding an ASM File to the Project

In later versions of Visual Studio, the environment will ask if it should use the masm.rules Custom Build Rules. Select OK. Refer to Figure 14.

Figure 14: MASM Custom Build Rule

If Custom Build Rules is not available, add the following as a Custom Build Step:

  • Debug Command Line
    • ml -c -Zi "-Fl$(IntDir)\$(InputName).lst" "-Fo$(IntDir)\$(InputName).obj" "$(InputPath)"
  • Release Command Line
    • ml -c "-Fl$(IntDir)\$(InputName).lst" "-Fo$(IntDir)\$(InputName).obj" "$(InputPath)"
  • Outputs
    • $(IntDir)\$(InputName).obj

After adding the ASM file to the project, the project will appear as in Figure 15.

Figure 15: Addition of ASM File

Next, add the following to Addition.asm. The code below demonstrates the minimum requirements for an assembly procedure.

ASM
PUBLIC Addition
 
.486
.MODEL FLAT, C
 
.CODE
 
Addition PROC
 
    push ebp                        ; Save Caller's EBP
    mov ebp, esp                    ; Grab our Frame Reference 
    push ecx                        ; Storage for local 'result'
    mov eax, dword ptr [ebp+8]      ; Augend 
    add eax, dword ptr [ebp+0Ch]    ; Addend
    mov dword ptr [ebp-4], eax      ; Temporary 'result'
    mov eax, dword ptr [ebp-4]      ; ??? Already in EAX
    mov esp, ebp                    ; Clean 'push ECX' from stack
    pop ebp                         ; Restore Caller's EBP
 
    ret
 
Addition ENDP
 
END     ; End of module

PUBLIC Addition informs the linker that the procedure Addition is available for any module to use. .486 is a Processor Directive. .MODEL is a Simplified Segment Directive which directs MASM to generate code for a particular memory model. The language ("C") informs the assembler of the calling convention.

.CODE is another Simplified Segment Directive. .CODE begins the code section, while END marks the end of the module (and therefore of the code section). Other sections exist, such as .DATA. PROC and ENDP are Procedure Directives which book-end the Addition function. Additional procedures would be book-ended in a similar manner with a different label. Residing in the Addition procedure is the copy and paste code from Code Graft 1.

After compiling and linking, the first thing noticed is that the second executable is 0x200 bytes (512 bytes, the usual PE file alignment unit, rather than a 16 byte paragraph) smaller than the first program. Refer to Figure 14.

Figure 14: Comparison of Executable File Sizes

However, the generated code has remained unchanged as far as execution of main() and Addition(). Refer to Figure 15.

Figure 15: Source Code Analysis in WinDbg

Altered PE Checksum using Grafted Code

Code Graft 3

Code Graft 3 will demonstrate code grafting from a foreign executable. Specifically, it will reuse CheckSumMappedFile() from Imagehlp.dll. Imagehlp.dll is a single threaded library, so this is an opportunity to improve the function. A nearly complete treatment of the PE Checksum algorithm was presented in An Analysis of the Windows PE Checksum Algorithm.

To begin, download the PE Checksum source code. Open StdAfx.h, comment out the references to imagehlp.dll, and add a prototype for CheckSumMemMapFile(). The name was changed from CheckSumMappedFile() because the project was still linking with Imagehlp.lib (even though it was not specified).

C++
extern "C" {
  PIMAGE_NT_HEADERS /*WINAPI*/ CheckSumMemMapFile(
    PVOID BaseAddress,
    DWORD FileLength,
    PDWORD ExistingCheckSum,
    PDWORD CalculatedCheckSum
  );
}

extern "C" is required due to name mangling. Notice also that WINAPI (a macro for __stdcall) is missing. This is due to link errors when attempting to link the object files. I suspect this might be a packing issue, but I have not investigated further.

LNK1190: invalid fixup found, type 0x0002

Since the model is no longer STDCALL, the routines used from Imagehlp.dll will require conversion. The three most prevalent issues are:

  • STDCALL to C Call Conversion (stack cleanup)
  • Addition of Local Frame References (EBP)
  • Artifact Cleanup

Add an assembly file to the project named "CheckSum.asm". Create three procedures in CheckSum.asm: CheckSumMemMapFile, _ChkSum, and _ImageNtHeader. Acquire the assembly code for CheckSumMappedFile() and ChkSum() from imagehlp.dll and place it in CheckSum.asm under their respective procedures. Leave _ImageNtHeader empty at this point. _ImageNtHeader will be a hand coded replacement used in lieu of Imagehlp.dll's call to RtlpImageNtHeader() of NTDLL.DLL.

The only procedure which requires the PUBLIC attribute is CheckSumMemMapFile. This leaves the _ChkSum and _ImageNtHeader 'private' procedures for use by CheckSumMemMapFile.

Alternately, use the listing files for the functions provided in this article. The files are CheckSummMappedFile.listing and ChkSum.listing. The listing files were created from a Copy and Paste operation in WinDbg while examining the original CodeGraft.exe. Refer to Figure 16.

Figure 16: WinDbg Copy and Paste

Labels

At this point, the listing includes memory addresses, opcodes, and mnemonics. Create labels for any jumps encountered (changing CheckSumMappedFile to CheckSumMemMapFile). For example, at 0x76c96f3b is the following instruction:

ASM
76c96f3b eb1d     jmp     imagehlp!CheckSumMappedFile+0x4f (76c96f5a)

The jump target is 0x76C96F5A. At that location, create a label. Note that the label name is based on the location provided by the disassembly (the '+' has been changed to '_'):

ASM
76c96f3b eb1d          jmp     imagehlp!CheckSumMappedFile+0x4f (76c96f5a)
...
76c96f57 8b7de4        mov     edi,dword ptr [ebp-1Ch]
CheckSumMemMapFile_0x4f:
76c96f5a 85c0          test    eax,eax

Finally, clean the original instruction to coincide with the jump to the label:

ASM
76c96f3b eb1d          jmp     CheckSumMemMapFile_0x4f
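The target WinDbg prints in parentheses can be cross-checked from the opcode bytes. The sketch below covers only two byte short jumps (opcode plus an 8 bit signed displacement, such as eb 1d); short_jump_target() is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Target of a 2-byte short jump (opcode + rel8). The displacement is signed
// and is applied to the address of the NEXT instruction (address + 2).
constexpr std::uint32_t short_jump_target(std::uint32_t address, std::uint8_t rel8) {
    return address + 2u +
           static_cast<std::uint32_t>(static_cast<std::int8_t>(rel8));
}

// 76c96f3b eb1d  jmp imagehlp!CheckSumMappedFile+0x4f (76c96f5a)
static_assert(short_jump_target(0x76C96F3Bu, 0x1D) == 0x76C96F5Au, "forward jump");
```

A displacement of 0x80 or above is negative, so the same helper also resolves backward jumps.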

Artifacts

There are areas of the code which appear to be artifacts. Examine 0x76c96fc8 for instance. Since there is no assembly mnemonic to generate the opcode, create the code using the DB directive. Note that when using hex notation in MASM, prefix the number with a '0'. DUP is an operator which creates a data byte the requested number of times.

ASM
;; 76c96fc8 ff              ???
;; 76c96fc9 ff              ???
;; 76c96fca ff              ???
DB 3 DUP(0FFh)
 
;; 76c96fcb ff426f          inc     dword ptr [edx+6Fh]
DB 0FFh, 042h, 06Fh
 
;; 76c96fce c9              leave
DB 0C9h
 
;; 76c96fcf 764b            jbe     imagehlp!MapFileAndCheckSumA+0x43 (76c9701c)
DB 076h, 04Bh
 
;; 76c96fd1 6f              outs    dx,dword ptr [esi]
DB 06Fh
 
;; 76c96fd2 c9              leave
DB 0C9h
 
;; 76c96fd3 7690            jbe     imagehlp!CheckSumMappedFile+0x5a (76c96f65)
DB 076h, 090h
 
;; 76c96fd5 90              nop
;; 76c96fd6 90              nop
;; 76c96fd7 90              nop
;; 76c96fd8 90              nop
DB 4 DUP(090h)
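One plausible reading, offered as an assumption rather than something verified here, is that the first twelve artifact bytes are data which the disassembler decoded as instructions. Reading them as little-endian DWORDs gives -1 followed by two addresses that fall between the jmp at 0x76c96f3b and its target at 0x76c96f5a. The names below are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// The twelve bytes starting at 0x76c96fc8, transcribed from the listing.
constexpr std::uint8_t kArtifact[] = {
    0xFF, 0xFF, 0xFF, 0xFF,
    0x42, 0x6F, 0xC9, 0x76,
    0x4B, 0x6F, 0xC9, 0x76
};

// Read a little-endian DWORD starting at the given byte offset.
constexpr std::uint32_t read_dword(const std::uint8_t* bytes, std::size_t offset) {
    return  static_cast<std::uint32_t>(bytes[offset])
         | (static_cast<std::uint32_t>(bytes[offset + 1]) << 8)
         | (static_cast<std::uint32_t>(bytes[offset + 2]) << 16)
         | (static_cast<std::uint32_t>(bytes[offset + 3]) << 24);
}

static_assert(read_dword(kArtifact, 0) == 0xFFFFFFFFu, "-1");
static_assert(read_dword(kArtifact, 4) == 0x76C96F42u, "address inside the function");
static_assert(read_dword(kArtifact, 8) == 0x76C96F4Bu, "address inside the function");
```

Either way, emitting the bytes with DB preserves them exactly, so the graft does not depend on which interpretation is correct.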

Finally, we can remove the unneeded material in the listing. To remove an item in the listing, simply comment it:

ASM
;; push    10h
;; push    offset `string'+0x3c (76c96fc8)
;; call    _SEH_prolog (76c934b9)
    
mov     esi,dword ptr [ebp+10h]
and     dword ptr [esi],0
mov     eax,dword ptr [ebp+0Ch]
shr     eax,1
push    eax
push    dword ptr [ebp+8]
push    0
    
; 76c96f1e e856d6ffff      call    ChkSum (76c94579)    
call _ChkSum

Additional Fixups

The original code installed a Structured Exception Handler upon entry. CodeGraft.exe wraps the call in its own handler, so the installation can be skipped. The handler is removed by commenting out the call above. This creates a stack imbalance that will be addressed in the STDCALL to C CALL conversion.

STDCALL to C CALL Conversion

This step requires the most analysis. This is due to the fact that Frame Pointers are missing. So each procedure will receive the customary:

ASM
push ebp
mov ebp, esp

Once the additional push is in place, care must be taken with code/stack dependencies. The CheckSumMemMapFile procedure is shown below. Instructions in capital letters were added for stack management. Commented lines were removed. Finally, STDCALL performs a ret n, where n is an adjustment to ESP. C-CALL uses a vanilla ret, with the caller performing the stack adjustment. The result of the cleanup is available as CodeGraft4.zip.
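The difference between the two conventions comes down to who restores ESP. The following sketch tracks ESP arithmetically; the function names are hypothetical, and any starting ESP value can be used.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kWord = 4;  // one 32-bit stack slot

// ESP on entry to the callee: the caller pushed `args` DWORDs and the
// call instruction pushed the return address.
constexpr std::uint32_t esp_at_entry(std::uint32_t esp, std::uint32_t args) {
    return esp - args * kWord - kWord;
}

// STDCALL: the callee's `ret n` pops the return address and n argument bytes.
constexpr std::uint32_t after_stdcall(std::uint32_t esp, std::uint32_t args) {
    return esp_at_entry(esp, args) + kWord + args * kWord;
}

// C call, right after a plain `ret`: the arguments are still on the stack.
constexpr std::uint32_t after_cdecl_ret(std::uint32_t esp, std::uint32_t args) {
    return esp_at_entry(esp, args) + kWord;
}

// C call, after the caller's `add esp, n`.
constexpr std::uint32_t after_cdecl_cleanup(std::uint32_t esp, std::uint32_t args) {
    return after_cdecl_ret(esp, args) + args * kWord;
}
```

With three arguments, after_cdecl_ret() leaves ESP 0x0C bytes below its starting value, which is exactly what the ADD ESP, 0Ch following call _ChkSum corrects.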

CheckSumMemMapFile

ASM
CheckSumMemMapFile PROC

    ;;push    10h
    ;;push    offset `string'+0x3c (76c96fc8)
    ;   Inspecting 0x76c96fc8 shows this is '-1'...
    ;   push 0FFFFFFFFh
    ;; 76c96f08 e8acc5ffff      call    _SEH_prolog (76c934b9)

    PUSH EBP        ; Reference
    MOV EBP, ESP
    
    SUB ESP, 10h    ; Space for 4 Temporary Variables
                    ; T1: EBP-10h use in place of ebp-18h
                    ; T2: EBP-0Ch use in place of ebp-1Ch
                    ; T3: EBP-08h use in place of ebp-20h
                    ; T4: EBP-04h use in place of ebp-04h
                    
    mov     esi,dword ptr [ebp+10h]     ; Header CheckSum Variable (Read From PE Header)
    and     dword ptr [esi],0           ;   Header CheckSum = 0
    mov     eax,dword ptr [ebp+0Ch]     ; File Size
    shr     eax,1                       ;   File Size = File Size / 2
    push    eax                         ; Parameter 3: File Size
    push    dword ptr [ebp+8]           ; Parameter 2: Source (pBaseAddress)
    push    0                           ; Parameter 1: Partial Sum
    
    ; 76c96f1e e856d6ffff      call    _ChkSum@4(76c94579)    
    call _ChkSum
    
    ;; No Longer STDCALL
    ;;   Clean the parameters from the Stack
    ADD ESP, 0Ch
 
    mov     edi,eax                     ; EDI = Return from _ChkSum
    mov     dword ptr [EBP-0Ch],edi     ; Sum
    and     dword ptr [EBP-04h],0       ; File Size = 0???
 
    ;; push    dword ptr [ebp+8]
    ;; 76c96f2f e81ed2ffff      call    RtlpImageNtHeader (76c94152)
    push dword ptr [ebp+8]              ; Source (pBaseAddress)
    call _ImageNtHeader
    ADD ESP, 4                          ; Stack Maintenance - No longer STDCALL
 
    mov     dword ptr [EBP-08h],eax
    or      dword ptr [EBP-04h],0FFFFFFFFh    ; EBP-04h = -1
    jmp     _CheckSumMemMapFile_0x4f

    ;; Retain the Noise Bytes
    DB 5 DUP (090h)

    xor     eax,eax
    inc     eax
    ret

    ;; Retain the Noise Bytes
    DB 5 DUP (090h)

    mov     esp,dword ptr [EBP-10h]           ; Local Temporary Storage
    xor     eax,eax
    or      dword ptr [EBP-04h],0FFFFFFFFh    ; Local Temporary Storage
    mov     esi,dword ptr [ebp+10h]           ; Local Temporary Storage
    mov     edi,dword ptr [EBP-0Ch]           ; Local Temporary Storage
 
_CheckSumMemMapFile_0x4f:
    test    eax,eax
    je      _CheckSumMemMapFile_0x90
    cmp     eax,dword ptr [ebp+8]
    je      _CheckSumMemMapFile_0x90
    mov     cx,word ptr [eax+18h]
    cmp     cx,10Bh
    je      _CheckSumMemMapFile_0x6a
    cmp     cx,20Bh
    jne     _CheckSumMemMapFile_0xb5
 
_CheckSumMemMapFile_0x6a:
    lea     ecx,[eax+58h]           ; Existing (Header) Checksum
    mov     edx,dword ptr [ecx]     ; This routine removes the existing
    mov     dword ptr [esi],edx     ;   Checksum from the calculated value
    xor     edx,edx                 ;
    mov     dx,word ptr [ecx]       ; Notice the use of Subtract with Borrow (sbb)
    cmp     di,dx                   ;
    sbb     esi,esi                 ; This is consistent with the Documentation stating
    neg     esi                     ;  'Calculate the checksum of the file
    add     esi,edx                 ;  with the existing taken as 0.'
    sub     edi,esi
    movzx   ecx,word ptr [ecx+2]
    cmp     di,cx
    sbb     edx,edx
    neg     edx
    add     edx,ecx
    sub     edi,edx
 
_CheckSumMemMapFile_0x90:
    mov     ecx,dword ptr [ebp+0Ch]
    test    cl,1
    je      _CheckSumMemMapFile_0xa3
    mov     edx,dword ptr [ebp+8]
    movzx   dx,byte ptr [edx+ecx-1]
    add     edi,edx
 
_CheckSumMemMapFile_0xa3:
    movzx   edx,di
    add     edx,ecx
    mov     ecx,dword ptr [ebp+14h]
    mov     dword ptr [ecx],edx
    
    ;; 76c96fb8 e83cc5ffff      call    _SEH_epilog (76c934f9)
    
    ADD ESP, 10h
    POP EBP
 
    ;; No Longer STDCALL
    ;; ret     10h
    ret

CheckSumMemMapFile ENDP

The first change encountered was removing the SEH. The program wraps the operation in a handler, so adding the SEH mechanism at this level was abandoned. The next addition is that of a reference by push ebp and mov ebp, esp.

ASM
SUB ESP, 10h    ; Space for 4 Temporary Variables
                ; T1: EBP-10h use in place of ebp-18h
                ; T2: EBP-0Ch use in place of ebp-1Ch
                ; T3: EBP-08h use in place of ebp-20h
                ; T4: EBP-04h use in place of ebp-04h

The original code would access EBP-0x1C, without reserving stack space. Analysis revealed the stack needed to accommodate four DWORDs. The above accomplishes the task. Below, a manual stack adjustment completes the procedure and the restoration of the EBP.

ASM
ADD ESP, 10h
POP EBP
 
;; No Longer STDCALL
;; ret     10h

ret

As Joe Partridge pointed out, the original port missed the use of ESI above. Since the register is used, it must be saved and restored: EBX, ESI, EDI, and EBP must be preserved across a function call, while EAX, ECX, and EDX are scratch registers. By the same rule, EDI (which holds the running sum in this routine) also needs to be saved and restored.

ChkSum

This procedure is basically unchanged. Since the procedure is moving values placed on the stack (parameters) into registers, a local frame reference was not created. The noticeable effect of conversion is the changing of ret 0Ch to ret since the caller is now cleaning the stack. _ChkSum can be examined in detail in An Analysis of the Windows PE Checksum Algorithm.

ASM
_ChkSum PROC
 
    push    esi
    mov     ecx,[esp+10h]   ; File Size / 2
    mov     esi,[esp+0Ch]   ; Source (pBaseAddress)
    mov     eax,[esp+8]     ; Partial Sum
    shl     ecx,1           ; File Size = File Size * 2
    je      _ChkSum_0x16e
    test    esi,2
    je      _ChkSum_0x2d
    sub     edx,edx
    mov     dx,[esi]
    add     eax,edx
    adc     eax,0
    add     esi,2
    sub     ecx,2

    ...

_ChkSum_0x16e:
    mov     edx,eax         ;; Fold 32 bits in 16
    shr     edx,10h
    and     eax,0FFFFh
    add     eax,edx
    mov     edx,eax
    shr     edx,10h
    add     eax,edx
    and     eax,0FFFFh
    pop     esi
    
    ;; No longer STDCALL
    ;; ret     0Ch
    
    ret
 
_ChkSum ENDP
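The fold at _ChkSum_0x16e translates directly to C++. The function below mirrors the listing instruction by instruction; fold32to16() is an illustrative name, not one used by imagehlp.dll.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the instruction sequence at _ChkSum_0x16e: add the high word
// into the low word, then add back any carry out of bit 15.
constexpr std::uint16_t fold32to16(std::uint32_t eax) {
    std::uint32_t edx = eax >> 16;   // mov edx,eax / shr edx,10h
    eax &= 0xFFFFu;                  // and eax,0FFFFh
    eax += edx;                      // add eax,edx
    edx = eax >> 16;                 // mov edx,eax / shr edx,10h
    eax += edx;                      // add eax,edx
    return static_cast<std::uint16_t>(eax & 0xFFFFu);  // and eax,0FFFFh
}
```

For example, 0x12345678 folds to 0x68AC (0x5678 + 0x1234), while 0x0001FFFF folds to 0x0001 because the first addition carries out of the low word.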

_ChkSum is not using a local stack frame - it is accessing the parameters using ESP:

ASM
push esi
mov ecx,[esp+10h]   ; File Size / 2
mov esi,[esp+0Ch]   ; Source (pBaseAddress)
mov eax,[esp+8]     ; Partial Sum

This could be converted to use a local frame reference as follows (with the appropriate epilogue):

ASM
PUSH EBP
MOV EBP, ESP

push esi

;; Stack Based
;; mov ecx,[esp+10h] ; File Size / 2
;; mov esi,[esp+0Ch] ; Source (pBaseAddress)
;; mov eax,[esp+8] ; Partial Sum

;; Frame Based
mov ecx,[EBP+10h] ; File Size / 2
mov esi,[EBP+0Ch] ; Source (pBaseAddress)
mov eax,[EBP+08h] ; Partial Sum

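In the above conversion, the offsets used to reference values through EBP and ESP are the same. This is a coincidence of this example: exactly one register (ESI) is pushed before the ESP-relative loads, and it occupies the same four bytes that the saved EBP occupies in the frame-based version. This will not always be the case.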
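The coincidence can be made explicit with a little arithmetic. Both helpers below are hypothetical and assume 32-bit slots with parameters numbered from zero.

```cpp
#include <cassert>
#include <cstdint>

// Offset of parameter `index` from ESP after `pushes` registers have been
// pushed inside the callee. The return address occupies one more slot.
constexpr std::uint32_t esp_offset(std::uint32_t index, std::uint32_t pushes) {
    return 4u * pushes + 4u + 4u * index;
}

// Offset of the same parameter from EBP once `push ebp / mov ebp, esp` has
// run: saved EBP at [ebp], return address at [ebp+4], parameters from [ebp+8].
constexpr std::uint32_t ebp_offset(std::uint32_t index) {
    return 8u + 4u * index;
}

// One push (ESI) before the loads: the ESP and EBP offsets coincide.
static_assert(esp_offset(0, 1) == ebp_offset(0), "Partial Sum at +8");
static_assert(esp_offset(2, 1) == ebp_offset(2), "File Size / 2 at +10h");

// Two pushes before the loads: the ESP offsets shift, the EBP offsets do not.
static_assert(esp_offset(0, 2) != ebp_offset(0), "offsets differ");
```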

_ImageNtHeader

_ImageNtHeader is a hand coded replacement for the original call to RtlpImageNtHeader(). The procedure takes the pointer to the memory mapped file, and adds to it the value of e_lfanew of IMAGE_DOS_HEADER. The function returns the sum on success (a pointer to IMAGE_NT_HEADERS), or NULL on failure.

ASM
_ImageNtHeader PROC
 
    push ebp
    mov ebp, esp
    
    push esi
  
    ;; ESI = pBaseAddress
    mov eax, dword ptr[ ebp+08h ]
    mov esi, eax
    
    ;; pBaseAddress == NULL?
    cmp esi, 0
    je  NULLRETURN
    
    ;; pBaseAddress == 0xFFFFFFFF?
    cmp esi, 0FFFFFFFFh
    je  NULLRETURN    
    
    ;; MZ Signature
    cmp byte ptr [ESI], 'M'
    jne  NULLRETURN
    cmp byte ptr [ESI+01h], 'Z'
    jne  NULLRETURN
    
    ;; ESI is a pointer to IMAGE_DOS_HEADER
    ;; Grab the e_lfanew DWORD
    ;
    ;    IMAGE_DOS_HEADER
    ;      is 64 bytes (0x40) long
    ;
    ;    e_lfanew occupies bytes
    ;      IMAGE_DOS_HEADER[60-63]
    ;
    ;    ESI+060 is _not_ Hex!!!
    ;
    mov eax, esi
    add eax, dword ptr[ ESI+060 ]   ; value at e_lfanew
    mov esi, eax
    
    ;; PE Signature
    cmp byte ptr [ESI], 'P'
    jne  NULLRETURN
    cmp byte ptr [ESI+01h], 'E'
    jne  NULLRETURN
    cmp byte ptr [ESI+02h], 0
    jne  NULLRETURN
    cmp byte ptr [ESI+03h], 0
    jne  NULLRETURN
    
    ;;
    ;; EAX = IMAGE_NT_HEADER pointer
    ;;
 
    jmp CLEANSTACK
    
NULLRETURN:    
    mov eax, 0
    
CLEANSTACK:    
    pop esi
    pop ebp
    
    ret
 
_ImageNtHeader ENDP
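For comparison, the same checks can be sketched in portable C++. This is not the article's code: the size parameter and its bounds checks are additions for safety (the assembly trusts the mapping and has no such parameter), the 0xFFFFFFFF pointer test is omitted, and image_nt_header() is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns a pointer to the PE signature (start of IMAGE_NT_HEADERS),
// or nullptr if the MZ or PE signature checks fail.
inline const std::uint8_t* image_nt_header(const std::uint8_t* base, std::size_t size) {
    if (base == nullptr || size < 64) return nullptr;       // need a full IMAGE_DOS_HEADER
    if (base[0] != 'M' || base[1] != 'Z') return nullptr;   // MZ signature

    // e_lfanew: little-endian DWORD at byte offset 60 (decimal).
    const std::uint32_t e_lfanew =
          static_cast<std::uint32_t>(base[60])
        | (static_cast<std::uint32_t>(base[61]) << 8)
        | (static_cast<std::uint32_t>(base[62]) << 16)
        | (static_cast<std::uint32_t>(base[63]) << 24);

    // Added check: the 4-byte signature must lie inside the buffer.
    if (e_lfanew > size || size - e_lfanew < 4) return nullptr;

    const std::uint8_t* nt = base + e_lfanew;
    if (nt[0] != 'P' || nt[1] != 'E' || nt[2] != 0 || nt[3] != 0) {
        return nullptr;                                      // PE\0\0 signature
    }
    return nt;
}
```

Given a buffer that starts with 'M', 'Z' and stores 0x80 at offset 60, the function returns base + 0x80 when the bytes there are 'P', 'E', 0, 0, and nullptr otherwise.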

Code Graft 5

The fifth sample incorporates the previous examples, with the addition of optimizations applied to CheckSumMemMapFile and _ChkSum.

Optimized CheckSumMemMapFile

CheckSumMemMapFile can be further cleaned by observing the local variables that serve no purpose in the code. In addition, the artifacts can be removed if the execution path is sent to the 'Abort' jump after cmp cx,20Bh (IMAGE_NT_OPTIONAL_HDR64_MAGIC). The cleaned routine is available in example four.

ASM
CheckSumMemMapFile PROC

    PUSH EBP                      ; Create Local Stack Frame
    MOV EBP, ESP

    PUSH ESI

    mov esi,dword ptr [ebp+10h]   ; Header CheckSum Variable (Read from PE Header)
    and dword ptr [esi],0         ; Header CheckSum = 0
    mov eax,dword ptr [ebp+0Ch]   ; File Size
    shr eax,1                     ; File Size = File Size / 2

    push eax                      ; Parameter 3: File Size
    push dword ptr [ebp+8]        ; Parameter 2: Source (pBaseAddress)
    push 0                        ; Parameter 1: Partial Sum
    call _ChkSum
    ADD ESP, 0Ch                  ; C-CALL, adjust stack

    mov edi,eax                   ; EDI = Return from _ChkSum

    push dword ptr [ebp+8]        ; Source (pBaseAddress)
    call _ImageNtHeader
    ADD ESP, 4                    ; C-CALL, adjust stack

    test eax,eax                  ; Return from _ImageNTHeader. Is it NULL?
    je _CheckSum_0x90             ;   Abort
    cmp eax,dword ptr [ebp+8]     ; pBaseAddress == _ImageNTHeader
    je _CheckSum_0x90             ;   Abort
    mov cx,word ptr [eax+18h]     ; IMAGE_OPTIONAL_HEADER.Magic
    cmp cx,10Bh                   ; IMAGE_NT_OPTIONAL_HDR32_MAGIC
    je _CheckSum_0x6a
    cmp cx,20Bh                   ; IMAGE_NT_OPTIONAL_HDR64_MAGIC
    jne _CheckSum_0x90            ;   Abort

_CheckSum_0x6a:
    lea ecx,[eax+58h]             ; ADDRESSOF(IMAGE_OPTIONAL_HEADER.Checksum)
    mov edx,dword ptr [ecx]       ; IMAGE_OPTIONAL_HEADER.Checksum (dereference)
    mov dword ptr [esi],edx       ; Save To Callee parameter dwHeaderCheckSum

    xor edx,edx                   ; EDX = 0
    mov dx,word ptr [ecx]         ; 2 bytes at IMAGE_OPTIONAL_HEADER.Checksum 
    cmp di,dx                     ; DI = result of _ChkSum
    sbb esi,esi
    neg esi
    add esi,edx
    sub edi,esi
    movzx ecx,word ptr [ecx+2]
    cmp di,cx
    sbb edx,edx
    neg edx
    add edx,ecx
    sub edi,edx

_CheckSum_0x90:
    mov ecx,dword ptr [ebp+0Ch]   ; File Size
    test cl,1
    je _CheckSum_0xa3
    mov edx,dword ptr [ebp+8]
    movzx dx,byte ptr [edx+ecx-1]
    add edi,edx

_CheckSum_0xa3:
    movzx edx,di
    add edx,ecx
    mov ecx,dword ptr [ebp+14h]
    mov dword ptr [ecx],edx

    POP ESI

    POP EBP

    ret

CheckSumMemMapFile ENDP
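The arithmetic between _CheckSum_0x6a and _CheckSum_0xa3 can be paraphrased in C++. This is a sketch under stated assumptions: the function names are hypothetical, and the odd trailing byte handling at _CheckSum_0x90 is left out, so it applies to even file sizes only.

```cpp
#include <cassert>
#include <cstdint>

// One `cmp / sbb / neg / add / sub` group: subtract a 16-bit word from the
// running sum, plus an extra 1 when the low word of the sum is smaller
// than the word (the borrow that `sbb` captures).
constexpr std::uint32_t subtract_word(std::uint32_t sum, std::uint16_t word) {
    const std::uint32_t borrow = (static_cast<std::uint16_t>(sum) < word) ? 1u : 0u;
    return sum - (borrow + word);
}

// Remove the stored header checksum (low word, then high word) from the sum
// returned by _ChkSum, then add the file size as at _CheckSum_0xa3.
constexpr std::uint32_t final_checksum(std::uint32_t sum,
                                       std::uint32_t header_checksum,
                                       std::uint32_t file_size) {
    sum = subtract_word(sum, static_cast<std::uint16_t>(header_checksum & 0xFFFFu));
    sum = subtract_word(sum, static_cast<std::uint16_t>(header_checksum >> 16));
    return (sum & 0xFFFFu) + file_size;   // movzx edx,di / add edx,ecx
}
```

When the header checksum is zero, both subtractions are no-ops and the result is simply the folded sum plus the file size.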

Optimized _ChkSum

A final peephole optimization can be enjoyed in the main summation loop of _ChkSum. This supplement takes advantage of the processor's ability to schedule simultaneous instructions. The lesser summations (0x40 bytes, 0x20 bytes, 0x10 bytes, etc.) are skipped since each is encountered at most once during the routine's execution.

Because most of the time in this routine is spent executing the loop below (consuming 0x80 bytes, or 0x20 DWORDs, per pass), a further optimization would be to perform push ebx and push edx once, before the loop. Once summation is complete, perform the respective pops after the loop exits at jne _ChkSum_0xe8.

ASM
_ChkSum_0xe8:
 
    PUSH EBX
    PUSH EDX
 
    XOR EBX, EBX
    XOR EDX, EDX
 
    add     eax,dword ptr [esi]
    adc     EBX,dword ptr [esi+4]
    adc     EDX,dword ptr [esi+8]
    adc     eax,dword ptr [esi+0Ch]
    adc     EBX,dword ptr [esi+10h]
    adc     EDX,dword ptr [esi+14h]
    adc     eax,dword ptr [esi+18h]
    adc     EBX,dword ptr [esi+1Ch]
    adc     EDX,dword ptr [esi+20h]
    adc     eax,dword ptr [esi+24h]
    adc     EBX,dword ptr [esi+28h]
    adc     EDX,dword ptr [esi+2Ch]
    adc     eax,dword ptr [esi+30h]
    adc     EBX,dword ptr [esi+34h]
    adc     EDX,dword ptr [esi+38h]
    adc     eax,dword ptr [esi+3Ch]
    adc     EBX,dword ptr [esi+40h]
    adc     EDX,dword ptr [esi+44h]
    adc     eax,dword ptr [esi+48h]
    adc     EBX,dword ptr [esi+4Ch]
    adc     EDX,dword ptr [esi+50h]
    adc     eax,dword ptr [esi+54h]
    adc     EBX,dword ptr [esi+58h]
    adc     EDX,dword ptr [esi+5Ch]
    adc     eax,dword ptr [esi+60h]
    adc     EBX,dword ptr [esi+64h]
    adc     EDX,dword ptr [esi+68h]
    adc     eax,dword ptr [esi+6Ch]
    adc     EBX,dword ptr [esi+70h]
    adc     EDX,dword ptr [esi+74h]
    adc     eax,dword ptr [esi+78h]
    adc     EBX,dword ptr [esi+7Ch]
    
    ADC     EAX, EBX
    ADC     EAX, EDX
 
    adc     eax,0
    
    POP EDX
    POP EBX
    
    add     esi,80h
    sub     ecx,80h
    
    jne     _ChkSum_0xe8

    ...
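The rotated accumulation can be modeled in C++ to spot-check that it agrees with the single register form. This is a sketch, not a proof: adc() emulates the carry flag in software, the function names are hypothetical, and agreement is only checked for the sample inputs mentioned below.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Add with carry: returns a + b + carry and updates the carry flag.
inline std::uint32_t adc(std::uint32_t a, std::uint32_t b, bool& carry) {
    const std::uint64_t wide = static_cast<std::uint64_t>(a) + b + (carry ? 1u : 0u);
    carry = (wide >> 32) != 0;
    return static_cast<std::uint32_t>(wide);
}

// Single register shape: every DWORD of the block is added into EAX.
inline std::uint32_t sum_single(std::uint32_t eax, const std::uint32_t* block,
                                std::size_t count) {
    bool carry = false;
    for (std::size_t i = 0; i < count; ++i) eax = adc(eax, block[i], carry);
    return adc(eax, 0, carry);               // adc eax,0
}

// Rotated shape: EAX, EBX, EDX take turns, then EBX and EDX fold into EAX.
inline std::uint32_t sum_rotated(std::uint32_t eax, const std::uint32_t* block,
                                 std::size_t count) {
    std::uint32_t acc[3] = {eax, 0, 0};      // EAX, EBX, EDX
    bool carry = false;                      // XOR clears the carry flag
    for (std::size_t i = 0; i < count; ++i) {
        acc[i % 3] = adc(acc[i % 3], block[i], carry);
    }
    eax = adc(acc[0], acc[1], carry);        // ADC EAX, EBX
    eax = adc(eax, acc[2], carry);           // ADC EAX, EDX
    return adc(eax, 0, carry);               // adc eax,0
}
```

For the block {0xFFFFFFFF, 2, 0, 0}, both shapes return 2: the carry out of the 32-bit addition is folded back in by the trailing adc instructions.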

Checksums

  • CodeGraft1.zip
    • MD5: F8958E18071F9FFDE17286AC4243C514
    • SHA-1: ECE1C15BA469CCABFF922C453A26C0BD6593CEEF
  • CodeGraft2.zip
    • MD5: B457ED277E848A106F20F94B1CE275F4
    • SHA-1: 241E70C0660A652D4015C7787850DBA0684F62F8
  • CodeGraft3.zip
    • MD5: 5DD1A1B16D47385577C8D7FF1DD49041
    • SHA-1: 98C5EFE3F2EA6CF5214C8A739FF99E1D60FD56EA
  • CodeGraft4.zip
    • MD5: AC3800CF5714922D9930D7A2EAFCBD5C
    • SHA-1: 273C14760D4A438518513677424CE9A54E29294E
  • CodeGraft5.zip
    • MD5: 8F8B25301DB6C77683FF8918CD679B21
    • SHA-1: 95D52336EE4EEC1A46D00ACD2BBC10C79489D41B
  • CheckSumAsm.zip
    • MD5: 35EA1BBC97F1A23E8F0B7D943BA0F9F3
    • SHA-1: B0BE1D8BF772958191114A55FFB343B5B829E240
  • CodeGraft.zip
    • MD5: C0D4468002F6FF82228323DD226093B5
    • SHA-1: 42BF918481881819FA8A1CC8F519303185964E15
  • PEChecksum.zip
    • MD5: C0D4468002F6FF82228323DD226093B5
    • SHA-1: 42BF918481881819FA8A1CC8F519303185964E15

Revisions

  • 03.06.2008: General revisions and article formatting.
  • 11.20.2007: Bug Fix - Added ESI preservation to CheckSumMemMapFile.
  • 11.05.2007: Initial release.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)