Introduction
This article will demonstrate techniques to incorporate compiled machine code into an existing project using ASM source files. The assembly source file will be created from the compiled machine code. In addition, the article will remove the single thread safety limitation of Imagehlp.dll, and show techniques for converting a compiled STDCALL procedure to a C-CALL assembly language routine.
Example one will present a standard console application. The program adds two numbers and then returns the result. The example will introduce a few of the obstacles which will be encountered. Example two will incorporate the graft using the compiled machine code of Add() from the first.
Finally, example three will use foreign code from imagehlp.dll to supplement the PEChecksum program. This will remove the imagehlp library dependency from a project and fully demonstrate the techniques. The PEChecksum program was presented in An Analysis of the Windows PE Checksum Algorithm. Those who are familiar with x86 assembly, WinDbg, and IDA Pro should begin at example three.
The samples will use standard C as the language. For those interested in reusing compiled C++, please see Paul Vincent Sabanal's and Mark Vincent Yason's BlackHat 2007 presentation, Reversing C++.
Optimizations
To keep the first two examples academic, optimizations will be disabled. In general, release code built with optimizations lacks the structure desired for the samples presented in this article. Release samples will be built with optimizations disabled (/Od). Refer to Figure 1.
|
Figure 1: Disable Optimizations
|
With optimizations enabled, some functions were inlined. For example, Figure 2 shows no call corresponding to a function that is present in the source file. wmain is entered at 0x00401000. The first function call encountered, at 0x0040101D, is a call to cout. In this case, Add() was optimized away.
|
Figure 2: Missing Function Call Due to Optimization
|
Another optimization which was not desired was Frame Pointer Omission (FPO). FPO was not used, to keep the examples easier to follow. The Frame Pointer is generally created with the instruction pair push EBP and mov EBP, ESP. The lack of a Frame Pointer adds a small wrinkle to an otherwise academic exercise. Frame Pointers are addressed in the 'Stack, ESP, and EBP' section.
Global Variables
Variables which are global to the process are placed in either the .data section or (if present) the .bss section. The sections correspond to initialized and uninitialized data respectively. Usually, a program refers to a global variable by absolute address rather than by base-relative addressing (such as the EBP-relative addressing used for local variables). For example, notice the code generated for access to the global scratch variable in Figure 3.
|
Figure 3: Global Variable Storage Access
|
The address used is 0x00417000. When examining the program in PE Browse, the variable is listed under the .data section at address 0x00417000. Refer to Figure 4.
|
Figure 4: Allocation of a Global Variable
|
The disassembly and PE header of an uninitialized global variable are shown in Figure 5. The four bytes at address 0x00403374 are garbage: CR, LF, [SPACE], and [SPACE].
|
Figure 5: Allocation of an Uninitialized Global Variable
|
The meaning of push ECX will be examined in the section Local Variables.
Local Variables
Local variables are stored on the thread's stack. Figure 5 shows a lone push of ECX, even though ECX is not used in main() per se. This is a compiler technique to create local storage for the variable i. Rather than issuing sub ESP, 4 (a three byte instruction, 0x83 0xEC 0x04), the compiler uses push ECX (a one byte instruction, 0x51).
|
Figure 6: Local Allocation of a Single Variable
|
When a greater number of variables require storage, a sub ESP, n is used (where n is the number of bytes required). For example, Figure 7 shows the allocation of five DWORD variables. Rather than issuing a series of five push ECX instructions, the compiler issues one sub ESP, 0x14.
|
Figure 7: Local Allocation of Multiple Variables
|
Stack, ESP, and EBP
The stack is an area in memory which a thread uses as a 'scratch pad' during program execution. Each thread in a process has its own stack. One typically envisions memory for the stack as a contiguous region starting at a low address and moving sequentially to a large address. This is similar to addressing in a heap or an array - A[0] resides at a lower address than A[63]. However, unlike most memory access operations, stacks grow down.
ESP is the stack pointer, and is maintained by the processor. EBP (if used) is maintained by the thread. When the processor encounters a push n instruction, two actions occur in the order listed below:
- the processor decrements ESP by the machine's word size
- the value n is placed on the stack at ESP
This implies ESP always points to the last value placed on the stack.
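The two steps above can be modelled in a few lines of C++. The sketch below is an illustration only: the Stack type, its member names, and the addresses used are hypothetical, and the code simulates a 32-bit stack in an ordinary array rather than touching the real ESP. It performs no overflow checking.

```cpp
#include <cstdint>
#include <vector>

constexpr uint32_t kWord = 4;   // machine word size on 32-bit x86

// Hypothetical model of a thread stack; not real machine state.
struct Stack {
    uint32_t base;                 // lowest simulated address
    uint32_t esp;                  // simulated stack pointer
    std::vector<uint32_t> cells;   // one cell per machine word

    Stack(uint32_t top, uint32_t words)
        : base(top - words * kWord), esp(top), cells(words, 0) {}

    // push: first decrement ESP by the word size, then store at [ESP].
    void push(uint32_t value) {
        esp -= kWord;
        cells[(esp - base) / kWord] = value;
    }

    // pop: read the value at [ESP], then increment ESP by the word size.
    uint32_t pop() {
        uint32_t value = cells[(esp - base) / kWord];
        esp += kWord;
        return value;
    }

    // The value at [ESP], which is always the last value pushed.
    uint32_t top() const { return cells[(esp - base) / kWord]; }
};
```

Pushing onto a Stack created with a top of 0x0012FF80 moves esp to 0x0012FF7C, which shows the stack growing toward lower addresses.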
|
Figure 8: Argument Size and Call Stack
|
Because the push of a value is always of machine word size, pushing five consecutive bytes consumes 20 bytes (0x14) on the stack - even though the local allocation could use only two DWORDs (8 bytes). Finally, there is no zero extension for the truncated push of a byte. Whatever occupied the upper three bytes of the register surfaces as part of the function's parameters, even though only the low order byte is of interest. This is embodied in the instruction sequence mov al, byte ptr [a]; push eax. Refer to Figure 8.
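The effect of the missing zero extension can be illustrated with a small C++ model. The function names below are hypothetical, and the stale register value in the usage note is an arbitrary assumption chosen to make the leftover bytes visible.

```cpp
#include <cstdint>

// Models "mov al, byte ptr [a]": only the low 8 bits of EAX change,
// the upper 24 bits keep whatever the register held before.
inline uint32_t mov_al(uint32_t eax, uint8_t value) {
    return (eax & 0xFFFFFF00u) | value;
}

// The DWORD that "push eax" places on the stack after the byte load.
inline uint32_t pushed_dword(uint32_t stale_eax, uint8_t arg) {
    return mov_al(stale_eax, arg);
}

// The callee must look only at the low order byte of the stack slot.
inline uint8_t read_byte_arg(uint32_t slot) {
    return static_cast<uint8_t>(slot & 0xFFu);
}
```

With a stale EAX of 0xDEADBEEF and a byte argument of 0x41, the slot on the stack holds 0xDEADBE41, yet the callee still recovers 0x41.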
The final comment to make with regard to code generation is the compiler's awareness of multiple pipelines in the processor. Rather than reusing eax by issuing:
mov al, byte ptr [a]
push eax
...
mov al, byte ptr [d]
push eax
...
the compiler will rotate register usage (eax, ecx, edx) so that the execution pipe remains full. This optimization is critical to performance since there are no branches which might otherwise stall execution due to a branch prediction miss. So we could expect to see the following:
mov al, byte ptr [a]
push eax
...
mov dl, byte ptr [d]
push edx
...
When creating a stack frame for base-relative addressing, the compiler will issue a standard pair of instructions. The frame creates a well defined "function context". The typical prologue for stack based operations is the sequence:
push EBP
mov EBP, ESP
The above sequence is emitted in each function invoked so that the function does not accidentally destroy the stack pointer (ESP). Conversely, when a function exits, one typically encounters restoration of the stack and EBP. This is required since the calling function has a different reference from which it is working:
pop EBP
ret
One should conclude that EBP can be relative to the function (if used), while ESP is relative to the thread. This implies that when one sees EBP-0xn, the function is referring to local storage in the function. EBP+0xn signals the function is accessing a value which was placed on the stack by the calling function (typically an argument), or by a function further up the call chain.
OR and XOR
When viewing a disassembly, it is not uncommon to encounter xor eax, eax and or eax, 0xFFFFFFFF. The first instruction is equivalent to mov eax, 0, while the second is equivalent to mov eax, -1. They are optimized versions of the generated code. The instruction sequences use less space than their equivalent cousins: the xor form occupies two bytes and the or form three, whereas a mov with a 32-bit immediate occupies five.
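The two identities can be checked at compile time. The helper names in this sketch are hypothetical; the code models only the arithmetic result of each instruction, not its encoding.

```cpp
#include <cstdint>

// xor eax, eax : any value XORed with itself is 0.
constexpr uint32_t xor_self(uint32_t eax) { return eax ^ eax; }

// or eax, 0FFFFFFFFh : ORing with all ones sets every bit (-1 when signed).
constexpr uint32_t or_all_ones(uint32_t eax) { return eax | 0xFFFFFFFFu; }

static_assert(xor_self(0x12345678u) == 0u,
              "xor reg, reg clears the register");
static_assert(or_all_ones(0x12345678u) == 0xFFFFFFFFu,
              "or reg, -1 sets all bits");
```

In both cases the previous content of the register has no influence on the result, which is why the compiler can substitute them for the mov forms.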
Code Graft 1
Code Graft 1 is the baseline on which the remaining examples expand. Due to the compiler and linker's behavior, the first example is somewhat more complex than desired. The following will detail the issues encountered and outline the general workarounds used in this article.
The source code to the first sample is listed below. main() calls Add(), which adds two numbers. The result is then displayed on standard output.
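DWORD Add( DWORD Augend, DWORD Addend );   // forward declaration so main() can call Add()

int main( )
{
    DWORD Augend = 32437; DWORD Addend = 15369; DWORD Sum = 0;
    Sum = Add ( Augend, Addend );
    cout << _T("Augend: ") << Augend << endl;
    cout << _T("Addend: ") << Addend << endl;
    cout << _T(" Sum: ") << Sum << endl;
    return 0;
}

// Adds the two arguments and returns the sum through a local scratch variable.
DWORD Add( DWORD Augend, DWORD Addend )
{
    DWORD result = Augend + Addend;
    return result;
}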
The first issue encountered is that storage layout does not honor source code declarations. Refer to Figure 9.
|
Figure 9: Storage Layout vs. Source Code Declaration
|
The variables are declared and initialized in the following order:
- Augend
- Addend
- Sum
However, the layout in memory is:
- Addend (EBP-0x04) - high memory
- Augend (EBP-0x08)
- Sum (EBP-0x0C) - low memory
The second issue encountered is the Add() function's use of variables. Add() creates a scratch variable (result) and adds the two values. The result is then returned to main(). Since Add() accepts two arguments, Augend and Addend, one would expect the function to operate on EBP+0x04 and EBP+0x08. EBP+0x04 and EBP+0x08 are the expected base-relative addresses since it is presumed the arguments have been pushed on the stack. For the temporary result, it is expected the value would be returned in one of two ways:
- through the use of the Sum variable at EBP+0x00
- through the use of EAX
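However, with execution halted in Add(), a different scenario is observed. The arguments are read from EBP+0x08 and EBP+0x0C, because the caller's saved EBP (at EBP+0x00) and the return address (at EBP+0x04) sit between the frame pointer and the pushed arguments. The result is kept in a local at EBP-0x04 and returned in EAX. Refer to Figure 10.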
|
Figure 10: Storage Layout vs. Source Code Declaration
|
Before executing the instructions of Add() (but after entering the function), the stack appears as below. Refer to Figure 11.
|
Figure 11: Stack Layout After Calling Add()
|
Once the function has been executed (but before the execution of the return), the stack layout is as shown in Figure 12.
|
Figure 12: Stack Layout
|
Table 1 explains the values with respect to their addresses on the stack.
| Item | Address | Value | Comment |
| 1 | 0x12FF5C | BABE | Add::result (created by Add) |
| 2 | 0x12FF60 | 0012FF7C | EBP of main |
| 3 | 0x12FF64 | 00401028 | Return address |
| 4 | 0x12FF68 | 7EB5 | Add::Augend (pushed by main) |
| 5 | 0x12FF6C | 3C09 | Add::Addend (pushed by main) |
| 6 | 0x12FF70 | 0 | main::Sum |
| 7 | 0x12FF74 | 7EB5 | main::Augend |
| 8 | 0x12FF78 | 3C09 | main::Addend |
Table 1: Stack Layout
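The addresses in Table 1 can be reproduced with a small simulation. This is a sketch under stated assumptions: the Machine type and run_add_call() are hypothetical names, main's frame pointer (0x0012FF7C) and the return address are taken from the table, and main is assumed to reserve its three locals with a single 0x0C byte adjustment of ESP.

```cpp
#include <cstdint>
#include <map>

// Hypothetical model: ESP, EBP, and a sparse memory of DWORDs.
struct Machine {
    uint32_t esp;
    uint32_t ebp;
    std::map<uint32_t, uint32_t> mem;            // address -> DWORD
    void push(uint32_t v) { esp -= 4; mem[esp] = v; }
};

inline Machine run_add_call() {
    Machine m{0x0012FF7Cu, 0x0012FF7Cu, {}};     // main's frame (from Table 1)
    m.esp -= 0x0C;                               // assumed: three DWORD locals
    m.mem[m.ebp - 0x04] = 15369;                 // main::Addend
    m.mem[m.ebp - 0x08] = 32437;                 // main::Augend
    m.mem[m.ebp - 0x0C] = 0;                     // main::Sum

    m.push(m.mem[m.ebp - 0x04]);                 // push Addend (right to left)
    m.push(m.mem[m.ebp - 0x08]);                 // push Augend
    m.push(0x00401028u);                         // call pushes the return address

    m.push(m.ebp);                               // push ebp
    m.ebp = m.esp;                               // mov  ebp, esp
    m.push(0);                                   // push ecx (reserves Add::result)

    uint32_t eax = m.mem[m.ebp + 0x08];          // mov eax, [ebp+8]
    eax += m.mem[m.ebp + 0x0C];                  // add eax, [ebp+0Ch]
    m.mem[m.ebp - 0x04] = eax;                   // mov [ebp-4], eax
    return m;
}
```

After the call, the simulated EBP is 0x0012FF60 and the slot at 0x0012FF5C holds 0xBABE (32437 + 15369), matching items 1 and 2 of the table.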
Code Graft 2
Example two is Code Graft 1 less the function for Add() in the C++ source file. Code Graft 1's generated code for Add() under WinDbg is shown below.
00401cc0 55 push ebp
00401cc1 8bec mov ebp, esp
00401cc3 51 push ecx
00401cc4 8b4508 mov eax, dword ptr [ebp+8]
00401cc7 03450c add eax, dword ptr [ebp+0Ch]
00401cca 8945fc mov dword ptr [ebp-4], eax
00401ccd 8b45fc mov eax, dword ptr [ebp-4]
00401cd0 8be5 mov esp, ebp
00401cd2 5d pop ebp
00401cd3 c3 ret
The Graft
At this point, the donor (Code Graft 1) provides 20 bytes of code for the recipient (Code Graft 2). The easiest way to incorporate the functionality is through an assembly file (with a custom build step) added to the project. This method has the added benefit of allowing incorporation of both x86 and x64 routines since inline assembly is not being used.
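The byte count can be confirmed by transcribing the opcode column of the WinDbg listing for Add() into an array. The array name below is hypothetical; the bytes and addresses are copied from the listing.

```cpp
#include <cstddef>
#include <cstdint>

// Opcode bytes of Add(), copied from the WinDbg listing (0x00401cc0 - 0x00401cd3).
constexpr uint8_t kAddCode[] = {
    0x55,               // push ebp
    0x8B, 0xEC,         // mov  ebp, esp
    0x51,               // push ecx
    0x8B, 0x45, 0x08,   // mov  eax, dword ptr [ebp+8]
    0x03, 0x45, 0x0C,   // add  eax, dword ptr [ebp+0Ch]
    0x89, 0x45, 0xFC,   // mov  dword ptr [ebp-4], eax
    0x8B, 0x45, 0xFC,   // mov  eax, dword ptr [ebp-4]
    0x8B, 0xE5,         // mov  esp, ebp
    0x5D,               // pop  ebp
    0xC3                // ret
};

constexpr std::size_t kAddCodeSize = sizeof(kAddCode);

static_assert(kAddCodeSize == 20, "the donor routine is 20 bytes long");
static_assert(0x00401CD3 - 0x00401CC0 + 1 == 20, "address range agrees");
```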
There are other methods available to incorporate the graft. The first is to edit the memory directly in WinDbg. This creates a temporary exhibit. Second, inline assembly could be used to emit the instruction sequence. This has the downside that inline assembly is not supported on x64 platforms. The third alternative is patching. Patching an executable usually falls under the purview of Viruses and Crackers. Patching is left as an exercise to the reader.
Two changes were required to successfully link the executable:
- Add() was renamed to Addition()
- The function prototype of Addition() was changed to extern "C"
Add() was changed to Addition() because add is a reserved word in the assembler (MASM). extern "C" was added due to name mangling and link error LNK2001: unresolved external symbol "unsigned long __cdecl Addition(unsigned long,unsigned long)" (?Addition@@YAKKK@Z). The nearly unchanged C++ file is shown below:
extern "C" DWORD Addition( DWORD, DWORD );

int main( int argc, char* argv[] )
{
    DWORD Augend = 32437; DWORD Addend = 15369; DWORD Sum = 0;
    Sum = Addition ( Augend, Addend );
    cout << _T("Augend: ") << Augend << endl;
    cout << _T("Addend: ") << Addend << endl;
    cout << _T(" Sum: ") << Sum << endl;
    return 0;
}
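To experiment with the linkage before the assembly file exists, a stand-in definition can be written in C++. This is only a sketch: uint32_t is used in place of DWORD to avoid a Windows header dependency, and the C++ body is an assumption standing in for the procedure that Addition.asm supplies in the article.

```cpp
#include <cstdint>

// C linkage: the symbol is not decorated with the C++ mangled form
// (?Addition@@YAKKK@Z) quoted in the LNK2001 error.
extern "C" uint32_t Addition(uint32_t augend, uint32_t addend);

// Stand-in body (assumption for this sketch). In the article, the
// definition comes from the assembly procedure instead.
extern "C" uint32_t Addition(uint32_t augend, uint32_t addend) {
    return augend + addend;
}
```

Once the assembly procedure is in place, the stand-in definition would be removed and only the extern "C" prototype kept.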
To begin, create a file named Addition.asm in the project directory. Next, add the file to the project. Refer to Figure 13.
|
Figure 13: Adding an ASM File to the Project
|
In later versions of Visual Studio, the environment will ask if it should use the masm.rules Custom Build Rules. Select OK. Refer to Figure 14.
|
Figure 14: MASM Custom Build Rule
|
If Custom Build Rules is not available, add the following as a Custom Build Step:
- Debug Command Line
- ml -c -Zi "-Fl$(IntDir)\$(InputName).lst" "-Fo$(IntDir)\$(InputName).obj" "$(InputPath)"
- Release Command Line
- ml -c "-Fl$(IntDir)\$(InputName).lst" "-Fo$(IntDir)\$(InputName).obj" "$(InputPath)"
- Outputs
- $(IntDir)\$(InputName).obj
After adding the ASM file to the project, the project will appear as in Figure 15.
|
Figure 15: Addition of ASM File
|
Next, add the following to Addition.asm. The code below demonstrates the minimum requirements for an assembly procedure.
PUBLIC Addition
.486
.MODEL FLAT, C
.CODE
Addition PROC
push ebp
mov ebp, esp
push ecx
mov eax, dword ptr [ebp+8]
add eax, dword ptr [ebp+0Ch]
mov dword ptr [ebp-4], eax
mov eax, dword ptr [ebp-4]
mov esp, ebp
pop ebp
ret
Addition ENDP
END
PUBLIC Addition informs the linker that the procedure Addition is available for any module to use. .486 is a Processor Directive. .MODEL is a Simplified Segment Directive which directs MASM to generate code for a particular memory model. The language type (C) informs the assembler of the calling and naming convention.
.CODE is another Simplified Segment Directive. .CODE begins the code section, while END marks the end of the module. Other sections exist, such as .DATA. PROC and ENDP are Procedure Directives which book-end the Addition function. Additional procedures would be book-ended in a similar manner with a different label. Residing in the Addition procedure is the copy and paste code from Code Graft 1.
After compiling and linking, the first thing noticed is that the second executable is 0x200 bytes (512 bytes) smaller than the first program. Refer to Figure 14.
|
Figure 14: Comparison of Executable File Sizes
|
However, the generated code has remained unchanged as far as execution of main() and Addition(). Refer to Figure 15.
|
Figure 15: Source Code Analysis in WinDbg
|
Code Graft 3
Code Graft 3 will demonstrate code grafting from a foreign executable. Specifically, it will reuse CheckSumMappedFile() from Imagehlp.dll. Imagehlp.dll is a single threaded library, so this is an opportunity to improve the function. A nearly complete treatment of the PE Checksum algorithm was presented in An Analysis of the Windows PE Checksum Algorithm.
To begin, download the PE Checksum Source code. Open StdAfx.h, comment out the references to imagehlp.dll, and add a prototype for CheckSumMemMapFile(). The name change was incorporated due to linking with Imagehlp.lib (even though it was not specified).
extern "C" {
PIMAGE_NT_HEADERS CheckSumMemMapFile(
PVOID BaseAddress,
DWORD FileLength,
PDWORD ExistingCheckSum,
PDWORD CalculatedCheckSum
);
}
extern "C" is required due to name mangling. Notice also that WINAPI (a macro for __stdcall) is missing. This is due to link errors when attempting to link the object files. I suspect this might be a packing issue, but I have not investigated further.
LNK1190: invalid fixup found, type 0x0002
Since the model is no longer STDCALL, the routines used from Imagehlp.dll will require conversion. The three most prevalent issues are:
- STDCALL to C Call Conversion (stack cleanup)
- Addition of Local Frame References (EBP)
- Artifact Cleanup
Add an assembly file to the project named "CheckSum.asm". Create three procedures in CheckSum.asm: CheckSumMemMapFile, _ChkSum, and _ImageNtHeader. Acquire the assembly code for CheckSumMappedFile() and ChkSum() from imagehlp.dll and place it in CheckSum.asm under their respective procedures. Leave _ImageNtHeader empty at this point. _ImageNtHeader will be a hand coded replacement used in lieu of Imagehlp.dll's call to RtlpImageNtHeader() of NTDLL.DLL.
The only procedure which requires the PUBLIC attribute is CheckSumMemMapFile. This leaves _ChkSum and _ImageNtHeader as 'private' procedures for use by CheckSumMemMapFile.
Alternately, use the listing files for the functions provided in this article. The files are CheckSummMappedFile.listing and ChkSum.listing. The listing files were created from a Copy and Paste operation in WinDbg while examining the original CodeGraft.exe. Refer to Figure 16.
|
Figure 16: WinDbg Copy and Paste
|
Labels
At this point, the listing includes memory addresses, opcodes, and mnemonics. Create labels for any jumps encountered (changing CheckSumMappedFile to CheckSumMemMapFile). For example, at 0x76c96f3b is the following instruction:
76c96f3b eb1d jmp imagehlp!CheckSumMappedFile+0x4f (76c96f5a)
The jump target is 0x76c96f5a. At that location, create a label. Note that the label name is based on the location provided by the disassembly (the '+' has been changed to '_'):
76c96f3b eb1d jmp imagehlp!CheckSumMappedFile+0x4f (76c96f5a)
...
76c96f57 8b7de4 mov edi,dword ptr [ebp-1Ch]
CheckSumMemMapFile_0x4f:
76c96f5a 85c0 test eax,eax
Finally, clean the original instruction to coincide with the jump to the label:
76c96f3b eb1d jmp CheckSumMemMapFile_0x4f
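The target reported by the disassembler can be cross-checked from the opcode bytes themselves. The sketch below uses hypothetical helper names; it relies on the x86 rule that a short jump (opcode EB) is two bytes long and its 8-bit displacement is signed and relative to the next instruction. The function start address used in the usage note is derived from the listing (target minus 0x4f) rather than stated by the article.

```cpp
#include <cstdint>

// Target of a two byte short jump: address of the next instruction
// plus the sign extended 8-bit displacement.
constexpr uint32_t short_jump_target(uint32_t address, uint8_t rel8) {
    return address + 2u +
           static_cast<uint32_t>(static_cast<int32_t>(static_cast<int8_t>(rel8)));
}

// Offset of a target from the start of its function, as used in the
// label name (for example _0x4f).
constexpr uint32_t label_offset(uint32_t target, uint32_t function_start) {
    return target - function_start;
}
```

For the instruction at 0x76c96f3b with displacement 0x1d, short_jump_target() yields 0x76c96f5a, and label_offset(0x76c96f5a, 0x76c96f0b) yields 0x4f.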
Artifacts
There are areas of the code which appear to be artifacts. Examine 0x76c96fc8 for instance. Since there is no assembly mnemonic to generate the opcode, create the code using the DB directive. Note that a hex literal in MASM must begin with a digit, so prefix the number with a '0' when it would otherwise start with a letter. DUP is an operator which creates a data byte the requested number of times.
DB 3 DUP(0FFh)
DB 0FFh, 042h, 06Fh
DB 0C9h
DB 076h, 04Bh
DB 06Fh
DB 0C9h
DB 076h, 090h
DB 4 DUP(090h)
Finally, we can remove the unneeded material in the listing. To remove an item in the listing, simply comment it:
mov esi,dword ptr [ebp+10h]
and dword ptr [esi],0
mov eax,dword ptr [ebp+0Ch]
shr eax,1
push eax
push dword ptr [ebp+8]
push 0
call _ChkSum
Additional Fixups
The original code installed a Structured Exception Handler upon entry. The CodeGraft.exe code wraps the operation in a handler, so the installation can be skipped. The handler is removed by commenting out the call above. This creates a stack imbalance that will be addressed in the STDCALL to C CALL conversion.
STDCALL to C CALL Conversion
This step requires the most analysis. This is due to the fact that Frame Pointers are missing. So each procedure will receive the customary:
push ebp
mov ebp, esp
Once the additional push is introduced, care must be taken with code/stack dependencies. CheckSumMemMapFile is shown below. Instructions in capital letters were added for stack management. Commented lines were removed. Finally, STDCALL performs a ret n, where n is an adjustment to ESP. C-CALL uses a vanilla ret, with the caller performing the stack adjustment. The result of the cleanup is available as CodeGraft4.zip.
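The difference in stack cleanup between the two conventions can be modelled by tracking ESP alone. The type and function names below are hypothetical, and only DWORD-sized arguments are considered.

```cpp
#include <cstdint>

// ESP immediately after the callee returns, and after any caller cleanup.
struct CallResult {
    uint32_t esp_after_ret;
    uint32_t esp_after_cleanup;
};

// STDCALL: the callee ends with "ret n", which pops the return address
// and then releases n bytes of arguments.
inline CallResult stdcall_call(uint32_t esp, uint32_t args) {
    uint32_t sp = esp - 4u * args;   // caller pushes the arguments
    sp -= 4u;                        // call pushes the return address
    sp += 4u + 4u * args;            // ret n
    return {sp, sp};                 // nothing left for the caller to do
}

// C call: the callee ends with a plain "ret"; the caller then issues
// "add esp, n" to release the arguments.
inline CallResult cdecl_call(uint32_t esp, uint32_t args) {
    uint32_t sp = esp - 4u * args;   // caller pushes the arguments
    sp -= 4u;                        // call pushes the return address
    sp += 4u;                        // ret
    uint32_t after_ret = sp;
    sp += 4u * args;                 // add esp, n (in the caller)
    return {after_ret, sp};
}
```

For a three argument call such as the one to _ChkSum, the C convention leaves ESP 0x0C bytes low after the ret, which is why ADD ESP, 0Ch follows the call in the converted listing.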
CheckSumMemMapFile
CheckSumMemMapFile PROC
PUSH EBP
MOV EBP, ESP
SUB ESP, 10h
mov esi,dword ptr [ebp+10h]
and dword ptr [esi],0
mov eax,dword ptr [ebp+0Ch]
shr eax,1
push eax
push dword ptr [ebp+8]
push 0
call _ChkSum
ADD ESP, 0Ch
mov edi,eax
mov dword ptr [EBP-0Ch],edi
and dword ptr [EBP-04h],0
push dword ptr [ebp+8]
call _ImageNtHeader
ADD ESP, 4
mov dword ptr [EBP-08h],eax
or dword ptr [EBP-04h],0FFFFFFFFh
jmp _CheckSumMemMapFile_0x4f
DB 5 DUP (090h)
xor eax,eax
inc eax
ret
DB 5 DUP (090h)
mov esp,dword ptr [EBP-10h]
xor eax,eax
or dword ptr [EBP-04h],0FFFFFFFFh
mov esi,dword ptr [ebp+10h]
mov edi,dword ptr [EBP-0Ch]
_CheckSumMemMapFile_0x4f:
test eax,eax
je _CheckSumMemMapFile_0x90
cmp eax,dword ptr [ebp+8]
je _CheckSumMemMapFile_0x90
mov cx,word ptr [eax+18h]
cmp cx,10Bh
je _CheckSumMemMapFile_0x6a
cmp cx,20Bh
jne _CheckSumMemMapFile_0xb5 ; the label at +0xb5 is not shown in this listing
_CheckSumMemMapFile_0x6a:
lea ecx,[eax+58h]
mov edx,dword ptr [ecx]
mov dword ptr [esi],edx
xor edx,edx
mov dx,word ptr [ecx]
cmp di,dx
sbb esi,esi
neg esi
add esi,edx
sub edi,esi
movzx ecx,word ptr [ecx+2]
cmp di,cx
sbb edx,edx
neg edx
add edx,ecx
sub edi,edx
_CheckSumMemMapFile_0x90:
mov ecx,dword ptr [ebp+0Ch]
test cl,1
je _CheckSumMemMapFile_0xa3
mov edx,dword ptr [ebp+8]
movzx dx,byte ptr [edx+ecx-1]
add edi,edx
_CheckSumMemMapFile_0xa3:
movzx edx,di
add edx,ecx
mov ecx,dword ptr [ebp+14h]
mov dword ptr [ecx],edx
ADD ESP, 10h
POP EBP
ret
CheckSumMemMapFile ENDP
The first change encountered was removing the SEH. The program wraps the operation in a handler, so adding the SEH mechanism at this level was abandoned. The next addition is a frame reference, created by push ebp and mov ebp, esp.
SUB ESP, 10h
The original code would access EBP-0x1C without reserving stack space. Analysis revealed the stack needed to accommodate four DWORDs. The above accomplishes the task. Below, a manual stack adjustment and the restoration of EBP complete the procedure.
ADD ESP, 10h
POP EBP
ret
As Joe Partridge pointed out, the original port missed the use of ESI above. Since the register was used, it must be saved and restored. EBX, ESI, EDI, and EBP must be preserved during function invocation. EAX, ECX, and EDX are scratch registers.
ChkSum
This procedure is basically unchanged. Since the procedure is moving values placed on the stack (parameters) into registers, a local frame reference was not created. The noticeable effect of conversion is the changing of ret 0Ch to ret since the caller is now cleaning the stack. _ChkSum can be examined in detail in An Analysis of the Windows PE Checksum Algorithm.
_ChkSum PROC
push esi
mov ecx,[esp+10h]
mov esi,[esp+0Ch]
mov eax,[esp+8]
shl ecx,1
je _ChkSum_0x16e
test esi,2
je _ChkSum_0x2d
sub edx,edx
mov dx,[esi]
add eax,edx
adc eax,0
add esi,2
sub ecx,2
...
_ChkSum_0x16e:
mov edx,eax
shr edx,10h
and eax,0FFFFh
add eax,edx
mov edx,eax
shr edx,10h
add eax,edx
and eax,0FFFFh
pop esi
ret
_ChkSum ENDP
_ChkSum is not using a local stack frame - it is accessing the parameters using ESP:
push esi
mov ecx,[esp+10h]
mov esi,[esp+0Ch]
mov eax,[esp+8]
This could be converted to use a local frame reference as follows (with the appropriate epilogue):
PUSH EBP
MOV EBP, ESP
push esi
mov ecx,[EBP+10h]
mov esi,[EBP+0Ch]
mov eax,[EBP+08h]
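In the above conversion, the offsets used to reference values through EBP and ESP were the same. In this example, it was simply coincidence: the ESP based version had pushed exactly one register (ESI) before reading the parameters, which shifts the offsets by the same four bytes as the push of EBP. This may not always be the case.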
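The relationship between the two addressing schemes can be written down as simple arithmetic. The helper names below are hypothetical; the formulas assume DWORD arguments and that the return address is at [ESP] on entry to the function.

```cpp
#include <cstdint>

// Offset of argument `index` (0-based) from ESP after `pushes` registers
// have been saved since function entry.
constexpr uint32_t esp_offset(uint32_t index, uint32_t pushes) {
    return 4u * pushes      // registers saved by the function so far
         + 4u               // return address
         + 4u * index;      // earlier arguments
}

// Offset of argument `index` from EBP once "push ebp / mov ebp, esp"
// has executed. Later pushes move ESP but not EBP.
constexpr uint32_t ebp_offset(uint32_t index) {
    return 8u + 4u * index; // saved EBP + return address + earlier arguments
}
```

With exactly one register pushed, esp_offset(i, 1) equals ebp_offset(i) for every argument. With two registers pushed, the ESP based offsets are four bytes larger than the EBP based ones.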
_ImageNtHeader
_ImageNtHeader is a hand coded replacement for the original call to RtlpImageNtHeader(). The procedure takes the pointer to the memory mapped file, and adds to it the value of e_lfanew of IMAGE_DOS_HEADER. The function returns the sum on success (a pointer to IMAGE_NT_HEADERS), or NULL on failure.
_ImageNtHeader PROC
push ebp
mov ebp, esp
push esi
mov eax, dword ptr[ ebp+08h ]
mov esi, eax
cmp esi, 0
je NULLRETURN
cmp esi, 0FFFFFFFFh
je NULLRETURN
cmp byte ptr [ESI], 'M'
jne NULLRETURN
cmp byte ptr [ESI+01h], 'Z'
jne NULLRETURN
mov eax, esi
add eax, dword ptr[ ESI+03Ch ] ; e_lfanew is at offset 60 (3Ch) of IMAGE_DOS_HEADER
mov esi, eax
cmp byte ptr [ESI], 'P'
jne NULLRETURN
cmp byte ptr [ESI+01h], 'E'
jne NULLRETURN
cmp byte ptr [ESI+02h], 0
jne NULLRETURN
cmp byte ptr [ESI+03h], 0
jne NULLRETURN
jmp CLEANSTACK
NULLRETURN:
mov eax, 0
CLEANSTACK:
pop esi
pop ebp
ret
_ImageNtHeader ENDP
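For readers who prefer to see the checks in a high level language, the procedure above corresponds roughly to the following C++ sketch. The function name image_nt_header is hypothetical. Like the assembly, the sketch does not verify that e_lfanew stays inside the mapped file, so it should not be used on untrusted input as written.

```cpp
#include <cstdint>

// Returns a pointer to the "PE\0\0" signature, or nullptr on failure.
inline const uint8_t* image_nt_header(const uint8_t* base) {
    // Reject NULL and the all-ones value, as the assembly does.
    if (base == nullptr ||
        base == reinterpret_cast<const uint8_t*>(~uintptr_t{0})) {
        return nullptr;
    }

    // IMAGE_DOS_HEADER begins with the signature 'M','Z'.
    if (base[0] != 'M' || base[1] != 'Z') {
        return nullptr;
    }

    // e_lfanew is a little-endian DWORD at offset 0x3C.
    uint32_t e_lfanew = static_cast<uint32_t>(base[0x3C])
                      | static_cast<uint32_t>(base[0x3D]) << 8
                      | static_cast<uint32_t>(base[0x3E]) << 16
                      | static_cast<uint32_t>(base[0x3F]) << 24;

    // The NT headers begin with the signature 'P','E',0,0.
    const uint8_t* nt = base + e_lfanew;
    if (nt[0] != 'P' || nt[1] != 'E' || nt[2] != 0 || nt[3] != 0) {
        return nullptr;
    }
    return nt;
}
```

A production version would also take the file length and confirm that e_lfanew plus the size of the headers lies within it.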
Code Graft 5
The fifth sample incorporates the previous examples, with the addition of optimizations applied to CheckSumMemMapFile and _ChkSum.
Optimized CheckSumMemMapFile
CheckSumMemMapFile can be further cleaned by observing the local variables that serve no purpose in the code. In addition, the artifacts can be removed if the execution path is sent to the 'Abort' jump after cmp cx,20Bh (IMAGE_NT_OPTIONAL_HDR64_MAGIC). The cleaned routine is available in example four.
CheckSumMemMapFile PROC
PUSH EBP
MOV EBP, ESP
PUSH ESI
mov esi,dword ptr [ebp+10h]
and dword ptr [esi],0
mov eax,dword ptr [ebp+0Ch]
shr eax,1
push eax
push dword ptr [ebp+8]
push 0
call _ChkSum
ADD ESP, 0Ch
mov edi,eax
push [ebp+8]
call _ImageNtHeader
ADD ESP, 4
test eax,eax
je _CheckSum_0x90
cmp eax,dword ptr [ebp+8]
je _CheckSum_0x90
mov cx,word ptr [eax+18h]
cmp cx,10Bh
je _CheckSum_0x6a
cmp cx,20Bh
jne _CheckSum_0x90
_CheckSum_0x6a:
lea ecx,[eax+58h]
mov edx,dword ptr [ecx]
mov dword ptr [esi],edx
xor edx,edx
mov dx,word ptr [ecx]
cmp di,dx
sbb esi,esi
neg esi
add esi,edx
sub edi,esi
movzx ecx,word ptr [ecx+2]
cmp di,cx
sbb edx,edx
neg edx
add edx,ecx
sub edi,edx
_CheckSum_0x90:
mov ecx,dword ptr [ebp+0Ch]
test cl,1
je _CheckSum_0xa3
mov edx,dword ptr [ebp+8]
movzx dx,byte ptr [edx+ecx-1]
add edi,edx
_CheckSum_0xa3:
movzx edx,di
add edx,ecx
mov ecx,dword ptr [ebp+14h]
mov dword ptr [ecx],edx
POP ESI
POP EBP
ret
CheckSumMemMapFile ENDP
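The two cmp / sbb / neg / add / sub groups in the listing remove each 16-bit half of the stored checksum from the partial sum, taking an extra one when the low word of the sum is smaller than the word being removed. A C++ sketch of one such group follows. The function name is hypothetical.

```cpp
#include <cstdint>

// Mirrors: cmp di,dx / sbb esi,esi / neg esi / add esi,edx / sub edi,esi
// `partial` plays the role of EDI and `word` the role of DX.
constexpr uint32_t remove_word(uint32_t partial, uint16_t word) {
    // cmp sets the carry flag when the low 16 bits of the sum are below
    // `word`; sbb followed by neg turns the flag into 0 or 1.
    return partial -
           (static_cast<uint32_t>(word) +
            ((static_cast<uint16_t>(partial) < word) ? 1u : 0u));
}
```

Only the low 16 bits of the result are used afterwards (movzx edx,di), so remove_word(3, 5) is meaningful as 0xFFFD, the same value that one's complement arithmetic gives for 3 plus the complement of 5.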
Optimized _ChkSum
A final peephole optimization can be enjoyed in the main summation loop of _ChkSum. This supplement will take advantage of the processor's ability to schedule simultaneous instructions. The lesser summations (0x40 bytes, 0x20 bytes, 0x10 bytes, etc.) will be skipped since they are encountered at most once during the routine's execution.
Because most of the time in this routine is spent executing the loop below (each pass consumes 0x80 bytes, or 0x20 DWORDs), a further optimization would be to perform push ebx and push edx once, before the loop. Once summation is complete, perform the respective pops after the loop exits at jne _ChkSum_0xe8.
_ChkSum_0xe8:
PUSH EBX
PUSH EDX
XOR EBX, EBX
XOR EDX, EDX
add eax,dword ptr [esi]
adc EBX,dword ptr [esi+4]
adc EDX,dword ptr [esi+8]
adc eax,dword ptr [esi+0Ch]
adc EBX,dword ptr [esi+10h]
adc EDX,dword ptr [esi+14h]
adc eax,dword ptr [esi+18h]
adc EBX,dword ptr [esi+1Ch]
adc EDX,dword ptr [esi+20h]
adc eax,dword ptr [esi+24h]
adc EBX,dword ptr [esi+28h]
adc EDX,dword ptr [esi+2Ch]
adc eax,dword ptr [esi+30h]
adc EBX,dword ptr [esi+34h]
adc EDX,dword ptr [esi+38h]
adc eax,dword ptr [esi+3Ch]
adc EBX,dword ptr [esi+40h]
adc EDX,dword ptr [esi+44h]
adc eax,dword ptr [esi+48h]
adc EBX,dword ptr [esi+4Ch]
adc EDX,dword ptr [esi+50h]
adc eax,dword ptr [esi+54h]
adc EBX,dword ptr [esi+58h]
adc EDX,dword ptr [esi+5Ch]
adc eax,dword ptr [esi+60h]
adc EBX,dword ptr [esi+64h]
adc EDX,dword ptr [esi+68h]
adc eax,dword ptr [esi+6Ch]
adc EBX,dword ptr [esi+70h]
adc EDX,dword ptr [esi+74h]
adc eax,dword ptr [esi+78h]
adc EBX,dword ptr [esi+7Ch]
ADC EAX, EBX
ADC EAX, EDX
adc eax,0
POP EDX
POP EBX
add esi,80h
sub ecx,80h
jne _ChkSum_0xe8
...
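The idea of spreading the additions over three registers while sharing one carry flag can be modelled in C++. The sketch below is an illustration, not a proof of equivalence for every input: all names are hypothetical, the carry flag is an ordinary variable, and a carry out of the final adc eax,0 is dropped, as in the listing.

```cpp
#include <cstddef>
#include <cstdint>

// Models ADC: acc = acc + value + carry, carry = carry out of bit 31.
inline void adc(uint32_t& acc, uint32_t value, uint32_t& carry) {
    uint64_t wide = static_cast<uint64_t>(acc) + value + carry;
    acc = static_cast<uint32_t>(wide);
    carry = static_cast<uint32_t>(wide >> 32);
}

// One accumulator (EAX only).
inline uint32_t sum_one(const uint32_t* p, std::size_t n, uint32_t eax) {
    uint32_t cf = 0;
    for (std::size_t i = 0; i < n; ++i) adc(eax, p[i], cf);
    adc(eax, 0, cf);                       // adc eax,0
    return eax;
}

// Three accumulators (EAX, EBX, EDX) rotated as in the loop above.
inline uint32_t sum_three(const uint32_t* p, std::size_t n, uint32_t eax) {
    uint32_t ebx = 0, edx = 0, cf = 0;     // xor ebx,ebx / xor edx,edx
    uint32_t* regs[3] = {&eax, &ebx, &edx};
    for (std::size_t i = 0; i < n; ++i) adc(*regs[i % 3], p[i], cf);
    adc(eax, ebx, cf);                     // ADC EAX, EBX
    adc(eax, edx, cf);                     // ADC EAX, EDX
    adc(eax, 0, cf);                       // adc eax,0
    return eax;
}

// Final fold to 16 bits, following the tail of _ChkSum.
inline uint16_t fold16(uint32_t eax) {
    uint32_t edx = eax >> 16;
    eax &= 0xFFFFu;
    eax += edx;
    edx = eax >> 16;
    eax += edx;
    return static_cast<uint16_t>(eax & 0xFFFFu);
}
```

Because a carry out of bit 31 is fed back in as a one, it does not matter which register receives it, so both variants fold to the same 16-bit value for the sample inputs tried here.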
Checksums
- CodeGraft1.zip
MD5: F8958E18071F9FFDE17286AC4243C514
SHA-1: ECE1C15BA469CCABFF922C453A26C0BD6593CEEF
- CodeGraft2.zip
MD5: B457ED277E848A106F20F94B1CE275F4
SHA-1: 241E70C0660A652D4015C7787850DBA0684F62F8
- CodeGraft3.zip
MD5: 5DD1A1B16D47385577C8D7FF1DD49041
SHA-1: 98C5EFE3F2EA6CF5214C8A739FF99E1D60FD56EA
- CodeGraft4.zip
MD5: AC3800CF5714922D9930D7A2EAFCBD5C
SHA-1: 273C14760D4A438518513677424CE9A54E29294E
- CodeGraft5.zip
MD5: 8F8B25301DB6C77683FF8918CD679B21
SHA-1: 95D52336EE4EEC1A46D00ACD2BBC10C79489D41B
- CheckSumAsm.zip
MD5: 35EA1BBC97F1A23E8F0B7D943BA0F9F3
SHA-1: B0BE1D8BF772958191114A55FFB343B5B829E240
- CodeGraft.zip
MD5: C0D4468002F6FF82228323DD226093B5
SHA-1: 42BF918481881819FA8A1CC8F519303185964E15
- PEChecksum.zip
MD5: C0D4468002F6FF82228323DD226093B5
SHA-1: 42BF918481881819FA8A1CC8F519303185964E15
Revisions
- 03.06.2008: General revisions and article formatting.
- 11.20.2007: Bug Fix - Added ESI preservation to CheckSumMemMapFile.
- 11.05.2007: Initial release.