Introduction
This article will discuss the Microsoft algorithm for generating PE Checksums. Microsoft does not appear to publicly offer the specification, including a lack of documentation in the Microsoft Portable Executable and Common Object File Format Specification. Additionally, Matt Pietrek does not offer the algorithm in his works.
Contrary to popular belief, the algorithm used by Microsoft is not a CRC. This is despite the fact that Knowledge Base articles such as 65122 claim a 32 bit CRC is used (shown below).
Below is the breakout of four octets used in the checksum. The unused byte is set to 0, effectively creating a 24 bit checksum. It is noteworthy that the algorithm - an Additive Checksum - is a redundancy check; however it is not a CRC. A CRC has additional mathematical properties desireable in the problem domain of error detection.
For those interested in Microsoft's Outlook CRC Algorithm, see Frank Sconzo's thread CRC on Unix vs Win32 on comp.lang.perl.misc. Tim Heaney reveals Microsoft's method.
Finally, for the reader who is interested in the complete algorithm, Peter Szor presents the high level design in The Art Of Computer Virus Research And Defense. The W32/Kriz virus internally computed the checksum of infected files to mask tampering. Szor documents the algorithm and virus in Section 6.2.8.1.12, 'Checksum Recalculation'.
The topics to be covered are as follows. Note that Generating Loop and Generating Loop Setup appear to be out of order. This is by design, and will make sense when visited. This article will examine the main Generating Loop and Generating Loop Setup in detail, while offering no treatment of the final linear transformations. The goal of the omissions is to preserve Microsoft's Security through Obscurity policy.
- Tools
- Symbols
- ImageHlp.dll
- WinDbg
- Analysis
- Prologue
- Generating Loop
- Generating Loop Setup
- Epilogue
- Effects of Time Stamping
- Practical Attack
- PE Checksum Algorithm
- RFC-1071 C Implementation
- Summary
Tools
This article uses four tools to examine the algorithm:
- PEChecksum.exe
- WinDbg
- UltraEdit
- UltraCompare
PEChecksum is used to interrogate an EXE's checksum and write an arbitrary checksum. In addition, the executable depends on ImageHlp.dll. The dependency forces the link-loader to load the Image Help so that a live target analysis can proceed. PEChecksum is available with source code in the download area.
WinDbg is used because SoftICE is no longer available from Compuware; and perhaps even better: WinDbg is free. WinDbg is available for download from Microsoft's site.
kd, OllyDbg, or SyserDebug should suffice the reader as a replacement if desired. The reader should keep in mind that WinDbg is authored by the Windows Operating System Team. This is in contrast to the Developer Team which produces Visual Studio, or third parties which produce competing products such as OllyDbg or SyserDebug.
UltraEdit from IDM software can be used as a hex editor to write values to the executable file. UltraCompare is used to verify differences between the original file and the altered file.
Symbols
It is highly recommended that the reader set up local Symbols on his or her machine. John Robbins covers the subject extensively in Debugging Microsoft .NET 2.0 Applications, Chapters 2 and 6. John also provides an excellent script (OSsyms.js) to automate the symbol retrieval process.
For those who do not have access to the text, visit [Debugging] - Symbols by Johnathan [Darka]. Note that a local Symbol Server is not required - simply use the server provided by Microsoft. However, the reader does want the symbols available locally.
ImageHlp.dll
Imagehlp.dll is a library which exposes many debugging and maintenance routines to the programmer. Noteworthy is that the library is not thread safe. For techniques to reuse code of Imagehlp in a thread safe manner, see Grafting Compiled Code: Unlimited Code Reuse.
ImageHlp.dll offers function CheckSumMappedFile(). CheckSumMappedFile()
internally calls function ChkSum()
, which is the workhorse of checksum computation. The signature of CheckSumMappedFile()
is shown below.
PIMAGE_NT_HEADERS CheckSumMappedFile(
PVOID BaseAddress,
DWORD FileLength,
PDWORD HeaderSum,
PDWORD CheckSum
);
BaseAddress
is returned from MapViewOfFile(), and FileLength
can be determined by GetFileSize() or GetFileSizeEx(). The Loader will map a view of the executable during process creation by way of MtMapViewOfSection()
(which
offers a forward API face of MapViewOfFile()
). For a more detailed examination of the Loader's behavior, see Dynamic TEXT Section Image Verification.
Previous Microsoft SDKs
According to Ken Johnson, Microsoft MVP, past SDKs did in fact have CheckSumMappedFile()
in source code, less ChkSum()
:
imagehlp used to be an SDK sample a long time ago, with the pointed omission of the source code for the "ChkSum" helper function used by CheckSumMappedFile(). From the SDK sample source, it looks like it was intended to be linked in from another library or a .asm file that was never shipped with the sample code.
Additionally, the earlier SDK provides the following prototype:
USHORT ChkSum(
DWORD PartialSum,
PUSHORT Source,
DWORD Length
);
and
PartialSum = ChkSum(0, (PUSHORT)BaseAddress, (FileLength + 1) >> 1);
WinDbg
A more complete introduction to WinDbg is available on Code Project at Windows Debuggers: Part 1: A WinDbg Tutorial by Saikat Sen. The author prefers three windows to be present in WinDbg: the Command Window, the Disassembly Window, and the Register Window. At any given time, a Memory Window and Call Stack Window may make an appearence as required. Take the time in advance to open and dock the windows. As John Robbins states
with
regards to the floating windows, "they have a serious z-order problem".
Based on how lucky the author is, the Command Window usually docks on top of the lower two windows. To open the various Windows, use the View menu. Below is a typical view of a WinDbg session.
Commands
Below is list of typical commands used for a session. Users who are familiar with gdb or SoftICE should feel right at home, without the need for CTRL-D to break into SoftICE.
F5 | Run |
F9 | In the Disassembly Window, set breakpoint |
F10 | Step Over |
F11 | Step Into |
Shift-F11 | Step Out |
| |
bp <address> | Set breakpoint at address |
bl | List breakpoints |
bd # | Disable breakpoint |
be # | Enable breakpoint |
bc | Clear breakpoint |
| |
ba <address> | Break on Access (Hardware Breakpoint) |
ba <address> <passes> | Break on Access after number of passes |
| |
lm | List loaded modules |
ld | Load modules |
| |
u <address> | Disassemble at address |
| |
x *!* | Display all Symbols for all loaded Modules |
x <module>!* | Display Symbols for module |
| |
CTRL-Break | Break the Debuggee (return to the Command Window) |
| |
dd <address|register> | Display data as double word at address or register |
dw <address|register> | Display data as word at address or register |
db <address|register>
| Display data as byte at address or register |
| |
.cls | Clear Command Window |
| |
.restart
| Restart the Debuggee (target executable) |
.attach pid
| Attach to a process |
.detach
| Detach from a process |
| |
?
| Help on a Debuggee command |
.help
| Help on a Debugger command |
.hh command
| Help from the Help File (debugger.chm) on command |
Once an executable file is loaded in WinDbg, the program will be suspended. To begin execution, press F5 or type g in the Command Window after setting the desired breakpoints. Then, exercise the excutable as one would when using the Visual Studio debugger.
Cross Platform Analysis
For those interested in analizing Windows code using Linux and gdb, see Joe Stewart's Reverse Engineering Hostile Code and Alien Autopsy: Reverse Engineering Win32 Trojans on Linux at the SecurityFocus website.
For Unix and Linux, objdump (with it's PERL based wrapper dasm) and gdb are two of the tools of choice. gdb supports debugging of C, C++, Java, Fortran and Assembly among other languages. In addition, gdb is designed to work closely with the GNU Compiler
Collection (GCC). objdump and dasm collectively act as full disassembler. Alternately, one can run Windows applications such as IDA on Linux using Wine, which acts as a compatibility layer for running Windows programs on Linux.
Analysis
Analysis will proceed with a presumed ImageHlp.dll load address of 0x76C9000. If the reader does not know what modules are loaded, execute the list modules command in WinDbg (lm) after opening the executable. Below PEChecksum.exe was opened in WinDbg, resulting in the following modules being loaded due to dependencies.
deferred is displayed because WinDbg is using a lazy algorithm for resolving symbols. To force the loading of symbols, use the load symbols command (ld). The command can take a module name, or the wildcard (*) to load all symbols.
There are four areas of interest in the analysis: Prologue, Loop Setup, Generating Loop, and Epilogue. Prologue is similar to Loop Setup, except Prologue sets up registers and the stack - it is a lower level setup. Loop Setup, as defined in this article, works at a higher level. Epilogue is the area this article will remain vague, since some additional calculations are occurring which will not be revealed. The analysis is left as an exercise to the reader.
Note that the analysis will proceed out of order since once one grasps the essence of the main Generating Loop, what is occurring in the Generating Loop Setup will be more apparent.
Before the examination can begin, two key pieces of information are required to aide in tracing program flow: The BaseAddress
and the FileSize
. These values are 0xB7000 and 0x10E00 respectively. This information can be gathered from OutputDebugString()
in the executable if the reader has instrumented such. There are other means, such as examining the return value from MapViewOfFile()
in WinDbg.
Prologue
To examine Prologue, issue bp imagehlp!ChecksumMappedFile. This line is highlighted in red by WinDbg in the above capture. The blue highlight indicates the next instruction sequence to be executed. From above, a Structured Exception Handler is installed at 0x76C96F03. Then values are pushed onto the stack in preparation for calling ChkSum
.
76c96f03 68c86fc976 push offset imagehlp!`string'+0x3c (76c96fc8)
76c96f08 e8acc5ffff call imagehlp!_SEH_prolog (76c934b9)
76c96f0d 8b7510 mov esi,dword ptr [ebp+10h]
76c96f10 832600 and dword ptr [esi],0
76c96f13 8b450c mov eax,dword ptr [ebp+0Ch]
76c96f16 d1e8 shr eax,1
76c96f18 50 push eax
76c96f19 ff7508 push dword ptr [ebp+8]
76c96f1c 6a00 push 0
76c96f1e e856d6ffff call imagehlp!ChkSum (76c94579)
It is desired to determine what is being moved into ESI at 0x76C96F0D (mov esi, dword ptr [ebp+10h]) in preparation for the call. Issue dd ebp. This will display double word values at the address of the base pointer:
0xB70000 is the file mapping view base address, which is pushed on the stack (from above).
76c96f13 8b450c mov eax,dword ptr [ebp+0Ch]
76c96f16 d1e8 shr eax,1
76c96f18 50 push eax
Next, EAX is populated with the file's size 0xE100 (shown above). For an unknown reason, a right shift occurs, effectively pushing 0xE100/2 = 0x7080 onto the stack. It is presumed by the author this is from the original adaptation of RFC 1071, which operated on word size (16 bit) quanities. Many thanks to Peter Bell for bringing to light the signifigance of the shift pertaining to machine word sizes.
Stepping into 0x76C96F1E (private function ChkSum()
), one can examine the final steps of the prologue (shown below).
76c94579 56 push esi
76c9457a 8b4c2410 mov ecx,dword ptr [esp+10h]
76c9457e 8b74240c mov esi,dword ptr [esp+0Ch]
76c94582 8b442408 mov eax,dword ptr [esp+8]
76c94586 d1e1 shl ecx,1
The following is occuring:
- ESI is preserved
- ESI = Base Address
- ECX = File Size/2 (0x7080)
- EAX = 0
Then, the original file size is restored with shl ecx, 1.
76c94586 d1e1 shl ecx,1
Notably missing is push ebp and mov ebp, esp. This sequence is generally emitted by the compiler since it is bad karma to operate directly on the Stack Pointer (ESP). This could also indicate a Frame Pointer Omission (FPO) generated by the compiler.
At this point, it is apparent ESI will be used as a pointer to the value being consumed, and ECX will be a counter. The role of EAX will be revealed below.
One final note on Prologue. The code below is testing for a 0 file size by way of the je instruction. If the file size is 0, the entire Loop Setup and Generating Loop execution paths are bypassed. This should imply to the reader various return values are being prepared at 0x76C946E7.
...
76c94582 8b442408 mov eax,dword ptr [esp+8]
76c94586 d1e1 shl ecx,1
76c94588 0f8459010000 je imagehlp!ChkSum+0x16e (76c946e7)
Generating Loop
To set a breakpoint on the main generating loop, issue bp imagehlp!ChkSum+0xE8 (or address 0x76C94661). The main generating loop is shown below. The loop consumes 0x80 bytes (0x20 DWORDS) of program data at a time. When a double word is consumed, it is added to a running total in EAX (adc assembly instruction). By using the adc instruction, the algorithm realizes a 33 bit register rather than a 32 bit register.
76c94661 0306 add eax,dword ptr [esi]
76c94663 134604 adc eax,dword ptr [esi+4]
76c94666 134608 adc eax,dword ptr [esi+8]
...
76c946b7 134674 adc eax,dword ptr [esi+74h]
76c946ba 134678 adc eax,dword ptr [esi+78h]
76c946bd 13467c adc eax,dword ptr [esi+7Ch]
76c946c0 83d000 adc eax,0
The last instruction (adc eax, 0) simply adjusts the flags register. The instructions following the main loop are book keeping (shown below).
76c946c3 81c680000000 add esi,80h
76c946c9 81e980000000 sub ecx,80h
76c946cf 7590 jne imagehlp!ChkSum+0xe8 (76c94661)
ESI (the file mapping pointer) is incremented, and ECX (the counter) is decremented. After the sub ecx, 80h is executed, control will either be passed to the top of the loop at 0x76C94661 (if ECX does not equal 0) or fall into other code (ECX = 0). So the reader can infer some of the Epilogue resides at 0x76C946D1.
In the case of the PEChecksum.exe, the file size is 0xE100. The loop operates on blocks of 0x80, so the loop will execute 0x1C2 or 450 iterations. This begs the question: What if the file size is not a multiple of 0x80? This question is answered in the Loop Setup.
Generating Loop Setup
The loop setup code starts at 0x76C9458E, and ends at 0x76c94660 (the byte before the Main Generating Loop). Consider the first test:
76c9458e f7c602000000 test esi,2
if [br=1] (branch = true), jump to the next test.
76c94594 7410 je imagehlp!ChkSum+0x2d (76c945a6)
Otherwise some additions are performed as in the main generating loop - but not a complete loop:
76c94596 2bd2 sub edx,edx ; Zero EDX
76c94598 668b16 mov dx,word ptr [esi] ; set up for addition
76c9459b 03c2 add eax,edx ; Add 1 WORD to eax (2 = sizeof(WORD) )
76c9459d 83d000 adc eax,0 ; adjust EFLAG register
76c945a0 83c602 add esi,2 ; adjust ESI
Then into the next test (test esi, 8) at 76c945b3.
76c945b3 f7c108000000 test ecx,8
if [br=1] (branch = true), jump to the next test. Otherwise:
76c945b9 7414 je imagehlp!ChkSum+0x56 (76c945cf)
76c945bb 0306 add eax,dword ptr [esi]
76c945bd 134604 adc eax,dword ptr [esi+4]
76c945c0 83d000 adc eax,0
76c945c3 83c608 add esi,8
76c945c6 83e908 sub ecx,8
Excuting the code results in an add, adjust pointer, and decrement counter. The observation from above is sizeof
( 2 DWORDS ) = 8. Next, another test/branch. The code of interest is:
76c945cf f7c110000000 test ecx,10h
76c945d5 741a je imagehlp!ChkSum+0x78 (76c945f1)
76c945d7 0306 add eax,dword ptr [esi]
76c945d9 134604 adc eax,dword ptr [esi+4]
76c945dc 134608 adc eax,dword ptr [esi+8]
76c945df 13460c adc eax,dword ptr [esi+0Ch]
76c945e2 83d000 adc eax,0
76c945e5 83c60C add esi,0Ch
76c945e8 83e90C sub ecx,0Ch
Again, add, adjust pointer, and decrement counter. The observation from above is sizeof
( 4 DWORDS ) = 16 = 10h. Following this pattern (before the main loop), up to and including test ecx,40h will cause ECX % 0x80 = 0 (where % is the customary modular reduction). At this point, the main generating loop executes and the remaining files size to be processed is a multiple 0x80.
Ken Johnson points out that the loop prologue could be an optimaization by adjusting for alignments. The author also suspected such, but cannot state the reasoning in a definitive manner. Finally, Parch Andri suggested this is similar to Loop Unwinding with a Duff Device. The code may be hand written rather than emitted by the compiler.
Epilogue
As stated earlier, the Epilogue will be ommitted. However, one observation is noteworthy to support the author's opinion of an adaptaion of RFC 1071: the folding of the 32 bit value into a 16 bit value. The instruction sequence is a follows. Note the folding occurs twice in case a Carry was generated during the first fold.
76c946e7 8bd0 mov edx,eax ; EDX = EAX
76c946e9 c1ea10 shr edx,10h ; EDX = EDX >> 16 EDX is high order
76c946ec 25ffff0000 and eax,0FFFFh ; EAX = EAX & 0xFFFF EAX is low order
76c946f1 03c2 add eax,edx ; EAX = EAX + EDX High Order Folded into Low Order
76c946f3 8bd0 mov edx,eax ; EDX = EAX
76c946f5 c1ea10 shr edx,10h ; EDX = EDX >> 16 EDX is high order
76c946f8 03c2 add eax,edx ; EAX = EAX + EDX High Order Folded into Low Order
76c946fa 25ffff0000 and eax,0FFFFh ; EAX = EAX & 0xFFFF EAX is low order 16 bits
Effects of Time Stamping
Each time an executable is compiled, a time stamp is embedded in the executable. Should the reader add whitespace to a source file (to force recompilation), the checksum will be different. The effects of time stamping are also present in object files.
Practical Attack
The author has never been one to shy away from an implementation. To this end, an attack on the checksum is presented. Originally, the author desired to demonstrate the weakness of a CRC as an integrity verification tool, arguing that software authors should embrace Assembly Code Signing. The article was originally going to perform the following:
- Modify Notepad by adding line numbers relative to the cursor position (based on the work of Razzia)
- Determine the new Checksum
- Use Reversing CRC - Theory and Practice to determine the noise bytes to modify such that the altered EXE has the same checksum as the original EXE
Interestingly, the W32/Kriz virus performed the same operation to mask it's tampering of infected files in 1999. The described seige is clearly overkill based on Microsoft's scheme. Instead, an attack which is easier to understand will be demonstrated which shows altering of bytes on DWORD
boundaries goes undetected.
The assault will begin on the innocuous string, "This program cannot be run in DOS mode." The string is '$' terminated since it is an assembly language string (as opposed to NULL
termination for a C string). After the '$', a string of NULL
s is encountered as noise bytes for padding the remainder of the stub program.
So, it is desired to swap one DWORD
with another. If we envisioned the tail of the string ("This program ...") as a character representation in an array, it would look as follows:
..., '$', 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, ...
Performing the DWORD swap, the executable would be modified as follows:
..., 0x00, 0x00, 0x00, 0x00, '$', 0x00, 0x00, 0x00, ...
Viewing the difference in the files using IDM's UltraCompare reveals the following binary difference at 0x78. The difference at 0x7C is not in view:
And finally, the resulting Checksum of the altered notepad.exe. It too has a checksum of 0x00014F7F.
Both the original and altered notepad programs are available for download. For those who question whether the file has been truly altered, one only needs to look at the different MD5 hashes of the files (recall that both files have the same checksum):
- Original Notepad: 388B8FBC36A8558587AFC90FB23A3B99
- Altered Notepad: 76CB8109191B87E769371B6340A0B257
Since the CRC-32 is of interest, the output of CRC32.exe:
- Original Notepad: C1227F19
- Altered Notepad: 1652D843
RFC 1071 Implementation
The authors of RFC 1071 offer a 32 bit implementation of the Computing the Internet Checksum as follows (taken from Section 4.1):
register long sum = 0;
while( count > 1 ) {
sum += * (unsigned short) addr++;
count -= 2;
}
if( count > 0 )
sum += * (unsigned char *) addr;
while (sum>>16)
sum = (sum & 0xffff) + (sum >> 16);
checksum = ~sum;
Summary
It is left to the reader to conclude how similar the Microsoft PE Checksum algorithm is to the Internet Checksum Algorithm. To the author, it is readily apparent. In the author's humble opinion, Microsoft misapplied the Additive Checksum. Microsoft did not keep in mind the additional error detection which occurs in the upper layers in the OSI model or TCP/IP stack. In the Microsoft application of the checksum, the checksum is the only layer of detection.
A larger concern to the author is that the checksum is one of the last safeguards between loading a Kernel Driver or Trusted Service and the Operating System. What is disappointing is that Microsoft choose speed over a slighly more robust CRC-32. To add to the woes, Microsoft's PE format leaves plenty of room for tampering after the MS-DOS 2.0 Compatible EXE Header. Perhaps Microsoft will close these holes with Code Signing
and Next Generation Secure Computing Base in the future.
As of this writing, Microsoft only requires Code Signing for x64 based Vista Kernel Mode drivers. The reader is invited to read Kernel-Mode Code Signing Walkthrough.
In fairness to Microsoft, the author would like to offer David Delaune's somewhat differing opinion:
... the algorithm used for the PE checksum was weak, when analyzing the algorithm from a historical viewpoint... it was doing exactly what it was designed to do; simple file integrity and memory failure detection. Microsoft was certainly not concerned with security during Win 3.1 development it seems. In addition, processing power was far less.
However, I do not believe Microsoft should put forth any effort upgrading this algorithm. And I believe they must have also reached the same conclusion, because within PE32+ (magic 0x20b) the checksum field is the same size. I personally believe code signing is the correct direction for Microsoft. Although its current implementation has flaws which are beyond the scope of this simple comment.
Finally, this is the author's opinion, and clearly open to misinterpretations during the analysis. Please feel free to correct any mistakes.
Acknowledgements
- Dr. A. Brooke Stephens who laid my Cryptographic foundations
- Ken Johnson, Microsoft MVP
- Peter Bell, Crypto++ Mailing List
- Parch Andri, Crypto++ Mailing List
Revisions
- 03.08.2008 Article and Link Cleanup
- 11.05.2007 Added Reference to Grafting Compiled Code
- 10.29.2007 Added Reference to Szor's Book
- 10.29.2007 Added Reference to W32/Kriz Virus
- 10.20.2007 Added Reference to gdb, objdump, and dasm
- 10.13.2007 Added Cross Platform Analysis
- 09.22.2007 Gramatical Corrections
- 08.24.2007 Added Reference to David Delaune's Comments
- 08.24.2007 Added Reference to Windows Debuggers: A WinDbg Tutorial
- 08.22.2007 Added Clarification on an Additivie Checksum
- 07.22.2007 Added Reference to Kernel-Mode Code Signing Walkthrough
- 07.21.2007 Added Reference to possible loop setup alignment optimization
- 07.21.2007 Added Reference to Microsoft Outlook CRC Calculation
- 07.06.2005 Added WinDbg Command x
- 07.01.2007 Added Note on FPO
- 06.30.2007 Added Checksum Octet Breakout Graphic
- 06.30.2007 Added Note on ADC Instruction (33 bit Adds)
- 06.26.2007 Incorporated Ken Johnson's Comments
- 06.25.2007 Added Practical Attack
- 06.25.2007 Added CRC32 download
- 06.24.2007 Initial Release
Downloads
Checksums
- PEChecksum.zip
MD5: 830804E7427612285FC08997C127E68C
SHA-1: 582FC2E3194B34DF9F8B49BCDB70F40D59100E28 - PEChecksumSource.zip
MD5: 521D0B260377D5835BA3F6C715D040DC
SHA-1: EA4F6B60CB94CE3800C6F2510B0318919600FBBA - NotepadAltered.zip
MD5: B1D0F0668408E1C8946DDB38EFFE5CCE
SHA-1: 90F52F3014828DCE0FB6DAAF6D6F3CA3C751953B - NotepadOriginal.zip
MD5: E4FD9A097723BBC1702DBA70DDE0F967
SHA-1: 74809A95DF40AC24B610A92089B3A2FA4006235D - CRC32.zip
MD5: 5BCA445EF5ED629A6DAE2727A826E0AB
SHA-1: 4F0990D4764BBB1F493772E96CD00F7EAD6862FE - CRC32Source.zip
MD5: F39CA9B45041B2730A502DAF296792F8
SHA-1: 517DB8DE335B16A4132334A482656CA37EB4D37B