Introduction
Determining if a file's disk image has been altered after loading into memory can be a useful operation. Reasons for doing so would include hardware fault isolation - whether memory or disk - and detecting the effects of a binary patcher.
Academia has studied the problem of a program verifying the integrity of its own image (and related issues) for some time. In the academic literature this is known as self-checksumming. Papers of interest include Strengthening Software Self-Checksumming via Self-Modifying Code by Giffin, Christodorescu, and Kruger; Watermarking, Tamper-Proofing, and Obfuscation - Tools for Software Protection by Collberg and Thomborson; Architectural Support for Copy and Tamper-Resistant Software; and finally Glenn Wurster's thesis A Generic Attack on Hashing-Based Software Tamper Resistance.
Microsoft employs a passive and semi-passive integrity system called Windows File Protection, which monitors Operating System files for inadvertent replacement. The passive system uses System File Checker to scan protected files for inadvertent replacement. In this system, the user must manually launch the tool to initiate the operation (hence the passive). In the semi-passive system, a protected directory is monitored. If the OS determines a file has been improperly replaced, the file will be restored from the cache, network installation point, or Windows CD.
It is not possible for a developer to request protection from the Operating System. This may not be a bad situation, since Microsoft performs poorly when protecting its own binaries from tampering. For examples of kernel patching, see Eliminating Explorer's Delay when Deleting an In Use File or ClearType over Remote Desktop in Windows XP by Dan Farino. To this end, this article will present the reader with the framework for performing integrity verification using Cryptographic Hash functions with Crypto++.
Should the reader be inclined, part two of this article is available: Tamper Aware and Self Healing Code. Post-Build Executable Back Patching is also available, which demonstrates how to automate the process of back patching a value into a compiled executable.
This article does not cover Linker and Loader behavior as Matt Pietrek's various Microsoft Systems Journal articles do. The reader is encouraged to visit his articles listed in the Resources. Whether using Pietrek's articles or developing tools for examining executables, the reader should find that Microsoft is sufficiently vague in certain areas of the Portable Executable and Common Object File Format Specification and undocumented in others.
Crypto++
The samples provided use Crypto++ Hashes. Crypto++ can be downloaded from Wei Dai's Crypto++ pages. For compilation and integration issues, visit Integrating Crypto++ into the Microsoft Visual C++ Environment. This article is based upon assumptions presented in the previously mentioned article. For those who are interested in other C++ Cryptographic libraries, please see Peter Gutmann's Cryptlib or Victor Shoup's NTL.
Image Execution
Depending on the source we use, we are told that the executable section can be found by name (for example, .text, .code, .textbss), or by examining the sections of an executable searching for IMAGE_SCN_MEM_EXECUTE. Naming standards do not exist for sections of the PE File (though common section names are usually used), and a compiled executable can have its characteristics modified such that IMAGE_SCN_CNT_CODE and IMAGE_SCN_MEM_EXECUTE are no longer present. Altered in these ways, an image will still be loaded and executed by the Operating System. This is because the OS will execute code that has been mapped from a file (SEC_IMAGE attribute) using AddressOfEntryPoint to determine the executable section of the file. When we attempt to determine the .text section, we find the executable section by finding a section such that SectionStart ≤ AddressOfEntryPoint < SectionEnd (which is usually the distinguished '.text' section). Matt Pietrek demonstrates the technique using PEDUMP in An In-Depth Look into the Win32 Portable Executable File Format, Part 1. See GetEnclosingSectionHeader() in common.cpp of PEDUMP.
Reading a Disk Image
The first step in developing the system is based on the disk image. There is very little difference between an on-disk and in-memory image. Regardless of whether we use a pointer acquired via CreateFileMapping()
or a HMODULE
from GetModuleHandle()
, the various structures are the same. The most noticeable differences occur between Debug and Release builds of the executable, where Debug builds tend to have more indirection using stubs or jump tables in the binary.
Winnt.h defines the structures and constants of interest. The first structure of interest is IMAGE_DOS_HEADER - referred to as the MS-DOS 2.0 Compatible EXE Header - located at Byte 0. The fields of interest are e_magic and e_lfanew. For the purposes of this article, e_magic should be IMAGE_DOS_SIGNATURE, the familiar "MZ." Once e_magic is verified, the DOS header is stepped over. How far to step is determined by e_lfanew. In between the IMAGE_DOS_HEADER and IMAGE_NT_HEADERS (the New Executable header) is the stub "This program cannot be run in DOS mode." Figure 1 displays the Hex dump of IMAGE_DOS_HEADER and the stub program.
|
Figure 1: DOS Header and Stub Program
|
The first order of business is mapping the disk file into memory. The task is accomplished as follows. CreateFile(), CreateFileMapping(), and MapViewOfFile() all request read-only access (employing the concept of least privilege).
if( 0 == GetModuleFileName( NULL, szFilename, PATH_SIZE ) )
{ return -1; }
hFile = CreateFile( szFilename, GENERIC_READ, FILE_SHARE_READ,
NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
if ( hFile == INVALID_HANDLE_VALUE )
{ return -1; }
hFileMapping = CreateFileMapping( hFile, NULL,
PAGE_READONLY, 0, 0, NULL );
if ( NULL == hFileMapping )
{ return -1; }
pBaseAddress = MapViewOfFile( hFileMapping,
FILE_MAP_READ, 0, 0, 0 );
if ( NULL == pBaseAddress )
{ return -1; }
The code to inspect the IMAGE_DOS_HEADER follows. Note that the pointer returned from MapViewOfFile() will usually be a memory address on the order of 0x00350000 in debug builds.
pDOSHeader = static_cast<PIMAGE_DOS_HEADER>( pBaseAddress );
if( pDOSHeader->e_magic != IMAGE_DOS_SIGNATURE )
{ return -1; }
// e_lfanew is the file offset of the NT headers, measured from the mapping base
pNTHeader = reinterpret_cast<PIMAGE_NT_HEADERS>(
    (PBYTE)pBaseAddress + pDOSHeader->e_lfanew );
Once in possession of the IMAGE_NT_HEADERS
pointer (pNTHeader
), the next step is to verify that the Signature
field of the header is IMAGE_NT_SIGNATURE
. IMAGE_NT_SIGNATURE
is four bytes consisting of "PE" and two NULL
bytes. Once the Signature has been verified, a thorough examination can begin. Winnt.h shows the structure of IMAGE_NT_HEADERS
:
typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
At this point, the reader should familiarize themselves with 32 and 64 bit versions of the NT header structures, as well as the IMAGE_FIRST_SECTION
macro defined in Winnt.h. For the purposes of this article, only IMAGE_NT_HEADERS32
will be examined.
PIMAGE_NT_HEADERS pNTHeader = NULL;
pNTHeader = reinterpret_cast<PIMAGE_NT_HEADERS>(
(PBYTE)pBaseAddress + pDOSHeader->e_lfanew );
if(pNTHeader->Signature != IMAGE_NT_SIGNATURE )
{ return -1; }
PIMAGE_FILE_HEADER pFileHeader = NULL;
pFileHeader = reinterpret_cast<PIMAGE_FILE_HEADER>(
(PBYTE)&pNTHeader->FileHeader );
PIMAGE_OPTIONAL_HEADER pOptionalHeader = NULL;
pOptionalHeader = reinterpret_cast<PIMAGE_OPTIONAL_HEADER>(
(PBYTE)&pNTHeader->OptionalHeader );
if( IMAGE_NT_OPTIONAL_HDR32_MAGIC != pNTHeader->OptionalHeader.Magic )
{ return -1; }
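The snippets above trust e_lfanew without checking that it stays inside the mapped file. As a minimal sketch of a more defensive walk, the following portable code works on a plain byte buffer rather than on the Winnt.h structures. The helper names (ReadU16, ReadU32, FindNtHeaderOffset) are hypothetical and not part of the samples; the offsets 0x3C (e_lfanew) and 64 (size of IMAGE_DOS_HEADER) come from the structure layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read little-endian integers from a raw byte buffer (PE files are little-endian).
static uint16_t ReadU16( const uint8_t* p )
{
    return static_cast<uint16_t>( p[0] | ( p[1] << 8 ) );
}

static uint32_t ReadU32( const uint8_t* p )
{
    return static_cast<uint32_t>( p[0] ) |
           ( static_cast<uint32_t>( p[1] ) << 8 ) |
           ( static_cast<uint32_t>( p[2] ) << 16 ) |
           ( static_cast<uint32_t>( p[3] ) << 24 );
}

const uint16_t kDosSignature  = 0x5A4D;     // "MZ", IMAGE_DOS_SIGNATURE
const uint32_t kNtSignature   = 0x00004550; // "PE\0\0", IMAGE_NT_SIGNATURE
const size_t   kLfanewOffset  = 0x3C;       // offset of e_lfanew in IMAGE_DOS_HEADER
const size_t   kDosHeaderSize = 64;         // sizeof(IMAGE_DOS_HEADER)

// Returns the file offset of the NT headers, or -1 if the buffer does not
// look like a PE image. Hypothetical helper, not part of the samples.
int64_t FindNtHeaderOffset( const uint8_t* image, size_t size )
{
    if( image == nullptr || size < kDosHeaderSize ) { return -1; }
    if( ReadU16( image ) != kDosSignature ) { return -1; }

    const uint32_t lfanew = ReadU32( image + kLfanewOffset );
    // reject offsets that would place the 4-byte signature outside the buffer
    if( lfanew > size || size - lfanew < 4 ) { return -1; }
    if( ReadU32( image + lfanew ) != kNtSignature ) { return -1; }

    return static_cast<int64_t>( lfanew );
}
```

The same check applies to a pointer from MapViewOfFile(), provided the file size (for example, from GetFileSize()) is passed as the buffer size.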
The IMAGE_NT_HEADERS structure yields an IMAGE_FILE_HEADER and an IMAGE_OPTIONAL_HEADER. The File Header is also referred to as the COFF header. Below is a view of a Debug build File Header structure. The value 0x00004550 is the signature bytes PE\0\0 read as a little-endian DWORD.
|
Figure 2: File Header
|
The PE Browse view of the Optional Header is shown below. The Optional Header is not optional; it is required in executable files, but not COFF object files.
|
Figure 3: Optional Header
|
Fields of interest in IMAGE_FILE_HEADER
and IMAGE_OPTIONAL_HEADER
include the following.
Structure | Field
IMAGE_FILE_HEADER | Machine
IMAGE_FILE_HEADER | NumberOfSections
IMAGE_OPTIONAL_HEADER | MajorLinkerVersion and MinorLinkerVersion
IMAGE_OPTIONAL_HEADER | SizeOfCode
IMAGE_OPTIONAL_HEADER | BaseOfCode
IMAGE_OPTIONAL_HEADER | SizeOfInitializedData
IMAGE_OPTIONAL_HEADER | SizeOfUninitializedData
IMAGE_OPTIONAL_HEADER | BaseOfData
IMAGE_OPTIONAL_HEADER | AddressOfEntryPoint
IMAGE_OPTIONAL_HEADER | ImageBase
In addition, the IMAGE_OPTIONAL_HEADER contains an array of IMAGE_NUMBEROF_DIRECTORY_ENTRIES (16) IMAGE_DATA_DIRECTORY structures. To find the program's entry point at runtime - which is CRT code for the command line samples - one would use the following:
pImageBase + pNTHeader->OptionalHeader.AddressOfEntryPoint
Stepping over the previous two headers reveals an array of IMAGE_SECTION_HEADER structures, one per section. These sections are the familiar .text, .textbss (if incremental linking is enabled), .data, .rdata, etc. Winnt.h defines the IMAGE_SECTION_HEADER as follows:
typedef struct _IMAGE_SECTION_HEADER {
    BYTE Name[IMAGE_SIZEOF_SHORT_NAME];
    union {
        DWORD PhysicalAddress;
        DWORD VirtualSize;
    } Misc;
    DWORD VirtualAddress;
    DWORD SizeOfRawData;
    DWORD PointerToRawData;
    DWORD PointerToRelocations;
    DWORD PointerToLinenumbers;
    WORD NumberOfRelocations;
    WORD NumberOfLinenumbers;
    DWORD Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;
The IMAGE_SECTION_HEADER
is documented at MSDN. Taking from the document:
Name | An 8-byte, null-padded UTF-8 string. There is no terminating null character if the string is exactly eight characters long. For longer names, this member contains a forward slash (/) followed by an ASCII representation of a decimal number that is an offset into the string table. Executable images do not use a string table and do not support section names longer than eight characters
|
Misc.PhysicalAddress | The file address
|
Misc.VirtualSize | The total size of the section when loaded into memory, in bytes. If this value is greater than the SizeOfRawData member, the section is filled with zeroes. This field is valid only for executable images and should be set to 0 for object files
|
VirtualAddress | The address of the first byte of the section when loaded into memory, relative to the image base. For object files, this is the address of the first byte before relocation is applied
|
PointerToRawData
| A file pointer to the first page within the COFF file. This value must be a multiple of the FileAlignment member of the IMAGE_OPTIONAL_HEADER structure. If a section contains only uninitialized data, this member is zero
|
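Because Name is null-padded but not necessarily null-terminated, passing it directly to a function that expects a C string can read past the field. The following is a small sketch of a safe conversion; SectionNameToString is a hypothetical helper and takes the raw eight bytes rather than an IMAGE_SECTION_HEADER so that it compiles without Winnt.h.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const size_t kShortNameSize = 8; // IMAGE_SIZEOF_SHORT_NAME

// Copies a section name into a std::string, stopping at the first null byte
// or after eight bytes, whichever comes first.
std::string SectionNameToString( const unsigned char* name )
{
    size_t length = 0;
    while( length < kShortNameSize && name[length] != 0 ) { ++length; }
    return std::string( reinterpret_cast<const char*>( name ), length );
}
```

With the real structure, the call would be SectionNameToString( pSectionHeader->Name ).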
Since this article is concerned with modified code, the section of interest is .text
when using the Visual Studio line of products. As both Stephen Hewitt and Ken Johnson point out, there is no requisite naming convention. On the x86 architecture, the .text
section will almost always have characteristics IMAGE_SCN_CNT_CODE
, IMAGE_SCN_MEM_READ
and IMAGE_SCN_MEM_EXECUTE
.
As Jeffrey Richter states, on other architectures IMAGE_SCN_MEM_READ and IMAGE_SCN_MEM_EXECUTE may be encountered separately, since the processor may enforce the distinction. For the purposes of this article, the default section name is used. To find the .text section, we loop over the IMAGE_SECTION_HEADERs until a section encompasses AddressOfEntryPoint. The number of sections is retrieved using pNTHeader->FileHeader.NumberOfSections.
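DWORD dwEntryPoint = pNTHeader->OptionalHeader.AddressOfEntryPoint;
UINT nSectionCount = pNTHeader->FileHeader.NumberOfSections;
// the section table immediately follows the optional header
pSectionHeader = IMAGE_FIRST_SECTION( pNTHeader );
for( UINT i = 0; i < nSectionCount; i++ )
{
    // stop at the section whose range encloses the entry point
    if( pSectionHeader->VirtualAddress <= dwEntryPoint &&
        dwEntryPoint < pSectionHeader->VirtualAddress +
        pSectionHeader->Misc.VirtualSize )
    { break; }
    pSectionHeader++;
}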
Once the loop completes, pSectionHeader will be a valid pointer to the executable's code (.text) section, provided a section enclosing the entry point was found. At this point, we can use pBaseAddress - the pointer acquired from MapViewOfFile() - and pSectionHeader->PointerToRawData to determine the start of the .text section on disk. To determine the size of the .text section, we would use pSectionHeader->Misc.VirtualSize.
pCodeStart = (PVOID)((PBYTE)pBaseAddress +
pSectionHeader->PointerToRawData );
dwCodeSize = pSectionHeader->Misc.VirtualSize;
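The loop shown earlier leaves pSectionHeader pointing one past the last header when no section encloses the entry point. As a sketch of a variant that reports that case, the code below searches an array and returns nullptr on failure. SectionInfo is a hypothetical stand-in holding only the IMAGE_SECTION_HEADER fields used here, so the example compiles without Winnt.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimal stand-in for the IMAGE_SECTION_HEADER fields used here.
struct SectionInfo
{
    uint32_t VirtualAddress;   // RVA of the section when loaded
    uint32_t VirtualSize;      // Misc.VirtualSize
    uint32_t PointerToRawData; // file offset of the section
};

// Returns the section containing rva, or nullptr when none does.
const SectionInfo* FindEnclosingSection( const SectionInfo* sections,
                                         size_t count, uint32_t rva )
{
    for( size_t i = 0; i < count; i++ )
    {
        const SectionInfo& s = sections[i];
        // compare by subtraction so VirtualAddress + VirtualSize cannot overflow
        if( rva >= s.VirtualAddress && rva - s.VirtualAddress < s.VirtualSize )
        { return &s; }
    }
    return nullptr;
}
```

A caller would pass AddressOfEntryPoint as rva and treat a nullptr result as a malformed image.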
Now armed with a foundation, one can use PE Browse to reveal the various structures of the PE executable. For example, the Import Directory found in IMAGE_OPTIONAL_HEADER
, IMAGE_DATA_DIRECTORY
can be found at Virtual Address 0x0D4000.
|
Figure 4: Image Directories
|
Examining this area of the disk file in fact displays the Import Table.
|
Figure 5: Import Section
|
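Locating a directory such as the Import Table in the disk file requires translating its Virtual Address (an RVA) into a file offset through the section table. The following is a minimal sketch of that translation. RawSectionInfo and RvaToFileOffset are hypothetical names, and the structure carries only the three IMAGE_SECTION_HEADER fields the calculation needs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimal stand-in for the IMAGE_SECTION_HEADER fields used here.
struct RawSectionInfo
{
    uint32_t VirtualAddress;   // RVA of the section when loaded
    uint32_t SizeOfRawData;    // number of bytes the section occupies on disk
    uint32_t PointerToRawData; // file offset of the section
};

// Translates rva into a file offset. Returns false when rva does not fall
// inside the on-disk data of any section (for example, uninitialized data).
bool RvaToFileOffset( const RawSectionInfo* sections, size_t count,
                      uint32_t rva, uint32_t& fileOffset )
{
    for( size_t i = 0; i < count; i++ )
    {
        const RawSectionInfo& s = sections[i];
        if( rva >= s.VirtualAddress && rva - s.VirtualAddress < s.SizeOfRawData )
        {
            // same distance from the start of the section, on disk
            fileOffset = s.PointerToRawData + ( rva - s.VirtualAddress );
            return true;
        }
    }
    return false;
}
```

The resulting offset is added to the pointer from MapViewOfFile() to reach the data in the mapped file.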
Finally, the code to read an executable's .text
section on disk would be as follows:
int _tmain(int argc, _TCHAR* argv[])
{
PVOID pBaseAddress = NULL;
DWORD dwRawData = 0;
PVOID pEntryPoint = NULL;
PVOID pCodeStart = NULL;
PVOID pCodeEnd = NULL;
SIZE_T dwCodeSize = 0;
GatherDiskImageInformation( pBaseAddress, dwRawData,
pEntryPoint, pCodeStart, dwCodeSize, pCodeEnd );
DumpDiskImageInformation( pBaseAddress, dwRawData,
pEntryPoint, pCodeStart, dwCodeSize, pCodeEnd );
HexDump( pCodeStart, pCodeStart, DUMP_SIZE );
return 0;
}
|
Figure 6: Disk Image
|
The difference between Base Address (0x350000) above and PE Browse base address (0x434000) below is superficial: 0x350000 was returned from MapViewOfFile()
while 0x00434000 is derived from the Image Base Address and Virtual Address in the headers. That is, PE Browse calculates where the image will be in memory using OptionalHeader.ImageBase
.
|
Figure 7: File Header
|
Reading a Memory Image
The code to read a memory image is nearly the same as that of a disk image. The differences are:
- DOS Header location is determined by HMODULE, rather than MapViewOfFile
- VirtualAddress is used in combination with AddressOfEntryPoint to locate .text, rather than PointerToRawData
int _tmain(int argc, _TCHAR* argv[])
{
HMODULE hModule = NULL;
PVOID pVirtualAddress = NULL;
PVOID pCodeStart = NULL;
PVOID pCodeEnd = NULL;
SIZE_T dwCodeSize = 0;
...
pDOSHeader = static_cast<PIMAGE_DOS_HEADER>( (PVOID)hModule );
if(pDOSHeader->e_magic != IMAGE_DOS_SIGNATURE )
{ return -1; }
...
DWORD dwEntryPoint = pNTHeader->OptionalHeader.AddressOfEntryPoint;
UINT nSectionCount = pNTHeader->FileHeader.NumberOfSections;
for( UINT i = 0; i < nSectionCount; i++ )
{
if( pSectionHeader->VirtualAddress <= dwEntryPoint &&
dwEntryPoint < pSectionHeader->VirtualAddress +
pSectionHeader->Misc.VirtualSize )
{ break; }
pSectionHeader++;
}
...
}
Verifying Integrity
To determine if the executable's .text
section has been modified, Sample 3 combines Sample 1 (on-disk) and Sample 2 (in-memory) with MD5 Hashing. MD5 was chosen because it provides short signatures (16 bytes), which are easily displayed as part of the article.
In production, we would use a hash satisfying the requirements of an MDC, or Modification Detection Code. MDCs are also known as Manipulation Detection Codes or, less commonly, Message Integrity Codes. MDCs satisfy two properties: One Way Hash Function (OWHF) and Collision Resistant Hash Function (CRHF). Hashes such as Whirlpool, RIPEMD-160, or the SHA-2 family (SHA-224, SHA-256, etc.) comply with the requirements. These hash functions are preferred in part due to their digest length - each produces a signature of at least 160 bits.
Because we now calculate both the on-disk and in-memory information, we have adjusted our variables accordingly. We also introduce the hash variables and objects:
PVOID pBaseAddress = NULL;
DWORD dwRawData = 0;
PVOID pDiskEntryPoint = NULL;
PVOID pDiskCodeStart = NULL;
PVOID pDiskCodeEnd = NULL;
SIZE_T dwDiskCodeSize = 0;
HMODULE hModule = NULL;
PVOID pVirtualAddress = NULL;
PVOID pMemoryEntryPoint = NULL;
PVOID pMemoryCodeStart = NULL;
PVOID pMemoryCodeEnd = NULL;
SIZE_T dwMemoryCodeSize = 0;
MD5 hash;
BYTE cbDiskHash[ MD5::DIGESTSIZE ];
BYTE cbMemoryHash[ MD5::DIGESTSIZE ];
Two functions CalculateHash()
and DumpHashInformation()
have been added. Below, CalculateHash()
is shown. The function takes a HashTransformation
reference, which is a base class of all hash classes in Crypto++.
BOOL CalculateHash( HashTransformation& hash,
    PVOID pMessage, SIZE_T nMessageSize,
    PBYTE pcbHashBuffer, SIZE_T nHashBufferSize )
{
    // the caller's buffer must match the digest size exactly
    if( nHashBufferSize != hash.DigestSize() )
    {
        ZeroMemory( pcbHashBuffer, nHashBufferSize );
        return FALSE;
    }
    // const PBYTE would make the pointer const, not the bytes it points to
    hash.Update( static_cast<const BYTE*>( pMessage ), nMessageSize );
    hash.Final( pcbHashBuffer );
    return TRUE;
}
In the listing above, we reuse the MD5 hash object - the object hashes both the on-disk and in-memory images. Calling Final()
to retrieve the hash value of the message (the executable image) resets the object.
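Once both digests are in hand, they need to be compared and displayed. The sketch below shows one way to do both without Crypto++ or Windows types. DigestToString and DigestsMatch are hypothetical helpers, and the colon-separated format is assumed from the signatures quoted in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats a digest as colon-separated uppercase hex, for example "F0:A7:3B".
std::string DigestToString( const unsigned char* digest, size_t size )
{
    std::string out;
    char buffer[4];
    for( size_t i = 0; i < size; i++ )
    {
        std::snprintf( buffer, sizeof( buffer ), "%02X",
                       static_cast<unsigned int>( digest[i] ) );
        if( i != 0 ) { out += ':'; }
        out += buffer;
    }
    return out;
}

// Returns true when both digests have the same length and contents.
// Every byte is examined, so the loop does not exit early on a mismatch.
bool DigestsMatch( const unsigned char* a, size_t sizeA,
                   const unsigned char* b, size_t sizeB )
{
    if( sizeA != sizeB ) { return false; }
    unsigned char difference = 0;
    for( size_t i = 0; i < sizeA; i++ )
    {
        difference = static_cast<unsigned char>( difference | ( a[i] ^ b[i] ) );
    }
    return difference == 0;
}
```

In Sample 3 terms, the call would be DigestsMatch( cbDiskHash, sizeof(cbDiskHash), cbMemoryHash, sizeof(cbMemoryHash) ).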
Debug Builds
|
Figure 8: Disk versus Memory Image Signature (Under Debugger)
|
In Figure 8 the signatures are not consistent. This is because the capture was performed under the Visual Studio debugger. The debugger inserts software breakpoints, which affects the in-memory hash value. Although not very useful, a Debug build can be verified by running the program from the command prompt. The result is shown in Figure 9, which produces expected results. If we compare Figures 8 and 9, we see that the on-disk signature is consistent regardless of whether the program is being hosted by the debugger.
|
Figure 9: Disk versus Memory Image Signature (Outside Debugger)
|
To overcome the effects of the debuggers using software breakpoints, we can use WinDbg and set a breakpoint using a Debug Register. Taking from Ken Johnson:
...in WinDbg, if you use the 'ba' command then the code bytes in question will not be modified (i.e. substituted with an 0xcc/int 3). You are limited to 4 simultaneously active 'ba' breakpoints as they use the hardware supplied debug registers, which only support four target addresses.
Release Builds
Figure 10 demonstrates running a release build of sample three. The noticeable change (besides different signatures between debug and release builds) is the release code size is nearly 10 times smaller than a debug build.
|
Figure 10: Disk versus Memory Image Signature, Release Build
|
Binding and Rebasing
Binding an executable refers to writing the addresses of imported functions into the IAT of an executable. More precisely, when an image is bound the IMAGE_THUNK_DATA structures in the IAT are overwritten with the actual address of the imported function. This is done so the loader does not have to determine the address of an imported function and write the address at load time, thereby speeding up the load process. Since we do not use the IAT in our calculation of the digest, it does not affect our results. We will examine the effects of binding a DLL in the Dynamic Link Library section.
Rebasing a DLL is common to avoid load address collisions. It is a procedure the loader will perform on DLLs when a conflict arises. Executable files are not rebased per se. Changing the base address of an executable is uncommon, but not unheard of. For example, the command interpreter (cmd.exe) on Windows XP, SP2 has a preferred base address of 0x4AD00000. JMP and CALL instructions emitted by the compiler use offsets relative to the instruction, rather than absolute addresses in the 32-bit flat segment. If the image needs to be loaded somewhere other than its specified base address, generated instructions don't need to change since they are using relative addressing.
|
Figure 11: Rebased Executable, Release Build
|
In Figure 11, sample three was compiled with a base address of 0x00500000. As we can see, we have a different signature than expected (F0:A7:3B:5E...40:CF:E9:39 at the customary base address). When we rebase the executable, we find that more changes have occurred than simply changing ImageBase and AddressOfEntryPoint. Sample four investigates this situation further.
In Sample 4, all code has been removed except the GatherImageInformation()
and DumpImageInformation()
. Next, we add the following to dump the .text section to disk using the filename 'textdump.bin'.
StringSource(
(const BYTE*)pCodeStart, dwCodeSize,
true, new FileSink( "textdump.bin", true )
);
Finally, we run sample four using base addresses of both 0x400000 and 0x500000. When we examine the binary files using a difference program, we see that there are over 2800 differences. Investigating further, we find that the differences tend to be small, usually consisting of one byte. The first change occurs at byte 8 (0x43 vs 0x53). The next 15 changes are the same at varying offsets. At difference 16, the byte change is 0x44 to 0x54.
|
Figure 12: Binary Difference of .text Sections
|
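For readers without a difference program at hand, the comparison can be approximated in a few lines. The sketch below lists the offsets at which two buffers differ; FindDifferences is a hypothetical helper, and loading the two textdump.bin files into memory is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the offsets at which two buffers differ. Only the first
// min(sizeA, sizeB) bytes are compared.
std::vector<size_t> FindDifferences( const unsigned char* a, size_t sizeA,
                                     const unsigned char* b, size_t sizeB )
{
    std::vector<size_t> offsets;
    const size_t count = ( sizeA < sizeB ) ? sizeA : sizeB;
    for( size_t i = 0; i < count; i++ )
    {
        if( a[i] != b[i] ) { offsets.push_back( i ); }
    }
    return offsets;
}
```

The size of the returned vector is the number of differing bytes, and each entry can be looked up in the disassembly to see which instruction operand changed.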
The binary dumps (textdump.bin) of the .text sections are available in Sample 4. Next, we use PE Browse to examine the compiled code. We know that the .text section will start at either 0x401000 or 0x501000, depending on the image base of the executable we are examining. The disassembly is shown below in Figure 13.
|
Figure 13: Disassembly of .text Section
|
According to the disassembly, the disparity has three causes. First, the exception handler parameter (0x533F6E) is being placed on the stack, which is a function address (constant) in the .text section. The second cause is the function address of objects being called through their vtable entry (for example, see 0x5013A8). In this case, the function address is being loaded based on the object's layout, which is located in the .rdata section. The dereference into the .rdata section is not relative - it is absolute.
The final reason for the signature difference is Microsoft's Buffer Security Check introduced in Visual Studio 2003. Buffer security check is similar to GCC's StackGuard (which uses canary values) and IBM's ProPolice which offers enhancements to StackGuard (see also Secure Programmer: Countering Buffer Overflows).
The security cookie is the return address of a function call XOR'd with a random value which is then placed next to the return address in memory. If an attacker attempts to overwrite the return address with a buffer overflow (stack smash), the check of the security cookie will usually catch the exploit.
In all cases above, the differences arise because of memory addresses which are absolute in the 32 bit flat address space. If we were motivated, we could parse the .reloc section of the executable and fix the image offline (if the image was built without the /FIXED linker switch). For a discussion of the IMAGE_BASE_RELOCATION directory, see Pietrek's Peering Inside the PE: A Tour of the Win32 Portable Executable File Format.
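To make the offline fix concrete, the following is a rough sketch of applying base relocations to a 32 bit image held in a buffer. It assumes the documented block layout (a DWORD page RVA, a DWORD block size, then WORD entries whose high four bits are the type and low twelve bits the offset) and handles only IMAGE_REL_BASED_HIGHLOW and the IMAGE_REL_BASED_ABSOLUTE padding entries. ApplyRelocations and the read/write helpers are hypothetical names, and the image buffer is assumed to be laid out by RVA, as the loader maps it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static uint16_t ReadU16( const uint8_t* p )
{
    return static_cast<uint16_t>( p[0] | ( p[1] << 8 ) );
}

static uint32_t ReadU32( const uint8_t* p )
{
    return static_cast<uint32_t>( p[0] ) |
           ( static_cast<uint32_t>( p[1] ) << 8 ) |
           ( static_cast<uint32_t>( p[2] ) << 16 ) |
           ( static_cast<uint32_t>( p[3] ) << 24 );
}

static void WriteU32( uint8_t* p, uint32_t v )
{
    p[0] = static_cast<uint8_t>( v );
    p[1] = static_cast<uint8_t>( v >> 8 );
    p[2] = static_cast<uint8_t>( v >> 16 );
    p[3] = static_cast<uint8_t>( v >> 24 );
}

const unsigned kRelBasedAbsolute = 0; // IMAGE_REL_BASED_ABSOLUTE: padding, skipped
const unsigned kRelBasedHighLow  = 3; // IMAGE_REL_BASED_HIGHLOW: 32 bit address

// Adds delta (new base minus preferred base) to every HIGHLOW target.
// Returns false on malformed data or an unsupported relocation type; the
// image may be partially modified in that case.
bool ApplyRelocations( uint8_t* image, size_t imageSize,
                       const uint8_t* reloc, size_t relocSize, uint32_t delta )
{
    size_t pos = 0;
    while( relocSize - pos >= 8 )
    {
        const uint32_t pageRva   = ReadU32( reloc + pos );
        const uint32_t blockSize = ReadU32( reloc + pos + 4 );
        if( blockSize < 8 || blockSize > relocSize - pos ) { return false; }

        for( size_t i = pos + 8; i + 2 <= pos + blockSize; i += 2 )
        {
            const uint16_t entry = ReadU16( reloc + i );
            const unsigned type  = entry >> 12;
            if( type == kRelBasedAbsolute ) { continue; }
            if( type != kRelBasedHighLow ) { return false; }

            const uint64_t target = static_cast<uint64_t>( pageRva ) + ( entry & 0x0FFF );
            if( target + 4 > imageSize ) { return false; }
            WriteU32( image + target, ReadU32( image + target ) + delta );
        }
        pos += blockSize;
    }
    return true;
}
```

Hashing the .text section after such a pass, with delta chosen to undo the loader's adjustment, is one way the in-memory and on-disk digests could be brought back into agreement.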
Dynamic Link Libraries
Up to this point, we have examined executable files while only touching on DLLs when a DLL needed consideration. We will now examine DLLs in detail. To begin, we will add a second project to the solution - a Win32 DLL - to create Sample 5. Select a Win32 Console Application as shown in Figure 14. When prompted by the wizard, select an Application Type of 'DLL'. There is no need to check 'Export Symbols'.
|
Figure 14: DLL Project Creation
|
Add a DEF file and EXPORT GatherImageInformation:
|
Figure 15: DEF File
|
Next, move the function GatherImageInformation()
from the executable and into the DLL. Add __declspec(dllimport)
to the original project (VerifyIntegrity.exe):
__declspec(dllimport)
VOID GatherImageInformation( HMODULE& hModule,
PVOID& pVirtualAddress, PVOID& pEntryPoint,
PVOID& pCodeStart, SIZE_T& dwCodeSize, PVOID& pCodeEnd );
And __declspec(dllexport)
to the DLL implementation (VerifyIntegrityDll.dll):
__declspec(dllexport)
VOID GatherImageInformation( HMODULE& hModule,
PVOID& pVirtualAddress, PVOID& pEntryPoint,
PVOID& pCodeStart, SIZE_T& dwCodeSize, PVOID& pCodeEnd )
{
...
}
When we run the program, we observe the results of Figure 16. Since the dynamic library calls GetModuleHandle( NULL )
, the return value is that of the executable - 0x00400000 in this case.
|
Figure 16: DLL Returning Exe Information
|
Finally, Sample 5 adds code to perform both executable and library in-memory interrogations by the DLL. In essence, we have moved the GatherImageInformation() functionality of sample three into the DLL. The result of running sample five is shown in Figure 17.
|
Figure 17: Dll and Exe In-Memory Information
|
Examining the disassembly of the DLL confirms the entry point. Note that the entry point is the runtime's entry point, and not DllMain()
. The initialization call graph of interest for the DLL is as follows. Note that DllMain
is optional - we happen to use it (though it simply performs a return TRUE
). See MSDN's DllMain Callback Function and Dynamic-Link Library Entry-Point Function.
- _DllMainCRTStartup
- __DllMainCRTStartup
- DllMain
|
Figure 18: DLL Entry Point
|
While the majority of the code is equivalent for the .text section of both an executable and DLL, there is one critical point: how do we retrieve the image base of the dynamic library (even if loaded at an address other than preferred)? For this, we will use the pseudo-variable __ImageBase
. The variable is available in Visual Studio 7.0 and above. It is initialized by the linker and adjusted by the loader as required. According to Raymond Chen on The Old New Thing, "[Using] HINSTANCEs as a [sic] base-address [is valid] ... since the base address gets relocated with the rest of the DLL. A relocatable reference to an address at rva=0 should have done the trick, and with this pseudo-variable, it has been formalized." Chen also states the variable is valid for static libraries which have not yet been linked to an executable.
If you don't have access to __ImageBase
due to an earlier version of Visual Studio, use MEMORY_BASIC_INFORMATION
and VirtualQuery()
to obtain the AllocationBase
. Finally, this discussion does not apply to Windows CE.
EXTERN_C IMAGE_DOS_HEADER __ImageBase;
// __ImageBase sits at the module's base, so its address is the DOS header
PIMAGE_DOS_HEADER pDOSHeader = &__ImageBase;
Sample 6 moves GatherDiskImageInformation()
and GatherMemoryImageInformation()
into the DLL such that the operations are performed on the DLL. The DLL uses __ImageBase
and GetModuleFileName()
to determine the disk file to open.
TCHAR szFilename[ MAX_PATH ] = { 0 };
if( 0 == GetModuleFileName( (HMODULE)&__ImageBase,
szFilename, MAX_PATH ) ) { return -1; }
hFile = CreateFile( szFilename, GENERIC_READ, FILE_SHARE_READ,
NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
When executed outside of the debugger, we observe the expected results. We can also observe these results under the debugger if software breakpoints are set in the executable, and not the DLL.
|
Figure 19: Dll Image Verification
|
Rebasing
If we specify a different base address for the DLL during compilation and linking, we see that the new base address affects the hash of the .text section. However, the hash of the on-disk image is consistent with the in-memory image. In Figure 20 below, a base address of 0x08000000 was specified.
|
Figure 20: Dll Image Verification, Base Address 0x08000000
|
Next, we use rebase.exe on the DLL to set the preferred base address back to 0x10000000. The results are exactly the same as specifying the base address during compile time (shown in Figure 21).
|
Figure 21: Dll Image Verification, Rebase Address 0x10000000
|
Next, we will rebase the DLL to 0xA0000000. After the rebase, we examine the executable using Dependency Walker (depends.exe) to verify the base address.
|
Figure 22: Dependency Walk, Base Address 0xA0000000
|
When we run the executable, we observe two issues. First, the DLL loads at address 0x00330000. Second, the hashes are inconsistent. It is apparent the runtime loader performed fixups when the operating system relocated the DLL from its preferred base address.
|
Figure 23: Rebased Dll Image Verification, Rebase Address 0xA0000000
|
For the last exercise in rebasing, we will build the DLL as usual (base address 0x10000000), and then rebase the DLL to 0x00330000 as the Operating System did in Figure 23.
|
Figure 24: Rebased Dll Image Verification, Base Address 0x00330000
|
As we can see, the results are the same as if the linker had used a base address of 0x00330000 after compilation.
Binding
The final exercise will run bind.exe over the DLL and examine the results. As discussed earlier, bind hard codes function addresses in the IAT. We expect that binding the image should not affect its .text section. When we view the output of sample six after binding, we find this is the case.
|
Figure 25: Bound Dll Image Verification, Base Address 0x10000000
|
GetAddressOfMain()
Determining the location of a function can be useful as a "flag in the sand" or a basic sanity check. An interesting aspect of debugging is that the debugger should not influence the program. To this end, Visual Studio does a very good job. However, there is some influence from the environment. Consider the following:
PVOID pfnMain = (PVOID)&_tmain;
In Debug builds, pfnMain
is a pointer to a jump (E9
) instruction. In Release builds, it is the actual address of main()
. In both debug and release builds, the linker performs the back patch of the address. The following acknowledges the influence of a jump stub on main when executing in Debug builds:
PVOID pfnMain = (PVOID)&_tmain;
PBYTE pPossibleJump = static_cast<PBYTE>(pfnMain);
BYTE opcode = *pPossibleJump;
if( 0xE9 != opcode )
{
    cout << "main() is not a jump opcode... no fixup applied" << endl;
}
else
{
    // the rel32 operand is signed and measured from the end of the 5-byte instruction
    LONG lJump = *( reinterpret_cast<PLONG>(pPossibleJump+1) );
    pfnMain = pPossibleJump + lJump + sizeof(opcode) + sizeof(lJump);
    cout << "main() is a jump opcode... fixup applied" << endl;
}
A picture being worth 1000 words, the figure below displays the result of the previous code.
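The same arithmetic can be checked away from a live process. The sketch below resolves a JMP rel32 stub from a copy of its bytes and the address at which they reside. ResolveJumpStub is a hypothetical helper and the addresses in the usage note are invented for illustration. Unsigned 32 bit wraparound stands in for signed addition, which is equivalent in a 32 bit address space.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Given the 32 bit address of a possible "JMP rel32" (opcode 0xE9) and the
// bytes found there, returns the jump target. Returns address unchanged
// when the bytes do not start with a complete jump instruction.
uint32_t ResolveJumpStub( uint32_t address, const uint8_t* code, size_t size )
{
    const size_t kJumpSize = 5; // one opcode byte plus a four byte displacement
    if( code == nullptr || size < kJumpSize || code[0] != 0xE9 ) { return address; }

    // the displacement is stored little-endian
    const uint32_t rel = static_cast<uint32_t>( code[1] ) |
                         ( static_cast<uint32_t>( code[2] ) << 8 ) |
                         ( static_cast<uint32_t>( code[3] ) << 16 ) |
                         ( static_cast<uint32_t>( code[4] ) << 24 );

    // the displacement is relative to the instruction that follows the jump
    return address + static_cast<uint32_t>( kJumpSize ) + rel;
}
```

For example, a stub at 0x00401005 containing E9 F6 0F 00 00 resolves to 0x00402000, while a negative displacement such as E9 FB EF FF FF at 0x00402000 resolves backwards to 0x00401000.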
Address Space Layout Randomization
ASLR is Address Space Layout Randomization. It is meant to thwart certain types of attacks, such as stack smashing, to which some binaries could fall victim. Since it is a runtime rebasing policy, ASLR does affect integrity checks. Fortunately, we can opt out of ASLR by not specifying the /DYNAMICBASE linker option (or by specifying /DYNAMICBASE:NO with linkers that enable it by default). Additional resources include Inside the Windows Vista Kernel: Part 3 by Russinovich and Windows Vista ISV Security. Also of interest is On the Effectiveness of Address-Space Randomization, an analysis of ASLR. The paper was authored by researchers at Stanford University.
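Whether an image has opted in to ASLR is recorded in the DllCharacteristics field of the optional header. As a small sketch, the check below tests the IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE bit (0x0040). IsDynamicBaseSet is a hypothetical helper that takes the field value directly so it compiles without Winnt.h.

```cpp
#include <cassert>
#include <cstdint>

const uint16_t kDynamicBase = 0x0040; // IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE

// Returns true when the image was linked with /DYNAMICBASE, meaning the
// loader is permitted to relocate it at load time.
bool IsDynamicBaseSet( uint16_t dllCharacteristics )
{
    return ( dllCharacteristics & kDynamicBase ) != 0;
}
```

With the real headers, the argument would be pNTHeader->OptionalHeader.DllCharacteristics. An integrity check could use the result to decide whether comparing the in-memory hash against the on-disk hash is meaningful without first accounting for relocations.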
Miscellaneous and Other Errata
For the purposes of this article, the named section of interest when using Microsoft tools is .text. Other compiler vendors may name the section differently. In addition, certain characteristics of the executable files discussed apply only to the x86 architecture. Since PCs dominate the desktop market (on the order of 90% market share), coupled with the fact the author does not have a PowerPC platform running Windows NT, only the x86 COFF on 32 bit architectures is examined. In addition, neither Windows Vista nor the possible influence of DEP is examined.
Matt Pietrek's articles are generally considered the standard when examining and manipulating executable headers. However, please keep in mind some articles cited by others from Pietrek are over 14 years old. The author finds the syndrome similar to that described by Donald Knuth in The Art of Computer Programming, Volume 2 Seminumerical Algorithms, Section 3.1:
Many random number generators in use ... were not very good. People have tended to avoid learning about [the systems]; ... and [the systems] have been passed down blindly from one programmer to another, until the users have no understanding of the original limitations.
Surely Microsoft's implementations have changed as the environment has become more hostile in the years since Pietrek's articles were released. For example, consider Data Execution Prevention. DEP is implemented in one of two ways within the confines of a Windows XP SP2 system on a PC. Another example is Address Space Layout Randomization (ASLR), which is a feature of Windows Vista. It should be readily apparent that DEP and ASLR were not available when Pietrek's articles were originally written.
By no means should a reader construe the author's differing opinion as an assertion of incorrectness. It is simply felt that it is now time to re-examine Pietrek's works, especially in the context of malicious software environments.
Resources
Acknowledgments
- Wei Dai for Crypto++ and his invaluable help on the Crypto++ mailing list
- Dr. A. Brooke Stephens who laid my Cryptographic foundations
- Ken Johnson, Microsoft Windows SDK MVP
- Stephen Hewitt, Code Project MVP
Checksums
- VerifyIntegrity01.zip
- MD5: 8061D977D223B491DF33BECF1E4B30C0
- SHA-1: 7997CF4ED6C52419F71D629F50BCA9F61D0E2D29
- VerifyIntegrity02.zip
- MD5: 53956F4C502365CEE835E0D079D696E7
- SHA-1: 90DE30F61EA1D0660382A6D651E29D2E6AECF49B
- VerifyIntegrity03.zip
- MD5: 729022DF0865EEA9341711C6D1EADD02
- SHA-1: 714D853CC4A729C0919A63BE777070AB1A0F1FD8
- VerifyIntegrity04.zip
- MD5: AED4FAB4BA7B11A74B35837860EC54F8
- SHA-1: D61E72BD151F1547EDF2719F4BCD9B317CF99862
- VerifyIntegrity05.zip
- MD5: B3150B77EF2A4612FB24A5AA4EFC951A
- SHA-1: D329C212A8FDE44C8F9A7B2B372ECC9AEA413C51
- VerifyIntegrity06.zip
- MD5: 401B9F531648E1CEEA3E6CCCEE299465
- SHA-1: 3AE875D2505043CEAEC3AEB6152CC798D7ECB57A
Revisions
- 03.19.2008 Added Reference to Post-Build Executable Back Patching
- 03.08.2008 Added 'Binding and Rebasing' Sections
- 03.08.2008 VS2002 (7.0) to VS2005 (8.0) Port
- 03.08.2008 Added Samples 4, 5, and 6
- 03.08.2008 Article Samples Rewritten
- 03.08.2008 Reworked Introduction
- 05.28.2007 Added Reference to Tamper Aware and Self Healing Code
- 04.15.2007 Added Reference to Process Creation
- 04.15.2007 Added Reference to Windows Internals
- 02.17.2006 Added DEP
- 02.17.2007 Added ASLR
- 02.11.2007 Initial Release