Dynamic TEXT Section Image Verification

Jeffrey Walton


20 Mar 2008, CPOL


Detect Hardware Faults and Unauthorized In-Memory Patches with Hashing using Crypto++


Introduction

Determining whether a file's disk image has been altered after it was loaded into memory can be a useful operation. Reasons for doing so include hardware fault isolation (whether memory or disk) and detecting the effects of a binary patcher.

Academia has been studying the problem of a program checking the integrity of its own image (and related issues) for some time. In the academic literature this is known as self-checksumming. Papers of interest include Strengthening Software Self-Checksumming via Self-Modifying Code by Giffin, Christodorescu, and Kruger; Watermarking, Tamper-Proofing, and Obfuscation - Tools for Software Protection by Collberg and Thomborson; Architectural Support for Copy and Tamper-Resistant Software; and finally Glen Wurster's thesis A Generic Attack on Hashing-Based Software Tamper Resistance.

Microsoft employs a passive and a semi-passive integrity system called Windows File Protection, which monitors Operating System files for inadvertent replacement. The passive system uses System File Checker to scan protected files; the user must manually launch the tool to initiate the operation (hence passive). In the semi-passive system, a protected directory is monitored. If the OS determines a file has been improperly replaced, the file is restored from the cache, a network installation point, or the Windows CD.

It is not possible for a developer to request protection from the Operating System. This may not be a bad situation, since Microsoft performs poorly when protecting its own binaries from tampering. For examples of kernel patching, see Eliminating Explorer's Delay when Deleting an In Use File or ClearType over Remote Desktop in Windows XP by Dan Farino. To this end, this article presents a framework for performing integrity verification using cryptographic hash functions with Crypto++.

Should the reader be inclined, part two of this article is available: Tamper Aware and Self Healing Code. Post-Build Executable Back Patching is also available, which demonstrates how to automate the process of back patching a value into a compiled executable.

This article does not cover Linker and Loader behavior as thoroughly as Matt Pietrek's various Microsoft Systems Journal articles. The reader is encouraged to visit his articles listed in the Resources. Whether using Pietrek's articles or developing tools for examining executables, the reader will find that Microsoft's Portable Executable and Common Object File Format Specification is vague in certain areas and leaves others undocumented.

Crypto++

The samples provided use Crypto++ Hashes. Crypto++ can be downloaded from Wei Dai's Crypto++ pages. For compilation and integration issues, visit Integrating Crypto++ into the Microsoft Visual C++ Environment. This article is based upon assumptions presented in the previously mentioned article. For those who are interested in other C++ Cryptographic libraries, please see Peter Gutmann's Cryptlib or Victor Shoup's NTL.

Image Execution

Depending on the source, we are told that the executable section can be found by name (for example, .text, .code, or .textbss), or by examining the section headers for IMAGE_SCN_MEM_EXECUTE. No naming standard exists for the sections of a PE file (though common section names are usually used), and a compiled executable can have its characteristics modified such that IMAGE_SCN_CNT_CODE and IMAGE_SCN_MEM_EXECUTE are no longer present. Altered in these ways, an image will still be loaded and executed by the Operating System. This is because the OS will execute code that has been mapped from a file (SEC_IMAGE attribute) using AddressOfEntryPoint to determine the executable section of the file. To locate the .text section, we therefore search for the section such that SectionStart ≤ AddressOfEntryPoint < SectionEnd (which is usually the distinguished '.text' section). Matt Pietrek demonstrates the technique using PEDUMP in An In-Depth Look into the Win32 Portable Executable File Format, Part 1; see GetEnclosingSectionHeader() in common.cpp of PEDUMP.

Reading a Disk Image

The first step in developing the system is reading the disk image. There is very little difference between an on-disk and an in-memory image: regardless of whether we use a pointer acquired via MapViewOfFile() or an HMODULE from GetModuleHandle(), the various structures are the same. The most noticeable differences occur between Debug and Release builds of the executable, where Debug builds tend to have more indirection using stubs or jump tables in the binary.

Winnt.h defines the structures and constants of interest. The first structure of interest is IMAGE_DOS_HEADER (referred to as the MS-DOS 2.0 Compatible EXE Header), located at byte 0. The fields of interest are e_magic and e_lfanew. For the purposes of this article, e_magic should be IMAGE_DOS_SIGNATURE, the familiar "MZ". Once e_magic is verified, the DOS header is stepped over; how far to step is determined by e_lfanew, the file offset of the IMAGE_NT_HEADERS (the "new" executable header). Between the IMAGE_DOS_HEADER and IMAGE_NT_HEADERS lies the DOS stub program, which prints "This program cannot be run in DOS mode." Figure 1 displays the hex dump of IMAGE_DOS_HEADER and the stub program.

Figure 1: DOS Header and Stub Program

The first order of business is mapping the disk file into memory, which is accomplished as follows. CreateFile(), CreateFileMapping(), and MapViewOfFile() all request read-only access (the principle of least privilege).

C++

/////////////////////////////////////////////////
if( 0 == GetModuleFileName( NULL, szFilename, PATH_SIZE ) )
{ return -1; }

/////////////////////////////////////////////////////////////
hFile = CreateFile( szFilename, GENERIC_READ, FILE_SHARE_READ,
    NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
if ( hFile == INVALID_HANDLE_VALUE )
{ return -1; }

/////////////////////////////////////////////////////////////
hFileMapping = CreateFileMapping( hFile, NULL,
    PAGE_READONLY, 0, 0, NULL );
if ( NULL == hFileMapping )
{ return -1; }

/////////////////////////////////////////////////////////////
pBaseAddress = MapViewOfFile( hFileMapping,
    FILE_MAP_READ, 0, 0, 0 );
if ( NULL == pBaseAddress )
{ return -1; }

The code to inspect the IMAGE_DOS_HEADER follows. Note that the pointer returned from MapViewOfFile() will usually be a memory address on the order of 0x00350000 in debug builds.

C++

////////////////////////////////////////////////////////////
pDOSHeader = static_cast<PIMAGE_DOS_HEADER>( pBaseAddress );
if( pDOSHeader->e_magic != IMAGE_DOS_SIGNATURE )
{ return -1; }

////////////////////////////////////////////////////////////
pNTHeader = reinterpret_cast<PIMAGE_NT_HEADERS>(
    (PBYTE)pBaseAddress + pDOSHeader->e_lfanew );
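The two steps above can also be sketched portably over a raw file buffer, with the bounds check the Win32 snippet omits (a corrupt e_lfanew would otherwise send pNTHeader outside the mapped view). This is a minimal sketch, assuming the whole file has been read into a std::vector; ReadLE and GetNtHeaderOffset are hypothetical helpers, not Windows APIs, and the constants mirror the values Winnt.h defines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: reads a little-endian field of 'bytes' bytes (2 or 4).
// PE structures are always stored little-endian.
static std::uint32_t ReadLE(const std::vector<std::uint8_t>& b, std::size_t off, int bytes) {
    std::uint32_t v = 0;
    for (int i = bytes - 1; i >= 0; --i) v = (v << 8) | b[off + i];
    return v;
}

// Hypothetical helper: validates IMAGE_DOS_HEADER and returns e_lfanew,
// or -1 if the buffer is too small, "MZ" is missing, or e_lfanew points
// too close to the end to hold the 4-byte "PE\0\0" signature.
long GetNtHeaderOffset(const std::vector<std::uint8_t>& file) {
    const std::size_t kDosHeaderSize = 0x40;  // sizeof(IMAGE_DOS_HEADER)
    const std::size_t kLfanewOffset  = 0x3C;  // offset of e_lfanew
    if (file.size() < kDosHeaderSize) return -1;
    if (ReadLE(file, 0, 2) != 0x5A4D) return -1;  // IMAGE_DOS_SIGNATURE ("MZ")
    const std::uint32_t lfanew = ReadLE(file, kLfanewOffset, 4);
    if (lfanew > file.size() - 4) return -1;      // also rejects negative LONG values
    return static_cast<long>(lfanew);
}
```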

Once in possession of the IMAGE_NT_HEADERS pointer (pNTHeader), the next step is to verify that the Signature field of the header is IMAGE_NT_SIGNATURE. IMAGE_NT_SIGNATURE is four bytes consisting of "PE" and two NULL bytes. Once the Signature has been verified, a thorough examination can begin. Winnt.h shows the structure of IMAGE_NT_HEADERS:

C++

typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;

At this point, the reader should become familiar with the 32- and 64-bit versions of the NT header structures, as well as the IMAGE_FIRST_SECTION macro defined in Winnt.h. For the purposes of this article, only IMAGE_NT_HEADERS32 will be examined.

C++

/////////////////////////////////////////////////////////////
PIMAGE_NT_HEADERS pNTHeader = NULL;
pNTHeader = reinterpret_cast<PIMAGE_NT_HEADERS>(
    (PBYTE)pBaseAddress + pDOSHeader->e_lfanew );

if(pNTHeader->Signature != IMAGE_NT_SIGNATURE )
{ return -1; }

/////////////////////////////////////////////////////////////
PIMAGE_FILE_HEADER pFileHeader = NULL;
pFileHeader = reinterpret_cast<PIMAGE_FILE_HEADER>(
    (PBYTE)&pNTHeader->FileHeader );

/////////////////////////////////////////////////////////////
PIMAGE_OPTIONAL_HEADER pOptionalHeader = NULL;
pOptionalHeader = reinterpret_cast<PIMAGE_OPTIONAL_HEADER>(
    (PBYTE)&pNTHeader->OptionalHeader );

/////////////////////////////////////////////////////////////
if( IMAGE_NT_OPTIONAL_HDR32_MAGIC != pNTHeader->OptionalHeader.Magic )
{ return -1; }
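The NT header checks above, plus the IMAGE_FIRST_SECTION calculation, can also be expressed over a raw file buffer. The sketch below reads a few little-endian fields at fixed offsets: a 4-byte Signature (the bytes "PE\0\0" read as a DWORD give 0x00004550), then the 20-byte IMAGE_FILE_HEADER, then the optional header whose first WORD is Magic. NtHeaderSummary and SummarizeNtHeaders are hypothetical names; using SizeOfOptionalHeader rather than a compile-time sizeof is an assumption made so the section table is found even when the optional header size is unusual.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: little-endian read of 2 or 4 bytes.
static std::uint32_t ReadLE(const std::vector<std::uint8_t>& b, std::size_t off, int bytes) {
    std::uint32_t v = 0;
    for (int i = bytes - 1; i >= 0; --i) v = (v << 8) | b[off + i];
    return v;
}

struct NtHeaderSummary {
    std::uint16_t numberOfSections = 0;
    std::uint16_t magic = 0;             // 0x10B = PE32, 0x20B = PE32+
    std::size_t firstSectionOffset = 0;  // file offset IMAGE_FIRST_SECTION points at
};

// Hypothetical helper; ntOffset is the (already validated) e_lfanew value.
bool SummarizeNtHeaders(const std::vector<std::uint8_t>& file, std::size_t ntOffset,
                        NtHeaderSummary& out) {
    const std::size_t fileHeader = ntOffset + 4;        // after the Signature
    const std::size_t optionalHeader = fileHeader + 20; // sizeof(IMAGE_FILE_HEADER)
    if (optionalHeader + 2 > file.size()) return false;
    if (ReadLE(file, ntOffset, 4) != 0x00004550) return false;  // IMAGE_NT_SIGNATURE
    out.numberOfSections = static_cast<std::uint16_t>(ReadLE(file, fileHeader + 2, 2));
    const std::uint32_t sizeOfOptionalHeader = ReadLE(file, fileHeader + 16, 2);
    out.magic = static_cast<std::uint16_t>(ReadLE(file, optionalHeader, 2));
    // Same arithmetic as IMAGE_FIRST_SECTION: OptionalHeader + SizeOfOptionalHeader.
    out.firstSectionOffset = optionalHeader + sizeOfOptionalHeader;
    return true;
}
```

A caller targeting only IMAGE_NT_HEADERS32, as this article does, would reject any magic other than 0x10B before touching 32-bit-specific fields.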

The IMAGE_NT_HEADERS structure yields an IMAGE_FILE_HEADER and an IMAGE_OPTIONAL_HEADER. The File Header is also referred to as the COFF header. Below is a view of a Debug build's File Header structure. 0x00004550 is the signature PE\0\0 displayed as a little-endian DWORD, which is why it appears byte swapped.

Figure 2: File Header

The PE Browse view of the Optional Header is shown below. The Optional Header is not optional; it is required in executable files, but not COFF object files.

Figure 3: Optional Header

Fields of interest in IMAGE_FILE_HEADER and IMAGE_OPTIONAL_HEADER include the following.

IMAGE_FILE_HEADER	Machine
IMAGE_FILE_HEADER	NumberOfSections
IMAGE_OPTIONAL_HEADER	MajorLinkerVersion and MinorLinkerVersion
IMAGE_OPTIONAL_HEADER	SizeOfCode
IMAGE_OPTIONAL_HEADER	BaseOfCode
IMAGE_OPTIONAL_HEADER	SizeOfInitializedData
IMAGE_OPTIONAL_HEADER	SizeOfUninitializedData
IMAGE_OPTIONAL_HEADER	BaseOfData
IMAGE_OPTIONAL_HEADER	AddressOfEntryPoint
IMAGE_OPTIONAL_HEADER	ImageBase

In addition, the IMAGE_OPTIONAL_HEADER contains an array of IMAGE_NUMBEROF_DIRECTORY_ENTRIES (16) IMAGE_DATA_DIRECTORY structures, named DataDirectory. To find the program's entry point at runtime (which is CRT code for the command line samples), add AddressOfEntryPoint, a relative virtual address, to the module's base address:

C++

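// pImageBase is the module's actual load address (for example, its HMODULE)
PVOID pEntryPoint = (PBYTE)pImageBase + pNTHeader->OptionalHeader.AddressOfEntryPoint;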

Stepping over the previous two headers reveals the section table, an array of IMAGE_SECTION_HEADER structures. These sections are the familiar .text, .textbss (if incremental linking is enabled), .data, .rdata, etc. Winnt.h defines the IMAGE_SECTION_HEADER as follows:

C++

typedef struct _IMAGE_SECTION_HEADER {
    BYTE Name[IMAGE_SIZEOF_SHORT_NAME];
    union {
        DWORD PhysicalAddress;
        DWORD VirtualSize;
    } Misc;
    DWORD VirtualAddress;
    DWORD SizeOfRawData;
    DWORD PointerToRawData;
    DWORD PointerToRelocations;
    DWORD PointerToLinenumbers;
    WORD NumberOfRelocations;
    WORD NumberOfLinenumbers;
    DWORD Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;

The IMAGE_SECTION_HEADER is documented at MSDN. Quoting from the documentation:

Name	An 8-byte, null-padded UTF-8 string. There is no terminating null character if the string is exactly eight characters long. For longer names, this member contains a forward slash (/) followed by an ASCII representation of a decimal number that is an offset into the string table. Executable images do not use a string table and do not support section names longer than eight characters.
Misc.PhysicalAddress	The file address.
Misc.VirtualSize	The total size of the section when loaded into memory, in bytes. If this value is greater than the SizeOfRawData member, the section is filled with zeroes. This field is valid only for executable images and should be set to 0 for object files.
VirtualAddress	The address of the first byte of the section when loaded into memory, relative to the image base. For object files, this is the address of the first byte before relocation is applied.
PointerToRawData	A file pointer to the first page within the COFF file. This value must be a multiple of the FileAlignment member of the IMAGE_OPTIONAL_HEADER structure. If a section contains only uninitialized data, this member is zero.

Since this article is concerned with modified code, the section of interest is .text when using the Visual Studio line of products. As both Stephen Hewitt and Ken Johnson point out, there is no requisite naming convention. On the x86 architecture, the .text section will almost always have characteristics IMAGE_SCN_CNT_CODE, IMAGE_SCN_MEM_READ and IMAGE_SCN_MEM_EXECUTE.

As Jeffrey Richter states, on other architectures IMAGE_SCN_MEM_READ and IMAGE_SCN_MEM_EXECUTE may be encountered separately, since the processor may enforce the distinction. For the purposes of this article, the default section name is used. To find the .text section, we loop over the IMAGE_SECTION_HEADERs until a section encompasses AddressOfEntryPoint. The number of sections is retrieved using pNTHeader->FileHeader.NumberOfSections, and the first section header is located with the IMAGE_FIRST_SECTION macro.

C++

/////////////////////////////////////////////////////////////
DWORD dwEntryPoint = pNTHeader->OptionalHeader.AddressOfEntryPoint;
UINT nSectionCount = pNTHeader->FileHeader.NumberOfSections;

// The section table immediately follows the optional header
PIMAGE_SECTION_HEADER pSectionHeader = IMAGE_FIRST_SECTION( pNTHeader );

UINT i = 0;
for( ; i < nSectionCount; i++ )
{
    // When we find a Section such that
    //  Section Start <= Entry Point < Section End,
    //  we have found the .TEXT Section
    if( pSectionHeader->VirtualAddress <= dwEntryPoint &&
        dwEntryPoint < pSectionHeader->VirtualAddress +
                       pSectionHeader->Misc.VirtualSize )
    { break; }

    pSectionHeader++;
}

// No section encloses the entry point: the image is malformed
if( i == nSectionCount )
{ return -1; }

If the loop finds a section enclosing the entry point, pSectionHeader is a valid pointer to the header of the executable's code (.text) section. At this point, we can use pBaseAddress (the pointer acquired from MapViewOfFile()) and pSectionHeader->PointerToRawData to determine the start of the .text section on disk. To determine the size of the .text section, we use pSectionHeader->Misc.VirtualSize.

C++

/////////////////////////////////////////////////////////////
pCodeStart = (PVOID)((PBYTE)pBaseAddress +
    pSectionHeader->PointerToRawData );

/////////////////////////////////////////////////////////////
dwCodeSize = pSectionHeader->Misc.VirtualSize;
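One caveat when hashing the on-disk copy: Misc.VirtualSize can exceed SizeOfRawData, in which case the loader zero-fills the remainder in memory while the file simply continues with whatever follows the section. For .text this is rare, since SizeOfRawData is rounded up to FileAlignment, but a defensive approach hashes only the file-backed bytes in both images. A minimal sketch under that assumption; SectionInfo, ComparableLength, DiskRange, and MemoryRange are hypothetical names, not Winnt.h definitions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the IMAGE_SECTION_HEADER fields used here.
struct SectionInfo {
    std::uint32_t virtualSize;       // Misc.VirtualSize
    std::uint32_t virtualAddress;    // RVA of the section in memory
    std::uint32_t sizeOfRawData;     // bytes actually present in the file
    std::uint32_t pointerToRawData;  // file offset of those bytes
};

struct ByteRange {
    std::uint64_t offset;  // file offset (disk) or RVA (memory)
    std::uint32_t length;
};

// Only file-backed bytes can match between disk and memory; memory beyond
// SizeOfRawData is zero-filled by the loader.
std::uint32_t ComparableLength(const SectionInfo& s) {
    return std::min(s.virtualSize, s.sizeOfRawData);
}

// Range to hash in the mapped file (relative to pBaseAddress).
ByteRange DiskRange(const SectionInfo& s) { return {s.pointerToRawData, ComparableLength(s)}; }

// Range to hash in the loaded image (relative to the HMODULE).
ByteRange MemoryRange(const SectionInfo& s) { return {s.virtualAddress, ComparableLength(s)}; }
```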

Now armed with a foundation, one can use PE Browse to reveal the various structures of the PE executable. For example, the Import Directory entry in the IMAGE_OPTIONAL_HEADER's IMAGE_DATA_DIRECTORY array places the import table at Virtual Address 0x0D4000.

Figure 4: Image Directories

Examining this area of the disk file in fact displays the Import Table.

Figure 5: Import Section
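Because a data directory's address is an RVA, the matching bytes in the disk file generally sit at a different offset whenever section alignment and file alignment differ. A minimal sketch of the translation, reusing the enclosing-section test from the .text search; SectionHeaderFields and RvaToFileOffset are hypothetical names, and for simplicity RVAs that fall within the headers are rejected rather than mapped one-to-one.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the IMAGE_SECTION_HEADER fields needed for translation.
struct SectionHeaderFields {
    std::uint32_t virtualAddress;
    std::uint32_t virtualSize;
    std::uint32_t pointerToRawData;
    std::uint32_t sizeOfRawData;
};

// Translate an RVA (such as a DataDirectory entry's VirtualAddress) into a file
// offset: find the enclosing section, then rebase onto PointerToRawData.
// Returns -1 for RVAs in the headers, in zero-filled space, or outside all sections.
std::int64_t RvaToFileOffset(const std::vector<SectionHeaderFields>& sections,
                             std::uint32_t rva) {
    for (const SectionHeaderFields& s : sections) {
        if (s.virtualAddress <= rva && rva < s.virtualAddress + s.virtualSize) {
            const std::uint32_t delta = rva - s.virtualAddress;
            if (delta >= s.sizeOfRawData) return -1;  // loader zero-fill, not in the file
            return static_cast<std::int64_t>(s.pointerToRawData) + delta;
        }
    }
    return -1;
}
```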

Finally, the code to read an executable's .text section on disk would be as follows:

C++

int _tmain(int argc, _TCHAR* argv[])
{
    PVOID  pBaseAddress = NULL;
    DWORD  dwRawData = 0;
    PVOID  pEntryPoint = NULL;
    PVOID  pCodeStart = NULL;
    PVOID  pCodeEnd = NULL;   
    SIZE_T dwCodeSize = 0;
       
    GatherDiskImageInformation( pBaseAddress, dwRawData,
        pEntryPoint, pCodeStart, dwCodeSize, pCodeEnd );

    DumpDiskImageInformation( pBaseAddress, dwRawData,
        pEntryPoint, pCodeStart, dwCodeSize, pCodeEnd );

    HexDump( pCodeStart, pCodeStart, DUMP_SIZE ); 

    return 0;
}

Figure 6: Disk Image

The difference between Base Address (0x350000) above and PE Browse base address (0x434000) below is superficial: 0x350000 was returned from MapViewOfFile() while 0x00434000 is derived from the Image Base Address and Virtual Address in the headers. That is, PE Browse calculates where the image will be in memory using OptionalHeader.ImageBase.

Figure 7: File Header

Reading a Memory Image

The code to read a memory image is nearly the same as that of a disk image. The differences are:

DOS Header location is determined by the HMODULE, rather than the pointer returned by MapViewOfFile()
VirtualAddress is used in combination with AddressOfEntryPoint to locate .text, rather than PointerToRawData

C++

int _tmain(int argc, _TCHAR* argv[])
{
    HMODULE hModule = NULL;
    PVOID pVirtualAddress = NULL;
    PVOID pCodeStart = NULL;
    PVOID pCodeEnd = NULL;
    SIZE_T dwCodeSize = 0;

    ...
    
    /////////////////////////////////////////////////////////////
    pDOSHeader = static_cast<PIMAGE_DOS_HEADER>( (PVOID)hModule );
    if(pDOSHeader->e_magic != IMAGE_DOS_SIGNATURE )
    { return -1; }

    ...

    /////////////////////////////////////////////////////////////
    DWORD dwEntryPoint = pNTHeader->OptionalHeader.AddressOfEntryPoint;
    UINT nSectionCount = pNTHeader->FileHeader.NumberOfSections;

    for( UINT i = 0; i < nSectionCount; i++ )
    {
        // When we find a Section such that
        //   Section Start <= Entry Point < Section End,
        //   we have found the .TEXT Section
        if( pSectionHeader->VirtualAddress <= dwEntryPoint &&
            dwEntryPoint < pSectionHeader->VirtualAddress +
                           pSectionHeader->Misc.VirtualSize )
        { break; }

        pSectionHeader++;
    }
    
    ...
}

Verifying Integrity

To determine if the executable's .text section has been modified, Sample 3 combines Sample 1 (on-disk) and Sample 2 (in-memory) with MD5 Hashing. MD5 was chosen because it provides short signatures (16 bytes), which are easily displayed as part of the article.

In production, we would use a hash satisfying the requirements of an MDC, or Modification Detection Code. MDCs are also known as Manipulation Detection Codes or, less commonly, Message Integrity Codes. MDCs are divided into two classes: One Way Hash Functions (OWHF) and Collision Resistant Hash Functions (CRHF). Hashes such as Whirlpool, RIPEMD-160, or the SHA-2 family (SHA-224, SHA-256, etc.) comply with the requirements. These hash functions are preferred in part due to their digest length: each produces a digest of at least 160 bits.

Because we now calculate both the on-disk and in-memory information, we have adjusted our variables accordingly. We also introduce the hash variables and objects:

C++

/////////////////////////////////////////////////
// On-Disk Variables
PVOID   pBaseAddress = NULL;        
DWORD   dwRawData = 0;  
PVOID   pDiskEntryPoint = NULL;
PVOID   pDiskCodeStart = NULL;
PVOID   pDiskCodeEnd = NULL;
SIZE_T  dwDiskCodeSize = 0;

/////////////////////////////////////////////////
// In-Memory Variables
HMODULE hModule = NULL;
PVOID   pVirtualAddress = NULL;  
PVOID   pMemoryEntryPoint = NULL;
PVOID   pMemoryCodeStart = NULL;
PVOID   pMemoryCodeEnd = NULL;     
SIZE_T  dwMemoryCodeSize = 0;

/////////////////////////////////////////////////
// Hash Specific Variables
MD5  hash;
BYTE cbDiskHash[ MD5::DIGESTSIZE ];
BYTE cbMemoryHash[ MD5::DIGESTSIZE ];

Two functions CalculateHash() and DumpHashInformation() have been added. Below, CalculateHash() is shown. The function takes a HashTransformation reference, which is a base class of all hash classes in Crypto++.

C++

BOOL CalculateHash( HashTransformation& hash,
                    PVOID pMessage, SIZE_T nMessageSize,
                    PBYTE pcbHashBuffer, SIZE_T nHashBufferSize )
{
    if( nHashBufferSize != hash.DigestSize() )
    {
        ZeroMemory( pcbHashBuffer, nHashBufferSize );
        return FALSE;
    }

    // const PBYTE would be BYTE* const; Update() expects a pointer to const bytes
    hash.Update( static_cast<const BYTE*>( pMessage ), nMessageSize );
    hash.Final( pcbHashBuffer );

    return TRUE;
}

In Sample 3, the same MD5 object is passed to CalculateHash() for both the on-disk and in-memory images. This works because calling Final() to retrieve the hash value of the message (the executable image) also resets the object.
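DumpHashInformation() itself is not listed. As a hedged illustration only, a hypothetical pair of helpers for the two remaining tasks, rendering a digest in the colon-separated style the article uses when quoting signatures and deciding whether cbDiskHash and cbMemoryHash agree, might look like this. It uses only the standard library, so it works with the output buffer of any HashTransformation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper (not the article's DumpHashInformation()):
// renders a digest as colon-separated uppercase hex, e.g. "F0:A7:3B:5E".
std::string FormatDigest(const unsigned char* digest, std::size_t size) {
    std::string text;
    char byteText[3];  // two hex digits plus the terminating null
    for (std::size_t i = 0; i < size; ++i) {
        std::snprintf(byteText, sizeof(byteText), "%02X", static_cast<unsigned>(digest[i]));
        if (i != 0) text += ':';
        text += byteText;
    }
    return text;
}

// Hypothetical helper: the integrity verdict is a byte-for-byte comparison
// of the on-disk and in-memory digests.
bool DigestsMatch(const unsigned char* diskHash, const unsigned char* memoryHash,
                  std::size_t size) {
    return std::memcmp(diskHash, memoryHash, size) == 0;
}
```

With the Sample 3 variables, a call would look like DigestsMatch( cbDiskHash, cbMemoryHash, MD5::DIGESTSIZE ).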

Debug Builds

Figure 8: Disk versus Memory Image Signature (Under Debugger)

In Figure 8 the signatures are not consistent. This is because the capture was performed under the Visual Studio debugger, which inserts software breakpoints that affect the in-memory hash value. Although not very useful, a Debug build can be verified by running the program from the command prompt. The result is shown in Figure 9, which produces the expected results. If we compare Figures 8 and 9, we see that the on-disk signature is consistent regardless of whether the program is being hosted by the debugger.

Figure 9: Disk versus Memory Image Signature (Outside Debugger)

To overcome the effects of the debugger's software breakpoints, we can use WinDbg and set a breakpoint using a debug register. Quoting Ken Johnson:

...in WinDbg, if you use the 'ba' command then the code bytes in question will not be modified (i.e. substituted with an 0xcc/int 3). You are limited to 4 simultaneously active 'ba' breakpoints as they use the hardware supplied debug registers, which only support four target addresses.

Release Builds

Figure 10 demonstrates running a release build of sample three. The noticeable change (besides different signatures between debug and release builds) is that the release code size is roughly one tenth that of the debug build.

Figure 10: Disk versus Memory Image Signature, Release Build

Binding and Rebasing

Binding an executable refers to writing the addresses of imported functions into the IAT of an executable. More precisely, when an image is bound the IMAGE_THUNK_DATA structures in the IAT are overwritten with the actual address of the imported function. This is done so the loader does not have to determine the address of an imported function and write the address at load time, thereby speeding up the load process. Since we do not use the IAT in our calculation of the digest, it does not affect our results. We will examine the effects of binding a DLL in the Dynamic Link Library section.

Rebasing a DLL is common to avoid load address collisions. It is a procedure the loader will perform on DLLs when a conflict arises. Executable files are not rebased per se. Changing the base address of an executable is uncommon, but not unheard of. For example, the command interpreter (cmd.exe) on Windows XP, SP2 has a preferred base address of 0x4AD00000. JMP and CALL instructions emitted by the compiler use offsets relative to the next instruction, rather than absolute addresses in the 32-bit flat segment. If the image needs to be loaded somewhere other than its specified base address, these instructions don't need to change since they use relative addressing.
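The relative-addressing argument can be checked with a little arithmetic. A minimal sketch, assuming the 5-byte E8/E9 rel32 encodings (one opcode byte plus a signed 32-bit displacement measured from the end of the instruction); Rel32For and Rel32Target are hypothetical helpers. Moving both the instruction and its target by the same amount leaves the encoded displacement unchanged, which is why such instructions need no fixups.

```cpp
#include <cassert>
#include <cstdint>

// CALL (E8) / JMP (E9) rel32: target = address of the next instruction + rel32,
// where the instruction is 1 opcode byte + 4 displacement bytes = 5 bytes.
std::uint32_t Rel32Target(std::uint32_t instructionAddress, std::int32_t rel32) {
    return instructionAddress + 5u + static_cast<std::uint32_t>(rel32);  // wraps mod 2^32
}

// The displacement the compiler/linker would encode for a given target.
// It depends only on the distance between instruction and target.
std::int32_t Rel32For(std::uint32_t instructionAddress, std::uint32_t target) {
    return static_cast<std::int32_t>(target - (instructionAddress + 5u));
}
```

For example, a CALL at 0x401010 to 0x401500 and the same code based at 0x500000 (CALL at 0x501010 to 0x501500) both encode a displacement of 0x4EB.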

Figure 11: Rebased Executable, Release Build

In Figure 11, sample three was compiled with a base address of 0x00500000. As we can see, the signature differs from the one obtained at the customary base address of 0x00400000 (F0:A7:3B:5E...40:CF:E9:39). When we rebase the executable, we find that more has changed than simply ImageBase and AddressOfEntryPoint. Sample four investigates this situation further.

In Sample 4, all code has been removed except GatherImageInformation() and DumpImageInformation(). Next, we add the following to dump the .text section to disk using the filename 'textdump.bin'.

C++

StringSource(
    (const BYTE*)pCodeStart, dwCodeSize,
    true, new FileSink( "textdump.bin", true )
);

Finally, we run sample four using base addresses of both 0x400000 and 0x500000. When we examine the binary files using a difference program, we see that there are over 2800 differences. Investigating further, we find that the differences tend to be small, usually consisting of one byte. The first change occurs at byte 8 (0x43 vs 0x53). The next 15 changes are the same at varying offsets. At difference 16, the byte change is 0x44 to 0x54.

Figure 12: Binary Difference of .text Sections
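If a difference program is not at hand, the comparison of the two textdump.bin files can be approximated with a byte-by-byte scan. This sketch, built around a hypothetical CompareDumps() helper operating on buffers already read from disk, counts differing byte positions and records the first one; a diff tool that groups adjacent changes may report a different count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct DumpDiff {
    std::size_t count = 0;        // number of byte positions that differ
    std::size_t firstOffset = 0;  // offset of the first difference (valid when count > 0)
};

// Hypothetical helper: compares two .text dumps, e.g. the textdump.bin files
// written at base addresses 0x400000 and 0x500000. Only the common prefix is
// compared; both dumps are expected to be the same size.
DumpDiff CompareDumps(const std::vector<std::uint8_t>& a, const std::vector<std::uint8_t>& b) {
    DumpDiff diff;
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) {
            if (diff.count == 0) diff.firstOffset = i;
            ++diff.count;
        }
    }
    return diff;
}
```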

The binary dumps (textdump.bin) of the .text sections are available in Sample 4. Next, we use PE Browse to examine the compiled code. We know that the .text section will start at either 0x401000 or 0x501000, depending on the image base of the executable we are examining. The disassembly is shown below in Figure 13.

Figure 13: Disassembly of .text Section

According to the disassembly, the disparity is due to three factors. First, the exception handler parameter (0x533F6E), a function address (constant) in the .text section, is being placed on the stack. The second cause is the function address of objects being called through their vtable entries (for example, see 0x5013A8). In this case, the function address is loaded from the object's vtable, which is located in the .rdata section. The dereference into the .rdata section is not relative; it is absolute.

The final reason for the signature difference is Microsoft's Buffer Security Check introduced in Visual Studio 2003. Buffer security check is similar to GCC's StackGuard (which uses canary values) and IBM's ProPolice which offers enhancements to StackGuard (see also Secure Programmer: Countering Buffer Overflows).

The security cookie is the return address of a function call XOR'd with a random value, which is then placed next to the return address in memory. If an attacker attempts to overwrite the return address with a buffer overflow (stack smash), the check of the security cookie will usually catch the exploit.

In all cases above, the differences arise because of memory addresses which are absolute in the 32-bit flat address space. If we were motivated, we could parse the .reloc section of the executable and fix the image offline (if the image was built without the /FIXED linker switch). For a discussion of the IMAGE_BASE_RELOCATION directory, see Pietrek's Peering Inside the PE: A Tour of the Win32 Portable Executable File Format.
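Fixing the image offline amounts to walking the IMAGE_BASE_RELOCATION blocks: each block holds an 8-byte header (page RVA and SizeOfBlock) followed by 16-bit entries with the relocation type in the high 4 bits and the page offset in the low 12 bits. The sketch below is a minimal illustration, assuming the image buffer is laid out by RVA (as in memory, not as on disk) and handling only IMAGE_REL_BASED_HIGHLOW (type 3) plus IMAGE_REL_BASED_ABSOLUTE padding; Get32, Put32, and ApplyHighLowRelocations are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

static std::uint32_t Get32(const std::vector<std::uint8_t>& b, std::size_t off) {
    return static_cast<std::uint32_t>(b[off]) |
           (static_cast<std::uint32_t>(b[off + 1]) << 8) |
           (static_cast<std::uint32_t>(b[off + 2]) << 16) |
           (static_cast<std::uint32_t>(b[off + 3]) << 24);
}

static void Put32(std::vector<std::uint8_t>& b, std::size_t off, std::uint32_t v) {
    for (std::size_t i = 0; i < 4; ++i) b[off + i] = static_cast<std::uint8_t>(v >> (8 * i));
}

// Adds delta to every HIGHLOW target. Returns the number of fixups applied,
// or -1 if a block is malformed or uses an unsupported relocation type.
int ApplyHighLowRelocations(std::vector<std::uint8_t>& image,
                            const std::vector<std::uint8_t>& relocs,
                            std::uint32_t delta) {
    const unsigned kRelBasedAbsolute = 0;  // padding entry, skipped
    const unsigned kRelBasedHighLow = 3;   // 32-bit absolute address
    int applied = 0;
    std::size_t pos = 0;
    while (pos + 8 <= relocs.size()) {
        const std::uint32_t pageRva = Get32(relocs, pos);
        const std::uint32_t sizeOfBlock = Get32(relocs, pos + 4);
        if (sizeOfBlock < 8 || sizeOfBlock > relocs.size() - pos) return -1;
        for (std::size_t e = pos + 8; e + 2 <= pos + sizeOfBlock; e += 2) {
            const unsigned entry = relocs[e] | (relocs[e + 1] << 8);
            const unsigned type = entry >> 12;
            const unsigned offset = entry & 0x0FFFu;
            if (type == kRelBasedAbsolute) continue;
            if (type != kRelBasedHighLow) return -1;  // other types not handled here
            const std::size_t target = static_cast<std::size_t>(pageRva) + offset;
            if (target + 4 > image.size()) return -1;
            Put32(image, target, Get32(image, target) + delta);  // wraps mod 2^32
            ++applied;
        }
        pos += sizeOfBlock;
    }
    return applied;
}
```

The delta is the actual load address minus OptionalHeader.ImageBase. Moving an image from 0x400000 to 0x500000 adds 0x100000 to each absolute address, which turns a stored 0x00433F6E into 0x00533F6E; that is consistent with the single-byte 0x43 to 0x53 changes seen in the dumps. A downward move, such as 0x00330000 minus 0xA0000000, also works because the unsigned subtraction wraps to the correct adjustment.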

Dynamic Link Libraries

Up to this point, we have examined executable files while only touching on DLLs when a DLL needed consideration. We will now examine DLLs in detail. To begin, we will add a second project to the solution - a Win32 DLL - to create Sample 5. Select a Win32 Console Application as shown in Figure 14. When prompted by the wizard, select an Application Type of 'DLL'. There is no need to check 'Export Symbols'.

Figure 14: DLL Project Creation

Add a DEF file that exports GatherImageInformation:

Figure 15: DEF File

Next, move the function GatherImageInformation() from the executable and into the DLL. Add __declspec(dllimport) to the original project (VerifyIntegrity.exe):

C++

// Project 1: VerifyIntegrity.exe
__declspec(dllimport)
VOID GatherImageInformation( HMODULE& hModule,
            PVOID& pVirtualAddress, PVOID& pEntryPoint,
            PVOID& pCodeStart, SIZE_T& dwCodeSize, PVOID& pCodeEnd );

And __declspec(dllexport) to the DLL implementation (VerifyIntegrityDll.dll):

C++

// Project 2: VerifyIntegrityDll.dll
__declspec(dllexport)
VOID GatherImageInformation( HMODULE& hModule,
        PVOID& pVirtualAddress, PVOID& pEntryPoint,
        PVOID& pCodeStart, SIZE_T& dwCodeSize, PVOID& pCodeEnd )
{
    ...
}

When we run the program, we observe the results of Figure 16. Since the dynamic library calls GetModuleHandle( NULL ), the return value is that of the executable - 0x00400000 in this case.

Figure 16: DLL Returning Exe Information

Finally, Sample 5 adds code so that the DLL performs in-memory interrogations of both the executable and the library. In essence, we have moved the information-gathering functionality of sample three into the DLL. The result of running sample five is shown in Figure 17.

Figure 17: Dll and Exe In-Memory Information

Examining the disassembly of the DLL confirms the entry point. Note that the entry point is the runtime's entry point, and not DllMain(). The initialization call graph of interest for the DLL is as follows. Note that DllMain is optional - we happen to use it (though it simply performs a return TRUE). See MSDN's DllMain Callback Function and Dynamic-Link Library Entry-Point Function.

_DllMainCRTStartup
__DllMainCRTStartup
DllMain

Figure 18: DLL Entry Point

While the majority of the code is equivalent for the .text section of both an executable and DLL, there is one critical point: how do we retrieve the image base of the dynamic library (even if loaded at an address other than preferred)? For this, we will use the pseudo-variable __ImageBase. The variable is available in Visual Studio 7.0 and above. It is initialized by the linker and adjusted by the loader as required. According to Raymond Chen on The Old New Thing, "[Using] HINSTANCEs as a [sic] base-address [is valid] ... since the base address gets relocated with the rest of the DLL. A relocatable reference to an address at rva=0 should have done the trick, and with this pseudo-variable, it has been formalized." Chen also states the variable is valid for static libraries which have not yet been linked to an executable.

If __ImageBase is unavailable because you are using an earlier version of Visual Studio, call VirtualQuery() on an address within the module (such as one of its functions) and read the AllocationBase member of the returned MEMORY_BASIC_INFORMATION. Finally, this discussion does not apply to Windows CE.

C++

EXTERN_C IMAGE_DOS_HEADER __ImageBase;

// __ImageBase is the DOS header itself, so take its address
PIMAGE_DOS_HEADER pDOSHeader = &__ImageBase;

Sample 6 moves GatherDiskImageInformation() and GatherMemoryImageInformation() into the DLL such that the operations are performed on the DLL. The DLL uses __ImageBase and GetModuleFileName() to determine the disk file to open.

C++

TCHAR szFilename[ MAX_PATH ] = { 0 };

////////////////////////////////////////////////
if( 0 == GetModuleFileName( (HMODULE)&__ImageBase,
    szFilename, MAX_PATH ) ) { return -1; }

/////////////////////////////////////////////////////////////
hFile = CreateFile( szFilename, GENERIC_READ, FILE_SHARE_READ,
    NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);

When executed outside of the debugger, we observe the expected results. We can also observe these results under the debugger if software breakpoints are set in the executable, and not the DLL.

Figure 19: Dll Image Verification

Rebasing

If we specify a different base address for the DLL during compilation and linking, we see that the new base address affects the hash of the .text section. However, the hash of the on-disk image is consistent with the in-memory image. In Figure 20 below, a base address of 0x08000000 was specified.

Figure 20: Dll Image Verification,
Base Address 0x08000000

Next, we use rebase.exe on the DLL to set the preferred base address back to 0x10000000. The results are exactly the same as specifying the base address during compile time (shown in Figure 21).

Figure 21: Dll Image Verification,
Rebase Address 0x10000000

Next, we will rebase the DLL to 0xA0000000. After the rebase, we examine the executable using Dependency Walker (depends.exe) to verify the base address.

Figure 22: Dependency Walk, Base Address 0xA0000000

When we run the executable, we observe two issues. First, the DLL loads at address 0x00330000 (0xA0000000 lies above the default 2 GB user-mode address space of 32-bit Windows, so the preferred base cannot be honored). Second, the hashes are inconsistent. It is apparent that the runtime loader performed fixups when the operating system relocated the DLL from its preferred base address.

Figure 23: Rebased Dll Image Verification,
Rebase Address 0xA0000000

For the last exercise in rebasing, we will build the DLL as usual (base address 0x10000000), and then rebase the DLL to 0x00330000 as the Operating System did in Figure 23.

Figure 24: Rebased Dll Image Verification,
Base Address 0x00330000

As we can see, the results are the same as if the linker had used a base address of 0x00330000.

Binding

The final exercise runs bind.exe over the DLL and examines the results. As discussed earlier, binding hard codes function addresses in the IAT. We expect that binding the image should not affect its .text section. When we view the output of sample six after binding, we find this is the case.

Figure 25: Bound Dll Image Verification,
Base Address 0x10000000

GetAddressOfMain()

Determining the location of a function can be useful as a "flag in the sand" or a basic sanity check. An interesting aspect of debugging is that the debugger should not influence the program, and Visual Studio does a very good job in this respect. However, the environment does exert some influence. Consider the following:

C++

PVOID pfnMain = (PVOID)&_tmain;

In Debug builds, pfnMain is a pointer to a jump (E9) instruction: a thunk created by incremental linking, which Debug builds enable by default. In Release builds, it is the actual address of main(). In both debug and release builds, the linker performs the back patch of the address. The following code accounts for the jump stub on main in Debug builds:

C++

PVOID pfnMain = (PVOID)&_tmain;
PBYTE pPossibleJump = static_cast<PBYTE>(pfnMain);
BYTE opcode = *pPossibleJump;
if( 0xE9 /* Jump */ != opcode )
{
    cout << "main() is not a jump opcode... no fixup applied" << endl;
}
else
{
    // The rel32 displacement is signed: the target may precede the stub
    LONG lJump = *( reinterpret_cast<PLONG>(pPossibleJump+1) );
    pfnMain = pPossibleJump + sizeof(opcode) + sizeof(lJump) + lJump;
    cout << "main() is a jump opcode... fixup applied" << endl;
}

A picture being worth 1000 words, the figure below displays the result of the previous code.
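The jump-following arithmetic in the listing above can be tested on its own. The portable sketch below decodes a JMP rel32 from a byte buffer at an assumed address instead of from live code. ResolveJmpRel32 is a hypothetical name. The target is the address of the next instruction (five bytes past the E9 opcode) plus the signed 32-bit displacement, so backward jumps also resolve correctly.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Follows an x86 "JMP rel32" (opcode E9), such as an incremental-linking
// thunk. 'code' points at the instruction bytes and 'address' is where
// those bytes reside in memory. Returns 'address' unchanged when the first
// byte is not E9.
uint64_t ResolveJmpRel32(const uint8_t* code, uint64_t address) {
    if (code[0] != 0xE9) return address;
    int32_t rel;
    std::memcpy(&rel, code + 1, sizeof(rel));  // little-endian, as on x86
    // Next instruction = address + 5; add the sign-extended displacement.
    return address + 5 + static_cast<int64_t>(rel);
}
```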

Address Space Layout Randomization

ASLR is Address Space Layout Randomization. It is meant to hinder attacks that depend on knowing where code or data resides, such as exploits of stack smashing, to which some binaries could fall victim. Because ASLR rebases images at runtime, it does affect integrity checks. Fortunately, we can avoid ASLR by not specifying the /DYNAMICBASE linker option (some toolsets enable it by default, in which case /DYNAMICBASE:NO disables it). Additional resources include Inside the Windows Vista Kernel: Part 3 by Russinovich and Windows Vista ISV Security. Also of interest is On the Effectiveness of Address-Space Randomization, an analysis of ASLR by researchers at Stanford University.
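You can read from an image's headers whether it opted in to ASLR. /DYNAMICBASE sets IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (0x0040) in the optional header's DllCharacteristics field. The sketch below reads that flag from a raw file buffer using offsets from the PE/COFF specification rather than <windows.h>, so it compiles on any platform. IsDynamicBase is a hypothetical name, the code assumes a little-endian host, and validation is limited to bounds and signature checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Offsets and flags from the PE/COFF specification.
constexpr size_t kLfanewOffset = 0x3C;            // e_lfanew in the DOS header
constexpr size_t kFileHeaderSize = 20;            // sizeof(IMAGE_FILE_HEADER)
constexpr size_t kDllCharacteristicsOffset = 70;  // same in PE32 and PE32+
constexpr uint16_t kDynamicBase = 0x0040;         // IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE

// Returns true if the image was linked with /DYNAMICBASE. Returns false
// when the flag is clear or the buffer does not look like a PE file.
bool IsDynamicBase(const std::vector<uint8_t>& file) {
    if (file.size() < kLfanewOffset + 4 || file[0] != 'M' || file[1] != 'Z')
        return false;
    uint32_t lfanew;
    std::memcpy(&lfanew, &file[kLfanewOffset], sizeof(lfanew));
    // Optional header = PE signature (4 bytes) + IMAGE_FILE_HEADER.
    const size_t optHeader = size_t(lfanew) + 4 + kFileHeaderSize;
    if (optHeader + kDllCharacteristicsOffset + 2 > file.size()) return false;
    if (std::memcmp(&file[lfanew], "PE\0\0", 4) != 0) return false;
    uint16_t dllChars;
    std::memcpy(&dllChars, &file[optHeader + kDllCharacteristicsOffset],
                sizeof(dllChars));
    return (dllChars & kDynamicBase) != 0;
}
```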

Miscellaneous and Other Errata

For the purposes of this article, the named section of interest when using Microsoft tools is .text. Other compiler vendors may name the section differently. In addition, certain characteristics of the executable files discussed apply only to the x86 architecture. PCs dominate the desktop market (on the order of 90% market share), and the author does not have a PowerPC platform running Windows NT, so only the x86 PE/COFF format on 32-bit architectures is examined. Neither Windows Vista nor the possible influence of DEP is examined.
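Because the section name depends on the toolchain, an integrity check could look up the code section by name and fall back on the IMAGE_SCN_CNT_CODE (0x00000020) characteristic. The sketch below walks the section table of a raw PE buffer using offsets from the PE/COFF specification. FindCodeSection and SectionInfo are hypothetical names, the code assumes a little-endian host, and bounds checking is minimal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

constexpr uint32_t kScnCntCode = 0x00000020;  // IMAGE_SCN_CNT_CODE
constexpr size_t kSectionHeaderSize = 40;     // sizeof(IMAGE_SECTION_HEADER)

struct SectionInfo {
    std::string name;
    uint32_t virtualAddress;
    uint32_t virtualSize;
};

// Reads a little-endian value of type T at byte offset 'off'.
template <typename T>
T ReadAt(const std::vector<uint8_t>& f, size_t off) {
    T v;
    std::memcpy(&v, &f[off], sizeof(T));
    return v;
}

// Looks for the section named 'wanted'. If none exists, falls back to the
// first section whose characteristics mark it as containing code.
bool FindCodeSection(const std::vector<uint8_t>& f, SectionInfo& out,
                     const std::string& wanted = ".text") {
    if (f.size() < 0x40) return false;
    const size_t fileHdr = size_t(ReadAt<uint32_t>(f, 0x3C)) + 4;  // after "PE\0\0"
    if (fileHdr + 20 > f.size() || std::memcmp(&f[fileHdr - 4], "PE\0\0", 4) != 0)
        return false;
    const uint16_t count = ReadAt<uint16_t>(f, fileHdr + 2);     // NumberOfSections
    const uint16_t optSize = ReadAt<uint16_t>(f, fileHdr + 16);  // SizeOfOptionalHeader
    const size_t table = fileHdr + 20 + optSize;
    if (table + size_t(count) * kSectionHeaderSize > f.size()) return false;

    bool haveCode = false;
    SectionInfo firstCode;
    for (uint16_t i = 0; i < count; ++i) {
        const size_t s = table + size_t(i) * kSectionHeaderSize;
        char raw[9] = {};  // an 8-byte name need not be NUL-terminated
        std::memcpy(raw, &f[s], 8);
        // VirtualSize is at +8, VirtualAddress at +12.
        SectionInfo info{raw, ReadAt<uint32_t>(f, s + 12), ReadAt<uint32_t>(f, s + 8)};
        if (info.name == wanted) {
            out = info;
            return true;
        }
        if (!haveCode && (ReadAt<uint32_t>(f, s + 36) & kScnCntCode) != 0) {
            firstCode = info;
            haveCode = true;
        }
    }
    if (haveCode) out = firstCode;
    return haveCode;
}
```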

Matt Pietrek's articles are generally considered the standard when examining and manipulating executable headers. However, please keep in mind that some of Pietrek's articles cited by others are over 14 years old. The author finds the syndrome similar to that described by Donald Knuth in The Art of Computer Programming, Volume 2: Seminumerical Algorithms, Section 3.1:

Many random number generators in use ... were not very good. People have tended to avoid learning about [the systems]; ... and [the systems] have been passed down blindly from one programmer to another, until the users have no understanding of the original limitations.

Surely Microsoft's implementations have changed as the environment has become more hostile in the years since Pietrek's articles were released. For example, consider Data Execution Prevention (DEP). On a Windows XP SP2 PC, DEP is implemented in one of two ways: hardware-enforced DEP, which relies on the processor's no-execute support, and software-enforced DEP. Another example is Address Space Layout Randomization (ASLR), a feature of Windows Vista. It should be readily apparent that neither DEP nor ASLR was available when Pietrek's articles were originally written.

By no means should a reader construe the author's differing opinion as a claim that Pietrek's articles are incorrect. The author simply feels that it is now time to re-examine Pietrek's works, especially in the context of malicious software environments.

Resources

Windows File Protection
System File Checker
DbgHelp Library Reference
Windows Vista ISV Security
Inside the Windows Vista Kernel: Part 3 by Mark Russinovich
A Detailed Description of the Data Execution Prevention (DEP) Feature
Microsoft Portable Executable and Common Object File Format Specification
Programming Applications for Microsoft Windows by Jeffrey Richter
Microsoft Windows Internals by Mark Russinovich and David Solomon
MSJ Under the Hood (July 1997) by Matt Pietrek
Rebasing Win32 DLLs: The Whole Story by Ruediger Asche
Peering Inside the PE: A Tour of the Win32 Portable Executable File Format by Matt Pietrek
An In-Depth Look into the Win32 Portable Executable File Format, Part 1 by Matt Pietrek
An In-Depth Look into the Win32 Portable Executable File Format, Part 2 by Matt Pietrek
What Goes On Inside Windows 2000: Solving the Mysteries of the Loader by Russ Osterlund
Optimizing DLL Load Time Performance by Matt Pietrek
Tamper Aware and Self Healing Code by Jeffrey Walton
Secure Programmer: Countering Buffer Overflows by David Wheeler

Acknowledgments

Wei Dai for Crypto++ and his invaluable help on the Crypto++ mailing list
Dr. A. Brooke Stephens who laid my Cryptographic foundations
Ken Johnson, Microsoft Windows SDK MVP
Stephen Hewitt, Code Project MVP

Checksums

VerifyIntegrity01.zip
- MD5: 8061D977D223B491DF33BECF1E4B30C0
- SHA-1: 7997CF4ED6C52419F71D629F50BCA9F61D0E2D29
VerifyIntegrity02.zip
- MD5: 53956F4C502365CEE835E0D079D696E7
- SHA-1: 90DE30F61EA1D0660382A6D651E29D2E6AECF49B
VerifyIntegrity03.zip
- MD5: 729022DF0865EEA9341711C6D1EADD02
- SHA-1: 714D853CC4A729C0919A63BE777070AB1A0F1FD8
VerifyIntegrity04.zip
- MD5: AED4FAB4BA7B11A74B35837860EC54F8
- SHA-1: D61E72BD151F1547EDF2719F4BCD9B317CF99862
VerifyIntegrity05.zip
- MD5: B3150B77EF2A4612FB24A5AA4EFC951A
- SHA-1: D329C212A8FDE44C8F9A7B2B372ECC9AEA413C51
VerifyIntegrity06.zip
- MD5: 401B9F531648E1CEEA3E6CCCEE299465
- SHA-1: 3AE875D2505043CEAEC3AEB6152CC798D7ECB57A

Revisions

03.19.2008 Added Reference to Post-Build Executable Back Patching
03.08.2008 Added 'Binding and Rebasing' Sections
03.08.2008 VS2002 (7.0) to VS2005 (8.0) Port
03.08.2008 Added Samples 4, 5, and 6
03.08.2008 Article Samples Rewritten
03.08.2008 Reworked Introduction
05.28.2007 Added Reference to Tamper Aware and Self Healing Code
04.15.2007 Added Reference to Process Creation
04.15.2007 Added Reference to Windows Internals
02.17.2007 Added DEP
02.17.2007 Added ASLR
02.11.2007 Initial Release

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)