Introduction
The Portable Executable (PE) format is the data structure that describes how the various parts of a Win32 executable file are held together. It allows the operating system to load the executable, to locate the dynamically linked libraries it requires, and to navigate the code, data and resource sections compiled into it.
Getting over DOS
The PE format was created for Windows, but Microsoft had to make sure that running such an executable in DOS would yield a meaningful error message and exit. To this end the very first part of a Windows executable file is actually a DOS executable (sometimes known as the stub) which writes "This program requires Windows" or similar, then exits.
The stub begins with a DOS header, whose format is:
Private Type IMAGE_DOS_HEADER
e_magic As Integer
e_cblp As Integer
e_cp As Integer
e_crlc As Integer
e_cparhdr As Integer
e_minalloc As Integer
e_maxalloc As Integer
e_ss As Integer
e_sp As Integer
e_csum As Integer
e_ip As Integer
e_cs As Integer
e_lfarlc As Integer
e_ovno As Integer
e_res(0 To 3) As Integer
e_oemid As Integer
e_oeminfo As Integer
e_res2(0 To 9) As Integer
e_lfanew As Long
End Type
The only field of this structure that is of interest to Windows is e_lfanew, which holds the file offset of the new Windows executable header. To skip over the DOS part of the program, set the file pointer to the value held in this field:
Private Sub SkipDOSStub(ByVal hfile As Long)
    Dim BytesRead As Long
    Dim stub As IMAGE_DOS_HEADER

    ' Rewind to the start of the file, where the DOS header lives
    Call SetFilePointer(hfile, 0, 0, FILE_BEGIN)
    If Err.LastDllError Then
        Debug.Print LastSystemError
    End If

    ' Read the DOS header, then jump to the offset held in e_lfanew
    Call ReadFileLong(hfile, VarPtr(stub), Len(stub), BytesRead, ByVal 0&)
    Call SetFilePointer(hfile, stub.e_lfanew, 0, FILE_BEGIN)
End Sub
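The same step can be checked outside Visual Basic. The following is a minimal sketch in Python (chosen only because it is easy to run; it is not part of the original project) that unpacks the 64-byte DOS header and returns e_lfanew. The function name read_e_lfanew and the synthetic sample header are illustrative assumptions.

```python
import struct

# IMAGE_DOS_HEADER: 30 16-bit words followed by the 32-bit e_lfanew (64 bytes).
DOS_HEADER = struct.Struct("<30HI")
IMAGE_DOS_SIGNATURE = 0x5A4D  # the characters "MZ"


def read_e_lfanew(data: bytes) -> int:
    """Return the file offset of the PE signature held in e_lfanew."""
    if len(data) < DOS_HEADER.size:
        raise ValueError("file is too short to hold a DOS header")
    fields = DOS_HEADER.unpack_from(data, 0)
    if fields[0] != IMAGE_DOS_SIGNATURE:
        raise ValueError("missing MZ signature")
    return fields[-1]


# Synthetic header for demonstration: e_magic = "MZ", e_lfanew = 0x80.
sample = struct.pack("<30HI", IMAGE_DOS_SIGNATURE, *([0] * 29), 0x80)
print(hex(read_e_lfanew(sample)))
```

With a real file you would pass the first 64 bytes read from disk instead of the synthetic sample.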
The NT header
The NT header holds the information needed by the Windows program loader to load the program. It consists of the PE file signature followed by an IMAGE_FILE_HEADER record and an IMAGE_OPTIONAL_HEADER record.
For applications designed to run under Windows (i.e. not OS/2 or VxD files) the four bytes of the PE file signature should equal &H4550, which is the characters "PE" followed by two zero bytes. The defined signatures are:
Public Enum ImageSignatureTypes
IMAGE_DOS_SIGNATURE = &H5A4D
IMAGE_OS2_SIGNATURE = &H454E
IMAGE_OS2_SIGNATURE_LE = &H454C
IMAGE_VXD_SIGNATURE = &H454C
IMAGE_NT_SIGNATURE = &H4550
End Enum
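As a rough illustration of the signature test, here is a short Python sketch (Python is used only for convenience). It assumes the caller has already obtained e_lfanew; the function name has_pe_signature and the sample bytes are hypothetical.

```python
import struct

IMAGE_NT_SIGNATURE = 0x00004550  # the bytes "PE\0\0" read as a little-endian DWORD


def has_pe_signature(data: bytes, e_lfanew: int) -> bool:
    """Check whether the four bytes at e_lfanew are the PE signature."""
    if e_lfanew < 0 or e_lfanew + 4 > len(data):
        return False
    (signature,) = struct.unpack_from("<I", data, e_lfanew)
    return signature == IMAGE_NT_SIGNATURE


# Synthetic data: eight padding bytes followed by the signature.
sample = b"\x00" * 8 + b"PE\x00\x00"
print(has_pe_signature(sample, 8))
```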
Following the PE file signature is the IMAGE_FILE_HEADER structure, which stores information about the target environment of the executable. The structure is:
Private Type IMAGE_FILE_HEADER
Machine As Integer
NumberOfSections As Integer
TimeDateStamp As Long
PointerToSymbolTable As Long
NumberOfSymbols As Long
SizeOfOptionalHeader As Integer
Characteristics As Integer
End Type
The Machine member describes what target CPU the executable was compiled for. It can be one of:
Public Enum ImageMachineTypes
IMAGE_FILE_MACHINE_I386 = &H14C
IMAGE_FILE_MACHINE_R3000 = &H162
IMAGE_FILE_MACHINE_R4000 = &H166
IMAGE_FILE_MACHINE_R10000 = &H168
IMAGE_FILE_MACHINE_WCEMIPSV2 = &H169
IMAGE_FILE_MACHINE_ALPHA = &H184
IMAGE_FILE_MACHINE_POWERPC = &H1F0
IMAGE_FILE_MACHINE_SH3 = &H1A2
IMAGE_FILE_MACHINE_SH3E = &H1A4
IMAGE_FILE_MACHINE_SH4 = &H1A6
IMAGE_FILE_MACHINE_ARM = &H1C0
IMAGE_FILE_MACHINE_IA64 = &H200
End Enum
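To make the layout concrete, the sketch below (in Python, for ease of experimentation) unpacks the 20-byte file header and translates the Machine value. The names FileHeader, parse_file_header and machine_name are assumptions made for this example, and only a few machine types from the list above are included.

```python
import struct
from collections import namedtuple

FileHeader = namedtuple(
    "FileHeader",
    "machine number_of_sections time_date_stamp pointer_to_symbol_table "
    "number_of_symbols size_of_optional_header characteristics",
)
FILE_HEADER = struct.Struct("<HHIIIHH")  # 20 bytes, same field order as the VB type

# A small subset of the machine types listed above.
MACHINE_NAMES = {0x14C: "I386", 0x1C0: "ARM", 0x200: "IA64"}


def parse_file_header(data: bytes, offset: int) -> FileHeader:
    """Unpack IMAGE_FILE_HEADER, which starts 4 bytes after the PE signature."""
    return FileHeader(*FILE_HEADER.unpack_from(data, offset))


def machine_name(machine: int) -> str:
    return MACHINE_NAMES.get(machine, "unknown (0x%X)" % machine)


# Synthetic header: an I386 image with 3 sections and a 224-byte optional header.
sample = FILE_HEADER.pack(0x14C, 3, 0, 0, 0, 224, 0x010F)
header = parse_file_header(sample, 0)
print(machine_name(header.machine), header.number_of_sections)
```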
The SizeOfOptionalHeader member indicates the size (in bytes) of the IMAGE_OPTIONAL_HEADER structure that immediately follows the file header. In an executable image this structure is always present, so the name is a bit of a misnomer. This structure is defined as:
Private Type IMAGE_OPTIONAL_HEADER
Magic As Integer
MajorLinkerVersion As Byte
MinorLinkerVersion As Byte
SizeOfCode As Long
SizeOfInitializedData As Long
SizeOfUninitializedData As Long
AddressOfEntryPoint As Long
BaseOfCode As Long
BaseOfData As Long
End Type
and this in turn is immediately followed by the IMAGE_OPTIONAL_HEADER_NT structure:
Private Type IMAGE_OPTIONAL_HEADER_NT
ImageBase As Long
SectionAlignment As Long
FileAlignment As Long
MajorOperatingSystemVersion As Integer
MinorOperatingSystemVersion As Integer
MajorImageVersion As Integer
MinorImageVersion As Integer
MajorSubsystemVersion As Integer
MinorSubsystemVersion As Integer
Win32VersionValue As Long
SizeOfImage As Long
SizeOfHeaders As Long
CheckSum As Long
Subsystem As Integer
DllCharacteristics As Integer
SizeOfStackReserve As Long
SizeOfStackCommit As Long
SizeOfHeapReserve As Long
SizeOfHeapCommit As Long
LoaderFlags As Long
NumberOfRvaAndSizes As Long
DataDirectory(0 To 15) As IMAGE_DATA_DIRECTORY
End Type
The most useful fields of this structure (for my purposes, anyhow) are the 16 IMAGE_DATA_DIRECTORY entries. These describe where (if at all) particular parts of the executable, such as the export, import and resource tables, are located. The structure is defined thus:
Private Type IMAGE_DATA_DIRECTORY
VirtualAddress As Long
Size As Long
End Type
And the directories are held in order thus:
Public Enum ImageDataDirectoryIndexes
IMAGE_DIRECTORY_ENTRY_EXPORT = 0
IMAGE_DIRECTORY_ENTRY_IMPORT = 1
IMAGE_DIRECTORY_ENTRY_RESOURCE = 2
IMAGE_DIRECTORY_ENTRY_EXCEPTION = 3
IMAGE_DIRECTORY_ENTRY_SECURITY = 4
IMAGE_DIRECTORY_ENTRY_BASERELOC = 5
IMAGE_DIRECTORY_ENTRY_DEBUG = 6
IMAGE_DIRECTORY_ENTRY_ARCHITECTURE = 7
IMAGE_DIRECTORY_ENTRY_GLOBALPTR = 8
IMAGE_DIRECTORY_ENTRY_TLS = 9
IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG = 10
IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT = 11
IMAGE_DIRECTORY_ENTRY_IAT = 12
IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT = 13
End Enum
Note that if an executable does not contain one of the sections (as is often the case) there will still be an IMAGE_DATA_DIRECTORY for it, but the address and size will both be zero.
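The directory lookup can be sketched in Python as follows. This assumes the 32-bit layout shown above, in which NumberOfRvaAndSizes sits 92 bytes into the optional header and the directory array starts at byte 96. The function names and the synthetic header are illustrative only.

```python
import struct

DATA_DIRECTORY = struct.Struct("<II")   # VirtualAddress, Size
NUMBER_OF_RVA_AND_SIZES_OFFSET = 92     # assumes a 32-bit optional header
DATA_DIRECTORIES_OFFSET = 96
IMAGE_DIRECTORY_ENTRY_IMPORT = 1


def read_data_directories(optional_header: bytes):
    """Return a list of (virtual_address, size) pairs, one per directory."""
    (count,) = struct.unpack_from(
        "<I", optional_header, NUMBER_OF_RVA_AND_SIZES_OFFSET)
    count = min(count, 16)  # never read past the 16 defined slots
    return [
        DATA_DIRECTORY.unpack_from(
            optional_header, DATA_DIRECTORIES_OFFSET + i * DATA_DIRECTORY.size)
        for i in range(count)
    ]


def is_present(directory) -> bool:
    """A directory that is absent has both address and size set to zero."""
    address, size = directory
    return address != 0 and size != 0


# Synthetic 224-byte optional header holding only an import directory.
sample = bytearray(224)
struct.pack_into("<I", sample, NUMBER_OF_RVA_AND_SIZES_OFFSET, 16)
struct.pack_into("<II", sample, DATA_DIRECTORIES_OFFSET + 8, 0x2000, 0x50)
directories = read_data_directories(bytes(sample))
print(is_present(directories[IMAGE_DIRECTORY_ENTRY_IMPORT]))
```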
The image data directories
The exports directory
The exports directory holds details of the functions exported by this executable. For example, the exports directory of MSVBVM50.dll lists all the functions that make up the Visual Basic 5 runtime environment.
This directory consists of some info to tell you how many exported functions there are, followed by three arrays which give you the addresses, names and ordinals of the functions respectively. The name and ordinal arrays run in parallel, with one entry per exported name, and each ordinal entry is an index into the address array. The structure is defined thus:
Private Type IMAGE_EXPORT_DIRECTORY
Characteristics As Long
TimeDateStamp As Long
MajorVersion As Integer
MinorVersion As Integer
lpName As Long
Base As Long
NumberOfFunctions As Long
NumberOfNames As Long
lpAddressOfFunctions As Long
lpAddressOfNames As Long
lpAddressOfNameOrdinals As Long
End Type
And you can read this info from the executable thus:
Private Sub ProcessExportTable(ExportDirectory As IMAGE_DATA_DIRECTORY)
    Dim deThis As IMAGE_EXPORT_DIRECTORY
    Dim lBytesWritten As Long
    Dim lpAddress As Long
    Dim nFunction As Long
    Dim nIndex As Long

    If ExportDirectory.VirtualAddress > 0 And ExportDirectory.Size > 0 Then
        ' Read the export directory header from the debugged process
        lpAddress = AbsoluteAddress(ExportDirectory.VirtualAddress)
        Call ReadProcessMemoryLong(DebugProcess.Handle, lpAddress, _
            VarPtr(deThis), Len(deThis), lBytesWritten)
        With deThis
            If .lpName <> 0 Then
                image.Name = StringFromOutOfProcessPointer(DebugProcess.Handle, _
                    image.AbsoluteAddress(.lpName), 32, False)
            End If
            ' The name and name-ordinal arrays each hold NumberOfNames entries
            If .NumberOfNames > 0 Then
                For nFunction = 1 To .NumberOfNames
                    lpAddress = LongFromOutOfprocessPointer(DebugProcess.Handle, _
                        image.AbsoluteAddress(.lpAddressOfNames) + ((nFunction - 1) * 4))
                    fExport.Name = StringFromOutOfProcessPointer(DebugProcess.Handle, _
                        image.AbsoluteAddress(lpAddress), 64, False)
                    ' The name-ordinal entry is an index into the function address array
                    nIndex = IntegerFromOutOfprocessPointer(DebugProcess.Handle, _
                        image.AbsoluteAddress(.lpAddressOfNameOrdinals) + _
                        ((nFunction - 1) * 2)) And &HFFFF&
                    fExport.Ordinal = .Base + nIndex
                    fExport.ProcAddress = LongFromOutOfprocessPointer(DebugProcess.Handle, _
                        image.AbsoluteAddress(.lpAddressOfFunctions) + (nIndex * 4))
                Next nFunction
            End If
        End With
    End If
End Sub
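The relationship between the three export arrays is easier to see on a small test image. The Python sketch below is an illustration, not the article's debugger code. It treats a byte buffer as a loaded image in which every address is an offset from the start of the buffer; list_exports, read_cstring and the synthetic image are assumptions made for the example.

```python
import struct

# Same field order as IMAGE_EXPORT_DIRECTORY above (40 bytes).
EXPORT_DIRECTORY = struct.Struct("<IIHHIIIIIII")


def read_cstring(image: bytes, rva: int) -> str:
    """Read a zero-terminated ASCII string starting at rva."""
    end = image.index(b"\x00", rva)
    return image[rva:end].decode("ascii")


def list_exports(image: bytes, export_rva: int):
    """Return (name, ordinal, function_rva) for every named export."""
    (_chars, _stamp, _major, _minor, _name, base, _n_funcs, n_names,
     funcs_rva, names_rva, ordinals_rva) = EXPORT_DIRECTORY.unpack_from(
        image, export_rva)
    exports = []
    for i in range(n_names):
        (name_rva,) = struct.unpack_from("<I", image, names_rva + i * 4)
        # The name-ordinal entry is an index into the function address array.
        (index,) = struct.unpack_from("<H", image, ordinals_rva + i * 2)
        (func_rva,) = struct.unpack_from("<I", image, funcs_rva + index * 4)
        exports.append((read_cstring(image, name_rva), base + index, func_rva))
    return exports


# Synthetic image: two functions, ordinal base 5, names stored in sorted order.
image = bytearray(0x100)
EXPORT_DIRECTORY.pack_into(image, 0x10, 0, 0, 0, 0, 0, 5, 2, 2, 0x40, 0x50, 0x60)
struct.pack_into("<2I", image, 0x40, 0x1000, 0x2000)   # function addresses
struct.pack_into("<2I", image, 0x50, 0x70, 0x78)       # name pointers
struct.pack_into("<2H", image, 0x60, 1, 0)             # name ordinals
image[0x70:0x76] = b"Alpha\x00"
image[0x78:0x7D] = b"Beta\x00"
exports = list_exports(bytes(image), 0x10)
print(exports)
```

Note that "Alpha" is the first name but maps to the second function, which is why the address array has to be indexed through the ordinal array rather than by the loop counter.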
The imports directory
The imports directory lists the dynamic link libraries that this executable depends on and the functions it imports from each of them. It consists of an array of IMAGE_IMPORT_DESCRIPTOR structures terminated by an instance of this structure in which the lpName member is zero. The structure is defined as:
Private Type IMAGE_IMPORT_DESCRIPTOR
lpImportByName As Long
TimeDateStamp As Long
ForwarderChain As Long
lpName As Long
lpFirstThunk As Long
End Type
And you can walk the import directory thus:
Private Sub ProcessImportTable(ImportDirectory As IMAGE_DATA_DIRECTORY)
    Dim lpAddress As Long
    Dim diThis As IMAGE_IMPORT_DESCRIPTOR
    Dim byteswritten As Long
    Dim sName As String
    Dim lpNextName As Long
    Dim lpNextThunk As Long
    Dim lImportEntryIndex As Long
    Dim nOrdinal As Integer
    Dim lpFuncAddress As Long

    If ImportDirectory.VirtualAddress > 0 And ImportDirectory.Size > 0 Then
        lpAddress = AbsoluteAddress(ImportDirectory.VirtualAddress)
        Call ReadProcessMemoryLong(DebugProcess.Handle, lpAddress, _
            VarPtr(diThis), Len(diThis), byteswritten)
        ' A descriptor whose lpName is zero terminates the array
        While diThis.lpName <> 0
            ' Name of the DLL that this descriptor imports from
            sName = StringFromOutOfProcessPointer(DebugProcess.Handle, _
                image.AbsoluteAddress(diThis.lpName), 32, False)
            If diThis.lpImportByName <> 0 Then
                ' Restart the entry index for each DLL
                lImportEntryIndex = 0
                lpNextName = LongFromOutOfprocessPointer(DebugProcess.Handle, _
                    image.AbsoluteAddress(diThis.lpImportByName))
                lpNextThunk = LongFromOutOfprocessPointer(DebugProcess.Handle, _
                    image.AbsoluteAddress(diThis.lpFirstThunk))
                While (lpNextName <> 0) And (lpNextThunk <> 0)
                    ' In a loaded image the thunk slot already holds the function address
                    lpFuncAddress = lpNextThunk
                    If lpNextName < 0 Then
                        ' High bit set: imported by ordinal, held in the low 16 bits
                        sName = "#" & CStr(lpNextName And &HFFFF&)
                    Else
                        ' lpNextName points to a 2-byte hint followed by the function name
                        nOrdinal = IntegerFromOutOfprocessPointer(DebugProcess.Handle, _
                            image.AbsoluteAddress(lpNextName))
                        sName = StringFromOutOfProcessPointer(DebugProcess.Handle, _
                            image.AbsoluteAddress(lpNextName + 2), 64, False)
                    End If
                    ' Move on to the next entry in both arrays
                    lImportEntryIndex = lImportEntryIndex + 1
                    lpNextName = LongFromOutOfprocessPointer(DebugProcess.Handle, _
                        image.AbsoluteAddress(diThis.lpImportByName + (lImportEntryIndex * 4)))
                    lpNextThunk = LongFromOutOfprocessPointer(DebugProcess.Handle, _
                        image.AbsoluteAddress(diThis.lpFirstThunk + (lImportEntryIndex * 4)))
                Wend
            End If
            ' Read the next import descriptor
            lpAddress = lpAddress + Len(diThis)
            Call ReadProcessMemoryLong(DebugProcess.Handle, lpAddress, _
                VarPtr(diThis), Len(diThis), byteswritten)
        Wend
    End If
End Sub
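For comparison, here is a compact Python sketch of the same walk over a byte buffer that stands in for a loaded 32-bit image, so that every address is an offset from the start of the buffer. It is an illustration under those assumptions rather than the article's implementation. The names list_imports and read_cstring are hypothetical, and ordinal imports are reported as "#n".

```python
import struct

IMPORT_DESCRIPTOR = struct.Struct("<5I")  # same field order as the VB type (20 bytes)
ORDINAL_FLAG32 = 0x80000000               # high bit set: import by ordinal


def read_cstring(image: bytes, rva: int) -> str:
    """Read a zero-terminated ASCII string starting at rva."""
    end = image.index(b"\x00", rva)
    return image[rva:end].decode("ascii")


def list_imports(image: bytes, import_rva: int):
    """Return {dll_name: [function names or '#ordinal']}."""
    result = {}
    offset = import_rva
    while True:
        lookup_rva, _stamp, _chain, name_rva, first_thunk = \
            IMPORT_DESCRIPTOR.unpack_from(image, offset)
        if name_rva == 0:          # terminating descriptor
            break
        thunk_rva = lookup_rva or first_thunk
        entries = []
        while True:
            (thunk,) = struct.unpack_from("<I", image, thunk_rva)
            if thunk == 0:         # end of this DLL's list
                break
            if thunk & ORDINAL_FLAG32:
                entries.append("#%d" % (thunk & 0xFFFF))
            else:
                # Skip the 2-byte hint that precedes the name.
                entries.append(read_cstring(image, thunk + 2))
            thunk_rva += 4
        result[read_cstring(image, name_rva)] = entries
        offset += IMPORT_DESCRIPTOR.size
    return result


# Synthetic image: one DLL with one import by name and one by ordinal.
image = bytearray(0x100)
IMPORT_DESCRIPTOR.pack_into(image, 0x10, 0x40, 0, 0, 0x60, 0x50)
struct.pack_into("<3I", image, 0x40, 0x70, 0x80000007, 0)
image[0x60:0x6D] = b"KERNEL32.dll\x00"
struct.pack_into("<H", image, 0x70, 1)            # hint
image[0x72:0x7E] = b"ExitProcess\x00"
imports = list_imports(bytes(image), 0x10)
print(imports)
```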
The resource directory
The structure of the resource directory is somewhat more involved. It consists of a root directory (defined by the structure IMAGE_RESOURCE_DIRECTORY) immediately followed by a number of resource directory entries (defined by the structure IMAGE_RESOURCE_DIRECTORY_ENTRY). These are defined thus:
Private Type IMAGE_RESOURCE_DIRECTORY
Characteristics As Long
TimeDateStamp As Long
MajorVersion As Integer
MinorVersion As Integer
NumberOfNamedEntries As Integer
NumberOfIdEntries As Integer
End Type
Private Type IMAGE_RESOURCE_DIRECTORY_ENTRY
dwName As Long
dwDataOffset As Long
End Type
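Each resource directory entry can point either to the actual resource data or to another layer of resource directory entries. If the highest bit of dwDataOffset is set, the remaining 31 bits give the offset of another directory. Otherwise the value is the offset of the data entry that describes the resource data. In both cases the offset is measured from the start of the resource directory.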
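The high-bit test can be sketched in Python as below. The sketch assumes the 8-byte directory entry defined in winnt.h (a name or ID followed by the offset) and reads from a buffer that begins at the root resource directory. The name read_entries and the sample data are hypothetical.

```python
import struct

RESOURCE_DIRECTORY = struct.Struct("<IIHHHH")    # 16 bytes
RESOURCE_DIRECTORY_ENTRY = struct.Struct("<II")  # name or ID, then offset (8 bytes)
HIGH_BIT = 0x80000000


def read_entries(section: bytes, directory_offset: int):
    """Return (name_or_id, is_directory, offset) for each entry of a directory."""
    *_, named, ids = RESOURCE_DIRECTORY.unpack_from(section, directory_offset)
    first_entry = directory_offset + RESOURCE_DIRECTORY.size
    entries = []
    for i in range(named + ids):
        name, target = RESOURCE_DIRECTORY_ENTRY.unpack_from(
            section, first_entry + i * RESOURCE_DIRECTORY_ENTRY.size)
        # High bit set: the low 31 bits locate another directory.
        entries.append((name, bool(target & HIGH_BIT), target & 0x7FFFFFFF))
    return entries


# Synthetic root directory with two ID entries: one subdirectory, one data entry.
section = bytearray(0x40)
RESOURCE_DIRECTORY.pack_into(section, 0, 0, 0, 0, 0, 0, 2)
RESOURCE_DIRECTORY_ENTRY.pack_into(section, 16, 3, 0x80000020)
RESOURCE_DIRECTORY_ENTRY.pack_into(section, 24, 16, 0x30)
entries = read_entries(bytes(section), 0)
print(entries)
```

Walking the whole tree is then a matter of calling read_entries again on every entry whose is_directory flag is set.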
How is this information useful?
Once you know how an executable is put together, you can use this information to peer into its workings. You can view the resources compiled into it, the DLLs it depends on and the actual functions it imports from them. More importantly, you can attach a debugger to the executable and track down those really troublesome general protection faults. The next article will describe how to attach a debugger and make use of the PE file format.