PE Format Illustrated – Part 1

Mark Pelf

5.00/5 (11 votes)

10 Mar 2023CPOL9 min read

10.6K

Beginner’s tutorial on PE format, with illustrations

Beginners tutorial on PE format, with illustrations. Planned to be an easy-to-follow overview tutorial with a lot of illustrations, without going into all the details. We tried to focus on the big picture.

1. Illustrated Tutorial

This is planned to be an illustrated tutorial on PE file format. Many tutorials I saw are overwhelmed with details, going over all the different possibilities and branches for PE file format [9], with the end result that they become hard to follow. Our plan is to provide an overview tutorial focused on the big picture with a number of illustrations, and the reader will later find more details elsewhere if he is interested in the topic.

1.1. Tools Used – Hex Editor ImHex

I am using Hex Editor ImHex (freeware) which enables me to color sections of files ([1]).

1.2. Tools Used – PE Viewer “PE-bear”

PE-bear (freeware) is a very useful tool to analyze PE files visually. Based on the documentation, it does not cover all variants and flavors of PE files, but for simpler ones, it is a great viewer/parser/analyzer. I think it is always easier to study topics using visual tools. ([2])

2. Executable File Formats

Before we start our exploration of PE format, let us mention that PE format belongs to a family of “executable file formats” which are nicely listed at [3], around 40 of them, for different Operating Systems. Let us mention several of the most popular ones:

MZ – For DOS and Windows, extension .exe
COFF - For UNIX/Linux-like systems, no extension
ELF - For UNIX/Linux-like systems, no extension or extension .elf
Mach-O – For macOS and iOS, no extension
PE - For Windows, extensions .exe, .dll, .sys, etc.
PE32+ - For 64-bit Windows, extension .exe

Please see [4] for more details on PE and PE32+ formats.

3. History of PE File Format

PE stands for ‘Portable Executable” and the format is invented in the 1980s. The dominant format then was MZ MS-DOS format, which has a special marker at the beginning of file to identify itself, letters “MZ” which by the way are initials of Mark Zbikowski, one of the MS-DOS developers. PE format was about to target Window platform, and they preserved backward compatibility with MZ format (.exe files) and enabled PE format (.exe files) if accidentally run on MS-DOS to report “This program cannot be run in DOS mode”, which was an important issue for Microsoft at the time. Therefore, you will see still that PE format contains MS-DOS style header, meaning it starts with the magic letters “MZ” and also a DOS-Stub that prints that message. That part is unnecessary today but has become part of the standard.
PE format originates from Unix COFF format.
Today, PE format is extended to host .NET code.

4. Technical Details

4.1. Problems PE was Designed to Solve

Designed to support different programs on different hardware
Idea was to separate the program from the processor
Address the move to 64-bit processors

4.2. Loading PE File into Memory

The physical layout of how bytes of PE file are arranged on the disk is not how they are loaded into memory
PE file contains several sections, and to avoid wasting space, they are aligned one after the other on the disk
PE file contains several sections, containing data and programs, and each section is loaded into a separate segment of memory
Reason for that, among others, that each section needs to be aligned to a page boundary
Then, each section can be assigned different memory protection, typically program sections would get execute/read-only protection, and data sections would get no-execute/read-write protection.
For that reason, most addresses/offsets in the file are specified using Relative Virtual Address (RVA). This specifies the offset from the start of each section

Here is a picture that illustrates how different section look aligned in “raw alignment” on the disk, and how they are being loaded into memory (‘virtual alignment”) into different virtual addresses resulting in new address schema.

4.3. Converting from “Raw Address” to “Virtual Address” and back

Tasks that will frequently appear are conversions from “Raw Address” (how bytes are aligned in the file) to “Virtual Address” (how bytes are aligned in the memory) and back, using Relative Virtual Address (RVA).
For example, in the above picture, you can see that .text section starts at 0x200 (raw) and has end at 0xF32 (raw). That means, file end has offset of 0xF32(raw)-0x200(raw)=0xD32 from beginning of the section. When that section .text is mapped to memory address 0x2000(virtual), then end of section has address 0x2000+0xD32=0x2D32(RVA). We say that address of the end of section is 0x2D32(RVA).

5. Example Program

We will for our demo use a simple C#11/.NET-7, “Hello World” program, where we created resource file for the string “Hello World!”. As you will see, .NET assemblies are packaged into PE file format. We complied it as C#11/.NET7.

6. PE Format Definition

The precise definition of PE format can be found in [5]. For the purpose of this tutorial, and to follow the outline of PE-bear tool, we will define it here as:

A typical PE file consists of the following parts:

DOS Header (aka “MZ Header”)
DOS Stub
NT Header, (aka “PE File Header”)
which itself consists of:
1. PE signature
2. File Header (aka “COFF Header”, “Image File Header”)
3. Optional Header (aka “Image Optional Header”)
  which itself consists of:
  1. General part
  2. Data Dictionary
Sections Headers (aka “Section Table”)
Multiple sections (aka “Sections”)
1. Section 1
2. Section 2
3. …
4. Section n

Here is a look at the headers in Hex editor, focus on headers:

Here is a look at the whole file, just to get an idea of how headers are a small part (in quantity) of the file.

Here is how the PE-bear tool outlines it in its interface:

7. DOS Header (aka “MZ Header”)

Here is DOS Header in the Hex editor:

Here is an analysis of DOS Header by tool PE-bear:

Interpretation:

Note that the header starts with “Magic number” 0x5A4D” which stands for “MZ” (which by the way are initials of Mark Zbikowski, one of the MS-DOS developers) which acts as a file format identifier
Note at offset 0x3C, there is address of new exe header, which is 0x80, pointing to “NT headers”

8. DOS Stub

Here is DOS Stub in the Hex editor:

Interpretation:

This is a small piece of code that is DOS compatible that just prints an error message saying “This program cannot be run in DOS mode”, in case the program is run under DOS.

9. NT Headers, (aka “PE File Header”)

Here are NT Headers in Hex editor:

9.1. NT Headers - Signature

Interpretation:

That is just 4 bytes starting with “PE” indicating that this is PE format.

9.2. NT Headers - File Header (aka “COFF Header”, “Image File Header”)

Here is an analysis of the File Header by tool PE-bear:

Interpretation:

Note at offset 0x88 it says this file will contain 3 sections. For a fixed section-header size, OS can calculate the size of Section -Headers and how many entries in Section-Headers to look for.
Note at offset 0x94, it says the size of the Optional Header.

9.3. NT Headers - Optional Header (aka “Image Optional Header”)

Here is an analysis of the Optional Header by tool PE-bear:

Interpretation:

This header contains some additional information beyond the basic one contained in the basic File Header
Look at file offset 0x98, so called Magic. It actually says which file type is it. 0x10B stands for PE32 format.
It is interesting to look at file offset 0xA8, for Entry Point. It says address 0x2D32(RVA), which we need to convert to raw address, that it is 0x2D32-0x2000+0x200=0xF32(raw file). That is in the area of section .text. The address of the entry point is the address where the PE loader will begin execution. For the program image, this is the starting address.

9.3.1. NT Headers - Optional Header – Data Dictionary

Interpretation:

If you look at file offset 0x100, for Import Directory, it says at address 0x2CDD(RVA) is import directory, with size of 0x4F bytes. As we will later, that is section .text. We need to convert the RVA address to raw offset, that is 0x2CDD-0x2000+0x200=0xEDD(raw file). That is in the area called section .text. It is a bit complicated what is happening here, there is a table acting as a directory for other entries, see [5] for details. But the point is at the end, you get your import dependency, which is in this case “mscoree.dll, _CorExeMain”. You can see it in the following picture:
And PE-bear tool provides a nice interpretation of this:
If you look at file offset 0x128, for Debug Directory, it says at address 0x2BD4(RVA). It is resolved similarly to the above case, and again that info is saved in section .text. Again, 0x2BD4-0x2000+0x200=0xDD4(raw file).
Here it is in Hex Editor:

And PE-bear tool provides a nice interpretation of this:
If you look at file offset 0x168, for .NET Header, it says at address 0x2008(RVA). We will not analyze this in this tutorial but plan some other tutorial for that.
If you look at file offset 0x108, for Resource Directory it says at address 0x4000(RVA). That is the area of section .rsrc. Now we calculate 0x4000-0x4000+0x1000=0x1000(raw file). We will examine that below, in sections chapter.
If you look at file offset 0x120, for Base Relocation Table it says at address 0x6000(RVA). That is the area of section .reloc. Now we calculate 0x6000-0x6000+0x1600=0x1600(raw file). We will examine that below, in sections chapter.

10. Section Headers (aka “Section Table”)

Here are the Section Headers in the Hex editor:

Here is an analysis of the Section Header by tool PE-bear:

11. Multiple Sections (aka “Sections”)

11.1. Section Names

Sections can have any 8 character name starting with “.”. But usual conventions are:

.text – contains executable code and data
.idata, .rdata – contains Import API
.data, .bss– contains data
.pdata – contains exception info
.reloc – contains relocation info
.rsrc – contains resources
.debug – contains debug information

11.2. Section .text

Here is Section .text in Hex editor:

Interpretation:

In our case, since this is .NET assembly, this section contains:
1. Metadata
2. Managed resources
3. IL code

11.3. Section .rsrc

Here are Section .rsrc in Hex editor:

Here is an analysis of Section .rsrc by tool PE-bear:

Interpretation:

You can see both from Hex Editor and PE-bear analysis (seems like unfinished app here?) that this section contains info about the version and the Application manifest.
These are UNMANAGED resources. Managed .NET resources are inside the section .text .

11.4. Section .reloc

Here is Section .reloc in Hex editor:

Here is an analysis of Section .reloc by tool PE-bear:

12. Conclusion

We will finish here in order to make this tutorial of manageable size. We gave a basic description of the PE format, sufficient for a good technical overview and the interested reader can find more details elsewhere. We didn’t dive into too many details in this article.

More details about PE File Format can be found at [6], [7], [8]. A very interesting slide illustrating different options of PE format can be found at [9].

13. References

14. History

10^th March, 2023: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)