Creating Your Very Own x64 PE Packer/Protector from Scratch using C++

The Ænema


13 Nov 2021 | MIT | 24 min read

This article brings you a very detailed but easy to learn experience on creating your own x64 PE packer/protector using only VC++!

In this article I will teach you how to create your very own PE packer/protector from scratch using only Visual Studio and C/C++, with no assembly knowledge required. We start with the basics and move on to more advanced areas toward the end of the article. It is a good fit for people who want to gain a deeper understanding of computer science. If you're ready, grab a cup of tea and take a fascinating journey with me!

Introduction

Remember the time people had fun using PE detectors like PEiD, Exeinfo PE, DIE, RDG, ... to detect which packer/protector the developer used?

Once upon a time, PE packers/protectors were very popular and people used them to reduce their binary size efficiently and add some protection layers to their code.

With the advancement of reverse engineering tools and techniques, protectors became fragile and easy to defeat, but the war between good and evil continued...

However, PE packers can still be very useful, both for security and for reducing code size, but making a packer is a difficult and complex task. It requires precise low-level programming knowledge, which is why very few people complete one successfully.

This article will teach you how to create your very own PE packer using only VC++, and the good news is there's no need for assembly knowledge!

Background

You may ask why you need a custom packer of your own when there are hundreds of them out there. To answer this question, you need to know how PE packers work.

A PE packer/protector takes a PE file, analyzes it, and extracts all the information from it. Then it modifies the PE file and recreates it using its own structure. It may compress all of your sections into a new one and add its decompression code as the entrypoint. When the PE is launched, this code decompresses the data into memory, restores the original entrypoint, and calls it. It may also encrypt the code, so that the original raw code is only accessible and readable at runtime.

The structure a given packer produces is always the same, so once everyone has access to the packer it becomes an easy target. People pack different PE files and search the output for a signature, and that signature becomes a mark that can be used to create an unpacker/unprotector for the packer.

For example, if you get an exe file that is packed with ASPack, you can easily unpack it using an OllyDbg script or a downloadable unpacker tool like ASPackDie with just one click!

So the points of creating a custom packer are :

Only you have the packer and it's used only for your product, which makes analysis harder because it's unique
The packer stays in your hands, so an attacker cannot download it from a public site to analyze its functionality
You control how the program is restored and launched, the compression/encryption algorithms, and so on
You can add extra anti reverse engineering techniques and whatever else you want!
You can quickly change the signature and structure when the current version is attacked
You can hide useful information that the attacker may use for their analysis

Also, in this tutorial we're not going to develop a regular kind of PE packer. Instead of manipulating the existing exe/dll file, we create a new one based on the input PE file, just like a linker does.

NOTE : This article is the second part of a previous article on how to build shellcodes using Visual Studio.

Preparing Development Environment

1. Required Tools & Software

Visual Studio 2019
VC++ Build Tools ( C++ 17+ Support )
CFF Explorer ( PE Viewer/Editor )
HxD (Hex Editor)

2. Creating Empty Projects

Open Visual Studio 2019
Create two empty C++ projects.
Name one pe_packer and the other one unpacker_stub
Set pe_packer Configuration Type to "Application (.exe)"
Set unpacker_stub Configuration Type to "Application (.exe)"
Set up unpacker_stub so that it is independent of the CRT (C Runtime) and the Windows kernel libraries. If you don't know how, read the previous article. Also, in this article unpacker_stub is an exe, so you need to remove the /NOENTRY option.
Set both projects to x64 and Release mode.

Add two .cpp files to the projects, one for the packer and one for the unpacker, with the following code setups :

C++

// packer.cpp (pe_packer project)
#include <Windows.h>
#include <iostream>
#include <fstream>

using namespace std;

int main(int argc, char* argv[])
{
    if (argc != 3) return EXIT_FAILURE;

    char* input_pe_file        = argv[1];
    char* output_pe_file    = argv[2];

    return EXIT_SUCCESS;
}

C++

// unpacker.cpp (unpacker_stub project)
#include <Windows.h>

// Entrypoint
void func_unpack()
{
    
}

Alright, now we're all set and ready to start developing!

NOTE : To speed up packer testing, you can create a pe_packer_tester.bat file with the following content :

"%cd%\pe_packer.exe" "%cd%\input_pe.exe" "%cd%\output_pe.exe"

You can download the basic setup source here.

Download 01_pe_packer_tutorial_starter_kit_vs16_x64.zip

Packer : Parsing + Validating Input PE

OK, for now we have one input path (input_pe_file) and one output path (output_pe_file) passed by the user to our packer. The first step is to validate the input file: make sure it's a valid PE file and that it meets the requirements of our packer.

To perform validation we need to parse the PE file :

C++

// Reading Input PE File
ifstream input_pe_file_reader(argv[1], ios::binary);
vector<uint8_t> input_pe_file_buffer(istreambuf_iterator<char>(input_pe_file_reader), {});

// Parsing Input PE File
PIMAGE_DOS_HEADER in_pe_dos_header = (PIMAGE_DOS_HEADER)input_pe_file_buffer.data();
PIMAGE_NT_HEADERS in_pe_nt_header =  (PIMAGE_NT_HEADERS)(input_pe_file_buffer.data() + in_pe_dos_header->e_lfanew);

Then we validate properties like this :

C++

bool isPE  = in_pe_dos_header->e_magic == IMAGE_DOS_SIGNATURE;
bool is64  = in_pe_nt_header->FileHeader.Machine == IMAGE_FILE_MACHINE_AMD64 &&
    in_pe_nt_header->OptionalHeader.Magic == IMAGE_NT_OPTIONAL_HDR64_MAGIC;
bool isDLL = in_pe_nt_header->FileHeader.Characteristics & IMAGE_FILE_DLL;
bool isNET = in_pe_nt_header->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR].Size != 0;
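
The checks above read the header fields straight out of the buffer, so a truncated or non-PE input could make them read past the end of it. As a minimal sketch of the same idea with bounds checking, the code below reads each field by its PE file offset, so it builds without Windows.h. The names read_field and looks_like_pe64 are hypothetical and not part of the packer source, and the sketch assumes a little-endian host (which x64 is).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copy one field out of the buffer, or fail when the
// field would extend past the end of the buffer.
template <typename T>
bool read_field(const std::vector<uint8_t>& buf, std::size_t offset, T& out)
{
    if (offset > buf.size() || buf.size() - offset < sizeof(T)) return false;
    std::memcpy(&out, buf.data() + offset, sizeof(T));
    return true;
}

// Returns true only when the buffer holds a DOS header, an NT signature
// and an x64 file header / PE32+ optional header pair.
bool looks_like_pe64(const std::vector<uint8_t>& buf)
{
    uint16_t e_magic = 0;
    int32_t  e_lfanew = 0;
    if (!read_field(buf, 0x00, e_magic) || e_magic != 0x5A4D) return false;   // "MZ"
    if (!read_field(buf, 0x3C, e_lfanew) || e_lfanew < 0) return false;       // offset of NT header

    const std::size_t nt = static_cast<std::size_t>(e_lfanew);
    uint32_t signature = 0;
    uint16_t machine = 0, optional_magic = 0;
    if (!read_field(buf, nt, signature) || signature != 0x00004550) return false;            // "PE\0\0"
    if (!read_field(buf, nt + 4, machine) || machine != 0x8664) return false;                // AMD64
    if (!read_field(buf, nt + 24, optional_magic) || optional_magic != 0x020B) return false; // PE32+
    return true;
}
```

Calling looks_like_pe64 on the file buffer before casting it to the header structs keeps every later field access inside the buffer's header area.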

After adding the checks and their actions, the packer code should look like this :

C++

// packer.cpp
#include <Windows.h>
#include <iostream>
#include <fstream>
#include <vector>

using namespace std;

// Macros
#define BOOL_STR(b) ((b) ? "true" : "false")
#define CONSOLE_COLOR_DEFAULT   SetConsoleTextAttribute(hConsole, 0x09)
#define CONSOLE_COLOR_ERROR     SetConsoleTextAttribute(hConsole, 0x0C)

int main(int argc, char* argv[])
{
    // Setup Console 
    HANDLE  hConsole = GetStdHandle(STD_OUTPUT_HANDLE);
    SetConsoleTitleA("Custom x64 PE Packer by H.M v1.0"); // ANSI variant, builds with or without UNICODE
    FlushConsoleInputBuffer(hConsole);
    CONSOLE_COLOR_DEFAULT;

    // Validate Arguments Count
    if (argc != 3) return EXIT_FAILURE;

    // User Inputs
    char* input_pe_file     = argv[1];
    char* output_pe_file    = argv[2];

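    // Reading Input PE File
    ifstream input_pe_file_reader(argv[1], ios::binary);
    vector<uint8_t> input_pe_file_buffer(istreambuf_iterator<char>(input_pe_file_reader), {});
    
    // Parsing Input PE File (first check that both headers fit inside the buffer)
    PIMAGE_DOS_HEADER in_pe_dos_header = (PIMAGE_DOS_HEADER)input_pe_file_buffer.data();
    if (input_pe_file_buffer.size() < sizeof(IMAGE_DOS_HEADER) ||
        in_pe_dos_header->e_lfanew < 0 ||
        (size_t)in_pe_dos_header->e_lfanew + sizeof(IMAGE_NT_HEADERS) > input_pe_file_buffer.size())
    {
        CONSOLE_COLOR_ERROR;
        printf("[Error] Input file is missing, truncated or not a PE file.\n");
        return EXIT_FAILURE;
    }
    PIMAGE_NT_HEADERS in_pe_nt_header =  (PIMAGE_NT_HEADERS)(input_pe_file_buffer.data() + in_pe_dos_header->e_lfanew);
    
    // Validate PE Information
    bool isPE  = in_pe_dos_header->e_magic == IMAGE_DOS_SIGNATURE &&
                 in_pe_nt_header->Signature == IMAGE_NT_SIGNATURE;
    bool is64  = in_pe_nt_header->FileHeader.Machine == IMAGE_FILE_MACHINE_AMD64 &&
                 in_pe_nt_header->OptionalHeader.Magic == IMAGE_NT_OPTIONAL_HDR64_MAGIC;
    bool isDLL = in_pe_nt_header->FileHeader.Characteristics & IMAGE_FILE_DLL;
    bool isNET = in_pe_nt_header->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR].Size != 0;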

    // Log Validation Data
    printf("[Validation] Is PE File : %s\n", BOOL_STR(isPE));
    printf("[Validation] Is 64bit : %s\n", BOOL_STR(is64));
    printf("[Validation] Is DLL : %s\n", BOOL_STR(isDLL));
    printf("[Validation] Is COM or .Net : %s\n", BOOL_STR(isNET));

    // Validate and Apply Action
    if (!isPE)
    {
        CONSOLE_COLOR_ERROR;
        printf("[Error] Input PE file is invalid. (Signature Mismatch)\n");
        return EXIT_FAILURE;
    }
    if (!is64)
    {
        CONSOLE_COLOR_ERROR;
        printf("[Error] This packer only supports x64 PE files.\n");
        return EXIT_FAILURE;
    }
    if (isNET) 
    {
        CONSOLE_COLOR_ERROR;
        printf("[Error] This packer currently doesn't support .NET/COM assemblies.\n");
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}

Packer : Developing PE Generator

Alright, now that we know our input PE file is valid, it's time to create a PE generator that produces a valid empty PE file. For this operation we use the PE structures declared in the Windows SDK headers.

Creating DOS Header

Each PE file begins with a DOS header, which contains the magic number (file signature) and basic information about the file, such as the offset of the main (NT) header or the address of the DOS relocation table. To create the DOS header we initialize an IMAGE_DOS_HEADER struct and set the values :

C++

// Initializing Dos Header
IMAGE_DOS_HEADER    dos_h;
memset(&dos_h, NULL, sizeof IMAGE_DOS_HEADER);
dos_h.e_magic       = IMAGE_DOS_SIGNATURE;
dos_h.e_cblp        = 0x0090;
dos_h.e_cp          = 0x0003;
dos_h.e_crlc        = 0x0000;
dos_h.e_cparhdr     = 0x0004;
dos_h.e_minalloc    = 0x0000;
dos_h.e_maxalloc    = 0xFFFF;
dos_h.e_ss          = 0x0000;
dos_h.e_sp          = 0x00B8;
dos_h.e_csum        = 0x0000; // Checksum
dos_h.e_ip          = 0x0000;
dos_h.e_cs          = 0x0000;
dos_h.e_lfarlc      = 0x0040;
dos_h.e_ovno        = 0x0000;
dos_h.e_oemid       = 0x0000;
dos_h.e_oeminfo     = 0x0000;
dos_h.e_lfanew      = 0x0040; // Address of the NT Header

Creating NT Header

After the DOS header, the next header must be the NT header, which contains all the important information about the PE file. An NT header contains :

Signature
File Header
Optional Header

All 3 parts are included in a single struct, IMAGE_NT_HEADERS. To create it we simply initialize it and set the following values :

C++

// Initializing Nt Header
IMAGE_NT_HEADERS    nt_h;
memset(&nt_h, NULL, sizeof IMAGE_NT_HEADERS);
nt_h.Signature                                          = IMAGE_NT_SIGNATURE;
nt_h.FileHeader.Machine                                 = IMAGE_FILE_MACHINE_AMD64;
nt_h.FileHeader.NumberOfSections                        = 2;
nt_h.FileHeader.TimeDateStamp                           = 0x00000000; // Must Update
nt_h.FileHeader.PointerToSymbolTable                    = 0x0;
nt_h.FileHeader.NumberOfSymbols                         = 0x0;
nt_h.FileHeader.SizeOfOptionalHeader                    = 0x00F0;
nt_h.FileHeader.Characteristics                         = 0x0022;     // Must Update 
nt_h.OptionalHeader.Magic                               = IMAGE_NT_OPTIONAL_HDR64_MAGIC;
nt_h.OptionalHeader.MajorLinkerVersion                  = 10;
nt_h.OptionalHeader.MinorLinkerVersion                  = 0x05;
nt_h.OptionalHeader.SizeOfCode                          = 0x00000200; // Must Update
nt_h.OptionalHeader.SizeOfInitializedData               = 0x00000200; // Must Update
nt_h.OptionalHeader.SizeOfUninitializedData             = 0x0;
nt_h.OptionalHeader.AddressOfEntryPoint                 = 0x00001000; // Must Update
nt_h.OptionalHeader.BaseOfCode                          = 0x00001000;
nt_h.OptionalHeader.ImageBase                           = 0x0000000140000000;
nt_h.OptionalHeader.SectionAlignment                    = 0x00001000;
nt_h.OptionalHeader.FileAlignment                       = 0x00000200;
nt_h.OptionalHeader.MajorOperatingSystemVersion         = 0x0;
nt_h.OptionalHeader.MinorOperatingSystemVersion         = 0x0;
nt_h.OptionalHeader.MajorImageVersion                   = 0x0006;
nt_h.OptionalHeader.MinorImageVersion                   = 0x0000;
nt_h.OptionalHeader.MajorSubsystemVersion               = 0x0006;
nt_h.OptionalHeader.MinorSubsystemVersion               = 0x0000;
nt_h.OptionalHeader.Win32VersionValue                   = 0x0;
nt_h.OptionalHeader.SizeOfImage                         = 0x00003000; // Must Update
nt_h.OptionalHeader.SizeOfHeaders                       = 0x00000200;
nt_h.OptionalHeader.CheckSum                            = 0xFFFFFFFF; // Must Update
nt_h.OptionalHeader.Subsystem                           = IMAGE_SUBSYSTEM_WINDOWS_CUI;
nt_h.OptionalHeader.DllCharacteristics                  = 0x0120;
nt_h.OptionalHeader.SizeOfStackReserve                  = 0x0000000000100000;
nt_h.OptionalHeader.SizeOfStackCommit                   = 0x0000000000001000;
nt_h.OptionalHeader.SizeOfHeapReserve                   = 0x0000000000100000;
nt_h.OptionalHeader.SizeOfHeapCommit                    = 0x0000000000001000;
nt_h.OptionalHeader.LoaderFlags                         = 0x00000000;
nt_h.OptionalHeader.NumberOfRvaAndSizes                 = 0x00000010;

NOTE : IMAGE_NT_HEADERS depends on the CPU architecture you set for the project.

In this article it resolves to IMAGE_NT_HEADERS64.
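
The hard-coded SizeOfOptionalHeader (0x00F0) and SizeOfHeaders (0x200) values are not magic numbers, they follow from the sizes of the header structs. As a rough sketch of that arithmetic, the constants below restate the documented struct sizes for x64 so the code builds without Windows.h. The names align_up and size_of_headers are hypothetical helpers for illustration only.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kDosHeaderSize     = 64;   // sizeof(IMAGE_DOS_HEADER), also our e_lfanew (0x40)
constexpr uint32_t kNtSignatureSize   = 4;    // "PE\0\0"
constexpr uint32_t kFileHeaderSize    = 20;   // sizeof(IMAGE_FILE_HEADER)
constexpr uint32_t kOptionalHeader64  = 240;  // sizeof(IMAGE_OPTIONAL_HEADER64) = 0xF0
constexpr uint32_t kSectionHeaderSize = 40;   // sizeof(IMAGE_SECTION_HEADER)

// Round value up to the next multiple of alignment
constexpr uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1) / alignment * alignment;
}

// Size of everything that precedes the first section, rounded to FileAlignment
constexpr uint32_t size_of_headers(uint32_t e_lfanew, uint32_t sections, uint32_t file_alignment)
{
    return align_up(e_lfanew + kNtSignatureSize + kFileHeaderSize +
                    kOptionalHeader64 + sections * kSectionHeaderSize,
                    file_alignment);
}

// 64 + 4 + 20 + 240 + 2 * 40 = 408 (0x198) bytes, which rounds up to 0x200
static_assert(size_of_headers(kDosHeaderSize, 2, 0x200) == 0x200,
              "two sections fit in one file alignment block");
```

With this layout the headers still fit in one 0x200 block with three sections, but five sections would already push SizeOfHeaders to 0x400.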

Creating Sections

Now we have a DOS header and an NT header, and the only thing left is the sections! Sections hold all the data in the PE file and they have their own headers too, so we need to initialize a header for each of them and then write the data at the offsets those headers point to. To create the headers we use the IMAGE_SECTION_HEADER struct :

C++

// Initializing Section [ Code ]
IMAGE_SECTION_HEADER    c_sec;
memset(&c_sec, NULL, sizeof IMAGE_SECTION_HEADER);
c_sec.Name[0] = '[';
c_sec.Name[1] = ' ';
c_sec.Name[2] = 'H';
c_sec.Name[3] = '.';
c_sec.Name[4] = 'M';
c_sec.Name[5] = ' ';
c_sec.Name[6] = ']';
c_sec.Name[7] = 0x0;
c_sec.Misc.VirtualSize                  = 0x00001000;   // Virtual Size
c_sec.VirtualAddress                    = 0x00001000;   // Virtual Address
c_sec.SizeOfRawData                     = 0x00000600;   // Raw Size
c_sec.PointerToRawData                  = 0x00000200;   // Raw Address
c_sec.PointerToRelocations              = 0x00000000;   // Reloc Address
c_sec.PointerToLinenumbers              = 0x00000000;   // Line Numbers
c_sec.NumberOfRelocations               = 0x00000000;   // Reloc Numbers
c_sec.NumberOfLinenumbers               = 0x00000000;   // Line Numbers Number
c_sec.Characteristics                   = IMAGE_SCN_MEM_EXECUTE   | 
    IMAGE_SCN_MEM_READ    |
    IMAGE_SCN_CNT_CODE    ;

// Initializing Section [ Data ]
IMAGE_SECTION_HEADER    d_sec;
memset(&d_sec, NULL, sizeof IMAGE_SECTION_HEADER);
d_sec.Name[0] = '[';
d_sec.Name[1] = ' ';
d_sec.Name[2] = 'H';
d_sec.Name[3] = '.';
d_sec.Name[4] = 'M';
d_sec.Name[5] = ' ';
d_sec.Name[6] = ']';
d_sec.Name[7] = 0x0;
d_sec.Misc.VirtualSize                  = 0x00000200;   // Virtual Size
d_sec.VirtualAddress                    = 0x00002000;   // Virtual Address
d_sec.SizeOfRawData                     = 0x00000200;   // Raw Size
d_sec.PointerToRawData                  = 0x00000800;   // Raw Address
d_sec.PointerToRelocations              = 0x00000000;   // Reloc Address
d_sec.PointerToLinenumbers              = 0x00000000;   // Line Numbers
d_sec.NumberOfRelocations               = 0x00000000;   // Reloc Numbers
d_sec.NumberOfLinenumbers               = 0x00000000;   // Line Numbers Number
d_sec.Characteristics                   = IMAGE_SCN_CNT_INITIALIZED_DATA |
    IMAGE_SCN_MEM_READ;

Creating PE File

Great! Now we are all set and ready to write the PE file to disk. To do this, use the following code :

C++

// Create/Open PE File
fstream pe_writter;
pe_writter.open(output_pe_file, ios::binary | ios::out);

// Write DOS Header
pe_writter.write((char*)&dos_h, sizeof dos_h);

// Write NT Header
pe_writter.write((char*)&nt_h, sizeof nt_h);

// Write Headers of Sections
pe_writter.write((char*)&c_sec, sizeof c_sec);
pe_writter.write((char*)&d_sec, sizeof d_sec);

// Add Padding
while (pe_writter.tellp() != c_sec.PointerToRawData) pe_writter.put(0x0);

// Write Code Section
pe_writter.put(0xC3); // Empty PE Return Opcode
for (size_t i = 0; i < c_sec.SizeOfRawData - 1; i++) pe_writter.put(0x0);

// Write Data Section
for (size_t i = 0; i < d_sec.SizeOfRawData; i++) pe_writter.put(0x0);

// Close PE File
pe_writter.close();
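
One thing to keep in mind about the padding loop above: it compares the stream position with != and so it would never stop if the headers ever grew past PointerToRawData (for example after adding more sections). A minimal sketch of a stricter padding step is shown below. The pad_to_offset name is hypothetical, and the sketch works on any std::ostream so it can be tried with a string stream.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: emit zero bytes until the stream reaches `target`.
// Returns false when the stream is already past the target or has failed,
// instead of looping forever.
bool pad_to_offset(std::ostream& out, std::streamoff target)
{
    const std::streamoff pos = static_cast<std::streamoff>(out.tellp());
    if (pos < 0 || pos > target) return false;

    const std::string zeros(static_cast<std::size_t>(target - pos), '\0');
    out.write(zeros.data(), static_cast<std::streamsize>(zeros.size()));
    return out.good();
}
```

In the packer this would be called with pe_writter and c_sec.PointerToRawData, and a false result would mean the header area no longer fits in front of the first section.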

Now run your packer and see the magic!

Download 02_pe_packer_tutorial_packer_chapter1_vs16_x64.zip

Packer : Main Implementation

OK, now we have our PE parser and PE generator. It's time to develop the packer itself. We use fast-lzma2 for compression and AES-256 for encryption, then we write the data to the PE file.

NOTE : I chose fast-lzma2 for compression because it's fast and produces a very high compression ratio.

You can use zlib or any compression library you want.

Adding Required Libraries

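Clone the fast-lzma2 repo and add it to your project using static linking.
Clone the tiny-aes-c repo and add it to your project. The code below uses a 32-byte key, so the library has to be built for AES-256 (in tiny-aes-c this is selected by a define in aes.h).

You can also use the tiny-aes-c shellcodes that we generated in the previous part of the article.

Add the library headers and libs like this :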

C++

// Encryption Library
extern "C"
{
    #include "aes.h"
}

// Compression Library
#include "lzma2\fast-lzma2.h"
#pragma comment(lib, "lzma2\\fast-lzma2.lib")

Compressing/Encrypting Data

And finally we compress and encrypt the entire input PE file like this :

C++

// <----- Packing Data ( Main Implementation ) ----->
printf("[Information] Initializing AES Cryptor...\n");
struct AES_ctx ctx;
const unsigned char key[32] = {
    0xD6, 0x23, 0xB8, 0xEF, 0x62, 0x26, 0xCE, 0xC3, 0xE2, 0x4C, 0x55, 0x12,
    0x7D, 0xE8, 0x73, 0xE7, 0x83, 0x9C, 0x77, 0x6B, 0xB1, 0xA9, 0x3B, 0x57,
    0xB2, 0x5F, 0xDB, 0xEA, 0x0D, 0xB6, 0x8E, 0xA2
};
const unsigned char iv[16] = {
    0x18, 0x42, 0x31, 0x2D, 0xFC, 0xEF, 0xDA, 0xB6, 0xB9, 0x49, 0xF1, 0x0D,
    0x03, 0x7E, 0x7E, 0xBD
};
AES_init_ctx_iv(&ctx, key, iv);

printf("[Information] Initializing Compressor...\n");
FL2_CCtx* cctx = FL2_createCCtxMt(8);
FL2_CCtx_setParameter(cctx, FL2_p_compressionLevel, 9);
FL2_CCtx_setParameter(cctx, FL2_p_dictionarySize, 1024);

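size_t original_size = input_pe_file_buffer.size();

// Incompressible input can grow, so size the output buffer for the worst case
vector<uint8_t> data_buffer;
data_buffer.resize(FL2_compressBound(original_size));

printf("[Information] Compressing Buffer...\n");
size_t compressed_size = FL2_compressCCtx(cctx, data_buffer.data(), data_buffer.size(),
                                          input_pe_file_buffer.data(), original_size, 9);
FL2_freeCCtx(cctx);
if (FL2_isError(compressed_size))
{
    CONSOLE_COLOR_ERROR;
    printf("[Error] Compression failed.\n");
    return EXIT_FAILURE;
}
data_buffer.resize(compressed_size);

// Add Padding Before Encryption (16 bytes on each side)
data_buffer.insert(data_buffer.begin(), (size_t)16, (uint8_t)0x0);
data_buffer.insert(data_buffer.end(), (size_t)16, (uint8_t)0x0);

// tiny-aes-c CBC works on whole 16-byte blocks, so round the size up
while (data_buffer.size() % 16 != 0) data_buffer.push_back(0x0);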

printf("[Information] Encrypting Buffer...\n");
AES_CBC_encrypt_buffer(&ctx, data_buffer.data(), data_buffer.size());

// Log Compression Information
printf("[Information] Original PE Size :  %zu bytes\n", input_pe_file_buffer.size());
printf("[Information] Packed PE Size   :  %zu bytes\n", data_buffer.size());

// Calculate Compression Ratio
float ratio = 
    (1.0f - ((float)data_buffer.size() / (float)input_pe_file_buffer.size())) * 100.f;
printf("[Information] Compression Ratio : %.2f%%\n", (roundf(ratio * 100.0f) * 0.01f));
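
A detail worth spelling out: tiny-aes-c does no padding of its own, and its CBC functions expect the buffer length to be a multiple of the 16-byte AES block. The sketch below shows one way to round a buffer up to a block boundary with zero bytes. The pad_to_block name is hypothetical, and zero padding is only good enough here on the assumption that the stub is told the sizes separately, as it is in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: append zero bytes until the buffer size is a multiple
// of block_size. Returns the number of bytes that were added.
std::size_t pad_to_block(std::vector<uint8_t>& buffer, std::size_t block_size = 16)
{
    assert(block_size != 0);
    const std::size_t remainder = buffer.size() % block_size;
    const std::size_t added = remainder ? block_size - remainder : 0;
    buffer.resize(buffer.size() + added, 0x00);
    return added;
}
```

For example, a 33-byte buffer grows by 15 bytes to 48, while a 32-byte buffer is left untouched.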

NOTE : As I said before, we're not going to follow the routine used by most PE packers: we don't encrypt/compress the code section and recover it at runtime, and we don't manipulate the input PE file.

Instead, we use a PE loader to load and map the entire PE file into memory and call its entrypoint.

Writing Data to PE File and Updating Alignments

Now we need to write the packed data into the generated PE file. Follow these steps :

Add these macros to the global scope :

#define file_alignment_size         512   // Default Hard Disk Block Size (0x200)
#define memory_alignment_size       4096  // Default Memory Page Size (0x1000)

Add this function to the global scope :

C++

inline DWORD _align(DWORD size, DWORD align, DWORD addr = 0) 
{
    if (!(size % align)) return addr + size;
    return addr + (size / align + 1) * align;
}

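Alignment is a very important operation when working with PE files: the raw offsets and sizes of sections are multiples of FileAlignment, and section virtual addresses are multiples of SectionAlignment. Learning it is very useful!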
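
To get a feel for the rounding, here is a small sketch with a few worked values. align_up is a hypothetical name that restates the same rule as _align with fixed-width types, so the block builds without Windows.h.

```cpp
#include <cassert>
#include <cstdint>

// Same rounding rule as _align: round size up to a multiple of align,
// then add the optional base address.
constexpr uint32_t align_up(uint32_t size, uint32_t align, uint32_t addr = 0)
{
    return addr + ((size % align) ? (size / align + 1) * align : size);
}

static_assert(align_up(0x198, 0x200) == 0x200,   "partial block rounds up to one file block");
static_assert(align_up(0x200, 0x200) == 0x200,   "already aligned sizes are unchanged");
static_assert(align_up(0x1234, 0x1000) == 0x2000, "memory sizes round up to whole pages");
static_assert(align_up(1, 0x200, 0x400) == 0x600, "addr is added after rounding");
```

The checks run at compile time, so the block compiling at all confirms the values.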

Update the following values and code using the alignments :

C++

nt_h.OptionalHeader.SectionAlignment                    = memory_alignment_size;
nt_h.OptionalHeader.FileAlignment                       = file_alignment_size;

C++

d_sec.Misc.VirtualSize          = _align(data_buffer.size(), memory_alignment_size);
d_sec.VirtualAddress            = c_sec.VirtualAddress + c_sec.Misc.VirtualSize;
d_sec.SizeOfRawData             = _align(data_buffer.size(), file_alignment_size);
d_sec.PointerToRawData          = c_sec.PointerToRawData + c_sec.SizeOfRawData;

C++

// Write Data Section
size_t current_pos = pe_writter.tellp();
pe_writter.write((char*)data_buffer.data(), data_buffer.size());
while (pe_writter.tellp() != current_pos + d_sec.SizeOfRawData) pe_writter.put(0x0);

// Releasing And Finalizing
vector<uint8_t>().swap(input_pe_file_buffer);
vector<uint8_t>().swap(data_buffer);
CONSOLE_COLOR_SUCCSESS; // define this next to the other console macros, e.g. with attribute 0x0A (green)
printf("[Information] PE File Packed Successfully.");
return EXIT_SUCCESS;

Build the project and test it. Your packer should generate a valid, working PE file that contains the packed data.
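
The section arithmetic used above (each section starts where the previous one ends, in the file and in memory) can be sketched in isolation. SectionLayout, lay_out and size_of_image are hypothetical names, and the struct is only a small subset of IMAGE_SECTION_HEADER, so treat this as an illustration rather than the packer's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct SectionLayout {              // hypothetical subset of IMAGE_SECTION_HEADER
    uint32_t virtual_size;
    uint32_t virtual_address;
    uint32_t raw_size;
    uint32_t raw_pointer;
};

constexpr uint32_t kFileAlign = 0x200;   // file_alignment_size
constexpr uint32_t kMemAlign  = 0x1000;  // memory_alignment_size

uint32_t round_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1) / alignment * alignment;
}

// Place sections back to back, starting right after the (aligned) headers
std::vector<SectionLayout> lay_out(const std::vector<uint32_t>& payload_sizes,
                                   uint32_t headers_size)
{
    std::vector<SectionLayout> sections;
    uint32_t raw = round_up(headers_size, kFileAlign);
    uint32_t rva = round_up(headers_size, kMemAlign);
    for (uint32_t size : payload_sizes) {
        SectionLayout s{ round_up(size, kMemAlign), rva, round_up(size, kFileAlign), raw };
        raw += s.raw_size;
        rva += s.virtual_size;
        sections.push_back(s);
    }
    return sections;
}

// SizeOfImage is the end of the last section in memory
uint32_t size_of_image(const std::vector<SectionLayout>& sections)
{
    assert(!sections.empty());
    return sections.back().virtual_address + sections.back().virtual_size;
}
```

With 0x198 bytes of headers, a 0x600-byte code section and a small data section, this reproduces the values hard-coded earlier: raw pointers 0x200 and 0x800, virtual addresses 0x1000 and 0x2000, and an image size of 0x3000.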

Unpacker : Stub Implementation

Alright! If you're still with me, it's time to generate the unpacker machine code and put it inside the code section. To do this we need to build an unpacker stub. Open unpacker.cpp, add fast-lzma2 and tiny-aes-c to the project just like you did for the packer, and set up the same key and IV values. Now we need to create some variables that the packer can locate and modify :

C++

volatile PVOID data_ptr                 = (void*)0xAABBCCDD;
volatile DWORD data_size                = 0xEEFFAADD;
volatile DWORD actual_data_size         = 0xA0B0C0D0;

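Why the volatile keyword? Simple: it stops the compiler from optimizing the variables away while the rest of the code stays optimized, so each placeholder value survives in the compiled stub where the packer can find and patch it. It's a win-win ;)

The code should look like this :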

C++

// unpacker.cpp (unpacker_stub project)
#include <Windows.h>

// Encryption Library
extern "C"
{
    #include "aes.h"
}

// Compression Library
#include "lzma2\fast-lzma2.h"

// WARNING : If you faced error using pragma, try adding lib file in linker settings
#pragma comment(lib, "lzma2\\fast-lzma2.lib") 

// Merge Data With Code
#pragma comment(linker, "/merge:.rdata=.text")

// Entrypoint
void func_unpack()
{
    // Internal Data [ Signatures ]
    volatile PVOID data_ptr                 = (void*)0xAABBCCDD;
    volatile DWORD data_size                = 0xEEFFAADD;
    volatile DWORD actual_data_size         = 0xA0B0C0D0;
    volatile DWORD header_size              = 0xF0E0D0A0;
    
    // Initializing Resolvers
    k32_init(); crt_init();

    // Getting BaseAddress of Module
    intptr_t imageBase = (intptr_t)GetModuleHandleA(0);
    data_ptr = (void*)((intptr_t)data_ptr + imageBase);

    // Initializing Cryptor
    struct AES_ctx ctx;
    const unsigned char key[32] = {
    0xD6, 0x23, 0xB8, 0xEF, 0x62, 0x26, 0xCE, 0xC3, 0xE2, 0x4C, 0x55, 0x12,
    0x7D, 0xE8, 0x73, 0xE7, 0x83, 0x9C, 0x77, 0x6B, 0xB1, 0xA9, 0x3B, 0x57,
    0xB2, 0x5F, 0xDB, 0xEA, 0x0D, 0xB6, 0x8E, 0xA2
    };
    const unsigned char iv[16] = {
        0x18, 0x42, 0x31, 0x2D, 0xFC, 0xEF, 0xDA, 0xB6, 0xB9, 0x49, 0xF1, 0x0D,
        0x03, 0x7E, 0x7E, 0xBD
    };
    AES_init_ctx_iv(&ctx, key, iv);

    // Casting PVOID to BYTE
    uint8_t* data_ptr_byte = (uint8_t*)data_ptr;

    // Decrypting Buffer
    AES_CBC_decrypt_buffer(&ctx, data_ptr_byte, data_size);

    // Allocating Code Buffer
    uint8_t* code_buffer = (uint8_t*)malloc(actual_data_size);

    // Decompressing Buffer
    FL2_decompress(code_buffer, actual_data_size, &data_ptr_byte[16], data_size - 32);
    memset(data_ptr, 0, data_size);
}

NOTE : We don't use lzma2 multi-threaded decompression because threading in shellcode is a very bad idea!

Unpacker : C Runtime and WinAPI Resolver

OK, now if you try to build the unpacker_stub project you will face lots of unresolved external symbol errors.

This happens because we removed all the standard libraries such as msvcrt and kernel32. There's one solution for this, and it's called lazy importing.

Lazy Importing Technique

With lazy importing we look up system functions at runtime and call them dynamically instead of going through the import table. To use this technique you will need this amazing single header library from a real genius, Justas Masiulis.

The first step is to load a library like this :

C++

uintptr_t msvcrtLib = reinterpret_cast<uintptr_t>(LI_FIND(LoadLibraryA)(_S("msvcrt.dll")));

Then invoke the functions of library like this :

C++

LI_GET(msvcrtLib, printf)("This is a message from dynamically loaded printf.\n");

And that's it! You can use any library and any function without leaving a footprint in your PE image. The issue is that fast-lzma2 calls lots of functions, and replacing every call with LI_GET would be brutally time consuming!

It could also introduce lots of issues in the library code, so I came up with an idea: what if I develop resolvers? It worked!

Developing Resolver

What is a resolver and how can we use it as a solution? Simple: we reimplement all the C runtime and WinAPI functions we need inside a simulated msvcrt.lib and kernel32.lib (the same can be done for any other lib). Each reimplemented function looks up the original function at runtime, forwards its parameters to it, and returns the result. This lets us create a static library from any dynamic library!

For example, this is how we resolve memcpy :

C++

// resolver.h
void crt_init();
void* ___memcpy(void* dst, const void* src, size_t size);

C++

// resolver.cpp
uintptr_t msvcrtLib = 0;
#define _VCRTFunc(fn) LI_GET(msvcrtLib,fn)
void crt_init()
{
    msvcrtLib = reinterpret_cast<uintptr_t>(LI_FIND(LoadLibraryA)(_S("msvcrt.dll")));
}

// Dynamic memcpy
void* ___memcpy(void* dst, const void* src, size_t size) 
{
    return _VCRTFunc(memcpy)(dst, src, size);
}

C++

// resolver_export.cpp
#include "resolver.h"

#define RESOLVER extern "C" 
RESOLVER void* __cdecl memcpy(void* dst, const void* src, size_t size) 
{ 
    return ___memcpy(dst, src, size); 
}
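
To make the forwarding shape easier to see on its own, here is a portable sketch of the same pattern with the runtime lookup replaced by a stand-in. lookup_memcpy, sketch_crt_init and resolved_memcpy are hypothetical names. In the real stub the lookup is done with LI_GET against msvcrt.dll, while here it simply hands back the C library's memcpy so the sketch runs anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

using memcpy_fn = void* (*)(void*, const void*, std::size_t);

// Hypothetical stand-in for the runtime lookup (LI_GET in the article)
static memcpy_fn lookup_memcpy()
{
    return &std::memcpy;
}

// Filled once by the init function, like msvcrtLib in resolver.cpp
static memcpy_fn g_memcpy = nullptr;

void sketch_crt_init()
{
    g_memcpy = lookup_memcpy();
}

// Forwarding wrapper: same signature as memcpy, the body only redirects
void* resolved_memcpy(void* dst, const void* src, std::size_t size)
{
    assert(g_memcpy != nullptr && "call sketch_crt_init() first");
    return g_memcpy(dst, src, size);
}
```

The important part is the order: the init function has to run before any wrapper is called, which is why the stub calls its init functions first.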

To keep the article short I don't show how to resolve every needed function, but you can easily do it by following the example code. I also included pre-built static lib files of my resolvers in the project source, so feel free to save some time and use them.

Static Linking to Resolvers

Because of link ordering, avoid using pragma to link against the resolvers. Use the linker properties instead :

Go to the config of the unpacker_stub project, head to Linker > General > Additional Library Directories and change it to ".\resolvers"
Go to Linker > Input > Additional Dependencies and add "msvcrt.lib" and "kernel32.lib"
Go to VC++ Directories and clear Library Directories and Library WinRT Directories to avoid linking against the original libraries.

Create the external functions header :

C++

// Resolvers Functions
extern "C" void crt_init();
extern "C" void k32_init();

Initialize resolvers after internal values and before initializing the cryptor :

C++
```
// Initializing Resolvers
k32_init();
crt_init();
```

Update section merging pragmas to this :

C++

// Merge Data With Code
#pragma comment(linker, "/merge:.rdata=.text")
#pragma comment(linker, "/merge:.data=.text")

Go to Linker > Command Line and enter "/EMITPOGOPHASEINFO /SECTION:.text,EWR" in Additional Options.
Go to Linker > Advanced and change Randomized Base Address to No (/DYNAMICBASE:NO)
Go to Linker > Advanced and change Fixed Base Address to Yes (/FIXED). This option prevents generation of the relocation directory, which would otherwise make the code depend on the stub PE file.

Now build and the magic happens... The unpacker stub compiles successfully!

Unpacker : PE Loader/Mapper

It's time to add a PE loader/mapper to the unpacker and finalize the unpacker stub code. For this operation we use the mmLoader library, which is developed in pure C.

After adding the library to the project, add the following code to the end of the unpacker stub code :

C++

// PE Loader Library
#include "mmLoader.h"

...

// Loading PE File
DWORD pe_loader_result = 0;
HMEMMODULE pe_module = LoadMemModule(code_buffer, true, &pe_loader_result);

This is it! Now build the project and you should get unpacker_stub.exe, which contains just two sections :

.text : unpacker machine code
.pdata : contains the exception directory, which we don't need

Extract the .text data using CFF Explorer or a hex editor and convert it to a byte array like this :

C++

// unpacker_stub.h (pe_packer project)
unsigned char unpacker_stub[175104] = {
    0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B,
    0xFE, 0xD7, 0xAB, 0x76, 0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0,
    0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0, 0xB7, 0xFD, 0x93, 0x26
    ...

You can download the source of finished unpacker stub here.

Download 03_pe_packer_tutorial_packer_chapter2_vs16_x64.zip

Packer : Stub Generation

Include unpacker_stub.h in packer.cpp and apply the following changes to the code.

Add a byte pattern searching helper function for finding and patching signatures in the unpacker stub :

C++

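#include <algorithm>
#include <cstring>   // memcmp

...

inline DWORD _find(uint8_t* data, size_t data_size, DWORD& value)
{
    // Stop early enough that the 4-byte compare never reads past the buffer
    for (size_t i = 0; i + sizeof(DWORD) <= data_size; i++)
        if (memcmp(&data[i], &value, sizeof(DWORD)) == 0) return (DWORD)i;
    return (DWORD)-1; // Not Found
}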
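
To see the find-and-patch idea end to end, here is a small portable sketch that works on a toy byte buffer instead of the real stub. find_marker and patch_marker are hypothetical names. The marker is compared in host byte order, which on x64 means the value 0xAABBCCDD appears in the code as the bytes DD CC BB AA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

// Scan for the first occurrence of a 32-bit marker (host byte order)
std::size_t find_marker(const std::vector<uint8_t>& code, uint32_t marker)
{
    for (std::size_t i = 0; i + sizeof marker <= code.size(); ++i)
        if (std::memcmp(&code[i], &marker, sizeof marker) == 0) return i;
    return kNotFound;
}

// Overwrite the marker with the real value; returns false when it is absent,
// so the caller never writes to an invalid offset.
bool patch_marker(std::vector<uint8_t>& code, uint32_t marker, uint32_t value)
{
    const std::size_t at = find_marker(code, marker);
    if (at == kNotFound) return false;
    std::memcpy(&code[at], &value, sizeof value);
    return true;
}
```

A marker that the compiler happened to split or reorder would simply not be found, which is why checking the result before patching matters.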

Change section headers to this :

C++

// Initializing Section [ Code ]
IMAGE_SECTION_HEADER    c_sec;
memset(&c_sec, NULL, sizeof IMAGE_SECTION_HEADER);
c_sec.Name[0] = '[';
c_sec.Name[1] = ' ';
c_sec.Name[2] = 'H';
c_sec.Name[3] = '.';
c_sec.Name[4] = 'M';
c_sec.Name[5] = ' ';
c_sec.Name[6] = ']';
c_sec.Name[7] = 0x0;
c_sec.Misc.VirtualSize = _align(sizeof unpacker_stub, memory_alignment_size);
c_sec.VirtualAddress = memory_alignment_size;
c_sec.SizeOfRawData = sizeof unpacker_stub;
c_sec.PointerToRawData = file_alignment_size;
c_sec.Characteristics = 
    IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ |
    IMAGE_SCN_MEM_WRITE | IMAGE_SCN_CNT_CODE;

// Initializing Section [ Data ]
IMAGE_SECTION_HEADER    d_sec;
memset(&d_sec, NULL, sizeof IMAGE_SECTION_HEADER);
d_sec.Name[0] = '[';
d_sec.Name[1] = ' ';
d_sec.Name[2] = 'H';
d_sec.Name[3] = '.';
d_sec.Name[4] = 'M';
d_sec.Name[5] = ' ';
d_sec.Name[6] = ']';
d_sec.Name[7] = 0x0;
d_sec.Misc.VirtualSize = _align(data_buffer.size(), memory_alignment_size);
d_sec.VirtualAddress = c_sec.VirtualAddress + c_sec.Misc.VirtualSize;
d_sec.SizeOfRawData = _align(data_buffer.size(), file_alignment_size);
d_sec.PointerToRawData = c_sec.PointerToRawData + c_sec.SizeOfRawData;
d_sec.Characteristics = IMAGE_SCN_CNT_INITIALIZED_DATA |
    IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE;

Update the generated PE headers. To do this, add the following code just after the section headers :

C++

// Update PE Image Size
printf("[Information] Updating PE Information...\n");
nt_h.OptionalHeader.SizeOfImage = 
    _align(d_sec.VirtualAddress + d_sec.Misc.VirtualSize, memory_alignment_size);

// Update PE Information
nt_h.FileHeader.Characteristics = in_pe_nt_header->FileHeader.Characteristics;
nt_h.FileHeader.TimeDateStamp = in_pe_nt_header->FileHeader.TimeDateStamp;
nt_h.OptionalHeader.CheckSum = 0xFFFFFFFF;
nt_h.OptionalHeader.SizeOfCode = c_sec.SizeOfRawData;
nt_h.OptionalHeader.SizeOfInitializedData = d_sec.SizeOfRawData;
nt_h.OptionalHeader.Subsystem = in_pe_nt_header->OptionalHeader.Subsystem;

// Update PE Entrypoint ( Taken from .map file )
nt_h.OptionalHeader.AddressOfEntryPoint = 0x00005940;

To get the entrypoint offset from the .map file, simply search for func_unpack and you will find the offset there, or you can copy the entrypoint from unpacker_stub.exe using CFF Explorer.
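
As a rough sketch of the arithmetic behind that hard-coded value: a linker .map file lists each symbol with an "Rva+Base" address, and subtracting the image base leaves the RVA. The example assumes the stub was linked at the same image base as the generated file (0x140000000) and that its .text section starts at RVA 0x1000. Both numbers are assumptions for illustration, and rva_from_map and rebase_entry are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Convert an "Rva+Base" value taken from the .map file into an RVA
constexpr uint32_t rva_from_map(uint64_t rva_plus_base, uint64_t image_base)
{
    return static_cast<uint32_t>(rva_plus_base - image_base);
}

// Re-base the entry point when the stub's .text section and the generated
// code section do not start at the same RVA.
constexpr uint32_t rebase_entry(uint32_t entry_rva, uint32_t stub_text_rva,
                                uint32_t new_section_rva)
{
    return entry_rva - stub_text_rva + new_section_rva;
}

static_assert(rva_from_map(0x0000000140005940ULL, 0x0000000140000000ULL) == 0x5940,
              "map address minus image base gives the RVA");
static_assert(rebase_entry(0x5940, 0x1000, 0x1000) == 0x5940,
              "same section RVA on both sides: the value carries over unchanged");
```

Because the generated code section also sits at 0x1000 here, the stub's entrypoint RVA can be copied as is. With a different section address it would have to be shifted by the difference.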

Now we need to find the unpacker stub signatures and patch them. Update the PE writer code as follows:

C++

// Create/Open PE File
printf("[Information] Writing Generated PE to Disk...\n");
fstream pe_writter;
pe_writter.open(output_pe_file, ios::binary | ios::out);

// Write DOS Header
pe_writter.write((char*)&dos_h, sizeof dos_h);

// Write NT Header
pe_writter.write((char*)&nt_h, sizeof nt_h);

// Write Headers of Sections
pe_writter.write((char*)&c_sec, sizeof c_sec);
pe_writter.write((char*)&d_sec, sizeof d_sec);

// Add Padding
while (pe_writter.tellp() != c_sec.PointerToRawData) pe_writter.put(0x0);

// Find Signatures in Unpacker Stub
DWORD data_ptr_sig              = 0xAABBCCDD;
DWORD data_size_sig             = 0xEEFFAADD;
DWORD actual_data_size_sig      = 0xA0B0C0D0;
DWORD header_size_sig           = 0xF0E0D0A0;
DWORD data_ptr_offset           = _find(unpacker_stub, sizeof unpacker_stub, data_ptr_sig);
DWORD data_size_offset          = _find(unpacker_stub, sizeof unpacker_stub, data_size_sig);
DWORD actual_data_size_offset   = _find(unpacker_stub, sizeof unpacker_stub, actual_data_size_sig);
DWORD header_size_offset        = _find(unpacker_stub, sizeof unpacker_stub, header_size_sig);

// Log Signatures Information
if (data_ptr_offset != -1)
    printf("[Information] Signature A Found at :  %X\n", data_ptr_offset);
if (data_size_offset != -1)
    printf("[Information] Signature B Found at :  %X\n", data_size_offset);
if (actual_data_size_offset != -1)
    printf("[Information] Signature C Found at :  %X\n", actual_data_size_offset);
if (header_size_offset != -1)
    printf("[Information] Signature D Found at :  %X\n", header_size_offset);

// Update Code Section
printf("[Information] Updating Offset Data...\n");
memcpy(&unpacker_stub[data_ptr_offset], &d_sec.VirtualAddress, sizeof DWORD);
memcpy(&unpacker_stub[data_size_offset], &d_sec.SizeOfRawData,  sizeof DWORD);
DWORD pe_file_actual_size = (DWORD)input_pe_file_buffer.size();
memcpy(&unpacker_stub[actual_data_size_offset], &pe_file_actual_size, sizeof DWORD);
memcpy(&unpacker_stub[header_size_offset], &nt_h.OptionalHeader.BaseOfCode, sizeof DWORD);

// Write Code Section
printf("[Information] Writing Code Data...\n");
pe_writter.write((char*)&unpacker_stub, sizeof unpacker_stub);

// Write Data Section
printf("[Information] Writing Packed Data...\n");
size_t current_pos = pe_writter.tellp();
pe_writter.write((char*)data_buffer.data(), data_buffer.size());
while (pe_writter.tellp() != current_pos + d_sec.SizeOfRawData) pe_writter.put(0x0);

// Close PE File
pe_writter.close();
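
Note that the code above logs a signature only when it is found but patches it either way, so a missing signature would write to a bogus offset. A more defensive variant could look like the sketch below. The names find_u32 and patch_u32 are hypothetical and are not the _find helper used by the packer; the sketch assumes a little-endian host and a stub held in a std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper: offset of the first occurrence of a 32-bit value, or -1.
inline long find_u32(const std::vector<uint8_t>& buf, uint32_t value)
{
    for (size_t i = 0; i + sizeof value <= buf.size(); ++i)
        if (std::memcmp(&buf[i], &value, sizeof value) == 0)
            return static_cast<long>(i);
    return -1;
}

// Replaces a placeholder with its final value and throws instead of patching blindly.
inline void patch_u32(std::vector<uint8_t>& buf, uint32_t signature, uint32_t value)
{
    const long offset = find_u32(buf, signature);
    if (offset < 0) throw std::runtime_error("signature not found in stub");
    std::memcpy(&buf[static_cast<size_t>(offset)], &value, sizeof value);
}
```

With helpers like these, a stub that was rebuilt without one of the placeholders fails loudly at pack time instead of producing a PE file that crashes at startup.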

Here we go, now let's try the packer... and... Congrats! You made your first pe packer!

You can download the full source code of packer and unpacker stub here.

Download 04_pe_packer_tutorial_packer_chapter3_vs16_x64.zip

Packer : Dynamic Linking Support + Export Table Creation

Now our packer can pack an exe file and produce a new working exe file but what if we want to pack a dll with its exports? To perform this we need to create an export table for our output pe file and then redirect the calls to the actual module.

This process is not as easy as the previous parts. In fact, it's quite complex and takes real effort to work out, but don't worry, I've already done the hard part for you. So let's start and add DLL support to our packer!

A ) Update & Make Unpacker Stub DLL Friendly

Currently our unpacker stub isn't designed to act as a DLL entry point. We need to change it so that it passes the DLL initialization routine properly, and we need to add two extra values, which are explained in the next section.

New unpacker stub must look like this :

C++

// unpacker.cpp (unpacker_stub project)
// WinAPI Functions
#include <Windows.h>
#include <winnt.h>
EXTERN_C IMAGE_DOS_HEADER __ImageBase;

// Resolvers Functions
EXTERN_C void crt_init();
EXTERN_C void k32_init();

// Encryption Library
extern "C"
{
    #include "aes.h"
}

// Compression Library
#include "lzma2\fast-lzma2.h"

// PE Loader Library
#include "mmLoader.h"

// Merge Data With Code
#pragma comment(linker, "/merge:.rdata=.text")
#pragma comment(linker, "/merge:.data=.text")

// Cross Section Value
EXTERN_C static volatile uintptr_t      moduleImageBase = 0xBCEAEFBA;
EXTERN_C static volatile FARPROC        functionForwardingPtr = (FARPROC)0xCAFEBABE;

// External Functions
EXTERN_C BOOL CallModuleEntry(void* pMemModule_d, DWORD dwReason);

// Multi-Accessing Values
HMEMMODULE pe_module = 0;

// Entrypoint (EXE/DLL)
BOOL func_unpack(void*, int reason, void*)
{
    // Releasing DLL PE Module
    if (reason == DLL_PROCESS_DETACH) 
    { CallModuleEntry(pe_module, DLL_PROCESS_DETACH); FreeMemModule(pe_module); return TRUE; };

    // Handling DLL Thread Events
    if (reason == DLL_THREAD_ATTACH) return CallModuleEntry(pe_module, DLL_THREAD_ATTACH);
    if (reason == DLL_THREAD_DETACH) return CallModuleEntry(pe_module, DLL_THREAD_DETACH);

    // Internal Data [ Signatures ]
    volatile PVOID data_ptr = (void*)0xAABBCCDD;
    volatile DWORD data_size = 0xEEFFAADD;
    volatile DWORD actual_data_size = 0xA0B0C0D0;
    volatile DWORD header_size = 0xF0E0D0A0;

    // Initializing Resolvers
    k32_init(); crt_init();

    // Getting BaseAddress of Module
    intptr_t imageBase = (intptr_t)&__ImageBase;
    data_ptr = (void*)((intptr_t)data_ptr + imageBase);

    // Initializing Cryptor
    struct AES_ctx ctx;
    const unsigned char key[32] = {
    0xD6, 0x23, 0xB8, 0xEF, 0x62, 0x26, 0xCE, 0xC3, 0xE2, 0x4C, 0x55, 0x12,
    0x7D, 0xE8, 0x73, 0xE7, 0x83, 0x9C, 0x77, 0x6B, 0xB1, 0xA9, 0x3B, 0x57,
    0xB2, 0x5F, 0xDB, 0xEA, 0x0D, 0xB6, 0x8E, 0xA2
    };
    const unsigned char iv[16] = {
    0x18, 0x42, 0x31, 0x2D, 0xFC, 0xEF, 0xDA, 0xB6, 0xB9, 0x49, 0xF1, 0x0D,
    0x03, 0x7E, 0x7E, 0xBD
    };
    AES_init_ctx_iv(&ctx, key, iv);

    // Casting PVOID to BYTE
    uint8_t* data_ptr_byte = (uint8_t*)data_ptr;

    // Decrypting Buffer
    AES_CBC_decrypt_buffer(&ctx, data_ptr_byte, data_size);

    // Allocating Code Buffer
    uint8_t* code_buffer = (uint8_t*)malloc(actual_data_size);

    // Decompressing Buffer
    FL2_decompress(code_buffer, actual_data_size, &data_ptr_byte[16], data_size - 32);
    memset(data_ptr, 0, data_size);

    // Loading PE Module
    DWORD pe_loader_result = 0;
    pe_module = LoadMemModule(code_buffer, false, &pe_loader_result);

    // Set Image Base
    moduleImageBase = (uintptr_t)*pe_module;
    functionForwardingPtr = 0;

    // Call Entrypoint
    return CallModuleEntry(pe_module, DLL_PROCESS_ATTACH);
}

Now let me explain the updated parts for you ;)

We changed the func_unpack return type to BOOL and added three parameters so that it matches the DllMain signature:

C++
```
BOOL func_unpack(void*, int reason, void*)
```
We should also update the way we obtain the image base address. In an exe we can simply use GetModuleHandle, but in a dll we can't, so we use the external __ImageBase value instead. We could use the first parameter of func_unpack, which is hInstance, but that only works for dlls. By using __ImageBase we get the right value in any kind of PE file.

C++
```
#include <winnt.h>
EXTERN_C IMAGE_DOS_HEADER __ImageBase;
...
// Getting BaseAddress of Module
intptr_t imageBase = (intptr_t)&__ImageBase;
```
We need control over when the entry point of our dynamically loaded module is called, so we make a small change to mmLoader and make the CallModuleEntry function public. Then we use it to call the entry point manually after loading our module from memory:

C++
```
// External Functions
EXTERN_C BOOL CallModuleEntry(void* pMemModule_d, DWORD dwReason);
...
// Changes in mmLoader.c
BOOL CallModuleEntry(void* pMemModule_d, DWORD dwReason) 
{
  PMEM_MODULE pMemModule = pMemModule_d;
...
```

We should handle the dll events to avoid memory leaks, crashes, or data loss on detach:

C++

// Releasing DLL PE Module
if (reason == DLL_PROCESS_DETACH) 
{ CallModuleEntry(pe_module, DLL_PROCESS_DETACH); FreeMemModule(pe_module); return TRUE; };

// Handling DLL Thread Events
if (reason == DLL_THREAD_ATTACH) return CallModuleEntry(pe_module, DLL_THREAD_ATTACH);
if (reason == DLL_THREAD_DETACH) return CallModuleEntry(pe_module, DLL_THREAD_DETACH);

We added two static values that we will use in the next steps. These values can be accessed from other PE sections.

C++

// Cross Section Value
EXTERN_C static volatile uintptr_t      moduleImageBase = 0xBCEAEFBA;
EXTERN_C static volatile FARPROC        functionForwardingPtr = (FARPROC)0xCAFEBABE;

And finally we updated the PE loading flow and set our values. What are these values? Keep reading!

C++

// Loading PE Module
DWORD pe_loader_result = 0;
pe_module = LoadMemModule(code_buffer, false, &pe_loader_result);

// Set Image Base
moduleImageBase = (uintptr_t)*pe_module;
functionForwardingPtr = 0;

// Call Entrypoint
return CallModuleEntry(pe_module, DLL_PROCESS_ATTACH);

Don't forget to move pe_module to global scope so it can be accessed on each event call. After compiling the unpacker stub, updating the raw array in the packer project, and updating the entry point offset, it should work for both exe and dll. Now let's move on to the export table.

B ) Adding Pattern Search in Packer for New Unpacker Stub Values

Alright, now head to packer.cpp and add the new pattern search code right after the line where we updated the entry point value:

C++

// Update PE Entrypoint ( Taken from .map file )
nt_h.OptionalHeader.AddressOfEntryPoint = 0x00005F10;

// Get Const Values Offset In Unpacker
DWORD imagebase_value_sig = 0xBCEAEFBA;
DWORD imageBaseValueOffset = _find(unpacker_stub, sizeof unpacker_stub, imagebase_value_sig);
if (imageBaseValueOffset != -1)
{
    // Clear the placeholder only when the signature was actually found
    printf("[Information] ImageBase Value Signature Found at :  %X\n", imageBaseValueOffset);
    memset(&unpacker_stub[imageBaseValueOffset], 0, sizeof uintptr_t);
}
DWORD forwarding_value_sig = 0xCAFEBABE;
DWORD forwarding_value_offset = _find(unpacker_stub, sizeof unpacker_stub, forwarding_value_sig);
if (forwarding_value_offset != -1)
{
    printf("[Information] Function Forwarding Value Signature Found at :  %X\n", forwarding_value_offset);
    memset(&unpacker_stub[forwarding_value_offset], 0, sizeof FARPROC);
}

C ) Adding Export Section/Table/Code Generation Step

Now it's time to add a step that detects whether we're packing a dll file. Add the following code right after the pattern search:

C++

// Create Export Table ( Section [ Export ] )
IMAGE_SECTION_HEADER et_sec;
memset(&et_sec, NULL, sizeof IMAGE_SECTION_HEADER);
bool hasExports = false; vector<uint8_t> et_buffer;

if (isDLL)
{
    // We Generate Export Section, Export Table and Export Code Here
}

D ) Extracting Export Information from Input PE File

We're all set to start working on exports. The first step is to find out whether the input PE file has any exports:

C++

if (isDLL)
{
    // -1 means "not found"; a signed type keeps the later != -1 check meaningful
    int export_section_index = -1;
    int export_section_raw_addr = -1;

    // Get Export Table Information
    IMAGE_DATA_DIRECTORY ex_table = 
        in_pe_nt_header->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];
    if (ex_table.VirtualAddress != 0) hasExports = true;

    printf("[Information] Has Exports : %s\n", BOOL_STR(hasExports));
    
    if (hasExports)
    {
        printf("[Information] Creating Export Table...\n");
        // We Have Exports on Input PE File!
    }
}

Now we take the RVA (Relative Virtual Address) of the input PE export directory and calculate the virtual address at which our new export section will be located:

C++

// Export Directory RVA
DWORD e_dir_rva = ex_table.VirtualAddress;
DWORD et_sec_virtual_address = d_sec.VirtualAddress + d_sec.Misc.VirtualSize;

printf("[Information] Input PE File Section Count : %d\n", in_pe_nt_header->FileHeader.NumberOfSections);

Then we iterate over input pe sections and find out which one contains dll exports data :

C++

// Get Section Macro
#define GET_SECTION(h,s) (uintptr_t)IMAGE_FIRST_SECTION(h) + ((s) * sizeof IMAGE_SECTION_HEADER)

...

// Find Export Section in Input PE File
for (size_t i = 0; i < in_pe_nt_header->FileHeader.NumberOfSections; i++)
{
    IMAGE_SECTION_HEADER* get_sec = (PIMAGE_SECTION_HEADER)(GET_SECTION(in_pe_nt_header, i));

    // The export directory belongs to the section whose virtual range contains its RVA.
    // Checking the section's own range avoids reading one header past the last section.
    if (e_dir_rva >= get_sec->VirtualAddress &&
        e_dir_rva < get_sec->VirtualAddress + get_sec->Misc.VirtualSize)
    {
        export_section_index = i; break;
    };
}

printf("[Information] Export Section Found At %dth Section\n", export_section_index + 1);

if (export_section_index != -1)
{
    // Actual Export Generation Happens Here
}

Alright, Let's talk about how we're going to perform the process of dll exports generation before we step into the dragon's mouth...

E ) Understanding The Concept of DLL Export Forwarding

Before we continue, you need to know how DLL exports work and the design behind them. DLL exports are made of:

Export Directory : An image data directory entry that contains two values, the RVA of the export table and its size. From the RVA we can find out which section contains the export table.
Export Section : The section that contains the export table and the export data. It can contain the export code too.
Export Table : A structure that contains basic information about the dll exports: how many there are, and the RVAs at which the export data is located.
Export Data : It contains the list of function name RVAs, names, ordinals, and function RVAs.
- Names RVA : An RVA that points to the name of the function, stored as a null-terminated string.
- Function RVA : An RVA that points to the machine code of the function, the export code!
Export Code : The machine code of the exported functions. This code can live inside the .text section or any other section. In our packer we generate it inside the same export section using a machine code template.
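
To make the relationship between these tables concrete, here is a minimal sketch of how a loader resolves an exported name. ExportView and resolve_export are hypothetical names, and the struct only models the three arrays that AddressOfFunctions, AddressOfNames, and AddressOfNameOrdinals point to. It is not the real IMAGE_EXPORT_DIRECTORY layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the three tables an export directory points to.
struct ExportView
{
    std::vector<uint32_t>    function_rvas;   // table behind AddressOfFunctions
    std::vector<std::string> names;           // strings behind AddressOfNames
    std::vector<uint16_t>    name_ordinals;   // table behind AddressOfNameOrdinals
};

// Resolves a name to a function RVA; returns 0 when the name is not exported.
inline uint32_t resolve_export(const ExportView& ev, const std::string& name)
{
    // The name table and the ordinal table are parallel arrays.
    assert(ev.names.size() == ev.name_ordinals.size());
    for (size_t i = 0; i < ev.names.size(); ++i)
    {
        if (ev.names[i] != name) continue;
        const uint16_t index = ev.name_ordinals[i];   // index into the function table
        return index < ev.function_rvas.size() ? ev.function_rvas[index] : 0;
    }
    return 0;
}
```

The key point is the indirection: the i-th name does not map to the i-th function. It maps to the i-th entry of the ordinal table, and that value is the index into the function table.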

NOTE : Earlier in this article I said there's no need for assembly knowledge. This part does need a little assembly, but since it's neither complex nor dynamic, we just use a small pre-generated piece of machine code.

What is Function Forwarding? In programming, function forwarding means jumping from one function call to another without touching the function parameters.

It can be performed with several techniques, such as dll hijacking, dll proxying, and machine code redirection. In our PE packer we generate a small piece of machine code (32 bytes) that reads the image base address of the loaded module, adds the real function offset to it, and finally jumps to the result.

This is the assembly code we will use for function forwarding :

Assembly

PUSH RCX
PUSH RAX
MOV RAX,QWORD PTR DS:[(Image Base Address)]
MOV ECX, (Function Offset)
ADD RAX,RCX
MOV QWORD PTR DS:[(Function Offset + Image Base Address)],RAX
POP RAX                                    
POP RCX
JMP QWORD PTR DS:[(Function Offset + Image Base Address)] /* < Jump */

So basically, once the unpacker has stored the image base of the dynamic module in Image Base Address, the code adds Function Offset to it, stores the sum in the second static value holder (Function Offset + Image Base Address), and jumps to it. That's it!

F ) Cloning Input PE Export Table, Make Changes and Rebase

Alright, now that you know how the gears work, it's time to start the hard part. Before going on, add these useful macros to ease the process:

#define GET_SECTION(h,s) (uintptr_t)IMAGE_FIRST_SECTION(h) + ((s) * sizeof IMAGE_SECTION_HEADER)
#define RVA_TO_FILE_OFFSET(rva,membase,filebase) ((rva - membase) + filebase)
#define RVA2OFS_EXP(rva) (input_pe_file_buffer.data() +  \
    (RVA_TO_FILE_OFFSET(rva, in_pe_exp_sec->VirtualAddress, in_pe_exp_sec->PointerToRawData)))
#define REBASE_RVA(rva) ((rva - in_pe_exp_sec->VirtualAddress + et_sec_virtual_address) - \
                            (e_dir_rva - in_pe_exp_sec->VirtualAddress))
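
If the macro arithmetic is hard to follow, the same two conversions can be written as plain functions. This is only an illustrative sketch: SectionRange, rva_to_file_offset, and rebase_rva are hypothetical names, and SectionRange carries just the two section header fields the macros read.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two IMAGE_SECTION_HEADER fields the macros use.
struct SectionRange { uint32_t virtual_address; uint32_t pointer_to_raw_data; };

// RVA inside a section -> offset of the same byte in the file on disk
// (the arithmetic of RVA_TO_FILE_OFFSET).
inline uint32_t rva_to_file_offset(uint32_t rva, const SectionRange& sec)
{
    assert(rva >= sec.virtual_address);
    return (rva - sec.virtual_address) + sec.pointer_to_raw_data;
}

// Moves an RVA from the old export directory to the new export section while
// keeping its distance from the start of the export directory. REBASE_RVA
// simplifies to exactly this, because the section base cancels out.
inline uint32_t rebase_rva(uint32_t rva, uint32_t old_export_dir_rva, uint32_t new_section_rva)
{
    return rva - old_export_dir_rva + new_section_rva;
}
```

For example, with a section at RVA 0x2000 stored at file offset 0x0C00, the RVA 0x2150 maps to file offset 0x0D50. An RVA that sat 0x40 bytes after the old export directory ends up 0x40 bytes after the start of the new export section.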

Now we parse the input PE export section like this, which gives us access to the export table data:

C++

printf("[Information] Parsing Input PE Export Section...\n");

// Get Export Directory
PIMAGE_SECTION_HEADER in_pe_exp_sec = (PIMAGE_SECTION_HEADER)(GET_SECTION(in_pe_nt_header, export_section_index));
PIMAGE_EXPORT_DIRECTORY e_dir = (PIMAGE_EXPORT_DIRECTORY)RVA2OFS_EXP(e_dir_rva);
DWORD e_dir_size = in_pe_nt_header->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].Size;

printf("[Information] Export Section Name : %s\n", in_pe_exp_sec->Name);

// Extracting Input Binary Export Table
PULONG  in_et_fn_tab = (PULONG)RVA2OFS_EXP(e_dir->AddressOfFunctions);
PULONG  in_et_name_tab = (PULONG)RVA2OFS_EXP(e_dir->AddressOfNames);
PUSHORT in_et_ordianl_tab = (PUSHORT)RVA2OFS_EXP(e_dir->AddressOfNameOrdinals);
uintptr_t in_et_data_start = (uintptr_t)in_et_fn_tab;
DWORD in_et_last_fn_name_size = strlen((char*)RVA2OFS_EXP(in_et_name_tab[e_dir->NumberOfNames - 1])) + 1;
uintptr_t in_et_data_end = (uintptr_t)(RVA2OFS_EXP(in_et_name_tab[e_dir->NumberOfNames - 1]) + in_et_last_fn_name_size);

Then we simply rebase them using our macro like this :

C++

// Rebase Export Table Addresses
printf("[Information] Rebasing Export Table Addresses...\n");
e_dir->AddressOfFunctions = REBASE_RVA(e_dir->AddressOfFunctions);
e_dir->AddressOfNames = REBASE_RVA(e_dir->AddressOfNames);
e_dir->AddressOfNameOrdinals = REBASE_RVA(e_dir->AddressOfNameOrdinals);
for (size_t i = 0; i < e_dir->NumberOfNames; i++) in_et_name_tab[i] = REBASE_RVA(in_et_name_tab[i]);

After rebasing them, we copy the export directory data to our new PE file:

C++

// Generate Export Table Directory Data
et_buffer.resize(e_dir_size);
memcpy(et_buffer.data(), e_dir, sizeof IMAGE_EXPORT_DIRECTORY);

G ) Generating Exports Machine Code

Now we're ready to generate machine code for our exports. To do this, add this small machine code template to your source code right after the helpers:

C++

// Machine Code
unsigned char func_forwarding_code[32] = 
{
    0x51, 0x50,                                         // PUSH RCX, PUSH RAX
    0x48, 0x8B, 0x05,   0x00, 0x00, 0x00, 0x00,         // MOV RAX,QWORD PTR DS:[OFFSET]
    0xB9,               0x00, 0x00, 0x00, 0x00,         // MOV ECX,VALUE
    0x48, 0x03, 0xC1,                                   // ADD RAX,RCX
    0x48, 0x89, 0x05,   0x00, 0x00, 0x00, 0x00,         // MOV QWORD PTR DS:[OFFSET],RAX
    0x58, 0x59,                                         // POP RAX, POP RCX
    0xFF, 0x25,         0x00, 0x00, 0x00, 0x00,         // JMP QWORD PTR DS:[OFFSET]
};

After this we need to allocate a temporary buffer and calculate the image base RVA, the current code block RVA, and the offsets. Then we set the values in the machine code byte array and add it to the temporary buffer. Once we're done, we add it to our export section:

C++

// Generate Export Table Codes
printf("[Information] Generating Function Forwarding Code...\n");
DWORD ff_code_buffer_size = sizeof func_forwarding_code * e_dir->NumberOfFunctions;
uint8_t* ff_code_buffer = (uint8_t*)malloc(ff_code_buffer_size);
DWORD image_base_rva = c_sec.VirtualAddress + imageBaseValueOffset;
DWORD ff_value_rva = c_sec.VirtualAddress + forwarding_value_offset;
for (size_t i = 0; i < e_dir->NumberOfFunctions; i++)
{
    DWORD func_offset = in_et_fn_tab[in_et_ordianl_tab[i]];
    DWORD machine_code_offset = i * sizeof func_forwarding_code;
    DWORD machine_code_rva = et_buffer.size() + machine_code_offset + et_sec_virtual_address;

    // Machine Code Data
    int32_t* offset_to_image_base       = (int32_t*)&func_forwarding_code[5];
    int32_t* function_offset_value      = (int32_t*)&func_forwarding_code[10];
    int32_t* offset_to_func_addr        = (int32_t*)&func_forwarding_code[20];
    int32_t* offset_to_func_addr2       = (int32_t*)&func_forwarding_code[28];

    offset_to_image_base[0]     = (image_base_rva - machine_code_rva) - (5 + sizeof int32_t);
    function_offset_value[0]    = func_offset;
    offset_to_func_addr[0]      = (ff_value_rva - machine_code_rva) - (20 + sizeof int32_t);
    offset_to_func_addr2[0]     = (ff_value_rva - machine_code_rva) - (28 + sizeof int32_t);
    memcpy(&ff_code_buffer[machine_code_offset], func_forwarding_code, sizeof func_forwarding_code);

    // Update Function Address
    in_et_fn_tab[i] = et_sec_virtual_address + et_buffer.size() + (i * sizeof func_forwarding_code);
}

// Copy Updated Export Table Data
DWORD et_data_size = in_et_data_end - in_et_data_start;
memcpy(&et_buffer.data()[sizeof IMAGE_EXPORT_DIRECTORY], (void*)in_et_data_start, et_data_size);

// Merge Export Table and Export Data Buffers
DWORD size_of_export_table = et_buffer.size();
et_buffer.resize(size_of_export_table + ff_code_buffer_size);
memcpy(&et_buffer.data()[size_of_export_table], (void*)ff_code_buffer, ff_code_buffer_size);
free(ff_code_buffer);
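
The displacement arithmetic in the loop above is easy to get wrong, so it can help to check it in isolation. The sketch below rebuilds one 32-byte block from plain RVAs. Thunk, rip_disp32, and make_thunk are hypothetical names that do not appear in the packer source, and the sketch assumes a little-endian host. A RIP-relative operand is encoded as the target address minus the address of the next instruction, which is why the constants 9, 24, and 32 match the (5 + 4), (20 + 4), and (28 + 4) terms used above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Same 32-byte layout as func_forwarding_code.
using Thunk = std::array<uint8_t, 32>;

// disp32 of a RIP-relative operand: target minus the address of the next instruction.
inline int32_t rip_disp32(uint32_t target_rva, uint32_t thunk_rva, uint32_t instr_end)
{
    return static_cast<int32_t>(target_rva - (thunk_rva + instr_end));
}

// Hypothetical helper: builds one forwarding block located at thunk_rva.
inline Thunk make_thunk(uint32_t thunk_rva, uint32_t image_base_slot_rva,
                        uint32_t forward_slot_rva, uint32_t func_offset)
{
    Thunk t = {
        0x51, 0x50,                        // PUSH RCX, PUSH RAX
        0x48, 0x8B, 0x05, 0, 0, 0, 0,      // MOV RAX,[RIP+disp]  (ends at byte 9)
        0xB9, 0, 0, 0, 0,                  // MOV ECX,imm32
        0x48, 0x03, 0xC1,                  // ADD RAX,RCX
        0x48, 0x89, 0x05, 0, 0, 0, 0,      // MOV [RIP+disp],RAX  (ends at byte 24)
        0x58, 0x59,                        // POP RAX, POP RCX
        0xFF, 0x25, 0, 0, 0, 0 };          // JMP [RIP+disp]      (ends at byte 32)

    const int32_t d_base  = rip_disp32(image_base_slot_rva, thunk_rva, 9);
    const int32_t d_store = rip_disp32(forward_slot_rva, thunk_rva, 24);
    const int32_t d_jump  = rip_disp32(forward_slot_rva, thunk_rva, 32);
    std::memcpy(&t[5],  &d_base, 4);
    std::memcpy(&t[10], &func_offset, 4);
    std::memcpy(&t[20], &d_store, 4);
    std::memcpy(&t[28], &d_jump, 4);
    return t;
}
```

Adding each stored displacement back to the address of the following instruction should give the RVA of the slot it refers to, which makes for a quick sanity check before writing the export section.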

That's it! We generated everything we needed and now we just need to generate a new section header for exports :

C++

// Generate Export Table Section
et_sec.Name[0] = '[';
et_sec.Name[1] = ' ';
et_sec.Name[2] = 'H';
et_sec.Name[3] = '.';
et_sec.Name[4] = 'M';
et_sec.Name[5] = ' ';
et_sec.Name[6] = ']';
et_sec.Name[7] = 0x0;
et_sec.Misc.VirtualSize = _align(et_buffer.size(), memory_alignment_size);
et_sec.VirtualAddress = et_sec_virtual_address;
et_sec.SizeOfRawData = _align(et_buffer.size(), file_alignment_size);
et_sec.PointerToRawData = d_sec.PointerToRawData + d_sec.SizeOfRawData;
et_sec.Characteristics = IMAGE_SCN_CNT_INITIALIZED_DATA | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_CNT_CODE;

And update our export table directory :

C++

// Update Export Table Directory
nt_h.OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress = et_sec.VirtualAddress;
nt_h.OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].Size = e_dir_size;

And update section counts and size of image :

C++

// Update PE Headers
nt_h.FileHeader.NumberOfSections = 3;

// Update PE Image Size
nt_h.OptionalHeader.SizeOfImage =
    _align(et_sec.VirtualAddress + et_sec.Misc.VirtualSize, memory_alignment_size);
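
The section layout rules used throughout this chapter can be summarized in a few lines. The sketch below is not the packer's _align helper. align_up, SectionLayout, place_after, and size_of_image are hypothetical names that mirror the arithmetic used for d_sec and et_sec, under the assumption that each section is placed directly after the previous one.

```cpp
#include <cassert>
#include <cstdint>

// Rounds value up to the next multiple of alignment (alignment must be non-zero).
inline uint32_t align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0);
    return ((value + alignment - 1) / alignment) * alignment;
}

// Hypothetical stand-in for the four section header fields that matter for layout.
struct SectionLayout
{
    uint32_t virtual_address, virtual_size, pointer_to_raw_data, size_of_raw_data;
};

// Places a new section directly after prev, both in memory and in the file.
inline SectionLayout place_after(const SectionLayout& prev, uint32_t payload_size,
                                 uint32_t section_alignment, uint32_t file_alignment)
{
    SectionLayout s{};
    s.virtual_address     = align_up(prev.virtual_address + prev.virtual_size, section_alignment);
    s.virtual_size        = align_up(payload_size, section_alignment);
    s.pointer_to_raw_data = prev.pointer_to_raw_data + prev.size_of_raw_data;
    s.size_of_raw_data    = align_up(payload_size, file_alignment);
    return s;
}

// SizeOfImage is the aligned end of the last section.
inline uint32_t size_of_image(const SectionLayout& last, uint32_t section_alignment)
{
    return align_up(last.virtual_address + last.virtual_size, section_alignment);
}
```

For instance, a 0x450-byte export buffer placed after a data section that ends at RVA 0xB000, with 0x1000 memory alignment and 0x200 file alignment, gets a virtual size of 0x1000 and a raw size of 0x600, and the image then ends at 0xC000.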

And it's done! That wasn't so hard, right? 🍎

H ) Writing DLL Exports to PE File

Of course, we need to make some changes to write the exports to the dll file properly. Right after writing the data section to the file, we write the export section. We should also declare current_pos once so that every write step can reuse it:

C++

// Write Export Section
if (et_buffer.size() != 0 && hasExports)
{
    printf("[Information] Writing Export Table Data...\n");
    current_pos = pe_writter.tellp();
    pe_writter.write((char*)et_buffer.data(), et_buffer.size());
    while (pe_writter.tellp() != current_pos + et_sec.SizeOfRawData) pe_writter.put(0x0);
}

So the final pe file generation must look like this :

C++

// Create/Open PE File
printf("[Information] Writing Generated PE to Disk...\n");
fstream pe_writter;
size_t current_pos;
pe_writter.open(output_pe_file, ios::binary | ios::out);

// Write DOS Header
pe_writter.write((char*)&dos_h, sizeof dos_h);

// Write NT Header
pe_writter.write((char*)&nt_h, sizeof nt_h);

// Write Headers of Sections
pe_writter.write((char*)&c_sec, sizeof c_sec);
pe_writter.write((char*)&d_sec, sizeof d_sec);
if(nt_h.FileHeader.NumberOfSections == 3) pe_writter.write((char*)&et_sec, sizeof et_sec);

// Add Padding
while (pe_writter.tellp() != c_sec.PointerToRawData) pe_writter.put(0x0);

// Find Signatures in Unpacker Stub
DWORD data_ptr_sig              = 0xAABBCCDD;
DWORD data_size_sig             = 0xEEFFAADD;
DWORD actual_data_size_sig      = 0xA0B0C0D0;
DWORD header_size_sig           = 0xF0E0D0A0;
DWORD data_ptr_offset           = _find(unpacker_stub, sizeof unpacker_stub, data_ptr_sig);
DWORD data_size_offset          = _find(unpacker_stub, sizeof unpacker_stub, data_size_sig);
DWORD actual_data_size_offset   = _find(unpacker_stub, sizeof unpacker_stub, actual_data_size_sig);
DWORD header_size_offset        = _find(unpacker_stub, sizeof unpacker_stub, header_size_sig);

...

// Update Code Section
printf("[Information] Updating Offset Data...\n");
memcpy(&unpacker_stub[data_ptr_offset], &d_sec.VirtualAddress, sizeof DWORD);
memcpy(&unpacker_stub[data_size_offset], &d_sec.SizeOfRawData,  sizeof DWORD);
DWORD pe_file_actual_size = (DWORD)input_pe_file_buffer.size();
memcpy(&unpacker_stub[actual_data_size_offset], &pe_file_actual_size, sizeof DWORD);
memcpy(&unpacker_stub[header_size_offset], &nt_h.OptionalHeader.BaseOfCode, sizeof DWORD);

// Write Code Section
printf("[Information] Writing Code Data...\n");
current_pos = pe_writter.tellp();
pe_writter.write((char*)&unpacker_stub, sizeof unpacker_stub);
while (pe_writter.tellp() != current_pos + c_sec.SizeOfRawData) pe_writter.put(0x0);

// Write Data Section
printf("[Information] Writing Packed Data...\n");
current_pos = pe_writter.tellp();
pe_writter.write((char*)data_buffer.data(), data_buffer.size());
while (pe_writter.tellp() != current_pos + d_sec.SizeOfRawData) pe_writter.put(0x0);

// Write Export Section
if (et_buffer.size() != 0 && hasExports)
{
    printf("[Information] Writing Export Table Data...\n");
    current_pos = pe_writter.tellp();
    pe_writter.write((char*)et_buffer.data(), et_buffer.size());
    while (pe_writter.tellp() != current_pos + et_sec.SizeOfRawData) pe_writter.put(0x0);
}

// Close PE File
pe_writter.close();

Now our packer supports DLL files too! 🎉

You can download the full source code of packer and unpacker stub here.

Download 05_pe_packer_tutorial_packer_chapter4_vs16_x64.zip

Packer : File Version Generation

Alright, this is the last part of the article. Of course we could do a lot more and add even more features, but I believe the article is already very long. In this last part we do some post-processing on our final PE file: we add file information and an icon.

You can use many libraries on GitHub for this part of the article; I use my own resource library.

Link against utilities\hmrclib64_vc16.lib which can be found in next chapter source code zip file.

Add function definitions to packer.cpp right after compression library headers :

C++

// PE Info Editor
void  HMResKit_LoadPEFile(const char* peFile);
void  HMResKit_SetFileInfo(const char* key, const char* value);
void  HMResKit_SetPEVersion(const char* peFile);
void  HMResKit_ChangeIcon(const char* iconPath);
void  HMResKit_CommitChanges(const char* sectionName);

Add information and icon like this :

C++

// Post-Process [ Add Information & Icon ]
printf("[Information] Adding File Information and Icon...\n");
HMResKit_LoadPEFile(output_pe_file);
HMResKit_SetFileInfo("ProductName", "Custom PE Packer");
HMResKit_SetFileInfo("CompanyName", "MemarDesign™ LLC.");
HMResKit_SetFileInfo("LegalTrademarks", "MemarDesign™ LLC.");
HMResKit_SetFileInfo("Comments", "Developed by Hamid.Memar");
HMResKit_SetFileInfo("FileDescription", "A PE File Packed by HMPacker");
HMResKit_SetFileInfo("ProductVersion", "1.0.0.1");
HMResKit_SetFileInfo("FileVersion", "1.0.0.1");
HMResKit_SetFileInfo("InternalName", "packed-pe-file");
HMResKit_SetFileInfo("OriginalFilename", "packed-pe-file");
HMResKit_SetFileInfo("LegalCopyright", "Copyright MemarDesign™ LLC. © 2021-2022");
HMResKit_SetFileInfo("PrivateBuild", "Packed PE");
HMResKit_SetFileInfo("SpecialBuild", "Packed PE");
HMResKit_SetPEVersion("1.0.0.1");
if (!isDLL) HMResKit_ChangeIcon("app.ico");
HMResKit_CommitChanges("[ H.M ]");

NOTE : We don't cover extracting the icon and file info from the input PE file in this article. It can be done fairly easily, but since it requires parsing the resource section and I don't want to make the article any longer, I'll leave you with just one tip.

You can take a look at this handy article.

You can download the full source code of final version of packer and unpacker stub here.

Download 06_pe_packer_tutorial_packer_chapter_final_vs16_x64.zip

Packer : Extras + Improvement Tips

Here are some tips and extra guidance on improving the PE packer.

Tip 1 : Updating Checksum

After you finish all the post-processing, it's time to update the PE file checksum, which is stored in:

C++

OptionalHeader.CheckSum = 0xFFFFFFFF;

A valid checksum is very important for getting better results from malware scanners. You can check this article about checksum calculation for PE files.
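
As a rough illustration, the commonly described PE checksum algorithm sums the file as 16-bit little-endian words with the carry folded back in, skips the checksum field itself, and finally adds the file length. The sketch below follows that description. pe_checksum is a hypothetical helper, not part of the packer, and its output should be cross-checked against a known-good tool (for example the ImageHlp MapFileAndCheckSum function on Windows) before you rely on it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: computes the checksum of a whole PE file held in memory.
// checksum_field_offset is the file offset of OptionalHeader.CheckSum, which is
// e_lfanew + 4 + 20 + 64 for both PE32 and PE32+.
inline uint32_t pe_checksum(const std::vector<uint8_t>& file, size_t checksum_field_offset)
{
    // The field is expected to start on a 16-bit word boundary.
    assert(checksum_field_offset % 2 == 0);

    uint32_t sum = 0;
    for (size_t i = 0; i < file.size(); i += 2)
    {
        // Skip the 4 bytes of the checksum field itself.
        if (i >= checksum_field_offset && i < checksum_field_offset + 4) continue;

        // Read one little-endian word; a trailing odd byte is used as the low byte.
        uint32_t word = file[i];
        if (i + 1 < file.size()) word |= static_cast<uint32_t>(file[i + 1]) << 8;

        sum += word;
        sum = (sum & 0xFFFF) + (sum >> 16);   // fold the carry back in
    }
    sum = (sum & 0xFFFF) + (sum >> 16);
    return sum + static_cast<uint32_t>(file.size());
}
```

In the packer this would run after the resource post-processing step: read the output file back, compute the value, and write it over the 0xFFFFFFFF placeholder at the checksum field offset.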

Tip 2 : Adding Code Signing and Signature

Our packed PE file doesn't follow the conventions of well-known compilers, and this may cause some trouble with AVs. If you're a legitimate developer, code signing helps a lot with this issue. Get a valid certificate and run signtool.exe in the post-processing code.

Tip 3 : Manifest Support

If you want to clone the input file's manifest so the packed PE keeps extra details, such as a requirement for admin privileges, you should parse the resource directory and extract it from there.

Tip 4 : .NET Support

To add .NET support you can go the hard way (manipulating the .NET PE structure) or use native CLR hosting, which is recommended. You can check out my article on CLR hosting. By the way, that article is old and it's now possible to host the CLR in much better ways. Maybe I'll write an article on that in the future. Who knows? :)

So, pack the .NET assembly into the data section and use CLR hosting in the unpacker stub to load it from memory. You can use .NET Core hosting as well.

Tip 5 : Multi-Layer PE Packing

One of the positive things about our packer is it doesn't mess with input pe structure to produce packed pe file, so it means you can use any other packer on the output pe file as a secondary layer of compression/protection!

Yes! You can simply create your own protection and compression system and then use a well-known packer on top of it, so an attacker faces two phases of reverse engineering, which makes life a little bit harder for them!

Another fun thing about our packer is that you can pack the packed PE file with the same packer again and again, as many times as you like, or even randomize the key and IV each time!
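
Randomizing the key and IV per pack could look like the sketch below. KeyMaterial and make_key_material are hypothetical names. The sketch uses std::random_device, which the C++ standard does not guarantee to be cryptographically secure, so on Windows a system generator such as BCryptGenRandom would be a safer choice. Note that the unpacker stub in this article has its key and IV hard-coded, so the packer would also have to patch the generated bytes into the stub, which is not shown here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical container: a 256-bit AES key and a 128-bit IV, as used by the stub.
struct KeyMaterial
{
    std::array<uint8_t, 32> key;
    std::array<uint8_t, 16> iv;
};

// Fills a fresh key and IV from the platform's random device.
// Assumption: std::random_device is backed by a real entropy source on your platform.
inline KeyMaterial make_key_material()
{
    std::random_device rd;
    KeyMaterial km{};
    for (auto& b : km.key) b = static_cast<uint8_t>(rd() & 0xFF);
    for (auto& b : km.iv)  b = static_cast<uint8_t>(rd() & 0xFF);
    return km;
}
```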

Tip 6 : Higher PE Compression

Remember that packers only give good results on large PE files, and don't forget that the unpacker stub has a size of its own. For example, if you pack a 1KB dll, the output is larger than the input file, but if you pack a 100MB dll, you can get a much smaller packed file with a high compression ratio!

Anyway, even in this situation you can use UPX on the final packed PE file to compress the unpacker stub as well.

A Crazy Note for Crazy People :

If you want to go even crazier, get the UPX source code and customize it to add an extra encryption layer! :v

Tip 7 : Code Virtualization

To improve the security of your PE file, try products that offer code virtualization. If you apply the virtual machine to the unpacker stub code, it makes the reverse engineering process very difficult.

Extra : A Note on Relocation, Non-Standard PE Files

Our PE packer still needs more parts, like handling relocations and non-ordinal function exports. It's highly recommended not to try the packer on non-standard or signed PE files like d3dcompiler_47.dll.

You can add new features to the packer and commit to HMPacker GitHub repository.

Extra : A Note on Multilingual PE Files

This packer has not been tested with multilingual PE files. In theory it should work fine, but don't use it on PE files without a backup. To add full multilingual support you need to do some resource cloning.

Extra : Real World Test On Marmoset Toolbag 3

Let's try our pe packer on an AAA software, Marmoset Toolbag 3! Marmoset 3 has four pe files :

toolbag.exe : main application file with size of 19,763,288 bytes
substance_linker.dll : a library file with size of 378,368 bytes
substance_sse2_blend.dll : a library file with size of 958,976 bytes
python36.dll : python library with size of 3,555,992 bytes

OK, Now let's try our packer on them...

Shell

pack_marmoset.bat :
"%cd%\pe_packer.exe" "%cd%\toolbag.exe" "%cd%\toolbag_packed.exe"
"%cd%\pe_packer.exe" "%cd%\substance_sse2_blend.dll" "%cd%\substance_sse2_blend_packed.dll"
"%cd%\pe_packer.exe" "%cd%\substance_linker.dll" "%cd%\substance_linker_packed.dll"
"%cd%\pe_packer.exe" "%cd%\python36.dll" "%cd%\python36_packed.dll"

Result :

Awesome! Our packer reduced toolbag.exe from 19,763,288 bytes to 5,169,152 bytes, which is roughly 26% of the original size! Let's test the software to see if it works properly or not...

And it works perfectly! No crashes during use, no performance drop, and very clean...

Congrats Again 🍻

Extra : Checking Packed Binary With AVs

VirusTotal

Here's a scan with 67 AVs using VirusTotal. Only 2 AVs flagged the packed toolbag, both as false positives, which can be fixed by adding fake plain (unencrypted) software code to a fake section. Some AVs, especially AI-powered ones like SecureAge APEX, flag every PE file that has no clear instructions in its sections, so our heavily encrypted packed file gets a flag. However, after adding some raw C++ code, the flag is gone.

BE NICE!

NOTE : This trick doesn't work on applications that contain real malicious code. Also, be a good person and don't use science against people, that's just not nice.

AntiScan

Here's a scan with 26 AVs using AntiScan, and none of them flagged the packed toolbag!

Extra : Take a Closer Look at Packed Binary

Before finishing the article, let's take a closer, more technical look at the binary generated by our packer.

None of the famous PE detectors recognized the packed toolbag :
Our PE file has custom sections, no import table and no imports from any other dependencies :
Our PE file has about 99% of the maximum possible entropy, which is what you would expect from heavily compressed and encrypted data.
Our packed PE file has only ~12.5MB of memory overhead compared to the original PE executable.
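If you want to check the entropy figure yourself, a minimal sketch of the usual Shannon entropy calculation over a byte buffer looks like this. `shannonEntropy` and `entropyPercent` are hypothetical helpers, not part of our packer, and the detector tools may compute their percentage in a slightly different way. The result is in bits per byte, so the maximum is 8.0, and the percentage is simply relative to that maximum.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Shannon entropy of a byte buffer, in bits per byte (0.0 to 8.0).
double shannonEntropy(const std::uint8_t* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    if (size == 0) return 0.0;

    // Count how often each byte value occurs.
    std::array<std::size_t, 256> counts{};
    for (std::size_t i = 0; i < size; ++i) ++counts[data[i]];

    // Sum -p * log2(p) over all byte values that actually occur.
    double entropy = 0.0;
    const double total = static_cast<double>(size);
    for (std::size_t c : counts)
    {
        if (c == 0) continue;
        const double p = static_cast<double>(c) / total;
        entropy -= p * std::log2(p);
    }
    return entropy;
}

// Entropy as a percentage of the 8 bits/byte maximum.
double entropyPercent(const std::uint8_t* data, std::size_t size)
{
    return shannonEntropy(data, size) / 8.0 * 100.0;
}
```

Plain code and text usually score well below the maximum, while compressed or encrypted sections get close to 8 bits per byte, which is why a value near 100% hints at a packed file.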

Bonus : Dark Version of the Article

The article is also available as a single HTML file with a dark GitHub theme here.

Credits

I hope you enjoyed the article and that it helped you learn more. Feel free to translate the article into your language, just don't forget to mention the link to the original one on CodeProject and the author's name.

Licensed under the MIT License :

A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.

Authored, Developed and Published By Hamid.Memar

13 November 2021

License

This article, along with any associated source code and files, is licensed under The MIT License