In this article I will teach you how to create your very own PE packer/protector from scratch using only Visual Studio and C/C++, without the need for any assembly knowledge. We start with the basics and explore more advanced areas toward the end of the article. This article is a perfect fit for people who want to gain a deeper understanding of computer science, so if you're ready, grab a cup of tea and take a fascinating journey with me!
Introduction
Remember the time people had fun using PE detectors like PEiD, Exeinfo PE, DiE (Detect It Easy), RDG, ... to detect which packer/protector a developer used?
Once upon a time, PE packers/protectors were very popular, and people used them to reduce their binary size efficiently and to add some protection layers to their code.
With the advancement of reverse engineering technology and tools, protectors became very fragile and defeatable, but the war between good and evil continued...
However, PE packers can still be very useful, both for security and for reducing code size, but making a packer is a difficult and complex task. It requires very precise low-level programming knowledge, which is why very few people complete one successfully.
This article will teach you how to create your very own PE packer using only VC++, and the good news is that no assembly knowledge is needed!
Background
You may ask why you need a custom packer of your own when there are hundreds of them out there. To answer this question, you need to know how PE packers work.
A PE packer/protector takes a PE file, analyzes it and extracts all the information from it, then modifies the file and rebuilds it using its own structure. It may compress all of your sections into a new one and set its decompression code as the entrypoint; when the PE is launched, that code decompresses the data into memory, restores the original entrypoint and calls it. It may also encrypt the code, so the original raw code is only accessible and readable at runtime.
The structure produced by a given packer is always the same, so once everyone has access to the packer it becomes an easy target: people pack different PE files and search the output for a signature, and that signature becomes a mark that can be used to create an unpacker/unprotector for the packer.
For example, if you get an exe file packed with ASPack, you can easily unpack it using an OllyDbg script or a downloadable unpacker tool like ASPackDie with just one click!
So the benefits of creating a custom packer are :
- Only you have the packer and it's only used for your product, which makes analysis harder because it's unique.
- The packer stays in your hands, so an attacker cannot download it from a public site to analyze its functionality.
- You control how the program is restored and launched, which compression/encryption algorithms are used, etc.
- You can use extra anti-reverse-engineering techniques and whatever else you want!
- You can quickly change the signature and structure when the current version is attacked.
- You can hide useful information that an attacker might use for their analysis.
Also, in this tutorial we're not going to develop a regular kind of PE packer: instead of manipulating the existing exe/dll file, we create a new one based on the input PE file, just like a linker does.
NOTE : This article is the second part of a previous article on how to build shellcode using Visual Studio.
1. Preparing Development Environment
- Visual Studio 2019
- VC++ Build Tools ( C++ 17+ Support )
- CFF Explorer ( PE Viewer/Editor )
- HxD (Hex Editor)
2. Creating Empty Projects
- Open Visual Studio 2019.
- Create two empty C++ projects.
- Name one pe_packer and the other one unpacker_stub.
- Set the pe_packer Configuration Type to "Application (.exe)".
- Set the unpacker_stub Configuration Type to "Application (.exe)".
- Make unpacker_stub independent of the CRT (C Runtime) and the Windows kernel libraries; if you don't know how, read the previous article. Note that in this article unpacker_stub is an exe, so you need to remove the /NOENTRY option.
- Set both projects to x64 and Release mode.
- Add two .cpp files to the projects, one for the packer and one for the unpacker, with the following code setups :
#include <Windows.h>
#include <iostream>
#include <fstream>
using namespace std;
int main(int argc, char* argv[])
{
if (argc != 3) return EXIT_FAILURE;
char* input_pe_file = argv[1];
char* output_pe_file = argv[2];
return EXIT_SUCCESS;
}
#include <Windows.h>
void func_unpack()
{
}
Alright, now we're all set and ready to start developing!
NOTE : To speed up packer testing, you can create a pe_packer_tester.bat file with the following content :
"%cd%\pe_packer.exe" "%cd%\input_pe.exe" "%cd%\output_pe.exe"
You can download the basic setup source here.
Packer : Parsing + Validating Input PE
Ok, for now we have one input PE path (input_pe_file) and one output PE path (output_pe_file) passed by the user to our packer. The first step is to validate the input file, making sure it's a valid PE file that also meets the requirements of our packer.
To perform validation we need to parse the pe file :
ifstream input_pe_file_reader(argv[1], ios::binary);
vector<uint8_t> input_pe_file_buffer(istreambuf_iterator<char>(input_pe_file_reader), {});
PIMAGE_DOS_HEADER in_pe_dos_header = (PIMAGE_DOS_HEADER)input_pe_file_buffer.data();
PIMAGE_NT_HEADERS in_pe_nt_header = (PIMAGE_NT_HEADERS)(input_pe_file_buffer.data() + in_pe_dos_header->e_lfanew);
Then we validate properties like this :
bool isPE = in_pe_dos_header->e_magic == IMAGE_DOS_SIGNATURE &&
in_pe_nt_header->Signature == IMAGE_NT_SIGNATURE;
bool is64 = in_pe_nt_header->FileHeader.Machine == IMAGE_FILE_MACHINE_AMD64 &&
in_pe_nt_header->OptionalHeader.Magic == IMAGE_NT_OPTIONAL_HDR64_MAGIC;
bool isDLL = in_pe_nt_header->FileHeader.Characteristics & IMAGE_FILE_DLL;
bool isNET = in_pe_nt_header->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR].Size != 0;
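The snippet above dereferences e_lfanew and the NT header before knowing the file is large enough, and it only looks at the DOS signature. Below is a minimal portable sketch of the same four checks. It reads raw bytes at the documented winnt.h offsets so it builds without Windows.h, verifies every offset before reading, also requires the "PE\0\0" signature, and, as an assumption of this sketch, only evaluates the .NET check for PE32+ images (where DataDirectory starts at optional-header offset 112). PeCheck, read_le and check_pe are hypothetical helper names, not part of the article's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Results of the same checks as isPE / is64 / isDLL / isNET
struct PeCheck
{
    bool isPE = false;
    bool is64 = false;
    bool isDLL = false;
    bool isNET = false;
};

// Read a little-endian integer at 'offset'; callers check bounds first
template <typename T>
T read_le(const std::vector<uint8_t>& buf, size_t offset)
{
    assert(offset + sizeof(T) <= buf.size());
    T value = 0;
    for (size_t i = 0; i < sizeof(T); i++)
        value = static_cast<T>(value | (static_cast<T>(buf[offset + i]) << (8 * i)));
    return value;
}

PeCheck check_pe(const std::vector<uint8_t>& buf)
{
    PeCheck r;
    constexpr size_t kDosHeaderSize = 0x40;  // sizeof(IMAGE_DOS_HEADER)
    constexpr size_t kNtHeaders64Size = 264; // sizeof(IMAGE_NT_HEADERS64)
    if (buf.size() < kDosHeaderSize || read_le<uint16_t>(buf, 0) != 0x5A4D) // "MZ"
        return r;
    const size_t nt = read_le<uint32_t>(buf, 0x3C); // e_lfanew
    if (nt > buf.size() || buf.size() - nt < kNtHeaders64Size ||
        read_le<uint32_t>(buf, nt) != 0x00004550) // "PE\0\0"
        return r;
    r.isPE = true;
    const size_t file_header = nt + 4;               // IMAGE_FILE_HEADER
    const size_t optional_header = file_header + 20; // IMAGE_OPTIONAL_HEADER64
    r.is64 = read_le<uint16_t>(buf, file_header) == 0x8664 &&  // IMAGE_FILE_MACHINE_AMD64
             read_le<uint16_t>(buf, optional_header) == 0x20B; // IMAGE_NT_OPTIONAL_HDR64_MAGIC
    r.isDLL = (read_le<uint16_t>(buf, file_header + 18) & 0x2000) != 0; // IMAGE_FILE_DLL
    // DataDirectory[14] (COM descriptor).Size, valid for the PE32+ layout only
    if (r.is64)
        r.isNET = read_le<uint32_t>(buf, optional_header + 112 + 14 * 8 + 4) != 0;
    return r;
}
```

In the packer you would call check_pe(input_pe_file_buffer) right after reading the file and stop when isPE is false.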
After adding the checks and error handling, the packer code should look like this :
#include <Windows.h>
#include <iostream>
#include <fstream>
#include <vector>
using namespace std;
#define BOOL_STR(b) ((b) ? "true" : "false")
#define CONSOLE_COLOR_DEFAULT SetConsoleTextAttribute(hConsole, 0x09);
#define CONSOLE_COLOR_ERROR SetConsoleTextAttribute(hConsole, 0x0C);
int main(int argc, char* argv[])
{
HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);
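SetConsoleTitleA("Custom x64 PE Packer by H.M v1.0"); // explicit ANSI version, the title is a narrow string
FlushConsoleInputBuffer(GetStdHandle(STD_INPUT_HANDLE)); // this API expects the console input handle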
CONSOLE_COLOR_DEFAULT;
if (argc != 3) return EXIT_FAILURE;
char* input_pe_file = argv[1];
char* output_pe_file = argv[2];
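ifstream input_pe_file_reader(input_pe_file, ios::binary);
if (!input_pe_file_reader)
{
CONSOLE_COLOR_ERROR;
printf("[Error] Cannot open input PE file.\n");
return EXIT_FAILURE;
}
vector<uint8_t> input_pe_file_buffer(istreambuf_iterator<char>(input_pe_file_reader), {});
// Make sure the headers lie inside the buffer before dereferencing them
if (input_pe_file_buffer.size() < sizeof(IMAGE_DOS_HEADER))
{
CONSOLE_COLOR_ERROR;
printf("[Error] Input PE file is invalid. (File Too Small)\n");
return EXIT_FAILURE;
}
PIMAGE_DOS_HEADER in_pe_dos_header = (PIMAGE_DOS_HEADER)input_pe_file_buffer.data();
if (in_pe_dos_header->e_lfanew < 0 ||
(size_t)in_pe_dos_header->e_lfanew + sizeof(IMAGE_NT_HEADERS) > input_pe_file_buffer.size())
{
CONSOLE_COLOR_ERROR;
printf("[Error] Input PE file is invalid. (Truncated Headers)\n");
return EXIT_FAILURE;
}
PIMAGE_NT_HEADERS in_pe_nt_header = (PIMAGE_NT_HEADERS)(input_pe_file_buffer.data() + in_pe_dos_header->e_lfanew);
bool isPE = in_pe_dos_header->e_magic == IMAGE_DOS_SIGNATURE &&
in_pe_nt_header->Signature == IMAGE_NT_SIGNATURE;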
bool is64 = in_pe_nt_header->FileHeader.Machine == IMAGE_FILE_MACHINE_AMD64 &&
in_pe_nt_header->OptionalHeader.Magic == IMAGE_NT_OPTIONAL_HDR64_MAGIC;
bool isDLL = in_pe_nt_header->FileHeader.Characteristics & IMAGE_FILE_DLL;
bool isNET = in_pe_nt_header->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR].Size != 0;
printf("[Validation] Is PE File : %s\n", BOOL_STR(isPE));
printf("[Validation] Is 64bit : %s\n", BOOL_STR(is64));
printf("[Validation] Is DLL : %s\n", BOOL_STR(isDLL));
printf("[Validation] Is COM or .Net : %s\n", BOOL_STR(isNET));
if (!isPE)
{
CONSOLE_COLOR_ERROR;
printf("[Error] Input PE file is invalid. (Signature Mismatch)\n");
return EXIT_FAILURE;
}
if (!is64)
{
CONSOLE_COLOR_ERROR;
printf("[Error] This packer only supports x64 PE files.\n");
return EXIT_FAILURE;
}
if (isNET)
{
CONSOLE_COLOR_ERROR;
printf("[Error] This packer currently doesn't support .NET/COM assemblies.\n");
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
Packer : Developing PE Generator
Alright, now that we know our input PE file is valid, it's time to create a PE generator that produces a valid, empty PE file. For this we use the PE structure definitions from the Windows SDK (winnt.h, included through Windows.h).
Creating DOS Header
Each PE file begins with a DOS header, which contains the magic number (file signature) and basic information about the file, such as the offset of the main (NT) header and the offset of the legacy DOS relocation table. To create the DOS header we initialize an IMAGE_DOS_HEADER struct and set its values :
IMAGE_DOS_HEADER dos_h;
memset(&dos_h, 0, sizeof(IMAGE_DOS_HEADER));
dos_h.e_magic = IMAGE_DOS_SIGNATURE;
dos_h.e_cblp = 0x0090;
dos_h.e_cp = 0x0003;
dos_h.e_crlc = 0x0000;
dos_h.e_cparhdr = 0x0004;
dos_h.e_minalloc = 0x0000;
dos_h.e_maxalloc = 0xFFFF;
dos_h.e_ss = 0x0000;
dos_h.e_sp = 0x00B8;
dos_h.e_csum = 0x0000;
dos_h.e_ip = 0x0000;
dos_h.e_cs = 0x0000;
dos_h.e_lfarlc = 0x0040;
dos_h.e_ovno = 0x0000;
dos_h.e_oemid = 0x0000;
dos_h.e_oeminfo = 0x0000;
dos_h.e_lfanew = 0x0040; // NT headers start right after this 64-byte header (no DOS stub)
Creating NT Header
After the DOS header comes the NT header, which contains all the important information about the PE file. An NT header contains :
- Signature
- File Header
- Optional Header
All 3 parts are included in a single struct, IMAGE_NT_HEADERS, and to create it we simply initialize it and set the following values :
IMAGE_NT_HEADERS nt_h;
memset(&nt_h, 0, sizeof(IMAGE_NT_HEADERS));
nt_h.Signature = IMAGE_NT_SIGNATURE;
nt_h.FileHeader.Machine = IMAGE_FILE_MACHINE_AMD64;
nt_h.FileHeader.NumberOfSections = 2;
nt_h.FileHeader.TimeDateStamp = 0x00000000;
nt_h.FileHeader.PointerToSymbolTable = 0x0;
nt_h.FileHeader.NumberOfSymbols = 0x0;
nt_h.FileHeader.SizeOfOptionalHeader = 0x00F0; // sizeof(IMAGE_OPTIONAL_HEADER64)
nt_h.FileHeader.Characteristics = 0x0022; // IMAGE_FILE_EXECUTABLE_IMAGE | IMAGE_FILE_LARGE_ADDRESS_AWARE
nt_h.OptionalHeader.Magic = IMAGE_NT_OPTIONAL_HDR64_MAGIC;
nt_h.OptionalHeader.MajorLinkerVersion = 10;
nt_h.OptionalHeader.MinorLinkerVersion = 0x05;
nt_h.OptionalHeader.SizeOfCode = 0x00000200;
nt_h.OptionalHeader.SizeOfInitializedData = 0x00000200;
nt_h.OptionalHeader.SizeOfUninitializedData = 0x0;
nt_h.OptionalHeader.AddressOfEntryPoint = 0x00001000; // RVA of the first byte of the code section
nt_h.OptionalHeader.BaseOfCode = 0x00001000;
nt_h.OptionalHeader.ImageBase = 0x0000000140000000;
nt_h.OptionalHeader.SectionAlignment = 0x00001000;
nt_h.OptionalHeader.FileAlignment = 0x00000200;
nt_h.OptionalHeader.MajorOperatingSystemVersion = 0x0;
nt_h.OptionalHeader.MinorOperatingSystemVersion = 0x0;
nt_h.OptionalHeader.MajorImageVersion = 0x0006;
nt_h.OptionalHeader.MinorImageVersion = 0x0000;
nt_h.OptionalHeader.MajorSubsystemVersion = 0x0006;
nt_h.OptionalHeader.MinorSubsystemVersion = 0x0000;
nt_h.OptionalHeader.Win32VersionValue = 0x0;
nt_h.OptionalHeader.SizeOfImage = 0x00003000; // headers page + code page + data page
nt_h.OptionalHeader.SizeOfHeaders = 0x00000200; // DOS + NT + section headers, rounded up to FileAlignment
nt_h.OptionalHeader.CheckSum = 0xFFFFFFFF; // not verified by the loader for ordinary user-mode executables
nt_h.OptionalHeader.Subsystem = IMAGE_SUBSYSTEM_WINDOWS_CUI;
nt_h.OptionalHeader.DllCharacteristics = 0x0120; // IMAGE_DLLCHARACTERISTICS_NX_COMPAT | IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA
nt_h.OptionalHeader.SizeOfStackReserve = 0x0000000000100000;
nt_h.OptionalHeader.SizeOfStackCommit = 0x0000000000001000;
nt_h.OptionalHeader.SizeOfHeapReserve = 0x0000000000100000;
nt_h.OptionalHeader.SizeOfHeapCommit = 0x0000000000001000;
nt_h.OptionalHeader.LoaderFlags = 0x00000000;
nt_h.OptionalHeader.NumberOfRvaAndSizes = 0x00000010; // IMAGE_NUMBEROF_DIRECTORY_ENTRIES (16)
NOTE : IMAGE_NT_HEADERS depends on the CPU architecture you set for the project; in this article (x64) it resolves to IMAGE_NT_HEADERS64.
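Two of the values above are bit masks: Characteristics = 0x0022 and DllCharacteristics = 0x0120. The sketch below decodes them as a sanity check. The flag values are copied from winnt.h so it compiles without Windows.h, and Flag, describe, kFileFlags and kDllFlags are hypothetical names used only for illustration. Note that 0x0120 does not include IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (0x0040), so the generated image is not marked as ASLR-compatible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

struct Flag
{
    uint16_t bit;
    const char* name;
};

// Values copied from winnt.h (IMAGE_FILE_* and IMAGE_DLLCHARACTERISTICS_*)
const Flag kFileFlags[] = {
    {0x0002, "EXECUTABLE_IMAGE"},
    {0x0020, "LARGE_ADDRESS_AWARE"},
    {0x2000, "DLL"},
};
const Flag kDllFlags[] = {
    {0x0020, "HIGH_ENTROPY_VA"},
    {0x0040, "DYNAMIC_BASE"},
    {0x0100, "NX_COMPAT"},
};

// Join the names of every flag set in 'value' with '|'
template <size_t N>
std::string describe(uint16_t value, const Flag (&flags)[N])
{
    std::string out;
    for (const Flag& f : flags)
    {
        if ((value & f.bit) == 0) continue;
        if (!out.empty()) out += '|';
        out += f.name;
    }
    return out;
}
```

For example, describe(0x0022, kFileFlags) yields "EXECUTABLE_IMAGE|LARGE_ADDRESS_AWARE".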
Creating Sections
Now we have a DOS header and an NT header, and the only thing left is the sections! Sections contain all the data in the PE file and they have their own headers too, so we need to initialize a header for each of them and then write the data at the offsets they point to. To create the headers we use the IMAGE_SECTION_HEADER struct :
IMAGE_SECTION_HEADER c_sec;
memset(&c_sec, 0, sizeof(IMAGE_SECTION_HEADER));
c_sec.Name[0] = '[';
c_sec.Name[1] = ' ';
c_sec.Name[2] = 'H';
c_sec.Name[3] = '.';
c_sec.Name[4] = 'M';
c_sec.Name[5] = ' ';
c_sec.Name[6] = ']';
c_sec.Name[7] = 0x0;
c_sec.Misc.VirtualSize = 0x00001000;
c_sec.VirtualAddress = 0x00001000;
c_sec.SizeOfRawData = 0x00000600;
c_sec.PointerToRawData = 0x00000200;
c_sec.PointerToRelocations = 0x00000000;
c_sec.PointerToLinenumbers = 0x00000000;
c_sec.NumberOfRelocations = 0x00000000;
c_sec.NumberOfLinenumbers = 0x00000000;
c_sec.Characteristics = IMAGE_SCN_MEM_EXECUTE |
IMAGE_SCN_MEM_READ |
IMAGE_SCN_CNT_CODE;
IMAGE_SECTION_HEADER d_sec;
memset(&d_sec, 0, sizeof(IMAGE_SECTION_HEADER));
d_sec.Name[0] = '[';
d_sec.Name[1] = ' ';
d_sec.Name[2] = 'H';
d_sec.Name[3] = '.';
d_sec.Name[4] = 'M';
d_sec.Name[5] = ' ';
d_sec.Name[6] = ']';
d_sec.Name[7] = 0x0;
d_sec.Misc.VirtualSize = 0x00000200;
d_sec.VirtualAddress = 0x00002000;
d_sec.SizeOfRawData = 0x00000200;
d_sec.PointerToRawData = 0x00000800;
d_sec.PointerToRelocations = 0x00000000;
d_sec.PointerToLinenumbers = 0x00000000;
d_sec.NumberOfRelocations = 0x00000000;
d_sec.NumberOfLinenumbers = 0x00000000;
d_sec.Characteristics = IMAGE_SCN_CNT_INITIALIZED_DATA |
IMAGE_SCN_MEM_READ;
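The hard-coded offsets above follow from the header sizes. As a worked check, assuming this article's layout (NT headers right after the 64-byte DOS header via e_lfanew = 0x40, PE32+ headers, two sections), the header block is 0x40 + 264 + 2 * 40 = 408 = 0x198 bytes. That rounds up to SizeOfHeaders = 0x200 on disk and to the first section RVA 0x1000 in memory, and the code section's 0x600 raw bytes then put the data section at file offset 0x200 + 0x600 = 0x800. The helper names in this sketch are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kDosHeaderSize = 0x40;           // sizeof(IMAGE_DOS_HEADER)
constexpr uint32_t kNtHeaders64Size = 4 + 20 + 240; // Signature + IMAGE_FILE_HEADER + IMAGE_OPTIONAL_HEADER64
constexpr uint32_t kSectionHeaderSize = 40;         // sizeof(IMAGE_SECTION_HEADER)

// Round 'value' up to the next multiple of 'alignment'
constexpr uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1) / alignment * alignment;
}

// Header bytes when the NT headers directly follow the DOS header (e_lfanew = 0x40)
constexpr uint32_t raw_headers_size(uint32_t number_of_sections)
{
    return kDosHeaderSize + kNtHeaders64Size + number_of_sections * kSectionHeaderSize;
}

struct SectionPlacement
{
    uint32_t pointer_to_raw_data; // file offset
    uint32_t virtual_address;     // RVA
};

// First valid placement after the previous data ends on disk and in memory
constexpr SectionPlacement place_after(uint32_t previous_raw_end, uint32_t previous_virtual_end,
                                       uint32_t file_alignment, uint32_t section_alignment)
{
    return {align_up(previous_raw_end, file_alignment), align_up(previous_virtual_end, section_alignment)};
}

static_assert(raw_headers_size(2) == 0x198, "two-section header block is 408 bytes");
```

The same arithmetic gives SizeOfImage: the data section ends at RVA 0x2000 + 0x200, which rounds up to 0x3000.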
Creating PE File
Great! Now we are all set and ready to write the PE file to disk. To do this, use the following code :
fstream pe_writter;
pe_writter.open(output_pe_file, ios::binary | ios::out);
pe_writter.write((char*)&dos_h, sizeof dos_h);
pe_writter.write((char*)&nt_h, sizeof nt_h);
pe_writter.write((char*)&c_sec, sizeof c_sec);
pe_writter.write((char*)&d_sec, sizeof d_sec);
while (pe_writter.tellp() != c_sec.PointerToRawData) pe_writter.put(0x0);
// Code section: a single 0xC3 (x64 "ret") at the entrypoint, then zero padding
pe_writter.put(0xC3);
for (size_t i = 0; i < c_sec.SizeOfRawData - 1; i++) pe_writter.put(0x0);
// Data section: zero-filled for now
for (size_t i = 0; i < d_sec.SizeOfRawData; i++) pe_writter.put(0x0);
pe_writter.close();
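The tellp() polling loops above never terminate if the stream enters a failed state, because tellp() then keeps returning -1. One alternative, sketched below under the assumption that the whole image fits comfortably in memory, is to build the file in a std::vector<uint8_t> and write it with a single call. append_struct and pad_to are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Append the raw bytes of a header struct (IMAGE_DOS_HEADER, IMAGE_NT_HEADERS, ...)
template <typename T>
void append_struct(std::vector<uint8_t>& out, const T& value)
{
    static_assert(std::is_trivially_copyable<T>::value, "header structs must be plain data");
    const uint8_t* bytes = reinterpret_cast<const uint8_t*>(&value);
    out.insert(out.end(), bytes, bytes + sizeof(T));
}

// Zero-fill up to 'offset', the in-memory equivalent of the tellp() loops
void pad_to(std::vector<uint8_t>& out, size_t offset)
{
    assert(out.size() <= offset); // never truncate data that was already written
    out.resize(offset, 0);
}
```

With these, the minimal image becomes append_struct for dos_h, nt_h, c_sec and d_sec, then pad_to(image, c_sec.PointerToRawData), image.push_back(0xC3), pad_to(image, d_sec.PointerToRawData) and pad_to(image, d_sec.PointerToRawData + d_sec.SizeOfRawData), followed by one write and a check of the stream state.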
Now run your packer and see the magic!
Packer : Main Implementation
Ok, now we have our PE parser and PE generator, so it's time to develop the packer itself. We use fast-lzma2 for compression and AES-256 for encryption, and then we write the data to the PE file.
NOTE : I chose fast-lzma2 for compression because it's fast and produces a very high compression ratio.
You can use zlib or any other compression library you want.
Adding Required Libraries
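- Clone the fast-lzma2 repo and add it to your project using static linking.
- Clone the tiny-aes-c repo and add it to your project. Alternatively, you can use the tiny-aes-c shellcode that we generated in the previous part of the article.
Add the libraries' headers and libs like this (note that tiny-aes-c uses a 128-bit key by default; define AES256 in aes.h so the 32-byte key below is actually used) :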
extern "C"
{
#include "aes.h"
}
#include "lzma2\fast-lzma2.h"
#pragma comment(lib, "lzma2\\fast-lzma2.lib")
Compressing/Encrypting Data
And finally we compress and encrypt the entire input pe file like this:
printf("[Information] Initializing AES Cryptor...\n");
struct AES_ctx ctx;
const unsigned char key[32] = {
0xD6, 0x23, 0xB8, 0xEF, 0x62, 0x26, 0xCE, 0xC3, 0xE2, 0x4C, 0x55, 0x12,
0x7D, 0xE8, 0x73, 0xE7, 0x83, 0x9C, 0x77, 0x6B, 0xB1, 0xA9, 0x3B, 0x57,
0xB2, 0x5F, 0xDB, 0xEA, 0x0D, 0xB6, 0x8E, 0xA2
};
const unsigned char iv[16] = {
0x18, 0x42, 0x31, 0x2D, 0xFC, 0xEF, 0xDA, 0xB6, 0xB9, 0x49, 0xF1, 0x0D,
0x03, 0x7E, 0x7E, 0xBD
};
AES_init_ctx_iv(&ctx, key, iv);
printf("[Information] Initializing Compressor...\n");
FL2_CCtx* cctx = FL2_createCCtxMt(8);
FL2_CCtx_setParameter(cctx, FL2_p_compressionLevel, 9);
FL2_CCtx_setParameter(cctx, FL2_p_dictionarySize, 1024);
vector<uint8_t> data_buffer;
// Incompressible input can grow, so size the output buffer with the worst-case bound
data_buffer.resize(FL2_compressBound(input_pe_file_buffer.size()));
printf("[Information] Compressing Buffer...\n");
size_t original_size = input_pe_file_buffer.size();
size_t compressed_size = FL2_compressCCtx(cctx, data_buffer.data(), data_buffer.size(),
input_pe_file_buffer.data(), original_size, 9);
FL2_freeCCtx(cctx);
if (FL2_isError(compressed_size))
{
CONSOLE_COLOR_ERROR;
printf("[Error] Compression failed.\n");
return EXIT_FAILURE;
}
data_buffer.resize(compressed_size);
// 16 zero bytes before the stream and at least 16 after it (the stub skips the first 16)
data_buffer.insert(data_buffer.begin(), (size_t)16, (uint8_t)0x0);
data_buffer.insert(data_buffer.end(), (size_t)16, (uint8_t)0x0);
// tiny-aes-c CBC only processes whole blocks: the length must be a multiple of AES_BLOCKLEN (16)
while (data_buffer.size() % AES_BLOCKLEN != 0) data_buffer.push_back(0x0);
printf("[Information] Encrypting Buffer...\n");
AES_CBC_encrypt_buffer(&ctx, data_buffer.data(), data_buffer.size());
printf("[Information] Original PE Size : %zu bytes\n", input_pe_file_buffer.size());
printf("[Information] Packed PE Size : %zu bytes\n", data_buffer.size());
float ratio =
(1.0f - ((float)data_buffer.size() / (float)input_pe_file_buffer.size())) * 100.f;
printf("[Information] Compression Ratio : %.2f%%\n", (roundf(ratio * 100.0f) * 0.01f));
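AES in CBC mode works on whole 16-byte blocks, and tiny-aes-c documents that the length passed to AES_CBC_encrypt_buffer must be a multiple of AES_BLOCKLEN. The standalone sketch below shows that framing: 16 guard bytes, the compressed stream, at least 16 more guard bytes, then zero padding up to the next block boundary. It also models the stub's side, which skips the first 16 bytes and hands FL2_decompress data_size - 32 bytes. frame_for_cbc, StubView and stub_view are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr size_t kAesBlock = 16; // AES block size (AES_BLOCKLEN in tiny-aes-c)
constexpr size_t kGuard = 16;    // zero bytes before and after the compressed stream

// [16 zero bytes][compressed stream][>= 16 zero bytes], padded to whole AES blocks
std::vector<uint8_t> frame_for_cbc(const std::vector<uint8_t>& compressed)
{
    std::vector<uint8_t> framed(kGuard, 0);
    framed.insert(framed.end(), compressed.begin(), compressed.end());
    framed.resize(framed.size() + kGuard, 0);
    framed.resize((framed.size() + kAesBlock - 1) / kAesBlock * kAesBlock, 0);
    return framed;
}

// The stub's view of a decrypted buffer of 'data_size' bytes
struct StubView
{
    size_t stream_offset; // where the compressed stream starts
    size_t stream_length; // length handed to FL2_decompress
};

StubView stub_view(size_t data_size)
{
    assert(data_size >= 2 * kGuard);
    return {kGuard, data_size - 2 * kGuard};
}
```

For a 100-byte stream the framed buffer is 16 + 100 + 16 = 132 bytes, rounded up to 144.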
NOTE : As I said before, we're not going to follow the routine used by most PE packers: we don't encrypt/compress the code section and recover it at runtime, and we don't manipulate the input PE file.
Instead, we use a PE loader to load and map the entire PE file into memory and call its entrypoint.
Writing Data to PE File and Updating Alignments
Now we need to write the packed data into the generated PE file. Follow these steps :
- Add these macros to the global scope :
#define file_alignment_size 512 // Default Hard Disk Block Size (0x200)
#define memory_alignment_size 4096 // Default Memory Page Size (0x1000)
- Add this function to the global scope :
inline DWORD _align(DWORD size, DWORD align, DWORD addr = 0)
{
if (!(size % align)) return addr + size;
return addr + (size / align + 1) * align;
}
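Alignment is a very important operation when working with PE files: section offsets and sizes on disk are multiples of FileAlignment, while section addresses in memory are multiples of SectionAlignment, so learning it is very useful!
- Update the following values and code to use the alignments :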
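nt_h.OptionalHeader.SectionAlignment = memory_alignment_size;
nt_h.OptionalHeader.FileAlignment = file_alignment_size;
d_sec.Misc.VirtualSize = _align((DWORD)data_buffer.size(), memory_alignment_size);
d_sec.VirtualAddress = c_sec.VirtualAddress + c_sec.Misc.VirtualSize;
d_sec.SizeOfRawData = _align((DWORD)data_buffer.size(), file_alignment_size);
d_sec.PointerToRawData = c_sec.PointerToRawData + c_sec.SizeOfRawData;
// SizeOfImage must cover the enlarged data section, otherwise the image is invalid
nt_h.OptionalHeader.SizeOfImage = _align(d_sec.VirtualAddress + d_sec.Misc.VirtualSize, memory_alignment_size);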
size_t current_pos = pe_writter.tellp();
pe_writter.write((char*)data_buffer.data(), data_buffer.size());
while (pe_writter.tellp() != current_pos + d_sec.SizeOfRawData) pe_writter.put(0x0);
vector<uint8_t>().swap(input_pe_file_buffer);
vector<uint8_t>().swap(data_buffer);
SetConsoleTextAttribute(hConsole, 0x0A); // green text for success
printf("[Information] PE File Packed Successfully.\n");
return EXIT_SUCCESS;
- Build the project and test it; your packer should generate a valid working PE file that contains the packed data.
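A few worked numbers make _align concrete: a 0x1234-byte payload occupies 0x1400 bytes on disk (the next multiple of 0x200) and 0x2000 bytes in memory (the next multiple of 0x1000), while a size that is already aligned is left unchanged. The sketch restates _align as align_generic (a hypothetical name with the same logic) and adds the common bit-mask form, which is only valid for power-of-two alignments such as the two macros above.

```cpp
#include <cassert>
#include <cstdint>

// Same logic as the article's _align(): round 'size' up to a multiple of 'align', then add 'addr'
inline uint32_t align_generic(uint32_t size, uint32_t align, uint32_t addr = 0)
{
    assert(align != 0);
    if (!(size % align)) return addr + size;
    return addr + (size / align + 1) * align;
}

// Bit-mask variant; only correct when 'align' is a power of two (0x200 and 0x1000 are)
constexpr uint32_t align_pow2(uint32_t size, uint32_t align)
{
    return (size + align - 1) & ~(align - 1);
}
```

The addr parameter lets you compute "next aligned position after addr" in one call, e.g. align_generic(0x10, 0x200, 0x1000) gives 0x1200.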
Unpacker : Stub Implementation
Alright! If you're still with me, it's time to generate the unpacker machine code and put it inside the code section. To do this we need to build an unpacker stub: open unpacker.cpp, add fast-lzma2 and tiny-aes-c to the project just like you did for the packer, and set up the same key and IV values. Now we need to create some variables that the packer can find and patch :
volatile PVOID data_ptr = (void*)0xAABBCCDD;
volatile DWORD data_size = 0xEEFFAADD;
volatile DWORD actual_data_size = 0xA0B0C0D0;
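Why the volatile keyword? Simple: it stops the compiler from optimizing these variables away while the rest of the code stays optimized, so each placeholder value is written by a real instruction that carries the constant in the machine code, where the packer can find and patch it. It's a win-win ;)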
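Because x64 is little-endian, a 32-bit placeholder such as 0xAABBCCDD is typically stored in the instruction's immediate operand as the bytes DD CC BB AA, and that byte sequence is what the packer later searches for in the extracted stub. The tiny helper below (le_bytes is a hypothetical name) only illustrates the byte order and assumes nothing beyond standard C++.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Bytes of a 32-bit constant as stored in an x64 immediate operand (little-endian)
std::array<uint8_t, 4> le_bytes(uint32_t value)
{
    return {static_cast<uint8_t>(value), static_cast<uint8_t>(value >> 8),
            static_cast<uint8_t>(value >> 16), static_cast<uint8_t>(value >> 24)};
}
```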
Code should look like this :
#include <Windows.h>
extern "C"
{
#include "aes.h"
}
#include "lzma2\fast-lzma2.h"
#pragma comment(lib, "lzma2\\fast-lzma2.lib")
#pragma comment(linker, "/merge:.rdata=.text")
void func_unpack()
{
volatile PVOID data_ptr = (void*)0xAABBCCDD;
volatile DWORD data_size = 0xEEFFAADD;
volatile DWORD actual_data_size = 0xA0B0C0D0;
volatile DWORD header_size = 0xF0E0D0A0;
k32_init(); crt_init();
intptr_t imageBase = (intptr_t)GetModuleHandleA(0);
data_ptr = (void*)((intptr_t)data_ptr + imageBase);
struct AES_ctx ctx;
const unsigned char key[32] = {
0xD6, 0x23, 0xB8, 0xEF, 0x62, 0x26, 0xCE, 0xC3, 0xE2, 0x4C, 0x55, 0x12,
0x7D, 0xE8, 0x73, 0xE7, 0x83, 0x9C, 0x77, 0x6B, 0xB1, 0xA9, 0x3B, 0x57,
0xB2, 0x5F, 0xDB, 0xEA, 0x0D, 0xB6, 0x8E, 0xA2
};
const unsigned char iv[16] = {
0x18, 0x42, 0x31, 0x2D, 0xFC, 0xEF, 0xDA, 0xB6, 0xB9, 0x49, 0xF1, 0x0D,
0x03, 0x7E, 0x7E, 0xBD
};
AES_init_ctx_iv(&ctx, key, iv);
uint8_t* data_ptr_byte = (uint8_t*)data_ptr;
AES_CBC_decrypt_buffer(&ctx, data_ptr_byte, data_size);
uint8_t* code_buffer = (uint8_t*)malloc(actual_data_size);
FL2_decompress(code_buffer, actual_data_size, &data_ptr_byte[16], data_size - 32);
memset(data_ptr, 0, data_size);
}
NOTE : We don't use lzma2 multi-threaded decompression because threading in shellcode is a very bad idea!
Unpacker : C Runtime and WinAPI Resolver
Ok, now if you try to build the unpacker_stub project, you will face lots of "unresolved external symbol" errors.
This happens because we removed all the standard libraries such as msvcrt and kernel32. One solution for this is called lazy importing.
Lazy Importing Technique
With lazy importing, system functions are resolved at runtime instead of being imported, so they don't appear in the import table. To use this technique you will need the amazing single-header lazy_importer library by a real genius, Justas Masiulis.
The first step is to load a library like this :
uintptr_t msvcrtLib = reinterpret_cast<uintptr_t>(LI_FIND(LoadLibraryA)(_S("msvcrt.dll")));
Then invoke the library's functions like this :
LI_GET(msvcrtLib, printf)("This is a message from dynamically loaded printf.\n");
And that's it! You can use any library and any function without leaving an entry in your PE import table. The issue is that fast-lzma2 calls lots of functions, and replacing all of them with LI_GET calls would be brutally time-consuming.
It can also introduce lots of issues in the library code, so I came up with an idea: what if I develop resolvers? It worked!
Developing Resolver
What is a resolver and how does it solve the problem? Simple: we reimplement all the C runtime and WinAPI functions we need inside a simulated msvcrt.lib and kernel32.lib (the same approach works for any other library). Each reimplemented function forwards its parameters to the original function, resolved at runtime, and returns the result. This lets us create a static library that stands in for any dynamic library!
For example, this is how we resolve memcpy :
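// resolver.h (declarations); crt_init uses C linkage to match extern "C" void crt_init(); in the stub
extern "C" void crt_init();
void* ___memcpy(void* dst, const void* src, size_t size);
// resolver implementation: forwards to the real function lazily imported from msvcrt.dll
uintptr_t msvcrtLib = 0;
#define _VCRTFunc(fn) LI_GET(msvcrtLib,fn)
extern "C" void crt_init()
{
msvcrtLib = reinterpret_cast<uintptr_t>(LI_FIND(LoadLibraryA)(_S("msvcrt.dll")));
}
void* ___memcpy(void* dst, const void* src, size_t size)
{
return _VCRTFunc(memcpy)(dst, src, size);
}
// exported replacement, compiled in a separate source file
#include "resolver.h"
#define RESOLVER extern "C"
#pragma function(memcpy) // memcpy is a compiler intrinsic under /Oi; this allows defining it (avoids error C2169)
RESOLVER void* __cdecl memcpy(void* dst, const void* src, size_t size)
{
return ___memcpy(dst, src, size);
}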
To keep the article short I won't show how to resolve every needed function, but you can easily do it by following the example code above. I also included pre-built static lib files of my resolvers in the project source, so feel free to save some time and use them.
Static Linking to Resolvers
Because of linkage ordering, avoid using #pragma comment(lib, ...) to link against the resolvers; use the linker properties instead :
- Go to the configuration of the unpacker_stub project, head to Linker > General > Additional Library Directories and change it to ".\resolvers".
- Go to Linker > Input > Additional Dependencies and add "msvcrt.lib" and "kernel32.lib" (the resolver libraries).
- Go to VC++ Directories and clear Library Directories and Library WinRT Directories to avoid linking against the original libraries.
- Create a header with the external function declarations :
extern "C" void crt_init();
extern "C" void k32_init();
- Initialize the resolvers after the internal values and before initializing the cryptor :
k32_init();
crt_init();
- Update the section merging pragmas to this :
#pragma comment(linker, "/merge:.rdata=.text")
#pragma comment(linker, "/merge:.data=.text")
- Go to Linker > Command Line and enter "/EMITPOGOPHASEINFO /SECTION:.text,EWR" in Additional Options.
- Go to Linker > Advanced and change Randomized Base Address to No (/DYNAMICBASE:NO).
- Go to Linker > Advanced and change Fixed Base Address to Yes (/FIXED). This option prevents the linker from generating a relocation directory; without it, the extracted code would depend on relocations that exist only in the stub's own PE file.
Now build, and the magic happens... the unpacker stub compiles successfully!
Unpacker : PE Loader/Mapper
It's time to add a PE loader/mapper to the unpacker and finalize the unpacker stub code. For this we use the mmLoader library, which is written in pure C.
After adding the library files to the project, add the following code to the end of the unpacker stub code :
#include "mmLoader.h"
...
DWORD pe_loader_result = 0;
HMEMMODULE pe_module = LoadMemModule(code_buffer, true, &pe_loader_result);
That's it! Now build the project and you should get an unpacker_stub.exe that contains just two sections.
Extract the .text section data using CFF Explorer or a hex editor and convert it to a byte array like this :
unsigned char unpacker_stub[175104] = {
0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B,
0xFE, 0xD7, 0xAB, 0x76, 0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0,
0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0, 0xB7, 0xFD, 0x93, 0x26
...
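Converting the extracted bytes by hand is tedious for a 175104-byte array. A small formatter like the sketch below can generate the declaration instead. to_c_array is a hypothetical helper, and it assumes you have already read the raw .text bytes (the range from PointerToRawData to PointerToRawData + SizeOfRawData of that section in unpacker_stub.exe) into a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Format raw bytes as a C array declaration, 12 bytes per line
std::string to_c_array(const std::vector<uint8_t>& bytes, const std::string& name)
{
    std::string out = "unsigned char " + name + "[" + std::to_string(bytes.size()) + "] = {\n";
    char hex[8];
    for (size_t i = 0; i < bytes.size(); i++)
    {
        if (i % 12 == 0) out += "    ";          // indent each new line
        std::snprintf(hex, sizeof(hex), "0x%02X", bytes[i]);
        out += hex;
        if (i + 1 < bytes.size())
            out += (i % 12 == 11) ? ",\n" : ", "; // break after every 12th byte
    }
    out += "\n};\n";
    return out;
}
```

Writing the returned string to unpacker_stub.h produces a header in the same shape as the array above.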
You can download the source of finished unpacker stub here.
Packer : Stub Generation
Include unpacker_stub.h in packer.cpp and apply the following changes to the code.
- Add a byte pattern search helper function for finding and patching signatures in the unpacker stub :
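#include <cstring>
...
// Returns the offset of the first occurrence of 'value' in 'data', or (DWORD)-1 if not found
inline DWORD _find(uint8_t* data, size_t data_size, DWORD& value)
{
for (size_t i = 0; i + sizeof(DWORD) <= data_size; i++) // never compare past the end of the buffer
if (memcmp(&data[i], &value, sizeof(DWORD)) == 0) return (DWORD)i;
return (DWORD)-1;
}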
- Change the section headers to this :
IMAGE_SECTION_HEADER c_sec;
memset(&c_sec, 0, sizeof(IMAGE_SECTION_HEADER));
c_sec.Name[0] = '[';
c_sec.Name[1] = ' ';
c_sec.Name[2] = 'H';
c_sec.Name[3] = '.';
c_sec.Name[4] = 'M';
c_sec.Name[5] = ' ';
c_sec.Name[6] = ']';
c_sec.Name[7] = 0x0;
c_sec.Misc.VirtualSize = _align(sizeof unpacker_stub, memory_alignment_size);
c_sec.VirtualAddress = memory_alignment_size;
c_sec.SizeOfRawData = sizeof unpacker_stub;
c_sec.PointerToRawData = file_alignment_size;
c_sec.Characteristics =
IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ |
IMAGE_SCN_MEM_WRITE | IMAGE_SCN_CNT_CODE;
IMAGE_SECTION_HEADER d_sec;
memset(&d_sec, 0, sizeof(IMAGE_SECTION_HEADER));
d_sec.Name[0] = '[';
d_sec.Name[1] = ' ';
d_sec.Name[2] = 'H';
d_sec.Name[3] = '.';
d_sec.Name[4] = 'M';
d_sec.Name[5] = ' ';
d_sec.Name[6] = ']';
d_sec.Name[7] = 0x0;
d_sec.Misc.VirtualSize = _align(data_buffer.size(), memory_alignment_size);
d_sec.VirtualAddress = c_sec.VirtualAddress + c_sec.Misc.VirtualSize;
d_sec.SizeOfRawData = _align(data_buffer.size(), file_alignment_size);
d_sec.PointerToRawData = c_sec.PointerToRawData + c_sec.SizeOfRawData;
d_sec.Characteristics = IMAGE_SCN_CNT_INITIALIZED_DATA |
IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE;
- Update the generated PE headers. To do this, add the following code just after the section headers :
printf("[Information] Updating PE Information...\n");
nt_h.OptionalHeader.SizeOfImage =
_align(d_sec.VirtualAddress + d_sec.Misc.VirtualSize, memory_alignment_size);
nt_h.FileHeader.Characteristics = in_pe_nt_header->FileHeader.Characteristics;
nt_h.FileHeader.TimeDateStamp = in_pe_nt_header->FileHeader.TimeDateStamp;
nt_h.OptionalHeader.CheckSum = 0xFFFFFFFF;
nt_h.OptionalHeader.SizeOfCode = c_sec.SizeOfRawData;
nt_h.OptionalHeader.SizeOfInitializedData = d_sec.SizeOfRawData;
nt_h.OptionalHeader.Subsystem = in_pe_nt_header->OptionalHeader.Subsystem;
nt_h.OptionalHeader.AddressOfEntryPoint = 0x00005940;
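To get the entrypoint offset from the .map file (enable Linker > Debugging > Generate Map File), search for func_unpack (it may appear with a decorated C++ name): its Rva+Base address minus the image base is the value you need. Alternatively, you can copy AddressOfEntryPoint from unpacker_stub.exe using CFF Explorer. Either way, the value only stays valid here as long as the stub's .text section starts at RVA 0x1000, the same RVA we give c_sec.
- Now we need to find the unpacker stub signatures and patch them. Update the PE writer code to the following :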
printf("[Information] Writing Generated PE to Disk...\n");
fstream pe_writter;
pe_writter.open(output_pe_file, ios::binary | ios::out);
pe_writter.write((char*)&dos_h, sizeof dos_h);
pe_writter.write((char*)&nt_h, sizeof nt_h);
pe_writter.write((char*)&c_sec, sizeof c_sec);
pe_writter.write((char*)&d_sec, sizeof d_sec);
while (pe_writter.tellp() != c_sec.PointerToRawData) pe_writter.put(0x0);
DWORD data_ptr_sig = 0xAABBCCDD;
DWORD data_size_sig = 0xEEFFAADD;
DWORD actual_data_size_sig = 0xA0B0C0D0;
DWORD header_size_sig = 0xF0E0D0A0;
DWORD data_ptr_offset = _find(unpacker_stub, sizeof unpacker_stub, data_ptr_sig);
DWORD data_size_offset = _find(unpacker_stub, sizeof unpacker_stub, data_size_sig);
DWORD actual_data_size_offset = _find(unpacker_stub, sizeof unpacker_stub, actual_data_size_sig);
DWORD header_size_offset = _find(unpacker_stub, sizeof unpacker_stub, header_size_sig);
if (data_ptr_offset == (DWORD)-1 || data_size_offset == (DWORD)-1 ||
actual_data_size_offset == (DWORD)-1 || header_size_offset == (DWORD)-1)
{
CONSOLE_COLOR_ERROR;
printf("[Error] Unpacker stub signature not found, the stub cannot be patched.\n");
return EXIT_FAILURE;
}
printf("[Information] Signature A Found at : %X\n", data_ptr_offset);
printf("[Information] Signature B Found at : %X\n", data_size_offset);
printf("[Information] Signature C Found at : %X\n", actual_data_size_offset);
printf("[Information] Signature D Found at : %X\n", header_size_offset);
printf("[Information] Updating Offset Data...\n");
memcpy(&unpacker_stub[data_ptr_offset], &d_sec.VirtualAddress, sizeof(DWORD));
memcpy(&unpacker_stub[data_size_offset], &d_sec.SizeOfRawData, sizeof(DWORD));
DWORD pe_file_actual_size = (DWORD)input_pe_file_buffer.size();
memcpy(&unpacker_stub[actual_data_size_offset], &pe_file_actual_size, sizeof(DWORD));
memcpy(&unpacker_stub[header_size_offset], &nt_h.OptionalHeader.BaseOfCode, sizeof(DWORD));
printf("[Information] Writing Code Data...\n");
pe_writter.write((char*)&unpacker_stub, sizeof unpacker_stub);
printf("[Information] Writing Packed Data...\n");
size_t current_pos = pe_writter.tellp();
pe_writter.write((char*)data_buffer.data(), data_buffer.size());
while (pe_writter.tellp() != current_pos + d_sec.SizeOfRawData) pe_writter.put(0x0);
pe_writter.close();
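The patching step silently assumes each placeholder appears exactly once in the stub; if the optimizer emitted the same constant twice, _find would patch only the first copy. The sketch below counts matches before patching and refuses to patch when a signature is missing or ambiguous. count_dword and patch_dword are hypothetical helpers, and like the packer itself they assume a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Number of positions where the 4 bytes of 'sig' (host byte order) occur in 'data'
size_t count_dword(const std::vector<uint8_t>& data, uint32_t sig)
{
    size_t hits = 0;
    for (size_t i = 0; i + sizeof(sig) <= data.size(); i++)
        if (std::memcmp(&data[i], &sig, sizeof(sig)) == 0) hits++;
    return hits;
}

// Overwrite the single occurrence of 'sig' with 'value'; false if missing or ambiguous
bool patch_dword(std::vector<uint8_t>& data, uint32_t sig, uint32_t value)
{
    if (count_dword(data, sig) != 1) return false;
    for (size_t i = 0; i + sizeof(sig) <= data.size(); i++)
    {
        if (std::memcmp(&data[i], &sig, sizeof(sig)) == 0)
        {
            std::memcpy(&data[i], &value, sizeof(value));
            return true;
        }
    }
    return false;
}
```

In the packer, the stub array would be copied into a vector first, so a failed patch leaves the original bytes untouched.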
Here we go, now let's try the packer... and... Congrats! You made your first pe packer!
You can download the full source code of packer and unpacker stub here.
Packer : Dynamic Linking Support + Export Table Creation
Now our packer can pack an exe file and produce a new working exe file, but what if we want to pack a DLL along with its exports? To do this we need to create an export table for our output PE file and then redirect the calls to the actual module.
This process is not as easy as the previous parts. In fact, it's very complex and takes an iron brain to solve, but don't worry: I racked my brain to solve it for you, so let's start and add DLL support to our packer!
A) Update & Make the Unpacker Stub DLL-Friendly
Currently our unpacker stub code isn't designed to be a DLL entrypoint. We need to change it so that it passes the DLL initialization routine properly, and we also need to add two extra values, which are explained in the next section.
The new unpacker stub should look like this :
#include <Windows.h>
#include <winnt.h>
EXTERN_C IMAGE_DOS_HEADER __ImageBase;
EXTERN_C void crt_init();
EXTERN_C void k32_init();
extern "C"
{
#include "aes.h"
}
#include "lzma2\fast-lzma2.h"
#include "mmLoader.h"
#pragma comment(linker, "/merge:.rdata=.text")
#pragma comment(linker, "/merge:.data=.text")
EXTERN_C static volatile uintptr_t moduleImageBase = 0xBCEAEFBA;
EXTERN_C static volatile FARPROC functionForwardingPtr = (FARPROC)0xCAFEBABE;
EXTERN_C BOOL CallModuleEntry(void* pMemModule_d, DWORD dwReason);
HMEMMODULE pe_module = 0;
BOOL func_unpack(void*, int reason, void*)
{
if (reason == DLL_PROCESS_DETACH)
{ CallModuleEntry(pe_module, DLL_PROCESS_DETACH); FreeMemModule(pe_module); return TRUE; };
if (reason == DLL_THREAD_ATTACH) return CallModuleEntry(pe_module, DLL_THREAD_ATTACH);
if (reason == DLL_THREAD_DETACH) return CallModuleEntry(pe_module, DLL_THREAD_DETACH);
volatile PVOID data_ptr = (void*)0xAABBCCDD;
volatile DWORD data_size = 0xEEFFAADD;
volatile DWORD actual_data_size = 0xA0B0C0D0;
volatile DWORD header_size = 0xF0E0D0A0;
k32_init(); crt_init();
intptr_t imageBase = (intptr_t)&__ImageBase;
data_ptr = (void*)((intptr_t)data_ptr + imageBase);
struct AES_ctx ctx;
const unsigned char key[32] = {
0xD6, 0x23, 0xB8, 0xEF, 0x62, 0x26, 0xCE, 0xC3, 0xE2, 0x4C, 0x55, 0x12,
0x7D, 0xE8, 0x73, 0xE7, 0x83, 0x9C, 0x77, 0x6B, 0xB1, 0xA9, 0x3B, 0x57,
0xB2, 0x5F, 0xDB, 0xEA, 0x0D, 0xB6, 0x8E, 0xA2
};
const unsigned char iv[16] = {
0x18, 0x42, 0x31, 0x2D, 0xFC, 0xEF, 0xDA, 0xB6, 0xB9, 0x49, 0xF1, 0x0D,
0x03, 0x7E, 0x7E, 0xBD
};
AES_init_ctx_iv(&ctx, key, iv);
uint8_t* data_ptr_byte = (uint8_t*)data_ptr;
AES_CBC_decrypt_buffer(&ctx, data_ptr_byte, data_size);
uint8_t* code_buffer = (uint8_t*)malloc(actual_data_size);
FL2_decompress(code_buffer, actual_data_size, &data_ptr_byte[16], data_size - 32);
memset(data_ptr, 0, data_size);
DWORD pe_loader_result = 0;
pe_module = LoadMemModule(code_buffer, false, &pe_loader_result);
moduleImageBase = (uintptr_t)*pe_module;
functionForwardingPtr = 0;
return CallModuleEntry(pe_module, DLL_PROCESS_ATTACH);
}
Now let me explain the updated parts for you ;)
-
We changed the return type of func_unpack to BOOL and added the three parameters of a DllMain routine:
BOOL func_unpack(void*, int reason, void*)
-
We need to update the way we obtain the image base address. In an EXE we can simply use GetModuleHandle, but in a DLL we can't, because GetModuleHandle(NULL) returns the base address of the host executable rather than our DLL. Instead, we use the linker-provided __ImageBase symbol. We could also use the first parameter of func_unpack (hInstance), but that only works for DLLs; __ImageBase gives us the right value in any kind of PE file.
#include <winnt.h>
EXTERN_C IMAGE_DOS_HEADER __ImageBase;
...
intptr_t imageBase = (intptr_t)&__ImageBase;
-
We need control over when the entrypoint of our dynamically loaded module is called. To get it, we make a small change to mmLoader so that its CallModuleEntry function is public. Then we use it to call the entrypoint manually after loading the module from memory:
EXTERN_C BOOL CallModuleEntry(void* pMemModule_d, DWORD dwReason);
...
// In mmLoader: made public, taking a void* handle so the stub can call it
BOOL CallModuleEntry(void* pMemModule_d, DWORD dwReason)
{
PMEM_MODULE pMemModule = (PMEM_MODULE)pMemModule_d;
...
-
We must forward the remaining DLL notifications to the loaded module and free it on detach. Otherwise we risk memory leaks, crashes or data loss:
if (reason == DLL_PROCESS_DETACH)
{ CallModuleEntry(pe_module, DLL_PROCESS_DETACH); FreeMemModule(pe_module); return TRUE; };
if (reason == DLL_THREAD_ATTACH) return CallModuleEntry(pe_module, DLL_THREAD_ATTACH);
if (reason == DLL_THREAD_DETACH) return CallModuleEntry(pe_module, DLL_THREAD_DETACH);
-
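We added two static variables that we will use in the next steps. Their initial values are placeholder signatures that the packer searches for. Because the variables live at fixed RVAs inside the stub, code in other PE sections can access them.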
EXTERN_C static volatile uintptr_t moduleImageBase = 0xBCEAEFBA;
EXTERN_C static volatile FARPROC functionForwardingPtr = (FARPROC)0xCAFEBABE;
-
Finally, we updated the PE loading flow to set these values. What are these values for? Keep reading!
DWORD pe_loader_result = 0;
pe_module = LoadMemModule(code_buffer, false, &pe_loader_result);
moduleImageBase = (uintptr_t)*pe_module;
functionForwardingPtr = 0;
return CallModuleEntry(pe_module, DLL_PROCESS_ATTACH);
Don't forget to move pe_module out to global scope so it can be accessed on every event call. After you compile the unpacker stub, update the raw array in the packer project and update the entrypoint offset, the packer should work for both EXE and DLL files. Now let's move on to the export table.
B ) Adding Pattern Search in Packer for New Unpacker Stub Values
Alright, now head to packer.cpp and add the new pattern search code right after the line where we updated the entrypoint value:
nt_h.OptionalHeader.AddressOfEntryPoint = 0x00005F10;
DWORD imagebase_value_sig = 0xBCEAEFBA;
DWORD imageBaseValueOffset = _find(unpacker_stub, sizeof unpacker_stub, imagebase_value_sig);
if (imageBaseValueOffset != -1)
{
// Clear the 8-byte placeholder only once we know the signature was found
memset(&unpacker_stub[imageBaseValueOffset], 0, sizeof(uintptr_t));
printf("[Information] ImageBase Value Signature Found at : %X\n", imageBaseValueOffset);
}
DWORD forwarding_value_sig = 0xCAFEBABE;
DWORD forwarding_value_offset = _find(unpacker_stub, sizeof unpacker_stub, forwarding_value_sig);
if (forwarding_value_offset != -1)
{
memset(&unpacker_stub[forwarding_value_offset], 0, sizeof(FARPROC));
printf("[Information] Function Forwarding Value Signature Found at : %X\n", forwarding_value_offset);
}
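The _find helper isn't shown in this section. Here is a portable sketch of what such a search does, under stated assumptions. It scans the stub bytes for the placeholder's little-endian encoding and returns the offset, or -1 when the placeholder is missing. The placeholder is cleared only when it was found and fits in the buffer. find_dword and clear_placeholder are hypothetical names, not the author's helpers, and the sketch assumes a little-endian host, as on x86/x64:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: byte offset of the first occurrence of `sig` (read in host
// byte order, little-endian on x86/x64), or -1 when the placeholder is absent
long find_dword(const std::vector<uint8_t>& buf, uint32_t sig)
{
    for (size_t i = 0; i + sizeof(sig) <= buf.size(); i++)
    {
        uint32_t value;
        std::memcpy(&value, &buf[i], sizeof(value)); // unaligned-safe read
        if (value == sig) return static_cast<long>(i);
    }
    return -1;
}

// Hypothetical helper: zeroes `width` bytes at the placeholder, but only if it
// was found and the whole range fits inside the buffer
bool clear_placeholder(std::vector<uint8_t>& buf, uint32_t sig, size_t width)
{
    long offset = find_dword(buf, sig);
    if (offset < 0 || static_cast<size_t>(offset) + width > buf.size()) return false;
    std::memset(&buf[offset], 0, width);
    return true;
}
```

For example, a stub containing the bytes BA EF EA BC at offset 1 reports offset 1 for 0xBCEAEFBA. After the placeholder is cleared, a search for a missing signature simply returns false instead of writing at offset -1.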
C ) Adding Export Section/Table/Code Generation Step
Now it's time to add a step that detects whether we're packing a DLL file. Add the following code right after the pattern search:
IMAGE_SECTION_HEADER et_sec;
memset(&et_sec, 0, sizeof(IMAGE_SECTION_HEADER));
bool hasExports = false; vector<uint8_t> et_buffer;
if (isDLL)
{
}
D ) Extracting Export Information from Input PE File
We're all set to start working on exports. The first step is to find out whether the input PE file has any exports:
if (isDLL)
{
int export_section_index = -1; // -1 = not found; must be signed so the later "!= -1" check works
int export_section_raw_addr = -1;
IMAGE_DATA_DIRECTORY ex_table =
in_pe_nt_header->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];
if (ex_table.VirtualAddress != 0) hasExports = true;
printf("[Information] Has Exports : %s\n", BOOL_STR(hasExports));
if (hasExports)
{
printf("[Information] Creating Export Table...\n");
}
}
Now we get the RVA (Relative Virtual Address) of the input PE's export directory and calculate the virtual address at which our export section will be located:
DWORD e_dir_rva = ex_table.VirtualAddress;
// Our export section follows the data section, on a section-alignment boundary
DWORD et_sec_virtual_address = _align(d_sec.VirtualAddress + d_sec.Misc.VirtualSize, memory_alignment_size);
printf("[Information] Input PE File Section Count : %d\n", in_pe_nt_header->FileHeader.NumberOfSections);
Then we iterate over the input PE's sections and find the one that contains the DLL's export data:
#define GET_SECTION(h,s) ((uintptr_t)IMAGE_FIRST_SECTION(h) + ((s) * sizeof(IMAGE_SECTION_HEADER)))
...
for (size_t i = 0; i < in_pe_nt_header->FileHeader.NumberOfSections; i++)
{
IMAGE_SECTION_HEADER* get_sec = (PIMAGE_SECTION_HEADER)(GET_SECTION(in_pe_nt_header, i));
// Check the section's own range instead of peeking at the next header: this also
// works for the last section and for a directory that starts exactly at a section
DWORD sec_extent = get_sec->Misc.VirtualSize > get_sec->SizeOfRawData ?
get_sec->Misc.VirtualSize : get_sec->SizeOfRawData;
if (e_dir_rva >= get_sec->VirtualAddress &&
e_dir_rva - get_sec->VirtualAddress < sec_extent)
{
export_section_index = i; break;
}
}
printf("[Information] Export Section Found At %dth Section\n", export_section_index + 1);
if (export_section_index != -1)
{
}
Alright, let's talk about how we're going to generate the DLL exports before we step into the dragon's mouth...
E ) Understanding The Concept of DLL Export Forwarding
Before we continue, you need to know how DLL exports work and the design behind them. DLL exports are made of:
-
Export Directory: an entry in the optional header's data directory array that holds two values, the RVA of the export table and its size. From the RVA we can find out which section contains the export table.
-
Export Section: the section that contains the export table and the export data. It can also contain export code.
-
Export Table: a structure (IMAGE_EXPORT_DIRECTORY) with basic information about the DLL's exports: how many functions and names there are, and the RVAs of the export data arrays.
-
Export Data: the arrays the export table points to (function RVAs, name RVAs and name ordinals), plus the name strings themselves.
-
Name RVA: an RVA that points to the function's name, a null-terminated string.
-
Function RVA: an RVA that points to the function's machine code, i.e. the export code!
-
Export Code: the machine code of the exported functions. It can live in the .text section or in any other section; our packer generates it inside the export section itself from a base machine code template.
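To see how these pieces connect, here is a minimal sketch of looking up an export by name through the three export data arrays. ExportDirectory is a hypothetical struct that mirrors the field layout of IMAGE_EXPORT_DIRECTORY. The sketch assumes a flat buffer in which every RVA equals its byte offset, as in a mapped image, and a little-endian host. Note that the name ordinal array holds indices into the function array, so the i-th name does not necessarily belong to the i-th function:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical mirror of IMAGE_EXPORT_DIRECTORY's layout (40 bytes)
struct ExportDirectory
{
    uint32_t Characteristics;
    uint32_t TimeDateStamp;
    uint16_t MajorVersion;
    uint16_t MinorVersion;
    uint32_t Name;                  // RVA of the DLL name string
    uint32_t Base;                  // first ordinal number
    uint32_t NumberOfFunctions;
    uint32_t NumberOfNames;
    uint32_t AddressOfFunctions;    // RVA of uint32_t[NumberOfFunctions]
    uint32_t AddressOfNames;        // RVA of uint32_t[NumberOfNames], name RVAs
    uint32_t AddressOfNameOrdinals; // RVA of uint16_t[NumberOfNames], function indices
};

// Reads a value in host byte order; vector::at throws std::out_of_range if truncated
template <typename T>
T read_at(const std::vector<uint8_t>& image, size_t offset)
{
    T value;
    (void)image.at(offset + sizeof(T) - 1);
    std::memcpy(&value, &image[offset], sizeof(T));
    return value;
}

// Reads a null-terminated name, stopping at the end of the buffer if needed
std::string read_name(const std::vector<uint8_t>& image, size_t offset)
{
    std::string name;
    for (size_t i = offset; i < image.size() && image[i] != 0; i++)
        name.push_back(static_cast<char>(image[i]));
    return name;
}

// Function RVA exported under `name`, or 0 if there is no such export
uint32_t export_rva_by_name(const std::vector<uint8_t>& image, uint32_t dir_rva,
                            const std::string& name)
{
    ExportDirectory dir = read_at<ExportDirectory>(image, dir_rva);
    for (uint32_t i = 0; i < dir.NumberOfNames; i++)
    {
        uint32_t name_rva = read_at<uint32_t>(image, dir.AddressOfNames + 4ull * i);
        if (read_name(image, name_rva) != name) continue;
        // Name index -> function index -> function RVA
        uint16_t index = read_at<uint16_t>(image, dir.AddressOfNameOrdinals + 2ull * i);
        return read_at<uint32_t>(image, dir.AddressOfFunctions + 4ull * index);
    }
    return 0;
}
```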
NOTE: I said at the start that this article requires no assembly knowledge, but this part needs a little bit of assembly. Since the code is neither complex nor dynamic, we just use a small pre-generated piece of machine code.
What is function forwarding? In programming, function forwarding means jumping from one function to another without touching the function's parameters.
It can be done with several techniques, such as DLL hijacking, DLL proxying and machine code redirection. In our PE packer we generate a small piece of machine code (32 bytes) that reads the loaded module's image base address, adds the real function's offset to it and finally jumps to the result.
This is the assembly code we will use for function forwarding :
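PUSH RCX                                                       ; save the registers we use
PUSH RAX
MOV RAX,QWORD PTR DS:[(Image Base Address)]                    ; RIP-relative read of moduleImageBase
MOV ECX, (Function Offset)                                     ; RVA of the real function in the unpacked module
ADD RAX,RCX                                                    ; RAX = absolute address of the real function
MOV QWORD PTR DS:[(Function Offset + Image Base Address)],RAX  ; store it in functionForwardingPtr
POP RAX                                                        ; restore registers, arguments stay untouched
POP RCX
JMP QWORD PTR DS:[(Function Offset + Image Base Address)]      ; jump through functionForwardingPtr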
So once the unpacker has stored the dynamic module's image base in Image Base Address (moduleImageBase), each stub adds its Function Offset to it, stores the sum in the second static value holder, Function Offset + Image Base Address (functionForwardingPtr), and jumps through it. That's it!
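The three memory operands in this stub are RIP-relative. Each 32-bit displacement is added to the address of the next instruction, which in this template always starts right after the 4-byte displacement field. Below is a minimal sketch of patching one stub, under stated assumptions. The function names and the RVAs in the usage are hypothetical stand-ins for the article's variables, and a little-endian host is assumed for the memcpy writes:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using ForwardingStub = std::array<uint8_t, 32>;

// Displacement for a disp32 field at `field_offset` in a stub located at `stub_rva`;
// the next instruction starts right after the 4-byte field
int32_t rip_displacement(uint32_t target_rva, uint32_t stub_rva, uint32_t field_offset)
{
    uint32_t next_instruction_rva = stub_rva + field_offset + 4;
    return static_cast<int32_t>(target_rva - next_instruction_rva);
}

// Builds one forwarding stub from the 32-byte template used in this article
ForwardingStub build_forwarding_stub(uint32_t stub_rva, uint32_t image_base_var_rva,
                                     uint32_t forward_ptr_var_rva, uint32_t function_rva)
{
    ForwardingStub s = {
        0x51, 0x50,                    // push rcx; push rax
        0x48, 0x8B, 0x05, 0, 0, 0, 0,  // mov rax, [rip + disp32]
        0xB9, 0, 0, 0, 0,              // mov ecx, imm32 (zero-extends into rcx)
        0x48, 0x03, 0xC1,              // add rax, rcx
        0x48, 0x89, 0x05, 0, 0, 0, 0,  // mov [rip + disp32], rax
        0x58, 0x59,                    // pop rax; pop rcx
        0xFF, 0x25, 0, 0, 0, 0 };      // jmp qword ptr [rip + disp32]
    int32_t load_disp = rip_displacement(image_base_var_rva, stub_rva, 5);
    int32_t store_disp = rip_displacement(forward_ptr_var_rva, stub_rva, 20);
    int32_t jump_disp = rip_displacement(forward_ptr_var_rva, stub_rva, 28);
    std::memcpy(&s[5], &load_disp, 4);
    std::memcpy(&s[10], &function_rva, 4);
    std::memcpy(&s[20], &store_disp, 4);
    std::memcpy(&s[28], &jump_disp, 4);
    return s;
}
```

One consequence of this design is worth keeping in mind. Every stub writes the shared functionForwardingPtr slot and then jumps through it, so two threads that call different exports at the same moment can race on that slot.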
F ) Cloning Input PE Export Table, Make Changes and Rebase
Alright, now that you know how the gears work, it's time to start the hard part. Before going further, add these useful macros to ease the process:
#define GET_SECTION(h,s) ((uintptr_t)IMAGE_FIRST_SECTION(h) + ((s) * sizeof(IMAGE_SECTION_HEADER)))
#define RVA_TO_FILE_OFFSET(rva,membase,filebase) (((rva) - (membase)) + (filebase))
#define RVA2OFS_EXP(rva) (input_pe_file_buffer.data() + \
(RVA_TO_FILE_OFFSET(rva, in_pe_exp_sec->VirtualAddress, in_pe_exp_sec->PointerToRawData)))
#define REBASE_RVA(rva) (((rva) - in_pe_exp_sec->VirtualAddress + et_sec_virtual_address) - \
(e_dir_rva - in_pe_exp_sec->VirtualAddress))
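Because the macros hide their inputs, here is the REBASE_RVA arithmetic written as plain functions with explicit parameters (hypothetical names). The export section's base cancels out: every RVA keeps its distance from the start of the export directory, and the directory itself lands at the start of the new section. This is why the copied export data must keep the same byte distances from the directory as in the input file:

```cpp
#include <cassert>
#include <cstdint>

// Same formula as the REBASE_RVA macro, with every input made explicit
uint32_t rebase_rva_macro(uint32_t rva, uint32_t exp_sec_va, uint32_t e_dir_rva,
                          uint32_t new_sec_va)
{
    return (rva - exp_sec_va + new_sec_va) - (e_dir_rva - exp_sec_va);
}

// Algebraically equivalent form: only the distance from the directory start matters
uint32_t rebase_rva_simplified(uint32_t rva, uint32_t e_dir_rva, uint32_t new_sec_va)
{
    return rva - e_dir_rva + new_sec_va;
}
```

As a worked example, take an export directory at 0x2A000 inside a section at 0x20000 and a new export section at 0x80000. The function table at 0x2A028 (right after the 40-byte directory) is rebased to 0x80028, and the directory itself maps to 0x80000.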
Now we parse the input PE's export section like this, which gives us access to the export table data:
printf("[Information] Parsing Input PE Export Section...\n");
PIMAGE_SECTION_HEADER in_pe_exp_sec = (PIMAGE_SECTION_HEADER)(GET_SECTION(in_pe_nt_header, export_section_index));
PIMAGE_EXPORT_DIRECTORY e_dir = (PIMAGE_EXPORT_DIRECTORY)RVA2OFS_EXP(e_dir_rva);
DWORD e_dir_size = in_pe_nt_header->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].Size;
printf("[Information] Export Section Name : %.8s\n", (const char*)in_pe_exp_sec->Name); // Name may lack a terminating zero
PULONG in_et_fn_tab = (PULONG)RVA2OFS_EXP(e_dir->AddressOfFunctions);
PULONG in_et_name_tab = (PULONG)RVA2OFS_EXP(e_dir->AddressOfNames);
PUSHORT in_et_ordianl_tab = (PUSHORT)RVA2OFS_EXP(e_dir->AddressOfNameOrdinals);
uintptr_t in_et_data_start = (uintptr_t)in_et_fn_tab;
DWORD in_et_last_fn_name_size = strlen((char*)RVA2OFS_EXP(in_et_name_tab[e_dir->NumberOfNames - 1])) + 1;
uintptr_t in_et_data_end = (uintptr_t)(RVA2OFS_EXP(in_et_name_tab[e_dir->NumberOfNames - 1]) + in_et_last_fn_name_size);
Then we simply rebase them using our macro like this :
printf("[Information] Rebasing Export Table Addresses...\n");
e_dir->AddressOfFunctions = REBASE_RVA(e_dir->AddressOfFunctions);
e_dir->AddressOfNames = REBASE_RVA(e_dir->AddressOfNames);
e_dir->AddressOfNameOrdinals = REBASE_RVA(e_dir->AddressOfNameOrdinals);
// In MSVC-built DLLs the DLL name string sits inside the same export data block, so rebase it too
e_dir->Name = REBASE_RVA(e_dir->Name);
for (size_t i = 0; i < e_dir->NumberOfNames; i++) in_et_name_tab[i] = REBASE_RVA(in_et_name_tab[i]);
After rebasing, we copy the export directory structure into our new export table buffer:
et_buffer.resize(e_dir_size);
memcpy(et_buffer.data(), e_dir, sizeof IMAGE_EXPORT_DIRECTORY);
G ) Generating Exports Machine Code
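Now we're ready to generate machine code for our exports. To do this, add this small machine code template to your source code right after the helper macros:
// push rcx; push rax; mov rax,[rip+disp32]; mov ecx,imm32; add rax,rcx;
// mov [rip+disp32],rax; pop rax; pop rcx; jmp qword ptr [rip+disp32]
// Patched per export: disp32 at [5], imm32 at [10], disp32 at [20] and [28]
unsigned char func_forwarding_code[32] =
{
0x51, 0x50, 0x48, 0x8B, 0x05, 0x00, 0x00, 0x00, 0x00, 0xB9, 0x00, 0x00, 0x00, 0x00, 0x48, 0x03, 0xC1, 0x48, 0x89, 0x05, 0x00, 0x00, 0x00, 0x00, 0x58, 0x59, 0xFF, 0x25, 0x00, 0x00, 0x00, 0x00, };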
Next, we allocate a temporary buffer and calculate the RVA of the image base value, the RVA of the current code block and the relative offsets. Then we patch those values into the machine code template and append each patched copy to the temporary buffer. Once we're done, we add the whole buffer to our export section:
printf("[Information] Generating Function Forwarding Code...\n");
DWORD ff_code_buffer_size = sizeof func_forwarding_code * e_dir->NumberOfFunctions;
uint8_t* ff_code_buffer = (uint8_t*)malloc(ff_code_buffer_size);
DWORD image_base_rva = c_sec.VirtualAddress + imageBaseValueOffset;
DWORD ff_value_rva = c_sec.VirtualAddress + forwarding_value_offset;
for (size_t i = 0; i < e_dir->NumberOfFunctions; i++)
{
DWORD func_offset = in_et_fn_tab[i]; // original RVA of function i; the name ordinal table only has NumberOfNames entries, so don't index it with i
DWORD machine_code_offset = i * sizeof func_forwarding_code;
DWORD machine_code_rva = et_buffer.size() + machine_code_offset + et_sec_virtual_address;
int32_t* offset_to_image_base = (int32_t*)&func_forwarding_code[5];
int32_t* function_offset_value = (int32_t*)&func_forwarding_code[10];
int32_t* offset_to_func_addr = (int32_t*)&func_forwarding_code[20];
int32_t* offset_to_func_addr2 = (int32_t*)&func_forwarding_code[28];
offset_to_image_base[0] = (image_base_rva - machine_code_rva) - (5 + sizeof int32_t);
function_offset_value[0] = func_offset;
offset_to_func_addr[0] = (ff_value_rva - machine_code_rva) - (20 + sizeof int32_t);
offset_to_func_addr2[0] = (ff_value_rva - machine_code_rva) - (28 + sizeof int32_t);
memcpy(&ff_code_buffer[machine_code_offset], func_forwarding_code, sizeof func_forwarding_code);
in_et_fn_tab[i] = et_sec_virtual_address + et_buffer.size() + (i * sizeof func_forwarding_code);
}
DWORD et_data_size = in_et_data_end - in_et_data_start;
memcpy(&et_buffer.data()[sizeof IMAGE_EXPORT_DIRECTORY], (void*)in_et_data_start, et_data_size);
DWORD size_of_export_table = et_buffer.size();
et_buffer.resize(size_of_export_table + ff_code_buffer_size);
memcpy(&et_buffer.data()[size_of_export_table], (void*)ff_code_buffer, ff_code_buffer_size);
free(ff_code_buffer);
That's it! We've generated everything we need, and now we just have to create a new section header for the exports:
// "[ H.M ]" is 7 characters plus the terminating zero, exactly filling the 8-byte Name field
memcpy(et_sec.Name, "[ H.M ]", IMAGE_SIZEOF_SHORT_NAME);
et_sec.Misc.VirtualSize = _align(et_buffer.size(), memory_alignment_size);
et_sec.VirtualAddress = et_sec_virtual_address;
et_sec.SizeOfRawData = _align(et_buffer.size(), file_alignment_size);
et_sec.PointerToRawData = d_sec.PointerToRawData + d_sec.SizeOfRawData;
et_sec.Characteristics = IMAGE_SCN_CNT_INITIALIZED_DATA | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_CNT_CODE;
Then update the export entry of the data directory:
nt_h.OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress = et_sec.VirtualAddress;
nt_h.OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].Size = e_dir_size;
And update the section count and the size of the image:
nt_h.FileHeader.NumberOfSections = 3;
nt_h.OptionalHeader.SizeOfImage =
_align(et_sec.VirtualAddress + et_sec.Misc.VirtualSize, memory_alignment_size);
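The _align helper isn't shown in this section. Assuming it rounds a value up to the next multiple of the alignment, here is a minimal sketch of that arithmetic and of the SizeOfImage calculation above. align_up and size_of_image are hypothetical names. PE section and file alignments are powers of two, so a bit mask is enough:

```cpp
#include <cassert>
#include <cstdint>

// Rounds `value` up to the next multiple of `alignment` (a power of two)
uint32_t align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// SizeOfImage must reach the end of the last section, rounded up to SectionAlignment
uint32_t size_of_image(uint32_t last_section_va, uint32_t last_section_virtual_size,
                       uint32_t section_alignment)
{
    return align_up(last_section_va + last_section_virtual_size, section_alignment);
}
```

For example, a last section at 0x9000 with 0x280 bytes of content and a 0x1000 section alignment gives a SizeOfImage of 0xA000.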
And it's done! That wasn't so hard, right? 🍎
H ) Writing DLL Exports to PE File
Of course, we need a few changes to write the exports to the DLL file properly. Right after writing the data section to the file, we write the export section. current_pos also has to be declared once, before the first write, so that every write step can share it:
if (et_buffer.size() != 0 && hasExports)
{
printf("[Information] Writing Export Table Data...\n");
current_pos = pe_writter.tellp();
pe_writter.write((char*)et_buffer.data(), et_buffer.size());
while (pe_writter.tellp() != current_pos + et_sec.SizeOfRawData) pe_writter.put(0x0);
}
So the final PE file generation code should look like this:
printf("[Information] Writing Generated PE to Disk...\n");
fstream pe_writter;
size_t current_pos;
pe_writter.open(output_pe_file, ios::binary | ios::out);
pe_writter.write((char*)&dos_h, sizeof dos_h);
pe_writter.write((char*)&nt_h, sizeof nt_h);
pe_writter.write((char*)&c_sec, sizeof c_sec);
pe_writter.write((char*)&d_sec, sizeof d_sec);
if(nt_h.FileHeader.NumberOfSections == 3) pe_writter.write((char*)&et_sec, sizeof et_sec);
while (pe_writter.tellp() != c_sec.PointerToRawData) pe_writter.put(0x0);
DWORD data_ptr_sig = 0xAABBCCDD;
DWORD data_size_sig = 0xEEFFAADD;
DWORD actual_data_size_sig = 0xA0B0C0D0;
DWORD header_size_sig = 0xF0E0D0A0;
DWORD data_ptr_offset = _find(unpacker_stub, sizeof unpacker_stub, data_ptr_sig);
DWORD data_size_offset = _find(unpacker_stub, sizeof unpacker_stub, data_size_sig);
DWORD actual_data_size_offset = _find(unpacker_stub, sizeof unpacker_stub, actual_data_size_sig);
DWORD header_size_offset = _find(unpacker_stub, sizeof unpacker_stub, header_size_sig);
...
printf("[Information] Updating Offset Data...\n");
memcpy(&unpacker_stub[data_ptr_offset], &d_sec.VirtualAddress, sizeof DWORD);
memcpy(&unpacker_stub[data_size_offset], &d_sec.SizeOfRawData, sizeof DWORD);
DWORD pe_file_actual_size = (DWORD)input_pe_file_buffer.size();
memcpy(&unpacker_stub[actual_data_size_offset], &pe_file_actual_size, sizeof DWORD);
memcpy(&unpacker_stub[header_size_offset], &nt_h.OptionalHeader.BaseOfCode, sizeof DWORD);
printf("[Information] Writing Code Data...\n");
current_pos = pe_writter.tellp();
pe_writter.write((char*)&unpacker_stub, sizeof unpacker_stub);
while (pe_writter.tellp() != current_pos + c_sec.SizeOfRawData) pe_writter.put(0x0);
printf("[Information] Writing Packed Data...\n");
current_pos = pe_writter.tellp();
pe_writter.write((char*)data_buffer.data(), data_buffer.size());
while (pe_writter.tellp() != current_pos + d_sec.SizeOfRawData) pe_writter.put(0x0);
if (et_buffer.size() != 0 && hasExports)
{
printf("[Information] Writing Export Table Data...\n");
current_pos = pe_writter.tellp();
pe_writter.write((char*)et_buffer.data(), et_buffer.size());
while (pe_writter.tellp() != current_pos + et_sec.SizeOfRawData) pe_writter.put(0x0);
}
pe_writter.close();
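Each section above is written as "data, then zero padding up to SizeOfRawData". Here is a small sketch of that step as a function (write_section is a hypothetical name) that also rejects data larger than the section's raw size. With the `tellp() != ...` loop, such an overshoot would never land exactly on the target position, and the padding loop would never end:

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes `data`, then zero-pads up to `raw_size` bytes; fails if the data doesn't fit
bool write_section(std::ostream& out, const std::vector<uint8_t>& data, uint32_t raw_size)
{
    if (data.size() > raw_size) return false;
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size()));
    // Pad in one write instead of calling put(0) until tellp() matches
    std::string padding(raw_size - data.size(), '\0');
    out.write(padding.data(), static_cast<std::streamsize>(padding.size()));
    return static_cast<bool>(out);
}
```

The same function works with the fstream used above, because it only relies on std::ostream.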
Now our packer supports DLL files too! 🎉
You can download the full source code of the packer and the unpacker stub here.
Packer : File Version Generation
Alright, this is the last part of the article. Of course we could do a lot more and add even more features, but the article is already very long. In this last part we do some post-processing on our final PE file: we add file information and an icon.
Many libraries on GitHub can handle this part; I use my own resource library.
-
Link against utilities\hmrclib64_vc16.lib, which can be found in the next chapter's source code zip file.
-
Add the function declarations to packer.cpp right after the compression library headers:
void HMResKit_LoadPEFile(const char* peFile);
void HMResKit_SetFileInfo(const char* key, const char* value);
void HMResKit_SetPEVersion(const char* version);
void HMResKit_ChangeIcon(const char* iconPath);
void HMResKit_CommitChanges(const char* sectionName);
-
Add information and icon like this :
printf("[Information] Adding File Information and Icon...\n");
HMResKit_LoadPEFile(output_pe_file);
HMResKit_SetFileInfo("ProductName", "Custom PE Packer");
HMResKit_SetFileInfo("CompanyName", "MemarDesign™ LLC.");
HMResKit_SetFileInfo("LegalTrademarks", "MemarDesign™ LLC.");
HMResKit_SetFileInfo("Comments", "Developed by Hamid.Memar");
HMResKit_SetFileInfo("FileDescription", "A PE File Packed by HMPacker");
HMResKit_SetFileInfo("ProductVersion", "1.0.0.1");
HMResKit_SetFileInfo("FileVersion", "1.0.0.1");
HMResKit_SetFileInfo("InternalName", "packed-pe-file");
HMResKit_SetFileInfo("OriginalFilename", "packed-pe-file");
HMResKit_SetFileInfo("LegalCopyright", "Copyright MemarDesign™ LLC. © 2021-2022");
HMResKit_SetFileInfo("PrivateBuild", "Packed PE");
HMResKit_SetFileInfo("SpecialBuild", "Packed PE");
HMResKit_SetPEVersion("1.0.0.1");
if (!isDLL) HMResKit_ChangeIcon("app.ico");
HMResKit_CommitChanges("[ H.M ]");
NOTE: This article doesn't cover extracting the icon and file information from the input PE file. It isn't hard, but it requires parsing the resource section, and I don't want to make the article any longer, so I'll leave you with just one tip.
You can take a look at this handy article.
You can download the full source code of the final version of the packer and the unpacker stub here.
Packer : Extras + Improvement Tips
Here are some tips and extra guides for improving the PE packer.
Tip 1 : Updating Checksum
Once all post-processing is finished, it's time to update the PE file checksum, which is stored in:
OptionalHeader.CheckSum = 0xFFFFFFFF;
A valid checksum is very important for getting better results from malware scanners. You can check this article about checksum calculation for PE files.
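Here is a minimal sketch of the checksum algorithm, under stated assumptions. It sums the file's 16-bit little-endian words with end-around carry, treats the CheckSum field itself as zero, and finally adds the file length. The sketch assumes the whole file is in memory, a little-endian host, and an even e_lfanew, as in normal files. pe_checksum is a hypothetical name. On Windows, CheckSumMappedFile from imagehlp should produce the same value:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

uint32_t pe_checksum(const std::vector<uint8_t>& file)
{
    bool has_field = false;
    size_t field = 0;
    if (file.size() >= 0x40)
    {
        uint32_t e_lfanew;
        std::memcpy(&e_lfanew, &file[0x3C], sizeof(e_lfanew));
        // "PE\0\0" signature (4) + IMAGE_FILE_HEADER (20) + 64 bytes into the
        // optional header; CheckSum has this offset in both PE32 and PE32+
        field = static_cast<size_t>(e_lfanew) + 4 + 20 + 64;
        has_field = true;
    }
    uint32_t sum = 0;
    for (size_t i = 0; i < file.size(); i += 2)
    {
        // The two words of the CheckSum field are treated as zero
        if (has_field && (i == field || i == field + 2)) continue;
        uint32_t word = file[i];
        if (i + 1 < file.size()) word |= static_cast<uint32_t>(file[i + 1]) << 8; // odd tail byte stays alone
        sum += word;
        sum = (sum & 0xFFFF) + (sum >> 16); // fold the carry back into 16 bits
    }
    return sum + static_cast<uint32_t>(file.size());
}
```

The result goes into OptionalHeader.CheckSum, and it should be computed last, after every other change to the file.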
Tip 2 : Adding Code Signing and Signature
Our packed PE file doesn't follow the conventions of any well-known compiler, and this may cause some trouble with AVs. If you're a legitimate developer, code signing helps a lot: get a valid certificate and run signtool.exe in the post-processing code.
Tip 3 : Manifest Support
If you want to clone the input file's manifest into the packed PE, for example to keep details such as whether it requires admin privileges, you should parse the resource directory and extract the manifest from there.
Tip 4 : .NET Support
To add .NET support, you can go the hard way (manipulating the .NET PE structure) or use native CLR hosting, which is recommended. You can check out my article on CLR hosting. That article is old, though, and there are much better ways to host the CLR now; maybe I'll write an article on that in the future, who knows? :)
So: pack the .NET assembly into the data section and use CLR hosting in the unpacker stub to load it from memory. You can also use .NET Core hosting.
Tip 5 : Multi-Layer PE Packing
One of the nice things about our packer is that it doesn't mess with the input PE's structure to produce the packed file. This means you can run any other packer on the output as a second layer of compression/protection!
Yes! You can create your own protection and compression system and then apply a famous packer on top of it. An attacker then faces two phases of reverse engineering, which makes their life a little harder!
Another fun thing about our packer is that you can pack its output again with the same packer as many times as you like, and even randomize the key and IV each time!
Tip 6 : Higher PE Compression
Remember that packers give the best results on large PE files, because the unpacker stub has a size of its own. For example, if you pack a 1KB DLL the output is larger than the input, but if you pack a 100MB DLL you can get a much smaller packed file with a high compression ratio!
Even so, you can run UPX on the final packed PE file to compress the unpacker stub as well.
A Crazy Note for Crazy People :
If you want to go even crazier, get the UPX source code and customize it to add an extra encryption layer! :v
Tip 7 : Code Virtualization
To improve the security of your PE file, try products that offer code virtualization. Running the unpacker stub code inside such a virtual machine makes reverse engineering very difficult.
Extra : A Note on Relocation, Non-Standard PE Files
Our PE packer still needs more parts, such as relocation handling and support for non-ordinal function exports. It's highly recommended not to try the packer on non-standard or signed PE files such as d3dcompiler_47.dll.
You can add new features to the packer and contribute them to the HMPacker GitHub repository.
Extra : A Note on Multilingual PE Files
This packer has not been tested with multilingual PE files. In theory it should work fine, but don't use it on PE files without a backup. Full multilingual support requires some resource cloning.
Extra : Real World Test On Marmoset Toolbag 3
Let's try our PE packer on some AAA software, Marmoset Toolbag 3! Toolbag 3 has four PE files:
-
toolbag.exe : main application file with size of 19,763,288 bytes
-
substance_linker.dll : a library file with size of 378,368 bytes
-
substance_sse2_blend.dll : a library file with size of 958,976 bytes
-
python36.dll : python library with size of 3,555,992 bytes
OK, now let's try our packer on them...
pack_marmoset.bat :
"%cd%\pe_packer.exe" "%cd%\toolbag.exe" "%cd%\toolbag_packed.exe"
"%cd%\pe_packer.exe" "%cd%\substance_sse2_blend.dll" "%cd%\substance_sse2_blend_packed.dll"
"%cd%\pe_packer.exe" "%cd%\substance_linker.dll" "%cd%\substance_linker_packed.dll"
"%cd%\pe_packer.exe" "%cd%\python36.dll" "%cd%\python36_packed.dll"
Result :
Awesome! Our packer reduced toolbag.exe from 19,763,288 bytes to 5,169,152 bytes, about 26% of its original size! Let's test the software to see whether it works properly...
And it works perfectly! No crashes during use, no performance drop, and very clean...
Congrats Again 🍻
Extra : Checking Packed Binary With AVs
VirusTotal
Here's a scan with 67 AVs on VirusTotal. Only 2 AVs flagged the packed toolbag, both as false positives. This can be fixed by adding some fake, plain (unencrypted) software code to a fake section. Some AVs, especially AI-powered ones like SecureAge APEX, flag every PE file whose sections lack clear instructions, so our heavily encrypted file gets flagged. Adding some raw C++ code makes the flag go away.
BE NICE!
NOTE: This trick doesn't work on applications that contain real malicious code. Also, be a good person and don't use science against people; that's just not nice.
AntiScan
Here's a scan with 26 AVs on AntiScan, and none of the AVs flagged the packed toolbag!
Extra : Take a Closer Look at Packed Binary
Before finishing the article, let's take a closer, more technical look at the binary generated by our packer.
-
None of the well-known PE detectors recognized the packed toolbag:
-
Our PE file has custom sections, no import table and no imports from any other dependencies:
-
Our PE file has 99% entropy, which means it's heavily compressed.
-
Our packed PE has only ~12.5MB of memory overhead compared to the original executable.
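The entropy figure is presumably Shannon entropy. Many detectors report it per byte on a 0 to 8 bit scale, so "99%" would correspond to roughly 7.9 bits per byte. Here is a minimal sketch of that measurement (shannon_entropy is a hypothetical name):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Shannon entropy in bits per byte: 0.0 for constant data, 8.0 when all
// 256 byte values occur equally often
double shannon_entropy(const std::vector<uint8_t>& data)
{
    if (data.empty()) return 0.0;
    size_t counts[256] = {};
    for (uint8_t b : data) counts[b]++;
    double entropy = 0.0;
    for (size_t c : counts)
    {
        if (c == 0) continue;
        double p = static_cast<double>(c) / data.size();
        entropy -= p * std::log2(p);
    }
    return entropy;
}
```

Compressed or encrypted data scores close to 8.0, while ordinary machine code and resources usually score noticeably lower. This is why heuristic scanners treat a high value as a packing indicator.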
Bonus : Dark Version of the Article
The article is also available as a single HTML file with a dark GitHub theme here.
Credits
I hope you enjoyed the article and that it helps you learn more. Feel free to translate the article into your language; just don't forget to mention the link to the original on CodeProject and the author's name.
Licensed under the MIT License :
A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
Authored, Developed and Published By Hamid.Memar
13 November 2021