Introduction
In this article I will show how to write a file packer/unpacker and how to
make a self-extracting version of the archive (SFX).
Please note that this article and its code were written for learning purposes
rather than for complex functionality, so the following limitations apply:
- Only packing of files (binding them into one file) and no compression
- Packer doesn't pack files in subdirectories
- Packer header is not really optimized - just enough for our purposes
- All code presented here compiles as a console application and no GUI version
is provided
The Archive File Format
The idea is to build a structure/format that will allow us to hold a file
list and file contents in one file in such a way that we will be able to restore
the files to their original state.
This leads to the following design for the pack header:
- Signature - Offset 0x00/DWORD
  This occupies the first 4 bytes of the header. It contains a simple
  signature that allows us to identify our packed files.
- NumOfFiles - Offset 0x04/DWORD
  Here we store a DWORD holding the number of files in the archive.
- FilesInfo - Offset 0x08/NumOfFiles * sizeof(packdata_t)
  Here we start storing the file information in sequence, defined as the
  array packdata_t FilesInfo[NumOfFiles].
The packdata_t structure is defined as:
struct packdata_t
{
    char filename[MAX_PATH];
    long filesize;
};
As you may have noticed, we simply save the file's name and size. The
packdata_t structure is not the optimal way of storing file names or
information: every entry occupies MAX_PATH bytes for the name, however short
the name is. We could instead have used a variable-length packdata_t struct
defined as:
struct packdata_t
{
    long filesize;
    char filenameLength;
    char filename[1];
};
Managing this variable-length struct is, however, beyond the scope of this
article.
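For readers curious about the variable-length variant, here is a minimal sketch of one way such records could be written to and read from a byte buffer. It is not part of the article's code: the function names appendEntry and readEntry, the little-endian byte order, and the unsigned one-byte length are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Appends one record: 4-byte little-endian size, 1-byte name length, name bytes.
// (Hypothetical helper, not taken from the article's sources.)
void appendEntry(std::vector<unsigned char> &out,
                 const std::string &name, std::uint32_t size)
{
    assert(name.size() <= 255); // the length must fit in one byte
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<unsigned char>((size >> shift) & 0xFF));
    out.push_back(static_cast<unsigned char>(name.size()));
    out.insert(out.end(), name.begin(), name.end());
}

// Reads the record that starts at pos and advances pos past it.
// Returns false if the buffer does not hold a complete record.
bool readEntry(const std::vector<unsigned char> &in, std::size_t &pos,
               std::string &name, std::uint32_t &size)
{
    if (pos > in.size() || in.size() - pos < 5)
        return false; // not enough bytes for the fixed part
    std::size_t len = in[pos + 4];
    if (in.size() - pos - 5 < len)
        return false; // name is truncated
    size = 0;
    for (int i = 0; i < 4; ++i)
        size |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    name.assign(in.begin() + pos + 5, in.begin() + pos + 5 + len);
    pos += 5 + len;
    return true;
}
```

With this encoding a record for "a.txt" takes 10 bytes instead of MAX_PATH + 4, at the cost of having to walk the records one by one to find a given entry.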
After the pack header we have the files' contents stored in sequence. So the
whole archive file format will look like this:
Signature
NumOfFiles
packdata_t FilesInfo[NumOfFiles]
File1 content
File2 content
...
File(NumOfFiles) content
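Because the layout is strictly sequential, the position of every file's content follows from the header alone. The sketch below shows the arithmetic. The constant names and the function contentOffsets are hypothetical, and the entry size of 264 bytes assumes a Windows build where MAX_PATH is 260, long is 4 bytes, and the struct has no padding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical constants mirroring the layout described above.
const long kSignatureSize = 4;       // DWORD signature
const long kCountSize     = 4;       // DWORD NumOfFiles
const long kEntrySize     = 260 + 4; // MAX_PATH-byte name + 4-byte size (assumed)

// Returns the archive offset at which each file's content starts,
// given the file sizes in the order they appear in the header.
std::vector<long> contentOffsets(const std::vector<long> &fileSizes)
{
    long offset = kSignatureSize + kCountSize
                + kEntrySize * static_cast<long>(fileSizes.size());
    std::vector<long> offsets;
    for (std::size_t i = 0; i < fileSizes.size(); ++i)
    {
        assert(fileSizes[i] >= 0);
        offsets.push_back(offset); // this file starts where the previous one ended
        offset += fileSizes[i];
    }
    return offsets;
}
```

For three files of 100, 50 and 7 bytes, the header occupies 8 + 3 * 264 = 800 bytes, so the contents start at offsets 800, 900 and 950.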
Writing the Packer
In order to make the code a little extensible, I have defined a structure
that will hold callback functions triggered from inside the packer/unpacker
routines. These callbacks are used for visual notifications and updates.
The callback struct is defined as:
typedef struct
{
    void (*newfile)(char *name, long size);
    void (*fileprogress)(long pos);
} packcallbacks_t;
The newfile() callback is called whenever the packer/unpacker starts
processing a new file. It is passed the file's name and size.
The fileprogress() callback is called while a file is being processed. It is
passed the current position, in bytes, within that file.
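As an illustration, a console front end might implement the two callbacks as follows. This is only a sketch: the names onNewFile, onProgress, percentDone and makeConsoleCallbacks, as well as the two static variables, are hypothetical and do not come from the article's sources. The struct is repeated so the example stands alone.

```cpp
#include <cassert>
#include <cstdio>

// Same shape as the packcallbacks_t struct shown above.
typedef struct
{
    void (*newfile)(char *name, long size);
    void (*fileprogress)(long pos);
} packcallbacks_t;

static long g_currentSize = 0;  // size of the file currently being processed
static int  g_lastPercent = -1; // last percentage printed, to avoid repeats

// Converts a byte position into a percentage of the file size.
int percentDone(long pos, long total)
{
    if (total <= 0)
        return 100; // an empty file is complete immediately
    return static_cast<int>((pos * 100LL) / total);
}

void onNewFile(char *name, long size)
{
    assert(name != NULL);
    g_currentSize = size;
    g_lastPercent = -1;
    std::printf("\n%s (%ld bytes)\n", name, size);
}

void onProgress(long pos)
{
    int pct = percentDone(pos, g_currentSize);
    if (pct != g_lastPercent) // only redraw when the number changes
    {
        g_lastPercent = pct;
        std::printf("\r%3d%%", pct);
    }
}

// Bundles the two functions into the struct the packer/unpacker expects.
packcallbacks_t makeConsoleCallbacks()
{
    packcallbacks_t cb;
    cb.newfile = onNewFile;
    cb.fileprogress = onProgress;
    return cb;
}
```

Since fileprogress() receives only a position, the sketch remembers the size passed to newfile() in order to turn that position into a percentage.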
Now, let us define the packfilesEx() function prototype:
int packfilesEx(char *path, char *mask, char *archive,
                packcallbacks_t *pcb = NULL);
- We need a path that will designate the source directory.
- The mask, which will tell us what files to search for and pack.
- The archive, which will hold the archive file name.
- An optional pcb, which will hold a list of callbacks used for visual
  notifications.
Before going to the code, here is the packfilesEx() code flow:
1. Build a packdata_t array of all files to be packed (storing their names
   and sizes)
2. Create the archive file and write the Signature and file count into it
3. Write the packdata_t array into the archive
4. Read a file and write its content into the archive
5. Repeat step 4 until all files are stored
6. Close the archive file
This operation is enough to pack all files into one single archive file. Now
we go straight to the code:
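int packfilesEx(char *path, char *mask, char *archive, packcallbacks_t *pcb)
{
    // Note: this code assumes an ANSI (non-UNICODE) build
    TCHAR szCurDir[MAX_PATH];
    std::vector<packdata_t> filesList;

    // Remember the current directory, then switch to the source directory
    GetCurrentDirectory(MAX_PATH, szCurDir);
    if (!SetCurrentDirectory(path))
        return packerrorPath;

    // Collect the name and size of every matching file
    WIN32_FIND_DATA fd;
    HANDLE findHandle = FindFirstFile(mask, &fd);
    if (findHandle == INVALID_HANDLE_VALUE)
    {
        SetCurrentDirectory(szCurDir);
        return packerrorNoFiles;
    }
    do
    {
        // Skip directories
        if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
            continue;
        packdata_t pdata;
        memset(&pdata, 0, sizeof(pdata));
        strcpy(pdata.filename, fd.cFileName);
        pdata.filesize = fd.nFileSizeLow; // files of 2 GB or more are not supported
        filesList.push_back(pdata);
    } while (FindNextFile(findHandle, &fd));
    FindClose(findHandle);

    // The mask may have matched directories only
    if (filesList.empty())
    {
        SetCurrentDirectory(szCurDir);
        return packerrorNoFiles;
    }

    // Create the archive and write the signature and the file count
    FILE *fpArchive = fopen(archive, "wb");
    if (!fpArchive)
    {
        SetCurrentDirectory(szCurDir);
        return packerrorCannotCreateArchive;
    }
    long lTemp = 'KCPL'; // multi-character constant; its value is implementation-defined
    fwrite(&lTemp, sizeof(lTemp), 1, fpArchive);
    lTemp = (long)filesList.size();
    fwrite(&lTemp, sizeof(lTemp), 1, fpArchive);

    // Write the packdata_t array
    fwrite(&filesList[0], sizeof(packdata_t), filesList.size(), fpArchive);

    // Append the content of every file
    for (unsigned int cnt = 0; cnt < filesList.size(); cnt++)
    {
        FILE *inFile = fopen(filesList[cnt].filename, "rb");
        if (!inFile)
        {
            // No dedicated error code exists for an unreadable input file
            fclose(fpArchive);
            SetCurrentDirectory(szCurDir);
            return packerrorCannotCreateArchive;
        }
        long size = filesList[cnt].filesize;
        if (pcb && pcb->newfile)
            pcb->newfile(filesList[cnt].filename, size);
        long pos = 0;
        while (size > 0)
        {
            char buffer[4096];
            long toread = size > (long)sizeof(buffer) ? (long)sizeof(buffer) : size;
            fread(buffer, toread, 1, inFile);
            fwrite(buffer, toread, 1, fpArchive);
            pos += toread;
            size -= toread;
            if (pcb && pcb->fileprogress)
                pcb->fileprogress(pos);
        }
        fclose(inFile);
    }

    // Close the archive and restore the original directory
    fclose(fpArchive);
    SetCurrentDirectory(szCurDir);
    return packerrorSuccess;
}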
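The copy loop in packfilesEx() does not look at what fread() and fwrite() return, so a file that shrinks after it was listed, or a full disk, goes unnoticed. A minimal sketch of a checked copy helper is given below. The name copyBytes is hypothetical and the helper is not part of the article's sources; it only shows one way the loop could report short reads and writes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Copies up to `size` bytes from `in` to `out` in 4 KB chunks.
// Returns the number of bytes copied; a value smaller than `size`
// means the input ended early or an I/O error occurred.
long copyBytes(std::FILE *in, std::FILE *out, long size)
{
    assert(in != NULL && out != NULL);
    char buffer[4096];
    long copied = 0;
    while (copied < size)
    {
        long remaining = size - copied;
        std::size_t want = remaining > static_cast<long>(sizeof(buffer))
                         ? sizeof(buffer)
                         : static_cast<std::size_t>(remaining);
        std::size_t got = std::fread(buffer, 1, want, in);
        if (got == 0)
            break; // end of file or read error
        if (std::fwrite(buffer, 1, got, out) != got)
            break; // write error
        copied += static_cast<long>(got);
    }
    return copied;
}
```

A caller would compare the return value with the expected size and treat any difference as a failure.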
Writing the Unpacker
As the packing process has been explained in detail, the unpacking part
becomes more obvious; therefore, only the code flow will be presented:
- Open the archive file
- Read the pack header
- Verify the signature; if it is not valid, report and exit
- Having read the pack header (Signature, NumOfFiles, packdata_t array),
  start extracting the files
- Create a new file named packdata_t[idx].filename and write its contents
  from the archive file
- Process the next file
- Close the archive file and exit
int unpackfileEx(char *archive, char *dest, packcallbacks_t *pcb,
                 long startPos)
{
    FILE *fpArchive = fopen(archive, "rb");
    if (!fpArchive)
        return packerrorCouldNotOpenArchive;

    // For an SFX, the archive starts at startPos inside the executable
    if (startPos)
        fseek(fpArchive, startPos, SEEK_SET);

    // Read and verify the signature
    long signature = 0;
    fread(&signature, sizeof(signature), 1, fpArchive);
    if (signature != 'KCPL')
        return (fclose(fpArchive), packerrorNotAPackedFile);

    // Read the number of files
    long nFiles = 0;
    fread(&nFiles, sizeof(nFiles), 1, fpArchive);
    if (nFiles <= 0)
        return (fclose(fpArchive), packerrorNoFiles);

    // Read the packdata_t array
    std::vector<packdata_t> filesList(nFiles);
    fread(&filesList[0], sizeof(packdata_t), nFiles, fpArchive);

    for (unsigned int i = 0; i < filesList.size(); i++)
    {
        char Buffer[4096];
        packdata_t *pdata = &filesList[i];
        if (pcb && pcb->newfile)
            pcb->newfile(pdata->filename, pdata->filesize);

        // Build the output path, adding a separator if dest lacks one
        strcpy(Buffer, dest);
        size_t len = strlen(Buffer);
        if (len > 0 && Buffer[len - 1] != '\\' && Buffer[len - 1] != '/')
            strcat(Buffer, "\\");
        strcat(Buffer, pdata->filename);

        FILE *fpOut = fopen(Buffer, "wb");
        if (!fpOut)
            return (fclose(fpArchive), packerrorExtractError);

        // Copy the file's content out of the archive in chunks
        long size = pdata->filesize;
        long pos = 0;
        while (size > 0)
        {
            long toread = size > (long)sizeof(Buffer) ? (long)sizeof(Buffer) : size;
            fread(Buffer, toread, 1, fpArchive);
            fwrite(Buffer, toread, 1, fpOut);
            pos += toread;
            size -= toread;
            if (pcb && pcb->fileprogress)
                pcb->fileprogress(pos);
        }
        fclose(fpOut);
    }
    fclose(fpArchive);
    return packerrorSuccess;
}
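The header check can also be done byte by byte, which avoids depending on how a compiler evaluates the multi-character constant 'KCPL'. The sketch below is a hedged illustration, not the article's code. It assumes the packer was built with a compiler that evaluates 'KCPL' as 0x4B43504C and ran on a little-endian machine, so that an archive begins with the bytes L, P, C, K; this is worth verifying against a real archive. The enum and the function name readPackHeader are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical result codes for this sketch only.
enum HeaderStatus { headerOk, headerTruncated, headerBadSignature, headerEmpty };

// Reads the 8 fixed header bytes from the current position of fp:
// a 4-byte signature followed by a 4-byte little-endian file count.
HeaderStatus readPackHeader(std::FILE *fp, std::uint32_t &numFiles)
{
    assert(fp != NULL);
    unsigned char raw[8];
    if (std::fread(raw, 1, sizeof(raw), fp) != sizeof(raw))
        return headerTruncated;
    // Assumed on-disk form of the signature (see the note above).
    if (std::memcmp(raw, "LPCK", 4) != 0)
        return headerBadSignature;
    numFiles = static_cast<std::uint32_t>(raw[4])
             | (static_cast<std::uint32_t>(raw[5]) << 8)
             | (static_cast<std::uint32_t>(raw[6]) << 16)
             | (static_cast<std::uint32_t>(raw[7]) << 24);
    return numFiles == 0 ? headerEmpty : headerOk;
}
```

Checking that all 8 bytes were actually read also catches files that are too short to be archives, a case the unpacker above does not distinguish from a bad signature.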
Writing the Self-Extractor (SFX)
The SFX is simply a special version of the unpacker (we will call it
UnpackerStub) that, instead of taking the archive file name on the command
line, looks for an archive that is embedded in itself.
If you are a math geek you
can think of an SFX as "UnpackerStub.exe + Archive.bin = UnpackerArchive.exe".
Now, how do we embed the archive file into the unpacker to form an SFX?
In order to do that we need to write some information into the UnpackerStub
that will help it locate the Archive.bin body.
For this purpose I use the e_res2 field in the IMAGE_DOS_HEADER to store the
file offset of the archive data inside the unpacker stub.
Every executable has a well-documented format that tells the OS how to load
and run it. The IMAGE_DOS_HEADER (defined in WINNT.H) is located at offset
zero of every executable and has the following fields:
typedef struct _IMAGE_DOS_HEADER {
    WORD e_magic;
    WORD e_cblp;
    WORD e_cp;
    WORD e_crlc;
    WORD e_cparhdr;
    WORD e_minalloc;
    WORD e_maxalloc;
    WORD e_ss;
    WORD e_sp;
    WORD e_csum;
    WORD e_ip;
    WORD e_cs;
    WORD e_lfarlc;
    WORD e_ovno;
    WORD e_res[4];
    WORD e_oemid;
    WORD e_oeminfo;
    WORD e_res2[10];
    LONG e_lfanew;
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
I store the file offset of the archive in the e_res2 field, which is 20 bytes
long and therefore large enough to hold a DWORD. The archive content itself is
appended to the UnpackerStub, so that it starts at the stored offset.
Two functions have been written to get/store the offset of the archive data:
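int SfxSetInsertPos(char *filename, long pos)
{
    FILE *fp = fopen(filename, "rb+");
    if (fp == NULL)
        return packerrorCouldNotOpenArchive;

    // Read the DOS header found at offset zero of the executable
    IMAGE_DOS_HEADER idh;
    if (fread((void *)&idh, sizeof(idh), 1, fp) != 1)
        return (fclose(fp), packerrorCouldNotOpenArchive);

    // Store the archive position in the first 4 bytes of e_res2
    *(long *)&idh.e_res2[0] = pos;

    // Write the updated header back
    rewind(fp);
    fwrite((void *)&idh, sizeof(idh), 1, fp);
    fclose(fp);
    return packerrorSuccess;
}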
This function stores the offset. It first reads the header, updates the e_res2
field, then writes the header back.
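int SfxGetInsertPos(char *filename, long *pos)
{
    FILE *fp = fopen(filename, "rb");
    if (fp == NULL)
        return packerrorCouldNotOpenArchive;

    // Read the DOS header found at offset zero of the executable
    IMAGE_DOS_HEADER idh;
    if (fread((void *)&idh, sizeof(idh), 1, fp) != 1)
        return (fclose(fp), packerrorCouldNotOpenArchive);
    fclose(fp);

    // The archive position lives in the first 4 bytes of e_res2
    *pos = *(long *)&idh.e_res2[0];
    return packerrorSuccess;
}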
This function reads the header and extracts the value from the e_res2 field.
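To make the byte layout concrete, the sketch below mirrors the header with fixed-width types and checks where e_res2 sits. It is an illustration under stated assumptions, not the article's code: the names DosHeader, setInsertPos and getInsertPos are hypothetical, WORD is taken to be 16 bits and LONG 32 bits, and memcpy() is used in place of the pointer cast purely as a design choice for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Portable mirror of IMAGE_DOS_HEADER (hypothetical name).
struct DosHeader
{
    std::uint16_t e_magic, e_cblp, e_cp, e_crlc, e_cparhdr;
    std::uint16_t e_minalloc, e_maxalloc, e_ss, e_sp, e_csum;
    std::uint16_t e_ip, e_cs, e_lfarlc, e_ovno;
    std::uint16_t e_res[4];
    std::uint16_t e_oemid, e_oeminfo;
    std::uint16_t e_res2[10]; // 20 reserved bytes
    std::int32_t  e_lfanew;
};

// 18 WORDs plus e_res[4] precede e_res2, hence offset 40 (0x28).
static_assert(sizeof(DosHeader) == 64, "the DOS header is 64 bytes long");
static_assert(offsetof(DosHeader, e_res2) == 0x28, "e_res2 starts at 0x28");

// Stores a 32-bit archive position in the first 4 bytes of e_res2.
void setInsertPos(DosHeader &hdr, std::int32_t pos)
{
    assert(pos >= 0);
    std::memcpy(hdr.e_res2, &pos, sizeof(pos));
}

// Reads the position back from the first 4 bytes of e_res2.
std::int32_t getInsertPos(const DosHeader &hdr)
{
    std::int32_t pos;
    std::memcpy(&pos, hdr.e_res2, sizeof(pos));
    return pos;
}
```

Only the first two WORDs of e_res2 are touched, so e_magic and e_lfanew, which the loader relies on, keep their values.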
In short, the unpacker stub works like this:
- Call SfxGetInsertPos() to get the position of the archive file
- Call unpackfileEx(), passing the position (start of the embedded
  archive.bin) and the archive filename, which is the stub itself (obtained
  by calling GetModuleFileName(NULL, ...))
Next, here is how the Packer builds the SFX:
// Make sure the SFX stub exists
if (GetFileAttributes(sfxStubFile) == INVALID_FILE_ATTRIBUTES)
{
    printf("SFX stub file not found!\n");
    return 1;
}

// Open the archive and determine its size
FILE *fpArc = fopen(argv[3], "rb");
if (!fpArc)
{
    printf("Failed to open archive!\n");
    return 1;
}
fseek(fpArc, 0, SEEK_END);
long arcSize = ftell(fpArc);
rewind(fpArc);

// The SFX starts out as a copy of the stub
char sfxName[MAX_PATH];
strcpy(sfxName, argv[3]);
strcat(sfxName, ".sfx.exe");
if (!CopyFile(sfxStubFile, sfxName, FALSE))
{
    fclose(fpArc);
    printf("Could not create SFX file!\n");
    return 1;
}
FILE *fpSfx = fopen(sfxName, "rb+");
if (!fpSfx)
{
    fclose(fpArc);
    printf("Could not open SFX file!\n");
    return 1;
}

// The archive will start where the stub currently ends
fseek(fpSfx, 0, SEEK_END);
long sfxSize = ftell(fpSfx);

// Append the archive to the stub
char Buffer[4096 * 2];
while (arcSize > 0)
{
    long rw = arcSize > (long)sizeof(Buffer) ? (long)sizeof(Buffer) : arcSize;
    fread(Buffer, rw, 1, fpArc);
    fwrite(Buffer, rw, 1, fpSfx);
    arcSize -= rw;
}
fclose(fpArc);
fclose(fpSfx);

// Record the archive position in the stub's DOS header,
// then delete the standalone archive
SfxSetInsertPos(sfxName, sfxSize);
DeleteFile(argv[3]);
printf("SFX created: %s\n", sfxName);
That's all!
Using the Code and Binaries
The article comes with Packer.cpp and Unpacker.cpp, two
examples demonstrating how to use the pack and unpack functionality.
Packer.exe usage
You should always specify full paths because relative paths are not
currently supported.
c:\>packer e:\temp\bc *.* e:\test.bin
This will pack the contents of e:\temp\bc\*.* into the archive e:\test.bin.
If you add 'sfx' as a fourth argument:
c:\>packer e:\temp\bc *.* e:\test.bin sfx
an SFX named e:\test.bin.sfx.exe will be created.
Unpacker.exe usage
Make sure you specify a valid output directory:
c:\>unpacker e:\test.bin e:\out
This will unpack contents of e:\test.bin to e:\out\
Sfx.exe usage
The sfx takes only one parameter which is the destination directory.
c:\>sfx.exe e:\out
This will extract to e:\out\
Final Notes
I hope you enjoyed reading this article and learned something new.