
How to Write a Simple Packer/Unpacker with a Self-Extractor (SFX)

21 Sep 2003 1  
An example of writing a self-extracting archive using pack and unpack routines.

Introduction

In this article I will show how to write a file packer/unpacker and how to make a self-extracting version of the archive (SFX).

Please note that this article and its code were written for learning purposes rather than for complete functionality, so the following limitations apply:

  • Only packing of files (binding them into one file) and no compression
  • Packer doesn't pack files in subdirectories
  • Packer header is not really optimized - just enough for our purposes
  • All code presented here compiles as a console application and no GUI version is provided

The Archive File Format

The idea is to build a structure/format that will allow us to hold a file list and file contents in one file in such a way that we will be able to restore the files to their original state.

Hence the following design of the pack header:

  • Signature - Offset 0x00/DWORD
    This occupies the first 4 bytes of the header. It contains a simple signature that allows us to identify our packed files.

  • NumOfFiles - Offset 0x04/DWORD
    Here we store a DWORD holding the number of files in the archive.

  • FilesInfo - Offset 0x08/NumOfFiles * sizeof(packdata_t)
    Here we start storing the file information in a sequence defined as the array packdata_t FileInfo[NumOfFiles].

    The packdata_t structure is defined as:

    struct packdata_t
    {
      char filename[MAX_PATH];
      long filesize;
    };

    As you can see, we simply save the file's name and size. The packdata_t structure is not the optimal way of storing file names or information, because every entry reserves MAX_PATH bytes for the name no matter how short it is. We could instead have used a variable-length packdata_t struct defined as:

    struct packdata_t
    {
      long filesize;
      // Other file info, such as creation date, attributes, ...

      char filenameLength;
      char filename[1]; // first byte of a name that is filenameLength bytes long
    };

    But, of course, managing this last struct is beyond the scope of this article.
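
The signature used later in the code is the multi-character constant 'KCPL', whose value is implementation-defined. As a minimal sketch, assuming a little-endian machine and a compiler that evaluates 'KCPL' as 0x4B43504C (MSVC and GCC are believed to do so), the same value can be built explicitly. The names makeSignature, kPackSignature and storeSignature are illustrative and not part of the article's source.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: packs four characters with the first character in the
// highest byte, which is how the constant 'KCPL' is assumed to be evaluated.
constexpr std::uint32_t makeSignature(char a, char b, char c, char d)
{
  return (std::uint32_t(static_cast<unsigned char>(a)) << 24) |
         (std::uint32_t(static_cast<unsigned char>(b)) << 16) |
         (std::uint32_t(static_cast<unsigned char>(c)) << 8)  |
          std::uint32_t(static_cast<unsigned char>(d));
}

constexpr std::uint32_t kPackSignature = makeSignature('K', 'C', 'P', 'L');

// Stores the signature in the first four bytes of a header buffer, lowest
// byte first (little-endian), independent of the host's byte order.
inline void storeSignature(unsigned char *header)
{
  assert(header != nullptr);
  for (int i = 0; i < 4; ++i)
    header[i] = static_cast<unsigned char>((kPackSignature >> (8 * i)) & 0xFF);
}
```

Written this way, a hex dump of an archive would start with the bytes "LPCK", which matches the "(L-PCK)" remark in the packer source.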

After the pack header we have the files' contents stored in sequence. So the whole archive file format will look like this:

Signature
NumOfFiles
packdata_t Files[NumOfFiles]
File1 content
File2 content
.
.
.
File(NumOfFiles) content
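
Because every header record has a fixed size, the position of each file's content follows from the header alone. The sketch below illustrates that arithmetic under a few assumptions: 4-byte Signature and NumOfFiles fields, a constant kMaxPath of 260 standing in for MAX_PATH, and a 4-byte size field (as long is on Windows). The function contentOffsets is a hypothetical helper, not part of the article's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::size_t kMaxPath = 260;   // stand-in for MAX_PATH (assumption)

struct packdata_t
{
  char filename[kMaxPath];
  std::int32_t filesize;            // 4 bytes, like 'long' on Windows
};

// Returns, for each entry, the archive offset at which its content begins.
std::vector<std::size_t> contentOffsets(const std::vector<packdata_t> &files)
{
  // Signature (4 bytes) + NumOfFiles (4 bytes) + the packdata_t array
  std::size_t offset = 8 + files.size() * sizeof(packdata_t);

  std::vector<std::size_t> offsets;
  for (const packdata_t &f : files)
  {
    assert(f.filesize >= 0);
    offsets.push_back(offset);
    offset += static_cast<std::size_t>(f.filesize);  // next file follows directly
  }
  return offsets;
}
```

With these assumptions each record takes 264 bytes, so an archive holding two files starts its first file content at offset 8 + 2 * 264 = 536.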

Writing the Packer

In order to make the code a little extensible, I have defined a structure that will hold callback functions triggered from inside the packer/unpacker routines. These callbacks are used for visual notifications and updates.

The callback struct is defined as:

typedef struct
{
  void (*newfile)(char *name, long size);
  void (*fileprogress)(long pos);
} packcallbacks_t;

The newfile() callback is called whenever the packer/unpacker starts processing a new file. It is passed the file's name and size.

The fileprogress() callback is called after each chunk of the current file has been copied. It is passed the number of bytes of that file processed so far.
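
As an illustration of how a console front end might use these callbacks, here is a small sketch. The struct is repeated so the example stands alone; g_currentSize, onNewFile, onProgress, percentDone and makeConsoleCallbacks are hypothetical names chosen for this example and are not taken from the article's download.

```cpp
#include <cassert>
#include <cstdio>

typedef struct
{
  void (*newfile)(char *name, long size);
  void (*fileprogress)(long pos);
} packcallbacks_t;

// Size of the file currently being processed, remembered by onNewFile()
// so that onProgress() can turn a position into a percentage.
static long g_currentSize = 0;

// Converts a byte position into a percentage; empty files count as complete.
static int percentDone(long pos, long size)
{
  assert(pos >= 0);
  return size > 0 ? static_cast<int>(static_cast<long long>(pos) * 100 / size)
                  : 100;
}

static void onNewFile(char *name, long size)
{
  g_currentSize = size;
  std::printf("\n%s (%ld bytes)\n", name, size);
}

static void onProgress(long pos)
{
  std::printf("\r  %3d%%", percentDone(pos, g_currentSize));
}

// Fills in a callback table that can be passed as the pcb argument.
static packcallbacks_t makeConsoleCallbacks()
{
  packcallbacks_t cb;
  cb.newfile = onNewFile;
  cb.fileprogress = onProgress;
  return cb;
}
```

Since the callbacks are plain function pointers with no user-data argument, any state they share (here the current file size) has to live in a global or static variable.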

Now, let us define the packfilesEx() function prototype:

int packfilesEx(char *path, char *mask, char *archive,
  packcallbacks_t * pcb = NULL);

  • path designates the source directory.
  • mask tells us which files to search for and pack.
  • archive holds the archive file name.
  • pcb (optional) holds a list of callbacks used for visual notifications.

Before going to the code, here is the packfilesEx() code flow:

  1. Build packdata_t array of all files to be packed (storing their names and size)
  2. Create the archive file and write in it the Signature and file count
  3. Write the packdata_t array into the archive
  4. Start reading every file and write its content in the archive
  5. Loop (4) until all files are stored
  6. Close the archive file

This operation is enough to pack all files into one single archive file. Now we go straight to the code:

int packfilesEx(char *path, char *mask, char *archive, packcallbacks_t *pcb)
{
  TCHAR szCurDir[MAX_PATH];

  // vector that will hold the packdata_t array
  // (std::vector stores its elements in contiguous memory)
  std::vector<packdata_t> filesList;

  // save the current directory so that it can be restored on exit
  GetCurrentDirectory(MAX_PATH, szCurDir);

  // make sure the source directory is valid and switch to it
  if (!SetCurrentDirectory(path))
    return packerrorPath;

  WIN32_FIND_DATA fd;
  HANDLE findHandle;
  packdata_t pdata;

  findHandle = FindFirstFile(mask, &fd);
  if (findHandle == INVALID_HANDLE_VALUE)
  {
    SetCurrentDirectory(szCurDir);
    return packerrorNoFiles;
  }

  long lTemp;

  // this loop stores the files' header entries only;
  // directories are skipped
  do
  {
    // skip directory entries
    if ((fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
      == FILE_ATTRIBUTE_DIRECTORY)
      continue;

    // clear record
    memset(&pdata, 0, sizeof(pdata));

    // fill packdata entry (only the low 32 bits of the size are kept,
    // so files of 2 GB or more are not supported)
    strcpy(pdata.filename, fd.cFileName);
    pdata.filesize = fd.nFileSizeLow;

    // save entry
    filesList.push_back(pdata);
  } while(FindNextFile(findHandle, &fd));
  FindClose(findHandle);

  // the mask matched nothing but directories?
  if (filesList.empty())
  {
    SetCurrentDirectory(szCurDir);
    return packerrorNoFiles;
  }

  FILE *fpArchive = fopen(archive, "wb");
  if (!fpArchive)
  {
    SetCurrentDirectory(szCurDir);
    return packerrorCannotCreateArchive;
  }

  // write signature
  lTemp = 'KCPL'; // lallous pack! (L-PCK)
  fwrite(&lTemp, sizeof(lTemp), 1, fpArchive);

  // write entries count
  lTemp = (long)filesList.size();
  fwrite(&lTemp, sizeof(lTemp), 1, fpArchive);

  // store all file entries in one call (possible because std::vector
  // stores its elements contiguously)
  fwrite(&filesList[0], sizeof(pdata), filesList.size(), fpArchive);

  // copy the content of every file into the archive
  for (unsigned int cnt=0;cnt<filesList.size();cnt++)
  {
    FILE *inFile = fopen(filesList[cnt].filename, "rb");
    long size = filesList[cnt].filesize;

    // the header has already been written, so a file that cannot be
    // opened would leave the archive inconsistent: give up
    if (!inFile)
    {
      fclose(fpArchive);
      SetCurrentDirectory(szCurDir);
      return packerrorCannotCreateArchive;
    }

    // if callback assigned then trigger it
    if (pcb && pcb->newfile)
      pcb->newfile(filesList[cnt].filename, size);

    // copy file contents
    long pos = 0;
    while (size > 0)
    {
      char buffer[4096];
      long toread = size > (long)sizeof(buffer) ? (long)sizeof(buffer) : size;
      fread(buffer, toread, 1, inFile);
      fwrite(buffer, toread, 1, fpArchive);
      pos += toread;
      size -= toread;
      if (pcb && pcb->fileprogress)
        pcb->fileprogress(pos);
    }
    fclose(inFile);
  }

  // close archive and restore working directory
  fclose(fpArchive);
  SetCurrentDirectory(szCurDir);
  return packerrorSuccess;
}
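
The chunked copy loop at the heart of the packer ignores the results of fread() and fwrite(). A possible way to factor it out with error checking is sketched below; copyBytes is a hypothetical helper that is not part of the article's source, and it uses only standard C I/O so it is independent of the Win32 calls above.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: copies exactly 'size' bytes from 'in' to 'out' in
// 4 KB chunks, reporting the running position through 'progress' if given.
// Returns true only if every read and write succeeded.
bool copyBytes(FILE *in, FILE *out, long size, void (*progress)(long) = nullptr)
{
  assert(in != nullptr && out != nullptr);
  char buffer[4096];
  long pos = 0;
  while (pos < size)
  {
    long remaining = size - pos;
    size_t chunk = remaining > (long)sizeof(buffer) ? sizeof(buffer)
                                                    : (size_t)remaining;
    if (std::fread(buffer, 1, chunk, in) != chunk)
      return false;                 // source is shorter than expected
    if (std::fwrite(buffer, 1, chunk, out) != chunk)
      return false;                 // disk full or other write error
    pos += (long)chunk;
    if (progress)
      progress(pos);
  }
  return true;
}
```

The same helper could serve the unpacker and the SFX builder, which repeat this loop with different buffer sizes.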

Writing the Unpacker

Since the packing process has been explained in detail, the unpacking part should be straightforward; therefore, only the code flow is presented:

  1. Open archive file
  2. Read pack header
  3. Verify signature - if not valid - report and exit
  4. Having read the pack header (Signature, NumOfFiles, packdata_t array) start extracting the files
  5. Create a new file named after the current entry's filename and write its contents from the archive file
  6. Process next file
  7. Close archive file and exit

int unpackfileEx(char *archive, char *dest, packcallbacks_t * pcb,
  long startPos)
{
  FILE *fpArchive = fopen(archive, "rb");

  // failed to open archive?
  if (!fpArchive)
    return packerrorCouldNotOpenArchive;

  long nFiles;

  // skip to the start of the archive data (non-zero for an SFX)
  if (startPos)
    fseek(fpArchive, startPos, SEEK_SET);

  // read signature (nFiles temporarily holds it)
  if (fread(&nFiles, sizeof(nFiles), 1, fpArchive) != 1 || nFiles != 'KCPL')
    return (fclose(fpArchive), packerrorNotAPackedFile);

  // read files entries count
  if (fread(&nFiles, sizeof(nFiles), 1, fpArchive) != 1)
    return (fclose(fpArchive), packerrorNotAPackedFile);

  // no files?
  if (nFiles <= 0)
    return (fclose(fpArchive), packerrorNoFiles);

  // read all files entries
  std::vector<packdata_t> filesList(nFiles);
  fread(&filesList[0], sizeof(packdata_t), nFiles, fpArchive);

  // loop over all files
  for (unsigned int i=0;i<filesList.size();i++)
  {
    FILE *fpOut;
    char Buffer[4096];
    packdata_t *pdata = &filesList[i];

    // trigger callback
    if (pcb && pcb->newfile)
      pcb->newfile(pdata->filename, pdata->filesize);

    // build the output path, adding a separator if dest lacks one
    strcpy(Buffer, dest);
    size_t len = strlen(Buffer);
    if (len > 0 && Buffer[len - 1] != '\\' && Buffer[len - 1] != '/')
      strcat(Buffer, "\\");
    strcat(Buffer, pdata->filename);
    fpOut = fopen(Buffer, "wb");
    if (!fpOut)
      return (fclose(fpArchive), packerrorExtractError);

    // copy the file contents in Buffer-sized chunks
    long size = pdata->filesize;
    long pos = 0;
    while (size > 0)
    {
      long toread = size > (long)sizeof(Buffer) ? (long)sizeof(Buffer) : size;
      fread(Buffer, toread, 1, fpArchive);
      fwrite(Buffer, toread, 1, fpOut);
      pos += toread;
      size -= toread;
      if (pcb && pcb->fileprogress)
        pcb->fileprogress(pos);
    }
    fclose(fpOut);
  }
  fclose(fpArchive);
  return packerrorSuccess;
}
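
Steps 2 and 3 of the flow (read the pack header, verify the signature) can also be isolated into a routine that checks every read. The sketch below is one possible shape, assuming a little-endian host, a 260-byte name field standing in for MAX_PATH, and 4-byte integers; readPackHeader and the HeaderStatus values are hypothetical names, not the article's error codes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

struct packdata_t
{
  char filename[260];               // 260 stands in for MAX_PATH (assumption)
  std::int32_t filesize;
};

enum HeaderStatus { headerOk, headerNotAPackedFile, headerTruncated };

// Reads Signature, NumOfFiles and the entry array from the current position.
HeaderStatus readPackHeader(FILE *fp, std::vector<packdata_t> &files)
{
  assert(fp != nullptr);
  std::uint32_t signature = 0, count = 0;

  if (std::fread(&signature, sizeof(signature), 1, fp) != 1 ||
      signature != 0x4B43504Cu)     // assumed value of 'KCPL'
    return headerNotAPackedFile;
  if (std::fread(&count, sizeof(count), 1, fp) != 1)
    return headerTruncated;

  // Read entries one at a time so that a corrupt count fails on a short
  // read instead of triggering a huge allocation up front.
  files.clear();
  for (std::uint32_t i = 0; i < count; ++i)
  {
    packdata_t entry;
    if (std::fread(&entry, sizeof(entry), 1, fp) != 1 || entry.filesize < 0)
      return headerTruncated;
    entry.filename[sizeof(entry.filename) - 1] = '\0';  // force termination
    files.push_back(entry);
  }
  return headerOk;
}
```

Forcing a terminating zero on each name guards the later string operations against a damaged archive whose name field is not terminated.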

Writing the Self-Extractor (SFX)

The SFX is simply a special version of the unpacker (we will call it UnpackerStub) that, instead of taking the archive file name on the command line, looks for an archive that is embedded in itself.
If you are a math geek, you can think of an SFX as "UnpackerStub.exe + Archive.bin = UnpackerArchive.exe".

Now, how do we embed the archive file into the unpacker to form an SFX?

To do that, we need to write some information into the UnpackerStub that will help it locate the Archive.bin body.

Every Windows executable has a well-documented format that tells the OS how to load and run it. The IMAGE_DOS_HEADER (defined in WINNT.H) is located at offset zero of every executable, and I use its reserved e_res2 field to store the position of the archive data inside the unpacker stub. The header has the following fields:

typedef struct _IMAGE_DOS_HEADER {    // DOS .EXE header
  WORD   e_magic;                     // Magic number
  WORD   e_cblp;                      // Bytes on last page of file
  WORD   e_cp;                        // Pages in file
  WORD   e_crlc;                      // Relocations
  WORD   e_cparhdr;                   // Size of header in paragraphs
  WORD   e_minalloc;                  // Minimum extra paragraphs needed
  WORD   e_maxalloc;                  // Maximum extra paragraphs needed
  WORD   e_ss;                        // Initial (relative) SS value
  WORD   e_sp;                        // Initial SP value
  WORD   e_csum;                      // Checksum
  WORD   e_ip;                        // Initial IP value
  WORD   e_cs;                        // Initial (relative) CS value
  WORD   e_lfarlc;                    // File address of relocation table
  WORD   e_ovno;                      // Overlay number
  WORD   e_res[4];                    // Reserved words
  WORD   e_oemid;                     // OEM identifier (for e_oeminfo)
  WORD   e_oeminfo;                   // OEM information; e_oemid specific
  WORD   e_res2[10];                  // Reserved words
  LONG   e_lfanew;                    // File address of new exe header
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

The e_res2 field is reserved and, at 10 WORDs, more than large enough to hold a DWORD. The archive content is appended to the end of the UnpackerStub, and the file offset at which it starts is stored in e_res2.

Two functions have been written to store and get the position of the archive data:

int SfxSetInsertPos(char *filename, long pos)
{
  FILE *fp = fopen(filename, "rb+");
  if (fp == NULL)
    return packerrorCouldNotOpenArchive;

  IMAGE_DOS_HEADER idh;

  // read dos header
  if (fread((void *)&idh, sizeof(idh), 1, fp) != 1)
    return (fclose(fp), packerrorCouldNotOpenArchive);

  // store position value in an unused MZ field
  *(long *)&idh.e_res2[0] = pos;

  // update header
  rewind(fp);
  fwrite((void *)&idh, sizeof(idh), 1, fp);
  fclose(fp);
  return packerrorSuccess;
}

This function will store the pointer. First it reads the header, updates the e_res2 field then writes the header back again.

int SfxGetInsertPos(char *filename, long *pos)
{
  FILE *fp = fopen(filename, "rb");
  if (fp == NULL)
    return packerrorCouldNotOpenArchive;

  IMAGE_DOS_HEADER idh;

  // read dos header
  if (fread((void *)&idh, sizeof(idh), 1, fp) != 1)
    return (fclose(fp), packerrorCouldNotOpenArchive);
  fclose(fp);

  // extract position value from the unused MZ field
  *pos = *(long *)&idh.e_res2[0];
  return packerrorSuccess;
}

This function will read the header and extract the value from the e_res2 field.
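
Both functions access e_res2 through a cast to long *. An alternative that avoids the pointer cast is to copy the bytes with memcpy, as in the sketch below. DosHeader is a portable replica of IMAGE_DOS_HEADER written for this illustration only (real code would use the WINNT.H definition), and setInsertPos/getInsertPos are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Replica of IMAGE_DOS_HEADER using fixed-width types (illustration only).
struct DosHeader
{
  std::uint16_t e_magic, e_cblp, e_cp, e_crlc, e_cparhdr, e_minalloc,
                e_maxalloc, e_ss, e_sp, e_csum, e_ip, e_cs, e_lfarlc, e_ovno;
  std::uint16_t e_res[4];
  std::uint16_t e_oemid, e_oeminfo;
  std::uint16_t e_res2[10];
  std::int32_t  e_lfanew;
};

static_assert(sizeof(DosHeader) == 64, "DOS header is 64 bytes");
static_assert(offsetof(DosHeader, e_res2) == 0x28, "e_res2 starts at 0x28");

// Stores a 32-bit position in the first two words of e_res2.
void setInsertPos(DosHeader &hdr, std::int32_t pos)
{
  assert(pos >= 0);
  std::memcpy(&hdr.e_res2[0], &pos, sizeof(pos));
}

// Reads the 32-bit position back from e_res2.
std::int32_t getInsertPos(const DosHeader &hdr)
{
  std::int32_t pos;
  std::memcpy(&pos, &hdr.e_res2[0], sizeof(pos));
  return pos;
}
```

The two static_assert lines document the layout facts the technique relies on: the value lands at file offset 0x28 and does not touch e_lfanew at 0x3C, which the loader needs to find the PE header.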

In short, the unpacker stub works like this:

  1. Call SfxGetInsertPos() to get the position of the archive file
  2. Call unpackfileEx(), passing the position of the archive (the start of the embedded Archive.bin) and the archive file name, which is the executable itself (obtained by calling GetModuleFileName(NULL, ...))
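
Before extracting, a stub may want to confirm that an archive really is attached. The following sketch combines both steps on an already opened file: it reads the 32-bit value at offset 0x28 (the start of e_res2) and checks for the signature at that position. It assumes a little-endian host and 0x4B43504C as the value of 'KCPL'; findEmbeddedArchive is a hypothetical helper, and in a real stub the file would be the executable's own path as returned by GetModuleFileName().

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: returns the offset of the embedded archive inside
// 'fp', or -1 if the header holds no position or no signature is found there.
long findEmbeddedArchive(FILE *fp)
{
  assert(fp != nullptr);
  std::int32_t pos = 0;
  std::uint32_t signature = 0;

  // e_res2 begins at offset 0x28 of the DOS header
  if (std::fseek(fp, 0x28, SEEK_SET) != 0 ||
      std::fread(&pos, sizeof(pos), 1, fp) != 1)
    return -1;
  if (pos <= 0)
    return -1;                      // plain stub, nothing attached

  if (std::fseek(fp, pos, SEEK_SET) != 0 ||
      std::fread(&signature, sizeof(signature), 1, fp) != 1)
    return -1;
  return signature == 0x4B43504Cu ? static_cast<long>(pos) : -1;
}
```

A non-negative result is the value that would be passed on as startPos.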

Now I continue to describe how the Packer builds the SFX:

    // check if unpackerstub.exe exists
    if (GetFileAttributes(sfxStubFile) == (DWORD)-1)
    {
      printf("SFX stub file not found!\n");
      return 1;
    }

    // open archive file
    FILE *fpArc = fopen(argv[3], "rb");
    if (!fpArc)
    {
      printf("Failed to open archive!\n");
      return 1;
    }

    // get archive size
    fseek(fpArc, 0, SEEK_END);
    long arcSize = ftell(fpArc);
    rewind(fpArc);

    // form output sfx file name
    char sfxName[MAX_PATH];
    strcpy(sfxName, argv[3]);
    strcat(sfxName, ".sfx.exe");

    // make a copy of the SFX stub
    if (!CopyFile(sfxStubFile, sfxName, FALSE))
    {
      fclose(fpArc);
      printf("Could not create SFX file!\n");
      return 1;
    }

    // open the copy so that the archive can be appended to it
    FILE *fpSfx = fopen(sfxName, "rb+");
    if (!fpSfx)
    {
      fclose(fpArc);
      printf("Could not open SFX file!\n");
      return 1;
    }
    fseek(fpSfx, 0, SEEK_END);

    // get SFX size before appending the archive
    long sfxSize = ftell(fpSfx);

    // append the archive file to the end of the SFX file
    char Buffer[4096 * 2];
    while (arcSize > 0)
    {
      long rw = arcSize > (long)sizeof(Buffer) ? (long)sizeof(Buffer) : arcSize;
      fread(Buffer, rw, 1, fpArc);
      fwrite(Buffer, rw, 1, fpSfx);
      arcSize -= rw;
    }
    fclose(fpArc);
    fclose(fpSfx);

    // mark archive data position inside SFX
    SfxSetInsertPos(sfxName, sfxSize);

    // delete the archive file, keeping only the SFX
    DeleteFile(argv[3]);

    printf("SFX created: %s\n", sfxName);

That's all!

Using the Code and Binaries

The article comes with Packer.cpp and Unpacker.cpp, two examples demonstrating how to use the pack and unpack functionality.

Packer.exe usage

You should always specify full paths because relative paths are not currently supported.

c:\>packer e:\temp\bc *.* e:\test.bin

This will pack the contents of e:\temp\bc\*.* into e:\test.bin (the archive).

If you add 'sfx' as in:

c:\>packer e:\temp\bc *.* e:\test.bin sfx

an SFX named e:\test.bin.sfx.exe will be created, and the intermediate archive e:\test.bin will be deleted.

Unpacker.exe usage

Make sure you specify a valid output directory:

c:\>unpacker e:\test.bin e:\out

This will unpack the contents of e:\test.bin to e:\out\

Sfx.exe usage

The sfx takes only one parameter which is the destination directory.

c:\>sfx.exe e:\out

This will extract to e:\out\

Final Notes

I hope you enjoyed reading this article and learned something new.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
