Creating a self extracting binary

4 Mar 2010 · CPOL
This is a utility that compresses files into a self-extracting binary. The resulting binary can be used on any system, without installation, to extract the files it contains.

Introduction

Most of us have used installers that first extract the whole setup from themselves and then start the installation process. For example, the Mozilla Firefox installer is a single file that extracts the complete setup and then starts the installation.

This application also creates a self-extracting compressed binary. Once created, such a binary requires no installation and can extract all the files that were compressed into it.

Background

The application tries to demonstrate the following things:

  • Creates a self extracting binary.
  • Demonstrates the use of some common controls from Qt, a cross-platform C++ GUI library. Using Qt makes porting the application to other operating systems straightforward. Information about the library can be found here: http://qt.nokia.com/.
  • Uses the Zlib library for in-memory compression of data.
  • Uses the Blowfish encryption algorithm for encrypting the password.

Using the code

The code is small and has only three classes.

  • The cselfextractor class is inherited from the QDialog class and implements the interface of the application.
  • The cAbout class is a very small class, and inherits QDialog to show the About dialog box of the application.
  • The cBinExt class implements the logic for creating the binary and extracting data from the binary.

To compile and use the code, Qt must be installed on the system. If you are new to Qt, just open the .pro file with Qt and build the project. It's easy.

Understanding the cselfextractor class is quite easy if you have done some C++ programming. Qt uses signals and slots for event handling. Signals are events such as button clicks or key presses, as in VC++ or any other GUI library. Slots are the functions that handle these events. They are declared in the header file and implemented in the .cpp file.

C++
QObject::connect(m_btnAddFiles, SIGNAL(clicked()),this, SLOT(AddFiles()));

Here, clicking the button m_btnAddFiles calls the AddFiles function. You put your handling for that event in this function.

C++
cBinExt     *m_BinExt;

cselfextractor holds a pointer to the cBinExt class. cBinExt exposes a few functions for compressing or extracting the data that the user selects in the dialog. The user selects the files, and all these files are sent to the cBinExt class for compression.

Let's talk about how a self-extracting binary is created without breaking the addresses in the original binary. Most experienced programmers will know this. Any application (not necessarily a C++ application) contains addresses in its binary for everything, such as functions. If we write something into the middle of the binary, those addresses no longer point to the right places and the binary may not run.

But if we write at the end of the binary, nothing breaks and everything runs as usual; the binary does not know that something has been appended to it. This is the idea I have used to create the self-extracting binary.
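The idea can be tried out with a few lines of standard C++. This is only a minimal sketch, not the cBinExt code from the download: readAll and appendPayload are hypothetical helper names, and the payload is appended as raw bytes without compression.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read a whole file into memory (fine for a sketch; large files need chunking).
std::vector<char> readAll(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Copy the file at stubPath to outPath unchanged, then append payload after it.
// Returns the offset at which the payload starts, i.e. the size of the stub.
std::size_t appendPayload(const std::string& stubPath,
                          const std::string& outPath,
                          const std::vector<char>& payload) {
    const std::vector<char> stub = readAll(stubPath);
    assert(!stub.empty()); // an empty stub means the source could not be read

    std::ofstream out(outPath, std::ios::binary | std::ios::trunc);
    out.write(stub.data(), static_cast<std::streamsize>(stub.size()));
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    return stub.size();
}
```

Because the first stub.size() bytes of the output are identical to the original file, the loader sees the same executable; the extra bytes are simply ignored until the program goes looking for them.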

Let's start with a normal binary which is not compressed.

C++
m_BinExt = new cBinExt;

if(m_BinExt->IsAttachmentPresent() == SUCCESS)
{
      m_AmICompressed = true;
}

Note: If you look at the code, you will notice that all the try-catch blocks are commented out. This is because I compiled Qt with the -static and -fno-exceptions switches so that I can distribute the application without any dependency on DLLs or .so files. That is also why the binary is around 8 MB: all the Qt code is statically linked.

Before understanding the above code, I will discuss the structure of the binary. Here we go:

File structure (it helps to read it from the end):
         ___________________________________________________
        |Executable data                                    |
        |___________________________________________________|
        |Embedded file(s)                                   |
        |___________________________________________________|
        |encrypted keys, size = 2 * sizeof(unsigned long)   |
        |___________________________________________________|
        |Index telling the size and name of embedded file   |
        |This section will be 100 KB                        |
        |___________________________________________________|
        |Index structure                                    |
        |*IDX*\n                                            |
        |FileName_1 \t size \n                              |
        |FileName_2 \t size \n                              |
        |FileName_n \t size \n                              |
        |*EIDX*\n                                           |
        |___________________________________________________|
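To make the index layout concrete, here is a small sketch that writes and reads a block in that format. It is an illustration under assumptions, not the article's implementation: IndexEntry, buildIndex and parseIndex are hypothetical names, the name and size are separated by a single tab with no extra spaces, and the unused remainder of the 100 KB section is filled with zero bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

struct IndexEntry {
    std::string name;
    unsigned long size;
};

const std::size_t kIndexBytes = 100 * 1024; // fixed size of the index section

// Serialize the entries as "*IDX*\n", one "name\tsize\n" line per file,
// then "*EIDX*\n", and pad the block to exactly 100 KB (padding is assumed).
std::string buildIndex(const std::vector<IndexEntry>& entries) {
    std::string block = "*IDX*\n";
    for (const IndexEntry& e : entries)
        block += e.name + "\t" + std::to_string(e.size) + "\n";
    block += "*EIDX*\n";
    assert(block.size() <= kIndexBytes); // too many files would overflow the section
    block.resize(kIndexBytes, '\0');
    return block;
}

// Parse a block produced by buildIndex. Returns false if a marker is missing
// or a line has no tab separator.
bool parseIndex(const std::string& block, std::vector<IndexEntry>& out) {
    std::istringstream in(block);
    std::string line;
    if (!std::getline(in, line) || line != "*IDX*") return false;
    while (std::getline(in, line)) {
        if (line == "*EIDX*") return true;
        const std::size_t tab = line.rfind('\t');
        if (tab == std::string::npos) return false;
        const unsigned long size = std::strtoul(line.c_str() + tab + 1, nullptr, 10);
        out.push_back({line.substr(0, tab), size});
    }
    return false; // reached the end without seeing *EIDX*
}
```

A fixed-size section at the very end of the file is what makes the format easy to read backwards: the reader always knows where the index starts without having to scan for it.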

The normal (uncompressed) application contains only the first part of the structure shown above, i.e., the executable data.

Now, the user selects some files using the GUI and gives a password. When the user presses the archive button, the following steps are taken:

  • The application creates the output file at the path supplied by the user.
  • It opens and reads itself and writes its own contents to the output file.
  • It opens each file in the list, reads it, compresses it in memory using Zlib, writes the result to the output file, and updates the index structure.
  • When all the file data has been appended to the output file, it encrypts the password. The encryption gives two keys. These are 8 bytes in length. They are written to the output.
  • After writing the password keys, it writes the index structure. I have fixed its length at 100 KB.

So, the compressed file has the structure shown above. The beginning of the file is the same as the original application, so it can be executed without any problems. Now, let's revisit the line:

C++
if(m_BinExt->IsAttachmentPresent() == SUCCESS)
{
      m_AmICompressed = true;
}

The IsAttachmentPresent function tries to find the index structure at the end of the binary. If it finds the structure, the binary is a compressed one and the application will extract the data. If it does not find the structure, the binary is the original application and it will compress data instead. m_AmICompressed is set accordingly.
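A check of this kind can be sketched as follows. This is not the body of IsAttachmentPresent; hasAttachment is a hypothetical stand-in that assumes the index section is exactly the last 100 KB of the file and starts with the "*IDX*" marker line.

```cpp
#include <cassert>
#include <fstream>
#include <string>

const std::streamoff kIndexBytes = 100 * 1024; // fixed size of the index section

// Returns true if the last 100 KB of the file begin with the "*IDX*\n" marker.
bool hasAttachment(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return false;                  // file could not be opened

    const std::streamoff size = in.tellg(); // opened at the end, so this is the size
    if (size < kIndexBytes) return false;   // too small to contain an index

    in.seekg(-kIndexBytes, std::ios::end);  // jump to where the index would start
    char marker[6] = {};
    in.read(marker, sizeof marker);
    return in.gcount() == 6 && std::string(marker, 6) == "*IDX*\n";
}
```

The program would call such a function on its own executable path at startup and choose between the extract and archive modes based on the result.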

C++
if(m_AmICompressed)
{
    //some code ----
    QApplication::setOverrideCursor(QCursor(Qt::WaitCursor));
    bError = m_BinExt->ExtractFiles(m_edtOutputPath->text().toLocal8Bit().constData(),
                           m_linePassword->text().toLocal8Bit().constData(), this);
    QApplication::restoreOverrideCursor();
}
else
{
    if((m_TotalFileCount = m_lstFileList->count()) == 0)
    {
        m_BinExt->ShowMessage("I don't have any file to archive.", m_BinExt->INFO);
        break;
    }

    //some code -------    

    //store the file list which fail during archiving
    StringList FailedFileList;

    QApplication::setOverrideCursor(QCursor(Qt::WaitCursor));
    bError = m_BinExt->CreateArchive(m_FileList, 
               m_edtOutputPath->text().toLocal8Bit().constData(),
               FailedFileList, m_linePassword->text().toLocal8Bit().constData(), this);
    QApplication::restoreOverrideCursor();

    //some code -------

}

As you can see, if the binary is compressed, it calls the extraction function. If the binary is not compressed, it calls the compression function.

Also, I am passing the this pointer to the cBinExt class. This is because cBinExt needs to update the progress bar as it works through the files. The dialog exposes a couple of functions that let it do so.

Let's discuss the cBinExt class. The class has three main functions:
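The same arrangement can be sketched without Qt. In this illustration the worker only sees a small interface, which a dialog could implement by forwarding to its progress bar. ProgressSink, processFiles and RecordingSink are hypothetical names chosen for the example; the article's code passes the dialog pointer directly.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The "couple of functions" the worker needs from whoever shows progress.
class ProgressSink {
public:
    virtual ~ProgressSink() {}
    virtual void setRange(std::size_t total) = 0;
    virtual void setValue(std::size_t done) = 0;
};

// Stand-in for the archive/extract loop: reports after each file is handled.
std::size_t processFiles(const std::vector<std::string>& files, ProgressSink* sink) {
    assert(sink != nullptr);
    sink->setRange(files.size());
    std::size_t done = 0;
    for (const std::string& file : files) {
        (void)file;             // real code would compress or extract here
        sink->setValue(++done); // let the UI move the progress bar
    }
    return done;
}

// A sink that only records the values; a dialog would update its widget instead.
struct RecordingSink : ProgressSink {
    std::size_t total = 0;
    std::size_t last = 0;
    void setRange(std::size_t t) override { total = t; }
    void setValue(std::size_t d) override { last = d; }
};
```

Passing an interface rather than the whole dialog keeps the compression code testable without a GUI, at the cost of one extra class.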

  • One which tells if it's a compressed binary or not.
  • One which compresses the files as discussed above.
  • One which extracts the files already inside the binary.

The extraction is done in the cBinExt::ExtractFiles() function. The program starts from the end and reads the index first. ReadIndexFromBinary() does this work and returns a list of files with their sizes. The sizes are used when passing the data to the decompression function of Zlib. Once decompressed, the data is written to the output file, and this is repeated until all the files are extracted.

Decompression only occurs when the supplied password matches the password keys stored in the binary. The encrypted keys are read by the ReadEncryptionKeys() function. The supplied password is encrypted and the generated keys are compared with the stored keys. If they match, the password is valid.
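Reading from the end comes down to simple offset arithmetic. The sketch below shows one way to locate each embedded file once the index has been read. It rests on assumptions the article does not state: that the sizes in the index are the stored (compressed) sizes, and that the files are laid out back to back in index order directly before the keys. Span and locatePayloads are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Span {
    std::size_t offset; // where the stored data starts in the binary
    std::size_t length; // how many bytes to hand to the decompressor
};

// Work backwards from the end of the file:
//   [executable][file 1]...[file n][keys][100 KB index]
std::vector<Span> locatePayloads(std::size_t fileSize,
                                 const std::vector<std::size_t>& storedSizes) {
    const std::size_t indexBytes = 100 * 1024;
    const std::size_t keyBytes = 2 * sizeof(unsigned long);

    std::size_t total = 0;
    for (std::size_t s : storedSizes) total += s;
    assert(fileSize >= indexBytes + keyBytes + total);

    // The payload region ends where the keys begin; its start is also
    // the size of the original executable.
    std::size_t offset = fileSize - indexBytes - keyBytes - total;

    std::vector<Span> spans;
    for (std::size_t s : storedSizes) {
        spans.push_back({offset, s});
        offset += s;
    }
    return spans;
}
```

Each resulting span can then be read with a seek and a read of length bytes before decompression.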

History

This is the first version of the application. The application cannot compress folders. The next release of the application will support the compression of folders. Also, the application can fail if a file is very large and a single chunk of memory cannot be allocated for compression. This needs to be taken care of.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)