BinDiff - A tool to compare binary files

13 Mar 2013, CPOL
Compare binary files.

Introduction 

I needed a simple tool to compare two binary files, especially focusing on small sequences of added or removed bytes. The main purpose was to detect changes in an audio transmission, resulting from the clock drift of several gateways the audio stream was passed through. 

Most binary file comparison tools (for example "BinCoMerge") are very slow when comparing large files that contain many small differences where bytes were added or removed, so I needed something quicker. I also wanted to save the results of each comparison in a separate text file for later analysis. That's why I wrote this program.

What it does  

BinDiff reads File 1 in blocks of 16 bytes and tries to find each block in File 2. For each block, the resulting text file contains a line that shows whether these 16 bytes were found and, if so, where. The idea is that the bytes of File 1 are displayed as in a normal hex editor, with 16 bytes per line, whereas the bytes of File 2 are shifted so that the user can easily compare File 1 and File 2 side by side. Additionally, the text file contains lines that show bytes that were added in File 2 (before the first found block, between two found blocks, or after the last found block).

Example

File 1:  

00000000  00 01 02 03  04 05 06 07   08 09 0A 0B  0C 0D 0E 0F
00000010  17 23 AC F7  59 DE FF DD   5C 74 DF A7  4B D9 58 26
00000020  10 11 12 13  14 15

File 2:   

00000000  FF 00 01 02  03 04 05 06   07 08 09 0A  0B 0C 0D 0E
00000010  0F 57 58 59  17 23 AC F7   59 DE FF DD  5C 74 DF A7
00000020  4B D9 58 26  AA BB CC DD   EE

Diff:

+                                                                     00000000  FF
   00000000  00 01 02 03  04 05 06 07   08 09 0A 0B  0C 0D 0E 0F      00000001  00 01 02 03  04 05 ..
+                                                                     00000011  57 58 59
   00000010  17 23 AC F7  59 DE FF DD   5C 74 DF A7  4B D9 58 26      00000014  17 23 AC F7  59 DE ..
-  00000020  10 11 12 13  14 15
+                                                                     00000024  AA BB CC DD  EE

Compare Algorithm

If a block is not found, it is marked with a "-", and written on the left side. If a block is found, it is not marked, and written on both sides, with respect to the different file positions. If there are some bytes between the previous found block and this found block, they are marked with a "+", and written on the right side. The same will happen for bytes before the first found block and after the last found block.

As a result of these requirements, the function DiffWriter.WriteDiffThread needs three subroutines, each of which writes a block of 16 bytes (or fewer) to the text file:

  • WriteBlockRemoved (if a block is not found, marked with a "-")
  • WriteBlockEqual (if a block is found)
  • WriteBlockAdded (if there are some bytes between two found blocks, marked with a "+")

These subroutines are so simple that I will not explain them in detail.
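
For readers who want a starting point anyway, here is a minimal sketch of how one hex-editor-style line, as shown in the example above, could be formatted. The class HexLineSketch and the method FormatHexLine are hypothetical names chosen for illustration; they are not taken from the BinDiff source, and the actual subroutines may be written differently.

C#
```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the BinDiff source.
public static class HexLineSketch
{
    // Formats 'count' bytes (at most 16) of 'data', starting at 'start',
    // as one line: 8-digit hex offset, then the bytes in groups of 4,
    // with a wider gap after the first 8 bytes.
    public static string FormatHexLine(byte[] data, int start, int count)
    {
        var sb = new StringBuilder();
        sb.Append(start.ToString("X8"));
        sb.Append("  ");

        for (int k = 0; k < count; k++)
        {
            if (k > 0)
            {
                // One space between bytes, two between groups of 4,
                // three in the middle of the line.
                sb.Append(k % 8 == 0 ? "   " : (k % 4 == 0 ? "  " : " "));
            }
            sb.Append(data[start + k].ToString("X2"));
        }

        return sb.ToString();
    }
}
```

A "removed" line would then be this string prefixed with "-", an "added" line the same string for File 2 prefixed with "+" and indented to the right-hand column.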

The important question is: How does the function DiffWriter.WriteDiffThread decide whether a block was found in File 2? This is where the function FindBlock is needed:

C#
private static Int32 FindBlock(Int32 nPos1,
                               Int32 nPos2,
                               Byte[] pFile1,
                               Byte[] pFile2,
                               Int32 nBytesToCompare,
                               Boolean bFoundFirstEqual)
{
    Int32 nStartOfEqualBlock2 = -1;

    Boolean bFound = false;
    Int32 i, j, nEqualBytes;

    for (i = nPos2; i < pFile2.Length; i++)
    {
        // Find block at position i:
        nEqualBytes = 0;
        for (j = i; j < (i + nBytesToCompare); j++)
        {
            if (j < pFile2.Length)
            {
                if (pFile2[j] == pFile1[nPos1 + nEqualBytes])
                {
                    nEqualBytes++;
                    if (nEqualBytes == nBytesToCompare)
                    {
                        bFound = true;
                        break;
                    }
                }
                else
                {
                    break;
                }
            }
            else
            {
                break;
            }
        }

        if (bFound) // Found block at position i.
        {
            nStartOfEqualBlock2 = i;
            break;
        }

        // If we have searched 1 KB ahead
        // without finding the block,
        // we declare it "not found".
        // This way, we don't declare a very large block as "added"
        // if the block appears much later in the file.
        // However, we search THE WHOLE FILE until we have found the FIRST EQUAL BLOCK.
        if (bFoundFirstEqual && (i >= (nPos2 + 1024)))
        {
            break;
        }
    }

    return nStartOfEqualBlock2;
}

As you can see, this function takes six parameters:

  • nPos1 ... The position in File 1 where the block that shall be compared begins.
  • nPos2 ... The position in File 2 where the search for an equal block shall begin.
  • pFile1 ... A byte array that contains all bytes of File 1.
  • pFile2 ... A byte array that contains all bytes of File 2.
  • nBytesToCompare ... The number of bytes contained in the block that shall be compared (must be > 0).
  • bFoundFirstEqual ... A flag that specifies whether at least 1 equal block has already been found in an earlier call of FindBlock.

The function loops through File 2 in steps of 1 byte, and compares nBytesToCompare bytes, starting at the loop counter i. If all nBytesToCompare bytes are equal, the block has been found at position i. In this case, the loop ends, and the position is returned. 

If the flag bFoundFirstEqual is set, the loop gives up once it has tried every start position from nPos2 up to nPos2 + 1024 without finding an equal block. This means: after the first block has been found, the program only searches about 1 KB ahead for the next block, and declares the block "not found" ("-") if it appears more than 1 KB after the previously found block. This way, it does not declare a very large number of bytes as "added" ("+") when the block being searched for appears much later in File 2.
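
The effect of the search window can be tried out with a small experiment. The sketch below restates FindBlock in condensed form (same parameters and return value as the listing above, but a shorter inner loop), so it is an equivalent rewrite for illustration rather than the original code. The class FindBlockSketch and the helper SearchAt are hypothetical names.

C#
```csharp
using System;

// Condensed restatement of FindBlock for experimenting; not the original listing.
public static class FindBlockSketch
{
    public static int FindBlock(int nPos1, int nPos2, byte[] pFile1, byte[] pFile2,
                                int nBytesToCompare, bool bFoundFirstEqual)
    {
        for (int i = nPos2; i < pFile2.Length; i++)
        {
            // Count how many bytes match, starting at position i of File 2.
            int nEqualBytes = 0;
            while (nEqualBytes < nBytesToCompare
                   && i + nEqualBytes < pFile2.Length
                   && pFile2[i + nEqualBytes] == pFile1[nPos1 + nEqualBytes])
            {
                nEqualBytes++;
            }

            if (nEqualBytes == nBytesToCompare)
            {
                return i; // Found block at position i.
            }

            // Give up after 1 KB once a first equal block is known.
            if (bFoundFirstEqual && i >= nPos2 + 1024)
            {
                break;
            }
        }
        return -1;
    }

    // Hypothetical helper: builds a File 2 consisting of zero bytes followed by
    // one 16-byte block at 'blockPos', then searches for that block from position 0.
    public static int SearchAt(int blockPos, bool bFoundFirstEqual)
    {
        byte[] block = new byte[16];
        for (int k = 0; k < block.Length; k++)
        {
            block[k] = (byte)(0xA0 + k); // non-zero, so the padding never matches
        }

        byte[] file2 = new byte[blockPos + block.Length];
        Array.Copy(block, 0, file2, blockPos, block.Length);

        return FindBlock(0, 0, block, file2, block.Length, bFoundFirstEqual);
    }
}
```

With this helper, SearchAt(1024, true) returns 1024, SearchAt(1025, true) returns -1, and SearchAt(2000, false) returns 2000: the last start position that is still tried with the flag set is nPos2 + 1024.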

The function DiffWriter.WriteDiffThread loops through File 1 in steps of 16 bytes and reads one block of at most 16 bytes in each step. Each block is searched for with a call to FindBlock. If the block is found, the position nPos2 is moved to the first byte after the found block, so no block will be found twice.

Depending on the results of FindBlock, the functions WriteBlockRemoved, WriteBlockEqual and WriteBlockAdded will be called. If more than 16 bytes have been added, WriteBlockAdded will be called more than once. 

When all blocks of File 1 have been searched, the remaining bytes of File 2 must be written. This is also done with calls to WriteBlockAdded, beginning at nPos2, which is the first byte after the last found block.

Now both files have been processed completely.
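
The loop described above might be structured roughly as follows. This is a sketch under assumptions, not the code of DiffWriter.WriteDiffThread: instead of writing hex lines to a text file, it collects one short entry per output line ("-", "=" or "+", followed by positions and length), and the names DiffLoopSketch, Diff and AddAdded are hypothetical. The two byte arrays are the files from the example section.

C#
```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the comparison loop; not the BinDiff source.
public static class DiffLoopSketch
{
    const int BlockSize = 16;
    const int SearchLimit = 1024;

    public static readonly byte[] ExampleFile1 = {
        0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07,0x08,0x09,0x0A,0x0B,0x0C,0x0D,0x0E,0x0F,
        0x17,0x23,0xAC,0xF7,0x59,0xDE,0xFF,0xDD,0x5C,0x74,0xDF,0xA7,0x4B,0xD9,0x58,0x26,
        0x10,0x11,0x12,0x13,0x14,0x15 };

    public static readonly byte[] ExampleFile2 = {
        0xFF,0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07,0x08,0x09,0x0A,0x0B,0x0C,0x0D,0x0E,
        0x0F,0x57,0x58,0x59,0x17,0x23,0xAC,0xF7,0x59,0xDE,0xFF,0xDD,0x5C,0x74,0xDF,0xA7,
        0x4B,0xD9,0x58,0x26,0xAA,0xBB,0xCC,0xDD,0xEE };

    // Same search rule as FindBlock: byte-wise scan, 1 KB window once 'limited' is set.
    static int FindBlock(byte[] f1, int pos1, byte[] f2, int pos2, int len, bool limited)
    {
        for (int i = pos2; i < f2.Length; i++)
        {
            int n = 0;
            while (n < len && i + n < f2.Length && f2[i + n] == f1[pos1 + n]) n++;
            if (n == len) return i;
            if (limited && i >= pos2 + SearchLimit) break;
        }
        return -1;
    }

    public static List<string> Diff(byte[] file1, byte[] file2)
    {
        var lines = new List<string>();
        int pos2 = 0;
        bool foundFirstEqual = false;

        for (int pos1 = 0; pos1 < file1.Length; pos1 += BlockSize)
        {
            int len = Math.Min(BlockSize, file1.Length - pos1);
            int found = FindBlock(file1, pos1, file2, pos2, len, foundFirstEqual);

            if (found < 0)
            {
                lines.Add($"- {pos1:X8} len={len}");          // block removed
                continue;
            }

            AddAdded(lines, pos2, found);                      // bytes skipped in File 2
            lines.Add($"= {pos1:X8} {found:X8} len={len}");    // block found
            pos2 = found + len;                                // first byte after the block
            foundFirstEqual = true;
        }

        AddAdded(lines, pos2, file2.Length);                   // rest of File 2
        return lines;
    }

    // Emits the range [from, to) of File 2 as "added" entries of at most 16 bytes each.
    static void AddAdded(List<string> lines, int from, int to)
    {
        for (int p = from; p < to; p += BlockSize)
        {
            lines.Add($"+ {p:X8} len={Math.Min(BlockSize, to - p)}");
        }
    }
}
```

Calling DiffLoopSketch.Diff(DiffLoopSketch.ExampleFile1, DiffLoopSketch.ExampleFile2) yields six entries in the same order as the diff shown in the example: added at 00000000, equal, added at 00000011, equal, removed at 00000020, added at 00000024.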

Limitations of BinDiff  

BinDiff cannot compare files larger than 2 GB. Note that both files are loaded completely into RAM before the comparison starts. BinDiff only works if File 2 does not contain a sequence of added bytes that is longer than 1 KB. If more than 1 KB is added in one piece, the rest of File 2 will also be marked as "added" ("+") in the text file: nPos2 only advances when a block is found, so the following blocks of File 1 are then no longer found within the 1 KB search window either.

Thread Synchronization 

The class MainWindow does all GUI stuff, and the class DiffWriter does the comparison of File 1 and File 2.

If the user clicks the Start button, the function DiffWriter.StartWriteDiff is executed. It opens the files (if possible) and starts a thread called WriteDiffThread. This thread compares File 1 and File 2, writes the results to the text file, and raises the event DiffWriter.ThreadEnded when it has finished.

If the user clicks the Stop button, or closes the application, the function DiffWriter.Stop is executed. It stops the thread by setting the variable m_bStop = true. The thread checks m_bStop inside its loops and leaves all of them as soon as it notices the change.
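
The stop-flag pattern can be sketched in isolation like this. Only the names m_bStop and ThreadEnded come from the article; the class StoppableWorkerSketch, its instance-based layout, the Join method and the use of volatile are assumptions made for this illustration and may differ from the real DiffWriter.

C#
```csharp
using System;
using System.Threading;

// Hypothetical sketch of a worker thread with a stop flag; not the BinDiff source.
public class StoppableWorkerSketch
{
    // volatile (an assumption here) makes sure the worker thread
    // sees the value written by the thread that calls Stop().
    private volatile bool m_bStop;
    private Thread m_thread;

    // Raised on the worker thread, not on the thread that called Start().
    public event Action ThreadEnded;

    public void Start()
    {
        m_bStop = false;
        m_thread = new Thread(Work);
        m_thread.IsBackground = true;
        m_thread.Start();
    }

    public void Stop()
    {
        m_bStop = true;
    }

    // Waits until the worker thread has finished.
    public void Join()
    {
        m_thread.Join();
    }

    private void Work()
    {
        while (!m_bStop)
        {
            // One unit of work would go here, e.g. comparing one block.
            Thread.Sleep(1);
        }

        // Copy the delegate first so a concurrent unsubscribe cannot cause a null call.
        Action handler = ThreadEnded;
        if (handler != null)
        {
            handler();
        }
    }
}
```

Because the event is raised on the worker thread, a GUI subscriber still has to marshal the call back to the GUI thread.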

Event handling in MainWindow: The event handler DiffWriter_ThreadEnded has to enable/disable buttons and textboxes in the GUI, but it cannot access the GUI elements directly, because it is executed on a thread other than the GUI thread. Therefore, a simple workaround is used:

C#
private delegate void ThreadEndedSyncHandler();
private ThreadEndedSyncHandler ThreadEnded;

private void MainWindow_Load(object sender, EventArgs e)
{
    // Subscribe for my own synchronous delegate
    // that will be invoked when the asynchronous event "DiffWriter.ThreadEnded" is raised:
    this.ThreadEnded += this_ThreadEnded;

    // Subscribe for an asynchronous event
    // that will be raised when the DiffWriter's thread has ended:
    DiffWriter.ThreadEnded += DiffWriter_ThreadEnded;
}

private void DiffWriter_ThreadEnded()
{
    if (!m_closing)
    {
        this.Invoke(ThreadEnded);
    }
}

private void this_ThreadEnded()
{
    // Enable all buttons except stop button:
    EnableButtons();
}

The event handler DiffWriter_ThreadEnded invokes a delegate called ThreadEnded, and the handler this_ThreadEnded is subscribed to this delegate. The method Form.Invoke runs the delegate on the GUI thread and handles the thread synchronization automatically, so the GUI elements can be accessed safely inside this_ThreadEnded.

Points of Interest

You can use this program to quickly compare large binary files that have many small differences.

History

  • December 16, 2012
    • Published at CodeProject.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)