Introduction
I needed a simple tool to compare two binary files, especially focusing on small sequences of added or removed bytes. The main purpose was to detect changes in an audio transmission, resulting from the clock drift of several gateways the audio stream was passed through.
Most binary file comparison tools (for example, "BinCoMerge") are very slow when comparing large files that have many small differences where bytes were added or removed, so I needed something quicker. Furthermore, I wanted to save the results of each comparison in a separate text file for later analysis. That's why I wrote this program.
What it does
BinDiff reads File 1 in blocks of 16 bytes and tries to find each block in File 2. For each block, the resulting text file contains a line that shows whether these 16 bytes were found and, if so, where they were found. The idea behind this procedure is that the bytes of File 1 are displayed as in a normal hex editor, with 16 bytes per line, whereas the bytes of File 2 are shifted so that the user can easily compare File 1 and File 2. Additionally, the resulting text file contains lines that show bytes that were added in File 2 (before the first found block, between two found blocks, or after the last found block).
Example
File 1:
00000000 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000010 17 23 AC F7 59 DE FF DD 5C 74 DF A7 4B D9 58 26
00000020 10 11 12 13 14 15
File 2:
00000000 FF 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E
00000010 0F 57 58 59 17 23 AC F7 59 DE FF DD 5C 74 DF A7
00000020 4B D9 58 26 AA BB CC DD EE
Diff:
+ 00000000 FF
00000000 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 00000001 00 01 02 03 04 05 ..
+ 00000011 57 58 59
00000010 17 23 AC F7 59 DE FF DD 5C 74 DF A7 4B D9 58 26 00000014 17 23 AC F7 59 DE ..
- 00000020 10 11 12 13 14 15
+ 00000024 AA BB CC DD EE
Compare Algorithm
If a block is not found, it is marked with a "-" and written on the left side. If a block is found, it is not marked, and is written on both sides, each with its own file position. If there are bytes between the previous found block and this found block, they are marked with a "+" and written on the right side. The same happens for bytes before the first found block and after the last found block.
As a result of these requirements, the function DiffWriter.WriteDiffThread needs three subroutines, each of which can write a block of 16 bytes (or fewer) to the text file:
- WriteBlockRemoved (if a block is not found, marked with a "-")
- WriteBlockEqual (if a block is found)
- WriteBlockAdded (if there are some bytes between two found blocks, marked with a "+")
These subroutines are so simple that I will not explain them in detail.
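For orientation, here is a minimal sketch of what such line formatting could look like. The class `BlockWriterSketch` and its three `Format...` methods are hypothetical names, not the program's actual subroutines, and the column layout is simplified compared to the real text file (for example, added bytes are not shifted to the right side here).

```csharp
using System;
using System.Linq;

// Hypothetical sketch: not the actual WriteBlockRemoved/Equal/Added code.
public static class BlockWriterSketch
{
    // Formats `count` bytes starting at `pos` as "00 01 02 ...".
    private static string Hex(byte[] data, int pos, int count) =>
        string.Join(" ", data.Skip(pos).Take(count).Select(b => b.ToString("X2")));

    // Block of File 1 that was not found in File 2, marked with "-".
    public static string FormatRemoved(byte[] file1, int pos1, int count) =>
        $"- {pos1:X8} {Hex(file1, pos1, count)}";

    // Block found in both files: unmarked, each side with its own position.
    public static string FormatEqual(byte[] file1, int pos1,
                                     byte[] file2, int pos2, int count) =>
        $"  {pos1:X8} {Hex(file1, pos1, count)}   {pos2:X8} {Hex(file2, pos2, count)}";

    // Bytes that exist only in File 2, marked with "+".
    public static string FormatAdded(byte[] file2, int pos2, int count) =>
        $"+ {pos2:X8} {Hex(file2, pos2, count)}";
}
```

Each method returns one line of text, so a caller could pass the result directly to a text writer.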
The important question is: How does the function DiffWriter.WriteDiffThread decide whether a block was found in File 2?
This is where the function FindBlock is needed:
    private static Int32 FindBlock(Int32 nPos1,
                                   Int32 nPos2,
                                   Byte[] pFile1,
                                   Byte[] pFile2,
                                   Int32 nBytesToCompare,
                                   Boolean bFoundFirstEqual)
    {
        Int32 nStartOfEqualBlock2 = -1;   // -1 means "block not found"
        Boolean bFound = false;
        Int32 i, j, nEqualBytes;

        // Try every start position in File 2, beginning at nPos2.
        for (i = nPos2; i < pFile2.Length; i++)
        {
            nEqualBytes = 0;

            // Compare the block of File 1 with the bytes of File 2 starting at i.
            for (j = i; j < (i + nBytesToCompare); j++)
            {
                if (j < pFile2.Length)
                {
                    if (pFile2[j] == pFile1[nPos1 + nEqualBytes])
                    {
                        nEqualBytes++;
                        if (nEqualBytes == nBytesToCompare)
                        {
                            bFound = true;
                            break;
                        }
                    }
                    else
                    {
                        break;   // mismatch, try the next start position
                    }
                }
                else
                {
                    break;   // end of File 2 reached
                }
            }

            if (bFound)
            {
                nStartOfEqualBlock2 = i;
                break;
            }

            // After the first match, look at most 1 KB ahead.
            if (bFoundFirstEqual && (i >= (nPos2 + 1024)))
            {
                break;
            }
        }

        return nStartOfEqualBlock2;
    }
As you can see, this function takes six parameters:
- nPos1 ... The position in File 1 where the block that shall be compared begins.
- nPos2 ... The position in File 2 where the search for an equal block shall begin.
- pFile1 ... A byte array that contains all bytes of File 1.
- pFile2 ... A byte array that contains all bytes of File 2.
- nBytesToCompare ... The number of bytes contained in the block that shall be compared (must be > 0).
- bFoundFirstEqual ... A flag that specifies whether at least 1 equal block has already been found in an earlier call of FindBlock.
The function loops through File 2 in steps of 1 byte, and compares nBytesToCompare bytes, starting at the loop counter i. If all nBytesToCompare bytes are equal, the block has been found at position i. In this case, the loop ends, and the position is returned.
If the flag bFoundFirstEqual is set, the loop ends as soon as 1 KB of File 2 (counted from nPos2) has been searched without finding an equal block. In other words: after the first block has been found, the program only looks 1 KB ahead for the next block, and declares the block "not found" ("-") if it appears more than 1 KB after the previous found block. This way, it does not declare a very large number of bytes as "added" ("+") when the block being searched for only appears much later in File 2.
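The effect of the 1 KB limit can be illustrated with a small sketch. `SearchWindowSketch`, `Find` and `Probe` are hypothetical names; `Find` is a condensed restatement of the search described above (same comparison, same cut-off condition), not the original FindBlock, and it assumes the caller keeps `pos1 + count` within File 1.

```csharp
using System;

// Hypothetical sketch of the search window, not the original FindBlock.
public static class SearchWindowSketch
{
    // Returns the position of the block in file2, or -1 if it was not found.
    public static int Find(byte[] file1, int pos1, byte[] file2, int pos2,
                           int count, bool limitWindow)
    {
        for (int i = pos2; i < file2.Length; i++)
        {
            int equal = 0;
            while (equal < count && i + equal < file2.Length
                   && file2[i + equal] == file1[pos1 + equal])
            {
                equal++;
            }
            if (equal == count) return i;

            // Same cut-off as above: give up 1024 positions after pos2.
            if (limitWindow && i >= pos2 + 1024) break;
        }
        return -1;
    }

    // Builds a File 2 that consists of `gap` zero bytes followed by the
    // block { 1, 2, 3, 4 }, then searches for that block from position 0.
    public static int Probe(int gap, bool limitWindow)
    {
        byte[] block = { 1, 2, 3, 4 };
        byte[] file2 = new byte[gap + block.Length];
        Array.Copy(block, 0, file2, gap, block.Length);
        return Find(block, 0, file2, 0, block.Length, limitWindow);
    }
}
```

With the limit enabled, `Probe(100, true)` finds the block at position 100, while `Probe(2000, true)` returns -1 because the block lies outside the window. Without the limit, `Probe(2000, false)` finds it at position 2000.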
The function DiffWriter.WriteDiffThread loops through File 1 in steps of 16 bytes, and always reads 1 block of at most 16 bytes. Each block is searched for with a call to FindBlock. If the block has been found, the position nPos2 is moved to the first byte after the found block, so no block will be found twice.
Depending on the results of FindBlock, the functions WriteBlockRemoved, WriteBlockEqual and WriteBlockAdded will be called.
If more than 16 bytes have been added, WriteBlockAdded will be called more than once.
When all blocks of File 1 have been searched, the remaining bytes of File 2 must be written. This is also done with calls to WriteBlockAdded, beginning with nPos2, which is the first byte after the last found block.
Now both files have been processed completely.
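The loop described above can be sketched roughly as follows. This is not the code of DiffWriter.WriteDiffThread: `DiffLoopSketch`, `Diff`, `AddAdded` and the condensed search helper `Find` are hypothetical, and each output line only records the marker, the positions and the length instead of the hex bytes ("=" stands for an unmarked, found block).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the main loop, not the original WriteDiffThread.
public static class DiffLoopSketch
{
    // Condensed stand-in for FindBlock (same search, same 1 KB cut-off).
    private static int Find(byte[] f1, int pos1, byte[] f2, int pos2,
                            int count, bool limitWindow)
    {
        for (int i = pos2; i < f2.Length; i++)
        {
            int n = 0;
            while (n < count && i + n < f2.Length && f2[i + n] == f1[pos1 + n]) n++;
            if (n == count) return i;
            if (limitWindow && i >= pos2 + 1024) break;
        }
        return -1;
    }

    // Emits one "+" line per (at most) 16 added bytes in file2[from..to).
    private static void AddAdded(List<string> lines, int from, int to)
    {
        for (int p = from; p < to; p += 16)
            lines.Add($"+ {p:X8} len={Math.Min(16, to - p)}");
    }

    public static List<string> Diff(byte[] f1, byte[] f2)
    {
        var lines = new List<string>();
        int pos2 = 0;             // search start in File 2
        bool foundFirst = false;  // becomes true after the first match

        for (int pos1 = 0; pos1 < f1.Length; pos1 += 16)
        {
            int count = Math.Min(16, f1.Length - pos1);
            int found = Find(f1, pos1, f2, pos2, count, foundFirst);
            if (found < 0)
            {
                lines.Add($"- {pos1:X8} len={count}");   // removed block
                continue;                                 // pos2 stays where it is
            }
            AddAdded(lines, pos2, found);                 // bytes skipped in File 2
            lines.Add($"= {pos1:X8} {found:X8} len={count}");
            pos2 = found + count;                         // first byte after the match
            foundFirst = true;
        }
        AddAdded(lines, pos2, f2.Length);                 // rest of File 2
        return lines;
    }
}
```

Fed with the two files from the Example section, this sketch yields the same sequence of markers and positions as the Diff shown there: "+" at 00000000, a match at 00000000/00000001, "+" at 00000011, a match at 00000010/00000014, "-" at 00000020, and "+" at 00000024.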
Limitations of BinDiff
BinDiff cannot compare files larger than 2 GB. Note that both files are loaded completely into RAM before the comparison starts. BinDiff only works if File 2 does not contain a sequence of added bytes that is larger than 1 KB. If more than 1 KB is added in one piece, the rest of File 2 will also be marked as "added" ("+") in the text file.
Thread Synchronization
The class MainWindow handles the GUI, and the class DiffWriter does the comparison of File 1 and File 2.
If the user clicks the Start button, the function DiffWriter.StartWriteDiff is executed. It opens the files (if possible), and starts a thread called WriteDiffThread. This thread compares File 1 and File 2, writes the results to the text file, and raises the event DiffWriter.ThreadEnded when it is finished.
If the user clicks the Stop button, or closes the application, the function DiffWriter.Stop is executed. It stops the thread by setting the variable m_bStop = true. The thread notices the change of m_bStop, and leaves all loops immediately.
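A stop flag of this kind can be sketched as follows. The class `StoppableWorkerSketch` and its method `WaitForExit` are hypothetical; only the name `m_bStop` and the event name `ThreadEnded` are taken from the text. Declaring the flag `volatile` is an assumption made here so that the worker thread reliably sees the write; the article does not say how the original field is declared.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of a cooperative stop flag, not the original DiffWriter.
public class StoppableWorkerSketch
{
    // Assumption: volatile, so the worker sees the change made by Stop().
    private volatile bool m_bStop;
    private Thread m_thread;

    public event Action ThreadEnded;

    public void Start()
    {
        m_bStop = false;
        m_thread = new Thread(Work) { IsBackground = true };
        m_thread.Start();
    }

    // Only requests the stop; it does not block the caller.
    public void Stop()
    {
        m_bStop = true;
    }

    // Waits until the worker has finished, or until the timeout elapses.
    public bool WaitForExit(int timeoutMs)
    {
        return m_thread != null && m_thread.Join(timeoutMs);
    }

    private void Work()
    {
        // Every loop of the real comparison would check the flag like this.
        while (!m_bStop)
        {
            Thread.Sleep(1);   // stands in for one step of work
        }
        ThreadEnded?.Invoke(); // raised on the worker thread
    }
}
```

Note that the event is raised on the worker thread, which is why the GUI needs the synchronization described next.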
Event handling in MainWindow: The event handler DiffWriter_ThreadEnded shall cause the GUI to enable/disable buttons and textboxes, but it cannot access the GUI elements directly, because it's executed within a thread other than the GUI thread. Therefore, a simple workaround is used:
    private delegate void ThreadEndedSyncHandler();
    private ThreadEndedSyncHandler ThreadEnded;

    private void MainWindow_Load(object sender, EventArgs e)
    {
        this.ThreadEnded += this_ThreadEnded;
        DiffWriter.ThreadEnded += DiffWriter_ThreadEnded;
    }

    private void DiffWriter_ThreadEnded()
    {
        if (!m_closing)
        {
            this.Invoke(ThreadEnded);
        }
    }

    private void this_ThreadEnded()
    {
        EnableButtons();
    }
The event handler DiffWriter_ThreadEnded invokes a delegate called ThreadEnded, to which the handler this_ThreadEnded is subscribed. The method Form.Invoke takes care of the thread synchronization automatically, so the GUI elements can be accessed safely inside this_ThreadEnded.
Points of Interest
You can use this program to quickly compare large binary files that have many small differences.
History
- December 16, 2012: Published at CodeProject.