Introduction
Differ is a new file "diff" utility written entirely in C#. Its internal text file difference algorithm is borrowed from DiffEngine, another project available on CodeProject.
Differ is different for several reasons. First, the code is organized so that the actual directory tree scan-and-diff algorithm is platform-independent. It could be used in console programs, GUI apps or services because it communicates all its results via event dispatch.
Second, Differ accepts an XML parameter file that specifies the types of files and the directory locations to ignore. This is helpful for developers because build products (.exe, .dll, etc.) can be skipped during the scan. The parameter file also lists which file extensions are to be treated as "text", so Differ does not have to scan a file's contents to classify it. A default XML parameter file is included, and Differ will generate one internally if necessary. The "ignore" lists can be specified as static strings or regular expressions.
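To make the "static strings or regular expressions" idea concrete, here is a minimal sketch of how such an ignore check could work. The class name IgnoreRules, its members and the sample entries are all hypothetical; they are not taken from Differ's source or from differParams.xml.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical illustration only; not the actual Differ parameter classes.
public static class IgnoreRules
{
    // Static entries: compared against the file extension, ignoring case.
    static readonly HashSet<string> IgnoredExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".exe", ".dll", ".pdb" };

    // Pattern entries: matched against the whole path.
    static readonly Regex[] IgnoredPatterns =
    {
        new Regex(@"[\\/](bin|obj)[\\/]", RegexOptions.IgnoreCase), // build output directories
        new Regex(@"~$")                                            // names ending in a tilde
    };

    // Returns true when the path matches any static entry or any pattern.
    public static bool IsIgnored(string path)
    {
        if (IgnoredExtensions.Contains(Path.GetExtension(path)))
            return true;

        foreach (Regex pattern in IgnoredPatterns)
        {
            if (pattern.IsMatch(path))
                return true;
        }
        return false;
    }
}
```

The static list is checked first because a set lookup is cheaper than running every pattern; the regular expressions then cover the cases a plain extension list cannot express, such as whole directories.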
Finally, as an example of the usefulness of this approach, the Differ utility can (optionally) generate a standard Windows batch file that synchronizes the contents of the two directory trees. I use the xcopy, del, rmdir and attrib commands in the batch file.
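The following sketch shows one way batch lines for those four commands could be composed. It is an assumption-laden illustration: the class SyncBatchWriter, its method names and the particular switches chosen (/Y, /Q, /S, -R) are mine, and the batch file Differ actually writes may look different.

```csharp
using System;

// Hypothetical sketch; the real Differ batch generator may differ.
public static class SyncBatchWriter
{
    // Wrap a path in double quotes so embedded spaces survive in the batch file.
    static string Quote(string path)
    {
        return "\"" + path + "\"";
    }

    // Copy a new or changed file into the target directory.
    // The trailing backslash marks the target as a directory; /Y suppresses the overwrite prompt.
    public static string CopyFile(string sourceFile, string targetDir)
    {
        return "xcopy /Y " + Quote(sourceFile) + " " + Quote(targetDir.TrimEnd('\\') + "\\");
    }

    // Remove a file that exists only in the target tree.
    // attrib -R clears the read-only flag first so that del does not refuse the file.
    public static string[] DeleteFile(string targetFile)
    {
        return new string[]
        {
            "attrib -R " + Quote(targetFile),
            "del /Q " + Quote(targetFile)
        };
    }

    // Remove a directory (and everything below it) that exists only in the target tree.
    public static string RemoveDirectory(string targetDir)
    {
        return "rmdir /S /Q " + Quote(targetDir);
    }
}
```

Each method returns text rather than writing to a file, so the caller decides whether the lines go to a .bat file, the console or a log.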
Background
After trying to use several other diffing utilities (including Cygwin ports) and filtering their results through Perl scripts, I found there were many subtle errors that could occur in the process. Also, many types of files could be safely ignored. Since I remotely maintain a website, I need to keep my development tree and the web server tree in sync, and this project was my answer to that problem.
To understand how the text difference sets in a single file are discovered, please refer to the original DiffEngine article. The version of the code I'm using (included in the download) has only one minor modification from the DiffEngine article as posted (see below).
Using the demo
Unzip the file DifferDemo.zip into a directory on your normal path. Then type "differ -?" for a list of options. If you type "differ -p" you'll see the contents of the default "ignore" lists and text extension mappings from the XML file differParams.xml. The format is simple enough that it should be obvious.
The file differParams.xml is located by default in the same directory where Differ.exe and its DLLs live. Since these are .NET binaries, there is no need to register them with regsvr32.
Using the code
Download and build Differ as-is using Visual Studio 2003. The solution is Differ.sln in the Differ directory.
You may extend the utility by editing DifferMain.cs. It contains all the main console output, display logic and error handling.
Alternatively, you may use the DifferCore class to create your own diff utility with whatever behaviors suit your environment. Since class DifferCore (and the underlying DiffEngine support) does not access the Console object, it could be embedded into a WinForms (GUI) application, a system service or even an ASPX page (depending upon security, of course). If you wish to do this, move DifferCore into its own DLL project and reference that in your application.
In the DifferProject ZIP file you'll find a directory called CodeComments. These HTML pages were generated by the auto-documentation function of Visual Studio 2003. If you navigate your browser to the file Solution_Differ.HTM in that directory you'll be able to examine documentation for the entire project. (Note that recent changes to IE security settings may make the page render incorrectly.)
There are three projects in the downloadable VS2003 .NET "solution":
- Differ. This contains the main and core modules for the Differ utility.
- DifferenceEngine. This contains the original file diff logic from CodeProject.
- ZipParams. This small project contains the file/directory "ignore" collections object and its XML serialization logic.
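For readers unfamiliar with XML serialization in .NET, the sketch below round-trips a small "ignore" parameter object through XmlSerializer. The class SampleParams and its field names are hypothetical and do not reflect the real ZipParams classes or the layout of differParams.xml.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical parameter class; the field names are NOT those used by differParams.xml.
public class SampleParams
{
    public string[] IgnoreExtensions = new string[0];
    public string[] IgnorePatterns = new string[0];
    public string[] TextExtensions = new string[0];
}

public static class SampleParamsIO
{
    // Serialize the parameter object to an XML string.
    public static string ToXml(SampleParams parameters)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(SampleParams));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, parameters);
            return writer.ToString();
        }
    }

    // Rebuild the parameter object from an XML string.
    public static SampleParams FromXml(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(SampleParams));
        using (StringReader reader = new StringReader(xml))
        {
            return (SampleParams)serializer.Deserialize(reader);
        }
    }
}
```

XmlSerializer works from the public fields and properties of the class, so adding a new list to the parameter file only requires adding a new public member.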
Briefly, the class DifferCommand is a command-line utility shell object that creates a DifferCore object, parameterizing it with the desired file or directory names. DifferCommand then calls DifferCore's Execute method.
Here's the heart of the DifferCommand object:
// Create the core object for the two trees and tell it whether to honor the "ignore" lists.
DifferCore dcore = new DifferCore( zpb, sLeft, sRight );
dcore.ObeyIgnored = bIgnore;

// Hook the notifications this console front end always handles.
dcore.DifferBinaryNotify += new
    Differ.DifferCore.DifferBinaryNotifyEvent(differNotificationBinary);
dcore.DifferDirectoryNotify += new
    Differ.DifferCore.DifferDirectoryEvent(differNotificationDirectory);
dcore.DifferTextNotify += new
    Differ.DifferCore.DifferTextNotifyEvent(differNotificationText);
dcore.DifferExceptionNotify += new
    Differ.DifferCore.DifferExceptionNotifyEvent(differExceptionNotify);

// Optional events: hooked only when the corresponding flag is set.
if ( bTracking )
    dcore.DifferTrackNotify += new
        Differ.DifferCore.DifferTrackNotifyEvent(differTrackNotify);
if ( bShowIgnore )
    dcore.DifferIgnoreNotify += new
        Differ.DifferCore.DifferIgnoreNotifyEvent(differIgnoreNotify);

// Run the scan; results arrive through the handlers hooked above.
em = dcore.Execute();
The single call to Execute returns an indication of whether the directories match or not, and this is then used to set the value returned to the Windows command shell. All other information and state changes are communicated by events generated by DifferCore. If an application doesn't need an event it should leave it "unhooked".
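The pattern of "one return value plus optional events" can be sketched in isolation. MiniScanner below is a hypothetical stand-in, not DifferCore: it compares two in-memory name-to-content maps instead of directory trees, and its delegate and event names are invented for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an event-driven scanner; not the real DifferCore API.
public class MiniScanner
{
    public delegate void ItemNotifyHandler(string name);

    // Both events are optional; an unhooked event is simply never raised.
    public event ItemNotifyHandler Same;
    public event ItemNotifyHandler Different;

    // Returns true only when both sides hold the same names with the same content.
    public bool Execute(IDictionary<string, string> left, IDictionary<string, string> right)
    {
        bool match = true;

        foreach (KeyValuePair<string, string> entry in left)
        {
            string other;
            if (right.TryGetValue(entry.Key, out other) && other == entry.Value)
            {
                if (Same != null) Same(entry.Key);
            }
            else
            {
                match = false;
                if (Different != null) Different(entry.Key);
            }
        }

        // Names that exist only on the right also count as differences.
        foreach (string name in right.Keys)
        {
            if (!left.ContainsKey(name))
            {
                match = false;
                if (Different != null) Different(name);
            }
        }
        return match;
    }
}
```

Because every event is null-checked before it is raised, a caller that only wants the overall result can call Execute without attaching any handlers at all.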
Points of interest
The binary 'diff' algorithm from DiffEngine was too slow for my needs so I created an alternative (trivial) match routine. You can force Differ to use the original algorithm from the command line. However, the DifferCommand object will not display binary file differences even though it receives notification of them.
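The article does not show the alternative match routine, so the following is only a guess at what a "trivial" binary match might look like: compare lengths, then compare bytes in buffered chunks and stop at the first mismatch. The class BinaryCompare and its method names are hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical byte-for-byte matcher; not the routine shipped with Differ.
public static class BinaryCompare
{
    // Returns true when both streams yield exactly the same bytes.
    public static bool StreamsMatch(Stream a, Stream b)
    {
        const int BufferSize = 64 * 1024;
        byte[] bufferA = new byte[BufferSize];
        byte[] bufferB = new byte[BufferSize];

        while (true)
        {
            int readA = ReadFull(a, bufferA);
            int readB = ReadFull(b, bufferB);

            if (readA != readB) return false; // one stream ended early
            if (readA == 0) return true;      // both ended with no mismatch

            for (int i = 0; i < readA; i++)
            {
                if (bufferA[i] != bufferB[i]) return false;
            }
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until
    // the buffer is full or the stream is exhausted.
    static int ReadFull(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int count = stream.Read(buffer, total, buffer.Length - total);
            if (count == 0) break;
            total += count;
        }
        return total;
    }

    // File-level wrapper: a length mismatch settles the question without reading any data.
    public static bool FilesMatch(string leftPath, string rightPath)
    {
        if (new FileInfo(leftPath).Length != new FileInfo(rightPath).Length)
            return false;

        using (FileStream left = File.OpenRead(leftPath))
        using (FileStream right = File.OpenRead(rightPath))
        {
            return StreamsMatch(left, right);
        }
    }
}
```

A match/no-match answer like this is far cheaper than computing a full difference set, which fits a tool that reports binary files as changed without displaying the differences.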
The only change I made to the current version of DiffEngine was to set its maximum text line length to 4096 and expose that value via an accessor.
The Differ project demonstrates the following major elements of .NET and C#:
- XML serialization.
- File I/O, directory and attribute handling.
- Events and event dispatching.
- DLL Import declarations for Windows functions.
- Simple regular expressions.
- Exception handling.
History