Differ: a reusable C# diffing utility and class library

30 Apr 2005
Flexible C# directory tree comparison utility.

Introduction

Differ is a new file "diff" utility written entirely in C#. Its internal text file difference algorithm is borrowed from DiffEngine, another project available on CodeProject.

Differ is different for several reasons. First, the code is organized so that the actual directory tree scan-and-diff algorithm is independent of the host application. It can be used in console programs, GUI applications or services because it communicates all of its results through events.

Second, Differ accepts an XML parameter file that specifies the types of files and the directory locations to ignore. This is helpful for developers because build products (.exe, .dll, etc.) can be skipped during the scan. The parameter file also lists the file extensions to be treated as "text", so Differ does not have to inspect a file's contents to classify it. Entries in the "ignore" lists can be static strings or regular expressions. A default XML parameter file is included, and Differ will generate one internally if necessary.
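As a rough illustration of how an "ignore" list that mixes static strings and regular expressions might be evaluated, here is a minimal sketch. The IgnoreList class and its method names are hypothetical; they are not taken from the Differ or ZipParams source, and the real differParams.xml schema is not reproduced here. The sketch is written for a current C# compiler rather than Visual Studio 2003.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: NOT the ZipParams class from the download.
public class IgnoreList
{
    private readonly List<string> literals = new List<string>();
    private readonly List<Regex> patterns = new List<Regex>();

    // Static string entry such as ".exe"; compared against the end of the name.
    public void AddLiteral(string suffix)
    {
        literals.Add(suffix);
    }

    // Regular-expression entry; a match anywhere in the name counts
    // unless the pattern itself is anchored with ^ and $.
    public void AddPattern(string pattern)
    {
        patterns.Add(new Regex(pattern, RegexOptions.IgnoreCase));
    }

    // True when the name matches any literal suffix or any pattern.
    public bool IsIgnored(string fileName)
    {
        foreach (string lit in literals)
        {
            if (fileName.EndsWith(lit, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        foreach (Regex rx in patterns)
        {
            if (rx.IsMatch(fileName))
                return true;
        }
        return false;
    }
}
```

With ".exe" added as a literal and ^~.*\.tmp$ added as a pattern, IsIgnored would return true for "Differ.EXE" and "~scratch.tmp" and false for "DifferMain.cs".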

Finally, as an example of the usefulness of this approach, the Differ utility can (optionally) generate a standard Windows batch file that synchronizes the contents of the two directory trees. I use the xcopy, del, rmdir and attrib commands in the batch file.
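To make the batch-file idea concrete, the sketch below builds one command line per synchronization step. The SyncBatch class, its method names and the specific switches (/Y, /F, /Q, /S, -R) are my assumptions about a reasonable form; the article does not show the commands Differ actually emits, so treat this as an illustration only and try any generated file on scratch directories first.

```csharp
using System;

// Hypothetical helper: builds the text of individual batch commands.
public static class SyncBatch
{
    // Wraps a path in double quotes so embedded spaces survive.
    private static string Q(string path)
    {
        return "\"" + path + "\"";
    }

    // Copy one file into a target directory, overwriting without a prompt.
    // The trailing backslash tells xcopy the target is a directory.
    public static string CopyFile(string sourceFile, string targetDir)
    {
        return "xcopy /Y " + Q(sourceFile) + " " + Q(targetDir.TrimEnd('\\') + "\\");
    }

    // Clear the read-only attribute so a later copy or delete can succeed.
    public static string ClearReadOnly(string file)
    {
        return "attrib -R " + Q(file);
    }

    // Delete a file that exists only in the target tree.
    public static string DeleteFile(string file)
    {
        return "del /F /Q " + Q(file);
    }

    // Remove a directory (and its contents) that exists only in the target tree.
    public static string RemoveDirectory(string dir)
    {
        return "rmdir /S /Q " + Q(dir);
    }
}
```

A host could collect these strings while handling Differ's events and write them to a .bat file once the scan completes.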

Background

After trying several other diffing utilities (including Cygwin ports) and filtering their results through Perl scripts, I found there were many subtle errors that could occur in the process. Also, many types of files could be safely ignored. Since I remotely maintain a website, I need to keep my development tree and the web server tree in sync, and this project was my answer to that problem.

To understand how the text difference sets in a single file are discovered, please refer to the original DiffEngine article. The version of the code I'm using (included in the download) has only one minor modification from the DiffEngine article as posted (see below).

Using the demo

Unzip the file DifferDemo.zip into a directory on your PATH. Then type "differ -?" for a list of options. If you type "differ -p" you'll see the contents of the default "ignore" lists and text extension mappings from the XML file differParams.xml. The format is simple enough that it should be obvious.

By default, Differ looks for differParams.xml in the directory that contains Differ.exe and its DLLs. Since these are .NET binaries, there is no need to register them with regsvr32.
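For readers who want the same "next to the executable" lookup in their own host, one common way to do it is sketched below. The ParamLocator class is hypothetical and is not part of the Differ source; only the file name differParams.xml comes from the article.

```csharp
using System;
using System.IO;

// Hypothetical helper: locates the parameter file beside the running program.
public static class ParamLocator
{
    // Combines a base directory with the default parameter file name.
    public static string DefaultPath(string baseDirectory)
    {
        return Path.Combine(baseDirectory, "differParams.xml");
    }

    // Uses the application's base directory, which for a console program
    // is normally the directory holding the .exe and its DLLs.
    public static string DefaultPath()
    {
        return DefaultPath(AppDomain.CurrentDomain.BaseDirectory);
    }
}
```

A caller would typically check File.Exists on the result and fall back to built-in defaults when the file is missing, which mirrors Differ's behavior of generating the parameters internally if necessary.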

Using the code

Download and build Differ as-is using Visual Studio 2003. The solution is Differ.sln in the Differ directory.

You may extend the utility by editing DifferMain.cs. It contains all the main console output, display logic and error handling.

Alternatively, you may use the DifferCore class to create your own diff utility with whatever behaviors suit your environment. Since class DifferCore (and the underlying DiffEngine support) does not access the Console object, it could be embedded into a WinForms (GUI) application, a system service or even an ASPX page (depending upon security, of course). If you wish to do this, move DifferCore into its own DLL project and reference that in your application.

In the DifferProject ZIP file you'll find a directory called CodeComments. These HTML pages were generated by the auto-documentation function of Visual Studio 2003. If you navigate your browser to the file Solution_Differ.HTM in that directory you'll be able to examine documentation for the entire project. (Note that recent changes to IE security settings may make the page render incorrectly.)

There are three projects in the downloadable VS2003 .NET "solution":

  • Differ. This contains the main and core modules for the Differ utility.
  • DifferenceEngine. This contains the original file diff logic from CodeProject.
  • ZipParams. This small project contains the file/directory "ignore" collections object and its XML serialization logic.

Briefly, the class DifferCommand is a command-line utility shell object that creates a DifferCore object, parameterizing it with the desired file or directory names. DifferCommand then calls the DifferCore's Execute method.

Here's the heart of the DifferCommand object:

//
//  Create the Differ object; parameterize it and attach event listeners
//
DifferCore dcore = new DifferCore( zpb, sLeft, sRight );

//  Indicate whether we want files/dirs ignored or not
dcore.ObeyIgnored = bIgnore;

//  Hook the standard events
dcore.DifferBinaryNotify += new 
  Differ.DifferCore.DifferBinaryNotifyEvent(differNotificationBinary);
dcore.DifferDirectoryNotify += new 
  Differ.DifferCore.DifferDirectoryEvent(differNotificationDirectory);
dcore.DifferTextNotify += new 
  Differ.DifferCore.DifferTextNotifyEvent(differNotificationText);
dcore.DifferExceptionNotify += new 
  Differ.DifferCore.DifferExceptionNotifyEvent(differExceptionNotify);

//  If we're to show tracking info, attach to the event
if ( bTracking )
    dcore.DifferTrackNotify += new 
       Differ.DifferCore.DifferTrackNotifyEvent(differTrackNotify);

//  If we're to show ignored files/dirs, attach to that event as well
if ( bShowIgnore )
    dcore.DifferIgnoreNotify += new 
       Differ.DifferCore.DifferIgnoreNotifyEvent(differIgnoreNotify);

//  Perform the recursive diff search
em = dcore.Execute();

The single call to Execute returns an indication of whether the directories match or not, and this is then used to set the value returned to the Windows command shell. All other information and state changes are communicated by events generated by DifferCore. If an application doesn't need an event it should leave it "unhooked".
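The pattern described above (a core object that reports only through events, plus a host that turns the overall result into a process exit code) can be sketched in a self-contained form. Everything here is a stand-in: MiniCore, its DifferenceFound event, the boolean return value and the 0/1 exit codes are my assumptions for illustration and do not reproduce DifferCore's actual delegate signatures or return type. Pairs of strings stand in for the files being compared.

```csharp
using System;

// Hypothetical stand-in for DifferCore: same shape (events + Execute), invented details.
public class MiniCore
{
    public delegate void DifferenceEvent(string left, string right);

    // Hosts that do not care about differences simply leave this unhooked.
    public event DifferenceEvent DifferenceFound;

    private readonly string[][] pairs;

    // Each element is a two-item array: { leftContent, rightContent }.
    public MiniCore(string[][] pairs)
    {
        this.pairs = pairs;
    }

    // Returns true when every pair matches; details are reported via the event only.
    public bool Execute()
    {
        bool allMatch = true;
        foreach (string[] p in pairs)
        {
            if (p[0] != p[1])
            {
                allMatch = false;
                if (DifferenceFound != null)
                    DifferenceFound(p[0], p[1]);
            }
        }
        return allMatch;
    }
}

public static class Host
{
    // Console host: prints each difference and maps the result to an exit code
    // (assumed convention: 0 = everything matches, 1 = differences found).
    public static int Run(MiniCore core)
    {
        core.DifferenceFound += delegate(string l, string r)
        {
            Console.WriteLine("differs: {0} <> {1}", l, r);
        };
        return core.Execute() ? 0 : 1;
    }
}
```

Because MiniCore never touches Console itself, a GUI or service host could attach different handlers to the same event without changing the core class, which is the property the article attributes to DifferCore.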

Points of interest

The binary 'diff' algorithm from DiffEngine was too slow for my needs, so I created an alternative (trivial) match routine. You can force Differ to use the original algorithm from the command line. However, the DifferCommand object will not display binary file differences even though it receives notification of them.
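The article does not show the replacement routine, so the following is only one plausible form of a "trivial" binary match: it answers equal or not equal and makes no attempt to describe where the files differ. The BinaryMatch class and its method names are hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical sketch of a yes/no binary comparison; not the routine shipped with Differ.
public static class BinaryMatch
{
    // True when both streams yield identical bytes from their current positions to the end.
    public static bool StreamsEqual(Stream a, Stream b)
    {
        const int Size = 64 * 1024;
        byte[] bufA = new byte[Size];
        byte[] bufB = new byte[Size];
        while (true)
        {
            int nA = ReadFull(a, bufA);
            int nB = ReadFull(b, bufB);
            if (nA != nB) return false;   // one stream ended before the other
            if (nA == 0) return true;     // both ended with no mismatch
            for (int i = 0; i < nA; i++)
            {
                if (bufA[i] != bufB[i]) return false;
            }
        }
    }

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the stream is exhausted.
    private static int ReadFull(Stream s, byte[] buf)
    {
        int total = 0;
        while (total < buf.Length)
        {
            int n = s.Read(buf, total, buf.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }

    // Files of different length cannot match, so check that before reading any bytes.
    public static bool FilesEqual(string left, string right)
    {
        if (new FileInfo(left).Length != new FileInfo(right).Length) return false;
        using (FileStream a = File.OpenRead(left))
        using (FileStream b = File.OpenRead(right))
        {
            return StreamsEqual(a, b);
        }
    }
}
```

The length check makes the common "files differ in size" case nearly free, and the first mismatching byte ends the comparison early, which is where a routine like this gains over a full difference algorithm.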

The only change I made to the current version of DiffEngine was to set its maximum text line length to 4096 and expose that value via an accessor.

The Differ project demonstrates the following major elements of .NET and C#:

  • XML serialization.
  • File I/O, directory and attribute handling.
  • Events and event dispatching.
  • DllImport declarations for Windows functions.
  • Simple regular expressions.
  • Exception handling.

History

  • Initial release.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
