Introduction
This is a candidate entry for The Code Project's Lean and Mean competition to write an efficient text delta tool.
I was checking my emails while on vacation and I came across a Code Project daily mailing with a reminder for the LCS competition. I have enjoyed reading The Code Project articles for many years now, but until this moment I never had the opportunity to contribute.
When I first started with C# I decided to write a diff program in order to really learn the language. I call this program, DeltaScope. What started out as a simple console-based diff tool has gone through many iterations. When I finally stopped work on it, I had a reusable delta engine with GDI+ front-end controls.
Unfortunately, I have not touched this code since 2005. Thankfully my fiancé is very understanding of my “coding time” and I was able to take some time to write this article and bring the code up to C# 3.0 specifications.
Longest Common Subsequence
Many of the entries have excellent explanations of the LCS algorithm. When I wrote my DeltaScope I referred to this paper extensively. http://www.ics.uci.edu/~eppstein/161/960229.html[^]
I find the best way to understand the LCS algorithm, is to think of it like this:
The output of a delta operation is a sequence of instructions to transform file 1 into file 2.
Points of Interest
The core of the system is the DeltaEngine class. The DeltaScope application wraps up the engine in the DeltaScopeEngine class.
public class DeltaScopeEngine
{
public bool IgnoreCase { get; set; }
public bool IgnoreTabs { get; set; }
public TimeSpan ElapsedTime { get; private set; }
public long PeakMemoryUsage { get; private set; }
public List<DeltaBlock> Compare(string tsLeftFile, string tsRightFile)
{
var process = Process.GetCurrentProcess();
process.Refresh();
var peakWorkingSet64 = process.PeakWorkingSet64;
var timer = DateTime.Now;
var deltaEngine = new DeltaEngine();
var result = deltaEngine.GetDifferences(RipFile(tsLeftFile),
RipFile(tsRightFile), hashSideDelegate);
ElapsedTime = DateTime.Now - timer;
PeakMemoryUsage = process.PeakWorkingSet64 - peakWorkingSet64;
return result;
}
private static string RipFile(string tsFileName)
{
...
}
private List<DeltaString> hashSideDelegate(string sideValues)
{
const string blockLineTerminator = @"\r\n";
var values = new List<string>(Regex.Split(sideValues, blockLineTerminator));
var results = new List<DeltaString>();
for (var lnCnt = 0; lnCnt < values.Count; lnCnt++)
{
var lnHashCode = _GenerateHashFromString(values[lnCnt]);
if (lnHashCode >= 0)
results.Add(new DeltaString(values[lnCnt], lnHashCode));
}
return results;
}
private int _GenerateHashFromString(string tsLine)
{
var lnHash = 0;
var lnRealPtr = 0;
var lcaLine = IgnoreCase ?
tsLine.ToUpper().ToCharArray() : tsLine.ToCharArray();
var lnMaxLength = lcaLine.Length;
for (var lnCnt = 0; lnCnt < lnMaxLength; lnCnt++)
{
if (lcaLine[lnCnt] == '\t' && IgnoreTabs)
continue;
lnRealPtr++;
lnHash += ((byte)lcaLine[lnCnt] * lnRealPtr);
}
return (lnHash);
}
}
The hashAlgorithm delegate lets the developer control how each string is parsed into a list of DeltaStrings. In this simple example we can choose to make the difference case sensative or case insensative, and we can choose to ignore tab characters.
Timings and Memory Consumption
The average run timing is 13ms. I do not include any screen rendering in my timings. As for memory consumption, I always come up with 0 memory used. Which means I am either not using process.PeakWorkingSet64
correctly, or my code is magical. ;)
Future direction
Many diff tools allow the user to merge the 2 files into 1 new file. With the list of delta blocks that the DeltaEngine generates, it should be possible to allow the user to build a new file. This would require some sort of block choosing ability along with perhaps text editing. Unfortunately the current GDI+ controls are display only.
When I initially wrote this, GDI+ was a pretty new technology, and WPF did not yet exist. I think a good next step would be to use WPF as the display technology for the display. With WPF the swirl block control can be rewritten to include editing abilities.