PIEBALDdiff

PIEBALDconsult

3.95/5 (15 votes)

21 Sep 2009CPOL9 min read

37.2K

150

A line-by-line file diff utility

Download source and demo - 12.62 KB

Introduction

I originally wrote this code and article for the Code Lean and Mean[^] contest. The code presented in the first version of this article was specialized for working with strings (lines of text files). This version of the article presents a generic diff engine that can be used with characters, strings, or just about any other type. The included demo program still does a line-by-line diff of two text files.

Theory of this Algorithm

I began this project with the following postulates:

A file diff is a sequence of copy operations and edit operations.
The optimal edit distance is achieved by the longest copy operations and the shortest edit operations.
Every item (character or line) of the source and target must be accounted for.
Every source item must be copied or deleted.
Every target item must be copied or inserted.
Any sequence of copy operations can be reduced to one copy operation.
Any sequence of delete operations can be reduced to one delete operation.
Any sequence of insert operations can be reduced to one insert operation.
Any sequence of delete and insert operations can be reduced to one replace operation.
The order of delete and insert operations within a replace operation is not significant.
All possible sequences of delete and insert operations that describe a particular edit operation are equivalent.
While seeking the optimal edit operation, we need track only the number of delete and insert operations.

As we'll see, it's these last two postulates that yield the greatest savings in processing.

Two details of the implementation that are also very important are:

The use of a breadth-first search for the end of an edit section.
The desire to find the most-diagonal path by preferring replace operations.

First Things First

LibExt.Lines

I began with developing a character-by-character solution. This allowed me to test with strings, but I knew the eventual solution would need to work with text files. My first goal, then, was to enable my code to work with either strings or text files without needing to know the details. I knew that System.String implements the System.Collections.Generic.IEnumerable<char> interface, but I didn't find a file reader that did, so I wrote my own enumerator to do it. I then did likewise for System.Collections.Generic.IEnumerable<string>. Here are the Extension Methods that I use to enumerate the lines in a text file:

namespace PIEBALD.Lib.LibExt.Lines
{
    public static partial class LibExt
    {
        public static System.Collections.Generic.IEnumerable<string />
        Lines
        (
            this System.IO.FileInfo File
        )
        {
            return ( File.OpenRead().Lines() ) ;
        }

        public static System.Collections.Generic.IEnumerable<string />
        Lines
        (
            this System.IO.FileStream Stream
        )
        {
            return ( (new System.IO.StreamReader ( Stream )).Lines() ) ;
        }

        public static System.Collections.Generic.IEnumerable<string />
        Lines
        (
            this System.IO.StreamReader Stream
        )
        {
            string current ;

            while ( ( current = Stream.ReadLine() ) != null )
            {
                yield return ( current ) ;
            }

            yield break ;
        }
    }
}

ItemBuffer

The next order of business was to implement a way to buffer the items read from the enumerator. The buffer needed to hold only the items currently under consideration, once a decision was made, the old items could be removed. As with the above code, I first wrote a char-based buffer (using a StringBuilder), then used it as the basis for the following class:

public class ItemBuffer<T> : System.IDisposable
{
    protected readonly object pickle ;
 
    protected System.Collections.Generic.IEnumerator<T> source ;
 
    protected System.Collections.Generic.List<T> buffer ;
 
    public ItemBuffer
    (
        System.Collections.Generic.IEnumerable<T> Source
    )
    {
        if ( Source == null )
        {
            throw ( new System.ArgumentNullException ( "Source" , "You must provide a source" ) ) ;
        }
 
        this.pickle = new object() ;
 
        this.source = Source.GetEnumerator() ;
 
        this.buffer = new System.Collections.Generic.List<T>() ;
 
        return ;
    }
 
    public virtual int
    Count
    {
        get
        {
            lock ( this.pickle )
            {
                return ( this.buffer == null ? -1 : this.buffer.Count ) ;
            }
        }
    }
 
    public virtual bool
    TryGet
    (
        int   Index
    ,
        out T Value
    )
    {
        bool result = false ;
 
        Value = default(T) ;
 
        lock ( this.pickle )
        {
            if ( this.source != null )
            {
                while
                (
                    ( this.buffer.Count <= Index )
                &&
                    ( this.source.MoveNext() )
                )
                {
                    this.buffer.Add ( this.source.Current ) ;
                }
 
                if ( this.buffer.Count > Index )
                {
                    Value = this.buffer [ Index ] ;
 
                    result = true ;
                }
            }
        }
 
        return ( result ) ;
    }
 
    public virtual void
    Clear
    (
    )
    {
        lock ( this.pickle )
        {
            this.buffer.Clear() ;
        }
 
        return ;
    }
 
    public virtual System.Collections.Generic.IEnumerable<T>
    GetRange
    (
        int StartIndex
    ,
        int Length
    )
    {
        lock ( this.pickle )
        {
            return ( this.buffer.GetRange ( StartIndex , Length ).AsReadOnly() ) ;
        }
    }
 
    public virtual void
    RemoveRange
    (
        int StartIndex
    ,
        int Length
    )
    {
        lock ( this.pickle )
        {
            this.buffer.RemoveRange ( StartIndex , Length ) ;
        }
 
        return ;
    }
}

2009-09-20: Unsealed, changed the fields to protected, and changed the methods to virtual. This allows the following char-specialized subclass:

CharBuffer

This is a char-specialized ItemBuffer. It uses a StringBuilder rather than a List<char> to buffer the characters. It may or may not offer improved performance, but GetRange returns a string so I find it easier to use.

public sealed class CharBuffer : PIEBALD.Types.ItemBuffer<char>
{
    private System.Text.StringBuilder builder ;
 
    public CharBuffer
    (
        System.Collections.Generic.IEnumerable<char> Source
    )
    : base
    (
        Source
    )
    {
        this.buffer = null ;
 
        this.builder = new System.Text.StringBuilder() ;
 
        return ;
    }
 
    public override int
    Count
    {
        get
        {
            lock ( this.pickle )
            {
                return ( this.builder == null ? -1 : this.builder.Length ) ;
            }
        }
    }
 
    public override bool
    TryGet
    (
        int Index
    ,
        out char Value
    )
    {
        bool result = false ;
 
        Value = default(char) ;
 
        lock ( this.pickle )
        {
            if ( this.source != null )
            {
                while
                (
                    ( this.builder.Length <= Index )
                &&
                    ( this.source.MoveNext() )
                )
                {
                    this.builder.Append ( this.source.Current ) ;
                }
 
                if ( this.builder.Length > Index )
                {
                    Value = this.builder [ Index ] ;
 
                    result = true ;
                }
            }
 
            return ( result ) ;
        }
    }
 
    public override void
    Clear
    (
    )
    {
        lock ( this.pickle )
        {
            this.builder.Length = 0 ;
        }
 
        return ;
    }
 
    public override System.Collections.Generic.IEnumerable<char>
    GetRange
    (
        int StartIndex
    ,
        int Length
    )
    {
        lock ( this.pickle )
        {
            return ( this.builder.ToString ( StartIndex , Length ) ) ;
        }
    }
 
    public override void
    RemoveRange
    (
        int StartIndex
    ,
        int Length
    )
    {
        lock ( this.pickle )
        {
            this.builder.Remove ( StartIndex , Length ) ;
 
            return ;
        }
    }
}

LimitedQueue

I also make use of my LimitedQueue[^]. The diff algorithm doesn't need to limit the number of items in the queue, but it does need to access the items by index, and LimitedQueue allows that (as part of the "and mean" requirement of the contest :-D ).

Other Supporting Characters

DiffSectionType

Each DiffSection has a type:

[System.FlagsAttribute()]
public enum DiffSectionType : byte
{
    Copy = 0
,
    Delete = 1
,
    Insert = 2
,
    Replace = 3
}

DiffSectionBase<T>

An abstract base class for the various types of diff operations; it only needs to hold the type of operation it represents and the content associated with it. This class was one of the stumbling blocks in achieving a pleasing generic solution. To be generic, I want it to hold List<T>, but when writing a specialized diff to perform character-by-character diffs, I want it to hold a string. I eventually decided to use IEnumerable<T> so I can store a List<T> or a string as appropriate. This has the added benefit of matching the type that is sent into the diff engine -- IEnumerable<T> in, IEnumerable<T> out.

public abstract class DiffSectionBase<T>
{
    protected DiffSectionBase
    (
        DiffSectionType                           Type
    ,
        System.Collections.Generic.IEnumerable<T> SourceContent
    ,
        System.Collections.Generic.IEnumerable<T> TargetContent
    )
    {
        this.Type          = Type          ;
        this.SourceContent = SourceContent ;
        this.TargetContent = TargetContent ;

        return ;
    }

    public DiffSectionType Type
    {
        get ;
        private set ;
    }

    public System.Collections.Generic.IEnumerable<T> SourceContent
    {
        get ;
        private set ;
    }

    public System.Collections.Generic.IEnumerable<T> TargetContent
    {
        get ;
        private set ;
    }
}

CopyDiffSection

The classes for the four types of diff operations are implemented like this:

public sealed class CopyDiffSection<T> : DiffSectionBase<T>
{
    public CopyDiffSection
    (
        System.Collections.Generic.IEnumerable<T> SourceContent
    )
    : base
    (
        DiffSectionType.Copy
    ,
        SourceContent
    ,
        null
    )
    {
        return ;
    }
}

DiffCandidate

While seeking the optimal edit operation, each sequence of deletes and inserts under consideration is held in one of these. As mentioned earlier, only the type of operation and the number of deletes and inserts needs to be held by each candidate. A great deal of memory and processing is saved by storing a hash and using it to determine whether or not the candidate is unique. This should satisfy the "lean" requirement of the contest.

This class is also used for copy operations, but that's not as interesting.

This class limits the implementation to tracking copy/edit sections of up to System.Int16.MaxValue (32767) items. This ought to be enough for most diff operations and the application may hit an OutOfMemoryException long before it gets to that point. The current implementation does not protect against OutOfMemory situations, but if the limit of 32767 items is reached, the implementation will simply produce what it has and continue.

public sealed class DiffCandidate
{
    public const int Limit = System.Int16.MaxValue ;
 
    public readonly int Hash ;
 
    internal DiffCandidate
    (
        PIEBALD.Types.DiffSectionType Type
    ,
        int                           SourceIndex
    ,
        int                           TargetIndex
    )
    {
        this.Hash = (int) Type ;
        this.Hash <<= 15 ;
        this.Hash |= SourceIndex ;
        this.Hash <<= 15 ;
        this.Hash |= TargetIndex ;
 
        return ;
    }
 
    public PIEBALD.Types.DiffSectionType
    Type
    {
        get
        {
            return ( (PIEBALD.Types.DiffSectionType) ( ( this.Hash >> 30 ) & 3 ) ) ;
        }
    }
 
    public int
    SourceIndex
    {
        get
        {
            return ( ( this.Hash >> 15 ) & Limit ) ;
        }
    }
 
    public int
    TargetIndex
    {
        get
        {
            return ( this.Hash & Limit ) ;
        }
    }
}

2009-09-20: Changed from using ushort to int -- this avoids a bunch of casting later.

DiffEngine

This class implements the diff algorithm. It should work for any type T where T implements System.IComparable<T>; it definitely works for char and string.

The tracker field is used to hold the various edit sequences that are under consideration. It needs to be FIFO and indexable, hence the use of my LimitedQueue.

The Truncate method is used after a decision has been made about a section of the diff, it prepares the fields for the next section.

public sealed class DiffEngine<T> : System.IDisposable
where T : System.IComparable<T>
{
    [System.FlagsAttribute()]
    private enum EngineState : byte
    {
        None = 0
    ,
        ReadSource = 1
    ,
        ReadTarget = 2
    ,
        ReadBoth = 3
    ,
        Equal = 4 + ReadBoth
    ,
        GotCandidate = 8
    }

    private readonly PIEBALD.Types.LimitedQueue<DiffCandidate> tracker ;

    private PIEBALD.Types.DiffSectionType type ;

    private readonly PIEBALD.Types.ItemBuffer<T> source ;
    private readonly PIEBALD.Types.ItemBuffer<T> target ;

    private int sourceindex ;
    private int targetindex ;

    private EngineState state ;

    private DiffEngine
    (
        PIEBALD.Types.ItemBuffer<T> Source
    ,
        PIEBALD.Types.ItemBuffer<T> Target
    )
    {
        this.tracker = new PIEBALD.Types.LimitedQueue<DiffCandidate>() ;

        this.source = Source ;
        this.target = Target ;

        this.Truncate() ;

        return ;
    }

    private bool
    HasCandidates
    {
        get
        {
            return ( this.tracker.Count > 0 ) ;
        }
    }

    private void
    Truncate
    (
    )
    {
        this.tracker.Clear() ;

        this.type = PIEBALD.Types.DiffSectionType.Copy ;

        this.source.RemoveRange ( 0 , this.sourceindex ) ;
        this.target.RemoveRange ( 0 , this.targetindex ) ;

        this.sourceindex = 0 ;
        this.targetindex = 0 ;

        return ;
    }

    ...
}

2009-09-20: Changed sourceindex and targetindex from ushort to int.

AddCandidate

This method will instantiate a new DiffCandidate, check to see if it's unique, and if so put it in the tracker. If the new candidate is not unique, it will be dropped.

private void
AddCandidate
(
    PIEBALD.Types.DiffSectionType Type
,
    int                           SourceIndex
,
    int                           TargetIndex
)
{
    PIEBALD.Types.DiffCandidate temp ;

    temp = new PIEBALD.Types.DiffCandidate
    (
        Type
    ,
        SourceIndex
    ,
        TargetIndex
    ) ;

    for ( int i = this.tracker.Count - 1 ; i >= 0 ; i-- )
    {
        if ( this.tracker [ i ].Hash == temp.Hash )
        {
            return ;
        }
    }

    this.tracker.Enqueue ( temp ) ;

    return ;
}

GetCandidate

This method retrieves the next candidate from the queue, gets the next pieces of text from the buffers, and determines whether or not the text matches.

private void
GetCandidate
(
)
{
    this.state = EngineState.None ;

    if ( this.HasCandidates )
    {
        PIEBALD.Types.DiffCandidate Candidate = this.tracker.Dequeue() ;

        this.type = Candidate.Type ;

        this.sourceindex = Candidate.SourceIndex ;
        this.targetindex = Candidate.TargetIndex ;

        this.state |= EngineState.GotCandidate ;
    }

    T sourcecontent ;

    if ( this.source.TryGet ( this.sourceindex , out sourcecontent ) )
    {
        this.state |= EngineState.ReadSource ;
    }

    T targetcontent ;

    if ( this.target.TryGet ( this.targetindex , out targetcontent ) )
    {
        this.state |= EngineState.ReadTarget ;
    }

    if
    (
        ( ( this.state & EngineState.ReadBoth ) == EngineState.ReadBoth )
    &&
        ( sourcecontent.CompareTo ( targetcontent ) == 0 )
    )
    {
        this.state |= EngineState.Equal ;
    }

    return ;
}

PromoteCandidate

This method is used to construct a DiffSectionBase<T> from a DiffCandidate once the candidate has been selected as the optimal edit.

The buffers are then truncated in preparation for the next diff section.

private PIEBALD.Types.DiffSectionBase<T>
PromoteCandidate
(
)
{
    PIEBALD.Types.DiffSectionBase<T> result = null ;

    switch ( this.type )
    {
        case PIEBALD.Types.DiffSectionType.Copy :
        {
            result = new PIEBALD.Types.CopyDiffSection<T>
            (
                this.source.GetRange ( 0 , this.sourceindex )
            ) ;

            break ;
        }

        case PIEBALD.Types.DiffSectionType.Delete :
        {
            result = new PIEBALD.Types.DeleteDiffSection<T>
            (
                this.source.GetRange ( 0 , this.sourceindex )
            ) ;

            break ;
        }

        case PIEBALD.Types.DiffSectionType.Insert :
        {
            result = new PIEBALD.Types.InsertDiffSection<T>
            (
                this.target.GetRange ( 0 , this.targetindex )
            ) ;

            break ;
        }

        case PIEBALD.Types.DiffSectionType.Replace :
        {
            result = new PIEBALD.Types.ReplaceDiffSection<T>
            (
                this.source.GetRange ( 0 , this.sourceindex )
            ,
                this.target.GetRange ( 0 , this.targetindex )
            ) ;

            break ;
        }
    }

    this.Truncate() ;

    return ( result ) ;
}

DoDiff

DoDiff controls the flow of the algorithm.

Get the next candidate from the queue and set the fields (there is no candidate on the first pass, but the fields are set properly by the constructor)
The state field will hold flags indicating whether or not there was a candidate, which buffers hold data, and whether or not the specified source and target texts match.
If the current candidate has hit the limit and there are no other candidates, then promote it. In either case, drop it and keep going.
Otherwise, act in accordance with the state:
1. If no text remains, we're done -- promote the final candidate (if any) and exit.
2. If we have only source text left, then we'll be producing delete operations from here on; but first, if the latest candidate is a copy, we need to promote it.
3. If we have only target text left, then we'll be producing insert operations from here on; but first, if the latest candidate is a copy, we need to promote it.
4. If we have text from both streams, but they don't match, then we need to consider both a delete and an insert operation; but first, if the latest candidate is a copy, we need to promote it.
5. If we have text from both streams, and they do match, then the next candidate will be a copy; but first, if the latest candidate is not a copy, we need to promote it.

private System.Collections.Generic.IEnumerable<PIEBALD.Types.DiffSectionBase<T>>
DoDiff
(
)
{
    while ( true )
    {
        this.GetCandidate() ;
 
        if
        (
            ( this.sourceindex == PIEBALD.Types.DiffCandidate.Limit )
        ||
            ( this.targetindex == PIEBALD.Types.DiffCandidate.Limit )
        )
        {
            if ( !this.HasCandidates )
            {
                yield return ( this.PromoteCandidate() ) ;
            }
 
            continue ;
        }
 
        switch ( this.state )
        {
            case EngineState.GotCandidate :
            {
                yield return ( this.PromoteCandidate() ) ;
 
                goto case EngineState.None ;
            }
 
            case EngineState.None :
            {
                this.Dispose() ;
 
                yield break ;
            }
 
            case EngineState.ReadSource | EngineState.GotCandidate :
            {
                if ( this.type == PIEBALD.Types.DiffSectionType.Copy )
                {
                    yield return ( this.PromoteCandidate() ) ;
                }
 
                goto case EngineState.ReadSource ;
            }
 
            case EngineState.ReadSource :
            {
                this.AddCandidate
                (
                    this.type | PIEBALD.Types.DiffSectionType.Delete
                ,
                    this.sourceindex + 1
                ,
                    this.targetindex
                ) ;
 
                break ;
            }
 
            case EngineState.ReadTarget | EngineState.GotCandidate :
            {
                if ( this.type == PIEBALD.Types.DiffSectionType.Copy )
                {
                    yield return ( this.PromoteCandidate() ) ;
                }
 
                goto case EngineState.ReadTarget ;
            }
 
            case EngineState.ReadTarget :
            {
                this.AddCandidate
                (
                    this.type | PIEBALD.Types.DiffSectionType.Insert
                ,
                    this.sourceindex
                ,
                    this.targetindex + 1
                ) ;
 
                break ;
            }
 
            case EngineState.ReadBoth | EngineState.GotCandidate :
            {
                if ( this.type == PIEBALD.Types.DiffSectionType.Copy )
                {
                    yield return ( this.PromoteCandidate() ) ;
                }
 
                goto case EngineState.ReadBoth ;
            }
 
            case EngineState.ReadBoth :
            {
                if ( this.sourceindex > this.targetindex )
                {
                    this.AddCandidate
                    (
                        this.type | PIEBALD.Types.DiffSectionType.Insert
                    ,
                        this.sourceindex
                    ,
                        this.targetindex + 1
                    ) ;
 
                    this.AddCandidate
                    (
                        this.type | PIEBALD.Types.DiffSectionType.Delete
                    ,
                        this.sourceindex + 1
                    ,
                        this.targetindex
                    )  ;
                }
                else
                {
                    this.AddCandidate
                    (
                        this.type | PIEBALD.Types.DiffSectionType.Delete
                    ,
                        this.sourceindex + 1
                    ,
                        this.targetindex
                    )  ;
 
                    this.AddCandidate
                    (
                        this.type | PIEBALD.Types.DiffSectionType.Insert
                    ,
                        this.sourceindex
                    ,
                        this.targetindex + 1
                    ) ;
                }
 
                break ;
            }
 
            case EngineState.Equal | EngineState.GotCandidate :
            {
                if ( this.type != PIEBALD.Types.DiffSectionType.Copy )
                {
                    yield return ( this.PromoteCandidate() ) ;
                }
 
                goto case EngineState.Equal ;
            }
 
            case EngineState.Equal :
            {
                this.AddCandidate
                (
                    PIEBALD.Types.DiffSectionType.Copy
                ,
                    this.sourceindex + 1
                ,
                    this.targetindex + 1
                ) ;
 
                break ;
            }
        }
    }
}

2009-09-20: No more casting of sourceindex and targetindex.
In case EngineState.ReadBoth, bias the addition of Delete and Insert candidates to favor replaces.

A Note about Replace-biasing

A difference section is a sequence of deletes and inserts. The order of the deletes and inserts is not important; the number of them is. On a Levenshtein table, an edit section with five deletes and three inserts will equate to three replaces and two deletes. If the number of deletes and inserts in a difference section are equal, then when mapped on a Levenshtein table it will be diagonal. If the number of deletes and inserts in a difference section are not equal, then it will depart from diagonal. The shortest edit distance will likely be close to diagonal, so we would prefer a more-diagonal edit sequence to a less-diagonal edit sequence. The first candidate that reaches a copy operation will be the one chosen. When we are adding both a delete and an insert to a DiffCandidate, we need to decide which to add first. By choosing to add an insert first when the candidate already has more deletes than inserts, or adding a delete first when the candidate already has more inserts than deletes, the result is that the resultant candidate that is more-diagonal will be evaluated (and perhaps promoted) before the one that is less-diagonal.

Diff

Diff is static. It handles the gruntwork of wrapping the documents in ItemBuffers, instantiating a DiffEngine to perform the diff, and kicking off the diff.

public static System.Collections.Generic.IEnumerable<PIEBALD.Types.DiffSectionBase<T>>
Diff
(
    System.Collections.Generic.IEnumerable<T> Source
,
    System.Collections.Generic.IEnumerable<T> Target
)
{
    DiffEngine<T> engine = new DiffEngine<T>
    (
        new PIEBALD.Types.ItemBuffer<T> ( Source )
    ,
        new PIEBALD.Types.ItemBuffer<T> ( Target )
    ) ;

    return ( engine.DoDiff() ) ;
}

If you choose to use the CharBuffer or your own customized ItemBuffer, you can use this overload of Diff to perform the Diff.

public static System.Collections.Generic.IEnumerable<PIEBALD.Types.DiffSectionBase<T>>
Diff
(
    PIEBALD.Types.ItemBuffer<T> Source
,
    PIEBALD.Types.ItemBuffer<T> Target
)
{
    return ( ( new DiffEngine<T>
    (
        Source
    ,
        Target
    ) ).DoDiff() ) ;
}

PIEBALDdiff

Which brings us to the demo app. PIEBALDdiff is a console application; it takes the names of two files as parameters. If both files exist, they get wrapped in string enumerators which get passed to the DiffEngine<string>. The resulting DiffSection<string> enumerator gets enumerated and the results displayed (with line numbers!).

namespace PIEBALD
{
    using PIEBALD.Lib.LibExt.Lines ;

    public static class PIEBALDdiff
    {
        [System.STAThreadAttribute()]
        public static int
        Main
        (
            string[] args
        )
        {
            int result = 0 ;

            try
            {
                if ( args.Length > 1 )
                {
                    System.IO.FileInfo f1 = new System.IO.FileInfo ( args [ 0 ] ) ;
                    System.IO.FileInfo f2 = new System.IO.FileInfo ( args [ 1 ] ) ;

                    if ( f1.Exists && f2.Exists )
                    {
                        int sline = 0 ;
                        int tline = 0 ;

                        long bws = System.Diagnostics.Process.GetCurrentProcess().PeakWorkingSet64 ;

                        System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch() ;

                        sw.Start() ;

                        foreach
                        (
                            PIEBALD.Types.DiffSectionBase<string> section
                        in
                            PIEBALD.Types.DiffEngine<string>.Diff
                            (
                                f1.Lines()
                            ,
                                f2.Lines()
                            )
                        )
                        {
                            switch ( section.Type )
                            {
                                case PIEBALD.Types.DiffSectionType.Copy :
                                {
                                    foreach ( string s in section.SourceContent )
                                    {
                                        ++sline ;
                                        ++tline ;
                                    }

                                    break ;
                                }

                                case PIEBALD.Types.DiffSectionType.Delete :
                                {
                                    System.Console.WriteLine ( "*****Delete******" ) ;

                                    foreach ( string s in section.SourceContent )
                                    {
                                        System.Console.WriteLine ( "{0,5}:{1}" , ++sline , s ) ;
                                    }

                                    System.Console.WriteLine ( "*****************\r\n" ) ;

                                    break ;
                                }

                                case PIEBALD.Types.DiffSectionType.Insert :
                                {
                                    System.Console.WriteLine ( "*****Insert******" ) ;

                                    foreach ( string s in section.TargetContent )
                                    {
                                        System.Console.WriteLine ( "{0,5}:{1}" , ++tline , s ) ;
                                    }

                                    System.Console.WriteLine ( "*****************\r\n" ) ;

                                    break ;
                                }

                                case PIEBALD.Types.DiffSectionType.Replace :
                                {
                                    System.Console.WriteLine ( "*****Replace*****" ) ;

                                    foreach ( string s in section.SourceContent )
                                    {
                                        System.Console.WriteLine ( "{0,5}:{1}" , ++sline , s ) ;
                                    }

                                    System.Console.WriteLine ( "*****With********" ) ;

                                    foreach ( string s in section.TargetContent )
                                    {
                                        System.Console.WriteLine ( "{0,5}:{1}" , ++tline , s ) ;
                                    }

                                    System.Console.WriteLine ( "*****************\r\n" ) ;

                                    break ;
                                }
                            }
                        }

                        sw.Stop() ;

                        long pws = System.Diagnostics.Process.GetCurrentProcess().PeakWorkingSet64 ;

                        System.Console.WriteLine ( "Elapsed          : {0}" , sw.Elapsed ) ;
                        System.Console.WriteLine
                        (
                            "PeakWorkingSet64 : {0} ({1})"
                        ,
                            pws
                        ,
                            pws - bws
                        ) ;
                    }
                    else
                    {
                        System.Console.WriteLine ( "At least one of the specified files does not exist." ) ;
                    }
                }
                else
                {
                    System.Console.WriteLine ( "Syntax: PIEBALDdiff source target" ) ;
                }
            }
            catch ( System.Exception err )
            {
                System.Console.WriteLine ( err ) ;
            }

            return ( result ) ;
        }
    }
}

(The included version also gives a summary of the difference sections.)

History

2009-08-28

First submitted

2009-09-17

Updated to present the generic version of the algorithm

2009-09-20

Altered the interface of DiffCandidate to use int rather than ushort (to avoid a bunch of casting)
Biased the adding of delete and insert candidates to favor replaces
Altered ItemBuffer so it can be derived
Added an overload of Diff that can take customized ItemBuffers

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)