Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / VB

LCS based generic diff in VB.NET

4.69/5 (5 votes)
30 Sep 2008CPOL2 min read 1   206  
How to create a diff script for a list of objects in VB.NET.

Introduction

This article aims to illustrate how to programmatically determine the differences between two arbitrary lists of objects, then interpret the list of differences to create a "diffscript" - a script containing a list of operations to be performed on the original list to end up with the destination list.

Background

In my 22 years of developing, I have come across many situations where I recognized a source and destination being updated. I have been fantasizing about how a differential update would be far more efficient than overwriting the entire destination.

Developing software is a way for me to completely get to understand (grok) the intricacies of a certain situation or problem. Writing the attached source code allowed me to understand the diff algorithm and how and why it works. In my investigation, I used this excellent website CodeProject and articles from various authors, as well as information readily available from Wikipedia.

Using the code

The code attached contains a set of Generic classes that work together.

The way to invoke the diff-script engine, which creates a diff-script, is as follows:

VB
Dim a as string() = {"a", "b", "c"}
Dim b as string() = {"b", "c", "a"}
Dim script As DiffScript(Of String)
script = DiffEngine(Of String).MakeDiffScript(a, b)

This creates a diff-script containing one modification entry, a move operation (of the "a" item) to the position after c (3).

How it works

The base algorithm is the Longest Consecutive Sequence (LCS) algorithm as described in other articles on this site and Wikipedia.

I have included an index mechanism that will convert the list of arbitrary items to a list of integers to compare. The index allows the integers to then be re-mapped to the original values, if needed.

The index mechanism also compares the start of both lists and the end, to see if they are equal. The LCS is a fairly intensive algorithm, and will only be performed on the "block" of items that is different between the start and the end that are the same.

Points of interest

There is some test code in the DiffTestCode and IndexTestCode classes, that you will not need in your implementation.

The attached code uses a modified version of the LCS algorithm. The only modification made is in the backtracking method. Where the original algorithm always favors a remove or add operation (resulting in ---+++), I modified it so that it will favor a + operation on the same location before another - operation. This modification makes it a lot simpler to identify replacement operations.

The source code is heavily commented, please review the comments in the source for more information.

History

This is version 1 of the article. I do not expect the code to change a whole lot, but you never know. I do intent to expand this article when I have some time; for now, most comments reside in the source itself.

Feel free to implement the code in your own project, and send me some feedback :P

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)