Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C#

A Reusable Prefix Tree using Generics in C# 2.0

3.86/5 (6 votes)
26 Jul 2006CPOL5 min read 2   466  
An implementation of the Prefix Tree data structure using generics

Update

Look at Sébastien’s 2nd comment for a cleaner implementation. I used this as part of another algorithm that needed it. You probably want to use his solution if you only need to store and retrieve. I apologize for the confusion.

Introduction

Have you ever run into a case where you needed to store a key value pair but your key was a list of some sort? I know I have. Most of the built in collections in C# are good enough if you want to store even a complex object, but when it comes to lists of complex objects (or even an int[]), nothing seems to fit the bill. Enter the Prefix Tree. This implementation will allow you to store all sorts of interesting lists of objects in no time. I also make use of two new features of .NET 2.0: Generics and yield return.

The Data Structure

Let's first examine the use of the structure itself since that’s why you are here.
Let's say we have an int[] and we want to make that our key. What we might think of doing is this:

C#
List<int[]> arr = new List<int[]>();
arr.Add(new int[] { 1, 2, 3 });
bool b = arr.Contains(new int[] { 1, 2, 3 });

But when we do this, we get b = false in fact if we try:

C#
SortedList<int[], int> arr = new SortedList<int[], int>();
arr.Add(new int[] { 1, 2, 3 },1);
bool b = arr.ContainsKey(new int[] { 1, 2, 3 });

We get the exception:

System.InvalidOperationException{"Failed to compare two elements in the array."}

Here is the problem. Under the covers int[] is actually a class called System.Array which knows nothing about how an int should be compared. To get around this, we can do this:

C#
PrefixTree<int,int> arr = new PrefixTree<int,int>();
arr.Add(new int[] { 1, 2, 3 },1);
bool b = arr.ContainsKey(new int[] { 1, 2, 3 });

When we do this, we get b = true and that is what we are looking for in a collection.

What we are doing is creating a structure that knows we are dealing with an array of objects, not the single object Array. In this way, we can let C# do its magic on comparing individual objects.

A look at Generics

Lets take a look at the class definition for a second.

C#
public class PrefixTree<TListKey, TValue> : 
IEnumerable<KeyValuePair<IList<TListKey>, TValue>> 
where TListKey : IComparable<TListKey>

Ouch. What does all this mean? Well the public class PrefixTree part should be fairly straight forward so moving on to the <TListKey, TValue>.
<TListKey, TValue> instructs .NET to allow us to put any type (reference Or value) in place of them when declaring our class like so.

C#
PrefixTree mytree = new PrefixTree<char, string>();

This means that all of our code will be strongly typed by the compiler at compile time. This in my experience means fewer bugs in running systems. And these are usually the ones that are really bad. They pass QA since they didn’t get every concealable meshing of silliness.
: IEnumerable<KeyValuePair<IList<TListKey>, TValue>> is new to C#2.0 in addition to having the System.Collections namespace there is also the System.Collections.Generic namespace. This contains System.Collections.Generic.IEnumerable<T>. It's just like our friend System.Collections.IEnumerable except that it is strongly typed. Also note how we can have nested generic declarations and even an interface that can be generic IList<TListKey>.

Finally, we have a new keyword where. where TListKey : IComparable<TListKey> enforces a “constraint” on our use of generics. What this means is that when we substitute in a type for the generic TListKey it must implement the interface IComparable<TListKey> if it doesn’t the compiler will let us know. This way we don’t have to wait till the first object is added to the data structure before getting an error.

You can also put a lot of other constraints on what your generics are allowed to do and not do. More information can be found from Microsoft here: Constraints on Type Parameters (C# Programming Guide).

A Look at Yield Return

The yield return is an odd kind of thing. Its intent was to simplify code so when we have a dataset and we want to go over it with a foreach statement, it should be as easy as possible. The idea is that when you have a flat list any one can write an iterator keeping the position is simple. However, then you have a hierarchical data structure, even if it is fairly easy to walk, it might be really hard to write an iterator for that, especially across all those recursive calls. So here’s the trick: what you do is make a private IEnumerable<T> that has a recursive parameter. Then all you do is spit out the current node on the walk with the yield return and call the recursive call. It's that simple. There is an advanced gotcha though. IEnumerable<T> wants you to implement the function IEnumerator<T> GetEnumerator(). But, you can't just stick a IEnumerator in a foreach loop like you want to so that you can do recursion, but what you can do is this:

C#
public IEnumerator<KeyValuePair<IList<TListKey>, TValue>> GetEnumerator()
{
    return GetEnumeratorHelper(new LinkedList<TListKey>(),root).GetEnumerator();
}
C#
private IEnumerable<KeyValuePair<IList<TListKey>, TValue>> 
	GetEnumeratorHelper(LinkedList<TListKey> keylist,
 PrefixTreeNode<TListKey, TValue> node)
{
    keylist.AddLast(node.key);
    if (node.hasvalue)
        yield return new KeyValuePair<IList<TListKey>, TValue>
        (BuildIList(keylist), node.value);


    foreach (KeyValuePair<TListKey, PrefixTreeNode<TListKey, TValue>> 
    kvp1 in node.childern)
        foreach (KeyValuePair<IList<TListKey>, TValue> kvp2 in 
        GetEnumeratorHelper(keylist,kvp1.Value))
            yield return new KeyValuePair<IList<TListKey>, TValue>
           (BuildIList(keylist), kvp2.Value);
    keylist.RemoveLast();
}

Notice how the top call is IEnumerator and the recursive call is IEnumerable? Since IEnumerable<T> has a method called GetEnumerator() the result we want can easily be had. It is really hard to spot, though, since the difference between IEnumerator and IEnumerable is so little.

Oh, for those of you who are wondering this yield return thing sounds like a threading thing, it is not. I went with a coworker to a Microsoft event showcasing new C# 2.0 features. They assured us that this had nothing to do with threading. I don’t know how to test this, but for what it's worth…

Demo Application

The demo application is nothing fancy. It just lets you add remove clear and enumerate a collection of strings. But wait, I thought this was supposed to be on a list of objects, not one. Well it is. A string is also a char[] and since the sorting on strings is exactly what we want to do with our prefix tree, we can use a built in class (Microsoft is pretty good at making these bug free) to test if our code produces identical results. As soon as the demo application loads up, it will run through a self test of a couple hundred randomly generated strings and compare the enumeration to the SortedList<string, string> representation to the new PrefixTree<char, string> representation. If all is good, then the application will tell you so and if not, it will let you know that too. I recommend that if you want to actually step through the code, you clear the lists and add just a few entries.

Sample screenshot

I have also included a link to Wikipedia’s page on Prefix Trees if you feel like looking for more information. It’s not exactly the same thing but it's close enough.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)