(untagged)

A LargeListDictionary Implementation

Michael Micco

0.00/5 (No votes)

4 Mar 2004

Implementation of a list accessible by key with HashTable-like performance

Sample Image - LargeListDictionary_testapp.jpg

Introduction

This article covers the implementation of a LargeListDictionary class that keeps the order of the items like a ListDictionary, but performs more like a Hashtable as the list gets large.

Background

I don't quite know where to begin. I absolutely loved the article by Jason Smith: Add Support for "Set" Collections to .NET (Iesi.Collections namespace). I was looking for an implementation of sets and his was very generic and elegant. Unfortunately, I needed to use his Iesi.Collections.ListSet class to preserve the order of the items in my Iesi.Collections.Set, and the performance became unbearable as the set became large because it used a ListDictionary underneath, which performed horribly for both adding and removing items from the Iesi.Collections.Set. (A key point here is that sets use keys lookups when adding and removing items, so key lookup was a requirement.)

I then began my search for an ordered list that implements IDictionary so it could be used in a Iesi.Collections.DictionarySet (example: Iesi.Collections.ListSet inherits Iesi.Collections.DictionarySet and uses a ListDictionary). I needed a class with better performance than .NET's ListDictionary. I found the following two implementations here on Code Project:

Hashlist - Hashtable meets ArrayList by Mike McPhail.
A KeyedList implementation by Marc Clifton.

Both of these implementations basically used a Hashtable and an ArrayList internally, the Hashtable for lookup performance, and the ArrayList to keep the sequence of the items. They had two catches, though. Removing items from the list was very expensive as the list got large, due to the ArrayList; and the Iesi.Collections.Set.Intersect method called the Remove method. The second was that enumerating them gave you the hashed order, not the order that you added the items.

To sum up, I then created LargeListDictionary, which will perform similar to a Hashtable for both adding and removing items by key. I then created LargeListSet, to test my class' performance with the set operations.

If any of this introduction lost you, please read the three articles mentioned above, then return to finish this one.

Using the code

LargeListDictionary.cs consists of two classes and one struct:

LargeListDictionary - a class implementing IDictionary, ICollection, and IEnumerable
LargeListDictionaryEnumerator - a struct implementing IDictionaryEnumerator and IEnumerator
IndexedObject - a simple class containing a key, value and index.

LargeListSet.cs just contains one class: LargeListSet inheriting Iesi.Collections.DictionarySet.

For LargeListDictionary, the key concept is that it uses two Hashtable members. Both store IndexedObject objects, but one stores by the original key, and one by an internally generated int index.

public class LargeListDictionary : IDictionary, ICollection, IEnumerable
{
    private int _maxIndex;
    private Hashtable _itemsByKey;
    private Hashtable _itemsByIndex;

Whenever an item is added, an IndexedObject is created with its given key and value, and the next index is generated for it. The _maxIndex keeps track of the highest index given so far.

public void Add(object key, object value)
{
    _maxIndex++;
    int newIndex = _maxIndex;

    IndexedObject io = new IndexedObject(newIndex, key, value);
    _itemsByKey.Add(io.Key, io);
    _itemsByIndex.Add(io.Index, io);
}

Here, you can see that the expense of the Add is the overhead of creating the IndexedObject, and adding it to two Hashtable members. I am willing to accept this extra expense, because the real benefit comes when I check for objects or remove objects.

public bool Contains(object key)
{
    return _itemsByKey.ContainsKey(key);
}

public void Remove(object key)
{
    if (_itemsByKey.ContainsKey(key))
    {
        IndexedObject io = (IndexedObject)_itemsByKey[key];
        _itemsByKey.Remove(key);
        _itemsByIndex.Remove(io.Index);
    }
}

Here, you can see that checking for an item by key gives us straight-up HashTable performance, while removing an item gives us performance based on removing from two HashTables. Both of these operations are a huge improvement over a ListDictionary when the list gets large.

Why couldn't I just use a single Hashtable??? Using a wrapper object and two HashTables seems overly expensive. What gives?

Remembering the requirements from above, the list has to be enumerable in the order that the items are added. If I add 100, 90, 80, 70, etc. then enumerating them should return 100, 90, 80, 70. If there is one thing we all rudely learn at some point, it's that this is not the case with a HashTable. That is why I used the second Hashtable that is keyed by an internal index.

The Secret Sauce: LargeListDictionaryEnumerator

This is where the fun began. At first, I tried to implement something with one HashTable and one ArrayList where I would adjust the indexes after something was removed- BAD PLAN. I won't go into anymore detail on that one. With the two HashTable implementations, I could remove easily, but there remained one problem. Let's say, I added the following objects to the LargeListDictionary:

Key	Value
555-1212	J. Smith
555-4567	M. Rose
555-3333	B. Jones

The items would be wrapped with an IndexedObject, and the HashTables would contain the following:

Key	Index	Value
555-1212	0	J. Smith
555-4567	1	M. Rose
555-3333	2	B. Jones

Looking at the result, it is easy to see how I could enumerate. Just use a counter and look up the IndexedObject from the HashTable keyed by index. WAIT! It's not that simple once someone starts mucking with the list. After removing an item and adding a new one, the HashTables would contain the following:

Key	Index	Value
555-1212	0	J. Smith
555-3333	2	B. Jones
555-2727	3	New Guy

In the above example, I removed M. Rose from the list and added New Guy. Because the LargeListDictionary only increments its index generator, I end up with a gap where M. Rose used to be. This brings us to the implementation of LargeListDictionaryEnumerator.

public struct LargeListDictionaryEnumerator : 
                              IDictionaryEnumerator, IEnumerator
{

    private int _currentIndex;
    private Hashtable _itemsByIndex;
    private int _maxIndex;

    public LargeListDictionaryEnumerator(Hashtable itemsByIndex, int maxIndex)
    {
        _itemsByIndex = itemsByIndex;
        _currentIndex = -1;
        _maxIndex = maxIndex;
    }

    /// <summary>

    /// Increments the indexer. For the LargeListDictionary,

    /// if items have been removed, the next index (+1) may not

    /// have an entry.  Continue to increment until

    /// an item is found, or the max is reached.

    /// </summary>

    /// <returns>true if another item has been reached,

    /// false if the end of the list is hit.</returns>

    public bool MoveNext()
    {
        while (_currentIndex <= _maxIndex)
        {
            _currentIndex++;
            if (_itemsByIndex.ContainsKey(_currentIndex))
                return true;
        }

        return false;
    }

}

As you can see, the secret is in the MoveNext method of the enumerator. In the constructor, I pass in the HashTable that is keyed by our given index and the max index that LargeListDictionary has assigned so far. When the enumerator moves next, we run through a while loop to get to the next item that exists. Voila! Enumeration returns the items in the order they were added.

Other Performance Considerations

If you perform many remove and add operations on a LargeListDictionary, you will get more 'gaps' in the indexes. This will slow down enumeration. To address this, you could easily add a method to 'reindex' that would remove the gaps.

This kind of reindexing, or some other kind of counter may also be needed if you wanted to implement IList. True indexed (not the internal index) access like this[int index] or RemoveAt(int index) was not one of my requirements. Once you have gaps in the internal index, item[2] is not guaranteed to be IndexedObject.Index = 2.

Bonus: LargeListSet

The LargeListSet class is just a bonus. I is a class that inherits Iesi.Collections.DictionarySet and uses LargeListDictionary as the IDictionary underneath.

The TestApp

The test app provided is just a little GUI I put together to compare relative performance between sets based on different underlying classes that implement IDictionary. It also has a test button to verify ordered enumeration. It includes the source code from Jason Smith's article on sets as well as the source from Mike McPhail's article on HashList.

Conclusion

In summation, I hope that someone else finds LargeListDictionary useful where a large dictionary is needed but the sequence of items added is important to preserve.

History

March 5th, 2004 : Initial article created and submitted.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here