Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / DevOps

Bee: A Suite of Self-Balancing Binary Tree Based Dictionaries

4.90/5 (9 votes)
13 Sep 2019CPOL5 min read 7.8K   86  
A B-tree, an AVL tree, and a Splay tree in C#

Introduction

Although not used as much as .NET's Dictionary<TKey,TValue> class, the SortedDictionary<TKey, TValue> is an important part of the BCL's collections suite.

However, the framework class only gives you a red-black tree algorithm to work with in terms of the binary tree it sorts and searches.

Sometimes a B-tree, an AVL tree, or a splay tree is more appropriate. If you need one however, you're out of luck as far as the BCL goes.

Consequently, I've developed a few classes to fill the gap, primarily by porting, debugging, and rewriting swaths of code at https://www.geeksforgeeks.org (specific links for the various portions in the source) so I could learn the algorithms.

Background

Binary trees are instrumental in many, if not most searching and sorting algorithms, because divide and conquer is an efficient strategy for traversing sorted data. Over the years, several variants have been developed in order to optimize them for searches of a particular type.

A binary tree - from geeksforgeeks.org

A binary tree - from geeksforgeeks.org

A binary tree sorts items vertically, to make a sort of 2 dimensional traversal map so that instead of comparing every item in the worst case, it only compares N items in the worst case where N is the height of the tree. The tree above takes a maximum of 4 comparisons.

There are many ways to make the equivalent of the above tree, but all of the alternatives will have a maximum height higher than this one, requiring more comparisons. This tree is ideal. It's said to be balanced. A balanced tree requires the minimum number of comparisons distributed across the entire spread of items.

Let's say we wanted to add 15 to the above graph. There are two valid places to add it. It could be added under 7, to the right of 14, or it could be added under 14. The latter would make the tree taller, which is not what we want.

Want we want is an algorithm to always add nodes in the most optimal place, or otherwise always keep the tree balanced or as short as possible. Each algorithm provided has its own trick for doing so.

The B-tree's trick is it "squashes" the binary tree into something no longer binary.

b-tree

Notice how some nodes have more than one number/key in them. The b-tree does this to keep the tree short and to minimize access to the leaves. Databases like them because the leaves in a database index are often tied to disk blocks, so the less access the better.

The B-tree's real strength is in searching and sorting huge lists of items, which is why b-trees are a foundational part of database systems. The performance really starts to show after about 30,000 items or so. After a million, it can even best the Dictionary<TKey,TValue> in performance despite that class doing far less work since it doesn't have to sort. It outstrips SortedDictionary<TKey,TValue> after about 100 items. Adding and removing however, is slower. It's not strictly a binary tree, but it uses many of the same fundamentals, and is very similar. It's primary difference is storing more than two keys and items per node, and its tendency to grow fatter rather than taller.

The AVL tree looks like a standard binary tree, but it rotates portions of the tree on insert or removal to keep the tree balanced.

The AVL tree's strengths are similar to the B-trees but on a lesser scale. It catches up to the SortedDictionary<TKey, TValue> after about 10 items and bests it after 100. It's good to use when you don't do nearly as many inserts and deletes as you do searches.

The Splay tree looks exactly like a standard binary tree but it will rotate subsections, or even the entire tree. The last node operated on (inserted, removed, or searched) becomes the root node. This gives the AVL tree excellent locality, because its key and adjacent keys are so close to the root.

The Splay tree's strength is as mentioned - it has excellent locality. That means if you're searching on the same items, or items near to each other frequently, the Splay tree might be your best choice. The implementation I provide does not currently handle large numbers of items. Somewhere between 5,000 and 10,000 items you'll get a stack overflow exception because of how the alorithm works. I plan to eventually convert the offending routine to make it non-recursive but currently the limitation stands. These trees are not great for a large number of items anyway. Whether it bests SortedDictionary<TKey, TValue> or not is highly dependent on how its used. The performance tests I provide are more of a middling to worst case scenario for these trees so when looking at the numbers, bear that in mind.

Using the Code

Using the code is almost exactly like using any IDictionary<TKey,TValue> class. Each of the classes implements this interface, exposing the standard dictionary operations like ContainsKey() and TryGetValue() with the main additional constraint that the items are always returned in key order, like SortedDictionary<TKey,TValue> does. There is one major difference, however. These dictionaries require either a provided IComparable<TKey> instance, or they require the TKey types to be comparable, not simply equitable. This restriction is also shared with SortedDictionary<TKey, TValue>.

Use any of them like this:

C#
var d = new SortedBTreeDictionary<int,string>();
d.Add(1,"foo");
d.Add(100,"bar");
Console.WriteLine(d[100]);

History

  • 13th September, 2019 - Initial submission

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)