Introduction
From Wikipedia:
In computer science, a binary search tree (BST) is a node-based binary tree data structure which has the following properties:[1]
- The left subtree of a node contains only nodes with keys less than the node's key.
- The right subtree of a node contains only nodes with keys greater than the node's key.
- Both the left and right subtrees must also be binary search trees.
From the above properties, it naturally follows that:
- Each node (item in the tree) has a distinct key:
I have always been fascinated with the Binary Tree data structure. It’s nice to have a storage structure that supports O (log n) search time. I've decided to create my own binary tree data structure with a few added features like the ability to rebalance the tree whenever the user desires.
Background
First let me explain what O (log n) means when it comes to the Binary Search Trees.
Let's start out with O (n):
Let’s use an array for this example. Let’s suppose I have an array of 10 items. If I wanted to search for a specific element, there is a chance that I may have to search all the elements in the array. This situation will occur if the element that I’m looking for is located in the last cell of the array. Now let n be equal to the length of the array (here n=10). That means I will have to search up to 10 items. Therefore the search time for an array is O (n), where n is the length of the array.
Now O (Log n):
Now let’s say that I use a binary tree to store the 10 items instead of an array. Because the binary tree will store the elements in sorted order if I’m looking for a specific item, I don't have to search the entire binary tree. I can just check if the current element is greater or less than the element that I’m looking for. If the current element is greater than the element I’m looking for, I search its left tree. If the current element is less than the element I’m looking for, I search its right tree. The number of comparisons I make is really the depth of the tree itself. So O (log n) is the actual depth of the tree (the number of nodes from the root to the deepest element). This is the number of comparisons I need to make to find an element in the average case.
Why is O (log n) better than O (n)?
Again let’s say that we have 8 items that we place into an array and a binary tree.
In the array, my search time will be O (8) = 8. In the binary tree, my search time will be O (log 8) = 3. We can see that 3 < 8 here. So the binary tree would be a better choice.
Even as the number of elements increase to 64 (n = 64), I can see that O (log n) is faster because O (log 64) = 6 whereas O (64) is 64. It’s important to remember that in the binary tree, I’m only searching up the depth of the tree.
There can also be situations where a binary tree becomes degenerate (Google it). In this case, the average search time for both the array and binary tree will be O (n). My Balance()
function will take care of the degenerate tree and restore a O (log n) search time.
Implementation
I've created some simple procedures to visit the nodes in the binary tree in different orders. These functions are InOrder()
, PreOrder()
, PostOrder()
, DepthFirst()
, BreadthFirst()
. These terms should be familiar to knowledgeable developers. They all used named iterators to return each node. Currently I just print out the value of the node to the console. You can modify it to your liking.
The Balance() Function Explanation
As items get added to the binary tree, the structure can degenerate and cause a O (n) search time in some cases. The Balance()
function will rebalance the tree. The method I used to rebalance the tree is common. I've actually seen a few articles on the internet explaining it. I first copy the reference pointers of the elements in the binary tree into a list. I actually copy them InOrder
. This way all items in the list are in sorted order from smallest to largest. I then find a pivot index like this:
int middleNode = (int)Math.Ceiling(((double)min + max) / 2);
From the above line, I get the middle element of the list. This will be my root node in the new balanced tree. I then recursively apply the same technique to the left half of the list and then to the right half. Every time I find the ‘root’ node, I add it to my new balanced tree. At the end of the recursion, I am left with a nicely balanced tree.
Here is the complete code for the binary search tree:
[Serializable]
public class BinaryTree<T> where T : IComparable<T>
{
private BinaryTreeNode<T> root;
public int Count;
public void Insert(T value)
{
BinaryTreeNode<T> node = new BinaryTreeNode<T>(value);
Insert(node);
}
public void Insert(BinaryTreeNode<T> node)
{
if (node == null)
{
throw new ArgumentNullException("node","node cannot be null");
}
if (root == null)
{
root = node;
}
else
{
Insert(node, ref root);
}
Count++;
}
public void Delete(T value, bool rebalanceTree)
{
BinaryTreeNode<T> parentNode;
BinaryTreeNode<T> foundNode = null;
BinaryTreeNode<T> tree = parentNode = root;
while (tree != null)
{
if (value.CompareTo(tree.Data) == 0)
{
foundNode = tree;
break;
}
else if (value.CompareTo(tree.Data) < 0)
{
parentNode = tree;
tree = tree.Left;
}
else if (value.CompareTo(tree.Data) > 0)
{
parentNode = tree;
tree = tree.Right;
}
}
if (foundNode == null)
{
throw new Exception("Node not found in binary tree");
}
bool leftOrRightNode = false;
if (foundNode != parentNode.Left)
{
leftOrRightNode = true;
}
if (foundNode == parentNode)
{
if (rebalanceTree)
{
IList<BinaryTreeNode<T><binarytreenode<t>> listOfNodes =
new List<BinaryTreeNode<T><binarytreenode<t>>();
FillListInOrder(root, listOfNodes);
RemoveChildren(listOfNodes);
listOfNodes.Remove(parentNode);
root = null;
int count = Count - 1;
Count = 0;
BalanceTree(0, count - 1, listOfNodes);
}
else
{
BinaryTreeNode<T> leftMost = FindLeftMost(parentNode.Right, true);
if (leftMost != null)
{
leftMost.Left = parentNode.Left;
leftMost.Right = parentNode.Right;
root = leftMost;
}
}
}
else if (foundNode.Left == null && foundNode.Right == null)
{
if (leftOrRightNode)
{
parentNode.Right = null;
}
else
{
parentNode.Left = null;
}
}
else if (foundNode.Left != null &&
foundNode.Right != null)
{
if (leftOrRightNode)
{
parentNode.Right = foundNode.Right;
parentNode.Right.Left = foundNode.Left;
}
else
{
parentNode.Left = foundNode.Right;
parentNode.Left.Left = foundNode.Left;
}
}
else if (foundNode.Left != null ||
foundNode.Right != null)
{
if (foundNode.Left != null)
{
if (leftOrRightNode)
{
parentNode.Right = foundNode.Left;
}
else
{
parentNode.Left = foundNode.Left;
}
}
else
{
if (leftOrRightNode)
{
parentNode.Right = foundNode.Right;
}
else
{
parentNode.Left = foundNode.Right;
}
}
}
}
public BinaryTreeNode<T> Search(T value)
{
BinaryTreeNode<T> tree = root;
while (root != null)
{
if (value.CompareTo(tree.Data) == 0)
{
return tree;
}
else if (value.CompareTo(tree.Data) < 0)
{
tree = tree.Left;
}
else if (value.CompareTo(tree.Data) > 0)
{
tree = tree.Right;
}
}
return null;
}
public IEnumerable<BinaryTreeNode<T>> InOrder()
{
return InOrder(root);
}
private IEnumerable<BinaryTreeNode<T>> InOrder(BinaryTreeNode<T> node)
{
if (node != null)
{
foreach(BinaryTreeNode<T> left in InOrder(node.Left))
{
yield return left;
}
yield return node;
foreach (BinaryTreeNode<T> right in InOrder(node.Right))
{
yield return right;
}
}
}
public IEnumerable<BinaryTreeNode<T>> PreOrder()
{
return PreOrder(root);
}
public IEnumerable<BinaryTreeNode<T>> PostOrder()
{
return PostOrder(root);
}
public IEnumerable<BinaryTreeNode<T>> BreadthFirstTraversal()
{
Queue<BinaryTreeNode<T>> queue = new Queue<BinaryTreeNode<T>>();
queue.Enqueue(root);
while (queue.Count != 0)
{
BinaryTreeNode<T> current = queue.Dequeue();
if (current != null)
{
queue.Enqueue(current.Left);
queue.Enqueue(current.Right);
yield return current;
}
}
}
public IEnumerable<BinaryTreeNode<T>> DepthFirstTraversal()
{
Stack<BinaryTreeNode<T>> queue = new Stack<BinaryTreeNode<T>>();
BinaryTreeNode<T> current;
queue.Push(root);
while (queue.Count != 0)
{
current = queue.Pop();
if (current != null)
{
queue.Push(current.Right);
queue.Push(current.Left);
yield return current;
}
}
}
public void BalanceTree()
{
IList<BinaryTreeNode<T>> listOfNodes = new List<BinaryTreeNode<T>>();
FillListInOrder(root, listOfNodes);
RemoveChildren(listOfNodes);
root = null;
int count = Count;
Count = 0;
BalanceTree(0, count - 1, listOfNodes);
}
private void Insert(BinaryTreeNode<T> node, ref BinaryTreeNode<T> parent)
{
if (parent == null)
{
parent = node;
}
else
{
if (node.Data.CompareTo(parent.Data) < 0)
{
Insert(node, ref parent.Left);
}
else if (node.Data.CompareTo(parent.Data) > 0)
{
Insert(node, ref parent.Right);
}
else if (node.Data.CompareTo(parent.Data) == 0)
{
throw new ArgumentException("Duplicate node");
}
}
}
private void BalanceTree(int min, int max,
IList<BinaryTreeNode<T>><binarytreenode<t> list)
{
if (min <= max)
{
int middleNode = (int)Math.Ceiling(((double)min + max) / 2);
Insert(list[middleNode]);
BalanceTree(min, middleNode - 1, list);
BalanceTree(middleNode + 1, max, list);
}
}
private void FillListInOrder(BinaryTreeNode<T> node,
ICollection<BinaryTreeNode<T>><binarytreenode<t> list)
{
if (node != null)
{
FillListInOrder(node.Left, list);
list.Add(node);
FillListInOrder(node.Right, list);
}
}
private void RemoveChildren(IEnumerable<BinaryTreeNode<T><binarytreenode<t>> list)
{
foreach(BinaryTreeNode<T> node in list)
{
node.Left = null;
node.Right = null;
}
}
private IEnumerable<BinaryTreeNode<T>> PreOrder(BinaryTreeNode<T> node)
{
if (node != null)
{
yield return node;
foreach (BinaryTreeNode<T><t> left in PreOrder(node.Left))
{
yield return left;
}
foreach (BinaryTreeNode<T> right in PreOrder(node.Right))
{
yield return right;
}
}
}
private IEnumerable<BinaryTreeNode<T>> PostOrder(BinaryTreeNode<T> node)
{
if (node != null)
{
foreach (BinaryTreeNode<T> left in PostOrder(node.Left))
{
yield return left;
}
foreach (BinaryTreeNode<T> right in PostOrder(node.Right))
{
yield return right;
}
yield return node;
}
}
private BinaryTreeNode<T> FindLeftMost(BinaryTreeNode<T> node, bool setParentToNull)
{
BinaryTreeNode<T> leftMost = null;
BinaryTreeNode<T> current = leftMost = node;
BinaryTreeNode<T> parent = null;
while (current != null)
{
if (current.Left!=null)
{
parent = current;
leftMost = current.Left;
}
current = current.Left;
}
if (parent!=null && setParentToNull)
{
parent.Left = null;
}
return leftMost;
}
}
[Serializable]
public class BinaryTreeNode<T> where T : IComparable<T>
{
public BinaryTreeNode(T value)
{
Data = value;
}
public T Data
{
get;
set;
}
public BinaryTreeNode<T> Left;
public BinaryTreeNode<T> Right;
}
Comments
The way I implemented the Delete()
method when it comes to deleting the root node is interesting. Not only do I remove the root but I also rebalance the tree. I was looking at some examples where when I remove the root and not rebalance the tree, we would not get a good binary distribution and optimal tree. That being said, the binary tree (not the elements) is re-created in this case and does cause a small performance hit to re-insert them into the binary tree. The Delete()
method can also be called to delete the root node and not rebalance the tree. There is virtually no performance hit here other than the O (log n) to find the left-most node on the right sub-tree.
Very Important Points
Remember that this binary tree uses recursion to insert items and retrieve items (InOrder, PreOrder, PostOrder). That means that if there are many levels in the tree and therefore many nodes your execution stack may become overwhelmed. The thread execution stack has a limit of roughly 1 MB so many recursive calls will result in a StackOverflowException which cannot be caught and handled. In cases where you plan to store a huge amount of items in the tree please convert the recursive method to iterative ones. For example I converted the recursive insert to an interative version below:
private void Insert(BinaryTreeNode<t> node, BinaryTreeNode<t> parent)
{
BinaryTreeNode<t> tree = parent;
bool leftChild = false;
while (tree != null)
{
if (node.Data.CompareTo(tree.Data) < 0)
{
parent = tree;
tree = tree.Left;
leftChild = true;
}
else if (node.Data.CompareTo(tree.Data) > 0)
{
parent = tree;
tree = tree.Right;
leftChild = false;
}
else
{
throw new ArgumentException("Duplicate node");
}
}
if (leftChild)
{
parent.Left = node;
}
else
{
parent.Right = node;
}
}
Please tell me what you think!