Introduction
Maybe you have already remarked that a tree is a two-dimensional data structure that can't be accessed linearly, like you should do with a list. To traverse a tree, you have to visit all its nodes, with respect to their orders and hierarchy. This can be done by using recursivity or loops.
An object-oriented approach of tree traversal is to build a most generic as possible class that encapsulate the traversal algorithm. A modern and good way to do that is building a tree iterator.
This article talks about tree iterators and different choices that can be taken while coding them for taking account performance, reusability and comprehension.
Background
First, we assume that you are familiar with tree data structures. In this case, you know that a tree is a hierarchical structure, organized in linked nodes. It can also be considered as a particular case of graphs (an acyclic connected graph). Each node has a unique parent and zero or several children. Nodes at the bottommost level of the tree are called leaf nodes.
Iterators and C#
An iterator is an object that can iterate through a container data structure, and display, stay by step in incremental manner, its elements. In object-oriented programming, the Iterator pattern is a design pattern that encapsulates the internal structure of the iteration process.
Widely used in the C++ STL, it is implemented there as a pointer that can be increased with the ++ operator. This can also be done with C#, but the Framework provides for us another and convenient approach: the foreach
keyword. In this case, the Iterator must implement the IEnumerator
interface.
StringList mylist = new StringList();
ListIterator iterator = new ListIterator(mylist);
while(iterator++) ShowValue(iterator.Value);
foreach(string value in mylist) ShowValue(value);
As you can see, the C++ syntax is very natural. You first define the iterator, and then use it in a while loop.
But, what about the C# syntax ? Mmmh, it's not straightforward, but let's see the basic mechanism. In fact, the class StringList
implements the IEnumerable
interface and provides a method called GetEnumerator()
that returns an iterator on the list. Thus, before enumerating, the IEnumerable
instance calls the GetEnumerator()
method that provides the iterator and increments it in each loop.
In the Visual Studio 2005 version, Microsoft came with a very useful new keyword: yield
. If you don't know this magic keyword, I suggest you go urgently and read this article. Otherwise, you might know that it can save much time, avoiding you the need to create the IEnumerator
class. In this case, the compiler will generate it for you. Read this excellent article on the yield keyword.
The Choice of the Tree Structure
Trees can be represented in many ways. Because trees are organized sets of nodes, it is natural to consider nodes as a basis. A node can have zero, one or more children. The first idea is to set a list of nodes to a node. This is not the chosen approach in this article, because the list implementation is not optimized in the case of nodes with a few children (binary trees, for example). For a more generic and flexible implementation, I prefer a linked list, like the one drawn in the picture below:
public interface INode<T>
{
INode<T> Parent { get; set; }
INode<T> Child { get; set; }
INode<T> Right { get; set; }
}
As you can see, each node has a parent, a right neighbour, and a child. You can remark too:
- If the parent is
null
, the node is the root of the tree.
- If the right is
null
, the node is the rightmost of its parent's children.
- If the child is
null
, the node is a leaf (the bottommost of the branch).
The code defining the generic process of iteration is easy and straightforward to write. Note that because it's generic, this algorithm does really nothing and must be implemented in an inherited class. This is why the following class is declared as abstract
:
public abstract class TreeEnumerator<T> : IEnumerator<INode<T>>
{
protected INode<T> _Tree = null;
protected INode<T> _Current = null;
public TreeEnumerator(INode<T> tree) {
_Tree = tree;
}
public INode<T> Current { get { return _Current; } }
object System.Collections.IEnumerator.Current {
get { return _Current; }
}
public abstract bool MoveNext();
public void Dispose() {
}
public void Reset() {
_Current = null;
}
public virtual TreeEnumerator<T> GetEnumerator() {
return this;
}
}
Tree Traversal Algorithms and Their Implementation
There are two families of tree traversal: the in-depth and the in-breadth algorithms. The first algorithm can be easily written using recursivity, whereas the latter needs a FIFO structure for memorizing already visited nodes. See more information on tree traversal.
For iterator purpose, it was not conceivable to use recursive methods. The traditional while
loops have been preferred. I expose here only the depth traversal algorithms. The breadth algorithm can be downloaded in the files section (with the entire Visual Studio Solution).
public class DepthTreeEnumerator<t> : TreeEnumerator<t>
{
public DepthTreeEnumerator(INode<t> tree)
: base(tree) { }
public override bool MoveNext() {
if (_Current == null) _Current = _Tree;
else if (_Current.Child != null) _Current = _Current.Child;
else if (_Current.Right != null) _Current = _Current.Right;
else {
INode node = _Current;
do {
if (node.Parent == null) return false;
node = node.Parent;
} while (node.Right == null);
_Current = node.Right;
}
return true;
}
}
Performance Comparison
Here is the code I used to compare the classic implementation with the yield
method implementation.
class Program
{
static void Main(string[] args) {
Node tree = new Node(1);
tree.Child = new Node(2);
tree.Child.Right = new Node(5);
tree.Child.Child = new Node(3);
tree.Child.Child.Right = new Node(4);
tree.Child.Right.Child = new Node(6);
tree.Child.Right.Child.Right = new Node(7);
int imax = 100;
double[] ratios = new double[imax];
ulong iter = 10000;
for (int i = 0; i < imax; i++)
ratios[i] = TestPerformance(tree, iter);
StringBuilder sb = new StringBuilder();
foreach(double value in ratios)
sb.Append(value.ToString()+";");
string copy = sb.ToString();
}
public static IEnumerable<Node> MoveNext(Node root) {
yield return root;
if (root.Child != null)
foreach (Node node in MoveNext((Node)root.Child))
yield return node;
if (root.Right != null)
foreach (Node node in MoveNext((Node)root.Right))
yield return node;
}
public static double TestPerformance(Node tree, ulong iter) {
DateTime start;
start = DateTime.Now;
for (ulong i = 0; i < iter; i++) {
DepthTreeEnumerator<int> enumerator;
enumerator = new DepthTreeEnumerator<int>(tree);
foreach (Node node in enumerator) {}
long ticks1 = ((TimeSpan)(DateTime.Now - start)).Ticks;
}
start = DateTime.Now;
for (ulong i = 0; i < iter; i++) {
foreach (Node node in MoveNext(tree)) { }
}
long ticks2 = ((TimeSpan)(DateTime.Now - start)).Ticks;
return (double)ticks2 / (double)ticks1;
}
}
The result is that the classic method is up to 5 times faster than the yield
return method! So, it's clear that with the addition of simple tests, it can be shown that the yield
keyword is not adapted in this kind of work.
Example Application
Theory discussions are not understandable without a concrete application. That's why I add to this article an example application of tree iterators, useful in the "true life" of a C# developer. This application exposes a TreeView
with many nodes, each one displaying a French surname.
To test the application, download the file above, unzip it and build the solution. Run the TreeIterator.Example
project (selected by default).
Then, in the dialog box that appears, select a surname among the ones suggested by the combo box. Click OK. Finally, three things happen at the same time:
- The node matching your selected surname is visible
- The first (and readonly)
TextBox
displays the path to the selected node
- All the surnames starting with the first letter of the selected surname are listed in the bottomost
TextBox
The application runs fine. But to appreciate the application of the iterator, we need to examine the code. In this project, there are two main files:
- SurnameTreeNode.cs
- MainForm.cs
Well, how does it work ? It's not really difficult. Remember, we already have defined a tree iterator that does all the traversal for us. But this iterator has been thought of for working on any INode
structure. So, to achieve the task, we need to create a new class (we'll name it SurnameNode
) that derives from INode
. This class will make the link between the TreeView
nodes and the iterator. The only thing we have to do is define the properties of a SurnameNode
object :
public INode<TreeNode> Parent {
get { return _Node == null ? null : new SurnameTreeNode(_Node.Parent); }
set { throw new NotImplementedException();
}
public INode<TreeNode> Child
{
get {
if (_Node == null || _Node.Nodes == null) return null;
if(_Node.Nodes.Count <= 0) return null;
return new SurnameTreeNode(_Node.Nodes[0]);
}
set { throw new NotImplementedException(); }
}
}
public INode<TreeNode> Right
{
get {
if (_Node == null || _Node.Parent == null) return null;
if(_Node.Parent.Nodes.Count <= _Node.Index + 1) return null;
var node = _Node.Parent.Nodes[_Node.Index + 1];
return new SurnameTreeNode(node );
}
set { throw new NotImplementedException(); }
}
Ok. Now, if we want to use our iterator in a foreach
loop, we have to defer its traversal with a yield
return loop :
private IEnumerable<TreeNode> Enumerate() {
DepthTreeEnumerator<TreeNode> enumerator = null;
enumerator = new DepthTreeEnumerator<TreeNode>(
new SurnameTreeNode(this.treeView1.Nodes[0])
);
foreach (SurnameTreeNode node in enumerator) yield return node.Node;
}
This method must be used like this:
IEnumerable nodes = Enumerate();
The last thing to do is writing search methods. That's where the power of iterators come in. We can now handle nodes linearly, like a list, and then apply LINQ method extensions. It's very easy!
The first use consists in getting all the surnames of the tree, ordered by surname. Here's the code to do so:
IQueryable<string> items;
items = Enumerate().OrderBy(node=>node.Text).Select(node=>node.Text);
this.comboBox1.Items.AddRange(items.ToArray());
Then, we can search for a node matching a specified name :
string name = "Boris"
TreeNode resultnode = Enumerate().Single(node => node.Text == name);
Or we can find all the names that start with a specified letter :
string letter = "B"
var nodes = Enumerate().Where(node=>node.Text.StartsWith(letter)));
List<string> names = nodes.Select(node=>node.Text).ToList();
Points of Interest
In this article, we saw an simple, generic and efficient way to traverse trees, with two different algorithms. We also saw that implementing yield
return based-algorithms gave bad performances. Then, we could apply our library on daily development problems, and prove that iterators, combined with LINQ, can become powerful when searching elements in non-ordered trees.
History
- 20th March, 2009: Initial post