Introduction
With C# 2.0, a powerful language feature is being introduced which all C++ programmers have been missing since the beginnings of this altogether so cleanly and beautifully designed language: generics. This article uses this new feature to implement a generic tree class. Tree structures have more than just one natural way of being traversed, this is also a good motivation to demonstrate the use of C# 2.0's new iterator feature. If you wonder how you can build the code or how you can start trying out the new language features, go look at Microsoft's site where you can download a beta of their new Express line of developers products.
Background
For a general introduction into the topic of generics and/or iterators, please take a look at the specification of C# 2.0 by Microsoft which does a good job on explaining them in detail.
Usage
When using the code, you will be dealing with two classes: Node
and NodeCollection
. Before going into the details of the implementation, it's probably a good idea to scan the demo project which comes with the source archive. The demo project is a console application which takes a single parameter. This parameter should be a path pointing to a directory. The demo first loads the file hierarchy rooted inside that path into an internally kept tree structure using the Node
class, and then iterates over this tree structure once depth-first, then breadth-first. In the demo's Run()
method, you see how to specialize the Node
class for a given type, and construct an instance:
void Run(string rootdir)
{
Node<string> root = new Node<string>(rootdir);
The code above creates an instance of the Node
class specialized for containing data of type string
and initializes the instance with a value. The Node
class contains a property Data
which allows read-only access to the element type for which it is specialized. Exactly this property is what you initialize in the constructor code. If you want to access the children of a node, you use the Children
property which returns an instance of NodeCollection
. Each node has a Parent
property, too, which is read-only. If you add a node as the child of another, the first node's Parent
property will reflect this change. Newly constructed nodes return null
as their Parent
, meaning they are located in the root of the hierarchy. The NodeCollection
, on the other hand, has a read-only property Owner
returning the Node
whose children it contains. NodeCollection
is not meant to be used as a general-purpose container for Node
s, thus its constructor is not publicly accessible. Node
contains other useful properties and methods: the Root
property gets the root of a node (wouldn't you have guessed ;-)), IsAncestorOf()
takes another node and returns whether the instance node is an ancestor of the passed node, IsDescendantOf()
checks for exactly the opposite condition, and DoesShareHierarchyWith()
takes another node and returns whether the instance node and the passed node are equal or have an ancestor-descendant relationship (true
) or not (false
). GetDepthFirstEnumerator()
and GetBreadthFirstEnumerator()
return iterators performing a depth-first or breadth-first traversal. Note that the iterators returned by these methods iterate over the Data
values of the nodes, not the nodes themselves, meaning that .Current
will return the value of Data
of the currently visited node.
Implementation
The implementation of this small library makes use of two new features of C# 2.0: generics and iterators. Generics are used to allow the nodes to function as a type-safe container. Without them, the Data
property would have to be of type object, requiring clients to cast to the correct type and allowing to instantiate Node
with objects of different types. Iterators, on the other hand, are used for implementing depth-first and breadth-first traversal in a compellingly intuitive way. Let's take a look at their implementation:
public IEnumerator<Element> GetDepthFirstEnumerator()
{
yield return mData;
foreach (Node<Element> kid in mChildren)
{
IEnumerator<Element> kidenumerator= kid.GetDepthFirstEnumerator();
while (kidenumerator.MoveNext())
yield return kidenumerator.Current;
}
}
IEnumerator<>
is defined in System.Collections.Generic
and defines a typesafe iterator, i.e., an iterator whose Current
property is strongly typed.
Now, our depth-first traversal first returns the data of the current node itself, then iterates over the children nodes, gets their enumerators, and returns the values visited. For example, if you build up a structure like:
Node<string> root= new Node<string>("Gramps");
Node<string> son= new Node<string>("Son");
Node<string> daughter= new Node<string>("Daughter");
root.Children.Add(son);
root.Children.Add(daugther);
son.Children.Add(new Node<string>("Grandson"));
son.Children.Add(new Node<string>("Granddaughter"));
a call like:
foreach (string s in root.GetDepthFirstEnumerator())
Console.WriteLine(s);
will result in:
Gramps
Son
Grandson
Granddaughter
Daugther
First, root.GetDepthFirstEnumerator()
yields root
's data, i.e., "Gramps"; then it starts iterating over its children, the first one being son
. Now, son.GetDepthFirstEnumerator()
gets called and yields its data "Son" before starting to iterate over its own children. Both children yield their data "Grandson" and "Granddaughter", but no more because they don't have children themselves. Now, son.GetDepthFirstEnumerator()
is done, and root.GetDepthFirstEnumerator()
goes on to daughter
, which, having no children of its own, yields only its own data "Daughter".
Breadth-first traversal is a little more complicated, but still C# 2.0's iterator facility allows an incredibly easy implementation:
public IEnumerator<Element> GetBreadthFirstEnumerator()
{
Queue<Node<Element>> todo = new Queue<Node<Element>>();
todo.Enqueue(this);
while (0 < todo.Count)
{
Node<Element> n = todo.Dequeue();
foreach (Node<Element> kid in n.mChildren)
todo.Enqueue(kid);
yield return n.mData;
}
}
Assuming the same node structure as for the depth-first example, the code piece:
foreach (string s in root.GetBreadthFirstEnumerator())
Console.WriteLine(s);
produces this output:
Gramps
Son
Daugther
Grandson
Granddaughter
In this case, recursion is unrolled into a loop. First, root.GetBreadthFirstEnumerator()
creates a queue and puts root
inside. Then, getting into the while
loop, it unqueues root
. Now, it iterates over root
's children son
and daughter
and puts them into the queue. Then, it yields root
's data "Gramps". Next time inside the loop, it pops son
from the queue, pushes son
's children into the queue, and yields son
's data "Son". Next time, it pops daughter
from the queue which has no children, so it yields daughter
's data "Daugther" and proceeds. Now, it pops the two children of son
, both of which don't have children of their own, thus do not add anything to the queue. Their data "Grandson" and "Granddaugther" gets yielded, and next time checking the loop-condition, the queue is empty and traversal thus finished.
Conclusion and To-Do
This little class gets you a generic tree container, if you need a quick and simple solution. It shows off the possibilities the new language features of C# 2.0 facilitate. In order to create a really useful, general purpose tree library, though, the code would have to be more adaptable to specific needs of the clients. Here's a short list of ideas - and who knows, maybe my next article will have some or even all of them implemented:
- At the moment,
Node
is implemented for trees with an arbitrary number of children. But very often, you know that each node will have a fixed number of children (binary trees, quad trees etc.). For such scenarios, the presented implementation is clearly inefficient as it makes use internally of a framework container (System.Containers.Generic.List
) where it should be using an array. So, it would be nice if nodes were parameterizable with a strategy for keeping children. Ideally, there would be two predefined strategies in the library, one using the list container, another one for arrays of static size, and clients could implement an interface for other cases.
- Considering that quite often there is the need to visualize a tree structure, it would make sense to write an adaptor for filling a
TreeView
from a Node
structure.
- As for now, both traversal algorithms do not supply clients with any information about where exactly they are in the tree. Now, in game applications, for example, it's usually important to know how deep a depth-first traversal has gone currently. Besides, sometimes they want to parameterize how deep traversal will proceed, or they want to add nodes as they get deeper into the tree.
History
2004-08-10: first release.