Introduction
Some time ago, I had to write some code that created an object tree from a DataView
. Pretty straight forward job, the code worked and I didn't bother with it for a while.
Then recently I started using the ANTS Profiler and found that the tree construction was actually rather slow. I examined my old code again and came up with a way of speeding it up significantly...
This article explains how the recursive character of the DataView
can be used to get a big performance improvement.
Note: the method I present doesn't rely on incremental sorting of the DataView
. Sorted backwards or even a mess will work. This is a very welcome property if the tree data is from an external, uncontrollable source.
What's in the ZIP?
The ZIP-file contains the tree building library code, and the test application code. Also included is the tree DataSet
in XML format. The tree consists of 745 nodes, going about five levels deep. The test application is a simple console application that creates TreeNode
s using the slow and the fast way and shows the loading time in seconds. It does this five times to get a better idea of the performance boost.
From the tree's point of view
In my original code, I handled the problem from the tree's point of view. The tree always has a root node, so I started with that node, and searched for its child nodes in the DataView
. For every child node, I searched for their respective child nodes as well, and so on.
The DataSet
consists of a collection of records with three columns: NodeID
, ParentNodeID
and NodeText
. The data is recursive: The ParentNodeID
refers to the NodeID
of the parent node. The database makes sure that a node can't be attached to a non-existent parent node. Because of this rule, a root node is made by having the node point to itself.
public class TreeNode
{
private int id;
private TreeNode parentNode;
private string text;
private ArrayList childNodes;
public TreeNode(int id, TreeNode parentNode)
{
this.id = id;
this.parentNode = parentNode;
this.text = "";
childNodes = new ArrayList();
if (parentNode != null)
{
parentNode.ChildNodes.Add(this);
}
}
public int ID
{
get {return id;}
}
public TreeNode ParentNode
{
get {return parentNode;}
set
{
parentNode = value;
if (parentNode != null && !parentNode.ChildNodes.Contains(this))
parentNode.ChildNodes.Add(this);
}
}
public string Text
{
get {return text;}
set {text = value;}
}
public ArrayList ChildNodes
{
get {return childNodes;}
}
}
The TreeNode
class basically reflects the database structure. It has an ID, and it refers to a parent. A root node has a null
parent. As a simple payload, the node has a Text
property.
public TreeNode LoadTree(DataView dataTree)
{
TreeNode rootNode = null;
foreach (DataRowView dataNode in dataTree)
{
int nodeID = (int)dataNode["NodeID"];
int parentNodeID = (int)dataNode["ParentNodeID"];
string text = dataNode["NodeText"].ToString();
if (nodeID == parentNodeID)
{
rootNode = new TreeNode(nodeID, null);
rootNode.Text = text;
LoadChildNodes(rootNode, dataTree);
}
}
return rootNode;
}
private void LoadChildNodes(TreeNode parentNode, DataView dataTree)
{
foreach (DataRowView dataNode in dataTree)
{
int nodeID = (int)dataNode["NodeID"];
int parentNodeID = (int)dataNode["ParentNodeID"];
string text = dataNode["NodeText"].ToString();
if (nodeID != parentNodeID && parentNodeID == parentNode.ID)
{
TreeNode node = new TreeNode(nodeID, parentNode);
node.Text = text;
LoadChildNodes (node, dataTree);
}
}
}
The LoadTree
method calls the recursive LoadChildNodes
method. This makes the code quite compact, however, there's one big problem with it: for every node, it scans the entire DataView
, including already handled nodes. So with the test DataSet
, 555770 records will be read. Combined with the fact that the DataView
doesn't read too fast, the total time required to place all nodes is considerable.
Also, if you try to load a very deeply branched tree, it might cause stack overflow problems.
From the DataView's point of view
The DataView
has a linear character: it starts at record 0 and ends at record x. Just going through the records from 0 to x to create each node doesn't seem possible at first: if its parent node was not created earlier, it's impossible to create the current node. Or is it?
The trick is to let go of the idea that each object you create should be complete. I don't know anything about the parent node, but I do know the ID it will have. So I can create an new node with only the parent's ID and no parent node, and store it in a Hashtable
. If I find the ID of the empty node later on in the DataSet
, I don't create it but get the stored node from the Hashtable
instead and fill in the empty properties.
public TreeNode LoadTree(DataView dataTree)
{
foreach (DataRowView dataNode in dataTree)
{
int nodeID = (int)dataNode["NodeID"];
int parentNodeID = (int)dataNode["ParentNodeID"];
string text = dataNode["NodeText"].ToString();
if (nodeList.Contains(nodeID))
{
TreeNode node = (TreeNode)nodeList[nodeID];
node.Text = text;
if (nodeID != parentNodeID && node.ParentNode == null)
{
TreeNode parentNode = null;
if (nodeList.Contains(parentNodeID))
{
parentNode = (TreeNode)nodeList[parentNodeID];
}
else
{
parentNode = new TreeNode(parentNodeID, null);
nodeList.Add(parentNodeID, parentNode);
}
node.ParentNode = parentNode;
}
}
else
{
if (nodeID == parentNodeID)
{
TreeNode node = new TreeNode(nodeID, null);
node.Text = text;
nodeList.Add(nodeID, node);
}
else
{
TreeNode parentNode = null;
if (nodeList.Contains(parentNodeID))
{
parentNode = (TreeNode)nodeList[parentNodeID];
}
else
{
parentNode = new TreeNode(parentNodeID, null);
nodeList.Add(parentNodeID, parentNode);
}
TreeNode node = new TreeNode(nodeID, parentNode);
node.Text = text;
nodeList.Add(nodeID, node);
}
}
}
IDictionaryEnumerator nodeEnumerator = nodeList.GetEnumerator();
while (nodeEnumerator.MoveNext())
{
TreeNode node = (TreeNode)nodeEnumerator.Value;
if (node.ParentNode == null)
return node;
}
return null;
}
The LoadTree
method here is a bit longer but it reads every record only once. Instead of searching nodes in the DataView
, they're searched using the much faster Hashtable
. The only additional thing that needs to be done at the end is locating the root node, whereas the SlowTree version has it ready at hand: it was the starting point. However, the extra time this extra code requires isn't really noticeable because of the speed with which the tree was built.
The results
On my 1.8GHz laptop, the test application requires just a little above two seconds to get the tree using the slow method. Using the fast method, it only takes about 0.004 seconds to create the same tree. That's about 500 times faster!
Note: of course, the test tree is a very simple tree. In real-life applications, each tree node might have a bigger payload than just a text. This will probably slow it down a little, but it will still be much faster than the original slow tree. I've seen improvements up to 400 times faster in my applications.
Conclusion
Applications can have a bad performance because:
- it calls (fast) methods very often, or
- it calls methods that are very slow, or, worst of all,
- it calls methods that are very slow very often.
The solution offered in this article avoids this by keeping the calls to the slowest method (reading records from a DataView
) to an absolute minimum. Admittedly, this is only possible because of the close relationship between the child- and parent objects, and some clever using of the little information we have about the objects.
Acknowledgements
I'd like to thank:
- Daniel Strigl for his easy-to-use HighPerformanceTimer class[^] , which I used to take the measurements.
- My colleague Jan, who originally came up with the described idea.