Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / VB

Query Hierarchical DataTable

4.93/5 (11 votes)
5 Oct 2015CPOL3 min read 22.1K   584  
This tip shows one way to query hierarchical data from a DataTable by using an AsTree() method.

Introduction

Every once in a while, you may need to query data from a DataTable which has a hierarchical structure. In other words, the DataTable contains a foreign key column which points to itself thus creating a self-referencing structure. A typical example is an organization chart where employees have a reference to the manager.

In order to iterate the hierarchical structure correctly, you need to have some kind of recursive function. For this, I decided to create a new method called AsTree which would return the selected portion from the tree.

Implementing AsTree()

Before going to the actual implementation, let’s have a look at the return values. The AsTree method returns a set of TreeItem objects. A tree item looks as follows:

C#
/// <summary>
/// Class definition for a single tree item
/// </summary>
public class TreeItem {

   /// <summary>
   /// Zero based index for defining the level of the current item
   /// </summary>
   public int Level { get; set; }

   /// <summary>
   /// The data row for the item
   /// </summary>
   public DataRow Row { get; set; }

   /// <summary>
   /// The parent data row 
   /// </summary>
   public DataRow ParentRow { get; set; }
}
VB.NET
' Class definition for a single tree item
Public Class TreeItem

   ' Zero based index for defining the level of the current item
   Public Property Level As Integer

   ' The data row for the item
   Public Property Row As DataRow

   ' The parent data row 
   Public Property ParentRow As DataRow
End Class

The Row property contains the actual row from the tree while the ParentRow contains its parent DataRow (if any). The Level property contains the depth of the row in the tree calculated from the starting point.

The AsTree extension method is very simple:

C#
/// <summary>
/// Returns a HashSet containing all the elements in the path 
/// </summary>
/// <param name="source">Source for the data rows</param>
/// <param name="parentColumnName">Column name that acts as a parent key</param>
/// <param name="childColumnName">Column name that acts as a foreign key</param>
/// <param name="startRow">Data row to start building the tree from</param>
/// <param name="direction">Direction of traverse</param>
/// <returns>The hash set containing the tree for the parent row provided</returns>
public static HashSet<TreeItem> AsTree(this EnumerableRowCollection<DataRow> source, 
              string parentColumnName, 
              string childColumnName, 
              DataRow startRow, 
              TraverseDirection direction = TraverseDirection.FromParentToChild) {
   string actualChildColumn;
   string actualParentColumn;
   HashSet<TreeItem> resultSet = new HashSet<TreeItem>();

   // Add the start row to the set
   resultSet.Add(new TreeItem() {
      Level = 0,
      ParentRow = null,
      Row = startRow
   });

   // Decide the order of the comparison based on the traverse direction
   if (direction == TraverseDirection.FromParentToChild) {
      actualParentColumn = parentColumnName;
      actualChildColumn = childColumnName;
   } else {
      actualParentColumn = childColumnName;
      actualChildColumn = parentColumnName;
   }

   // Recursively fill the tree 
   return DataTableTree.FillTree(source, resultSet, actualParentColumn, actualChildColumn, 
                                 startRow, 1);
}
VB.NET
' <summary>
' Returns a HashSet containing all the elements in the path 
' </summary>
' <param name="source">Source for the data rows</param>
' <param name="parentColumnName">Column name that acts as a parent key</param>
' <param name="childColumnName">Column name that acts as a foreign key</param>
' <param name="startRow">Data row to start building the tree from</param>
' <param name="direction">Direction of traverse</param>
' <returns>The hash set containing the tree for the parent row provided</returns>
<Extension()>
Public Function AsTree(source As EnumerableRowCollection(Of DataRow), _
       parentColumnName As String, _
       childColumnName As String, _
       startRow As DataRow, _
       Optional direction As DataTableTree.TraverseDirection = 
       DataTableTree.TraverseDirection.FromParentToChild) As HashSet(Of DataTableTree.TreeItem)
   Dim actualChildColumn As String
   Dim actualParentColumn As String
   Dim resultSet As HashSet(Of DataTableTree.TreeItem) = New HashSet(Of DataTableTree.TreeItem)()

   ' Add the start row to the set
   resultSet.Add(New DataTableTree.TreeItem() With {
      .Level = 0,
      .ParentRow = Nothing,
      .Row = startRow
   })

   ' Decide the order of the comparison based on the traverse direction
   If (direction = DataTableTree.TraverseDirection.FromParentToChild) Then
      actualParentColumn = parentColumnName
      actualChildColumn = childColumnName
   Else
      actualParentColumn = childColumnName
      actualChildColumn = parentColumnName
   End If

   ' Recursively fill the tree 
   Return DataTableTreeModule.FillTree(source, resultSet, actualParentColumn, actualChildColumn, 
                                       startRow, 1)
End Function

The method uses parameters as follows:

  • parentColumnName is the DataColumn name which acts as a parent key
  • childColumnName is the foreign key column referencing to the parent column
  • startRow is the initial row from which the tree is investigated
  • direction defines the traverse direction, either top-down or bottom-up. The default is top-down.

The AsTree method simply adds the first element into the HashSet and then, based on the direction, decides which column acts as a parent and which one is the child. The method has an assumption that the foreign key is created using a single column.

The actual iteration is implemented in the following FillTree method:

C#
/// <summary>
/// Adds all immediate children for a single parent into te given hash set
/// </summary>
/// <param name="source">Source for the data rows</param>
/// <param name="resultSet">Result set to fill</param>
/// <param name="actualParentColumn">Column name that acts as a parent key</param>
/// <param name="actualChildColumn">Column name that acts as a foreign key</param>
/// <param name="startRow">Data row to start building the tree from</param>
/// <param name="level">Current level of the items</param>
/// <returns></returns>
private static HashSet<TreeItem> FillTree(EnumerableRowCollection<DataRow> source, 
               HashSet<TreeItem> resultSet, 
               string actualParentColumn, 
               string actualChildColumn, 
               System.Data.DataRow startRow, 
               int level) {
   // Build a query for immediate children
   var subItems = from item in source
                  where item.Field<object>(actualChildColumn) != null
                  && item.Field<object>(actualChildColumn) != System.DBNull.Value
                  && item.Field<object>(actualChildColumn).Equals
                  (startRow.Field<object>(actualParentColumn))
                  select new TreeItem() {
                     Level = level,
                     Row = item,
                     ParentRow = startRow
                  };

   // Loop through the items found and add to the result set
   foreach (TreeItem subItem in subItems) {
      resultSet.Add(subItem);

      // Recursively fill children of this sub item
      DataTableTree.FillTree(source, resultSet, actualParentColumn, actualChildColumn, 
                             subItem.Row, level + 1);
   }

   return resultSet;
}
VB.NET
' <summary>
' Adds all immediate children for a single parent into te given hash set
' </summary>
' <param name="source">Source for the data rows</param>
' <param name="resultSet">Result set to fill</param>
' <param name="actualParentColumn">Column name that acts as a parent key</param>
' <param name="actualChildColumn">Column name that acts as a foreign key</param>
' <param name="startRow">Data row to start building the tree from</param>
' <param name="level">Current level of the items</param>
' <returns></returns>
Private Function FillTree(source As EnumerableRowCollection(Of DataRow), _
        resultSet As HashSet(Of DataTableTree.TreeItem), _
        actualParentColumn As String, _
        actualChildColumn As String, _
        startRow As System.Data.DataRow, _
        level As Integer) As HashSet(Of DataTableTree.TreeItem)
   ' Build a query for immediate children
   Dim subItems = From item In source
                  Where Not (item.Field(Of Object)(actualChildColumn) Is Nothing) _
                  AndAlso Not (item.Field(Of Object)(actualChildColumn) Is System.DBNull.Value) _
                  AndAlso item.Field(Of Object)_
                  (actualChildColumn).Equals(startRow.Field(Of Object)(actualParentColumn))
                  Select New DataTableTree.TreeItem() With {
                     .Level = level,
                     .Row = item,
                     .ParentRow = startRow
                  }

   ' Loop through the items found and add to the result set
   For Each subItem As DataTableTree.TreeItem In subItems
      resultSet.Add(subItem)

      ' Recursively fill children of this sub item
      DataTableTreeModule.FillTree(source, resultSet, actualParentColumn, actualChildColumn, _
                                   subItem.Row, level + 1)
   Next subItem

   Return resultSet
End Function

This method searches for immediate children for a given DataRow and adds them into the HashSet. For each child found, the FillTree method is called recursively in order to add the children of the DataRow at hand. On each call, the level is incremented.

Usage Example

Now let’s have a few test runs. Imagine the following organization:

Image 1

In order to define the structure for the organization, the following data table is created:

C#
// Define the table
hierarchicalTable.Columns.Add("Id", typeof(int));
hierarchicalTable.Columns.Add("Name", typeof(string));
hierarchicalTable.Columns.Add("Title", typeof(string));
hierarchicalTable.Columns.Add("ManagerId", typeof(int));
hierarchicalTable.Constraints.Add(new System.Data.UniqueConstraint(
                                     hierarchicalTable.Columns["Id"], true));
hierarchicalTable.Constraints.Add(new System.Data.ForeignKeyConstraint(
                                     hierarchicalTable.Columns["Id"], 
                                     hierarchicalTable.Columns["ManagerId"]));
VB.NET
' Define the table
hierarchicalTable.Columns.Add("Id", System.Type.GetType("System.Int32"))
hierarchicalTable.Columns.Add("Name", System.Type.GetType("System.String"))
hierarchicalTable.Columns.Add("Title", System.Type.GetType("System.String"))
hierarchicalTable.Columns.Add("ManagerId", System.Type.GetType("System.Int32"))
hierarchicalTable.Constraints.Add(New UniqueConstraint( _
                                     hierarchicalTable.Columns("Id"), True))
hierarchicalTable.Constraints.Add(New ForeignKeyConstraint( _
                                     hierarchicalTable.Columns("Id"), _
                                     hierarchicalTable.Columns("ManagerId")))

Each row has an Id field acting as a primary key and a ManagerId field which defines the manager for the person. The constraints wouldn't be necessary to use but they ensure that the parent key is present and that it is unique.

Our test data is filled like this:

C#
// Add the data
hierarchicalTable.Rows.Add(1, "Edgar Allan Poe", "CEO", null);
hierarchicalTable.Rows.Add(2, "Blondie (Man with No Name)", "VP Sales", 1);
hierarchicalTable.Rows.Add(3, "Mary Poppins", "Sales Director", 2);
hierarchicalTable.Rows.Add(4, "Frodo Baggins", "Regional Sales Manager", 3);
hierarchicalTable.Rows.Add(5, "Bilbo Baggins", "Regional Sales Manager", 3);
hierarchicalTable.Rows.Add(6, "Smeagol", "Collections", 3);
hierarchicalTable.Rows.Add(7, "Euphegenia Doubtfire", "Sales Assistant", 2);
hierarchicalTable.Rows.Add(8, "Hermann Hesse", "VP R&D", 1);
hierarchicalTable.Rows.Add(9, "Tom Sawyer", "Researcher", 8);
hierarchicalTable.Rows.Add(10, "Isaac Asimov", "Futurist", 8);
hierarchicalTable.Rows.Add(11, "Mini-Me", "R&D Assistant", 8);
VB.NET
' Add the data
hierarchicalTable.Rows.Add(1, "Edgar Allan Poe", "CEO", Nothing)
hierarchicalTable.Rows.Add(2, "Blondie (Man with No Name)", "VP Sales", 1)
hierarchicalTable.Rows.Add(3, "Mary Poppins", "Sales Director", 2)
hierarchicalTable.Rows.Add(4, "Frodo Baggins", "Regional Sales Manager", 3)
hierarchicalTable.Rows.Add(5, "Bilbo Baggins", "Regional Sales Manager", 3)
hierarchicalTable.Rows.Add(6, "Smeagol", "Collections", 3)
hierarchicalTable.Rows.Add(7, "Euphegenia Doubtfire", "Sales Assistant", 2)
hierarchicalTable.Rows.Add(8, "Hermann Hesse", "VP R&D", 1)
hierarchicalTable.Rows.Add(9, "Tom Sawyer", "Researcher", 8)
hierarchicalTable.Rows.Add(10, "Isaac Asimov", "Futurist", 8)
hierarchicalTable.Rows.Add(11, "Mini-Me", "R&D Assistant", 8)

Full Tree

Now in order to see the whole tree, let’s first run a query from the top without any restrictions.

C#
// Full tree query
startRow = hierarchicalTable.AsEnumerable().Where
(x => x.Field<int>("Id") == 1).FirstOrDefault();
var query1 = from person in hierarchicalTable.AsEnumerable().AsTree("Id", "ManagerId", startRow)
             select person;

System.Console.WriteLine("=============================================");
System.Console.WriteLine(string.Format("Tree for {0}", startRow["Name"]));
System.Console.WriteLine("=============================================");
foreach (DataTableTree.TreeItem item in query1) {
   System.Console.WriteLine(string.Format("   Level {0}: {1}{2}", 
                                          item.Level, 
                                          new string(' ', item.Level * 3), 
                                          item.Row["Name"]));
}
VB.NET
' Full tree query
startRow = hierarchicalTable.AsEnumerable().Where(Function(x) _
	x.Field(Of Integer)("Id") = 1).FirstOrDefault()
Dim query1 = From person In hierarchicalTable.AsEnumerable().AsTree("Id", "ManagerId", startRow)
             Select person

System.Console.WriteLine("=============================================")
System.Console.WriteLine(String.Format("Tree for {0}", startRow("Name")))
System.Console.WriteLine("=============================================")
For Each item As DataTableTree.TreeItem In query1
   System.Console.WriteLine(String.Format("   Level {0}: {1}{2}",
                                          item.Level,
                                          New String(" ", item.Level * 3),
                                          item.Row("Name")))
Next item

First of all, the starting row is defined. I've selected the top-most row based on the id, 1. After that, the actual LINQ query is defined. In order to illustrate the results, each row found is printed to the console with small amount of formatting so the result would look like this:

=============================================
Tree for Edgar Allan Poe
=============================================
   Level 0: Edgar Allan Poe
   Level 1:    Blondie (Man with No Name)
   Level 2:       Mary Poppins
   Level 3:          Frodo Baggins
   Level 3:          Bilbo Baggins
   Level 3:          Smeagol
   Level 2:       Euphegenia Doubtfire
   Level 1:    Hermann Hesse
   Level 2:       Tom Sawyer
   Level 2:       Isaac Asimov
   Level 2:       Mini-Me

Partial Tree

The next test is to query the tree partially. If we want to select all rows starting from Blondie, we simply need to define another starting row based on the Id (2). So the code would look like:

C#
// Partial tree query
startRow = hierarchicalTable.AsEnumerable().Where(x => x.Field<int>("Id") == 2).FirstOrDefault();
var query2 = from person in hierarchicalTable.AsEnumerable().AsTree("Id", "ManagerId", startRow)
             select person;

System.Console.WriteLine();
System.Console.WriteLine("=============================================");
System.Console.WriteLine(string.Format("Tree for {0}", startRow["Name"]));
System.Console.WriteLine("=============================================");
foreach (DataTableTree.TreeItem item in query2) {
   System.Console.WriteLine(string.Format("   Level {0}: {1}{2}", 
                                          item.Level, 
                                          new string(' ', item.Level * 3), 
                                          item.Row["Name"]));
}
VB.NET
' Partial tree query
startRow = hierarchicalTable.AsEnumerable().Where(Function(x) _
x.Field(Of Integer)("Id") = 2).FirstOrDefault()
Dim query2 = From person In hierarchicalTable.AsEnumerable().AsTree("Id", "ManagerId", startRow)
             Select person

System.Console.WriteLine()
System.Console.WriteLine("=============================================")
System.Console.WriteLine(String.Format("Tree for {0}", startRow("Name")))
System.Console.WriteLine("=============================================")
For Each item As DataTableTree.TreeItem In query2
   System.Console.WriteLine(String.Format("   Level {0}: {1}{2}",
                                          item.Level,
                                          New String(" ", item.Level * 3),
                                          item.Row("Name")))
Next item

And the result would be:

=============================================
Tree for Blondie (Man with No Name)
=============================================
   Level 0: Blondie (Man with No Name)
   Level 1:    Mary Poppins
   Level 2:       Frodo Baggins
   Level 2:       Bilbo Baggins
   Level 2:       Smeagol
   Level 1:    Euphegenia Doubtfire

Note that the levels are now different since the 0 level is always the row where the query is started from.

Bottom-up Query

So far, we've made only top-down queries. The AsTree had a parameter which could be used to define the direction for the traversal. So this time, let's have a look at the relations starting from Smeagol. The code would look like:

C#
// Reverse tree query
startRow = hierarchicalTable.AsEnumerable().Where
(x => x.Field<string>("Name") == "Smeagol").FirstOrDefault();
var query3 = from person in hierarchicalTable.AsEnumerable().AsTree
("Id", "ManagerId", startRow, DataTableTree.TraverseDirection.FromChildToParent)
             select person;

System.Console.WriteLine();
System.Console.WriteLine("=============================================");
System.Console.WriteLine(string.Format("Reverse tree for {0}", startRow["Name"]));
System.Console.WriteLine("=============================================");
foreach (DataTableTree.TreeItem item in query3) {
   System.Console.WriteLine(string.Format("   Level {0}: {1}{2}", 
                                          item.Level, 
                                          new string(' ', item.Level * 3), 
                                          item.Row["Name"]));
}
VB.NET
' Reverse tree query
startRow = hierarchicalTable.AsEnumerable().Where(Function(x) _
x.Field(Of String)("Name") = "Smeagol").FirstOrDefault()
Dim query3 = From person In hierarchicalTable.AsEnumerable().AsTree_
("Id", "ManagerId", startRow, DataTableTree.TraverseDirection.FromChildToParent)
             Select person

System.Console.WriteLine()
System.Console.WriteLine("=============================================")
System.Console.WriteLine(String.Format("Reverse tree for {0}", startRow("Name")))
System.Console.WriteLine("=============================================")
For Each item As DataTableTree.TreeItem In query3
   System.Console.WriteLine(String.Format("   Level {0}: {1}{2}",
                                          item.Level,
                                          New String(" ", item.Level * 3),
                                          item.Row("Name")))
Next item

Now the result is:

=============================================
Reverse tree for Smeagol
=============================================
   Level 0: Smeagol
   Level 1:    Mary Poppins
   Level 2:       Blondie (Man with No Name)
   Level 3:          Edgar Allan Poe

Restrict the Results in the Query

The last test is to restrict the results in the LINQ query, just to make sure everything works as expected. For example, if we want to list all the assistants and their managers, the code could look like:

C#
// Restricting results, all assistants
startRow = hierarchicalTable.AsEnumerable().Where
(x => x.Field<int>("Id") == 1).FirstOrDefault();
var query4 = from person in hierarchicalTable.AsEnumerable().AsTree("Id", "ManagerId", startRow)
             where person.Row.Field<string>("Title").Contains("Assistant")
             select person;

System.Console.WriteLine();
System.Console.WriteLine("=============================================");
System.Console.WriteLine(string.Format("All assistants and their managers"));
System.Console.WriteLine("=============================================");
foreach (DataTableTree.TreeItem item in query4) {
   System.Console.WriteLine(string.Format("   Level {0}: {1}{2} (manager: {3})", 
                                          item.Level, 
                                          new string(' ', item.Level * 3), 
                                          item.Row["Name"], 
                                          item.ParentRow["Name"]));
}
VB.NET
' Restricting results, all assistants
startRow = hierarchicalTable.AsEnumerable().Where(Function(x) _
x.Field(Of Integer)("Id") = 1).FirstOrDefault()
Dim query4 = From person In hierarchicalTable.AsEnumerable().AsTree("Id", "ManagerId", startRow)
             Where person.Row.Field(Of String)("Title").Contains("Assistant")
             Select person

System.Console.WriteLine()
System.Console.WriteLine("=============================================")
System.Console.WriteLine(String.Format("All assistants and their managers"))
System.Console.WriteLine("=============================================")
For Each item As DataTableTree.TreeItem In query4
   System.Console.WriteLine(String.Format("   Level {0}: {1}{2} (manager: {3})",
                                          item.Level,
                                          New String(" ", item.Level * 3),
                                          item.Row("Name"),
                                          item.ParentRow("Name")))
Next item

And now the result would be:

=============================================
All assistants and their managers
=============================================
   Level 2:       Euphegenia Doubtfire (manager: Blondie (Man with No Name))
   Level 2:       Mini-Me (manager: Hermann Hesse)

Conclusions

Querying a hierarchical data table is actually very simple. However, the code provided doesn't cover all possible scenarios, so feel free to modify it to better suit your needs.

History

  • 5th October, 2015: Created

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)