Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / VB
Print

Query Hierarchical DataTable

4.93/5 (11 votes)
5 Oct 2015CPOL3 min read 22.1K   584  
This tip shows one way to query hierarchical data from a DataTable by using an AsTree() method.

Introduction

Every once in a while, you may need to query data from a DataTable which has a hierarchical structure. In other words, the DataTable contains a foreign key column which points to itself thus creating a self-referencing structure. A typical example is an organization chart where employees have a reference to the manager.

In order to iterate the hierarchical structure correctly, you need to have some kind of recursive function. For this, I decided to create a new method called AsTree which would return the selected portion from the tree.

Implementing AsTree()

Before going to the actual implementation, let’s have a look at the return values. The AsTree method returns a set of TreeItem objects. A tree item looks as follows:

/// <summary>
/// Class definition for a single tree item
/// </summary>
public class TreeItem {

   /// <summary>
   /// Zero based index for defining the level of the current item
   /// </summary>
   public int Level { get; set; }

   /// <summary>
   /// The data row for the item
   /// </summary>
   public DataRow Row { get; set; }

   /// <summary>
   /// The parent data row 
   /// </summary>
   public DataRow ParentRow { get; set; }
}

The Row property contains the actual row from the tree while the ParentRow contains its parent DataRow (if any). The Level property contains the depth of the row in the tree calculated from the starting point.

The AsTree extension method is very simple:

/// <summary>
/// Returns a HashSet containing all the elements in the path 
/// </summary>
/// <param name="source">Source for the data rows</param>
/// <param name="parentColumnName">Column name that acts as a parent key</param>
/// <param name="childColumnName">Column name that acts as a foreign key</param>
/// <param name="startRow">Data row to start building the tree from</param>
/// <param name="direction">Direction of traverse</param>
/// <returns>The hash set containing the tree for the parent row provided</returns>
public static HashSet<TreeItem> AsTree(this EnumerableRowCollection<DataRow> source, 
              string parentColumnName, 
              string childColumnName, 
              DataRow startRow, 
              TraverseDirection direction = TraverseDirection.FromParentToChild) {
   string actualChildColumn;
   string actualParentColumn;
   HashSet<TreeItem> resultSet = new HashSet<TreeItem>();

   // Add the start row to the set
   resultSet.Add(new TreeItem() {
      Level = 0,
      ParentRow = null,
      Row = startRow
   });

   // Decide the order of the comparison based on the traverse direction
   if (direction == TraverseDirection.FromParentToChild) {
      actualParentColumn = parentColumnName;
      actualChildColumn = childColumnName;
   } else {
      actualParentColumn = childColumnName;
      actualChildColumn = parentColumnName;
   }

   // Recursively fill the tree 
   return DataTableTree.FillTree(source, resultSet, actualParentColumn, actualChildColumn, 
                                 startRow, 1);
}

The method uses parameters as follows:

  • parentColumnName is the DataColumn name which acts as a parent key
  • childColumnName is the foreign key column referencing to the parent column
  • startRow is the initial row from which the tree is investigated
  • direction defines the traverse direction, either top-down or bottom-up. The default is top-down.

The AsTree method simply adds the first element into the HashSet and then, based on the direction, decides which column acts as a parent and which one is the child. The method has an assumption that the foreign key is created using a single column.

The actual iteration is implemented in the following FillTree method:

/// <summary>
/// Adds all immediate children for a single parent into te given hash set
/// </summary>
/// <param name="source">Source for the data rows</param>
/// <param name="resultSet">Result set to fill</param>
/// <param name="actualParentColumn">Column name that acts as a parent key</param>
/// <param name="actualChildColumn">Column name that acts as a foreign key</param>
/// <param name="startRow">Data row to start building the tree from</param>
/// <param name="level">Current level of the items</param>
/// <returns></returns>
private static HashSet<TreeItem> FillTree(EnumerableRowCollection<DataRow> source, 
               HashSet<TreeItem> resultSet, 
               string actualParentColumn, 
               string actualChildColumn, 
               System.Data.DataRow startRow, 
               int level) {
   // Build a query for immediate children
   var subItems = from item in source
                  where item.Field<object>(actualChildColumn) != null
                  && item.Field<object>(actualChildColumn) != System.DBNull.Value
                  && item.Field<object>(actualChildColumn).Equals
                  (startRow.Field<object>(actualParentColumn))
                  select new TreeItem() {
                     Level = level,
                     Row = item,
                     ParentRow = startRow
                  };

   // Loop through the items found and add to the result set
   foreach (TreeItem subItem in subItems) {
      resultSet.Add(subItem);

      // Recursively fill children of this sub item
      DataTableTree.FillTree(source, resultSet, actualParentColumn, actualChildColumn, 
                             subItem.Row, level + 1);
   }

   return resultSet;
}

This method searches for immediate children for a given DataRow and adds them into the HashSet. For each child found, the FillTree method is called recursively in order to add the children of the DataRow at hand. On each call, the level is incremented.

Usage Example

Now let’s have a few test runs. Imagine the following organization:

Image 1

In order to define the structure for the organization, the following data table is created:

// Define the table
hierarchicalTable.Columns.Add("Id", typeof(int));
hierarchicalTable.Columns.Add("Name", typeof(string));
hierarchicalTable.Columns.Add("Title", typeof(string));
hierarchicalTable.Columns.Add("ManagerId", typeof(int));
hierarchicalTable.Constraints.Add(new System.Data.UniqueConstraint(
                                     hierarchicalTable.Columns["Id"], true));
hierarchicalTable.Constraints.Add(new System.Data.ForeignKeyConstraint(
                                     hierarchicalTable.Columns["Id"], 
                                     hierarchicalTable.Columns["ManagerId"]));

Each row has an Id field acting as a primary key and a ManagerId field which defines the manager for the person. The constraints wouldn't be necessary to use but they ensure that the parent key is present and that it is unique.

Our test data is filled like this:

// Add the data
hierarchicalTable.Rows.Add(1, "Edgar Allan Poe", "CEO", null);
hierarchicalTable.Rows.Add(2, "Blondie (Man with No Name)", "VP Sales", 1);
hierarchicalTable.Rows.Add(3, "Mary Poppins", "Sales Director", 2);
hierarchicalTable.Rows.Add(4, "Frodo Baggins", "Regional Sales Manager", 3);
hierarchicalTable.Rows.Add(5, "Bilbo Baggins", "Regional Sales Manager", 3);
hierarchicalTable.Rows.Add(6, "Smeagol", "Collections", 3);
hierarchicalTable.Rows.Add(7, "Euphegenia Doubtfire", "Sales Assistant", 2);
hierarchicalTable.Rows.Add(8, "Hermann Hesse", "VP R&D", 1);
hierarchicalTable.Rows.Add(9, "Tom Sawyer", "Researcher", 8);
hierarchicalTable.Rows.Add(10, "Isaac Asimov", "Futurist", 8);
hierarchicalTable.Rows.Add(11, "Mini-Me", "R&D Assistant", 8);

Full Tree

Now in order to see the whole tree, let’s first run a query from the top without any restrictions.

// Full tree query
startRow = hierarchicalTable.AsEnumerable().Where
(x => x.Field<int>("Id") == 1).FirstOrDefault();
var query1 = from person in hierarchicalTable.AsEnumerable().AsTree("Id", "ManagerId", startRow)
             select person;

System.Console.WriteLine("=============================================");
System.Console.WriteLine(string.Format("Tree for {0}", startRow["Name"]));
System.Console.WriteLine("=============================================");
foreach (DataTableTree.TreeItem item in query1) {
   System.Console.WriteLine(string.Format("   Level {0}: {1}{2}", 
                                          item.Level, 
                                          new string(' ', item.Level * 3), 
                                          item.Row["Name"]));
}

First of all, the starting row is defined. I've selected the top-most row based on the id, 1. After that, the actual LINQ query is defined. In order to illustrate the results, each row found is printed to the console with small amount of formatting so the result would look like this:

=============================================
Tree for Edgar Allan Poe
=============================================
   Level 0: Edgar Allan Poe
   Level 1:    Blondie (Man with No Name)
   Level 2:       Mary Poppins
   Level 3:          Frodo Baggins
   Level 3:          Bilbo Baggins
   Level 3:          Smeagol
   Level 2:       Euphegenia Doubtfire
   Level 1:    Hermann Hesse
   Level 2:       Tom Sawyer
   Level 2:       Isaac Asimov
   Level 2:       Mini-Me

Partial Tree

The next test is to query the tree partially. If we want to select all rows starting from Blondie, we simply need to define another starting row based on the Id (2). So the code would look like:

// Partial tree query
startRow = hierarchicalTable.AsEnumerable().Where(x => x.Field<int>("Id") == 2).FirstOrDefault();
var query2 = from person in hierarchicalTable.AsEnumerable().AsTree("Id", "ManagerId", startRow)
             select person;

System.Console.WriteLine();
System.Console.WriteLine("=============================================");
System.Console.WriteLine(string.Format("Tree for {0}", startRow["Name"]));
System.Console.WriteLine("=============================================");
foreach (DataTableTree.TreeItem item in query2) {
   System.Console.WriteLine(string.Format("   Level {0}: {1}{2}", 
                                          item.Level, 
                                          new string(' ', item.Level * 3), 
                                          item.Row["Name"]));
}

And the result would be:

=============================================
Tree for Blondie (Man with No Name)
=============================================
   Level 0: Blondie (Man with No Name)
   Level 1:    Mary Poppins
   Level 2:       Frodo Baggins
   Level 2:       Bilbo Baggins
   Level 2:       Smeagol
   Level 1:    Euphegenia Doubtfire

Note that the levels are now different since the 0 level is always the row where the query is started from.

Bottom-up Query

So far, we've made only top-down queries. The AsTree had a parameter which could be used to define the direction for the traversal. So this time, let's have a look at the relations starting from Smeagol. The code would look like:

// Reverse tree query
startRow = hierarchicalTable.AsEnumerable().Where
(x => x.Field<string>("Name") == "Smeagol").FirstOrDefault();
var query3 = from person in hierarchicalTable.AsEnumerable().AsTree
("Id", "ManagerId", startRow, DataTableTree.TraverseDirection.FromChildToParent)
             select person;

System.Console.WriteLine();
System.Console.WriteLine("=============================================");
System.Console.WriteLine(string.Format("Reverse tree for {0}", startRow["Name"]));
System.Console.WriteLine("=============================================");
foreach (DataTableTree.TreeItem item in query3) {
   System.Console.WriteLine(string.Format("   Level {0}: {1}{2}", 
                                          item.Level, 
                                          new string(' ', item.Level * 3), 
                                          item.Row["Name"]));
}

Now the result is:

=============================================
Reverse tree for Smeagol
=============================================
   Level 0: Smeagol
   Level 1:    Mary Poppins
   Level 2:       Blondie (Man with No Name)
   Level 3:          Edgar Allan Poe

Restrict the Results in the Query

The last test is to restrict the results in the LINQ query, just to make sure everything works as expected. For example, if we want to list all the assistants and their managers, the code could look like:

// Restricting results, all assistants
startRow = hierarchicalTable.AsEnumerable().Where
(x => x.Field<int>("Id") == 1).FirstOrDefault();
var query4 = from person in hierarchicalTable.AsEnumerable().AsTree("Id", "ManagerId", startRow)
             where person.Row.Field<string>("Title").Contains("Assistant")
             select person;

System.Console.WriteLine();
System.Console.WriteLine("=============================================");
System.Console.WriteLine(string.Format("All assistants and their managers"));
System.Console.WriteLine("=============================================");
foreach (DataTableTree.TreeItem item in query4) {
   System.Console.WriteLine(string.Format("   Level {0}: {1}{2} (manager: {3})", 
                                          item.Level, 
                                          new string(' ', item.Level * 3), 
                                          item.Row["Name"], 
                                          item.ParentRow["Name"]));
}

And now the result would be:

=============================================
All assistants and their managers
=============================================
   Level 2:       Euphegenia Doubtfire (manager: Blondie (Man with No Name))
   Level 2:       Mini-Me (manager: Hermann Hesse)

Conclusions

Querying a hierarchical data table is actually very simple. However, the code provided doesn't cover all possible scenarios, so feel free to modify it to better suit your needs.

History

  • 5th October, 2015: Created

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)