Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C#

Linq Group By over multiple keys/columns on an in memory data-table

3.10/5 (5 votes)
5 Apr 2008CPOL2 min read 1  
Shows one of the many ways to call linq group by over multiple columns on memory objects

Introduction

The code snippet in this article intendes to explain

  • How to invoke Linq queries on DataRowCollection
  • How to use group by clause using multiple dynamic columns/keys over DataRowCollection in memory

Using the code

1 - How to invoke LINQ queries on DataRowCollection

.net framework by default does not contains extension methods that supports invoking LINQ Queries over DataRowCollection thus datatable/dataset by design does not support LINQ queries out of the box (atleast in .net 3.5).

As LINQ queries are nothing but an extension we can over come this limitation by writing a wrapper of our own that could enumerate over DataRowCollection thus allowing us to invoke LINQ Queries

Class EnumerableDataRows (see code below) allows us to do the same, the constructor takes the IEnumerable DataRowCollection and the enumerator basically returns the data row one at a time on demand

C#
class EnumerableDataRows<T> : IEnumerable<T>, Enumerable
{
    IEnumerable dataRows;
    EnumerableDataRowList(IEnumerable items)
    {
        dataRows = items;
    }
    IEnumerator<T> IEnumerable<T>.GetEnumerator()
    {
       foreach(T dataRow in dataRows)
       {
           yield return dataRow;
       }
    }
    IEnumerator IEnumerable.GetEnumerator()
    {
        IEnumerable<T> iEnumerable = this;
        return iEnumerable.GetEnumerator();
    }
}

The following is a sample data table containing four columns namely Name, Age, Height and Diet

static DataTable GetTable
{
    get
    {
         DataTable dataTable = new DataTable("MyTable");
         dataTable.Columns.Add("Name");
         dataTable.Columns.Add("Age");
         dataTable.Columns.Add("Height");
         dataTable.Columns.Add("Diet");
         dataTable.Rows.Add("Bob", "28", "170", "Vegan");
         dataTable.Rows.Add("Michael", "28", "210", "Vegan");
         dataTable.Rows.Add("James", "28", "190", "NonVegan");
         dataTable.Rows.Add("George", "28", "200", "NonVegan");
         return dataTable;
    }
}

With the EnumerableDataRows class we can make the table's DataRowCollection enumerable thus enabling LINQ

EnumerableDataRowList<DataRow> enumerableRowCollection = new EnumerableDataRowList<DataRow>(Table.Rows); 

This completes our first objective, we can now invoke LINQ Queries on enumerableRowCollection

var groupedRows = from row in enumerableRowCollection group row by row["Age"]; 

2 - How to use group by clause using multiple dynamic columns/keys over DataRowCollection in memory

LINQ by default does not support grouping over multiple columns for in-memory objects (datatable in this example), as we can do in SQL. We can achieve the same in multiple ways such as writing multiple group by clause recursively for the various keys or by creating an in memory buckets to group programmatically. In this topic I am going to explain the way I feel is a better idea in terms of implementation or comprehensibility (I'd love to hear your comments if you think this a bad way etc).

The way I have implemented multiple column gouping using LINQ is by string concatenation i.e. keys are devised based on concatenation result

Logic: if we are grouping on age and diet columns, then concatenate both age and diet value(s) as one string and group based on the new string. The following function does exactly the same, it basically iterates over the dynamic list of columns, retrieve the column values and returns the concatenated string

public static String GroupData(DataRow dataRow)
{
    // This could be user defined dynamic columns
    String[] columnNames = new[] { "Age", "Diet" };
    
    stringBuilder.Remove(0, stringBuilder.Length);
    foreach (String column in columnNames)
    {
        stringBuilder.Append(dataRow[column].ToString());
    }
    return stringBuilder.ToString();
}

With our first objective achieved we can now group on multiple columns using the above logic

Func<DataRow, String> groupingFunction = GroupData;
var groupedDataRow = enumerableRowCollection.GroupBy(groupingFunction);

On execution the datatable is grouped something like this:

Grouped by -> 28Vegan
Items -> Bob
Items -> Michael
----
Grouped by -> 28NonVegan
Items -> James
Items -> George
----

Cool ... we are done with the two objectives above.

The following is the entire code snippet for your reference ... Enjoy and have fun!!!

using System;
using System.Text;
using System.Linq;
using System.Data;
using System.Collections.Generic;
using System.Collections;
using System.Linq.Expressions;
using System.Xml.Linq;
class EnumerableDataRowList<T> : IEnumerable<T>, IEnumerable
{
    IEnumerable dataRows;
    internal EnumerableDataRowList(IEnumerable items)
    {
        dataRows = items;
    }
    IEnumerator<T> IEnumerable<T>.GetEnumerator()
    {
        foreach (T dataRow in dataRows)
            yield return dataRow;
    }
    IEnumerator IEnumerable.GetEnumerator()
    {
        IEnumerable<T> iEnumerable = this;
        return iEnumerable.GetEnumerator();
    }
}
public class mainclass
{
    static StringBuilder stringBuilder = new StringBuilder();
    public static String GroupData(DataRow dataRow)
    {
        String[] columnNames = new[] { "Age", "Diet" };
        stringBuilder.Remove(0, stringBuilder.Length);
        foreach (String column in columnNames)
        {
            stringBuilder.Append(dataRow[column].ToString());
        }
        return stringBuilder.ToString();
    }
    static DataTable Table
    {
        get
        {
            DataTable dataTable = new DataTable("MyTable");
            dataTable.Columns.Add("Name");
            dataTable.Columns.Add("Age");
            dataTable.Columns.Add("Height");
            dataTable.Columns.Add("Diet");
            dataTable.Rows.Add("Bob", "28", "170", "Vegan");
            dataTable.Rows.Add("Michael", "28", "210", "Vegan");
            dataTable.Rows.Add("James", "28", "190", "NonVegan");
            dataTable.Rows.Add("George", "28", "200", "NonVegan");
            return dataTable;
        }
    }
    public static void Main()
    {
        EnumerableDataRowList<DataRow> enumerableRowCollection = new EnumerableDataRowList<DataRow>(Table.Rows);
        
        Func<DataRow, String> groupingFunction = GroupData;
        var groupedDataRow = enumerableRowCollection.GroupBy(groupingFunction);
        foreach (var keys in groupedDataRow)
        {
            Console.WriteLine(String.Format("Grouped by -> {0}",keys.Key));
            foreach (var item in keys)
            {
                Console.WriteLine(String.Format(" Item -> {0}", item["Name"]));
            }
            Console.WriteLine("----");
        }
        
        Console.ReadKey();
    }
}

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)