An Algorithm For Grouping Data on One or More Fields Using a DataReader In .NET

jfernand

4.00/5 (5 votes)

7 Jan 2010CPOL3 min read

12.1K

Introduction...

Introduction

This article explains how to group data as it is retrieved by a datareader. The point is to group and summarize data based on one or more attributes. Sounds simple but is tricky!

Background

The real world problem I faced which led me to this solution was that I was getting potentially a large number of records from a datareader, and I wanted to be able to group these records when their ID fields were in sequence, or to be able to group the records by any other criteria. Each record would then be used to instantiate a struct which would then be added to a collection of like objects/structs. Again, the ID’s of these structs might possibly be in sequence (a pre condition is that the data is sorted on this field, and then by any other relevant fields on which grouping is performed), but then again the ID fields may not be in sequence (only sorted). The collection of these structs is what ultimately gets bound to a datagrid for display to the user. When the ID’s were in sequence, I did not want to list each one separately, instead I wanted a summary, something to the effect like “objects 1 to 10”, “object 11”, “objects 12 to 20” and so forth. This is saying that I may have n objects in sequence but I do not want to display all n objects, rather 1 object whose ID field (has to be a string for display) summarizes the range involved.

Using the Code

This approach is appropriate when we do not wish to rely on any particular data source, such as a SQL database and/or we wish to restrict our data source operations to Read, Write, Change, or Delete (perhaps for reasons of keeping "business logic" away from our data access). For whatever reason, when one does not wish to or cannot use the SQL grouping functionality then the approach taken here will be appropriate. I chose to share this as I myself, when confronted with this task, tried to search using the major search engines and could not find something similiar.

In addition, we use here a datareader as opposed to a datatable since a datatable will be 1) populated by a datareader behind the scenes anyway, and so then using a datatable is not necessary, 2) since potentially a large number of records may be returned, performance can be affected if the records were to be iterated through more than once and 3) My supervisor preferred the use of a datareader (remember this is from a real world situation!).

You can copy and paste this code wherever you need to, but note you will need to declare the variables and substitute your own objects in lieu of the "SomeObject" as listed in the code below. Change the variable names to suit your own needs/naming conventions:

             cn.Open() 'open your cnxn object
             tr = cn.BeginTransaction 'start the xaction
             dr = GetDataReader() 'get your datareader reference
 
             If dr.Read() Then 'Get the first record upon which to base the starting obj.Id Number
                 'Create 1st obj (factory method)
                 startObj = SomeObject.CreateObject(dr.Item(1), dr.Item(2))
                 'look ahead and if a run is found, keep track of it using ct (as integer)
                 While dr.Read() 
                     'Create second obj, now we can
                     endObj = SomeObject.CreateObject(dr.Item(1), dr.Item(2))  
                     ' compare the two!
                     ' a run is found. Important to pass ct as a parameter
                     If IsInSequence(startObj, endObj, ct) Then 
                         ct += 1 ' count how far this run goes
                     Else
                         'there is either no run or the run was ended, which is why we 
                         'pass the var ct in this fxn call
                         AddRecord(startObj, ct) 
                         'the important part is to start over in our search for runs
                         startObj = endObj 
                         ct = 0 'part of starting over
                     End If
                 End While
                 AddRecord(startObj, ct) 'Add the last object
 
                 dr.Close()
                 tr.Commit()
             End If

     Private Function IsInSequence(ByRef startObj As SomeObject,
         ByRef endObj As SomeObject, ByVal runCount As Integer) As Boolean
         Dim obj(2) As Integer 'Object ID Numbers as integers
 
         If (startObj.Fieldx = endObj.Fieldx) = False Then Return False
 
         obj(0) = Integer.parse(startObj.ID)
         obj(1) = Integer.parse (endObj.ID)
         'we need to use runCount, even if it is 0, since the startObj.ID
         'value may differ from endObj.ID by more 'than 1, but by runCount + 1
         Return (obj (0) + 1 + runCount) = obj (1)  
     End Function
 Private Sub AddRecord(ByRef myObj As SomeObject, ByVal runCount As Integer)
         If runCount = 0 Then
             InnerList.Add(myObj)
             Exit Sub
         End If
         'if we even get here we know there was a run so we need to summarize the data
         Dim summarized As String 'to summarize the data
         'The actual obj Number at the end of the 'run' we know exists as runCount > 0
         Dim nObj As Integer 
         Const sep As String = "..."
 
         'Convert to int and add runCount to determine the last                
         ' number in sequence
         nObj = Integer.Parse(myObj.Id) + runCount 
         summarized = myObj.Id + sep + nCase.ToString 'summarize the run
         myObj.Id = summarized ‘myObj.Id is a string field
         InnerList.Add(myObj)
 
     End Sub
'Posted by Jeff Fernandez

The point of the main algorithm (it is not inside any named function here) is to fetch first a starting record, and then continue fetching one at a time from the datasource using the datareader. Once the second record has been fetched, interrogate the two records on the field upon which grouping is to be performed. The comparison is perfromed inside the function named "IsInSequence" (which can be renamed as necessary) and basically determines if the two records are in sequence, or if needed, in the same "grouping" as determined by your own criteria in which case you would have to tweak this function according to your needs. If the records are in sequence, we simply increment a count variable, here named ct, by one and proceed until there is no longer a sequence or in other words a record has been fetched which does not fit in the same grouping as the previous record. Once x number of records have been found to be in sequence (for 1 >= x <= n (for n= total number of records)), they are recorded or processed in the method "AddRecord". For anyone using this code this method can be altered however one likes.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)