Introduction
This article explains how to group data as it is retrieved by a datareader. The point is to group and summarize data based on one or more attributes. Sounds simple but is tricky!
Background
The real world problem I faced which led me to this solution was that I was getting potentially a large number of records from a datareader, and I wanted to be able to group these records when their ID fields were in sequence, or to be able to group the records by any other criteria. Each record would then be used to instantiate a struct which would then be added to a collection of like objects/structs. Again, the ID’s of these structs might possibly be in sequence (a pre condition is that the data is sorted on this field, and then by any other relevant fields on which grouping is performed), but then again the ID fields may not be in sequence (only sorted). The collection of these structs is what ultimately gets bound to a datagrid for display to the user. When the ID’s were in sequence, I did not want to list each one separately, instead I wanted a summary, something to the effect like “objects 1 to 10”, “object 11”, “objects 12 to 20” and so forth. This is saying that I may have n objects in sequence but I do not want to display all n objects, rather 1 object whose ID field (has to be a string for display) summarizes the range involved.
Using the Code
This approach is appropriate when we do not wish to rely on any particular data source, such as a SQL database and/or we wish to restrict our data source operations to Read, Write, Change, or Delete (perhaps for reasons of keeping "business logic" away from our data access). For whatever reason, when one does not wish to or cannot use the SQL grouping functionality then the approach taken here will be appropriate. I chose to share this as I myself, when confronted with this task, tried to search using the major search engines and could not find something similiar.
In addition, we use here a datareader as opposed to a datatable since a datatable will be 1) populated by a datareader behind the scenes anyway, and so then using a datatable is not necessary, 2) since potentially a large number of records may be returned, performance can be affected if the records were to be iterated through more than once and 3) My supervisor preferred the use of a datareader (remember this is from a real world situation!).
You can copy and paste this code wherever you need to, but note you will need to declare the variables and substitute your own objects in lieu of the "SomeObject" as listed in the code below. Change the variable names to suit your own needs/naming conventions:
cn.Open()
tr = cn.BeginTransaction
dr = GetDataReader()
If dr.Read() Then
startObj = SomeObject.CreateObject(dr.Item(1), dr.Item(2))
While dr.Read()
endObj = SomeObject.CreateObject(dr.Item(1), dr.Item(2))
If IsInSequence(startObj, endObj, ct) Then
ct += 1
Else
AddRecord(startObj, ct)
startObj = endObj
ct = 0
End If
End While
AddRecord(startObj, ct)
dr.Close()
tr.Commit()
End If
Private Function IsInSequence(ByRef startObj As SomeObject,
ByRef endObj As SomeObject, ByVal runCount As Integer) As Boolean
Dim obj(2) As Integer
If (startObj.Fieldx = endObj.Fieldx) = False Then Return False
obj(0) = Integer.parse(startObj.ID)
obj(1) = Integer.parse (endObj.ID)
Return (obj (0) + 1 + runCount) = obj (1)
End Function
Private Sub AddRecord(ByRef myObj As SomeObject, ByVal runCount As Integer)
If runCount = 0 Then
InnerList.Add(myObj)
Exit Sub
End If
Dim summarized As String
Dim nObj As Integer
Const sep As String = "..."
nObj = Integer.Parse(myObj.Id) + runCount
summarized = myObj.Id + sep + nCase.ToString
myObj.Id = summarized ‘myObj.Id is a string field
InnerList.Add(myObj)
End Sub
The point of the main algorithm (it is not inside any named function here) is to fetch first a starting record, and then continue fetching one at a time from the datasource using the datareader. Once the second record has been fetched, interrogate the two records on the field upon which grouping is to be performed. The comparison is perfromed inside the function named "IsInSequence" (which can be renamed as necessary) and basically determines if the two records are in sequence, or if needed, in the same "grouping" as determined by your own criteria in which case you would have to tweak this function according to your needs. If the records are in sequence, we simply increment a count variable, here named ct, by one and proceed until there is no longer a sequence or in other words a record has been fetched which does not fit in the same grouping as the previous record. Once x number of records have been found to be in sequence (for 1 >= x <= n (for n= total number of records)), they are recorded or processed in the method "AddRecord". For anyone using this code this method can be altered however one likes.