Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / .NET

Selecting Distinct Records Using LINQ

0.00/5 (No votes)
15 Oct 2012CPOL3 min read 29.4K  
How to get dictinct records from a datatable in .NET.

Introduction

Back in the days I came across a situation where I had to retrieve some data into a dataset table, then from that table I had to filter some records according to the user entry. For the selected record I had to find the history from the same table and port the details into Microsoft Excel and format it for the end user. I am sure most of you who will read this will say, yeah right big deal, just write a Stored Procedure and get the work done. Well, a Stored Procedure can only be written if you have access to the database. I was given select only on three tables and that was it, no more or less access to the DB. I did this class sometime back and have found that I use this file over and over in many projects that I have done, though not too fancy but may save someone's time on redesigning the wheel. I understand that my inline comments are not up to par and I do understand if demerit. The reason for doing this class is to avoid the round trips to the server and letting the server use its resources for important work.

Background

You may simply import this file into your solution and start using it as it is. I must confess that I borrowed the idea from an article posted on Microsoft website and modified it to some complexity. Speaking of complexity I am sure I must have bored you to almost death by now so without further due lets get into business. 

Using the code 

This class, on initialization takes a parameter, the parameter indicates the table on which the distinct method will apply, you may use one of the six overloaded functions to do the distinct selection on the table supplied during initialization. All the functions will return a datarow array.  Let dissection the code:

Initializing the class:

VB
Private m_SourceTable As DataTable

''' <summary>Initiate the distinct selector class</summary> 
''' <param name="MyTable">The table on which to perform the distinct selection</param> 
Public Sub New(ByVal MyTable As DataTable)
    m_SourceTable = MyTable
End Sub

There is a local private variable that stores the table on which the distinct is to be applied.

Next is selecting distinct rows, the rows could be selected distinct either on all the fields in the table, on selected rows (equivalent to where clause in SQL statement) or on select rows and (equivalent to where clause in SQL statement)  and defined fields (columns). For each of the three different distinct methods you could also apply the row state filters as well e.g. you may only apply distinct on all rows that have city as Suva and row stated as added. The code below show the content of the file.

Entire table:

VB
''' <summary>Selects distinct combination of columns</summary> 
Public Function SelectDistinct() As DataRow()
    Dim dt As New DataTable(m_SourceTable.TableName)
    Dim OrderString As String = ""

    ' Create the table structure, get the field combination to distinct an the order
    For Each FieldName As DataColumn In m_SourceTable.Columns
        dt.Columns.Add(FieldName.ColumnName, FieldName.DataType)
        If OrderString.Length = 0 Then
            OrderString = FieldName.ColumnName
        Else
            OrderString = OrderString + ", " + FieldName.ColumnName
        End If
    Next
    ' Define the matrix for the values to compare for distinct occurrance
    Dim LastValues(m_SourceTable.Columns.Count - 1) As Object
    ' Initialize the last values to nothing
    For LastValueCounter As Integer = 0 To LastValues.Length - 1
        LastValues(LastValueCounter) = Nothing
    Next

    For Each dr As DataRow In m_SourceTable.Select("", OrderString)
        If ArrayNothing(LastValues) OrElse Not (ColumnsEqual(LastValues, dr, OrderString.Split(", ".ToCharArray()))) Then
            Dim fCounter As Integer = 0
            Dim MyNewRow As DataRow = dt.NewRow()
            For Each FieldName As String In OrderString.Split(", ".ToCharArray())
                LastValues(fCounter) = dr(FieldName)
                MyNewRow(FieldName) = dr(FieldName)
                fCounter += 1
            Next

            dt.Rows.Add(MyNewRow)
        End If
    Next

    Return dt.Select()
End Function

''' <summary>Selects distinct combination of columns for the stated row version</summary> 
''' <param name="recordStates">The row version to perform distinct selection</param> 
Public Function SelectDistinct(ByVal recordStates As DataViewRowState) As DataRow()
    Dim dt As New DataTable(m_SourceTable.TableName)
    Dim OrderString As String = ""

    ' Create the table structure, get the field combination to distinct an the order
    For Each FieldName As DataColumn In m_SourceTable.Columns
        dt.Columns.Add(FieldName.ColumnName, FieldName.DataType)
        If OrderString.Length = 0 Then
            OrderString = FieldName.ColumnName
        Else
            OrderString = OrderString + ", " + FieldName.ColumnName
        End If
    Next
    ' Define the matrix for the values to compare for distinct occurrance
    Dim LastValues(m_SourceTable.Columns.Count - 1) As Object
    ' Initialize the last values to nothing
    For LastValueCounter As Integer = 0 To LastValues.Length - 1
        LastValues(LastValueCounter) = Nothing
    Next

    For Each dr As DataRow In m_SourceTable.Select("", OrderString, recordStates)
        If ArrayNothing(LastValues) OrElse Not (ColumnsEqual(LastValues, dr, OrderString.Split(", ".ToCharArray()))) Then
            Dim fCounter As Integer = 0
            Dim MyNewRow As DataRow = dt.NewRow()
            For Each FieldName As String In OrderString.Split(", ".ToCharArray())
                LastValues(fCounter) = dr(FieldName)
                MyNewRow(FieldName) = dr(FieldName)
                fCounter += 1
            Next

            dt.Rows.Add(MyNewRow)
        End If
    Next

    Return dt.Select()
End Function

Selected Rows (equivalent to where clause in SQL statement): 

VB
''' <summary>Selects distinct combination of columns for selected rows</summary> 
''' <param name="FilterBy">The rows to selected distinct from, restricting the rows for compare</param> 
Public Function SelectDistinct(ByVal FilterBy As String) As DataRow()
    Dim dt As New DataTable(m_SourceTable.TableName)
    Dim OrderString As String = ""

    ' Create the table structure, get the field combination to distinct an the order
    For Each FieldName As DataColumn In m_SourceTable.Columns
        dt.Columns.Add(FieldName.ColumnName, FieldName.DataType)
        If OrderString.Length = 0 Then
            OrderString = FieldName.ColumnName
        Else
            OrderString = OrderString + ", " + FieldName.ColumnName
        End If
    Next
    ' Define the matrix for the values to compare for distinct occurrance
    Dim LastValues(m_SourceTable.Columns.Count - 1) As Object
    ' Initialize the last values to nothing
    For LastValueCounter As Integer = 0 To LastValues.Length - 1
        LastValues(LastValueCounter) = Nothing
    Next

    For Each dr As DataRow In m_SourceTable.Select(FilterBy, OrderString)
        If ArrayNothing(LastValues) OrElse Not (ColumnsEqual(LastValues, dr, _
                     OrderString.Split(", ".ToCharArray()))) Then
            Dim fCounter As Integer = 0
            Dim MyNewRow As DataRow = dt.NewRow()
            For Each FieldName As String In OrderString.Split(", ".ToCharArray())
                LastValues(fCounter) = dr(FieldName)
                MyNewRow(FieldName) = dr(FieldName)
                fCounter += 1
            Next

            dt.Rows.Add(MyNewRow)
        End If
    Next

    Return dt.Select()
End Function

''' <summary>Selects distinct combination of columns for selected rows</summary> 
''' <param name="FilterBy">The rows to selected distinct from, restricting the rows for compare</param> 
''' <param name="recordStates">The row version to perform distinct selection</param> 
Public Function SelectDistinct(ByVal FilterBy As String, ByVal recordStates As DataViewRowState) As DataRow()
    Dim dt As New DataTable(m_SourceTable.TableName)
    Dim OrderString As String = ""

    ' Create the table structure, get the field combination to distinct an the order
    For Each FieldName As DataColumn In m_SourceTable.Columns
        dt.Columns.Add(FieldName.ColumnName, FieldName.DataType)
        If OrderString.Length = 0 Then
            OrderString = FieldName.ColumnName
        Else
            OrderString = OrderString + ", " + FieldName.ColumnName
        End If
    Next
    ' Define the matrix for the values to compare for distinct occurrance
    Dim LastValues(m_SourceTable.Columns.Count - 1) As Object
    ' Initialize the last values to nothing
    For LastValueCounter As Integer = 0 To LastValues.Length - 1
        LastValues(LastValueCounter) = Nothing
    Next

    For Each dr As DataRow In m_SourceTable.Select(FilterBy, OrderString, recordStates)
        If ArrayNothing(LastValues) OrElse Not (ColumnsEqual(LastValues, dr, OrderString.Split(", ".ToCharArray()))) Then
            Dim fCounter As Integer = 0
            Dim MyNewRow As DataRow = dt.NewRow()
            For Each FieldName As String In OrderString.Split(", ".ToCharArray())
                LastValues(fCounter) = dr(FieldName)
                MyNewRow(FieldName) = dr(FieldName)
                fCounter += 1
            Next

            dt.Rows.Add(MyNewRow)
        End If
    Next

    Return dt.Select()
End Function 

Select Rows and (equivalent to where clause in SQL statement)  and defined fields (columns):

VB
''' <summary>Selects distinct combination of selected columns for selected rows</summary> 
''' <param name="FilterBy">The rows to selected distinct from, restricting the rows for compare</param> 
''' <param name="FieldNames">The columns to perform distinct
'''    selection from, the datarows will be ordered by this sequence</param> 
Public Function SelectDistinct(ByVal FilterBy As String, ByVal FieldNames() As String) As DataRow()
    Dim dt As New DataTable(m_SourceTable.TableName)
    Dim OrderString As String = ""
    ' Create the table structure, get the field combination to distinct an the order
    For Each FieldName As String In FieldNames
        dt.Columns.Add(FieldName, m_SourceTable.Columns(FieldName).DataType)
        If OrderString.Length = 0 Then
            OrderString = FieldName
        Else
            OrderString = OrderString + ", " + FieldName
        End If
    Next
    ' Define the matrix for the values to compare for distinct occurrance
    Dim LastValues(FieldNames.Length - 1) As Object
    ' Initialize the last values to nothing
    For LastValueCounter As Integer = 0 To LastValues.Length - 1
        LastValues(LastValueCounter) = Nothing
    Next

    For Each dr As DataRow In m_SourceTable.Select(FilterBy, OrderString)
        If ArrayNothing(LastValues) OrElse Not (ColumnsEqual(LastValues, dr, FieldNames)) Then
            Dim fCounter As Integer = 0
            Dim MyNewRow As DataRow = dt.NewRow()
            For Each FieldName As String In FieldNames
                LastValues(fCounter) = dr(FieldName)
                MyNewRow(FieldName) = dr(FieldName)
                fCounter += 1
            Next

            dt.Rows.Add(MyNewRow)
        End If
    Next

    Return dt.Select()
End Function

''' <summary>Selects distinct combination of selected columns for selected rows</summary> 
''' <param name="FilterBy">The rows to selected distinct from, restricting the rows for compare</param> 
''' <param name="FieldNames">The columns to perform distinct selection from, the datarows will be ordered by this sequence</param> 
''' <param name="recordStates">The row version to perform distinct selection</param> 
Public Function SelectDistinct(ByVal FilterBy As String, ByVal FieldNames() As String, ByVal recordStates As DataViewRowState) As DataRow()
    Dim dt As New DataTable(m_SourceTable.TableName)
    Dim OrderString As String = ""
    ' Create the table structure, get the field combination to distinct an the order
    For Each FieldName As String In FieldNames
        dt.Columns.Add(FieldName, m_SourceTable.Columns(FieldName).DataType)
        If OrderString.Length = 0 Then
            OrderString = FieldName
        Else
            OrderString = OrderString + ", " + FieldName
        End If
    Next
    ' Define the matrix for the values to compare for distinct occurrance
    Dim LastValues(FieldNames.Length - 1) As Object
    ' Initialize the last values to nothing
    For LastValueCounter As Integer = 0 To LastValues.Length - 1
        LastValues(LastValueCounter) = Nothing
    Next

    For Each dr As DataRow In m_SourceTable.Select(FilterBy, OrderString, recordStates)
        If ArrayNothing(LastValues) OrElse Not (ColumnsEqual(LastValues, dr, FieldNames)) Then
            Dim fCounter As Integer = 0
            Dim MyNewRow As DataRow = dt.NewRow()
            For Each FieldName As String In FieldNames
                LastValues(fCounter) = dr(FieldName)
                MyNewRow(FieldName) = dr(FieldName)
                fCounter += 1
            Next

            dt.Rows.Add(MyNewRow)
        End If
    Next

    Return dt.Select()
End Function

There are two functions that do some heavy lifting to do the distinct on the table.  The functions are:

ArrayNothing:

VB
Private Function ArrayNothing(ByVal MyArray() As Object) As Boolean
    Dim Is_AllNothing As Boolean = True

    ' Check if all the values in this array is nothing
    For Counter As Integer = 0 To MyArray.Length - 1
        If Not MyArray(Counter) Is Nothing Then
            Is_AllNothing = False
            Exit For
        End If
    Next

    Return Is_AllNothing
End Function

The above function checks if there are any previously stored row values, if all the values in the array is Nothing then this indicates that this is the first row. The first row is always distinct.

ColumnsEqual

VB
Private Function ColumnsEqual(ByVal MyRowRelative() As Object, _
           ByVal MyRow As DataRow, ByVal FieldNames() As String) As Boolean
    ' Compares the field values to see if they are equal. Also compares DBNULL.Value.
        ' Note: If your DataTable contains object fields, then you must extend this
        ' function to handle them in a meaningful way if you intend to group on them.

        Dim Is_Same As Boolean = True

        For Counter As Integer = 0 To FieldNames.Length - 1
            If MyRowRelative(Counter) Is DBNull.Value AndAlso MyRow(FieldNames(Counter)) Is DBNull.Value Then
                '  both are DBNull.Value, so do nothing
            ElseIf MyRowRelative(Counter) Is DBNull.Value AndAlso MyRow(Counter) Is DBNull.Value Then
                '  only one is DBNull.Value
                Is_Same = False
            ElseIf MyRowRelative(Counter).Equals(MyRow(FieldNames(Counter))) Then
                ' value type standard comparison, so do nothing
            Else
                'the tow values are no equal
                Is_Same = False
            End If
        Next

    Return Is_Same
End Function 

The above function looks through each array and if the fields in the array correspond, to check if the two are mismatch.

Using the code in your application:  

VB
Dim DistinctProvider As clsDistinctLinq = New clsDistinctLinq(<tablename>)
'The line below will select distinct rows for the entire columns in the table
Dim MyDistinctRows() As DataRow = DistinctProvider.SelectDistinct()

I hope that you will find this article helpful and keep an eye on if any bugs.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)