Introduction
This article will show how you can drastically increase the performance of LINQ using simple to intermediate C#.
Background
I was recently given a project in which we had to track personal demographic information and, whenever it was modified, synchronize the changes to another database. The first gotcha was that triggers were not allowed, because the monitored database was not ours; we only had query access to it. The second gotcha was that the two databases could not access each other.
The idea was to create a conduit in C# that hashes the demographic information for each person record. When a difference between the last sent hash and the current demographic hash is discovered, the changes are sent to a process that synchronizes the data in the second database.
Essentially, there are two sets of data: the new/modified set, and the old set (which is the data in the second database). If there is a record in the new/modified set that's not in the old set, it was created. If there is a record in the old set that's not in the new/modified set, it was deleted. Records that appear in both sets for the same person but with different hashes are the modified records.
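The three cases can be illustrated with a small sketch. The toy data (person ID mapped to a hash string) and the class name SetDiffSketch are made up for illustration and are not part of the original project; the point is only to show how created, deleted, and modified records fall out of the two sets.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetDiffSketch
{
    // Hypothetical toy data: person ID -> demographic hash.
    public static readonly Dictionary<int, string> OldSet = new Dictionary<int, string>
    {
        { 1, "aaa" }, { 2, "bbb" }, { 3, "ccc" }
    };

    public static readonly Dictionary<int, string> NewSet = new Dictionary<int, string>
    {
        { 2, "bbb" }, { 3, "zzz" }, { 4, "ddd" }
    };

    // IDs present in the new set but not in the old set.
    public static List<int> Created()
    {
        return NewSet.Keys.Except(OldSet.Keys).OrderBy(id => id).ToList();
    }

    // IDs present in the old set but not in the new set.
    public static List<int> Deleted()
    {
        return OldSet.Keys.Except(NewSet.Keys).OrderBy(id => id).ToList();
    }

    // IDs present in both sets whose hashes differ.
    public static List<int> Modified()
    {
        return NewSet
            .Where(kv => OldSet.TryGetValue(kv.Key, out var oldHash) && oldHash != kv.Value)
            .Select(kv => kv.Key)
            .OrderBy(id => id)
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine("Created:  " + string.Join(",", Created()));  // Created:  4
        Console.WriteLine("Deleted:  " + string.Join(",", Deleted()));  // Deleted:  1
        Console.WriteLine("Modified: " + string.Join(",", Modified())); // Modified: 3
    }
}
```

Person 2 is unchanged, so it appears in none of the three results.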
Using LINQ queries and iterations with the var keyword for the sets, it took approximately 30 minutes to transfer 18,000 records. For our scenario, this time was acceptable to the customer, but it was not acceptable to me. So I dug a little, and with just a few changes I was able to cut the time for an 18,000-record transfer from 30 minutes to approximately one minute.
The examples in this article are based on my solution. However, it should be easy to apply my solution to any performance issue you may have.
Using the Code
In .NET 3.5, a new generic collection exists called HashSet<T>.
The best feature of HashSet<T> is that its lookups and set operations are hash-based and optimized internally. A LINQ query, by contrast, is re-evaluated each time it is enumerated, and list-style collections must be scanned item by item to find a match, which is one of the reasons iterating them is time consuming. One of the ways to speed up access to LINQ results is to store the results of the query in a HashSet<T>.
To accomplish this, we add an extension method that converts LINQ results to a generic HashSet<T> collection. Because LINQ results implement IEnumerable<T>, we extend that interface with a static ToHashSet<T> method. To create an extension method, you must create a static class that holds it, and the type you wish to extend must be the first parameter, prefixed with the this keyword. Once the code below is in place, any object that implements IEnumerable<T> will have this ToHashSet<T> method.
using System.Collections.Generic;

namespace LinqImprovements
{
    public static class LinqUtilities
    {
        // Copies any sequence into a HashSet<T>, enumerating it exactly once.
        public static HashSet<T> ToHashSet<T>(this IEnumerable<T> enumerable)
        {
            HashSet<T> hashSet = new HashSet<T>();
            foreach (var en in enumerable)
            {
                // Add ignores items that are already in the set.
                hashSet.Add(en);
            }
            return hashSet;
        }
    }
}
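As a side note, the same conversion can be sketched with the HashSet<T> constructor, which accepts an IEnumerable<T> and an optional comparer. The method name ToHashSetViaConstructor is hypothetical; it was chosen here so that it does not clash with the ToHashSet method that newer framework versions (.NET Framework 4.7.2 and .NET Core 2.0 onward) already ship in System.Linq.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToHashSetSketch
{
    // Hypothetical helper: lets the HashSet<T> constructor do the copying.
    public static HashSet<T> ToHashSetViaConstructor<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return new HashSet<T>(source);
    }

    // Same helper, but the set uses a caller-supplied comparer.
    public static HashSet<T> ToHashSetViaConstructor<T>(
        this IEnumerable<T> source, IEqualityComparer<T> comparer)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return new HashSet<T>(source, comparer);
    }

    public static void Main()
    {
        // The query is enumerated once; duplicates collapse in the set.
        HashSet<int> ids = new[] { 3, 1, 3, 2, 1 }
            .Where(n => n > 0)
            .ToHashSetViaConstructor();

        Console.WriteLine(ids.Count);       // 3
        Console.WriteLine(ids.Contains(2)); // True
    }
}
```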
The next item we must deal with is how to compare the objects in the collection. We need to define what is equal and what is not. Therefore, we create a class that implements the IEqualityComparer<T> interface.
using System.Collections.Generic;

namespace LinqImprovements
{
    // Treats two records as equal when they describe the same person.
    public class DemographicHashEqualityComparer :
        IEqualityComparer<LastTransmittedPatientDemographic>
    {
        public bool Equals(LastTransmittedPatientDemographic demographicHashLeft,
            LastTransmittedPatientDemographic demographicHashRight)
        {
            return (demographicHashLeft.PersonProfileId ==
                demographicHashRight.PersonProfileId);
        }

        // Hash on the same field that Equals compares, so that equal records
        // always produce equal hash codes and lookups stay fast.
        public int GetHashCode(LastTransmittedPatientDemographic demographicHash)
        {
            return demographicHash.PersonProfileId.GetHashCode();
        }
    }
}
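The DemographicHashEqualityComparer object will be used to compare the objects in the HashSet<T> when performing the Except set operation. Any objects in the sets that have the same PersonProfileId are treated as equal. This comparison will be used when determining which objects were created or deleted since the last synchronization. When implementing the IEqualityComparer<T> interface, you must also implement the GetHashCode method. Here, you can write your own hashing method; whatever you choose must be consistent with Equals, so that records that compare as equal return the same hash code. Hashing the field that Equals compares, PersonProfileId, is the natural choice. Simply invoking the method in the base also yields correct results, but it returns the same value for every record, so every lookup degrades to a linear scan.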
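As a rough, self-contained illustration of the comparer idea, the sketch below uses a hypothetical PersonHash class in place of LastTransmittedPatientDemographic and a hypothetical PersonIdComparer that hashes on the ID, so that GetHashCode agrees with Equals. It is a sketch under those assumptions, not the project's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the article's LastTransmittedPatientDemographic.
public class PersonHash
{
    public int PersonProfileId { get; set; }
    public string DemographicsHash { get; set; }
}

// Two records are "the same person" when their IDs match.
public class PersonIdComparer : IEqualityComparer<PersonHash>
{
    public bool Equals(PersonHash left, PersonHash right)
    {
        if (ReferenceEquals(left, right)) return true;
        if (left == null || right == null) return false;
        return left.PersonProfileId == right.PersonProfileId;
    }

    // Must agree with Equals: equal IDs always give equal hash codes.
    public int GetHashCode(PersonHash item)
    {
        return item == null ? 0 : item.PersonProfileId.GetHashCode();
    }
}

public static class ComparerDemo
{
    // IDs of records that are in 'first' but have no same-ID record in 'second'.
    public static List<int> IdsOnlyInFirst(
        IEnumerable<PersonHash> first, IEnumerable<PersonHash> second)
    {
        return first.Except(second, new PersonIdComparer())
                    .Select(p => p.PersonProfileId)
                    .OrderBy(id => id)
                    .ToList();
    }

    public static void Main()
    {
        var oldSet = new[]
        {
            new PersonHash { PersonProfileId = 1, DemographicsHash = "aaa" },
            new PersonHash { PersonProfileId = 2, DemographicsHash = "bbb" }
        };
        var newSet = new[]
        {
            new PersonHash { PersonProfileId = 2, DemographicsHash = "changed" },
            new PersonHash { PersonProfileId = 3, DemographicsHash = "ccc" }
        };

        Console.WriteLine(string.Join(",", IdsOnlyInFirst(newSet, oldSet))); // 3
        Console.WriteLine(string.Join(",", IdsOnlyInFirst(oldSet, newSet))); // 1
    }
}
```

Person 2 is ignored in both directions because only the ID takes part in the comparison, even though its hash changed.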
The final step is to create a second class which implements the IEqualityComparer<T>
interface. This class will be used to perform the intersect logic when determining which records were modified since the last synchronization.
using System.Collections.Generic;

namespace LinqImprovements
{
    // Matches records for the same person whose demographic hashes differ.
    // Note that this is not a conventional equality: a record does not
    // "equal" itself here. It is meant only for the Intersect call that
    // finds modified records.
    public class DemographicHashIntersectComparer :
        IEqualityComparer<LastTransmittedPatientDemographic>
    {
        public bool Equals(LastTransmittedPatientDemographic demographicHashLeft,
            LastTransmittedPatientDemographic demographicHashRight)
        {
            return ((demographicHashLeft.PersonProfileId ==
                demographicHashRight.PersonProfileId) &&
                (demographicHashLeft.DemographicsHash !=
                demographicHashRight.DemographicsHash));
        }

        // Hash on the person ID only, so candidates for the same person
        // land in the same bucket and are then checked by Equals.
        public int GetHashCode(LastTransmittedPatientDemographic demographicHash)
        {
            return demographicHash.PersonProfileId.GetHashCode();
        }
    }
}
Remember earlier, I defined modified records as those which exist in both sets with the same person profile ID but have different demographic hashes? This class will check for this case to determine which objects have been modified.
After defining these three classes, we're ready to use them in our LINQ operations.
In my case, I used a SQL query to build my demographic hashes, so the results come back in a DataTable named newHashValuesTable. My starting point here is taking these results and storing them in a HashSet<T> object.
HashSet<LastTransmittedPatientDemographic> newHashValues =
    (from pi in newHashValuesTable.AsEnumerable()
     select new LastTransmittedPatientDemographic
     {
         PersonProfileId = pi.Field<int>("PersonProfileId"),
         DemographicsHash = pi.Field<string>("DemographicsHash")
     }).ToHashSet<LastTransmittedPatientDemographic>();
This seems redundant. However, it is faster overall to convert the DataTable results into a HashSet<T> once and query that collection than to access the DataTable repeatedly.
The second step is to get the last transmitted hashes. This is just a simple LINQ query result from a database table.
HashSet<LastTransmittedPatientDemographic> lastTransmittedHashes =
    dataContext
        .LastTransmittedPatientDemographic
        .Select(hash => hash)
        .ToHashSet<LastTransmittedPatientDemographic>();
Now to the magic. First, we will determine which objects have been modified. Modified objects are those objects in both sets with the same profile ID and different demographic hashes. Remember the DemographicHashIntersectComparer object? It's going to do all the dirty work for us.
DemographicHashIntersectComparer demographicIntersectComparer =
    new DemographicHashIntersectComparer();
var updatedPatientInfos = newHashValues.Intersect(lastTransmittedHashes,
    demographicIntersectComparer);
LINQ provides an overloaded Intersect method (Enumerable.Intersect) that allows us to pass our own custom IEqualityComparer<T> object. Each object in the first set is checked against the second set using our custom comparison object, and those for which the comparison returns true are returned in the updatedPatientInfos object. Wasn't that easy? Two lines of code to determine the modified objects. No for loops, no difficult LINQ queries. Plus, since both sets are already materialized in memory as HashSet<T> collections, the intersect runs without re-executing the underlying queries.
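One caveat about the intersect comparer: it treats two records as equal only when their hashes differ, so a record never equals itself. That appears to work for this particular Intersect call, but it is not a conventional equality and other hash-based code may not tolerate it. An alternative, sketched here under the assumption of a hypothetical PersonHash class and FindModified helper, is to match records on the ID with Join (which builds a hashed lookup over the second sequence) and then filter on the hashes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the article's LastTransmittedPatientDemographic.
public class PersonHash
{
    public int PersonProfileId { get; set; }
    public string DemographicsHash { get; set; }
}

public static class ModifiedSketch
{
    // Returns the new-side records whose ID exists on both sides
    // but whose demographic hash has changed.
    public static List<PersonHash> FindModified(
        IEnumerable<PersonHash> newValues, IEnumerable<PersonHash> lastTransmitted)
    {
        return newValues
            .Join(lastTransmitted,
                  n => n.PersonProfileId,   // key on the new side
                  o => o.PersonProfileId,   // key on the old side
                  (n, o) => new { New = n, Old = o })
            .Where(pair => pair.New.DemographicsHash != pair.Old.DemographicsHash)
            .Select(pair => pair.New)
            .ToList();
    }

    public static void Main()
    {
        var oldSet = new[]
        {
            new PersonHash { PersonProfileId = 1, DemographicsHash = "aaa" },
            new PersonHash { PersonProfileId = 2, DemographicsHash = "bbb" },
            new PersonHash { PersonProfileId = 3, DemographicsHash = "ccc" }
        };
        var newSet = new[]
        {
            new PersonHash { PersonProfileId = 2, DemographicsHash = "bbb" },
            new PersonHash { PersonProfileId = 3, DemographicsHash = "zzz" },
            new PersonHash { PersonProfileId = 4, DemographicsHash = "ddd" }
        };

        foreach (var p in FindModified(newSet, oldSet))
        {
            Console.WriteLine(p.PersonProfileId + " " + p.DemographicsHash); // 3 zzz
        }
    }
}
```

This assumes each person ID appears at most once per set; duplicate IDs would produce one result per matching pair.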
Following a similar pattern, we will determine which objects were created. Newly created objects are objects which exist in the new hash values set but not in the last transmitted hash values set. We can determine this by calling the Except method on the newHashValues set, passing the lastTransmittedHashes set and the first comparison object we created. The Except method compares the objects in both sets and returns a result set in which the objects of the second set have been removed from the first set. In the code snippet below, we create a new collection containing those objects in the new hash values set that do not exist in the last transmitted hash values set.
DemographicHashEqualityComparer demographicEqualityComparer =
    new DemographicHashEqualityComparer();
var newPatientInfos = newHashValues.Except(lastTransmittedHashes,
    demographicEqualityComparer);
Finally, determine the deleted ones. This is the exact opposite of what was done above using the same comparison object.
DemographicHashEqualityComparer demographicEqualityComparer =
    new DemographicHashEqualityComparer();
var deletePatientInfos = lastTransmittedHashes.Except(newHashValues,
    demographicEqualityComparer);
Hopefully, by now, you've realized how much easier and more readable the HashSet<T> extension has made this code. And to top it off, performance testing showed roughly a 30-fold speedup (from about 30 minutes to about one minute), because there was no need to repeatedly iterate or query the LINQ result collections. By following these steps, you should be able to apply the approach to your own LINQ projects, and hopefully experience the performance increase for yourself.
Points of Interest
Unfortunately, if you are using LINQ to Entities, you cannot call these methods directly on an entity query. The reason is that LINQ to Entities does not support the overloaded Intersect and Except methods which accept a comparison object, so the results must be materialized in memory first. Bummer, huh?
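One possible workaround, sketched here rather than tested against a real entity model, is to switch to in-memory LINQ with AsEnumerable before calling the comparer overload. The helper name ExceptInMemory is hypothetical, and the example uses in-memory arrays wrapped with AsQueryable as a stand-in for entity queries. Keep in mind that this pulls every row of both queries into memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InMemorySetOps
{
    // Hypothetical helper: AsEnumerable makes the compiler bind to
    // Enumerable.Except, so the comparer runs in memory instead of
    // being handed to the query provider.
    public static List<T> ExceptInMemory<T>(
        IQueryable<T> first, IQueryable<T> second, IEqualityComparer<T> comparer)
    {
        return first.AsEnumerable()
                    .Except(second.AsEnumerable(), comparer)
                    .ToList();
    }

    public static void Main()
    {
        // Stand-ins for entity queries; a real model would supply these.
        IQueryable<string> current = new[] { "Ann", "bob", "Cid" }.AsQueryable();
        IQueryable<string> previous = new[] { "ANN", "BOB" }.AsQueryable();

        List<string> added =
            ExceptInMemory(current, previous, StringComparer.OrdinalIgnoreCase);

        Console.WriteLine(string.Join(",", added)); // Cid
    }
}
```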
History
- 08/03/2010 - AJW - Initial creation.
- 08/04/2010 - AJW - Fixed copy and paste errors.
- 08/04/2010 - AJW - Removed call to .Contains in the extension method. The Add method already performs this check.