Increase the performance of your LINQ application with a few .NET tricks

24 Aug 2010, CPOL
This article will show an easy way to increase the performance of LINQ using .NET 3.5 and C#.

Introduction

This article will show how you can drastically increase the performance of LINQ using simple to intermediate C#.

Background

I was recently given a project where we had to track personal demographic information and, whenever it was modified, synchronize the changes to another database. The first catch was that triggers were not allowed, because the monitored database was not ours; we only had query access to it. The second catch was that the two databases could not access each other.

The idea was to create a conduit in C# that hashes the demographic information for each person record. Whenever the current demographic hash differed from the hash last sent, the changes would be sent to a process that synchronized the data in the second database.

Essentially, there are two sets of data: the new/modified set and the old set (the data in the second database). A record in the new/modified set that is not in the old set was created. A record in the old set that is not in the new/modified set was deleted. Records that appear in both sets for the same person, but with different hashes, were modified.
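Before any LINQ is involved, the three cases reduce to simple set logic. The following is a minimal sketch rather than the project's code: it assumes each set is held in a dictionary mapping PersonProfileId to a demographic hash string, and the SyncDiff class and Classify method are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical helper (not from the original project): classifies person IDs
// into created, deleted and modified by comparing the new/modified set with
// the old set. Both dictionaries map PersonProfileId -> demographic hash.
public static class SyncDiff
{
    public static void Classify(
        IDictionary<int, string> newSet,
        IDictionary<int, string> oldSet,
        ICollection<int> created,
        ICollection<int> deleted,
        ICollection<int> modified)
    {
        foreach (KeyValuePair<int, string> entry in newSet)
        {
            string oldHash;
            if (!oldSet.TryGetValue(entry.Key, out oldHash))
            {
                created.Add(entry.Key);      // in the new set only
            }
            else if (oldHash != entry.Value)
            {
                modified.Add(entry.Key);     // in both sets, hash changed
            }
        }

        foreach (int id in oldSet.Keys)
        {
            if (!newSet.ContainsKey(id))
            {
                deleted.Add(id);             // in the old set only
            }
        }
    }
}
```

For example, with a new set {1: "a", 2: "b", 3: "c"} and an old set {2: "b", 3: "x", 4: "d"}, person 1 is created, person 4 is deleted and person 3 is modified. Because every lookup is a hash lookup, the comparison stays roughly linear in the size of the two sets.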

Using LINQ queries and iterating over their results (declared with the var keyword), it took approximately 30 minutes to transfer 18,000 records. For our scenario, that time was acceptable to the customer, but it was not acceptable to me. So I dug in a little, and with just a few changes I reduced an 18,000-record transfer from 30 minutes to approximately one minute.

The examples in this article are based on my solution. However, the same technique should be easy to apply to other LINQ code that compares large sets of records.

Using the Code

In .NET 3.5, a new generic collection called HashSet<T> was introduced.

The best feature of HashSet<T> is that it is built for set operations: items are stored in a hash table, so Add, Contains and Remove take constant time on average, and duplicates are rejected automatically. LINQ query results, on the other hand, are usually deferred: the query runs again every time its results are iterated, which is one of the reasons iterating them repeatedly is time consuming. One of the ways to speed up access to LINQ results is to store the results of the query in a HashSet<T>, so the query runs only once.
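The cost of re-evaluating a deferred query is easy to see with a counter. In this minimal sketch, a small in-memory array stands in for the database or DataTable, and the DeferredExecutionDemo class and CountProjections method are hypothetical. It counts how many times a query's projection runs when the results are iterated several times, with and without first copying them into a HashSet<T>.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo: shows that a deferred LINQ query is re-evaluated on
// every enumeration, while a HashSet<T> built from it evaluates it once.
public static class DeferredExecutionDemo
{
    // Returns how many times the projection ran after enumerating the
    // results 'passes' times.
    public static int CountProjections(bool materialize, int passes)
    {
        int projections = 0;
        int[] source = { 1, 2, 3, 4, 5 };

        // Deferred query: the lambda does not run until the query is enumerated.
        IEnumerable<int> query = source.Select(x => { projections++; return x * 10; });

        // The HashSet<T> constructor enumerates the query exactly once.
        IEnumerable<int> results = materialize ? new HashSet<int>(query) : query;

        for (int pass = 0; pass < passes; pass++)
        {
            // For the deferred query, each pass runs the projection again.
            foreach (int value in results) { }
        }

        return projections;
    }
}
```

Iterating the deferred query three times runs the projection 15 times for five elements, while the materialized version runs it only 5 times. With a LINQ to SQL query, each of those extra passes would be another round trip to the database.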

To accomplish this, we add an extension method that converts LINQ results to a generic HashSet<T> collection. Following the pattern of the standard ToList and ToArray methods on IEnumerable<T>, we create a static ToHashSet<T> extension method. To create an extension method, you create a static class that holds the static extension method; the method's first parameter is the type you wish to extend, prefixed with the this keyword. Once the class below is compiled and its namespace is imported, any object that implements IEnumerable<T> will have this ToHashSet<T> method.

C#
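using System;
using System.Collections.Generic;

namespace LinqImprovements
{
   public static class LinqUtilities
   {
      // Copies any IEnumerable<T>, including a deferred LINQ query, into a
      // HashSet<T>. The source is enumerated exactly once, here.
      public static HashSet<T> ToHashSet<T>(this IEnumerable<T> enumerable)
      {
         if (enumerable == null)
            throw new ArgumentNullException("enumerable");

         HashSet<T> hashSet = new HashSet<T>();

         foreach (var en in enumerable)
         {
            // Add returns false and skips items that are already present,
            // so no separate Contains check is needed.
            hashSet.Add(en);
         }

         return hashSet;
      }
   }
}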
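If you also want to control how duplicates are detected while materializing, HashSet<T> has a constructor that takes both a source sequence and an IEqualityComparer<T>. The following sketch is a hypothetical variant of the extension above; the ToHashSetUsing name is made up, chosen so it does not clash with other ToHashSet methods.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetSketches
{
    // Hypothetical variant of ToHashSet that also accepts a comparer.
    // The HashSet<T>(IEnumerable<T>, IEqualityComparer<T>) constructor
    // enumerates the source once and, like Add, silently skips items the
    // comparer considers duplicates. A null comparer means the default one.
    public static HashSet<T> ToHashSetUsing<T>(this IEnumerable<T> source,
                                               IEqualityComparer<T> comparer)
    {
        if (source == null)
            throw new ArgumentNullException("source");

        return new HashSet<T>(source, comparer);
    }
}
```

For example, `new[] { "a", "A", "b" }.ToHashSetUsing(StringComparer.OrdinalIgnoreCase)` holds two items. Newer frameworks (.NET Framework 4.7.2 and .NET Core 2.0 onward) ship a built-in Enumerable.ToHashSet with the same two overloads, so on those frameworks a custom ToHashSet may be unnecessary, and having both in scope can make calls ambiguous.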

The next item we must deal with is a way to compare the objects in the collection: we need to define what is equal and what is not. Therefore, we need to create a class that implements the IEqualityComparer<T> interface.

C#
using System.Collections.Generic;

namespace LinqImprovements
{
   // Treats two records as equal when they belong to the same person,
   // whatever their demographic hashes. Used with Except to find created
   // and deleted records.
   public class DemographicHashEqualityComparer : 
          IEqualityComparer<LastTransmittedPatientDemographic>
   {
      public bool Equals(LastTransmittedPatientDemographic demographicHashLeft, 
                         LastTransmittedPatientDemographic demographicHashRight)
      {
         return (demographicHashLeft.PersonProfileId == 
                         demographicHashRight.PersonProfileId);
      }

      public int GetHashCode(LastTransmittedPatientDemographic demographicHash)
      {
         // Hash the same field that Equals compares, so equal records share a
         // bucket. (base.GetHashCode() would hash this comparer object and
         // give every record the same value.)
         return demographicHash.PersonProfileId.GetHashCode();
      }
   }
}

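The DemographicHashEqualityComparer object will be used to compare the objects in the HashSet during the Except set operation. Any two objects with the same PersonProfileId are treated as equal. This comparison will be used to determine which objects were created or deleted since the last synchronization. When implementing the IEqualityComparer<T> interface, you must also implement the GetHashCode method, and it must return the same value for any two objects that Equals considers equal. Be careful not to simply return base.GetHashCode(): base refers to the comparer itself (System.Object), so every record gets the same hash code, and the hash-based set operations fall back to comparing every pair of records. Returning a hash of the field that Equals compares, here PersonProfileId, keeps the lookups fast.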
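To see why a constant hash code hurts, the following sketch counts how often Equals is called while Except runs. It is illustrative only: PersonHash is a hypothetical stand-in for LastTransmittedPatientDemographic, and CountingIdComparer and HashCodeDemo are made-up names. Passing true mimics `return base.GetHashCode();`, which returns the same value for every record.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the LINQ to SQL entity LastTransmittedPatientDemographic.
public class PersonHash
{
    public int PersonProfileId;
    public string DemographicsHash;
}

// Compares records by PersonProfileId and counts Equals calls. When
// constantHash is true, GetHashCode mimics "return base.GetHashCode();",
// which hashes this comparer object and so is the same for every record.
public class CountingIdComparer : IEqualityComparer<PersonHash>
{
    private readonly bool constantHash;
    public int EqualsCalls;

    public CountingIdComparer(bool constantHash) { this.constantHash = constantHash; }

    public bool Equals(PersonHash left, PersonHash right)
    {
        EqualsCalls++;
        return left.PersonProfileId == right.PersonProfileId;
    }

    public int GetHashCode(PersonHash record)
    {
        return constantHash ? base.GetHashCode() : record.PersonProfileId.GetHashCode();
    }
}

public static class HashCodeDemo
{
    // Runs Except over two lists of 'count' records with no IDs in common
    // and returns how many times the comparer's Equals was called.
    public static int EqualsCallsForExcept(bool constantHash, int count)
    {
        List<PersonHash> first = new List<PersonHash>();
        List<PersonHash> second = new List<PersonHash>();
        for (int i = 0; i < count; i++)
        {
            first.Add(new PersonHash { PersonProfileId = i, DemographicsHash = "new" });
            second.Add(new PersonHash { PersonProfileId = count + i, DemographicsHash = "old" });
        }

        CountingIdComparer comparer = new CountingIdComparer(constantHash);
        first.Except(second, comparer).Count();  // Count() forces the deferred Except to run
        return comparer.EqualsCalls;
    }
}
```

With 100 records on each side and no IDs in common, the constant hash code forces thousands of Equals calls (building the internal set from the second list alone needs 100 * 99 / 2 = 4,950), while the ID-based hash code avoids them, because records with different hash codes are never compared. The gap grows with the square of the record count, which matters at 18,000 records.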

The final step is to create a second class which implements the IEqualityComparer<T> interface. This class will be used to perform the intersect logic when determining which records were modified since the last synchronization.

C#
using System.Collections.Generic;

namespace LinqImprovements
{
   // Treats two records as "equal" when they belong to the same person but
   // have different demographic hashes, i.e. the record was modified.
   // Note: this is deliberately not a true equality relation (a record is
   // never equal to itself), so use it only for the Intersect step.
   public class DemographicHashIntersectComparer : 
          IEqualityComparer<LastTransmittedPatientDemographic>
   {
      public bool Equals(LastTransmittedPatientDemographic demographicHashLeft, 
             LastTransmittedPatientDemographic demographicHashRight)
      {
         return ((demographicHashLeft.PersonProfileId == 
                    demographicHashRight.PersonProfileId) && 
                 (demographicHashLeft.DemographicsHash != 
                    demographicHashRight.DemographicsHash));
      }

      public int GetHashCode(LastTransmittedPatientDemographic demographicHash)
      {
         // Hash only on PersonProfileId: any two records this comparer can
         // match share an ID, so they must land in the same bucket.
         return demographicHash.PersonProfileId.GetHashCode();
      }      
   }
}

Remember that earlier I defined modified records as those that exist in both sets with the same person profile ID but with different demographic hashes? This class checks for exactly that case to determine which objects have been modified.
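The following self-contained sketch runs the same rule through Intersect on a few in-memory records. PersonHash is a hypothetical stand-in for LastTransmittedPatientDemographic, and ModifiedComparer and ModifiedDemo are illustrative names; the comparer mirrors DemographicHashIntersectComparer but hashes on PersonProfileId.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the LINQ to SQL entity LastTransmittedPatientDemographic.
public class PersonHash
{
    public int PersonProfileId;
    public string DemographicsHash;
}

// Same rule as DemographicHashIntersectComparer: "equal" means same person,
// different hash. Not a true equality relation, so use it only with Intersect.
public class ModifiedComparer : IEqualityComparer<PersonHash>
{
    public bool Equals(PersonHash left, PersonHash right)
    {
        return left.PersonProfileId == right.PersonProfileId
            && left.DemographicsHash != right.DemographicsHash;
    }

    // Hash only on the ID, so records that Equals can match share a bucket.
    public int GetHashCode(PersonHash record)
    {
        return record.PersonProfileId.GetHashCode();
    }
}

public static class ModifiedDemo
{
    // Returns the IDs of persons present in both sets whose hashes differ.
    public static List<int> ModifiedIds(IEnumerable<PersonHash> newValues,
                                        IEnumerable<PersonHash> lastSent)
    {
        return newValues.Intersect(lastSent, new ModifiedComparer())
                        .Select(p => p.PersonProfileId)
                        .ToList();
    }
}
```

Given new records (1, "a"), (2, "b2"), (3, "c") and last transmitted records (1, "a"), (2, "b"), (4, "d"), only person 2 is returned: person 1 is unchanged, and persons 3 and 4 exist in only one set. Intersect returns elements from the first sequence, so the results carry the new hashes that need to be transmitted.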

After defining these three classes, we're ready to use them in our LINQ operations.

In my case, I used a SQL query to build my demographic hashes, so the results come back in a DataTable named newHashValuesTable. My starting point is to project each row into a LastTransmittedPatientDemographic object and store the results in a HashSet<T>.

C#
// AsEnumerable() and Field<T>() are extension methods from
// System.Data.DataSetExtensions.dll (namespace System.Data).
HashSet<LastTransmittedPatientDemographic> newHashValues = 
    (from pi in newHashValuesTable.AsEnumerable()
     select new LastTransmittedPatientDemographic
     {
        PersonProfileId = pi.Field<int>("PersonProfileId"),
        DemographicsHash = pi.Field<string>("DemographicsHash")
     }).ToHashSet<LastTransmittedPatientDemographic>();

This may seem redundant. However, it was faster overall to convert the DataTable rows into a HashSet once and query that collection than to keep accessing the DataTable.
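A runnable version of this projection, using a small in-memory DataTable instead of the real query results, could look like the following. It is a sketch: PersonHash stands in for LastTransmittedPatientDemographic, DataTableDemo and its methods are hypothetical, and on .NET Framework the AsEnumerable and Field<T> extension methods require a reference to System.Data.DataSetExtensions.dll.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical stand-in for the LINQ to SQL entity LastTransmittedPatientDemographic.
public class PersonHash
{
    public int PersonProfileId;
    public string DemographicsHash;
}

public static class DataTableDemo
{
    // Hypothetical sample data shaped like the article's hash query results
    // (column names taken from the article's snippet).
    public static DataTable BuildSampleTable()
    {
        DataTable table = new DataTable("newHashValuesTable");
        table.Columns.Add("PersonProfileId", typeof(int));
        table.Columns.Add("DemographicsHash", typeof(string));
        table.Rows.Add(1, "hash-a");
        table.Rows.Add(2, "hash-b");
        return table;
    }

    // Projects each row once into an object and stores the results in a
    // HashSet<T>; later set operations work on these objects, not the rows.
    public static HashSet<PersonHash> ToPersonHashes(DataTable table)
    {
        return new HashSet<PersonHash>(
            from row in table.AsEnumerable()
            select new PersonHash
            {
                PersonProfileId = row.Field<int>("PersonProfileId"),
                DemographicsHash = row.Field<string>("DemographicsHash")
            });
    }
}
```

Each row is read exactly once, when the HashSet<T> is constructed; the set operations that follow work on the projected objects only.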

The second step is to get the last transmitted hashes. This is just a simple LINQ to SQL query against a database table.

C#
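// The ToHashSet call enumerates the LINQ to SQL query, so the SELECT runs
// once here and all rows are held in memory from this point on.
HashSet<LastTransmittedPatientDemographic> lastTransmittedHashes = 
    dataContext
     .LastTransmittedPatientDemographic
     .Select(hash => hash).ToHashSet<LastTransmittedPatientDemographic>();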

Now to the magic. First, we will determine which objects have been modified. Modified objects are those objects in both sets with the same profile ID and different demographic hashes. Remember the DemographicHashIntersectComparer object? It's going to do all the dirty work for us.

C#
DemographicHashIntersectComparer demographicIntersectComparer = 
                        new DemographicHashIntersectComparer();
var updatedPatientInfos = newHashValues.Intersect(lastTransmittedHashes, 
                        demographicIntersectComparer);

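Intersect here is the LINQ to Objects extension method from System.Linq.Enumerable (HashSet<T> itself has no method named Intersect), and it has an overload that accepts our own custom IEqualityComparer<T> object. Each element of newHashValues is checked against lastTransmittedHashes using our comparer, and the elements for which the comparer returns true are included in updatedPatientInfos. Wasn't that easy? Two lines of code to determine the modified objects: no for loops, no difficult LINQ queries. Internally, Intersect builds a hash-based set from the second sequence using the comparer's GetHashCode and Equals, so its speed depends on GetHashCode spreading the records across buckets. Note also that updatedPatientInfos is lazily evaluated: the intersection runs each time it is enumerated, so materialize it (for example with ToList or ToHashSet) if you need to iterate it more than once.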

Following the same pattern, we determine which objects were created. Newly created objects are those that exist in the new hash values set but not in the last transmitted hash values set. We find them by calling Except on newHashValues, passing lastTransmittedHashes and the first comparison object we created. Except compares the objects in both sets and returns the objects of the first set that have no match in the second set. In the snippet below, newPatientInfos holds the objects in the new hash values set that do not exist in the last transmitted hash values set.

C#
DemographicHashEqualityComparer demographicEqualityComparer = 
                       new DemographicHashEqualityComparer();
var newPatientInfos = newHashValues.Except(lastTransmittedHashes, 
                      demographicEqualityComparer);

Finally, we determine the deleted objects. This is the reverse of the previous step: swap the two sets and reuse the same comparison object.

C#
// Reuse the demographicEqualityComparer created in the previous step;
// declaring it a second time in the same method would not compile.
var deletePatientInfos = lastTransmittedHashes.Except(newHashValues, 
                         demographicEqualityComparer);
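Both Except calls can be tried on in-memory data with the following sketch. As before, PersonHash is a hypothetical stand-in for LastTransmittedPatientDemographic, and SamePersonComparer and CreatedDeletedDemo are illustrative names; SamePersonComparer follows the same rule as DemographicHashEqualityComparer, with a hash code based on PersonProfileId.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the LINQ to SQL entity LastTransmittedPatientDemographic.
public class PersonHash
{
    public int PersonProfileId;
    public string DemographicsHash;
}

// Same rule as DemographicHashEqualityComparer: records are equal when they
// belong to the same person, whatever their hashes.
public class SamePersonComparer : IEqualityComparer<PersonHash>
{
    public bool Equals(PersonHash left, PersonHash right)
    {
        return left.PersonProfileId == right.PersonProfileId;
    }

    public int GetHashCode(PersonHash record)
    {
        return record.PersonProfileId.GetHashCode();
    }
}

public static class CreatedDeletedDemo
{
    // Created: in the new set but not in the last transmitted set.
    public static List<int> CreatedIds(IEnumerable<PersonHash> newValues,
                                       IEnumerable<PersonHash> lastSent)
    {
        return newValues.Except(lastSent, new SamePersonComparer())
                        .Select(p => p.PersonProfileId).ToList();
    }

    // Deleted: the same call with the two sets swapped.
    public static List<int> DeletedIds(IEnumerable<PersonHash> newValues,
                                       IEnumerable<PersonHash> lastSent)
    {
        return lastSent.Except(newValues, new SamePersonComparer())
                       .Select(p => p.PersonProfileId).ToList();
    }
}
```

Except also removes duplicates from its result according to the comparer, so two new rows with the same PersonProfileId would be reported once. If you prefer to modify a set in place, HashSet<T>.ExceptWith does that, using the comparer the set was constructed with.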

Hopefully, by now, you've seen how much easier to read the HashSet extension has made this code. On top of that, my performance testing showed the 18,000-record transfer running roughly 30 times faster (about one minute instead of about 30), because there was no longer any need to repeatedly iterate or query the LINQ result collections. By following these steps, you should be able to apply the same approach to your own LINQ projects and, hopefully, see a similar performance increase for yourself.

Points of Interest

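Unfortunately, if you are using LINQ to Entities, you cannot pass these comparers straight into your queries. LINQ to Entities does not support the overloaded Intersect and Except methods that accept a comparison object, because it cannot translate a custom comparer into SQL. Bummer, huh? The workaround is to materialize the query results first (for example, with the ToHashSet extension method above); from that point on, Intersect and Except run in memory as LINQ to Objects, where the comparer overloads are available.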

History

  • 08/03/2010 - AJW - Initial creation.
  • 08/04/2010 - AJW - Fixed copy and paste errors.
  • 08/04/2010 - AJW - Removed call to .Contains in the extension method. The Add method already performs this check.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)