Introduction
I love LINQ and find myself using it more and more, but I am always mildly annoyed every time I (re)discover that I can't do a Distinct filter on a property of the class in my collection. For example, suppose I have a list of Contact objects and I want to extract from it a distinct list of Contacts based on their email address. The parameterless Distinct() method compares Contact objects using the default equality comparer, and there is no quick way to specify that I want to compare them by email address. This article describes a generic implementation of IEqualityComparer<T> that Distinct() can use to compare any class based on a property of that class.
Background
This article assumes that you have a general understanding of the LINQ extension methods for .NET collections. Also, bear in mind that this article discusses LINQ operating on in-memory objects (LINQ to Objects), not LINQ to SQL, LINQ to Entities, or any other provider.
The Problem
First, let's look at our sample Contact class:
public class Contact
{
    public string Name { get; set; }
    public string EmailAddress { get; set; }
}
Nothing fancy there, just a class with some basic properties. The problem we want to solve is this: given a list of Contact objects where some contacts share the same email address, we want a distinct list of contacts, one per email address, by doing something like this:
IEnumerable<Contact> collection = ...; // obtain the list of Contacts here
IEnumerable<Contact> distinctEmails = collection.Distinct();
But if we do this, Distinct will compare Contact objects using the default equality comparer, which for a class that does not override Equals compares them by reference. In this case, Distinct will return all of the Contacts in our original collection (assuming they are all unique instances).
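To make the reference-comparison behavior concrete, here is a minimal sketch. The sample data, the DefaultDistinctDemo class and the SampleContacts helper are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Contact
{
    public string Name { get; set; }
    public string EmailAddress { get; set; }
}

// Hypothetical demo class, used only for this illustration.
public static class DefaultDistinctDemo
{
    // Builds a small sample list in which two contacts share an email address.
    public static List<Contact> SampleContacts()
    {
        return new List<Contact>
        {
            new Contact { Name = "Ann", EmailAddress = "ann@example.com" },
            new Contact { Name = "Ann B.", EmailAddress = "ann@example.com" },
            new Contact { Name = "Bob", EmailAddress = "bob@example.com" }
        };
    }

    public static void Main()
    {
        List<Contact> collection = SampleContacts();

        // Contact does not override Equals or GetHashCode, so the default
        // comparer treats every separate instance as a different value.
        int distinctCount = collection.Distinct().Count();

        Console.WriteLine(distinctCount); // prints 3: nothing was filtered out
    }
}
```

Even though two of the contacts have the same email address, all three instances survive the call to Distinct().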
Solution 1: Override Default Equality Comparer
One way to get LINQ to operate on the EmailAddress property would be to override the Equals and GetHashCode methods of the Contact class so that they use the EmailAddress property. The parameterless Distinct() method would then use your overrides. Apart from the fact that overriding equality has subtle complications that make it tricky, you might not always want to compare Contact objects based on EmailAddress; sometimes you might want to compare them based on Name. So overriding Equals may not be the best solution.
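For reference, a minimal sketch of what Solution 1 might look like follows. The overrides and the OverrideEqualsDemo class are an illustration written for this section, not code from the original article. One of the subtle complications shows up right away: the hash code depends on a mutable property, so changing EmailAddress while a Contact sits in a HashSet or is used as a Dictionary key would make later lookups unreliable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Contact
{
    public string Name { get; set; }
    public string EmailAddress { get; set; }

    // Equality is now defined by EmailAddress alone.
    public override bool Equals(object obj)
    {
        Contact other = obj as Contact;
        if (ReferenceEquals(other, null))
            return false;
        return string.Equals(EmailAddress, other.EmailAddress);
    }

    // Must be consistent with Equals: equal contacts need equal hash codes.
    public override int GetHashCode()
    {
        return EmailAddress == null ? 0 : EmailAddress.GetHashCode();
    }
}

// Hypothetical demo class, used only for this illustration.
public static class OverrideEqualsDemo
{
    public static int CountDistinct()
    {
        var collection = new List<Contact>
        {
            new Contact { Name = "Ann", EmailAddress = "ann@example.com" },
            new Contact { Name = "Ann B.", EmailAddress = "ann@example.com" },
            new Contact { Name = "Bob", EmailAddress = "bob@example.com" }
        };

        // The parameterless Distinct() picks up the overrides above.
        return collection.Distinct().Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountDistinct()); // prints 2
    }
}
```

The drawback described above still applies: with this approach every comparison of Contact objects, everywhere in the program, is tied to EmailAddress.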
Solution 2: Implement IEqualityComparer<Contact>
The Distinct() method also has an overload which allows you to specify an IEqualityComparer implementation. So, another solution is to write a class that implements IEqualityComparer<Contact> and performs the comparison based on the EmailAddress property.
To do this, we have to create our comparer class:
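class ContactEmailComparer : IEqualityComparer<Contact>
{
    #region IEqualityComparer<Contact> Members

    // Two contacts are equal when their email addresses match.
    // The static string.Equals also copes with a null EmailAddress.
    public bool Equals(Contact x, Contact y)
    {
        return string.Equals(x.EmailAddress, y.EmailAddress);
    }

    // The hash code must be derived from the same property that Equals uses.
    public int GetHashCode(Contact obj)
    {
        return obj.EmailAddress == null ? 0 : obj.EmailAddress.GetHashCode();
    }

    #endregion
}

IEqualityComparer<Contact> customComparer = new ContactEmailComparer();
IEnumerable<Contact> distinctEmails = collection.Distinct(customComparer);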
This will cause the Distinct() method to compare our objects using our custom Equals implementation, which uses the EmailAddress property of the Contact.
A Generic Solution
The implementation of ContactEmailComparer is pretty trivial, but it does seem like a lot of work just to get a distinct list of email addresses.
A more universal solution is to write a generic class that you can tell which property of your objects to compare on. We will extend our IEqualityComparer implementation to use reflection to extract the value of a specified property, rather than restricting the class to one property.
Here is an implementation of such a class:
using System;
using System.Collections.Generic;
using System.Reflection;

public class PropertyComparer<T> : IEqualityComparer<T>
{
    private PropertyInfo _PropertyInfo;

    // Looks up the named public instance property once, up front.
    public PropertyComparer(string propertyName)
    {
        _PropertyInfo = typeof(T).GetProperty(propertyName,
            BindingFlags.GetProperty | BindingFlags.Instance | BindingFlags.Public);
        if (_PropertyInfo == null)
        {
            throw new ArgumentException(string.Format(
                "{0} is not a property of type {1}.", propertyName, typeof(T)));
        }
    }

    #region IEqualityComparer<T> Members

    // Compares the two objects by the value of the chosen property.
    public bool Equals(T x, T y)
    {
        object xValue = _PropertyInfo.GetValue(x, null);
        object yValue = _PropertyInfo.GetValue(y, null);

        // Two null property values count as equal.
        if (xValue == null)
            return yValue == null;

        return xValue.Equals(yValue);
    }

    // Hashes the property value; a null value hashes to 0.
    public int GetHashCode(T obj)
    {
        object propertyValue = _PropertyInfo.GetValue(obj, null);

        if (propertyValue == null)
            return 0;
        else
            return propertyValue.GetHashCode();
    }

    #endregion
}
Now, to get our distinct list of email addresses, we do this:
IEqualityComparer<Contact> customComparer =
    new PropertyComparer<Contact>("EmailAddress");
IEnumerable<Contact> distinctEmails = collection.Distinct(customComparer);
The best part about this solution is that it will work for any property of any type, so instead of writing a custom IEqualityComparer for each case, we can just reuse our generic PropertyComparer.
For example, with no extra work, we can also get a distinct list of Contacts by name by doing this:
IEqualityComparer<Contact> customComparer = new PropertyComparer<Contact>("Name");
IEnumerable<Contact> distinctNames = collection.Distinct(customComparer);
Enhancements
Currently, this implementation only works for public properties on a class. It would be easy to extend it to also inspect public fields, which would be a useful feature.
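One possible way to add field support is sketched below. This is an untested-in-production illustration rather than part of the original class: the names MemberComparer, Account and MemberComparerDemo are hypothetical. The comparer tries a public property first, falls back to a public field, and stores whichever getter it finds as a delegate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical variant of PropertyComparer<T> that also accepts public fields.
public class MemberComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, object> _getValue;

    public MemberComparer(string memberName)
    {
        const BindingFlags flags = BindingFlags.Instance | BindingFlags.Public;

        // Prefer a property with the given name.
        PropertyInfo property = typeof(T).GetProperty(memberName, flags);
        if (property != null)
        {
            _getValue = item => property.GetValue(item, null);
            return;
        }

        // Otherwise fall back to a field with the given name.
        FieldInfo field = typeof(T).GetField(memberName, flags);
        if (field != null)
        {
            _getValue = item => field.GetValue(item);
            return;
        }

        throw new ArgumentException(string.Format(
            "{0} is not a public property or field of type {1}.", memberName, typeof(T)));
    }

    public bool Equals(T x, T y)
    {
        // The static object.Equals treats two nulls as equal.
        return object.Equals(_getValue(x), _getValue(y));
    }

    public int GetHashCode(T obj)
    {
        object value = _getValue(obj);
        return value == null ? 0 : value.GetHashCode();
    }
}

// Hypothetical sample type with one public field and one public property.
public class Account
{
    public string Owner;
    public string Email { get; set; }
}

public static class MemberComparerDemo
{
    public static void Main()
    {
        var accounts = new List<Account>
        {
            new Account { Owner = "Ann", Email = "ann@example.com" },
            new Account { Owner = "Ann", Email = "ann@work.example.com" },
            new Account { Owner = "Bob", Email = "bob@example.com" }
        };

        // "Owner" is a field, "Email" is a property; both work the same way.
        Console.WriteLine(accounts.Distinct(new MemberComparer<Account>("Owner")).Count()); // prints 2
        Console.WriteLine(accounts.Distinct(new MemberComparer<Account>("Email")).Count()); // prints 3
    }
}
```

Resolving the member once in the constructor keeps the per-comparison work down to a single delegate call, and a misspelled member name still fails early with an ArgumentException, as in the original class.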
Conclusion
There is really nothing very special about this code. It is just a generic implementation of IEqualityComparer that takes a string specifying a property name in its constructor. But performing a Distinct filter on a property is something I always feel ought to be really easy, yet it turns out to be sort of a pain. This class makes it a little easier, and I hope you find it useful.
History
- 15th July, 2010: Initial post