and get a set of actors with unique last names. So we need a way to provide a key selector to our
ActorComparer. This is done simply by creating a constructor that takes a function object as an argument and stores it for further use:
    public ActorComparer(Func<MovieActor, object> keySelector)
    {
        KeySelector = keySelector;
    }
Func<MovieActor, object> is a delegate type: it represents anything that can be called with a MovieActor argument and returns a result of type object. Although I generally don't like dealing with plain objects in my code, this is a valid way to define a key selector without binding it to a specific key type. Moreover, it causes no problems here because we use nothing but the Equals(..) and GetHashCode() methods of the key, so no casts are required. Now we only need to modify the Equals(..) and GetHashCode(..) methods of our comparer (don't confuse these with object.Equals(object o) and object.GetHashCode()) so that they use the new KeySelector property:
    public bool Equals(MovieActor x, MovieActor y)
    {
        return KeySelector(x).Equals(KeySelector(y));
    }

    public int GetHashCode(MovieActor obj)
    {
        return KeySelector(obj).GetHashCode();
    }
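Putting the pieces together, the whole comparer might look roughly like the sketch below. The MovieActor class here is a minimal stand-in with only the three properties used in this article, and the null handling (the constructor check and object.Equals) is my own addition for the sketch, not something the snippets above do.

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-in for the article's MovieActor class (assumed shape).
public class MovieActor
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string CharacterName { get; set; }
}

public class ActorComparer : IEqualityComparer<MovieActor>
{
    // The function that extracts the comparison key from an actor.
    public Func<MovieActor, object> KeySelector { get; }

    public ActorComparer(Func<MovieActor, object> keySelector)
    {
        // Assumption: fail early on a missing selector.
        KeySelector = keySelector ?? throw new ArgumentNullException(nameof(keySelector));
    }

    public bool Equals(MovieActor x, MovieActor y)
    {
        // The static object.Equals tolerates null keys on either side.
        return object.Equals(KeySelector(x), KeySelector(y));
    }

    public int GetHashCode(MovieActor obj)
    {
        object key = KeySelector(obj);
        return key == null ? 0 : key.GetHashCode();
    }
}
```

With this version, two actors whose selected key is null are treated as equal instead of causing a NullReferenceException.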
Since our Equals(..) and GetHashCode(..) methods look very similar, a question may arise: why do we need them both? First of all, we already know that we can't get rid of GetHashCode(..) because it is what Distinct(..) uses for comparison in the first place. Okay, let's deal with Equals(..) then: do we still need to compare key values when we have already used hash codes for comparison? Absolutely yes! What makes it inevitable is the idea behind hash codes.
Hash functions, which are used to generate hash codes, actually do one thing: they project elements from some data set onto a smaller data set (the set of hash codes). The former might be almost anything, while the latter is usually the set of integers. This transformation allows for faster comparison of elements during look-up, because the elements of the second set are easier to compare and because there are fewer of them. For the same reason, however, any hash function might eventually produce equal codes for unequal objects - this is called a hash collision. That's why, when LINQ comes across two elements with equal hashes, it calls the Equals(..) function to check whether the elements are actually equal.
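The role of Equals(..) during collisions can be observed with a small experiment. The sketch below uses a hypothetical CountingComparer over plain ints (not part of the article's code) that counts how often Equals(..) is invoked and lets the caller choose the hash function.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: counts Equals calls; the hash function is pluggable.
public class CountingComparer : IEqualityComparer<int>
{
    private readonly Func<int, int> _hash;

    public int EqualsCalls { get; private set; }

    public CountingComparer(Func<int, int> hash)
    {
        _hash = hash;
    }

    public bool Equals(int x, int y)
    {
        EqualsCalls++;
        return x == y;
    }

    public int GetHashCode(int obj)
    {
        return _hash(obj);
    }
}

public static class CollisionDemo
{
    // Runs Distinct over 1, 2, 3 and reports the number of distinct
    // elements together with the number of Equals calls it took.
    public static (int DistinctCount, int EqualsCalls) Run(Func<int, int> hash)
    {
        var comparer = new CountingComparer(hash);
        // Count() forces the lazy Distinct query to execute.
        int distinctCount = new[] { 1, 2, 3 }.Distinct(comparer).Count();
        return (distinctCount, comparer.EqualsCalls);
    }
}
```

Calling CollisionDemo.Run(x => x) gives every element its own hash code, so I would expect Equals(..) not to be consulted at all. Calling CollisionDemo.Run(_ => 0) makes every element collide, so Equals(..) has to decide, and the result still contains all three numbers. A constant hash code is therefore correct but slow, while a hash code without a matching Equals(..) would be fast but wrong.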
That said, let's return to our ActorComparer. You might expect that reaching our goal requires some more complex modifications, but no - all we have to do is call the comparer in the new way:
    var distinct = actors.Distinct(new ActorComparer(a => a.LastName));
The result is the same as with the first version of ActorComparer, but the new one is much more flexible: it can be used differently in different contexts without any further changes to its code. Besides, it allows using more than one property as a key. This works because anonymous types in C# implement Equals and GetHashCode in terms of their property values. So the next call is perfectly valid and will preserve all actors with the same last name as long as their first names differ:
    var distinct = actors.Distinct(
        new ActorComparer(a =>
            new { a.LastName, a.FirstName }));
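That an anonymous-type key compares by value rather than by reference can be checked directly. The following sketch uses a hypothetical AnonymousKeyDemo class and hard-coded sample values purely for illustration.

```csharp
public static class AnonymousKeyDemo
{
    // Two separate anonymous objects with identical property values.
    public static bool SameValuesAreEqual()
    {
        object k1 = new { LastName = "Roberts", FirstName = "Julia" };
        object k2 = new { LastName = "Roberts", FirstName = "Julia" };

        // They are distinct instances, yet Equals and GetHashCode agree.
        return !ReferenceEquals(k1, k2)
            && k1.Equals(k2)
            && k1.GetHashCode() == k2.GetHashCode();
    }

    // A difference in any single property makes the keys unequal.
    public static bool DifferentValuesAreEqual()
    {
        object k1 = new { LastName = "Roberts", FirstName = "Julia" };
        object k2 = new { LastName = "Roberts", FirstName = "Eric" };
        return k1.Equals(k2);
    }
}
```

SameValuesAreEqual() should return true and DifferentValuesAreEqual() false, which is exactly the behavior the comparer relies on when the key selector returns an anonymous object.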
The flexibility that this solution offers might be useful when one deals with the movie's sequel. The problem is that Julia Roberts plays two roles there: Tess Ocean and herself:
    public static List<MovieActor> CreateSome()
    {
        return new List<MovieActor>()
        {
            new MovieActor() {
                FirstName = "Brad", LastName = "Pitt",
                CharacterName = "Rusty"},
            new MovieActor() {
                FirstName = "Andy", LastName = "Garcia",
                CharacterName = "Terry"},
            new MovieActor() {
                FirstName = "George", LastName = "Clooney",
                CharacterName = "Danny"},
            new MovieActor() {
                FirstName = "Julia", LastName = "Roberts",
                CharacterName = "Tess"},
            new MovieActor() {
                FirstName = "Julia", LastName = "Roberts",
                CharacterName = "Julia Roberts"}
        };
    }
Still, with the previous call we'll see her only once in the results. A simple modification of the call to Distinct(..) solves this issue, while still showing only one copy of George Clooney:
    var distinct = actors.Distinct(
        new ActorComparer(
            a =>
                new { a.LastName, a.FirstName, a.CharacterName }));
Conclusion
This is it. We have explored the interaction between LINQ extension methods and custom IEqualityComparers and even implemented one. The resulting class is both easy to use and highly customizable, because its operation is fully defined by the key selector function provided by the user. Furthermore, it is very easy to make the class generic so that it can be used for collections of objects of other types - not only for MovieActors. The complete code for this example is available through GitHub. (There is also a generic version of our comparer.)
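For readers who want a feel for the generic variant without opening the repository, here is a minimal sketch of how such a class might look. It is my own illustration, not the code from the repository, and the names KeyComparer and KeyComparerDemo are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical generic counterpart of ActorComparer.
public class KeyComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, object> _keySelector;

    public KeyComparer(Func<T, object> keySelector)
    {
        _keySelector = keySelector ?? throw new ArgumentNullException(nameof(keySelector));
    }

    public bool Equals(T x, T y)
    {
        // Static object.Equals also copes with null keys.
        return object.Equals(_keySelector(x), _keySelector(y));
    }

    public int GetHashCode(T obj)
    {
        object key = _keySelector(obj);
        return key == null ? 0 : key.GetHashCode();
    }
}

public static class KeyComparerDemo
{
    // Keeps the first word seen for every distinct word length.
    public static List<string> DistinctByLength(IEnumerable<string> words)
    {
        return words.Distinct(new KeyComparer<string>(w => w.Length)).ToList();
    }
}
```

The key is still typed as object, as in ActorComparer, so anonymous objects work as composite keys. The price is that value-type keys such as int are boxed on every call.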
I should mention that there are other ways to create an equality comparer with similar functionality. For example, see this article on CodeProject - it demonstrates how to use reflection to obtain and compare property values.
Finally, if you just need to filter a collection for distinct values based on some key and you want to do it quickly with as few additional steps as possible, there is a trick that doesn't require creating new types:
    return actors
        .GroupBy(a => new { a.LastName, a.FirstName, a.CharacterName })
        .Select(g => g.First());
Note that IEqualityComparer may (and should) be used to perform more complex comparisons. In most cases its implementation won't get much more complex than the one shown here.
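If you are on .NET 6 or later, the GroupBy(..) trick can probably be replaced with Enumerable.DistinctBy, which takes the key selector directly. The sketch below assumes .NET 6+ and places both variants side by side in a hypothetical helper class. Both keep the first element encountered for each key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper comparing the two one-liners (requires .NET 6+).
public static class DistinctByKey
{
    // The GroupBy trick: group by key, then take the first item of each group.
    public static List<T> ViaGroupBy<T, TKey>(IEnumerable<T> source, Func<T, TKey> key)
    {
        return source.GroupBy(key).Select(g => g.First()).ToList();
    }

    // The built-in alternative introduced in .NET 6.
    public static List<T> ViaDistinctBy<T, TKey>(IEnumerable<T> source, Func<T, TKey> key)
    {
        return source.DistinctBy(key).ToList();
    }
}
```

For the actors example the key selector would be the same anonymous object as before, for instance a => new { a.LastName, a.FirstName, a.CharacterName }.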