Introduction
It is often useful to be able to find the items in one set that are not in another set.
This article presents a method for doing so in C# while retaining a fluid programming style.
The problem
Set subtraction is a common coding problem. I've done this many times in SQL Server with queries similar to the following:
select u.*
from Users u
left join Administrators a on u.UserId = a.UserId
where a.AdministratorId is null
LINQ provides the Enumerable.Except() extension method for IEnumerable<T>, which offers the same functionality:
IEnumerable<string> result = users.Except(administrators);
Unfortunately, to compare items by anything other than their default equality, you have to provide an IEqualityComparer<T>, something that I will not be covering in this article. Fortunately, we can also implement the Except concept using the join syntax as follows:
IEnumerable<string> result =
    from item in users
    join otherItem in administrators on item equals otherItem into tempItems
    from temp in tempItems.DefaultIfEmpty()
    where temp == null // keep only the users that have no matching administrator
    select item;
This looks a lot like the SQL implementation. However, the notation gets messy fast when building complex queries, and can result in code that is difficult to maintain. In this article, I'll build upon the join implementation to get more flexibility.
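Before building on the join form, it helps to see how it compares with the built-in method. The following is a minimal sketch, not part of the original code: the class and method names (JoinExceptSketch, ViaJoin, ViaExcept) are made up for illustration. It runs the left-join query with a filter that keeps unmatched rows (temp == null) next to Enumerable.Except(). One difference it exposes is that the join form keeps repeated items from the first sequence, while Except() returns distinct items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class for illustration only.
public static class JoinExceptSketch
{
    // Left-outer-join form: keep the users that have no match in 'administrators'.
    public static List<string> ViaJoin(IEnumerable<string> users,
                                       IEnumerable<string> administrators)
    {
        var query =
            from item in users
            join otherItem in administrators on item equals otherItem into tempItems
            from temp in tempItems.DefaultIfEmpty()
            where temp == null // null means the join found no administrator
            select item;
        return query.ToList();
    }

    // Built-in set difference, shown for comparison.
    public static List<string> ViaExcept(IEnumerable<string> users,
                                         IEnumerable<string> administrators)
    {
        return users.Except(administrators).ToList();
    }
}
```

Called with the users "ann", "bob", "ann", "cy" and the single administrator "bob", ViaJoin returns "ann", "ann", "cy", whereas ViaExcept returns "ann", "cy".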
One solution
First, build an extension method to hide the complexity of the join syntax:
[NotNull]
public static IEnumerable<T> Except<T>([NotNull] this IEnumerable<T> items,
    [NotNull] IEnumerable<T> other)
{
    // An item with no match in 'other' joins to an empty group. Testing the group
    // itself avoids mistaking a matched default(T) (such as 0) for "no match".
    return from item in items
           join otherItem in other on item equals otherItem into tempItems
           where !tempItems.Any()
           select item;
}
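Note that the where clause has been changed to allow the extension method to work whether T is a struct or a class. Because this method has the same name and signature as the built-in Enumerable.Except(), the compiler picks it only when it is found before System.Linq during extension method lookup, for example when the extension class is in the same namespace as the calling code; if both are imported at the same level, the call is ambiguous. Also note that the method returns an IEnumerable<T>, so you can chain the result into another LINQ method fluidly; for example: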
users.Except(administrators).ToList().ForEach(Console.WriteLine);
Here are the more interesting NUnit tests:
[TestFixture]
public class When_asked_to_get_items_from_a_set_that_are_not_in_another_set
{
    [Test]
    public void Should_return_only_those_items_that_are_not_in_the_other_set_where_T_is_a_class()
    {
        List<string> input = new List<string> {"cat", "ran", "fast"};
        List<string> other = new List<string> {"dog", "ran", "too", "slow"};
        IEnumerable<string> result = input.Except(other);
        Assert.IsNotNull(result, "result should never be null");
        Assert.AreEqual(2, result.Count(), "count does not match");
        Assert.AreEqual("cat", result.First(), "first item in result is incorrect");
        Assert.AreEqual("fast", result.Last(), "last item in result is incorrect");
    }

    [Test]
    public void Should_return_only_those_items_that_are_not_in_the_other_set_where_T_is_a_struct()
    {
        List<int> input = new List<int> {1, 2, 3};
        List<int> other = new List<int> {0, 2, 4, 6};
        IEnumerable<int> result = input.Except(other);
        Assert.IsNotNull(result, "result should never be null");
        Assert.AreEqual(2, result.Count(), "count does not match");
        // Only the odd inputs (1 and 3) should remain.
        Assert.IsTrue(result.All(item => item % 2 != 0));
        Assert.AreEqual(1, result.First(), "first item in result is incorrect");
        Assert.AreEqual(3, result.Last(), "last item in result is incorrect");
    }
}
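One case the tests above do not exercise is a value-type item that equals default(T) and is present in both sets. A filter built on DefaultIfEmpty() needs care here, because the placeholder it produces for an unmatched item (0 for int) cannot be told apart from a real match whose value is 0. The sketch below contrasts two variants under hypothetical names (ExceptByDefaultCheck and ExceptByEmptyGroup are illustrative, not library methods): one compares the joined value with default(T), the other tests whether the joined group is empty.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class for illustration only.
public static class ExceptEdgeCaseSketch
{
    // Variant 1: treats a joined value equal to default(T) as "no match".
    public static IEnumerable<T> ExceptByDefaultCheck<T>(IEnumerable<T> items,
                                                         IEnumerable<T> other)
    {
        return from item in items
               join otherItem in other on item equals otherItem into tempItems
               from temp in tempItems.DefaultIfEmpty()
               where object.ReferenceEquals(null, temp) || temp.Equals(default(T))
               select item;
    }

    // Variant 2: an item is kept only when its joined group is empty.
    public static IEnumerable<T> ExceptByEmptyGroup<T>(IEnumerable<T> items,
                                                       IEnumerable<T> other)
    {
        return from item in items
               join otherItem in other on item equals otherItem into tempItems
               where !tempItems.Any()
               select item;
    }
}
```

With items {0, 1, 2} and other {0, 2}, the first variant returns 0 and 1 (the matched 0 is kept by mistake), while the second returns only 1.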
Providing a comparison method
Next, we'll create an overload that takes a lambda expression that selects the key used to compare the items in the two sets. This allows you to compare them by something other than the items' natural equality.
[NotNull]
public static IEnumerable<T> Except<T, TKey>([NotNull] this IEnumerable<T> items,
    [NotNull] IEnumerable<T> other,
    [NotNull] Func<T, TKey> getKey)
{
    // Keep the items whose key matches no key in 'other' (the joined group is empty).
    return from item in items
           join otherItem in other on getKey(item)
               equals getKey(otherItem) into tempItems
           where !tempItems.Any()
           select item;
}
The overloaded method can be tested with:
public class TestItem
{
    public string Name { get; set; }
}

[Test]
public void Should_return_only_those_items_that_are_not_in_the_other_set()
{
    List<TestItem> input = new List<TestItem>
    {
        new TestItem {Name = "cat"},
        new TestItem {Name = "ran"},
        new TestItem {Name = "fast"}
    };
    List<TestItem> other = new List<TestItem>
    {
        new TestItem {Name = "dog"},
        new TestItem {Name = "ran"},
        new TestItem {Name = "too"},
        new TestItem {Name = "slow"}
    };
    IEnumerable<TestItem> result = input.Except(other, item => item.Name);
    Assert.IsNotNull(result, "result should never be null");
    Assert.AreEqual(2, result.Count(), "count does not match");
    Assert.AreEqual("cat", result.First().Name, "first item in result is incorrect");
    Assert.AreEqual("fast", result.Last().Name, "last item in result is incorrect");
}
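The key selector does not have to return a property as-is; it can compute the key. As a rough sketch, assuming a keyed overload like the one above (repeated here under the hypothetical name ExceptByKey so the example compiles on its own), upper-casing the key makes the exclusion case-insensitive:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class for illustration only.
public static class KeyedExceptSketch
{
    // Stand-in for the keyed overload, so this sketch is self-contained.
    public static IEnumerable<T> ExceptByKey<T, TKey>(
        this IEnumerable<T> items, IEnumerable<T> other, Func<T, TKey> getKey)
    {
        return from item in items
               join otherItem in other on getKey(item) equals getKey(otherItem) into tempItems
               where !tempItems.Any()
               select item;
    }

    // Both sides are compared by their upper-cased form, so "ran" matches "RAN".
    public static List<string> ExceptIgnoringCase(IEnumerable<string> items,
                                                  IEnumerable<string> other)
    {
        return items.ExceptByKey(other, s => s.ToUpperInvariant()).ToList();
    }
}
```

For example, excluding {"RAN", "dog"} from {"Cat", "ran", "FAST"} this way leaves "Cat" and "FAST", with the original casing of the kept items preserved.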
Exclusion with different types
The last and most flexible overload we'll add allows the sets to contain different types. For example, you might have users in the main set, but only the IDs of the ones that are administrators in the comparison set, and you might want to be able to get the users that are not administrators. This overload provides that capability:
[NotNull]
public static IEnumerable<T> Except<T, TOther, TKey>(
    [NotNull] this IEnumerable<T> items,
    [NotNull] IEnumerable<TOther> other,
    [NotNull] Func<T, TKey> getItemKey,
    [NotNull] Func<TOther, TKey> getOtherKey)
{
    // Keep the items whose key matches no key in 'other' (the joined group is empty).
    return from item in items
           join otherItem in other on getItemKey(item)
               equals getOtherKey(otherItem) into tempItems
           where !tempItems.Any()
           select item;
}
Test usage is as follows:
public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

[Test]
public void Should_return_only_those_items_that_are_not_in_the_other_set()
{
    List<User> users = new List<User>
    {
        new User {Id = 1, Name = "Maria"},
        new User {Id = 2, Name = "ZiYi"},
        new User {Id = 3, Name = "Altair"}
    };
    List<int> administratorIds = new List<int> {2, 4, 6};
    IEnumerable<User> result = users.Except(administratorIds,
        item => item.Id, administratorId => administratorId);
    Assert.IsNotNull(result, "result should never be null");
    Assert.AreEqual(2, result.Count(), "count does not match");
    Assert.AreEqual("Maria", result.First().Name, "first item in result is incorrect");
    Assert.AreEqual("Altair", result.Last().Name, "last item in result is incorrect");
}
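Because the result is still an IEnumerable<T>, the mixed-type exclusion can sit in the middle of a longer LINQ chain. The following sketch illustrates that; Account, ExceptByKeys and NonAdministratorNames are hypothetical names chosen for this example, and ExceptByKeys is only a stand-in for an overload shaped like the one above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class for illustration only.
public static class MixedTypeExceptSketch
{
    // Hypothetical record type used only by this sketch.
    public class Account
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    // Stand-in for the mixed-type overload, so this sketch is self-contained.
    public static IEnumerable<T> ExceptByKeys<T, TOther, TKey>(
        this IEnumerable<T> items,
        IEnumerable<TOther> other,
        Func<T, TKey> getItemKey,
        Func<TOther, TKey> getOtherKey)
    {
        return from item in items
               join otherItem in other
                   on getItemKey(item) equals getOtherKey(otherItem) into tempItems
               where !tempItems.Any()
               select item;
    }

    // Exclude administrators by ID, then sort and project in the same chain.
    public static List<string> NonAdministratorNames(IEnumerable<Account> accounts,
                                                     IEnumerable<int> administratorIds)
    {
        return accounts
            .ExceptByKeys(administratorIds, account => account.Id, id => id)
            .OrderBy(account => account.Name, StringComparer.Ordinal)
            .Select(account => account.Name)
            .ToList();
    }
}
```

With the three users from the test above and administrator IDs {2, 4, 6}, NonAdministratorNames returns "Altair" followed by "Maria".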
History
- 2008-11-30 - Initial CodeProject publication.
- 2008-11-23 - Initial blog entry.