Introduction
The System.Collections
namespace in the .NET Framework provides a number of collection types that are extremely useful for manipulating data in memory. However, there is one type of collection that is conspicuously missing from System.Collections
: the Set
.
A Set
is a collection that contains no duplicate elements. It is loosely modelled after the mathematical concept of a "set." This implementation is based on the Java Set
interface definition, so if you are also a Java programmer, this may seem familiar. The major differences are that this library does not use interfaces, and it provides a number of "standard" Set
operators that the Java library neglected to include.
Sets come in handy when an Array
or a List
won't quite fit the bill. Arrays in .NET have a fixed length, making it tedious to add and remove elements. Lists allow you add new objects easily, but you can have numerous duplicate elements, which is undesirable for some types of problems. Searching Arrays
or Lists
for elements is just plain slow for large data sets, requiring a linear search. You could keep the array sorted and use a binary search, but that is often more trouble than it is worth (especially since this library, and the .NET Framework, provide better ways that are already written for you).
With sets, adding elements, removing elements, and checking for the existence of an element is fast and simple. You can mix and match the elements in different sets using the supported mathematical set operators: union
, intersection
, exclusive-or
, and minus
. See the example below for more information.
You will see some interesting side effects with different Set
implementations in this library, depending on the underlying search algorithm. For example, if you choose a sort-based Set
, the elements will come out in sort order when you iterate using foreach
. If you use a hash-based Set
, the elements will come out in no particular order, but checking for inclusion will fastest when dealing with large data sets. If you use a list-based Set
, elements will come out in the order you put them in when you iterate. Additionally, list-based sets are fastest for very small data sets (up to about 10 elements), but get slower very quickly as the number of contained elements increases. To get the best of both worlds, the library provides a Set
type that uses lists for small data sets and switches to a hash-based algorithm when the data set gets large enough to warrant it.
Using the code
The Iesi.Collections
library has the following object hierarchy:
Set
- The abstract class from which all other sets inherit.
DictionarySet
- An abstract class that lets you create new types of Set
classes based on an arbitrary implementation of IDictionary
.
HashedSet
- Based on a HashTable
.
SortedSet
- Based on a SortedList
.
ListSet
- Based on a ListDictionary
.
HybridSet
- Based on a HybridDictionary
.
ImmutableSet
- A wrapper for other Sets
that makes them read-only.
SynchronizedSet
- A wrapper for other Sets
that makes them (mostly) thread-safe.
You will probably find the HashedSet
and HybridSet
to be the most useful implementations. They can contain any object that is immutable, can be compared using Equals()
, and has a valid implementation of GetHashCode()
. All of the normal value types and many objects already meet these requirements, so for most data types, HashedSet
and HybridSet
just work. The only downside to using them is that you can't predict the order of iteration.
SortedSet
is useful if you are interested in the iteration order of your Set
, but it imposes some different requirements, and they are usually more difficult to meet. In addition to being immutable, elements in a SortedSet
must also implement IComparable
. Further, they must actually be comparable without throwing an exception. So you would not be able to put string
values and int
values into the same Set
instance.
ListSet
is useful for very small data sets. When a ListSet
contains less than 10 elements, it is actually going to be faster than any of the other implementations. However, once you get above 10 elements, the run time for many of the set operations increases as the square of the data size. So an operation on a ListSet
containing 1,000 elements would be roughly 10,000 times slower than an operation on a ListSet
containing 10 items.
ImmutableSet
and SynchronizedSet
are specialized wrappers. They contain other sets, which then do all the real work. ImmutableSet
wraps an internal Set
to make it read-only. SynchronizedSet
wraps all the functions of an internal Set
to synchronize them. This allows the Set
to be used by more than one thread. See the documentation for more information on this, since there are special considerations for enumerating collections that are in use by multiple threads.
If you are interested in creating your own Set
types, this library supports that. If you want to do it from scratch, you can extend Set
and implement all the abstract functions. If you want to create a new Set
type based on an existing IDictionary
implementation, extend DictionarySet
. If you just want to add some new functionality to an existing, working Set
implementation, choose one of HashedSet
, HybridSet
, ListSet
, or SortedSet
to extend.
The example below demonstrates creating new sets and manipulating them using set operators. The currently supported set operators are described briefly in the following table:
Set Operators
Name |
Symbol |
Description |
Union() |
| |
A | B returns a Set containing all the elements from both input sets. |
Intersect() |
& |
A & B returns a Set containing all the elements that are in both A and B . |
ExclusiveOr() |
^ |
A ^ B returns a Set containing the elements that are in A or in B , but are not in both A and B . |
Minus() |
- |
A - B returns a Set containing all the elements of A , with the elements from B removed. |
Equals() |
|
You can use Equals() to compare two Set instances for equivalence. That is, A.Equals(B) is true if A and B contain exactly the same elements. The '==' and '!=' operators are not overridden by design. |
The example uses Set
instances to represent states in the southwestern United States. Each Set
holds the names of the major rivers in the state. It then uses the basic state-river information to derive all sorts of fun facts about the rivers.
using System;
using Iesi.Collections;
namespace RiverDemo
{
class Rivers
{
[STAThread]
static void Main(string[] args)
{
Set arizona
= new SortedSet(new string[] {"Colorado River"});
Set california
= new SortedSet(new string[] {"Colorado River", "Sacramento River"});
Set colorado
= new SortedSet(new string[] {"Arkansas River",
"Colorado River", "Green River", "Rio Grande"});
Set kansas
= new SortedSet(new string[] {"Arkansas River", "Missouri River"});
Set nevada
= new SortedSet(new string[] {"Colorado River"});
Set newMexico
= new SortedSet(new string[] {"Rio Grande"});
Set utah
= new SortedSet(new string[] {"Colorado River", "Green River",
"San Juan River"});
Set southWest = colorado | newMexico | arizona | utah;
Set midWest = kansas;
Set west = california | nevada;
Set all = southWest | midWest | west;
Print("All rivers:", all);
Print("Rivers in the southwest:", southWest);
Print("Rivers in the west:", west);
Print("Rivers in the midwest:", midWest);
Console.WriteLine();
Print("Of all rivers, these don't pass through Colorado:",all - colorado);
Print("Rivers common to both Colorado and Utah:", colorado & utah);
Print("Rivers in Colorado and Utah that are not shared by both states:",
colorado ^ utah);
Print("Rivers common to Arizona, California, Colorado, Nevada, and Utah:",
arizona & california & colorado & nevada & utah);
Console.WriteLine("The type of this complex operation is: " +
((southWest ^ colorado & california) | kansas).GetType().FullName);
}
private static void Print(string title, Set elements)
{
Console.WriteLine(title);
foreach(object o in elements)
{
Console.WriteLine("\t" + o);
Console.WriteLine();
}
}
}
Although there are other kinds of sets available in the library, the example uses SortedSet
throughout. This is nice for the example, since everything will print neatly in alphabetical order. But you may be wondering what kind of Set
is returned when you "union," "intersect," "exclusive-or," or "minus" two Set
instances. The library always returns a Set
that is the same type as the Set
on the left, unless the left operand is null
, in which case it returns the type of the Set
on the right.
What this means is that since we are using SortedSet
instances, we will always get SortedSet
instances when we combine sets using the binary operators. So the output in our example will always be in alphabetical order, just as you would expect.
Here is the output from running the example:
All rivers:
Arkansas River
Colorado River
Green River
Missouri River
Rio Grande
Sacramento River
San Juan River
Rivers in the southwest:
Arkansas River
Colorado River
Green River
Rio Grande
San Juan River
Rivers in the west:
Colorado River
Sacramento River
Rivers in the midwest:
Arkansas River
Missouri River
Of all rivers, these don't pass through Colorado:
Missouri River
Sacramento River
San Juan River
Rivers common to both Colorado and Utah:
Colorado River
Green River
Rivers in Colorado and Utah that are not shared by both states:
Arkansas River
Rio Grande
San Juan River
Rivers common to Arizona, California, Colorado, Nevada, and Utah:
Colorado River
The type of this complex operation is:
Iesi.Collections.SortedSet
Press any key to continue
There is an additional example and a lot more technical information included in the documentation. The source code is pretty easy to follow as well. All the hard work of searching and sorting is performed by classes that are already present in the .NET Framework, so none of the code is particularly difficult or tricky. If you have a question that is not covered by the documentation, it should not take more than 5 or 10 minutes to discover the answer by reading the code.
Enjoy!