(untagged)

Add Support for "Set" Collections to .NET

JasonSmith

0.00/5 (No votes)

28 Mar 2004

An implementation of "Sets" for .NET

Introduction

The System.Collections namespace in the .NET Framework provides a number of collection types that are extremely useful for manipulating data in memory. However, there is one type of collection that is conspicuously missing from System.Collections: the Set.

A Set is a collection that contains no duplicate elements. It is loosely modelled after the mathematical concept of a "set." This implementation is based on the Java Set interface definition, so if you are also a Java programmer, this may seem familiar. The major differences are that this library does not use interfaces, and it provides a number of "standard" Set operators that the Java library neglected to include.

Sets come in handy when an Array or a List won't quite fit the bill. Arrays in .NET have a fixed length, making it tedious to add and remove elements. Lists allow you add new objects easily, but you can have numerous duplicate elements, which is undesirable for some types of problems. Searching Arrays or Lists for elements is just plain slow for large data sets, requiring a linear search. You could keep the array sorted and use a binary search, but that is often more trouble than it is worth (especially since this library, and the .NET Framework, provide better ways that are already written for you).

With sets, adding elements, removing elements, and checking for the existence of an element is fast and simple. You can mix and match the elements in different sets using the supported mathematical set operators: union, intersection, exclusive-or, and minus. See the example below for more information.

You will see some interesting side effects with different Set implementations in this library, depending on the underlying search algorithm. For example, if you choose a sort-based Set, the elements will come out in sort order when you iterate using foreach. If you use a hash-based Set, the elements will come out in no particular order, but checking for inclusion will fastest when dealing with large data sets. If you use a list-based Set, elements will come out in the order you put them in when you iterate. Additionally, list-based sets are fastest for very small data sets (up to about 10 elements), but get slower very quickly as the number of contained elements increases. To get the best of both worlds, the library provides a Set type that uses lists for small data sets and switches to a hash-based algorithm when the data set gets large enough to warrant it.

Using the code

The Iesi.Collections library has the following object hierarchy:

Set - The abstract class from which all other sets inherit.
- DictionarySet - An abstract class that lets you create new types of Set classes based on an arbitrary implementation of IDictionary.
  - HashedSet - Based on a HashTable.
  - SortedSet - Based on a SortedList.
  - ListSet - Based on a ListDictionary.
  - HybridSet - Based on a HybridDictionary.
- ImmutableSet - A wrapper for other Sets that makes them read-only.
- SynchronizedSet - A wrapper for other Sets that makes them (mostly) thread-safe.

You will probably find the HashedSet and HybridSet to be the most useful implementations. They can contain any object that is immutable, can be compared using Equals(), and has a valid implementation of GetHashCode(). All of the normal value types and many objects already meet these requirements, so for most data types, HashedSet and HybridSet just work. The only downside to using them is that you can't predict the order of iteration.

SortedSet is useful if you are interested in the iteration order of your Set, but it imposes some different requirements, and they are usually more difficult to meet. In addition to being immutable, elements in a SortedSet must also implement IComparable. Further, they must actually be comparable without throwing an exception. So you would not be able to put string values and int values into the same Set instance.

ListSet is useful for very small data sets. When a ListSet contains less than 10 elements, it is actually going to be faster than any of the other implementations. However, once you get above 10 elements, the run time for many of the set operations increases as the square of the data size. So an operation on a ListSet containing 1,000 elements would be roughly 10,000 times slower than an operation on a ListSet containing 10 items.

ImmutableSet and SynchronizedSet are specialized wrappers. They contain other sets, which then do all the real work. ImmutableSet wraps an internal Set to make it read-only. SynchronizedSet wraps all the functions of an internal Set to synchronize them. This allows the Set to be used by more than one thread. See the documentation for more information on this, since there are special considerations for enumerating collections that are in use by multiple threads.

If you are interested in creating your own Set types, this library supports that. If you want to do it from scratch, you can extend Set and implement all the abstract functions. If you want to create a new Set type based on an existing IDictionary implementation, extend DictionarySet. If you just want to add some new functionality to an existing, working Set implementation, choose one of HashedSet, HybridSet, ListSet, or SortedSet to extend.

The example below demonstrates creating new sets and manipulating them using set operators. The currently supported set operators are described briefly in the following table:

Set Operators

Name	Symbol	Description
`Union()`	\|	`A \| B` returns a `Set` containing all the elements from both input sets.
`Intersect()`	&	`A & B` returns a `Set` containing all the elements that are in both `A` and `B`.
`ExclusiveOr()`	^	`A ^ B` returns a `Set` containing the elements that are in `A` or in `B`, but are not in both `A` and `B`.
`Minus()`	-	`A - B` returns a `Set` containing all the elements of `A`, with the elements from `B` removed.
`Equals()`		You can use `Equals()` to compare two `Set` instances for equivalence. That is, `A.Equals(B)` is true if `A` and `B` contain exactly the same elements. The '==' and '!=' operators are not overridden by design.

The example uses Set instances to represent states in the southwestern United States. Each Set holds the names of the major rivers in the state. It then uses the basic state-river information to derive all sorts of fun facts about the rivers.

using System;
using Iesi.Collections;
namespace RiverDemo
{    
class Rivers
{
   [STAThread]
   static void Main(string[] args)
   {
     //Use Arrays (which are ICollection objects) to quickly initialize.

     Set arizona   
        = new SortedSet(new string[] {"Colorado River"});
     Set california
        = new SortedSet(new string[] {"Colorado River", "Sacramento River"});
     Set colorado
          = new SortedSet(new string[] {"Arkansas River", 
                      "Colorado River", "Green River", "Rio Grande"});
     Set kansas
            = new SortedSet(new string[] {"Arkansas River", "Missouri River"});
     Set nevada
            = new SortedSet(new string[] {"Colorado River"});
     Set newMexico
         = new SortedSet(new string[] {"Rio Grande"});
     Set utah
              = new SortedSet(new string[] {"Colorado River", "Green River", 
                              "San Juan River"});
     //Rivers by region.

     Set southWest = colorado | newMexico | arizona | utah;
     Set midWest = kansas;
     Set west = california | nevada;
     //All rivers (at least for the demo).

    Set all = southWest | midWest | west;
    Print("All rivers:", all);
    Print("Rivers in the southwest:", southWest);
    Print("Rivers in the west:", west);
    Print("Rivers in the midwest:", midWest);
    Console.WriteLine();

    //Use the '-' operator to subtract the rivers in Colorado from 

    //the set of all rivers.

    Print("Of all rivers, these don't pass through Colorado:",all - colorado);

   //Use the '&' operator to find rivers that are in Colorado AND in Utah.

   //A river must be present in both states, not just one.

   Print("Rivers common to both Colorado and Utah:", colorado & utah);

   //use the '^' operator to find rivers that are in Colorado OR Utah,

   //but not in both.

   Print("Rivers in Colorado and Utah that are not shared by both states:",
          colorado ^ utah);

   //Use the '&' operator to discover which rivers are present in Arizona, 

   // California,Colorado, Nevada, and Utah.  The river must be present in 

   // all states to be counted.

   Print("Rivers common to Arizona, California, Colorado, Nevada, and Utah:", 
         arizona & california & colorado & nevada & utah);
   //Just to prove to you that operators always return like types, let's do a

   //complex Set operation and look at the type of the result:

   Console.WriteLine("The type of this complex operation is: " + 
   ((southWest ^ colorado & california) | kansas).GetType().FullName);
}
    private static void Print(string title, Set elements)
    {
       Console.WriteLine(title);
       foreach(object o in elements)
       {
           Console.WriteLine("\t" + o);
           Console.WriteLine();
       }    
    }
}

Although there are other kinds of sets available in the library, the example uses SortedSet throughout. This is nice for the example, since everything will print neatly in alphabetical order. But you may be wondering what kind of Set is returned when you "union," "intersect," "exclusive-or," or "minus" two Set instances. The library always returns a Set that is the same type as the Set on the left, unless the left operand is null, in which case it returns the type of the Set on the right.

What this means is that since we are using SortedSet instances, we will always get SortedSet instances when we combine sets using the binary operators. So the output in our example will always be in alphabetical order, just as you would expect.

Here is the output from running the example:

All rivers:
	Arkansas River
	Colorado River
	Green River
	Missouri River
	Rio Grande
	Sacramento River
	San Juan River

Rivers in the southwest:
	Arkansas River
	Colorado River
	Green River
	Rio Grande
	San Juan River

Rivers in the west:
	Colorado River
	Sacramento River

Rivers in the midwest:
	Arkansas River
	Missouri River

Of all rivers, these don't pass through Colorado:
	Missouri River
	Sacramento River
	San Juan River

Rivers common to both Colorado and Utah:
	Colorado River
	Green River

Rivers in Colorado and Utah that are not shared by both states:
	Arkansas River
	Rio Grande
	San Juan River

Rivers common to Arizona, California, Colorado, Nevada, and Utah:
	Colorado River

The type of this complex operation is: 
	Iesi.Collections.SortedSet
Press any key to continue

There is an additional example and a lot more technical information included in the documentation. The source code is pretty easy to follow as well. All the hard work of searching and sorting is performed by classes that are already present in the .NET Framework, so none of the code is particularly difficult or tricky. If you have a question that is not covered by the documentation, it should not take more than 5 or 10 minutes to discover the answer by reading the code.

Enjoy!

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

Add Support for &quot;Set&quot; Collections to .NET

Introduction

Using the code

Set Operators

License

Add Support for "Set" Collections to .NET