Know your collections: from IEnumerable to List and beyond

27 Oct 2014, CPOL, 11 min read
Sorting out the confusion behind collections

Introduction

Many times I've seen method parameters and return values that made no sense, and I've hated the fact that changing a collection type in one place forced me to start changing method signatures in other places.

I've been wanting to write this article for a while, to help dispel the mystique that surrounds some of the collections and their meaning.

Note: you can jump straight to the comparison images at the end, in the "Visual aid and summing it all up" section.

A word about Generics

Generics were introduced in .Net 2.0. Where the old non-generic list is ArrayList, a generic list looks like List<T>, where T is a type (in the following examples, I'm using Toy as the type). The generic version only accepts members of the type you specified, so if you define a list of ints, List<int>, you cannot add a string to it, only ints.

In this article I'm talking mainly about the generic versions, but I'll point out the non-generic ones to avoid confusion and to clarify my meaning (Cheers Georgi).
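To see the difference in practice, here is a minimal sketch comparing the non-generic ArrayList with List<int>. GenericsDemo and its methods are hypothetical names made up for this illustration, and -1 is just an arbitrary "it blew up" marker.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class GenericsDemo
{
    public static int SumArrayList()
    {
        // Non-generic: ArrayList stores everything as object, so this compiles...
        ArrayList numbers = new ArrayList();
        numbers.Add(1);
        numbers.Add(2);
        numbers.Add("three"); // ...even though "three" isn't an int

        int sum = 0;
        try
        {
            foreach (object item in numbers)
            {
                sum += (int)item; // ...and only fails here, at run time
            }
        }
        catch (InvalidCastException)
        {
            return -1; // hypothetical "something went wrong" marker
        }
        return sum;
    }

    public static int SumList()
    {
        // Generic: List<int> only accepts ints.
        List<int> numbers = new List<int> { 1, 2 };
        // numbers.Add("three"); // would not compile: cannot convert string to int

        int sum = 0;
        foreach (int n in numbers)
        {
            sum += n; // no cast needed
        }
        return sum;
    }
}
```

With the generic list, the mistake is caught by the compiler instead of surfacing as an InvalidCastException at run time.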

Our "goto" analogy

Let's assume a scenario involving a father (you as a developer), a son, an unwanted operation (cleaning the room), and a dangling carrot (a promised reward).

The general gist goes something like this: The father promises the son a toy if he cleans his room (which obviously the son is reluctant to do).

Note: Please, do not take parenting advice from this article. I will not be held responsible for that.

Warming the engines

A collection is a bunch of related objects. In order not to confuse it with the actual MSDN Collection class, I'll use italics for the concept. Treating related objects as a collection of that type makes our lives easier and simpler.

Let's assume we have the following Toy struct:

C#
struct Toy
{
    public string     Name       { get; set; }
    public int        Cost       { get; set; }
    public NoiseLevel NoiseLevel { get; set; }
}    

along with this enumeration:

C#
enum NoiseLevel
{
    Quiet     , // You'll never hear it
    Normal    , // It's a toy
    Noisy     , // You'll secretly take the batteries away
    BanFriend , // If somebody gave this to your offspring, you'll never talk to them again
}

If we're dealing with a collection of toys (either buying, selling, finding the most expensive, or deciding which friends are allowed to be invited for dinner), we can bunch them together so it'll be easier to work with them.

Here are some collections we can use:

C#
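using System.Collections.Generic;      // List<T>
using System.Collections.ObjectModel;  // Collection<T>

Toy[] toyArray = new Toy[10];                           // 10 default Toy values
Collection<Toy> toyCollection = new Collection<Toy>(); // Generic, empty
List<Toy> toyList = new List<Toy>(10);                 // Generic, empty, initial capacity of 10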

IEnumerable<T> and IEnumerator<T> (and non generic counterparts)

The usual use for a collection is to do something with its members. The most basic interface for collections is IEnumerable<T>. Its only method is GetEnumerator, which returns an IEnumerator<T> that lets you traverse your collection.

This is the most "abstract" notion of a collection and it only lets you go through it.

If you're not familiar with IEnumerator, it's similar to having a counter in a simple for loop like the following:

C#
Toy[] toyArray = new Toy[10];
for (int counter = 0; counter < toyArray.Length; counter++)
{
    toyArray[counter] = new Toy();
}

I do suggest reading the MSDN page on IEnumerator, though.

IEnumerator has a Current property (what am I pointing at?) and two methods: MoveNext and Reset.

So, if you're dealing with anything that implements IEnumerable (generic or not), you know you can get the current object, and you can either move forward or start over from the beginning (although many enumerators, including the ones the compiler generates for iterator methods, throw NotSupportedException from Reset). The simplest useful example that comes to mind is the Enumerable.Range method:

C#
var one_to_ten = Enumerable.Range(1, 10);

This returns an IEnumerable<int> holding the numbers 1 to 10 (note: it is lazily evaluated, so nothing is generated until you actually iterate over it).
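Here is a small sketch of roughly what a foreach loop does with an enumerator behind the scenes (simplified; the compiler's real expansion handles a few more cases). EnumeratorDemo and its methods are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

static class EnumeratorDemo
{
    // Sums a sequence by driving its enumerator by hand,
    // which is roughly what "foreach (var n in numbers)" does for you.
    public static int SumWithEnumerator(IEnumerable<int> numbers)
    {
        int total = 0;
        using (IEnumerator<int> e = numbers.GetEnumerator())
        {
            // The enumerator starts *before* the first item. MoveNext()
            // advances it and returns false once the sequence is exhausted.
            while (e.MoveNext())
            {
                total += e.Current; // Current is the item we're pointing at
            }
        }
        return total;
    }

    // Enumerable.Range is lazy: the numbers 1..10 are only produced
    // while SumWithEnumerator pulls them through MoveNext().
    public static int SumOneToTen()
    {
        IEnumerable<int> one_to_ten = Enumerable.Range(1, 10);
        return SumWithEnumerator(one_to_ten);
    }
}
```

SumOneToTen() should return 55, and the same method works unchanged for an array, a List<int> or any other IEnumerable<int>.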

A (somewhat sexist) example would be a woman with shoes. She doesn't care how many pairs she has, as long as she can get through all of them, picking the next pair every day. Since I feel bad about this, the male counterpart would be a man with video games. There are thousands out there, but as long as he can get the next one and play it, he's happy.

Note: At this level, the only real difference between the generic and non-generic versions is that the non-generic version has 4 extension methods (AsParallel, AsQueryable, Cast<TResult> and OfType<TResult>), while the generic version has many more (enough that I won't list them here). Feel free to have a look at the MSDN page, or in IntelliSense.

You can read more on the official MSDN pages: Generic version, Non-Generic version.

ICollection<T>

This is the next step up. It extends IEnumerable<T> (and, through it, IEnumerable) and adds two properties: Count and IsReadOnly.

The new methods you can use with it are: Add, Clear, Contains, CopyTo and Remove.

This is a bit more tangible than an IEnumerable<T>: it's something you can play with, add to, remove from, and check to see whether an object exists within it. If you don't care where in the collection your objects are, this is what you'll want to use.

An example would be eggs in an egg tray. While baking, you want to know that you have 6 in total, and you want to go through them and add them to the dough, but you don't care about their actual position or the order in which they go in.
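A minimal sketch of the egg tray, assuming eggs are plain strings (EggTrayDemo and UseEggs are hypothetical names). Because the method only asks for ICollection<string>, it doesn't care whether it receives a List<string>, a HashSet<string> or anything else implementing that interface, and it never asks where an egg sits.

```csharp
using System.Collections.Generic;

static class EggTrayDemo
{
    // Only Count, Add, Contains and Remove are used here,
    // and none of them care about the position of an item.
    public static int UseEggs(ICollection<string> eggTray)
    {
        eggTray.Add("spare egg");            // add one more egg to the tray

        if (eggTray.Contains("cracked egg")) // is this egg in the tray at all?
        {
            eggTray.Remove("cracked egg");   // throw it away, wherever it is
        }

        return eggTray.Count;                // how many eggs go into the dough
    }
}
```

Calling it with a List<string> of five eggs, one of them "cracked egg", should give back 5: one spare egg added, the cracked one removed.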

The non-generic ICollection interface is not very useful in my opinion. It adds the Count, IsSynchronized and SyncRoot properties, and only one method, CopyTo(), which lets you copy your collection into an array.

You can read more on the official pages on MSDN: Generic version, Non-Generic version.

IList<T>

This interface extends both IEnumerable<T> and ICollection<T>, and adds one property: Item (the indexer, which lets you write list[index]).

The new methods you can use are: IndexOf, Insert, RemoveAt.

This interface gives you an even more "real" feeling of the collection, and you can now see what's where, add at specific indexes, and remove specific items by index.

An example: you are playing cards with your friends, and you know the fifth card in the deck is a joker. Knowing it's in the deck is not enough. If you're unscrupulous, you'll use RemoveAt(4) (indexes are zero-based) next time you draw a card for yourself.
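A sketch of the unscrupulous card player, assuming cards are plain strings and the deck isn't empty (CardDeckDemo and StackAndDraw are hypothetical names). It uses the three IList<T> additions, IndexOf, RemoveAt and Insert, plus the indexer.

```csharp
using System.Collections.Generic;

static class CardDeckDemo
{
    // Works with any IList<string>, such as List<string> or Collection<string>.
    // (A string[] also implements IList<string>, but it is fixed-size, so the
    // RemoveAt/Insert calls below would throw NotSupportedException on it.)
    // Assumes the deck is not empty.
    public static string StackAndDraw(IList<string> deck)
    {
        int position = deck.IndexOf("Joker"); // zero-based index, -1 if absent
        if (position > 0)
        {
            deck.RemoveAt(position);          // e.g. RemoveAt(4) for the fifth card
            deck.Insert(0, "Joker");          // ...and put it back on top
        }

        string drawn = deck[0];               // the Item property (indexer)
        deck.RemoveAt(0);                     // the drawn card leaves the deck
        return drawn;
    }
}
```

With the joker in fifth place, StackAndDraw moves it to the top and draws it; the rest of the deck keeps its original order.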

The non-generic interface extends both IEnumerable and ICollection, and adds the following properties: IsFixedSize, IsReadOnly and Item.

Method-wise, the non-generic interface looks like its generic counterpart (7 new methods: Add, Clear, Contains, IndexOf, Insert, Remove and RemoveAt), but 4 of them (Add, Clear, Contains and Remove) are simply the ones that were missing from the non-generic ICollection.

You can read more on the official page on MSDN: generic version and non-generic version.

ISet<T>

This interface provides methods for implementing sets, which are collections that have unique elements and specific operations. The HashSet<T> and SortedSet<T> collections implement this interface.

This is useful for mathematical problems where you're dealing with sets. Head over to the ISet<T> page on MSDN for more, and do note that no non-generic version of this interface exists (as in, there is no ISet).
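A minimal sketch of a set operation through ISet<T>, assuming toys are identified by name (SetDemo and CommonToys are hypothetical names), with HashSet<T> as the implementation.

```csharp
using System.Collections.Generic;

static class SetDemo
{
    // Toy names stocked by both shops: a set intersection through ISet<string>.
    public static ISet<string> CommonToys(IEnumerable<string> shopA, IEnumerable<string> shopB)
    {
        // HashSet<T> implements ISet<T>; building it from shopA drops duplicates.
        ISet<string> common = new HashSet<string>(shopA);

        // Keep only the elements that also appear in shopB.
        common.IntersectWith(shopB);
        return common;
    }
}
```

With { "Duck", "Blocks", "Duck" } and { "Blocks", "Drums" }, the result should contain only "Blocks". ISet<T>.Add is also worth knowing: it returns false, instead of adding a duplicate, when the item is already in the set.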

I'm including it because the comparison screenshots show how it differs from the other collections we've discussed.

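Enumerable, Collection<T>, List<T>

These are the classes you'll usually work with. Collection<T> and List<T> simply implement the interfaces we discussed before (both implement IList<T>, and therefore ICollection<T> and IEnumerable<T>), while Enumerable is a static class in System.Linq that provides the extension methods (OrderBy, Where, Range, Empty and many more) you can use on any IEnumerable<T>.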

Array

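An array is a class (System.Array) that implements ICloneable, IList, ICollection, IEnumerable, IStructuralComparable and IStructuralEquatable. On top of that, single-dimensional arrays such as Toy[] also implement IList<T>, ICollection<T> and IEnumerable<T>, which is why you can pass a Toy[] wherever an IEnumerable<Toy> is expected.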

It is NOT part of the System.Collections namespace (Array lives in System), and its size is fixed. Arrays are probably the oldest form of "collections", but they have some subtle issues, since their implementation changed across different versions of .Net. Arrays aren't generic, but in a way they kind of are: a Toy[] only holds Toys. For a long (yet interesting) answer about why there are no generic arrays, head over to this question on Stack Overflow.

You'll usually want an array when dealing with a fixed number of objects. Let's say you are only allowed to borrow 5 books from the library: a Book[] of size 5 would be a nice way to represent that. Another case is when you get the array back from a method or a library.

According to MSDN, another good case is when you don't need Add and Remove.
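A sketch of the library example, using string titles instead of a Book type (LibraryDemo, BorrowBooks and TryBorrowOneMore are hypothetical names). It also shows the flip side of the fixed size: an array can be used through IList<T>, but calling Add on it throws NotSupportedException.

```csharp
using System;
using System.Collections.Generic;

static class LibraryDemo
{
    // Hypothetical rule: at most 5 books at a time, so a fixed-size array fits.
    public static string[] BorrowBooks()
    {
        string[] borrowed = new string[5]; // 5 slots, all null to begin with
        borrowed[0] = "The Hobbit";
        borrowed[1] = "Matilda";
        return borrowed;
    }

    // An array can be used through IList<string>, but its size can't change:
    // Add (like Insert, Remove and RemoveAt) throws NotSupportedException.
    public static bool TryBorrowOneMore(string[] borrowed, string title)
    {
        IList<string> asList = borrowed;
        try
        {
            asList.Add(title);
            return true;
        }
        catch (NotSupportedException)
        {
            return false; // still exactly 5 slots
        }
    }
}
```

TryBorrowOneMore always returns false for an array; if you really need room for a sixth book, you need a new, bigger array or a List<T>.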

You can read more on the official page on MSDN, and be sure to check the Array Usage Guidelines as well.

The rest of the bunch:

You can find the rest of the collection types in .Net on the System.Collections Namespace page.

Visual aid and summing it all up

Each property/method has the color of the parent collection where it was defined. For example, the Item property is green, since it was defined in IList<T>, whereas, in the same column, Count is blue, since it was defined in the ICollection<T> interface.

Here is the Generic comparison table:

[Image: generic collections comparison table]

Here is the Non Generic counterpart:

[Image: non-generic collections comparison table]

Taking off

Now that we've covered the basics, we'll go back to our "what to use when" issue, using our analogy.

If you're the father, and your son isn't willing to clean his room, you might be inclined to bribe him and offer him a reward for doing what he should have done anyway. Which seems like the better option:

  1. If you'll clean your room, I'll get you a toy.
  2. If you'll clean your room, I'll get you a set of Pearl Drums and a matching outfit.

Assuming your kid trusts you, and you went with option number 2, you're in trouble (unless you have the money to buy the drums and you have no neighbours). If you settled on option number 1, you can get him a nice unicorn rubber duck instead, ensuring he'll become an awesome developer when he grows up (if you don't know why, look up rubber duck debugging).

The moral of the story is: don't promise what you can't deliver, and try to be as vague as possible. In code, this means it's usually better to accept and return an interface rather than a concrete implementation. For example:

C#
// Good:

public IList<int> GetMeSomeInts() { return new List<int> { 1, 2, 3 }; }

// Not as good:

public List<int> GetMeSomeInts() { return new List<int> { 1, 2, 3 }; }

Keep in mind that this only holds if the caller needs what IList<T> gives you; otherwise, return an IEnumerable<T>.
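A small sketch of why promising less helps, using hypothetical PromiseDemo methods. Because the first two methods only promise IList<int>, the implementation can switch from a List<int> to an array without touching any caller. The third promises just IEnumerable<int>, which leaves room for a lazy iterator.

```csharp
using System.Collections.Generic;

static class PromiseDemo
{
    // Promises only IList<int>; today it happens to build a List<int>.
    public static IList<int> GetMeSomeInts()
    {
        return new List<int> { 1, 2, 3 };
    }

    // Same promise, different implementation: int[] also implements IList<int>,
    // so callers compile unchanged. Caveat: arrays are fixed-size, so a caller
    // that calls Add on the result would now get a NotSupportedException.
    public static IList<int> GetMeSomeIntsFromArray()
    {
        return new[] { 1, 2, 3 };
    }

    // If callers only iterate, promise even less. A lazy iterator like this
    // could not be returned as IList<int> at all.
    public static IEnumerable<int> GetMeSomeIntsLazily()
    {
        yield return 1;
        yield return 2;
        yield return 3;
    }
}
```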

Look daddy, I'm flying ...

So, you are coding ... you decide you want to have a bunch of toys, and do something with them. You might instinctively write something like:

C#
List<Toy> toyList = GetToyList();
Toy expensive = GetMostExpensiveToy(toyList);

where the method looks like this:

C#
public Toy GetMostExpensiveToy(List<Toy> aCollection)
{
    // Sort by Cost and take the last (most expensive) toy; needs using System.Linq
    return aCollection.OrderBy(toy => toy.Cost).Last();
}

Congratulations, you just promised your kid the drum set with all the whistles and bells.

Let's quickly analyse what happened here:

  1. You have a collection of items
  2. You pass them to a method that only iterates through the items, looking for something
  3. You return an object

If someone tries to call your method with an array... it won't compile. A Collection... fail... An IEnumerable... you guessed it... fail again. You locked yourself into passing a List, even though you don't really need one. If you had used any of the methods a list actually provides (IndexOf, Insert, RemoveAt), that would be another story, but in this instance, asking for a List was redundant.

A side effect of this kind of coding is having to call ToList() or similar methods on the objects you have, because the method you're trying to call takes a List<Something>, while you have, for example, a Something[] array.

A better approach would be to write your method like this:

C#
public Toy GetMostExpensiveToy(IEnumerable<Toy> aCollection)
{
    // Same logic; Last() throws InvalidOperationException if the collection is empty
    return aCollection.OrderBy(toy => toy.Cost).Last();
}

Now you can call it with an array, a Collection, a List, or anything else that implements IEnumerable<Toy>, and it will just work.
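To check that claim, here is a self-contained sketch that repeats the Toy struct and NoiseLevel enum from earlier and calls the IEnumerable<Toy> version of GetMostExpensiveToy with an array, a Collection<Toy>, a List<Toy> and a lazy LINQ query. ToyShop, CallWithEverything and the sample toys are made up for the example.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

enum NoiseLevel { Quiet, Normal, Noisy, BanFriend }

struct Toy
{
    public string     Name       { get; set; }
    public int        Cost       { get; set; }
    public NoiseLevel NoiseLevel { get; set; }
}

static class ToyShop
{
    // Only iterates, so it asks for the widest type that works: IEnumerable<Toy>.
    public static Toy GetMostExpensiveToy(IEnumerable<Toy> aCollection)
    {
        return aCollection.OrderBy(toy => toy.Cost).Last();
    }

    // Calls the same method with four different kinds of collection.
    public static string[] CallWithEverything()
    {
        Toy[] toyArray =
        {
            new Toy { Name = "Rubber duck", Cost = 5,   NoiseLevel = NoiseLevel.Quiet },
            new Toy { Name = "Drum set",    Cost = 500, NoiseLevel = NoiseLevel.BanFriend },
        };
        var toyCollection = new Collection<Toy>(toyArray.ToList());
        var toyList = new List<Toy>(toyArray);
        IEnumerable<Toy> quietToys = toyList.Where(toy => toy.NoiseLevel == NoiseLevel.Quiet);

        return new[]
        {
            GetMostExpensiveToy(toyArray).Name,      // Toy[]
            GetMostExpensiveToy(toyCollection).Name, // Collection<Toy>
            GetMostExpensiveToy(toyList).Name,       // List<Toy>
            GetMostExpensiveToy(quietToys).Name,     // lazy LINQ query
        };
    }
}
```

The first three calls should return "Drum set", and the query over quiet toys should return "Rubber duck", all without a single ToList() call on the caller's side.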

The same logic goes for return values. If you don't need a specific type, return a more general one (for example, return an IEnumerable instead of a list). It all depends on what you're actually trying to do with that collection.

Keeping this in mind will help you to actually use what you need, without paying the extra price for features you don't need, while having the extra benefit of keeping your design and code flexible and less prone to breaking due to changes.

Extra credits:

I highly suggest going through the MSDN Guidelines for Collections. They shed some light on the finer do's and don'ts of working with collections.

Returning empty collections

In some cases your method will have to return an empty collection. If you're buying toys for your own offspring, for example, you'll only be looking at toys with NoiseLevel.Quiet. Let's assume you are implementing a method for an online store that takes the toy's noise level as a parameter, so people can search for NoiseLevel.Quiet toys only.

If you're dealing with a List<Toy>, it's very easy to return an empty list. Just write something like return new List<Toy>(); and you're done.

What happens if you're trying to return an empty IEnumerable<Toy> though?

Two options come to mind.

  • You could write:
C#
return Enumerable.Empty<Toy>();
  • You could use yield break;
C#
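IEnumerable<Toy> GetQuietToys(NoiseLevel aNoiseLevel) {

    // CallDbHere() stands in for your own data access code
    var matching_products = CallDbHere();

    // Let's assume that there are no matching products, so we
    // return an empty IEnumerable<Toy> using yield:
    yield break;

    // For the caller, the above is the same as writing
    //
    //    return Enumerable.Empty<Toy>();
    //
    // in a regular method. You can't write that line here, though:
    // a method that uses yield can't also use "return <value>".
}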
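In practice, an iterator method rarely needs an explicit yield break at the very end: if it finishes without yielding anything, the caller simply gets an empty sequence, never null. Here is a sketch of a noise-level search over an in-memory stock instead of a database. ToySearch, GetToysByNoiseLevel and the "null stock means no results" rule are assumptions made for this example; yield break is used only for the early exit.

```csharp
using System.Collections.Generic;

enum NoiseLevel { Quiet, Normal, Noisy, BanFriend }

struct Toy
{
    public string     Name       { get; set; }
    public int        Cost       { get; set; }
    public NoiseLevel NoiseLevel { get; set; }
}

static class ToySearch
{
    public static IEnumerable<Toy> GetToysByNoiseLevel(IEnumerable<Toy> stock, NoiseLevel aNoiseLevel)
    {
        // Nothing to search: end the sequence right away (empty, not null).
        if (stock == null)
        {
            yield break;
        }

        foreach (Toy toy in stock)
        {
            if (toy.NoiseLevel == aNoiseLevel)
            {
                yield return toy;
            }
        }
        // Finishing without yielding anything also produces an empty sequence.
    }
}
```

Searching a stock of one quiet duck and one ban-a-friend drum set for NoiseLevel.Noisy yields nothing, so callers can safely foreach over the result without a null check.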

Code comments:

In the attached project you can see some examples of the different uses of Array, Collection and List, based on the dummy Toy type.

In the "Get Collection methods" region, you'll notice that each method returns the specific type its name suggests. This serves both to create the specific collection variables and to show when and why you'd want to return a specific collection from a method.

In the "Get Toy methods" section, you'll notice that all the parameters are of type IEnumerable<Toy>, so you can actually call them with any of the 3 collection variables, and they will still work.

Here's a screenshot of the output:

[Image: screenshot of the demo program's output]

Wrap up

If you found this useful, please feel free to vote, bookmark, and/or leave a message.

I hope you'll never promise drums to kids from this point on ;)

History

  • 26th October, 2014: Added screenshots, examples and clarification about generics.
  • 21st October, 2014: Fixed some typos, edited some paragraphs for readability and added examples.
  • 16th October, 2014: Initial release.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)