Random Sample Extension Method for .NET 3.0 LINQ Queries

6 Nov 2006 | CPOL | 4 min read
Extension Method that allows a random sample to be returned from any IEnumerable collection

Introduction

Gathering a random sample of data from a bigger source has many uses in testing, debugging and marketing. Some examples are as follows:

  • Whilst developing or debugging a LINQ query, it is easier to analyze a small result set drawn from a much larger source
  • Running abbreviated unit tests to improve multi-developer integration time: use a small sample during business hours and the full test data during the nightly builds
  • Marketing - "Let's call 10% of our users and ask them for some feedback or references"

I want to be able to write a query like this to get a random sample of 10 names from a bigger list of names in a string[]:

C#
var sample = listOfNames.RandomSample(10);

The code described in this article adds an extension method to IEnumerable that allows you to generate a random sampling of elements from a collection of any type.

Background - Extension Methods

C# 3.0, the next release of the language, introduces a new feature called Extension Methods. Normally you can call a method on an object instance only if its class, or one of its ancestor classes, defines an accessible method of that name. Extension Methods allow us to "add" methods to a type without changing the original class or creating a subclass of our own.

In the following example, I add a new method to .NET's string class that returns a new string with Hello as the prefix.

C#
public static class Extensions
{
    public static string Hello(this string s) {
         return "Hello " + s;
    }
}

After creating the extension method, I can use the following syntax to add Hello … to any string instance and, in this case, write the resulting string to the Console window.

C#
string name = "Troy";
Console.WriteLine(name.Hello());

The important syntax change above is that the first parameter has the this modifier before its type in the parameter list.

The other key points when defining extension methods are:

  • The extension method must be in a class marked as static
  • Each extension method must be marked as static
  • The this modifier must be on the first parameter
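Because an extension method is still an ordinary static method, the instance-style call is only a more convenient spelling of a static call. The following minimal sketch reuses the Hello example from above to show the two forms side by side; the Demo class is a hypothetical name added for illustration.

```csharp
using System;

public static class Extensions
{
    // The 'this' modifier on the first parameter makes Hello callable as s.Hello()
    public static string Hello(this string s)
    {
        return "Hello " + s;
    }
}

// Hypothetical driver class, for illustration only
public static class Demo
{
    public static void Main()
    {
        string name = "Troy";

        // Extension-method syntax ...
        string viaExtension = name.Hello();

        // ... and the equivalent direct call to the static method
        string viaStatic = Extensions.Hello(name);

        Console.WriteLine(viaExtension);              // Hello Troy
        Console.WriteLine(viaExtension == viaStatic); // True
    }
}
```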

At compile time, the C# compiler first looks for an instance method that matches the name and parameter signature. If none is found, the search continues through the static classes in the enclosing namespaces and in any namespaces imported with a using directive. If one of them contains a static method with the same name whose first parameter has the this modifier and a type compatible with the instance object's type, that method is used.
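One consequence of this lookup order is that an extension method can never replace an instance method with the same name and signature. The sketch below illustrates this under assumed names: Greeter, GreeterExtensions, Describe and Shout are all hypothetical and exist only for this example.

```csharp
using System;

// Hypothetical class used only to demonstrate method lookup
public class Greeter
{
    // Instance method: the compiler always prefers this one
    public string Describe() { return "instance"; }
}

public static class GreeterExtensions
{
    // Same name and signature as the instance method, so g.Describe()
    // never binds to this method; it is reachable only as a static call
    public static string Describe(this Greeter g) { return "extension"; }

    // No instance method named Shout exists, so g.Shout() binds here
    public static string Shout(this Greeter g) { return g.Describe() + "!"; }
}

public static class Demo
{
    public static void Main()
    {
        Greeter g = new Greeter();
        Console.WriteLine(g.Describe());                  // instance
        Console.WriteLine(GreeterExtensions.Describe(g)); // extension
        Console.WriteLine(g.Shout());                     // instance!
    }
}
```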

Our RandomSample extension method will allow any instance object that implements IEnumerable to return another IEnumerable with the number of random sequence elements we request.

Using the Code

When using the RandomSample extension method, you have two method signatures to choose from:

C#
[IEnumerable object].RandomSample( count, allowDuplicates )
[IEnumerable object].RandomSample( count, seed, allowDuplicates )

count: The number of elements to return, or fewer if the source list has fewer elements.

allowDuplicates: true or false. If true, an element may be returned more than once if the random generator picks it more than once.

seed: The initial integer seed for the random sequence generator. If you don't specify one, the system tick count will be used. If you specify an explicit seed, the sequence will be identical for each call given the same input source list. This is useful for being able to repeat tests with a specific random sequence. Note that the implementation treats a negative seed as "no seed", so pass a non-negative value when you need repeatability.
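The repeatability of a seeded sequence comes from System.Random itself: two generators built with the same seed return the same values in the same order. The sketch below shows only that property and does not call RandomSample; SeedDemo and Draw are hypothetical names for illustration.

```csharp
using System;

// Hypothetical helper class, for illustration only
public static class SeedDemo
{
    // Draws 'count' indexes in the range [0, upperBound) from a
    // generator created with the given seed
    public static int[] Draw(int seed, int count, int upperBound)
    {
        Random random = new Random(seed);
        int[] result = new int[count];
        for (int i = 0; i < count; i++)
            result[i] = random.Next(upperBound);
        return result;
    }

    public static void Main()
    {
        int[] first = Draw(42, 5, 9);
        int[] second = Draw(42, 5, 9);

        // Compare the two runs element by element
        bool same = true;
        for (int i = 0; i < first.Length; i++)
            if (first[i] != second[i]) same = false;

        Console.WriteLine(same); // True
    }
}
```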

To use this code in your project, download the source for this article and add the following using directive to the code where you wish to call the RandomSample method:

C#
using Aspiring.Query;

With this using directive in place, our extension method is in scope and any object that implements IEnumerable can use it. The following code returns three random names from a list, allowing duplicates. The last argument is true to allow an element to be returned more than once if the random sequence picks it again, or false to return each element at most once:

C#
string[] firstNames = new string[] {"Paul", "Peter", "Mary", "Janet", 
   "Troy", "Adam", "Nick", "Tatham", "Charles" };

var randomNames = firstNames.RandomSample(3, true);

foreach(var name in randomNames) {
    Console.WriteLine(name);
}

Here is the code that implements our RandomSample extension method for IEnumerable objects:

C#
using System;
using System.Collections.Generic;

namespace Aspiring.Query
{
    public static class RandomSampleExtensions
    {
        public static IEnumerable<T> RandomSample<T>(
            this IEnumerable<T> source, int count, bool allowDuplicates) {
            if (source == null) throw new ArgumentNullException("source");
            // a negative seed tells the iterator to use a time-dependent seed
            return RandomSampleIterator<T>(source, count, -1, allowDuplicates);
        }

        public static IEnumerable<T> RandomSample<T>(
            this IEnumerable<T> source, int count, int seed,
            bool allowDuplicates) {
            if (source == null) throw new ArgumentNullException("source");
            return RandomSampleIterator<T>(source, count, seed,
                allowDuplicates);
        }
 
        static IEnumerable<T> RandomSampleIterator<T>(IEnumerable<T> source, 
            int count, int seed, bool allowDuplicates) {
            
            // take a copy of the current list
            List<T> buffer = new List<T>(source);

            // create the "random" generator, time dependent or with 
            // the specified seed
            Random random;
            if (seed < 0)
                random = new Random();
            else
                random = new Random(seed);

            count = Math.Min(count, buffer.Count);

            if (count > 0)
            {
                // iterate count times and "randomly" return one of the 
                // elements
                for (int i = 1; i <= count; i++)
                {
                    // maximum index actually buffer.Count -1 because 
                    // Random.Next will only return values LESS than 
                    // specified.
                    int randomIndex = random.Next(buffer.Count); 
                    yield return buffer[randomIndex];
                    if (!allowDuplicates)
                        // remove the element so it can't be selected a 
                        // second time
                        buffer.RemoveAt(randomIndex);                         
                }
            }
        }
    }
}
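When duplicates are not allowed, the listing above calls List.RemoveAt, which shifts every later element on each pick. For large sources, one alternative is a partial Fisher-Yates shuffle, which swaps the picked element to the front of the buffer instead of removing it. The following is a sketch of that different technique, not the code from the article's download; ShuffleSampleSketch and SampleWithoutDuplicates are hypothetical names, and the caller is assumed to supply the Random instance.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical alternative, not part of the article's source download
public static class ShuffleSampleSketch
{
    public static IEnumerable<T> SampleWithoutDuplicates<T>(
        IEnumerable<T> source, int count, Random random)
    {
        // validate eagerly, before the iterator is first enumerated
        if (source == null) throw new ArgumentNullException("source");
        if (random == null) throw new ArgumentNullException("random");
        return SampleIterator(source, count, random);
    }

    static IEnumerable<T> SampleIterator<T>(
        IEnumerable<T> source, int count, Random random)
    {
        // take a copy so the caller's collection is never reordered
        List<T> buffer = new List<T>(source);
        count = Math.Min(count, buffer.Count);

        for (int i = 0; i < count; i++)
        {
            // pick an index from the not-yet-chosen range [i, buffer.Count)
            int j = random.Next(i, buffer.Count);

            // swap the pick into slot i so it cannot be chosen again
            T picked = buffer[j];
            buffer[j] = buffer[i];
            buffer[i] = picked;

            yield return picked;
        }
    }

    public static void Main()
    {
        string[] names = { "Paul", "Peter", "Mary", "Janet", "Troy" };
        foreach (string name in SampleWithoutDuplicates(names, 3, new Random(1)))
            Console.WriteLine(name);
    }
}
```

Each pick then costs constant time, at the price of reordering the private copy of the data.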

Points of Interest

The cornerstone of writing extension methods for LINQ is the yield return statement, a C# 2.0 feature. Each time the enumerator's MoveNext() method is called, which happens on every pass around a foreach loop, our routine resumes execution from the statement after the previous yield return. The compiler generates the code that maintains state between calls, so authoring interesting enumerators like this takes a fraction of the work that would have been required in .NET 1.1.
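The pause-and-resume behavior described above can be observed directly by recording when each part of an iterator body runs. The sketch below uses a hypothetical IteratorDemo class with a CountTo iterator and a Log list, none of which are part of the article's source.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demonstration class, for illustration only
public static class IteratorDemo
{
    // Records the order in which the iterator body executes
    public static List<string> Log = new List<string>();

    public static IEnumerable<int> CountTo(int max)
    {
        Log.Add("start");
        for (int i = 1; i <= max; i++)
        {
            Log.Add("yield " + i);
            yield return i; // execution pauses here until the next element is requested
        }
        Log.Add("end");
    }

    public static void Main()
    {
        IEnumerable<int> numbers = CountTo(2);
        Console.WriteLine(Log.Count); // 0, because calling CountTo runs none of its body

        foreach (int n in numbers)
            Console.WriteLine("got " + n);

        Console.WriteLine(string.Join(" | ", Log.ToArray()));
        // start | yield 1 | yield 2 | end
    }
}
```

The same laziness applies to RandomSample: the iterator body, including the creation of the Random instance, runs again on every enumeration, so enumerating an unseeded sample twice will generally give two different samples. Copying the result into a List<T> once fixes the sample.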

I have written a library with many more useful extension methods and posted them on my blog. They all follow the same pattern: extend IEnumerable and build an iterator using the yield return statement.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)