Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / programming / algorithm

Standard Deviation Extension for Enumerable

4.67/5 (3 votes)
7 Jun 2013CPOL 24K  
Calcution of a standard deviation and filtering outliers in a LINQ-style.

Introduction  

The following class provides two extensions to the .NET Enumerable class:

  1. Standard deviation calculation.
  2. Outlier removal using a k-sigma filter (which of course becomes a three-sigma rule for k=3).

See http://en.wikipedia.org/wiki/Three_sigma_rule for some basics. Please use the message board below to post suggestions or report bugs. Have fun!

Using the code 

Source code:

C#
using System;
using System.Collections.Generic;
using System.Linq;
public static class StandardDeviationEnumerableExtensions
{
    /// <summary>
    /// Calculates a standard deviation of elements, using a specified selector.
    /// </summary>
    public static double StandardDeviation<T>(
      this IEnumerable<T> enumerable, Func<T, double> selector)
    {
        double sum = 0;
        double average = enumerable.Average(selector);
        int N = 0;
        foreach (T item in enumerable)
        {   double diff= selector(item) - average;
            sum += diff*diff;
            N++;
        }
        return N == 0 ? 0 : Math.Sqrt(sum / N);
    }
    /// <summary>
    /// Filters elements to remove outliers. The enumeration will be
    /// selected three times, first to calculate an average, second
    /// for a standard deviation, and third to yield remiaining elements. The outliers are these
    /// elements which are further from an average than k*(standard deviation). Set k=3 for
    /// standard three-sigma rule.
    /// </summary>
    public static IEnumerable<T> SkipOutliers<T>(
       this IEnumerable<T> enumerable, double k, Func<T, double> selector)
    {
        // Duplicating a SD code to avoid calculating an average twice.
        double sum = 0;
        double average = enumerable.Average(selector);
        int N = 0;
        foreach (T item in enumerable)
        {   double diff = selector(item) - average;
            sum += diff*diff;
            N++;
        }
        double SD = N == 0 ? 0 : Math.Sqrt(sum / N);
        double delta = k * SD;
        foreach (T item in enumerable)
        {
            if (Math.Abs(selector(item) - average) <= delta)
                yield return item;
        }
    }
}

Usage:

C#
IEnumerable<double> results = new double[] { 1, 1.1, 1.2, 0.9, 2, 0.8 };
double[] filtered;
// contains all elements
filtered = results.SkipOutliers(k: 3, selector: result => result).ToArray();
// contains all elements except 2.0. That is, filtered={ 1, 1.1, 1.2, 0.9, 0.8 }
filtered = results.SkipOutliers(k: 2, selector: result => result).ToArray();
// contains just one element, 1.2, which is closest to an average. That is, filtered={ 1.2 }
filtered = results.SkipOutliers(k: 0.1, selector: result => result).ToArray();
// a singleton is always equal to it's average, so it's yielded even with k==0.
// That is, filtered={ 1.2 }
filtered = filtered.SkipOutliers(k: 0, selector: result => result).ToArray();

So, with k parameter you can adjust how strict the filtering is. If k==0, then only those elements which are equal to an average are yielded. However, do not use k==0 because doubles should not be tested for equality in this way.

History

  • 2013-06-03 -- Original version posted.
  • 2013-06-04 -- Possible unwanted division by zero bug-fix.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)