Introduction
The following class provides two extension methods for IEnumerable<T>, in the style of the .NET Enumerable class:
- Standard deviation calculation.
- Outlier removal using a k-sigma filter (which of course becomes a three-sigma rule for k=3).
See http://en.wikipedia.org/wiki/Three_sigma_rule for some basics. Please use the message board below to post suggestions or report bugs. Have fun!
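As a short summary of what the two methods compute, here is a sketch of the formulas. The symbols x_i (the value the selector returns for item i), N, \mu and \sigma are introduced here for illustration only and do not appear in the code.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2},
\qquad
\text{keep item } i \iff |x_i - \mu| \le k\,\sigma
```

The code divides by N, so it computes the population standard deviation rather than the sample standard deviation (which divides by N - 1).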
Using the code
Source code:
using System;
using System.Collections.Generic;
using System.Linq;

public static class StandardDeviationEnumerableExtensions
{
    // Population standard deviation (divides by N) of the selected values.
    // Returns 0 for an empty sequence.
    public static double StandardDeviation<T>(
        this IEnumerable<T> enumerable, Func<T, double> selector)
    {
        // Materialize once so the source is not enumerated twice.
        List<double> values = enumerable.Select(selector).ToList();
        if (values.Count == 0)
            return 0; // Average() would throw on an empty sequence.

        double average = values.Average();
        double sum = 0;
        foreach (double value in values)
        {
            double diff = value - average;
            sum += diff * diff;
        }
        return Math.Sqrt(sum / values.Count);
    }

    // Yields the items whose selected value lies within k standard
    // deviations of the average.
    public static IEnumerable<T> SkipOutliers<T>(
        this IEnumerable<T> enumerable, double k, Func<T, double> selector)
    {
        // Materialize once: the items feed both the statistics and the filter.
        List<T> items = enumerable.ToList();
        if (items.Count == 0)
            yield break; // Nothing to filter; Average() would throw.

        double average = items.Average(selector);
        double delta = k * items.StandardDeviation(selector);
        foreach (T item in items)
        {
            if (Math.Abs(selector(item) - average) <= delta)
                yield return item;
        }
    }
}
Usage:
IEnumerable<double> results = new double[] { 1, 1.1, 1.2, 0.9, 2, 0.8 };
double[] filtered;
// k = 3: nothing is removed.
filtered = results.SkipOutliers(k: 3, selector: result => result).ToArray();
// k = 2: the value 2 is removed.
filtered = results.SkipOutliers(k: 2, selector: result => result).ToArray();
// k = 0.1: only 1.2 remains.
filtered = results.SkipOutliers(k: 0.1, selector: result => result).ToArray();
// k = 0 applied to the previous result { 1.2 }: 1.2 equals its own average, so it is kept.
filtered = filtered.SkipOutliers(k: 0, selector: result => result).ToArray();
The k parameter controls how strict the filtering is: the smaller k is, the more elements are removed. With k == 0, only elements exactly equal to the average are yielded. Avoid k == 0 in practice, because comparing doubles for exact equality is unreliable.
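The effect of k on the sample array above can be checked by hand. The figures below are rounded to four decimals; \mu and \sigma denote the average and the standard deviation (symbols introduced here, not used in the code).

```latex
\begin{align*}
\mu &= \tfrac{1}{6}(1 + 1.1 + 1.2 + 0.9 + 2 + 0.8) = \tfrac{7}{6} \approx 1.1667 \\
\sigma &= \sqrt{\tfrac{1}{6}\textstyle\sum_i (x_i - \mu)^2} \approx \sqrt{0.1556} \approx 0.3944 \\
3\sigma &\approx 1.1832, \qquad 2\sigma \approx 0.7888, \qquad 0.1\,\sigma \approx 0.0394 \\
|2 - \mu| &\approx 0.8333, \qquad |1.1 - \mu| \approx 0.0667, \qquad |1.2 - \mu| \approx 0.0333
\end{align*}
```

So k = 3 keeps all six values (0.8333 is below 1.1832), k = 2 drops only the value 2 (0.8333 exceeds 0.7888), and k = 0.1 keeps only 1.2, the single value within 0.0394 of the average.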
History
- 2013-06-03 -- Original version posted.
- 2013-06-04 -- Fixed a possible division by zero.