Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C#

A Simple Moving Average Algorithm

4.64/5 (43 votes)
4 Mar 2007CPOL6 min read 1   3.5K  
A simple moving average algorithm

Introduction

I needed a running average to monitor CPU utilization and ended up writing a small class for doing the calculation. In writing the code, I realized that even something as simple as a running average algorithm requires some thought to writing it well. So I have written this article for beginners, to illustrate how even simple algorithms need some thought.

What is a Running Average?

A running average (also called a moving average) can be implemented in different ways. For an in-depth description, refer to wikipedia.

Simply Moving Average

A simple moving average is the unweighted mean (the sum of all items in a list divided by the number of items in the list) of the previous n data points. This is the easiest running average to implement and is the one I'm illustrating in this article.

Weighted Moving Average

A weighted moving average is an average in which the data points in the list are given different multiplying factors. This has the affect of making some items in the list more important (given more weight) than others. For example, you may wish to have older values to have more weight than newer ones, or vice-versa.

Brute Force Implementation

A simple moving average takes the sample values and divides them by the sample count. A basic implementation might look something like this:

C#
using System;
using System.Collections.Generic;
using System.Text;

namespace AveragerTest
{
  public class BruteForce
  {
    protected float[] samples;
    protected int idx;

    public float Average
    {
      get
      {
        float total=0;

        for (int i=0; i<samples.Length; i++)
        {
          total+=samples[i];
        }

        return total/samples.Length;
      }
    }

    public BruteForce(int numSamples)
    {
      if (numSamples <= 0)
      {
        throw new ArgumentOutOfRangeException(
           "numSamples can't be negative or 0.");
      }

      samples = new float[numSamples];
      idx = 0;
    }

    public void AddSample(float val)
    {
      samples[idx] = val;

      if (++idx == samples.Length)
      {
        idx = 0;
      }
    }
  }
}

In fact, this reminds me a lot of an algorithm I wrote when I was first starting programming in Commodore BASIC! There are several problems with this algorithm:

  • Until the sample array is completely loaded, the Average calculation includes unloaded (and zero) sample values. This results in the wrong average.
  • The algorithm is very inefficient for large sample sizes, having to total the samples every time the average is taken.
  • The algorithm implements a circular list. Wouldn't it be nice to extract the implementation of the circular list from the implementation of the running average algorithm?
  • There is no good way to replace the simple running average with, say, a weighted running average, because the class does not implement an interface that promotes use-case flexibility.

A Better Implementation

Separating Out the Collection and Iteration Logic

The first step, in my opinion, would be to separate out the iteration logic. We have to be careful here because the circular list that we need isn't just an iterator--we actually have to set the values in the list as well. An iterator does not allow this. From MSDN:

An enumerator remains valid as long as the collection remains unchanged. If changes are made to the collection, such as adding, modifying, or deleting elements, the enumerator is irrecoverably invalidated and the next call to MoveNext or Reset throws an InvalidOperationException.

Changing the management of the sample list to a circular list eliminates two problems:

  • The list management is decoupled from the algorithm.
  • We can use the Count method of the circular list to obtain the count of loaded items, not just the total number of items in the list.

In addition, it leverages code re-use and helps debugging--we can isolate testing of the circular list from problems in the running average class. In other words, it would be easy to write unit tests for the circular list, validate its operation, then write unit tests to verify that the averager is working correctly.

The new code looks like this:

C#
using System;
using System.Collections.Generic;
using System.Text;

using Clifton.Collections.Generic;

namespace AveragerTest
{
  public class BruteForce
  {
    CircularList<float> samples;

    public float Average
    {
      get
      {
        float total=0;

        foreach(float v in samples)
        {
          total+=v;
        }

        return total/samples.Count;
      }
    }

    public BruteForce(int numSamples)
    {
      if (numSamples <= 0)
      {
        throw new ArgumentOutOfRangeException(
           "numSamples can't be negative or 0.");
      }

      samples = new CircularList<float>(numSamples);
    }

    public void AddSample(float val)
    {
      samples.Value = val;
      samples.Next();
    }
  }
}

This is a much cleaner implementation already!

Performance

The next issue is performance. For a large sample size, we don't want to be iterating over the entire sample to get the total. It would make more sense to manage a running total. This requires a little bit of logic when adding a sample value, to determine whether a sample already exists so that it can be subtracted from the total:

C#
public void AddSample(float val)
{
  if (samples.Count == samples.Length)
  {
    total -= samples.Value;
  }

  samples.Value = val;
  total += val;
  samples.Next();
}

and the Average getter no longer needs to sum up all the samples:

C#
public float Average
{
  get { return total/samples.Count; }
}

But Is It Really Better Performance?

That's always the question, and it depends on usage! Arguably, the first method might be better performing if ratio of samples being added to the number of times the average is taken is very high. For example, you might be adding a million samples between taking average, and the sample size might only be a hundred items. In this scenario, the overhead of keeping the total current probably far outweighs the overhead of calculating the total on the fly.

So, the right answer regarding performance is to select the algorithm that works well for your requirements. The algorithm I describe above is what I consider to be a typical case, where the average is taken after every new sample, and in this case, it does make sense to use a running total, especially if the sample size is large.

Improving Usability

Usability can be improved by providing an interface that the programmer can use rather than the class directly. By using an interface, the programmer can create the correct moving average algorithm without the code needing to know which (simple or weighted) was actually constructed. Some other part of the application can make this decision.

C#
public interface IMovingAverage
{
  float Average { get;}

  void AddSample(float val);
  void ClearSamples();
  void InitializeSamples(float val);
}

Comments, Error Checking, And Bells And Whistles

The final implementation, with some additional error checking and the ability to clear the average or pre-load the samples with a particular value, looks like this:

C#
using System;

using Clifton.Collections.Generic;

namespace Clifton.Tools.Data
{
  public class SimpleMovingAverage : IMovingAverage
  {
    CircularList<float> samples;
    protected float total;

    /// <summary>
    /// Get the average for the current number of samples.
    /// </summary>
    public float Average
    {
      get
      {
        if (samples.Count == 0)
        {
          throw new ApplicationException("Number of samples is 0.");
        }

        return total / samples.Count;
      }
    }

    /// <summary>
    /// Constructor, initializing the sample size to the specified number.
    /// </summary>
    public SimpleMovingAverage(int numSamples)
    {
      if (numSamples <= 0)
      {
        throw new ArgumentOutOfRangeException(
             "numSamples can't be negative or 0.");
      }

      samples = new CircularList<float>(numSamples);
      total = 0;
    }

    /// <summary>
    /// Adds a sample to the sample collection.
    /// </summary>
    public void AddSample(float val)
    {
      if (samples.Count == samples.Length)
      {
        total -= samples.Value;
      }

      samples.Value = val;
      total += val;
      samples.Next();
    }

    /// <summary>
    /// Clears all samples to 0.
    /// </summary>
    public void ClearSamples()
    {
      total = 0;
      samples.Clear();
    }

    /// <summary>
    /// Initializes all samples to the specified value.
    /// </summary>
    public void InitializeSamples(float v)
    {
      samples.SetAll(v);
      total = v * samples.Length;
    }
  }
}

Usage

A simple program demonstrating the usage of the SimpleMovingAverage class is:

C#
using System;

using Clifton.Collections.Generic;
using Clifton.Tools.Data;

namespace AveragerTest
{
  class Program
  {
    static void Main(string[] args)
    {
      IMovingAverage avg = new SimpleMovingAverage(10);
      Test(avg);
    }

    static void Test(IMovingAverage avg)
    {
      for (int i = 0; i < 20; i++)
      {
        avg.AddSample(i);
        Console.WriteLine(avg.Average);
      }
    }
  }
}

Note how the sample size is 10 items but there are 20 samples added to the moving average. The old samples are replaced as new samples are added. As the code illustrates, a SimpleMovingAverage instance is constructed and used to initialize the variable avg, whose type is IMovingAverage. The Test method takes the interface for a parameter. If we implemented, say, a WeightedMovingAverage class, we could change the construction to:

C#
IMovingAverage avg = new WeightedMovingAverage(10);

and call the exact same Test routine to see how the new moving average works. By using the interface, we promote the usability of the code.

All This Work!

Admittedly, this is a lot of work for a simple algorithm. Personally, I try to follow the principle of doing the job right, even if it's a small job, because small jobs quickly turn into large jobs. The old excuse "we don't have the time and the budget", or "it's simple, just bang it out and we'll refactor it later" has proven time and time again to be lies. It does not save time and money to do it wrong first! Refactoring is rarely put into the "new" budget. Bad practices continue on unless they are changed from the beginning. The only reason to hack out an algorithm is for a prototype, so you can learn from it and develop the production algorithm correctly.

History

  • 4th March, 2007: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)