A Time-series Forecasting Library in C#

Kerry Cakebread

1 Jul 2010 | CPOL | 5 min read

Class of functions that accept time-series data and return forecast values and error analysis, with allowance made for holdout set testing and n-period extension.

Download demo - 20.81 KB

Introduction

This code provides a basic set of functions that accept time-series values (a decimal array in the class itself; a comma-delimited string in the demo form), the number of periods into the future to extend a forecast, and the number of periods to include in a "holdout set" for additional testing (i.e. verifying forecasted values against observed values that were withheld while the forecast was generated).

Background

Time-series forecasting methods use only historical information to produce estimates of future values. They assume that the data's past pattern will continue into the future, and they include specific measures of error that help users understand how accurate the forecast has been. Some techniques seek to identify underlying patterns such as trend or seasonality; others attempt to be self-correcting by feeding the error from past periods into future forecasts.

The code included here addresses several of the most common time-series forecasting techniques: naive, simple moving average, weighted moving average, exponential smoothing, and adaptive rate smoothing.

In the naive approach, the current period's value is used as the forecast for the upcoming period.

In a simple moving average, the prior n values are averaged together with equal weight to produce a value for the upcoming period.
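
As a rough illustration of the averaging step on a plain array, the next-period forecast might be computed as follows. This is only a sketch: `MovingAverageSketch` and `NextForecast` are hypothetical names, not members of the article's class, and falling back to a shorter window when fewer than n values exist is an assumption made here for convenience.

```csharp
using System;

static class MovingAverageSketch
{
    // Hypothetical helper (not part of the article's library).
    // Forecasts the next period as the equal-weight mean of the last n observed values.
    public static decimal NextForecast(decimal[] values, int n)
    {
        if (values == null || values.Length == 0)
            throw new ArgumentException("At least one value is required.");
        if (n < 1)
            throw new ArgumentOutOfRangeException("n");

        // Assumption: use a shorter window when the history is shorter than n.
        int count = Math.Min(n, values.Length);
        decimal sum = 0m;
        for (int i = values.Length - count; i < values.Length; i++)
            sum += values[i];
        return sum / count;
    }
}
```

With values {10, 12, 14} and n = 3 this returns 12. With n = 1 it returns the last value, 14, which is the naive forecast.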

In a weighted moving average, each of the n prior values is multiplied by its own weight and the results are summed to produce a value for the upcoming period. By applying different weights (which must sum to 1.0), past periods can be given different emphasis. If the user wishes to place more weight upon earlier periods, they might use weights of 0.5, 0.3, and 0.2 (listed from the earliest period to the most recent). If the user wishes to place more weight on more recent periods, they might reverse the order of those parameters.
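
The weighting can be sketched on a plain array as shown here. The names `WeightedAverageSketch` and `NextForecast` are hypothetical, and the ordering convention (the first weight applies to the earliest value in the window) is an assumption chosen to match the 0.5, 0.3, 0.2 example; the library's own parameter order is documented in its code.

```csharp
using System;

static class WeightedAverageSketch
{
    // Hypothetical helper (not part of the article's library).
    // weights[0] applies to the earliest value in the window and
    // weights[weights.Length - 1] to the most recent one (assumed convention).
    public static decimal NextForecast(decimal[] values, decimal[] weights)
    {
        if (weights.Length == 0 || values.Length < weights.Length)
            throw new ArgumentException("Need at least as many values as weights.");

        decimal total = 0m;
        foreach (decimal w in weights)
            total += w;
        if (total != 1.0m)
            throw new ArgumentException("Weights must sum to 1.0.");

        // Multiply each value in the window by its weight and sum the products.
        int start = values.Length - weights.Length;
        decimal forecast = 0m;
        for (int k = 0; k < weights.Length; k++)
            forecast += weights[k] * values[start + k];
        return forecast;
    }
}
```

For values {10, 20, 30}, weights {0.5, 0.3, 0.2} emphasize the earliest value and give 17, while the reversed weights {0.2, 0.3, 0.5} emphasize the latest value and give 23.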

Exponential smoothing is a version of the weighted moving average which gives recent values more weight than earlier values. However, unlike the weighted moving average, it requires only three inputs: the prior period's forecast, the current period's value, and a smoothing factor (alpha), with a value between 0 and 1.0.
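
In the notation used by the code comments, the usual recurrence is F(t+1) = alpha * D(t) + (1 - alpha) * F(t). A minimal sketch of that recurrence follows. `ExponentialSmoothingSketch` and `Forecasts` are hypothetical names, and seeding the first forecast with the first observed value is an assumption rather than a statement about the library.

```csharp
using System;

static class ExponentialSmoothingSketch
{
    // Hypothetical helper (not part of the article's library).
    // Returns one forecast per observed period plus one for the next, unobserved period:
    // result[t] is the forecast for period t.
    public static decimal[] Forecasts(decimal[] values, decimal alpha)
    {
        if (values == null || values.Length == 0)
            throw new ArgumentException("At least one value is required.");
        if (alpha < 0m || alpha > 1m)
            throw new ArgumentOutOfRangeException("alpha");

        decimal[] f = new decimal[values.Length + 1];
        f[0] = values[0]; // assumption: seed the first forecast with the first value
        for (int t = 0; t < values.Length; t++)
        {
            // F(t+1) = alpha * D(t) + (1 - alpha) * F(t)
            f[t + 1] = alpha * values[t] + (1m - alpha) * f[t];
        }
        return f;
    }
}
```

For values {10, 20, 30} and alpha = 0.5 the forecasts are 10, 10, 15, and 22.5. A larger alpha tracks recent values more closely; a smaller alpha smooths more heavily.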

Adaptive rate smoothing extends exponential smoothing by recalculating the alpha smoothing parameter for each period's forecast from a Tracking Signal. This technique has been shown to respond much more quickly to step changes while retaining the ability to filter out random noise. For a full discussion, see Trigg and Leach, 1967 (listed under References).
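
One common formulation of the Trigg and Leach idea keeps a smoothed error and a smoothed absolute error and uses the absolute value of their ratio as alpha. The sketch below follows that formulation. It is not necessarily how the library implements it: the names, the `beta` parameter, the `initialAlpha` fallback, and the first-value seed are all assumptions made for illustration.

```csharp
using System;

static class AdaptiveSmoothingSketch
{
    // Hypothetical helper (not part of the article's library).
    // beta smooths the error terms; initialAlpha is used until a nonzero error is seen.
    // result[t] is the forecast for period t; the last entry is for the next period.
    public static decimal[] Forecasts(decimal[] values, decimal beta, decimal initialAlpha)
    {
        if (values == null || values.Length == 0)
            throw new ArgumentException("At least one value is required.");

        decimal[] f = new decimal[values.Length + 1];
        f[0] = values[0];               // assumption: seed with the first value
        decimal smoothedError = 0m;     // E(t), keeps the sign of the error
        decimal smoothedAbsError = 0m;  // M(t), always non-negative
        decimal alpha = initialAlpha;

        for (int t = 0; t < values.Length; t++)
        {
            decimal error = values[t] - f[t];
            smoothedError = beta * error + (1m - beta) * smoothedError;
            smoothedAbsError = beta * Math.Abs(error) + (1m - beta) * smoothedAbsError;

            // Tracking signal |E / M| lies between 0 and 1 and becomes the new alpha.
            if (smoothedAbsError != 0m)
                alpha = Math.Abs(smoothedError / smoothedAbsError);

            f[t + 1] = alpha * values[t] + (1m - alpha) * f[t];
        }
        return f;
    }
}
```

With values {10, 10, 20}, beta = 0.2, and initialAlpha = 0.2, the step at the third value drives the tracking signal to 1, so the next forecast jumps straight to 20. Fixed-alpha smoothing with alpha = 0.2 would only reach 12.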

Using the Code

The code provided consists of a class which contains the analysis functions themselves and a demonstration form, which takes comma-separated values and displays them in a grid with an associated forecast and measures of instance-specific error. Five buttons provide samples for how the function libraries are to be called and pass parameters which, though reasonable, should be modified to suit the user's purpose. In addition to the tested set and error, the grid displays forecasted values for a number of periods into the future. Finally, several specific measures of error are calculated and displayed in labels. Users of these functions should have a solid grasp of what the individual error measures indicate in order to properly interpret both these functions and the results.

Some terminology I have used in the functions which may not be immediately obvious to the reader:

Extension - An integer number of periods in the future into which the function should attempt to produce forecasts. The further into the future (higher the number) we attempt to forecast, the higher the probability of error.
Holdout Set - An integer number of periods to withhold from the testable set of observed values (from the end). The functions calculate forecasts for these values without looking at the observed values until after the forecast is generated. In this way, forecasts for a number of periods may be verified against observed values without the inconvenience of having to wait for future periods to occur.
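
To make the two terms concrete, the sketch below classifies a zero-based instance number as belonging to the testable set, the holdout set, or the extension. `RowKind` and `PartitionSketch.Classify` are hypothetical names introduced only for this illustration.

```csharp
using System;

enum RowKind { Test, Holdout, Extension }

static class PartitionSketch
{
    // Hypothetical helper (not part of the article's library).
    // instance:   zero-based position in the combined observed + forecast sequence
    // valueCount: number of observed values supplied
    // holdout:    number of trailing observed values withheld for verification
    public static RowKind Classify(int instance, int valueCount, int holdout)
    {
        if (instance >= valueCount)
            return RowKind.Extension;   // beyond the observed data
        if (instance >= valueCount - holdout)
            return RowKind.Holdout;     // one of the last 'holdout' observed values
        return RowKind.Test;            // ordinary testable value
    }
}
```

With 10 observed values, a holdout of 3, and an extension of 2, instances 0 through 6 are testable, 7 through 9 are holdouts, and 10 and 11 are extension periods.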

Generally, these functions can be called by passing in a decimal array of time-series data (examples provided), an integer extension, and an integer holdout. Other parameters are documented in the code.

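// Pass an array of time-series data {1,2,3} and get a ForecastTable of forecasts and error.
// Arguments, in order: values, Extension (5 periods ahead), Periods (3-period average), Holdout (none).
ForecastTable dt = TimeSeries.simpleMovingAverage(new decimal[] {1, 2, 3}, 5, 3, 0);
grdResults.DataSource = dt;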

The simpleMovingAverage result (like the other functions in this class) is calculated according to a well-known formula, which is included in the comments.

//
//Simple Moving Average
//
//            ( Dt + D(t-1) + D(t-2) + ... + D(t-n+1) )
//  F(t+1) =  -----------------------------------------
//                              n
public static ForecastTable simpleMovingAverage(decimal[] values, 
	int Extension, int Periods, int Holdout)
{
    ForecastTable dt = new ForecastTable();
    for (Int32 i = 0; i < values.Length + Extension; i++)
    {
        //Insert a row for each value in set
        DataRow row = dt.NewRow();
        dt.Rows.Add(row);
        row.BeginEdit();
        //assign its sequence number
        row["Instance"] = i;
        if (i < values.Length)
        {//processing values which actually occurred
            row["Value"] = values[i];
        }
        //Indicate if this is a holdout row (one of the last Holdout observed values)
        row["Holdout"] = (i >= (values.Length - Holdout)) && (i < values.Length);
        if (i == 0)
        {//Initialize first row with its own value
            row["Forecast"] = values[i];
        }
        else if (i < values.Length - Holdout)
        {//processing values which actually occurred, but not in holdout set
            decimal avg = 0;
            DataRow[] rows = dt.Select("Instance>=" + (i - Periods).ToString() + 
		" AND Instance < " + i.ToString(), "Instance");
            foreach (DataRow priorRow in rows)
            {
                avg += (Decimal)priorRow["Value"];
            }
            avg /= rows.Length;
            row["Forecast"] = avg;
        }
        else
        {//must be in the holdout set or the extension
            decimal avg = 0;
            //get the Periods-prior rows and calculate an average actual value
            DataRow[] rows = dt.Select("Instance>=" + (i - Periods).ToString() + 
		" AND Instance < " + i.ToString(), "Instance");
            foreach (DataRow priorRow in rows)
            {
                if ((Int32)priorRow["Instance"] < values.Length)
                {//in the test or holdout set
                    avg += (Decimal)priorRow["Value"];
                }
                else
                {//extension, use forecast since we don't have an actual value
                    avg += (Decimal)priorRow["Forecast"];
                }
            }
            avg /= rows.Length;
            //set the forecasted value
            row["Forecast"] = avg;
        }
        row.EndEdit();
    }
    dt.AcceptChanges();
    return dt;
}

Each of the forecasting functions (naive(), simpleMovingAverage(), weightedMovingAverage(), exponentialSmoothing(), and adaptiveRateSmoothing()) works in a similar manner: it initializes early rows with default values, since prior data is not yet available, then calculates a forecast for each value in the testable set. Finally, it calculates the holdout and extension values.

Error analysis is accomplished through the implementation of several measures: MeanSignedError(), MeanAbsoluteError(), MeanPercentError(), MeanAbsolutePercentError(), TrackingSignal(), MeanSquaredError(), CumulativeSignedError(), and CumulativeAbsoluteError().
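
For readers unfamiliar with these measures, the sketch below computes a few of them on plain arrays. It assumes the convention error = actual - forecast and defines the tracking signal as the cumulative signed error divided by the mean absolute error. The class and method signatures are hypothetical and do not mirror the library's DataTable-based functions.

```csharp
using System;

static class ErrorMeasuresSketch
{
    // Per-period error, assuming error = actual - forecast.
    static decimal[] Errors(decimal[] actual, decimal[] forecast)
    {
        if (actual.Length != forecast.Length || actual.Length == 0)
            throw new ArgumentException("Arrays must be non-empty and of equal length.");
        decimal[] e = new decimal[actual.Length];
        for (int i = 0; i < e.Length; i++)
            e[i] = actual[i] - forecast[i];
        return e;
    }

    // Sum of signed errors; positive and negative errors cancel, which reveals bias.
    public static decimal CumulativeSignedError(decimal[] actual, decimal[] forecast)
    {
        decimal sum = 0m;
        foreach (decimal e in Errors(actual, forecast)) sum += e;
        return sum;
    }

    // Average magnitude of the error, ignoring direction.
    public static decimal MeanAbsoluteError(decimal[] actual, decimal[] forecast)
    {
        decimal sum = 0m;
        foreach (decimal e in Errors(actual, forecast)) sum += Math.Abs(e);
        return sum / actual.Length;
    }

    // Average squared error; penalizes large misses more heavily.
    public static decimal MeanSquaredError(decimal[] actual, decimal[] forecast)
    {
        decimal sum = 0m;
        foreach (decimal e in Errors(actual, forecast)) sum += e * e;
        return sum / actual.Length;
    }

    // Cumulative signed error measured in units of mean absolute error.
    public static decimal TrackingSignal(decimal[] actual, decimal[] forecast)
    {
        decimal mae = MeanAbsoluteError(actual, forecast);
        if (mae == 0m) return 0m; // assumption: a perfect forecast has no bias
        return CumulativeSignedError(actual, forecast) / mae;
    }
}
```

For actual values {13, 9, 13, 9} against a constant forecast of 10, the errors are 3, -1, 3, -1: the cumulative signed error is 4, the mean absolute error is 2, the mean squared error is 5, and the tracking signal is 2.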

One of the most useful measures of forecast error is the MeanAbsolutePercentError (MAPE). This value is calculated by summing the absolute value of the percent error for each period's forecast and dividing by the number of periods tested.

//MeanAbsolutePercentError = Sum( |PercentError| ) / n
public static decimal MeanAbsolutePercentError
	(ForecastTable dt, bool Holdout, int IgnoreInitial)
{
    string Filter = "AbsolutePercentError Is Not Null AND Instance > " 
	+ IgnoreInitial.ToString();
    if (Holdout)
        Filter += " AND Holdout=True";
    if (dt.Select(Filter).Length == 0)
        return 1;
    return (Decimal)dt.Compute("AVG(AbsolutePercentError)", Filter);
}
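
The same calculation can be sketched without a DataTable. Here the percent error is expressed as a fraction of the actual value, and periods whose actual value is zero are skipped because their percent error is undefined. Both choices are assumptions, as are the names `MapeSketch` and `Compute`. Unlike the function above, this sketch throws when there is nothing to average instead of returning 1.

```csharp
using System;

static class MapeSketch
{
    // Hypothetical helper (not part of the article's library).
    // Returns the mean of |actual - forecast| / |actual| as a fraction (0.1 means 10%).
    public static decimal Compute(decimal[] actual, decimal[] forecast)
    {
        if (actual.Length != forecast.Length)
            throw new ArgumentException("Arrays must be of equal length.");

        decimal sum = 0m;
        int n = 0;
        for (int i = 0; i < actual.Length; i++)
        {
            if (actual[i] == 0m)
                continue; // percent error is undefined when the actual value is zero
            sum += Math.Abs((actual[i] - forecast[i]) / actual[i]);
            n++;
        }
        if (n == 0)
            throw new InvalidOperationException("No periods with a defined percent error.");
        return sum / n;
    }
}
```

For actual values {100, 200} and forecasts {110, 180}, each period is off by 10 percent, so the result is 0.1.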

References

Krajewski and Ritzman, Operations Management: Processes and Value Chains, 7th edition (2004), Pearson Prentice Hall, Upper Saddle River, NJ, pp. 535-581
Trigg and Leach, "Exponential Smoothing with an Adaptive Response Rate", Operational Research Quarterly, Vol. 18, No. 1 (Mar. 1967), pp. 53-59

History

This code has not been thoroughly tested and may contain bugs. It is intended to be instructive about forecasting techniques and should not be relied upon for actual forecasts used in decision-making. It has not been optimized for performance or efficiency and the code as it is written is not particularly elegant. My goal was to make it as easy to read and understand as possible, so that others can create their own functions which implement the concepts of time-series forecasting. That said, I do welcome constructive feedback if you see a bug or some glaring omission, or perhaps you feel I could have explained something more clearly.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).