
Resampling and merging time series data using LINQ

29 Apr 2014
The technique presented here is a simple method of resampling and aggregating time series.

Introduction

The technique presented here is a simple method of resampling and aggregating time series data that is built on LINQ.

This technique is useful when you have irregular or gappy time series data that you want to normalize so that there is a regular time interval between each data point.

It is also useful when you want to compare two sets of time series data and you need to have a common time interval to do so.

The code is unit tested and, to my knowledge, works well. If you find problems, please give feedback and I will fix the code and add new tests.

Audience

This code is all in C# and is based on LINQ. To use the code it will be helpful if you are already familiar with using functions such as Select and Zip. See here for an introduction to LINQ.

To understand the implementation of the Resample function, you need to understand how LINQ functions work, including IEnumerable and the yield statement.

Please note that I prefer to use the method syntax style of LINQ programming and not the query syntax style.

I also have this technique working in JavaScript using linq.js. If you are a JavaScript programmer and you want this, please let me know and I'll consider cleaning up the JavaScript version and adding it to this article.

Background

I have been working on some code to help me analyze and graph financial data. I wanted to compare two sets of data. Unfortunately, each set has gaps or was sampled at a different time interval.

I needed some code that would normalize the time series data to a common time interval so that the data could then be easily compared and aggregated.

After procrastinating for a few weeks and not finding any existing technique online that did what I wanted (surely there must be something out there!), I had a brainwave. What I wanted to achieve (resampling of time series data) is very similar to something I have implemented or used many times during my game development career: running a keyframe animation and capturing the resulting animated value at a regular time interval.

Implementation was fast once I knew what I wanted, and I think this is a useful technique that is worth sharing.

LINQ is my weapon of choice, so I have written a LINQ-style function that resamples an input set of data and generates new time series data sampled at a pre-defined time interval.

After resampling to a normalized time interval, LINQ Zip can then easily be used to aggregate multiple sets of data.

Example project

Attached is an example project that demonstrates the technique.

The example project uses the DynamicDataDisplay charting library to display the data. The data I use in the example project is from their StockExchangeSample project. Thanks to those guys for providing the charting library, various example projects and sample data.

Using the code

The following code illustrates how to use the Resample function.

// We start with a data structure to contain our time series data.
// It should contain a property that represents the date.
// Also a property that represents the value to be resampled.
class Data 
{
    public DateTime EventDate { get; set; }
    public double SomeValue { get; set; }
};

// Now we need some time series data.
Data[] timeSeriesData = // ... input time series data ...

// Next we determine the date range to be sampled.
DateTime startDate = // ... some DateTime object ...
DateTime endDate = // ... some DateTime object ...

// We must also decide the time interval for resampling.
TimeSpan timeInterval = TimeSpan.FromDays(1);

// Now we are ready to resample the data.
IEnumerable<Data> resampledTimeSeries = 
    timeSeriesData.
        Resample(
            // Date range...
            startDate,         
            endDate,

            // Time interval...
            timeInterval,    

            // Date selector
            // An anonymous function that 
            // selects the date from a data point.
            data => data.EventDate,

            // Interpolator
            // An anonymous function that interpolates between
            // two data points.
            // t is a fraction (range: 0-1) that drives the 
            // interpolation between the data points.
            (curDate, data1, data2, t) =>
                // Here we instantiate and return an output (resampled) data point.
                new Data 
                {
                    // The date is already interpolated for us.
                    EventDate = curDate,

                    // We must interpolate, just doing a simple linear interpolation here.
                    SomeValue = Lerp(data1.SomeValue, data2.SomeValue, t)
                }
        );


// The linear interpolation is defined as follows.
private double Lerp(double v1, double v2, double t)
{
    return v1 + ((v2 - v1) * t);
}
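To make the role of t concrete, here is a small worked sketch. It assumes two made-up data points four days apart; the class name LerpExample and the helpers Fraction and SampleAt are hypothetical names used only for this illustration, and only Lerp mirrors the function above.

```csharp
using System;

public static class LerpExample
{
    // Same formula as Lerp above: move from v1 towards v2 by the fraction t.
    public static double Lerp(double v1, double v2, double t)
    {
        return v1 + ((v2 - v1) * t);
    }

    // Hypothetical helper: how far 'at' lies between 'from' and 'to', as a 0-1 fraction.
    public static double Fraction(DateTime from, DateTime to, DateTime at)
    {
        return (at - from).TotalSeconds / (to - from).TotalSeconds;
    }

    // Hypothetical helper: interpolate a value at 'at' between two made-up data points.
    public static double SampleAt(DateTime at)
    {
        var date1 = new DateTime(2014, 1, 1);   // first point, value 10
        var date2 = new DateTime(2014, 1, 5);   // second point, value 20
        double value1 = 10.0;
        double value2 = 20.0;

        double t = Fraction(date1, date2, at);
        return Lerp(value1, value2, t);
    }
}
```

Calling LerpExample.SampleAt(new DateTime(2014, 1, 2)) gives t = 0.25 (one day out of four) and therefore 12.5. At the first point's date the result is 10 and at the second point's date it is 20.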

After multiple data sets have been normalized to have the same time interval, it is now possible to use the LINQ Zip operator to aggregate the data sets.

// Prep some time series data.
IEnumerable<Data> resampledData1 = // ... resample some data ...
IEnumerable<Data> resampledData2 = // ... resample some data ...

// Use LINQ Zip to merge the data sets.
var mergedData =
    resampledData1              // 1st data set.
        .Zip(                 
            resampledData2,     // 2nd data set.     
            (data1, data2) =>   // Create a new merged data set 
                new Data
                {
                    // Both dates should be the same.
                    EventDate = data1.EventDate, 

                    // Compute the difference between the two data sets.
                    // This is just one example of the kind of aggregation 
                    // operation you might want to perform on the data sets.
                    SomeValue = data1.SomeValue - data2.SomeValue
                }
        )
        .ToArray();

You should note that because evaluation of a LINQ IEnumerable is lazy, the resampling only happens as you enumerate the output time series data. This makes the whole technique quite efficient: you only pull as many data points out of the enumerable as you need, and only that much is resampled.

Note the call to ToArray() in the previous code snippet. It is there simply to force the entire LINQ statement to evaluate, which runs the Zip operator, which in turn drives the resampling process. Without the call to ToArray(), no resampling would actually happen until something else enumerates the result. This can seem a bit counter-intuitive for C# programmers who are new to the idea, but it is a common concept in the functional programming world.
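The lazy behaviour can be observed with a small sketch. It does not use Resample at all: the hypothetical Series iterator below stands in for a resampled series and counts how many values it has produced. The class name LazyZipExample and its members are assumptions made for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazyZipExample
{
    // Number of values the Series iterator has produced so far.
    public static int Produced;

    // Snapshot of Produced taken after building the pipeline but before enumerating it.
    public static int ProducedBeforeEnumeration;

    // Hypothetical stand-in for a resampled series: values are generated on demand.
    public static IEnumerable<double> Series(double start, double step, int count)
    {
        for (int i = 0; i < count; i++)
        {
            Produced++;
            yield return start + (step * i);
        }
    }

    public static double[] Run()
    {
        Produced = 0;

        // Building the pipeline only wires the iterators together.
        var merged = Series(10, 2, 1000).Zip(Series(1, 1, 1000), (a, b) => a - b);
        ProducedBeforeEnumeration = Produced;

        // ToArray() enumerates the pipeline; Take(3) limits how much is pulled through it.
        return merged.Take(3).ToArray();
    }
}
```

Run() returns 9, 10 and 11. ProducedBeforeEnumeration stays at 0, and Produced ends up far below the 2000 values the two inputs could have generated, because only the requested elements are pulled through Zip.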

Explaining the code

Now I'll explain the implementation of the Resample function via inline comments.

// The function is an extension method, so it must be defined in a static class.
public static class ResampleExt
{
    // Resample an input time series and create a new time series between two 
    // particular dates sampled at a specified time interval.
    public static IEnumerable<OutputDataT> Resample<InputValueT, OutputDataT>(

        // Input time series to be resampled.
        this IEnumerable<InputValueT> source,

        // Start date of the new time series.
        DateTime startDate,

        // Date at which the new time series will have ended.
        DateTime endDate,

        // The time interval between samples.
        TimeSpan resampleInterval,

        // Function that selects a date/time value from an input data point.
        Func<InputValueT, DateTime> dateSelector,

        // Interpolation function that produces a new interpolated data point
        // at a particular time between two input data points.
        Func<DateTime, InputValueT, InputValueT, double, OutputDataT> interpolator
    )
    {
        // ... argument checking omitted ...

        //
        // Manually enumerate the input time series...
        // This is manual because the first data point must be treated specially.
        //
        var e = source.GetEnumerator();
        if (e.MoveNext())
        {
            // Initialize working date to the start date, this variable will be used to 
            // walk forward in time towards the end date.
            var workingDate = startDate;

            // Extract the first data point from the input time series.
            var firstDataPoint = e.Current;
            
            // Extract the first data point's date using the date selector.
            var firstDate = dateSelector(firstDataPoint);

            // Loop forward in time until we reach either the date of the first
            // data point or the end date, whichever comes first.
            while (workingDate < endDate && workingDate <= firstDate)
            {
                // Until we reach the date of the first data point,
                // use the interpolation function to generate an output
                // data point from the first data point.
                yield return interpolator(workingDate, firstDataPoint, firstDataPoint, 0);

                // Walk forward in time by the specified time period.
                workingDate += resampleInterval; 
            }

            //
            // Setup current data point... we will now loop over input data points and 
            // interpolate between the current and next data points.
            //
            var curDataPoint = firstDataPoint;
            var curDate = firstDate;

            //
            // After we have reached the first data point, loop over remaining input data points until
            // either the input data points have been exhausted or we have reached the end date.
            //
            while (workingDate < endDate && e.MoveNext())
            {
                // Extract the next data point from the input time series.
                var nextDataPoint = e.Current;

                // Extract the next data point's date using the date selector.
                var nextDate = dateSelector(nextDataPoint);
                
                // Calculate the time span between the dates of the current and next data points.
                var timeSpan = nextDate - curDate;

                // Loop forward in time until we have moved beyond the date of the next data point.
                while (workingDate < endDate && workingDate < nextDate)
                {
                    // The time span from the current date to the working date.
                    var curTimeSpan = workingDate - curDate; 

                    // The time between the dates as a percentage (a 0-1 value).
                    var timePct = curTimeSpan.TotalSeconds / timeSpan.TotalSeconds; 

                    // Interpolate an output data point at the particular time between 
                    // the current and next data points.
                    yield return interpolator(workingDate, curDataPoint, nextDataPoint, timePct);

                    // Walk forward in time by the specified time period.
                    workingDate += resampleInterval; 
                }

                // Swap the next data point into the current data point so we can move on and continue
                // the interpolation with each subsequent data point assuming the role of 
                // 'next data point' in the next iteration of this loop.
                curDataPoint = nextDataPoint;
                curDate = nextDate;
            }

            // Finally loop forward in time until we reach the end date.
            while (workingDate < endDate)
            {
                // Interpolate an output data point generated from the last data point.
                yield return interpolator(workingDate, curDataPoint, curDataPoint, 1);

                // Walk forward in time by the specified time period.
                workingDate += resampleInterval; 
            }
        }
    }
}
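To see the three loops at work, the following sketch runs a condensed restatement of the function over three made-up data points. It is an illustration under assumptions, not the attached project's code: the names Sample, ResampleSketch and Run are hypothetical, the end date is treated as exclusive in every loop, and the fraction is measured against the span between the current and next data points, as the comments in the listing describe.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Sample
{
    public DateTime Date { get; set; }
    public double Value { get; set; }
}

public static class ResampleSketch
{
    // Condensed restatement of the Resample loops (assumption: end date is exclusive).
    public static IEnumerable<TOut> Resample<TIn, TOut>(
        this IEnumerable<TIn> source,
        DateTime startDate, DateTime endDate, TimeSpan interval,
        Func<TIn, DateTime> dateSelector,
        Func<DateTime, TIn, TIn, double, TOut> interpolator)
    {
        using (var e = source.GetEnumerator())
        {
            if (!e.MoveNext()) yield break;

            var workingDate = startDate;
            var cur = e.Current;
            var curDate = dateSelector(cur);

            // 1. Up to and including the first input point: repeat the first point.
            while (workingDate < endDate && workingDate <= curDate)
            {
                yield return interpolator(workingDate, cur, cur, 0);
                workingDate += interval;
            }

            // 2. Between input points: interpolate from the current to the next point.
            while (workingDate < endDate && e.MoveNext())
            {
                var next = e.Current;
                var nextDate = dateSelector(next);
                var span = nextDate - curDate;
                while (workingDate < endDate && workingDate < nextDate)
                {
                    var t = (workingDate - curDate).TotalSeconds / span.TotalSeconds;
                    yield return interpolator(workingDate, cur, next, t);
                    workingDate += interval;
                }
                cur = next;
                curDate = nextDate;
            }

            // 3. After the last input point: repeat the last point.
            while (workingDate < endDate)
            {
                yield return interpolator(workingDate, cur, cur, 1);
                workingDate += interval;
            }
        }
    }

    // Resample three made-up points (1 Jan, 3 Jan, 4 Jan) to a daily series.
    public static Sample[] Run()
    {
        var input = new[]
        {
            new Sample { Date = new DateTime(2014, 1, 1), Value = 10 },
            new Sample { Date = new DateTime(2014, 1, 3), Value = 30 },
            new Sample { Date = new DateTime(2014, 1, 4), Value = 20 },
        };

        return input.Resample(
                new DateTime(2013, 12, 31), new DateTime(2014, 1, 6), TimeSpan.FromDays(1),
                s => s.Date,
                (date, a, b, t) => new Sample { Date = date, Value = a.Value + ((b.Value - a.Value) * t) })
            .ToArray();
    }
}
```

Run() produces six daily samples, from 31 Dec 2013 to 5 Jan 2014, with the values 10, 10, 20, 30, 20 and 20. The sample for 2 Jan fills the gap in the input by interpolating halfway between 10 and 30, while the samples before the first point and after the last point simply repeat those points.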

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
