Introduction
The technique presented here is a simple method of resampling and aggregating time series data that is built on LINQ.
This technique is useful when you have irregular or gappy time series data that you want to normalize so that there is a regular time interval between each data point.
It is also useful when you want to compare two sets of time series data and you need to have a common time interval to do so.
The code is unit tested and, to my knowledge, works well. If you find problems, please give feedback and I will fix the code and add new tests.
Audience
This code is all in C# and is based on LINQ. To use the code it will be helpful if you are already familiar with LINQ functions such as Select and Zip. See here for an introduction to LINQ.
To understand the implementation of the Resample function you must understand how LINQ functions work, including IEnumerable and the yield statement.
Please note that I prefer to use the method syntax style of LINQ programming and not the query syntax style.
I also have this technique working in JavaScript using linq.js. If you are a JavaScript programmer and you want this, please let me know and I'll consider cleaning up the JavaScript version and adding it to this article.
Background
I have been working on some code to help me analyze and graph financial data. I wanted to compare two sets of data. Unfortunately, each set of data had gaps or had been sampled at different time intervals.
I needed some code that would normalize the time series data to a common time interval so that the data could then be easily compared and aggregated.
After procrastinating for a few weeks and not finding any existing technique online to achieve what I wanted (surely there must be something out there!), I had a brainwave. What I wanted to achieve (resampling of time series data) is very similar to something I have implemented or used many times during my game development career: running a keyframe animation and capturing the resulting animated value at a regular time interval.
Implementation was fast once I knew what I wanted, and I think this is a useful technique that is worth sharing.
LINQ is my weapon of choice, so I have written a LINQ-style function that resamples an input set of data and generates new time series data sampled at a pre-defined time interval.
After resampling to a normalized time interval, LINQ Zip can then easily be used to aggregate multiple sets of data.
Example project
Attached is an example project that demonstrates the technique.
The example project uses the DynamicDataDisplay charting library to display the data. The data I use in the example project is from their StockExchangeSample project. Thanks to those guys for providing the charting library, various example projects and sample data.
Using the code
The following code illustrates how to use the Resample function.
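class Data
{
    public DateTime EventDate { get; set; }
    public double SomeValue { get; set; }
}

// Input time series data, assumed to be sorted by date.
Data[] timeSeriesData = ... some input data ...

// Start and end of the time period to resample.
DateTime startDate = ... start of the time period ...
DateTime endDate = ... end of the time period ...

// Time interval between data points in the output.
TimeSpan timeInterval = TimeSpan.FromDays(1);

IEnumerable<Data> resampledTimeSeries =
    timeSeriesData
        .Resample(
            startDate,
            endDate,
            timeInterval,
            // Extracts the date from an input data point.
            data => data.EventDate,
            // Creates the output data point for curDate, which lies a
            // fraction t of the way between data1 and data2.
            (curDate, data1, data2, t) =>
                new Data
                {
                    EventDate = curDate,
                    SomeValue = Lerp(data1.SomeValue, data2.SomeValue, t)
                }
        );

// Linear interpolation between v1 and v2, where t ranges from 0 to 1.
private double Lerp(double v1, double v2, double t)
{
    return v1 + ((v2 - v1) * t);
}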
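To make the role of t concrete, here is a small self-contained sketch of the arithmetic behind a single resampled data point. The LerpExample class and the TimeFraction helper are names made up for this illustration; Resample works out an equivalent fraction internally.

```csharp
using System;

public static class LerpExample
{
    // Linear interpolation, the same formula as the Lerp function above.
    public static double Lerp(double v1, double v2, double t)
    {
        return v1 + ((v2 - v1) * t);
    }

    // Hypothetical helper: how far 'date' lies between 'from' and 'to',
    // as a fraction (0 at 'from', 1 at 'to').
    public static double TimeFraction(DateTime from, DateTime to, DateTime date)
    {
        return (date - from).TotalSeconds / (to - from).TotalSeconds;
    }
}
```

For example, with a value of 10 on 1 January and a value of 18 on 5 January, resampling at 2 January gives t = 0.25 and an interpolated value of 12.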
After multiple data sets have been normalized to have the same time interval, it is possible to use the LINQ Zip operator to aggregate the data sets.
IEnumerable<Data> resampledData1 = ... resample some data ...
IEnumerable<Data> resampledData2 = ... resample some data ...
var mergedData =
    resampledData1
        .Zip(
            resampledData2,
            (data1, data2) => new Data
            {
                EventDate = data1.EventDate,
                // Difference between the two data sets at this date.
                SomeValue = data1.SomeValue - data2.SomeValue
            }
        )
        .ToArray();
You should note that because evaluation of a LINQ IEnumerable is lazy, the resampling only happens as you enumerate the output time series data. This makes the whole technique quite efficient: you only pull as many data points out of the enumerable as you need, and only that much is resampled.
Note the call to ToArray() in the previous code snippet. It is there simply to force the entire LINQ statement to be evaluated, which runs the Zip operator, which in turn drives the resampling process. Without the call to ToArray(), no resampling would happen until the result is enumerated, which can seem a bit counter-intuitive for C# programmers who are new to the idea, but it is a common concept in the functional programming world.
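The lazy behaviour described above can be seen in isolation with a small sketch. It is not part of the resampling code: the LazyDemo class, its Pulled counter and its methods are hypothetical names used only to count how many items are drawn from a source sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Counts how many items have been pulled from the source sequence.
    public static int Pulled;

    // An endless sequence; producing each item increments the counter.
    public static IEnumerable<int> Numbers()
    {
        var i = 0;
        while (true)
        {
            Pulled++;
            yield return i++;
        }
    }

    // Builds a query over the endless sequence, then enumerates
    // only the first 'count' results.
    public static int[] FirstDoubled(int count)
    {
        Pulled = 0;
        var query = Numbers().Select(n => n * 2); // Nothing has run yet.
        return query.Take(count).ToArray();       // Enumeration happens here.
    }
}
```

Calling LazyDemo.FirstDoubled(3) returns 0, 2 and 4, and only three items are ever pulled from the endless source. Resample behaves the same way: it produces output data points only as they are requested.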
Explaining the code
Now I'll explain the implementation of the Resample function via inline comments.
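public static class ResampleExt
{
    // Resamples time series data to a regular time interval.
    //
    // source           - Input data, assumed to be sorted by date.
    // startDate        - Date of the first output data point.
    // endDate          - Output stops before this date is reached.
    // resampleInterval - Time between consecutive output data points.
    // dateSelector     - Extracts the date from an input data point.
    // interpolator     - Builds an output data point from the output date, the
    //                    input data points either side of it, and how far
    //                    (0 to 1) the output date lies between them.
    public static IEnumerable<OutputDataT> Resample<InputValueT, OutputDataT>(
        this IEnumerable<InputValueT> source,
        DateTime startDate,
        DateTime endDate,
        TimeSpan resampleInterval,
        Func<InputValueT, DateTime> dateSelector,
        Func<DateTime, InputValueT, InputValueT, double, OutputDataT> interpolator
    )
    {
        using (var e = source.GetEnumerator())
        {
            // No input data means no output data.
            if (e.MoveNext())
            {
                // The date of the next data point to output.
                var workingDate = startDate;

                var firstDataPoint = e.Current;
                var firstDate = dateSelector(firstDataPoint);

                // Before the first input data point there is nothing to
                // interpolate from, so repeat the first data point.
                while (workingDate < endDate && workingDate <= firstDate)
                {
                    yield return interpolator(workingDate, firstDataPoint, firstDataPoint, 0);
                    workingDate += resampleInterval;
                }

                var curDataPoint = firstDataPoint;
                var curDate = firstDate;

                // Step through each pair of adjacent input data points.
                while (workingDate < endDate && e.MoveNext())
                {
                    var nextDataPoint = e.Current;
                    var nextDate = dateSelector(nextDataPoint);

                    // Time between the current pair of input data points.
                    var timeSpan = nextDate - curDate;

                    // Output every data point that falls between the pair.
                    while (workingDate < endDate && workingDate < nextDate)
                    {
                        // How far the output date lies between the pair (0 to 1).
                        var curTimeSpan = workingDate - curDate;
                        var timePct = curTimeSpan.TotalSeconds / timeSpan.TotalSeconds;
                        yield return interpolator(workingDate, curDataPoint, nextDataPoint, timePct);
                        workingDate += resampleInterval;
                    }

                    curDataPoint = nextDataPoint;
                    curDate = nextDate;
                }

                // After the last input data point, repeat the last data point.
                while (workingDate < endDate)
                {
                    yield return interpolator(workingDate, curDataPoint, curDataPoint, 1);
                    workingDate += resampleInterval;
                }
            }
        }
    }
}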
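Because the interpolator parameter is just a delegate, schemes other than linear interpolation can be passed to Resample. The following is a minimal sketch and not part of the original code: the StepInterpolators class and its HoldPrevious and Nearest methods are hypothetical names, and the Data class is repeated only to keep the snippet self-contained.

```csharp
using System;

public class Data
{
    public DateTime EventDate { get; set; }
    public double SomeValue { get; set; }
}

// Hypothetical alternatives to linear interpolation. Each method matches the
// signature that Resample expects for its interpolator argument.
public static class StepInterpolators
{
    // Carries the earlier value forward until the next data point is reached.
    public static Data HoldPrevious(DateTime curDate, Data data1, Data data2, double t)
    {
        return new Data { EventDate = curDate, SomeValue = data1.SomeValue };
    }

    // Uses the value of whichever data point is closer in time.
    public static Data Nearest(DateTime curDate, Data data1, Data data2, double t)
    {
        var closest = t < 0.5 ? data1 : data2;
        return new Data { EventDate = curDate, SomeValue = closest.SomeValue };
    }
}
```

Either method can be passed as the last argument to Resample in place of the Lerp-based lambda, for example StepInterpolators.HoldPrevious. Holding the previous value may suit data such as a last traded price, where inventing in-between values would be misleading.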