Analysing Big Data in Real Time #1

5 Jun 2014, CPOL, 5 min read
How Formula 1 is beginning to use StreamInsight to broadcast data analytics to your screen in real time

Part 1 of the series Analysing Data in Real Time #1, #2

Introduction

This article is based on conversations and consultations I have had with McLaren Electronic Systems, Formula 1 Management, my own experience with Microsoft StreamInsight and publicly available information.

When watching F1, have you ever wondered how all that in-car data gets to your TV screen or how teams change strategies and make decisions as the race evolves? If so, then read on...

Image 1

Chasing Hamilton, Rosberg is using more fuel whilst the gap between them is updated constantly

Background

Increasingly, televised sports broadcasts are accompanied by a plethora of statistical data. Whether it's serve speeds in tennis or run distances and pass accuracy in football, timely delivery of such data is becoming an important part of the viewer's experience. The average Formula 1 car consists of around 80,000 components and carries up to 300 sensors, which broadcast telemetry data through the ECU and out over the radio waves. Over the course of one race this can amount to 1.5 billion data points, which even when compressed represents several gigabytes.

Image 2

That's one hell of a lot of moving data to slice and dice!

If you're an F1 team, how you manage and interpret this data can be the difference between strategic decisions that win you the race and blunders that cost you millions in sponsorship. For viewers, knowing that your favourite driver is catching the car in front (whose high fuel consumption forces him to slow down to finish the race) only adds to the excitement and understanding.

Image 3

Will Hamilton exit the pit lane to a clear track or be stuck in traffic causing him to lose precious time?

Formula 1 uses the Advanced Telemetry Linked Acquisition System (ATLAS), developed by McLaren Electronic Systems, to relay information from the car to trackside antennas and then on to the teams on the pit wall and to the television broadcasters.

Image 4

ATLAS System components

Clients connect to the Ethernet backbone and receive information over TCP and UDP. But this is raw, unstructured data, and receiving it is akin to looking down the mouth of a hosepipe while someone else turns on the tap.

Image 5

RPM, speed, gear selection, throttle percentage, DRS and KERS are just some of the data sources transmitted and interpreted.

So what techniques can be used to extract meaning and derive insight from this flood of information? Indeed, faced with a similar situation, what can you use to help provide value to your customers?

Introducing Microsoft StreamInsight

This season, Formula 1's broadcasters have begun introducing a new system, built on Microsoft StreamInsight, that filters raw telemetry data and derives useful statistics from it. For example, a typical fuel sensor only measures the current fuel level once per second, but if you already know the relevant volumes and densities you can calculate the flow rate, or fuel usage, from successive readings.
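The fuel calculation above can be sketched in plain C# (LINQ to Objects, not StreamInsight): pair each once-per-second reading with the next, divide the drop in level by the elapsed time, then convert volume flow to mass flow using the density. The readings and the 0.75 kg/L density are hypothetical values chosen for illustration, not real ATLAS data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical once-per-second fuel level readings: (seconds since start, litres in tank)
var readings = new List<(double Seconds, double Litres)>
{
    (0, 100.0), (1, 99.97), (2, 99.94), (3, 99.90)
};

// Assumed fuel density in kg per litre (illustrative value only)
const double densityKgPerLitre = 0.75;

// Pair each reading with the next one and divide the drop in level by the elapsed time
var flowRates = readings
    .Zip(readings.Skip(1), (prev, curr) => (
        Seconds: curr.Seconds,
        LitresPerSecond: (prev.Litres - curr.Litres) / (curr.Seconds - prev.Seconds)))
    .ToList();

foreach (var r in flowRates)
{
    // Convert volume flow (L/s) to mass flow (kg/h) using the known density
    double kgPerHour = r.LitresPerSecond * densityKgPerLitre * 3600;
    Console.WriteLine($"t={r.Seconds}s: {r.LitresPerSecond:F3} L/s, {kgPerHour:F1} kg/h");
}
```

In a live system the same pairwise difference would be computed on each new event as it arrives rather than over a finished list.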

StreamInsight belongs to a class of applications known as Complex Event Processing (CEP) and is Microsoft's offering for real-time data stream processing. Currently at version 2.3, it is, in my experience, the easiest CEP system to configure and use.

Image 6

Architecture of a StreamInsight application

C# developers already comfortable with concepts such as LINQ and observables will readily be able to transfer their skills. Additionally, as long as you have a Microsoft SQL Server license, StreamInsight is free to use and deploy.

Because StreamInsight is free, Microsoft has little incentive to promote it, which may be one reason why so few people know about it. Yet by customising an off-the-shelf CEP system, we can take advantage of data, and of patterns in that data, that would otherwise be ignored.

Image 7

Go faster using fewer resources

CEP Concepts

All CEP systems share some common concepts, all of which should be familiar if you have ever worked with a database. Here's what you need to know to adapt your thinking from querying static data to filtering moving data.

Event Streams

Any sequence of data, whether static or moving, can be thought of as a stream. Just as a FileStream is a sequence of bytes, the speed of a car can be represented as a sequence of numbers that grows as time progresses. Timestamp each data point and you have a time series of events, also known as an event stream.
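To make the idea concrete, here is a minimal plain C# sketch (not StreamInsight's event types) that turns raw speed samples into a stream of timestamped events. The sample values, the arbitrary start time, the 100 ms sampling period and the `ToEventStream` helper are all hypothetical. The iterator yields one event at a time, mirroring a stream that grows as time progresses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical raw speed samples (km/h), assumed to arrive every 100 ms
double[] rawSpeeds = { 280.5, 282.1, 283.0, 281.7 };
var raceStart = new DateTime(2014, 1, 1, 12, 0, 0, DateTimeKind.Utc); // arbitrary start time
var samplePeriod = TimeSpan.FromMilliseconds(100);

// Attach a timestamp to every sample so that each one becomes an event
IEnumerable<(DateTime Timestamp, double Speed)> ToEventStream(IEnumerable<double> samples)
{
    var time = raceStart;
    foreach (var speed in samples)
    {
        yield return (time, speed);
        time += samplePeriod;
    }
}

var speedEvents = ToEventStream(rawSpeeds).ToList();
foreach (var e in speedEvents)
    Console.WriteLine($"{e.Timestamp:HH:mm:ss.fff}  {e.Speed} km/h");
```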

Queries, Filters, Sources and Sinks

With languages such as SQL, we can query a database source table and return a subset of that data to ourselves (the sink). For example:

SQL
SELECT * FROM Speedometer WHERE lap = 10 

Likewise, with an event stream, we can filter a source stream in real time using LINQ and pipe the results into a new stream:

C#
var sinkEventStream = from datapoint in speedometerEventStream
                      where datapoint.lap == 10   // keep only events recorded on lap 10
                      select datapoint;

In this example, we've created a sink called sinkEventStream: every time a new event enters the source (speedometerEventStream), it is added to the sink if its lap property is 10.
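The push-based behaviour described above, where each event flows through a standing query instead of a query running over stored rows, can be simulated in a few lines of plain C#. This is a minimal sketch, not StreamInsight code: `OnNewEvent`, the `isLap10` predicate and the (Lap, Speed) tuple are hypothetical, and the events are fed in by hand.

```csharp
using System;
using System.Collections.Generic;

// The sink: collects only the events that pass the filter
var sinkEventStream = new List<(int Lap, double Speed)>();

// The standing query's predicate, equivalent to "where datapoint.lap == 10"
Func<(int Lap, double Speed), bool> isLap10 = e => e.Lap == 10;

// Called once per event as it arrives (push), rather than scanning a stored table (pull)
void OnNewEvent((int Lap, double Speed) datapoint)
{
    if (isLap10(datapoint))
        sinkEventStream.Add(datapoint);
}

// Simulate hypothetical speedometer events arriving one at a time
OnNewEvent((9, 290.0));
OnNewEvent((10, 301.5));
OnNewEvent((10, 305.2));
OnNewEvent((11, 299.8));

Console.WriteLine($"{sinkEventStream.Count} events reached the sink");
```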

Joins, Windows and Partitions

A window is essentially a moving view over static data (SQL) or a fixed-length view over moving data (CEP). In CEP, windows can either be sliding (they move continuously as new events arrive) or tumbling (they advance in discrete, non-overlapping steps).
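The difference between the two window types can be shown with a small in-memory sketch in plain C# (LINQ to Objects, not StreamInsight's own window operators). The speed events and the 3-second window length are hypothetical. The tumbling windows split the timeline into fixed buckets so each event is counted once, while the sliding window is re-evaluated at every event and therefore overlaps its neighbours.

```csharp
using System;
using System.Linq;

// Hypothetical speed events: (second since start, speed in km/h)
var events = new (int Second, double Speed)[]
{
    (0, 200), (1, 210), (2, 220), (3, 230), (4, 240), (5, 250)
};
const int windowSeconds = 3;

// Tumbling: fixed, non-overlapping buckets [0,3), [3,6), ...; each event lands in exactly one
var tumbling = events
    .GroupBy(e => e.Second / windowSeconds)
    .Select(g => (Start: g.Key * windowSeconds, Average: g.Average(e => e.Speed)))
    .ToList();

// Sliding: each new event re-evaluates the last windowSeconds of data, so windows overlap
var sliding = events
    .Select(latest => (
        End: latest.Second,
        Average: events
            .Where(e => e.Second > latest.Second - windowSeconds && e.Second <= latest.Second)
            .Average(e => e.Speed)))
    .ToList();

tumbling.ForEach(w => Console.WriteLine($"tumbling from {w.Start}s: {w.Average}"));
sliding.ForEach(w => Console.WriteLine($"sliding ending {w.End}s: {w.Average}"));
```

Six events yield two tumbling averages but six sliding ones, which is why sliding windows cost more to maintain in a high-volume stream.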

Image 8

As an example, suppose we have two sensors feeding data into the following SQL tables:

FuelLevel

Image 9

LapTime

Image 10

You can join the tables and partition the data with SQL window functions to keep track of the average fuel level for each lap:

SQL
SELECT l.Lap,
       f.Fuel,
       AVG(f.Fuel) OVER (PARTITION BY l.Lap) AS "Average Fuel This Lap"
FROM FuelLevel f
JOIN LapTime l
  ON f.Time = l.Time;

Similarly, with StreamInsight we can join streams, group them, and then partition them over time or by arbitrary values:

C#
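// Payload types (names assumed; the tables above only show their columns)
public class Fuel { public DateTime Time { get; set; } public double FuelLevel { get; set; } }
public class LapEvent { public DateTime Time { get; set; } public int Lap { get; set; } }
public class LapAndFuel { public double FuelLevel { get; set; } public int Lap { get; set; } }

// Source streams, assumed to be bound to the telemetry inputs elsewhere
CepStream<Fuel> FuelLevel;
CepStream<LapEvent> LapTime;

// Pair each fuel reading with the lap event carrying the same timestamp
var joined = from f in FuelLevel
             join l in LapTime
             on f.Time equals l.Time
             select new LapAndFuel { FuelLevel = f.FuelLevel, Lap = l.Lap };

// Group by lap, then average the fuel level within each snapshot window
var AverageFuelStream = from reading in joined
                        group reading by reading.Lap into eachGroup
                        from window in eachGroup.SnapshotWindow()
                        select new { Average = window.Avg(x => x.FuelLevel), Lap = eachGroup.Key };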
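Because the StreamInsight query needs a running server and input adapters, it can help to check the join-then-average logic offline first. The sketch below is plain LINQ to Objects over hypothetical in-memory rows shaped like the FuelLevel and LapTime tables; it is not StreamInsight code. It also works over a finished collection, so each lap's average is produced once, whereas the snapshot window above keeps updating the average as new events arrive.

```csharp
using System;
using System.Linq;

// Hypothetical rows mirroring the FuelLevel and LapTime tables: one row per timestamp (seconds)
var fuelLevel = new (int Time, double Fuel)[] { (0, 100), (1, 98), (2, 96), (3, 94), (4, 92) };
var lapTime = new (int Time, int Lap)[] { (0, 1), (1, 1), (2, 1), (3, 2), (4, 2) };

// Same shape as the query above: equi-join on Time, group by Lap, average the fuel level
var averageFuelPerLap = (from f in fuelLevel
                         join l in lapTime on f.Time equals l.Time
                         group f.Fuel by l.Lap into lapGroup
                         select (Lap: lapGroup.Key, Average: lapGroup.Average()))
                        .ToList();

foreach (var lap in averageFuelPerLap)
    Console.WriteLine($"Lap {lap.Lap}: average fuel {lap.Average}");
```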

In essence, StreamInsight gives us much the same query functionality as you would expect from SQL Server, with the added advantage that everything happens in real time. Combine this with the ability to easily call out to other subsystems and you have the beginnings of a system that can monitor data and make intelligent decisions.
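As a rough illustration of "monitor and decide", the sketch below takes per-lap fuel figures, of the kind a query like AverageFuelStream could feed, and calls out to another subsystem when the remaining fuel will not last the race at the current rate. Everything here is hypothetical: `notifyPitWall` stands in for whatever external system would receive the alert, and the lap count and fuel figures are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an external subsystem (pit-wall display, broadcast graphics, ...)
var alerts = new List<string>();
Action<string> notifyPitWall = message => alerts.Add(message);

const int totalLaps = 57; // hypothetical race length

// Hypothetical per-lap updates: (lap completed, fuel left in kg, fuel used this lap in kg)
var lapUpdates = new (int Lap, double FuelLeft, double UsedThisLap)[]
{
    (10, 80.0, 1.7), (11, 78.2, 1.8), (12, 76.2, 2.0)
};

foreach (var update in lapUpdates)
{
    int lapsRemaining = totalLaps - update.Lap;
    // Laps the remaining fuel would last at this lap's consumption rate
    double lapsOfFuel = update.FuelLeft / update.UsedThisLap;
    if (lapsOfFuel < lapsRemaining)
        notifyPitWall($"Lap {update.Lap}: fuel for {lapsOfFuel:F1} laps, {lapsRemaining} to go");
}

alerts.ForEach(Console.WriteLine);
```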

Image 11

Leadership enabled by speed AND intelligence?

Future Articles

This was just a brief introduction to CEP and MS StreamInsight, but in future articles I plan to demonstrate some practical examples as well as describe the various tools and deployment scenarios you might use in real life.

History

  1. Creation
  2. Image sizes
  3. Article series links

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)