BriefMaker – An App for Processing Real-Time Market Data

10 May 2021 | MIT license | 14 min read
Converts past and real-time stock market tick data into time-sliced summaries called Briefs
In this article, I present a Windows program that converts streaming market events into time-period based summaries which I refer to as briefs.
Tip: Before downloading, please review the Limitations and Requirements.

Introduction

BriefMaker is a Windows program that converts streaming market events into time-period based summaries that I refer to as briefs. Briefs are a more familiar and practical form of data to work with.

Image 1

Stream data is usually not that useful on its own; it is a random mess of different events. We usually want to convert this mess into a more logical and useful data structure that might contain things like the highest, lowest, and average value. When we look at a stock's price history table, we often take this conversion for granted, forgetting that behind the scenes this data originally came from a massive number of market events such as trades, bid offers, ask offers, etc. The conversion process takes this varying number of events and converts it into a nice fixed-sized summary for a specific interval of time. The result is data that is easier to read for people and AI algorithms alike.

Below, on the left, is what streaming event data looks like. [78,2,2,24.43] is just my short way of saying that at 78/256ths of a second, for stock ID #2 and attribute #2 (bid price update), there was a new value of 24.43. BriefMaker converts this streaming data into time-based snapshots, which I refer to as briefs, displayed on the right. The right side is much easier to read.

Image 2
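
To make the event notation concrete, here is a minimal sketch that parses one bracketed event such as [78,2,2,24.43] into its four parts. The StreamEvent type and its member names are hypothetical and exist only for this illustration; they are not BriefMaker's actual StreamMoments format.

```csharp
using System;
using System.Globalization;

// Hypothetical type for illustration only; not BriefMaker's real storage format.
public sealed class StreamEvent
{
    public readonly byte SubSecond;    // position within the second, in 1/256ths
    public readonly int SymbolId;      // which stock
    public readonly int AttributeId;   // which attribute changed (e.g. 2 = bid price)
    public readonly float Value;       // the new value

    public StreamEvent(byte subSecond, int symbolId, int attributeId, float value)
    {
        SubSecond = subSecond;
        SymbolId = symbolId;
        AttributeId = attributeId;
        Value = value;
    }

    // Fraction of a second at which the event occurred (78 -> 78/256).
    public double SecondsOffset
    {
        get { return SubSecond / 256.0; }
    }

    // Parses text of the form "[78,2,2,24.43]".
    public static StreamEvent Parse(string text)
    {
        string[] parts = text.Trim('[', ']', ' ').Split(',');
        if (parts.Length != 4)
            throw new FormatException("Expected four comma-separated fields.");

        return new StreamEvent(
            byte.Parse(parts[0], CultureInfo.InvariantCulture),
            int.Parse(parts[1], CultureInfo.InvariantCulture),
            int.Parse(parts[2], CultureInfo.InvariantCulture),
            float.Parse(parts[3], CultureInfo.InvariantCulture));
    }
}
```

Calling StreamEvent.Parse("[78,2,2,24.43]") yields symbol 2, attribute 2, and a value of 24.43 at 78/256ths of a second.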

Why the name 'Brief'? First off, Brief indicates a short version of something much longer. A brief is a fixed-sized summary of a potentially large set of events over a period of time. But in a way, a brief is not that brief either, because it captures much more than the basic low/high/close/volume. It also captures indicators, statistics, counts, and more. I guess a more accurate name/definition could be "fixed-sized extended summary of market event data over a unit of time".

There is a disadvantage to converting event data to time-based summaries (briefs). Usually, when summarizing data of any kind, there is some loss of data that cannot be restored. For example, the original real-time event stream cannot be reproduced from briefs; it is a one-way street. Briefs can always be re-created from event data, however, and for this reason I recommend holding onto the original data (StreamMoments). As shown below, briefs can be pretty detailed in how they describe the event data, and we might later want to adjust the brief-making process and re-create all our briefs.

Briefs have many benefits over streaming quotes. They can be fit in a table, for example with time on one axis and volume on the other. Briefs are easier to view and contain most of the important information on what happens to each symbol in each 6-second time frame. Each brief contains data like the period-high, period-low, last-sale-price, last-bid, last-bid-volume, trade counts, and more.

Background

BriefMaker was built because I needed table-like data for my AI. I use something similar to a genetic algorithm to try to predict stocks. It basically reads in X number of rows and then does its best to predict the next line of data so it can do real-time trading. While not required, a table does work well for this. However, what I had was almost random event data.

While event data could also be used in a table, it would be very compute intensive to run the algorithm on every event. The only way this would make sense is when using very low latency stuff. If I lived a mile from the exchange and milliseconds counted, I might have just used the event data but I am on the west coast with a basic internet connection.

For my data needs, I had two choices. I could either use existing history data or I could build it myself from streaming data. There are lots of choices online for stock history, but they lacked the high detail that I wanted. Most are basically a bid, ask, high, low, last, and volume with a 5-60 second resolution. I wanted more descriptive details on that data, so I thought I could generate my own with a live data feed. This was one of the reasons I built this program from scratch instead of using existing data.

One great concern I had was missing hidden data when doing the conversion. I wanted the AI algorithm to have access to as much information as possible. The AI was being run in CUDA, and in CUDA the warp size is 32, so having 32 symbols and 32 attributes fit the data structure well. At first, it was difficult to fill 32 different attribute fields, but soon it was difficult to keep it under 32. Originally, I had more statistical fields like variance, standard deviation, and kurtosis, but I dropped these in favor of some other descriptive fields. Also, there just was not enough data in only 6 seconds of trades to fill some of these advanced data descriptors.

So for this whole AI predicting algorithm to work well, I needed a number of parts, and one of them was converting the event data to highly detailed, fixed-sized, time-interval data. After some playing around, BriefMaker was the result.

Features

Below are what I believe are some nice features of BriefMaker.

  • Detailed Briefs - Captures 32 different aspects of each stock including statistical and indicator like data.
  • Direct WCF connection to enhance real-time data performance. It can be used with MarketRecorder or customized to use with other programs as well.
  • Continue where left off - will re-run StreamMoments from a little before it last left off to make sure the data is more accurate. Also if the program is closed while it's creating the briefs and it is re-launched, it will continue where it left off.
  • Protection against writing incomplete Briefs - Built-in startup protection against outputting incomplete briefs when an important value, such as the last price, has not been received yet. (See waitingForData in code)
  • Out of range checking – checks data to make sure values are within acceptable ranges such as last price should be between

Data Flow

The data flow for BriefMaker is pretty simple. It reads in six StreamMoments records at a time from a table and then saves them as one briefs record. After all the StreamMoments are read and it is caught up, it can optionally wait for new StreamMoments via WCF. Take a look at the Interactive Brokers TWS MarketRecorder project to see an example of how to send data via WCF.

Image 3

Stream-to-Brief Conversion - Capturing Almost Everything

Whenever we summarize the chaotic event data for a stock, or most anything else, we lose data in the conversion process. We are stepping away from exactly what happened in the market. One of the goals of this project was to lose as little information as possible from the original event data. I want the stock predicting algorithm to have access to as many informational fields as possible so that it can find undiscovered patterns in the market.

In BriefMaker, I tried to collect all kinds of detail on what happens every 6 seconds to a particular symbol. In some regards, a brief is not that brief. A brief contains the normal stuff like high, low, last, ask, bid but it also captures different kinds of volume information, statistical data (mean price, mode price, median price) and indicator data (like MACD, SMA, Bollinger-Bands), tick counts, sale counts, etc. The goal was to try and capture as much descriptive information about the stream as possible. In the past, I also had standard deviation and variance but I gave these up for other kinds of data. Often, there is just not enough data in a 6 second increment for a given ticker symbol for these.

Below are 32 different aspects captured for each symbol every six seconds. With so many different ways to view the data, we can describe pretty well what happens to that symbol in each 6-second time period. Every 6 seconds and for each stock, BriefMaker collects the following information:

ID  Format  Short Name  Init. Value  New Values           Description
0   float   volume_day  (none)       always replace       total volume for the day
1   float   volume_ths  0            sum                  volume for this 6-sec period (calculated)
2   float   largTrdPrc  price_last   conditional replace  price at largest volume trade
3   float   price_high  price_last   conditional replace  highest price in period
4   float   price_loww  price_last   conditional replace  lowest price in period
5   float   price_last  price_last   always replace       last trade price
6   float   price_bidd  price_bidd   always replace       last bid price
7   float   price_askk  price_askk   always replace       last ask price
8   float   volume_bid  volume_bid   always replace       last bid size
9   float   volume_ask  volume_bid   always replace       last ask size
10  float   price_medn  (none)       calculated           Statistics median for last price
11  float   price_mean  (none)       calculated           Statistics mean for last price
12  float   price_mode  (none)       calculated           Statistics mode for last price
13  float   buyy_price  (none)       calculated           A prediction of the buy price.
14  float   sell_price  (none)       calculated           A prediction of the sell price.
15  float   largTrdVol  0            conditional replace  The size of largest trade.
16  float   prcModeCnt  0            sum                  Statistics mode price count
17  float   vol_at_ask  0            always replace       Volume at ask price
18  float   vol_no_chg  0            always replace       Volume with no last change in last size.
19  float   vol_at_bid  0            always replace       Volume at bid price
20  float   BidUpTicks  0            sum                  How many times the bid went up.
21  float   BidDnTicks  0            sum                  How many times the bid went down.
22  float   sale_count  0            sum                  # of trades counted
23  float   extIndex00  overwritten  calculated           6 sec. calculated ATR of last trades
24  float   extIndex01  overwritten  calculated           6 sec. calculated CCI of last trades
25  float   extIndex02  overwritten  calculated           6 sec. calculated EMA of last trades
26  float   extIndex03  overwritten  calculated           6 sec. calculated Kama of last trades
27  float   extIndex04  overwritten  calculated           6 sec. calculated RSI of last trades
28  float   extIndex05  overwritten  calculated           6 sec. calculated SMA of last trades
29  float   extIndex06  overwritten  calculated           6 sec. calculated SarExt of last trades
30  float   extIndex07  overwritten  calculated           6 sec. calculated MACD of last trades
31  float   extIndex08  overwritten  calculated           6 sec. calculated Bollinger bands of lasts

Format: For simplicity, all values are stored as type float. One drawback of single-precision floating point is that it holds only about 7.2 significant digits; however, this is usually beyond the precision of the data.

Initial Value: This is the value that each brief starts out with. Usually, they are reset to zero or carried over from the previous brief. Some values are specified as ‘overwritten’ because they are overwritten when finishing the brief so no initial value is needed.

New Values: This is the action for new stream events.

  • Always Replace - will always replace the current value with a new value.
  • Conditional Replace - will only replace a value if a condition is met. For example, price_high is replaced only when the new value is a new high.
  • Sum - will add the new value to the running total.
  • Calculated – values are calculated on the completion of each brief. For example, this might be an array of sale prices that is fed into a formula.
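
The four update actions above can be sketched as a small accumulator for one symbol over one period. This is a simplified illustration, not BriefMaker's code: the BriefAccumulator class, its member names, and the choice to fall back to the last price when no trades arrive are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical accumulator for one symbol during one 6-second period.
public sealed class BriefAccumulator
{
    public float PriceLast;   // "always replace"
    public float PriceHigh;   // "conditional replace"
    public float PriceLow;    // "conditional replace"
    public float VolumeThs;   // "sum"
    public float SaleCount;   // "sum"
    public float PriceMean;   // "calculated" when the brief is finished

    private readonly List<float> prices = new List<float>();

    // High, low, and last start from the previous brief's last price,
    // matching the "Init. Value" column; the sums start at zero.
    public BriefAccumulator(float previousLast)
    {
        PriceLast = previousLast;
        PriceHigh = previousLast;
        PriceLow = previousLast;
    }

    public void OnTrade(float price, float size)
    {
        PriceLast = price;                          // always replace
        if (price > PriceHigh) PriceHigh = price;   // conditional replace
        if (price < PriceLow) PriceLow = price;     // conditional replace
        VolumeThs += size;                          // sum
        SaleCount += 1;                             // sum
        prices.Add(price);                          // kept for the calculated fields
    }

    // Called once when the period ends.
    public void Finish()
    {
        if (prices.Count == 0)
        {
            PriceMean = PriceLast;  // assumption: no trades means the mean is the last price
            return;
        }
        double total = 0;
        foreach (float p in prices) total += p;
        PriceMean = (float)(total / prices.Count);  // calculated
    }
}
```

Feeding it trades of 11 (size 100) and 9 (size 50) after a previous last price of 10 leaves a high of 11, a low of 9, a period volume of 150, and a mean of 10 once Finish() is called.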

Extracting a Brief

To extract a brief for use in an application, use something like the below…

More details on the Brief structure can be found here.

C#
// Requires: using System.IO;
BinaryReader reader = new BinaryReader(new MemoryStream(lastBrfImage));
// Layout: |--HDRs+Indexes(32)--|----------------Stocks(32x32)----------------|
int   briefID          = reader.ReadInt32();  // stored as an int (see the byte layout)
float Day              = reader.ReadSingle();
float Hour             = reader.ReadSingle();
float Minute           = reader.ReadSingle();
float Second           = reader.ReadSingle();
float DayOfWeek        = reader.ReadSingle();
float MinutesSinceOpen = reader.ReadSingle();
float SecondsSinceOpen = reader.ReadSingle();
float HoursSinceOpen   = reader.ReadSingle();
float RemainingHours   = reader.ReadSingle();
float RemainingMinutes = reader.ReadSingle();
// Index fields ('-' is not valid in a C# identifier, so '_' is used instead)
float TICK_NASD_4      = reader.ReadSingle();
float VOL_NASD_0       = reader.ReadSingle();
float VOL_NASD_1       = reader.ReadSingle();
float VOL_NASD_2       = reader.ReadSingle();
float AD_NASD_1        = reader.ReadSingle();
float AD_NASD_2        = reader.ReadSingle();
float TICK_NYSE_4      = reader.ReadSingle();
float VOL_NYSE_0       = reader.ReadSingle();
float VOL_NYSE_1       = reader.ReadSingle();
float VOL_NYSE_2       = reader.ReadSingle();
float AD_NYSE_0        = reader.ReadSingle();
float AD_NYSE_1        = reader.ReadSingle();
float AD_NYSE_2        = reader.ReadSingle();
float INDU_1           = reader.ReadSingle();
float INDU_2           = reader.ReadSingle();
float INDU_4           = reader.ReadSingle();
// Skip header bytes 108-127, which the byte layout lists as not used
reader.BaseStream.Seek(20, SeekOrigin.Current);

for (int s = 0; s < symbCt; s++) // Read in each ticker
{
    symbols[s].volume_day = reader.ReadSingle();
    symbols[s].volume_ths = reader.ReadSingle();
    symbols[s].largTrdPrc = reader.ReadSingle();
    symbols[s].price_high = reader.ReadSingle();
    symbols[s].price_loww = reader.ReadSingle();
    symbols[s].price_last = reader.ReadSingle();
    symbols[s].price_bidd = reader.ReadSingle();
    symbols[s].price_askk = reader.ReadSingle();
    symbols[s].volume_bid = reader.ReadSingle();
    symbols[s].volume_ask = reader.ReadSingle();
    symbols[s].price_medn = reader.ReadSingle();
    symbols[s].price_mean = reader.ReadSingle();
    symbols[s].price_mode = reader.ReadSingle();
    symbols[s].buyy_price = reader.ReadSingle();
    symbols[s].sell_price = reader.ReadSingle();
    symbols[s].largTrdVol = reader.ReadSingle();
    symbols[s].prcModeCnt = reader.ReadSingle();
    symbols[s].vol_at_ask = reader.ReadSingle();
    symbols[s].vol_no_chg = reader.ReadSingle();
    symbols[s].vol_at_bid = reader.ReadSingle();
    symbols[s].BidUpTicks = reader.ReadSingle();
    symbols[s].BidDnTicks = reader.ReadSingle();
    symbols[s].sale_count = reader.ReadSingle();
    symbols[s].extIndex00 = reader.ReadSingle();
    ...
    symbols[s].extIndex08 = reader.ReadSingle();
}

Points of Interest

At the beginning of this project, I had some unnecessary complication. I needed a system that could receive data, prepare it, and upload it at almost the same time from many different threads. After some playing around, I thought about a coin. One side of the coin can be receiving StreamMoments while the other side is finishing off a brief and uploading it to the database. Every 6 seconds, the coin is flipped and some data is carried over. This out-of-the-box thinking made the program a lot easier to code, maintain, and understand. It also made threading much simpler.
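
The coin idea resembles a double buffer. The sketch below shows one way such a flip could look, assuming a simple list of values per side and a single lock; the CoinBuffer class and its members are hypothetical and are not taken from BriefMaker's source.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical two-sided buffer: one side receives while the other is finished off.
public sealed class CoinBuffer
{
    private readonly object gate = new object();
    private List<float> receiving = new List<float>();  // side facing the data feed
    private List<float> finishing = new List<float>();  // side being turned into a brief

    // Carried over across flips so a new period can start from the last known value.
    public float LastValue;

    // Called by the receiving thread for every incoming value.
    public void Add(float value)
    {
        lock (gate)
        {
            receiving.Add(value);
            LastValue = value;
        }
    }

    // Called once per period. Swaps the sides and returns the filled one.
    // The caller must finish with the returned list before the next flip,
    // because that list is cleared and reused at that point.
    public List<float> Flip()
    {
        lock (gate)
        {
            List<float> filled = receiving;
            finishing.Clear();
            receiving = finishing;
            finishing = filled;
            return filled;
        }
    }
}
```

The lock is held only for the swap itself, so summarizing and uploading the filled side never blocks the thread that is receiving new data.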

Reading and Storing Data

Reading StreamMoments Records from the Database

The program reads six one-second StreamMoment records to generate one brief. Since the bulk of the source data is stored in a database, when the program starts it does the following:

  1. Finds and loads the latest brief already written to the SQL database. The goal is to re-load the system state to where it last left off.
  2. Based on the latest brief, starts replaying StreamMoments from slightly before that moment, again to restore the system state.
  3. After it replays those StreamMoments, continues on to process the new StreamMoments that do not have briefs yet.
  4. After it catches up by reading in all the StreamMoments up to the current moment, waits for new StreamMoments from either a new database record or directly from the WCF connection. The WCF method was added purely to lower latency when working with real-time data.
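
The decisions in steps 1 to 3 can be sketched as two small helpers. This is only an illustration under stated assumptions: the ResumePlanner class, the warmupBriefs parameter (how far "slightly before" reaches back), and the firstAvailableId parameter are hypothetical, and the database access itself is left out.

```csharp
using System;

// Hypothetical helpers for deciding where to resume; no database access here.
public static class ResumePlanner
{
    // Returns the brief ID from which StreamMoments should be replayed.
    // latestBriefId is null when no brief has been written yet.
    public static int ReplayStart(int? latestBriefId, int warmupBriefs, int firstAvailableId)
    {
        if (latestBriefId == null)
            return firstAvailableId;  // nothing written yet: start at the beginning

        // Back up a little so the running state is rebuilt before new briefs are made.
        return Math.Max(firstAvailableId, latestBriefId.Value - warmupBriefs);
    }

    // During replay, only briefs newer than the latest stored one are written.
    public static bool ShouldWrite(int briefId, int? latestBriefId)
    {
        return latestBriefId == null || briefId > latestBriefId.Value;
    }
}
```

With a latest stored brief of 1000 and a warm-up of 10, replay starts at 990, and briefs 990 through 1000 only rebuild state while 1001 onward are written.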

Details on the StreamMoments format can be found here.

Writing the Finished Briefs to the Database

After BriefMaker has created a new brief from six StreamMoments, it writes it to the briefs table. Originally, I had a crazy number of columns (32 symbols x 32 attributes), but this was a performance and memory hog, so I changed it to record the brief in byte image format. This is much more efficient, but the data is not as easy to use. Having each value in its own column was convenient for reporting, but it was very slow and used tons of memory.

In the briefs table, there are only two columns:

Image 4

BriefID: Stored in TinyTime6Sec format. This is my own format, similar to DateTime but small enough to fit in a 32-bit integer, and it can be converted to DateTime with a simple cast. The value is basically the number of 6-second increments between 8am and 4pm, Monday through Friday, since 1/1/2010. I created this format to (a) keep the date/time field small and (b) create a contiguous range without gaps that I could use as an ID. For example, ID 4893944 would refer to some 6-second timeframe during market hours and 4893945 would be the next 6-second timeframe.

One note is that TinyTime6Sec does not account for holidays. Just because there is a valid TinyTime6Sec ID does not mean the market was open that day. Weekends are skipped, however; there is no valid value for Saturday or Sunday.
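
To illustrate the arithmetic behind such an ID, here is a rough sketch that follows the description above. It is not the actual TinyTime6Sec class: the TinyTimeSketch name, the choice that ID 0 is exactly 8:00am on 1/1/2010, and the lack of any time zone handling are all assumptions. An 8-hour day holds 8 x 3600 / 6 = 4800 slots.

```csharp
using System;

// Illustrative only; the real TinyTime6Sec class may differ in its details.
public static class TinyTimeSketch
{
    private static readonly DateTime Epoch = new DateTime(2010, 1, 1);
    private const int OpenSeconds = 8 * 3600;    // 8am
    private const int CloseSeconds = 16 * 3600;  // 4pm
    private const int SlotSeconds = 6;
    public const int SlotsPerDay = (CloseSeconds - OpenSeconds) / SlotSeconds;  // 4800

    // Monday on or before the epoch, so weeks can be counted in whole blocks of 5 weekdays.
    private static readonly DateTime EpochMonday =
        Epoch.AddDays(-(((int)Epoch.DayOfWeek + 6) % 7));
    private static readonly int EpochOffset = (Epoch - EpochMonday).Days;

    public static int ToId(DateTime t)
    {
        int secs = (int)t.TimeOfDay.TotalSeconds;
        bool weekend = t.DayOfWeek == DayOfWeek.Saturday || t.DayOfWeek == DayOfWeek.Sunday;
        if (weekend || secs < OpenSeconds || secs >= CloseSeconds || t < Epoch)
            throw new ArgumentOutOfRangeException("t", "Outside the assumed market hours.");

        int days = (t.Date - EpochMonday).Days;
        int weekdayIndex = (days / 7) * 5 + (days % 7) - EpochOffset;  // weekdays since the epoch
        return weekdayIndex * SlotsPerDay + (secs - OpenSeconds) / SlotSeconds;
    }

    public static DateTime FromId(int id)
    {
        int weekdayIndex = id / SlotsPerDay + EpochOffset;
        int slot = id % SlotsPerDay;
        return EpochMonday
            .AddDays((weekdayIndex / 5) * 7 + weekdayIndex % 5)
            .AddSeconds(OpenSeconds + slot * SlotSeconds);
    }
}
```

Because only weekdays are counted, the last slot on a Friday is followed directly by the first slot on the next Monday, which is what keeps the ID range contiguous. Holidays still receive IDs, as noted above.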

BriefBytes: This is where the brief is stored. As mentioned before, it is stored in byte image format for performance reasons. Each byte image is 4224 bytes: (32 header items + (32 stocks x 32 attributes)) * sizeof(float).

The byte layout is as follows:

Offset  Data Stored             Offset     Data Stored (cont.)
0       briefID(int)            64         VOL-NYSE(2).askPrice
4       Day                     68         TICK-NYSE(3).lastPrice
8       Hour                    72         VOL-NYSE(4).bidSize
12      Minute                  76         VOL-NYSE(4).bidPrice
16      Second                  80         VOL-NYSE(4).askPrice
20      DayOfWeek               84         AD-NYSE(5).bidSize
24      TotalMinutes            88         AD-NYSE(5).bidPrice
28      TotalSeconds            92         AD-NYSE(5).askPrice
32      HoursSinceOpened        96         DJIA(6).bidPrice
36      HoursRemaining          100        DJIA(6).askPrice
40      MinutesRemaining        104        DJIA(6).lastPrice
44      TICK-NASD(0).lastPrice  108-127    not used
48      VOL-NASD(1).bidSize     128-255    Ticker 1 (see table)
52      VOL-NASD(1).bidPrice    256-383    Ticker 2 (see table)
56      AD-NASD(1).askPrice
60      VOL-NYSE(2).bidPrice    4096-4223  Ticker 32
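
The layout above implies a simple offset formula: the header takes the first 32 float slots (128 bytes), and each ticker then takes 32 floats (128 bytes). The sketch below works through that arithmetic. The BriefLayout class and its member names are hypothetical, and the direct read assumes a little-endian machine, which is the byte order BinaryReader and BinaryWriter use.

```csharp
using System;

// Hypothetical helper showing the offset arithmetic for the brief byte image.
public static class BriefLayout
{
    public const int HeaderSlots = 32;
    public const int SymbolCount = 32;
    public const int AttributeCount = 32;

    // (32 + 32 x 32) x 4 = 4224 bytes
    public const int ImageSize = (HeaderSlots + SymbolCount * AttributeCount) * sizeof(float);

    // symbolIndex is zero-based, so "Ticker 1" in the table is index 0.
    public static int AttributeOffset(int symbolIndex, int attributeId)
    {
        if (symbolIndex < 0 || symbolIndex >= SymbolCount)
            throw new ArgumentOutOfRangeException("symbolIndex");
        if (attributeId < 0 || attributeId >= AttributeCount)
            throw new ArgumentOutOfRangeException("attributeId");

        return (HeaderSlots + symbolIndex * AttributeCount + attributeId) * sizeof(float);
    }

    // Reads a single attribute without walking the whole image.
    public static float ReadAttribute(byte[] image, int symbolIndex, int attributeId)
    {
        if (image == null || image.Length != ImageSize)
            throw new ArgumentException("Unexpected brief image size.", "image");

        return BitConverter.ToSingle(image, AttributeOffset(symbolIndex, attributeId));
    }
}
```

For example, BriefLayout.AttributeOffset(0, 0) is 128 and BriefLayout.AttributeOffset(31, 0) is 4096, matching the Ticker 1 and Ticker 32 rows of the table.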

Viewing the Briefs

A small viewer program is included so briefs can easily be viewed. It would not be much fun to run BriefMaker, if there wasn't a way to view the output!

To view the briefs, launch the viewer application and use either the Stock Chart tab or the Brief Raw Data tab. Both tabs really show the same data but with different views - chart vs table. These are the only two tabs that are used here.

The Brief Raw Data view looks like this:

Image 5

And the chart view...

Image 6

The executable is included in the downloads at the top of this page and the source can be found here.

Limitations / Disadvantages

Before downloading this project, you might want to review some limitations and annoyances. I wanted to share these with readers so they do not have to download the project and figure them out on their own. =)

  • BriefMaker is currently mostly hard-coded to work with 32 symbols. To use more/less, some code will need to be modified. This would not be that difficult.
  • BriefMaker stores each brief's time in a proprietary TinyTime6Sec format. This is basically the number of 6-second intervals between 8am and 4pm, Monday through Friday, since Jan 1st, 2010. TinyTime6Sec can easily be cast to a DateTime using the TinyTime6Sec class.
  • No Level II market data

Wish List

Some items I would like to add in the future: (not sure when or if I will ever get to it though)

  • Switch to the QLNet library (uses QuantLib) for the quantitative finance stuff. This is a more recent, up-to-date project than TA-Lib. The C# TA-Lib is a great library, but unfortunately it has not been updated since 2007.
  • Get rid of the hard-coded “32 symbol” requirement.
  • Add Level II data.

Setup Instructions

  1. Download the database, extract it, and using SQL Server manager, attach it with the name Focus.
  2. Download and extract either the code or the executables. If you download the code, then you will need to build the project.
  3. Open the .config file and first look at the BriefsConnectionString. You might need to edit this connection string depending on your setup. Most often, "Data Source=.;Initial Catalog=Focus;Integrated Security=True" is used for regular SQL Server and "Data Source=.\SQLEXPRESS;Initial Catalog=Focus;Integrated Security=True" is used for SQL Server Express. Also review the BeginRecordTime, EndRecordTime, and PreBeginBufferTime settings. These should be set to the local times at which the markets open and close.
  4. Now run BriefMaker. If there are errors, then you can either review the log window or use Visual Studio's debugger.
  5. After BriefMaker finishes, launch Viewer.exe to view the output. Again, you might need to adjust the viewer's FocusConnectionString before starting the application. After opening it, use the "Stock Chart" and "Raw Brief Data" tabs to view the data. The other tabs are not used for BriefMaker.

Requirements

  • .NET 4.5
  • SQL Server or SQL Express (Free)

History

  • 23rd March, 2016: Initial version

License

This article, along with any associated source code and files, is licensed under The MIT License.