NSimpleOlap is a project I started in 2012 with the goal of building a stand-alone, embeddable .NET OLAP library that can be used within the context of console, desktop, or other types of applications. My initial motivation for starting this project was that at the time there weren't many lightweight Open Source implementations, and the implementations that did suit my preferences were too expensive, existed only as server solutions, etc.
In my previous professional path building tools for Marketing Research I was exposed to many of the tropes of what is called Analytics, and that gave me some understanding of the basics of Business Intelligence. Even after leaving the Marketing Research business I kept an interest in the subject and the tools of the trade, and I researched the market for tools with similar business cases, like survey and questionnaire applications, and OLAP and BI server solutions. Some products struck a chord with me, like Jedox, Pentaho, and JasperReports, and I even dabbled in Microsoft SQL Server Analysis Services. But these were not the products I was looking for.
Since my interests had shifted, I wanted an OLAP engine that could be used within the context of an application and that could do aggregations quickly on a limited dataset, or in real time with some caveats. And although there were some analytics solutions at the time, like Tableau, that provided a full range of data, reporting, and UI tools, plus some real-time features, none of them fit that niche. So in 2012 I decided to start this project.
In early 2012 the project was actually evolving very quickly; unfortunately, a personal mishap derailed everything, and for professional and personal reasons I wasn't able or motivated to restart development on the project. But out of frustration and disillusionment with the way technical skills are evaluated, I decided to take a chance and get the project into a releasable state. It's my intention that this project will help educate more developers on the utility of aggregation engines beyond the fields of Business Intelligence and Finance.
At a personal level I am quite frustrated with the way interviews for developer roles are done, how technical skills are evaluated, and the whole selection process: the box ticking, the questions about algorithms and data structures that are rarely or never used, the gamified online technical tests, the code challenges that require several days of full-time work (and that look suspiciously like real-world problems), the bait and switch, etc. And that is just the recruiting process; the actual work itself very often provides very little in terms of career growth. In some cases, the people you work with have an incentive to devalue your skills, steal your ideas, or just take advantage of you. It's also annoying as hell to have to watch the constant manhood-measuring contests, the attention-seeking narcissists, the friendly backstabbers, and the occasional incompetent buffoon.
Well, that's off my chest… Rant over.
The Project
At the present moment the NSimpleOlap project is still in alpha and unstable. It only allows for some basic querying and limited modes of aggregation, and some of its features are still experimental, implemented in a way that allows for easy testing of different opportunities for optimization and/or feature enhancement. You can find it in the following GitHub repository:
At the conceptual level NSimpleOlap borrows a lot from the MDX query language, the Star Schema model, and the mapping conventions that are common when modelling data Cubes. For example, tuples and tuple sets, written as dot-separated pairs of dimension and member names, are the way you locate Cube cells, and they can be used to define what information comes in rows or in columns. Examples of tuples are as follows:
- Category.Shoes.Country.Italy
- Year.2012.Products.All
- Gender.Female.State.California.Work.IT
There are some concepts that you will need to be familiar with in order to use this library:
- Dimension – This is an entity or characteristic of your data points; it can be a socio-demographic variable like gender, age, region, etc., or a product name, year, month, etc.
- Dimension Member – This is a member of a dimension; in the case of gender an example would be “female”.
- Measure – This is a variable value from your data points; it can be the sale value, number of items bought, number of children, etc.
- Metric – This is a value calculated from the aggregated results; it can be an average, a percentage, or some other type of complex expression.
To be able to populate the Cube you will need to organize your data in a table that has all the facts, where the dimension columns hold numerical keys, and to have those keys and the relevant metadata in separate dimension definition data sources.
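As an illustration, the facts table and the dimension definitions might look like the CSV files below. The data here is made up, but the column layout matches the mappings used in the example that follows, with the fact columns category and sex holding keys into the dimension files:

table.csv (facts):
category,sex,date,expenses,items
1,1,2021-06-01,120.50,3
2,2,2021-06-02,40.00,1

dimension1.csv (categories):
id,description
1,shoes
2,clothes

dimension2.csv (sexes):
id,description
1,male
2,female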
Building Your First Cube
Building a Cube requires some initial setup to identify the data sources, set up the mappings, and define the useful metadata. In the following example we will build a Cube from data contained in CSV files, and these will be used to define the Cube dimensions and measures.
CubeBuilder builder = new CubeBuilder();
builder.SetName("Hello World")
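// Map fields from the facts data source to the Cube's dimensions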
.SetSourceMappings((sourcebuild) =>
{
sourcebuild.SetSource("sales")
.AddMapping("category", "category")
.AddMapping("sex", "sex");
})
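// Facts data source: a CSV file containing the dimension keys and measure values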
.AddDataSource(dsbuild =>
{
dsbuild.SetName("sales")
.SetSourceType(DataSourceType.CSV)
.SetCSVConfig(csvbuild =>
{
csvbuild.SetFilePath("TestData//table.csv")
.SetHasHeader();
})
.AddField("category", 0, typeof(int))
.AddField("sex", 1, typeof(int))
.AddField("expenses", 3, typeof(double))
.AddField("items", 4, typeof(int));
})
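// Dimension data sources: CSV files listing each member's id and description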
.AddDataSource(dsbuild =>
{
dsbuild.SetName("categories")
.SetSourceType(DataSourceType.CSV)
.AddField("id", 0, typeof(int))
.AddField("description", 1, typeof(string))
.SetCSVConfig(csvbuild =>
{
csvbuild.SetFilePath("TestData//dimension1.csv")
.SetHasHeader();
});
})
.AddDataSource(dsbuild =>
{
dsbuild.SetName("sexes")
.SetSourceType(DataSourceType.CSV)
.AddField("id", 0, typeof(int))
.AddField("description", 1, typeof(string))
.SetCSVConfig(csvbuild =>
{
csvbuild.SetFilePath("TestData//dimension2.csv")
.SetHasHeader();
});
})
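// Metadata: define the dimensions and measures from the fields declared above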
.MetaData(mbuild =>
{
mbuild.AddDimension("category", (dimbuild) =>
{
dimbuild.Source("categories")
.ValueField("id")
.DescField("description");
})
.AddDimension("sex", (dimbuild) =>
{
dimbuild.Source("sexes")
.ValueField("id")
.DescField("description");
})
.AddMeasure("spent", mesbuild =>
{
mesbuild.ValueField("expenses")
.SetType(typeof(double));
})
.AddMeasure("quantity", mesbuild =>
{
mesbuild.ValueField("items")
.SetType(typeof(int));
});
});
Creating the Cube requires making the necessary method calls so that the data is loaded and processed. This can be done as follows.
var cube = builder.Create<int>();
cube.Initialize();
cube.Process();
Querying The Cube
Querying the Cube can be done through the querying interface. Here's a basic example:
var queryBuilder = cube.BuildQuery()
.OnRows("sex.female")
.OnColumns("category.shoes")
.AddMeasuresOrMetrics("quantity");
var query = queryBuilder.Create();
var result = query.StreamCells().ToList();
In the previous example you streamed the results cell by cell, but you can also stream by rows:
var result_rows = query.StreamRows().ToList();
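Both streaming methods return plain enumerables (hence the .ToList() calls above), so any standard LINQ operator can be applied to them; for example, you could count the returned cells without materializing a list:

var cellCount = query.StreamCells().Count();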
You can also add some basic expressions to filter on the table facts; this will limit the scope of the rows that will be aggregated.
var queryBuilder = cube.BuildQuery()
.OnRows("sex.All")
.OnColumns("category.All")
.AddMeasuresOrMetrics("quantity")
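// Fact-level filter: only rows where the quantity measure equals 5 are aggregated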
.Where(b => b.Define(x => x.Measure("quantity").IsEquals(5)));
var query = queryBuilder.Create();
var result = query.StreamCells().ToList();
Or you can add some basic expressions to filter on dimension members, which won't affect the scope of the aggregated results.
var queryBuilder = cube.BuildQuery()
.OnRows("sex.All")
.OnColumns("category.All")
.AddMeasuresOrMetrics("quantity")
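// Dimension-member filter: "male" is excluded from the output, but the aggregates themselves are unchanged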
.Where(b => b.Define(x => x.Dimension("sex").NotEquals("male")));
var query = queryBuilder.Create();
var result = query.StreamCells().ToList();
Concluding & Hopes For the Future
In conclusion, there is still a lot of work to be done to round out the feature set, with things like dimension levels, Date and Time dimension transformers, richer query expressions, etc. Hopefully these features will be coming in the near future.