Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / web / HTML

Transformalizing NorthWind

4.95/5 (29 votes)
24 Jul 2014GPL37 min read 59.1K   341  
Combining de-normalization, transformation, replication, and awesome-ness.

Introduction

Transformalize aims to transform and denormalize relational data in near real-time. The resulting data may be used in several ways:

  • As an OLAP cube data source
  • To feed a SOLR or Elasticsearch index
  • To provide faster, simpler, non-blocking access to regular SQL queries and reports
  • Or, use your imagination... (e.g. to feed a Redis cache server, to load a NoSql database, etc.)

Transformalize is an open source .NET 4.5 class library. It may be referenced and run directly in code, or run with an included console application. It's source is hosted on GitHub.

Demo

Start with an XML configuration:

XML
<transformalize>
	<processes>
		<add name="NorthWind">
		    <connections>
		        <add name="input" />
		        <add name="output" />
		    </connections>
		    <entities/>
		    <relationships/>
		</add>
	</processes>
</transformalize>

Specify the NorthWind database as an input, and another as the output. The default input and output connection provider is SQL Server 2005+. 

XML
<connections>
    <add name="input" connection-string="server=localhost;Database=NorthWind;Trusted_Connection=True;"/>
    <add name="output" connection-string="server=localhost;Database=NorthWindOutput;Trusted_Connection=True;"/>
</connections>

The NorthWind database schema:

NorthWind Schema

Add an Order Details entity and save the file as NorthWind.xml.

XML
<entities>
    <add name="Order Details"/>
</entities> 

Using the console application, run Transformalize (aka tfl) in "metadata" mode:

tfl NorthWind.xml {'mode':'metadata'}

Metadata mode reads the information schema of the database.  Then, it writes and opens an XML file with Order Detail's primary key and field definitions. Copy them into NorthWind.xml:

XML
<entities>
    <add name="Order Details">
        <fields>
            <add name="OrderID" type="System.Int32" primary-key="true" />
            <add name="ProductID" type="System.Int32" primary-key="true" />
            <add name="Discount" type="System.Single" />
            <add name="Quantity" type="System.Int16" />
            <add name="UnitPrice" type="System.Decimal" precision="19" scale="4"/>
        </fields>
    </add>
</entities>  

Now, run Transformalize in Initialize mode:

tfl NorthWind.xml {'mode':'init'}
23:38:57 | Info | NorthWind | All | Initialized TrAnSfOrMaLiZeR.
23:38:57 | Info | NorthWind | All | Initialized NorthWindOrderDetails in NorthWindOutput on localhost.
23:38:57 | Info | NorthWind | All | Process completed in 00:00:00.5585967.

Initialize mode initializes the output, preparing a place to store the data. Now run Tfl without specifying a mode:

tfl NorthWind.xml
23:43:01 | Info | NorthWind | Order Details....... | Processed 2155 inserts, and 0 updates in Order Details.
23:43:01 | Info | NorthWind | Order Details....... | Process completed in 00:00:00.7455880.

Transformalize copied the data that is configured in Northwind.xml. If we run it again, this happens: 

tfl NorthWind.xml
23:44:18 | Info | NorthWind | Order Details....... | Processed 0 inserts, and 2155 updates in Order Details.
23:44:18 | Info | NorthWind | Order Details....... | Process completed in 00:00:01.0926105

It updates the data. It copies new and updates existing data, but it is inefficient. The 2155 records have not been modified in the source, but they have been updated unnecessarily in the destination. So, we need to add a version column to Order Details entity. A version column should be a value that will increment anytime a record is inserted or updated. Conveniently, SQL Server offers a ROWVERSION type that gives us a version column without having to modify the application or add a trigger.

ALTER TABLE [Order Details] ADD RowVersion ROWVERSION; 

Update the Order Details entity to use RowVersion: 

XML
<entities>
    <add name="Order Details" version="RowVersion">
        <fields>
            <add name="OrderID" type="System.Int32" primary-key="true" />
            <add name="ProductID" type="System.Int32" primary-key="true" />
            <add name="Discount" type="System.Single" />
            <add name="Quantity" type="System.Int16" />
            <add name="RowVersion" type="System.Byte[]" length="8" />
            <add name="UnitPrice" type="System.Decimal" precision="19" scale="4"/>
        </fields>
    </add>
</entities> 

Re-initialize and run twice: 

tfl NorthWind.xml {'mode':'init'}
23:58:52 | Info | NorthWind | All | Initialized TrAnSfOrMaLiZeR.
23:58:52 | Info | NorthWind | All | Initialized NorthWindOrderDetails in NorthWindOutput on localhost.
23:58:52 | Info | NorthWind | All | Process completed in 00:00:00.5504415.

tfl NorthWind.xml
00:00:18 | Info | NorthWind | Order Details....... | Processed 2155 inserts, and 0 updates in Order Details.
00:00:18 | Info | NorthWind | Order Details....... | Process completed in 00:00:00.7417452.

tfl NorthWind.xml
00:00:23 | Info | NorthWind | Order Details....... | Processed 0 inserts, and 0 updates in Order Details.
00:00:23 | Info | NorthWind | Order Details....... | Process completed in 00:00:00.6042720.

Now it doesn't update data unnecessarily.  It's using the version field to sense that the data hasn't been updated.  Let's view the output.

SQL
SELECT TOP 10 Discount, OrderID, ProductID, Quantity, UnitPrice
FROM NorthWindStar;
Discount      OrderID     ProductID   Quantity UnitPrice
------------- ----------- ----------- -------- ---------
0.2           10248       11          12       14.0000
0             10248       42          10       9.8000
0             10248       72          5        34.8000
0             10249       14          9        18.6000
0             10249       51          40       42.4000
0             10250       41          10       7.7000
0.15          10250       51          35       42.4000
0.15          10250       65          15       16.8000
0.05          10251       22          6        16.8000
0.05          10251       57          15       15.6000

Review the NorthWind diagram. The next closest tables to Order Details are Orders and Products. Add the Orders entity. Hint: Add entity <add name="Orders"/> and run Tfl in metadata mode.

XML
<add name="Orders" version="RowVersion">
    <fields>
        <add name="OrderID" type="System.Int32" primary-key="true" ></add>
        <add name="Discount" type="System.Single" ></add>
        <add name="Quantity" type="System.Int16" ></add>
        <add name="RowVersion" type="System.Byte[]" length="8" ></add>
        <add name="UnitPrice" type="System.Decimal" precision="19" scale="4"></add>
        <add name="CustomerID" type="System.Char" length="5" ></add>
        <add name="EmployeeID" type="System.Int32" ></add>
        <add name="Freight" type="System.Decimal" precision="19" scale="4"></add>
        <add name="OrderDate" type="System.DateTime" ></add>
        <add name="RequiredDate" type="System.DateTime" ></add>
        <add name="RowVersion" type="System.Byte[]" length="8" ></add>
        <add name="ShipAddress" length="60" ></add>
        <add name="ShipCity" length="15" ></add>
        <add name="ShipCountry" length="15" ></add>
        <add name="ShipName" length="40" ></add>
        <add name="ShippedDate" type="System.DateTime" ></add>
        <add name="ShipPostalCode" length="10" ></add>
        <add name="ShipRegion" length="15" ></add>
        <add name="ShipVia" type="System.Int32" ></add>
    </fields>
</add> 

Re-initialize.

tfl NorthWind.xml {'mode':'init'}
22:32:14 | Error | NorthWind | The entity Orders must have a relationship to the master entity Order Details.

When another table is added, it must be related to the master table. The master table is the first table defined. In this case, it's Order Details. So, we have to add a relationship:

XML
</entities>
    <relationships>
        <add left-entity="Order Details" left-field="OrderID" 
                right-entity="Orders" right-field="OrderID"/>
    </relationships>
</process> 

Re-initialize.

tfl NorthWind.xml {'mode':'init'}
23:13:31 | Error | NorthWind | field overlap error in Orders. The field: RowVersion is already defined in a previous entity.  You must alias (rename) it.

Just like in SQL views, multiple entities (or tables) joined together can introduce identical field names.  So, you have to re-name (or alias) any columns that have the same name.  In this case, it's our RowVersion column that we're using to detect changes.  So, alias the RowVersion in the Orders entity to OrdersRowVersion like this: 

XML
<add name="Orders" version="RowVersion">
	<fields>
		<!-- ... -->
		<add name="RowVersion" alias="OrdersRowVersion" type="System.Byte[]" length="8" />
	</fields>
</add>

Re-initialize and run twice.

tfl NorthWind.xml {'mode':'init'}
23:23:47 | Info | NorthWind | All | Initialized TrAnSfOrMaLiZeR.
23:23:47 | Info | NorthWind | All | Initialized NorthWindOrderDetails in NorthWindOutput on localhost.
23:23:47 | Info | NorthWind | All | Initialized NorthWindOrders in NorthWindOutput on localhost.
23:23:47 | Info | NorthWind | All | Process completed in 00:00:00.6609756.

tfl NorthWind.xml
23:24:30 | Info | NorthWind | Order Details....... | Processed 2155 inserts, and 0 updates in Order Details.
23:24:30 | Info | NorthWind | Orders.............. | Processed 830 inserts, and 0 updates in Orders.
23:24:30 | Info | NorthWind | Orders.............. | Process completed in 00:00:00.9719255.

tfl NorthWind.xml
23:24:35 | Info | NorthWind | Order Details....... | Processed 0 inserts, and 0 updates in Order Details.
23:24:35 | Info | NorthWind | Orders.............. | Processed 0 inserts, and 0 updates in Orders.
23:24:35 | Info | NorthWind | Orders.............. | Process completed in 00:00:00.7284382.

View the output:

SQL
SELECT TOP 10 Discount, OrderID, ProductID, Quantity, UnitPrice, 
  CustomerID, EmployeeID, Freight, OrderDate, RequiredDate, 
  ShipAddress, ShipCity, ShippedDate, ShipPostalCode, ShipRegion, ShipVia
FROM NorthWindStar;
Discount OrderID   ProductID   Quantity UnitPrice  CustomerID EmployeeID  Freight    OrderDate  RequiredDate ShipAddress               ShipCity        ShippedDate ShipPostalCode ShipRegion ShipVia
-------- --------- ----------- -------- ---------  ---------- ----------- --------   ---------- ------------ ------------------------- --------------- ----------- -------------- ---------- -----------
0.2      10248     11          12       14.0000    VINET      5           32.3800    1996-07-04 1996-08-01   59 rue de l'Abbaye    Reims           1996-07-16  51100                     3
0        10248     42          10       9.8000     VINET      5           32.3800    1996-07-04 1996-08-01   59 rue de l'Abbaye    Reims           1996-07-16  51100                     3
0        10248     72          5        34.8000    VINET      5           32.3800    1996-07-04 1996-08-01   59 rue de l'Abbaye    Reims           1996-07-16  51100                     3
0        10249     14          9        18.6000    TOMSP      6           11.6100    1996-07-05 1996-08-16   Luisenstr. 48             Münster    1996-07-10  44087                     1
0        10249     51          40       42.4000    TOMSP      6           11.6100    1996-07-05 1996-08-16   Luisenstr. 48             Münster    1996-07-10  44087                     1
0        10250     41          10       7.7000     HANAR      4           65.8300    1996-07-08 1996-08-05   Rua do Paço, 67    Rio de Janeiro  1996-07-12  05454-876      RJ         2
0.15     10250     51          35       42.4000    HANAR      4           65.8300    1996-07-08 1996-08-05   Rua do Paço, 67    Rio de Janeiro  1996-07-12  05454-876      RJ         2
0.15     10250     65          15       16.8000    HANAR      4           65.8300    1996-07-08 1996-08-05   Rua do Paço, 67    Rio de Janeiro  1996-07-12  05454-876      RJ         2
0.05     10251     22          6        16.8000    VICTE      3           41.3400    1996-07-08 1996-08-05   2, rue du Commerce        Lyon            1996-07-15  69004                     1
0.05     10251     57          15       15.6000    VICTE      3           41.3400    1996-07-08 1996-08-05   2, rue du Commerce        Lyon            1996-07-15  69004                     1

Now, rinse and repeat.  That is, consult the NorthWind diagram and continue adding related entities until the relationships configuration look like this:

XML
<relationships>
    <add left-entity="Order Details" left-field="OrderID" right-entity="Orders" right-field="OrderID" />
    <add left-entity="Order Details" left-field="ProductID" right-entity="Products" right-field="ProductID" />
    <add left-entity="Orders" left-field="CustomerID" right-entity="Customers" right-field="CustomerID" />
    <add left-entity="Orders" left-field="EmployeeID" right-entity="Employees" right-field="EmployeeID" />
    <add left-entity="Orders" left-field="ShipVia" right-entity="Shippers" right-field="ShipperID" />
    <add left-entity="Products" left-field="SupplierID" right-entity="Suppliers" right-field="SupplierID" />
    <add left-entity="Products" left-field="CategoryID" right-entity="Categories" right-field="CategoryID" />
</relationships>

As you might expect, adding all these entities creates many duplicate field names. Instead of renaming each one, we can add a prefix to the entity. A prefix aliases all the fields as prefix + name.

XML
<add name="Employees" version="RowVersion" prefix="Employee">
    <fields>
		<!-- ... -->
    </fields>
</add>

Initialize, and run twice. Console output should look like this:

tfl NorthWind.xml {'mode':'init'}
19:41:53 | Info | NorthWind | All | Initialized TrAnSfOrMaLiZeR.
19:41:53 | Info | NorthWind | All | Initialized NorthWindOrderDetails in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Initialized NorthWindOrders in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Initialized NorthWindProducts in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Initialized NorthWindCustomers in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Initialized NorthWindEmployees in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Initialized NorthWindShippers in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Initialized NorthWindSuppliers in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Initialized NorthWindCategories in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Process completed in 00:00:01.1828232.

tfl NorthWind.xml
19:42:06 | Info | NorthWind | Order Details....... | Processed 2155 inserts, and 0 updates in Order Details.
19:42:07 | Info | NorthWind | Orders.............. | Processed 830 inserts, and 0 updates in Orders.
19:42:07 | Info | NorthWind | Products............ | Processed 77 inserts, and 0 updates in Products.
19:42:07 | Info | NorthWind | Customers........... | Processed 91 inserts, and 0 updates in Customers.
19:42:07 | Info | NorthWind | Employees........... | Processed 9 inserts, and 0 updates in Employees.
19:42:07 | Info | NorthWind | Shippers............ | Processed 3 inserts, and 0 updates in Shippers.
19:42:07 | Info | NorthWind | Suppliers........... | Processed 29 inserts, and 0 updates in Suppliers.
19:42:07 | Info | NorthWind | Categories.......... | Processed 8 inserts, and 0 updates in Categories.
19:42:07 | Info | NorthWind | Orders.............. | Processed 2155 rows. Updated Order Details with Orders.
19:42:07 | Info | NorthWind | Products............ | Processed 2155 rows. Updated Order Details with Products.
19:42:07 | Info | NorthWind | All................. | Process completed in 00:00:01.2583563.

tfl NorthWind.xml
19:42:13 | Info | NorthWind | Order Details....... | Processed 0 inserts, and 0 updates in Order Details.
19:42:13 | Info | NorthWind | Orders.............. | Processed 0 inserts, and 0 updates in Orders.
19:42:13 | Info | NorthWind | Products............ | Processed 0 inserts, and 0 updates in Products.
19:42:13 | Info | NorthWind | Customers........... | Processed 0 inserts, and 0 updates in Customers.
19:42:13 | Info | NorthWind | Employees........... | Processed 0 inserts, and 0 updates in Employees.
19:42:13 | Info | NorthWind | Shippers............ | Processed 0 inserts, and 0 updates in Shippers.
19:42:13 | Info | NorthWind | Suppliers........... | Processed 0 inserts, and 0 updates in Suppliers.
19:42:13 | Info | NorthWind | Categories.......... | Processed 0 inserts, and 0 updates in Categories.
19:42:13 | Info | NorthWind | All................. | Process completed in 00:00:00.7708553.

Now there are 81 fields available in the output NorthWindStar:

SQL
SELECT COUNT(*) AS FieldCount
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'NorthWindStar'
FieldCount
-----------
81

Northwind Star Schema

As you can see in the diagram above, I haven't completely de-normalized the data.  Instead, I have created a star schema, where every related table has a single join to the master table.  In addition, I have created a view (e.g. NorthWindStar) so users may query it as if it we're a single table.  Having this single view of the duplicated data allows us to quickly create simple cubes or integrate with search engines: 

Leveraging SQL Server Analysis Services

Open up BIDS, and create an Analysis Services Project (or cube) to browse the data.

  • Set the data source to your NorthWindOutput database 
  • Set the data view to the NorthWindStar view
  • Create a dimension using all the fields (except the binary ones) in the view. Call it Properties.
  • Create a cube with a couple measures (e.g. UnitPrice, Quantity) and connect it to the Properties dimension.
  • Process the cube and browse it. Note: You may have to grant NT AUTHORITY\LOCAL SERVICE read writes to the NorthWindOutput database.

BIDS

As you can see, slicing the measures by order date isn't ideal. Moreover, the unit price and quantity measures don't help much by themselves. This cube needs a time hierarchy and revenue calculation. We can add them with Transformalize. First, add three calculated fields based on "order date" to create a time hierarchy:

XML
<add name="Orders" version="RowVersion" prefix="Orders">
    <fields>
      <!-- ... -->
    </fields>
    <calculated-fields>
      <add name="TimeDate" length="10" default="9999-12-31">
        <transforms>
            <add method="toString" format="yyyy-MM-dd" parameter="OrderDate" />
        </transforms>
      </add>
      <add name="TimeMonth" length="6" default="12-DEC">
        <transforms>
            <add method="toString" format="MM-MMM" parameter="OrderDate" />
            <add method="toUpper" />
        </transforms>
      </add>
      <add name="TimeYear" type="System.Int16" default="9999">
        <transforms>
            <add method="toString" format="yyyy" parameter="OrderDate" />
		</transforms>
      </add>
    </calculated-fields>
</add>

Calculated fields project new fields based on the values of other fields and previously defined other calculated fields. They are used at the entity level, or at the process level. In an entity, they have access to any field within their entity. In a process, they have access to all of the data. To control which fields they have access to, use parameters like this:

XML
<transform method="format" format="{0} is a big city!">
    <parameters>
        <add field="City" />
    </parameters>
</transform>

You may add multiple parameters in this way.  However, if you only have a single parameter, you can specify it in the parameter attribute in the transform element itself, like this:

XML
<transform method="format" format="{0} is a big city!" 
        parameter="City" /> 

Another short-cut is to set the parameter attribute to "*" to include all fields.

There are many built-in Transforms. If you can't find one that fits your needs, you can use the C#, JavaScript, or the Razor template transforms to define your own. Let's use a JavaScript transform to calculate revenue:

XML
<calculated-fields>
    <add name="Revenue" type="System.Decimal" >
        <transforms>
            <add method="javascript" script="(UnitPrice * (1 - Discount)) * Quantity" parameter="*" />
        </transforms>
    </add>
</calculated-fields>

Re-initialize and run Tfl. Then, using the new time fields and revenue, see if it improves the cube browsing experience.

BIDS 2

The cube looks better now, but we'll need it to update whenever Transformalize runs.  So,  add a connection to Analysis Services and a corresponding template action: 

XML
<connections>
    <add name="input" connection-string="server=localhost;Database=NorthWind;Trusted_Connection=True;"/>
    <add name="output" connection-string="Server=localhost;Database=NorthWindOutput;Trusted_Connection=True;"/>
    <add name="cube" connection-string="Data Source=localhost;Catalog=NorthWind;" provider="AnalysisServices"/>
</connections>

<!-- ... -->

<templates path="C:\Tfl\">
    <add name="process-cube" file="process-cube.xmla">
        <settings>
            <add name="DatabaseID" value="NorthWind2"></add>
        </settings>    
        <actions>
            <add action="run" connection="cube"></add>
        </actions>
    </add>
</templates>**

Transformalize "templates" use C# Razor syntax. Settings are passed into the template and used like this:

XML
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200">
    <Object>
      <DatabaseID>@(Model.Settings.DatabaseID)</DatabaseID>
    </Object>
    <Type>ProcessFull</Type>
    <WriteBackTableCreation>UseExisting</WriteBackTableCreation>
  </Process>
</Batch>

The @(Model.Settings.DatabaseID) will be replaced with NorthWind2. Transformalize's template manager will render the template, and subsequently run defined "actions." The "run" action executes the rendered content against the designated connection.  This allows you to dynamically build data manipulation queries, or XMLA commands in this case, and execute them.

tfl NorthWind.xml
00:14:28 | Info | NorthWind | Order Details.. | Processed 2155 inserts, and 0 updates in Order Details.
00:14:28 | Info | NorthWind | Orders......... | Processed 830 inserts, and 0 updates in Orders.
00:14:28 | Info | NorthWind | Products....... | Processed 77 inserts, and 0 updates in Products.
00:14:28 | Info | NorthWind | Customers...... | Processed 91 inserts, and 0 updates in Customers.
00:14:28 | Info | NorthWind | Employees...... | Processed 9 inserts, and 0 updates in Employees.
00:14:28 | Info | NorthWind | Shippers....... | Processed 3 inserts, and 0 updates in Shippers.
00:14:28 | Info | NorthWind | Suppliers...... | Processed 29 inserts, and 0 updates in Suppliers.
00:14:28 | Info | NorthWind | Categories..... | Processed 8 inserts, and 0 updates in Categories.
00:14:28 | Info | NorthWind | Orders......... | Processed 2155 rows. Updated Order Details with Orders.
00:14:29 | Info | NorthWind | Products....... | Processed 2155 rows. Updated Order Details with Products.
00:14:31 | Info | NorthWind | Categories..... | process-cube ran successfully.
00:14:31 | Info | NorthWind | All............ | Process completed in 00:00:03.8312882.

Leveraging Apache SOLR

With more complex templates, and an Apache SOLR server, it is possible to integrate full text search into the process as well.  Transformalize comes with a pair of templates that can build the necessary SOLR configuration files for schema, and data import handling. 

XML
<templates>

    <add name="solr-data-handler" file="solr-data-handler.cshtml" cache="true">
      <actions>
        <add action="copy" file="C:\Solr\NorthWind\conf\data-config.xml"/>
      </actions>
    </add>

    <add name="solr-schema" file="solr-schema.cshtml" cache="true">
      <actions>
        <add action="copy" file="C:\Solr\NorthWind\conf\schema.xml"/>
        <add action="web" url="http://localhost:8983/solr/NorthWind/dataimport?command=full-import&amp;clean=true&amp;commit=true&amp;optimize=true"/>
      </actions>
    </add>

</templates>

<search-types>
    <add name="default" />
    <add name="facet" analyzer="lowercase" store="true" index="true" />
    <add name="standard" analyzer="standard_lowercase" store="false" index="true"/>
</search-types>

The Razor templates "solr-data-handler.cshtml" and "solr-schema.cshtml" render the SOLR configuration files.  This is possible because the template manager passes the entire NorthWind configuration (that we've built with XML above) into the templates.

To control how the fields are handled in SOLR, "search types" are applied to each &lt;field/&gt; or &lt;calculated-field/&gt;. By default, each field is indexed and stored in the search index according to it's data type. To assign more complex text analysis, you can set the search-type attribute to facet, or standard, or any others you define. To exclude a field from search, set search-type to "none."

Running Tfl now produces:

tfl NorthWind.xml
...
00:48:25 | Info | NorthWind | Products...... | Processed 2155 rows. Updated Order Details with Products.
00:48:28 | Info | NorthWind | Categories.... | process-cube ran successfully.
00:48:28 | Info | NorthWind | Categories.... | Copied solr-data-handler template output to C:\Solr\NorthWind\conf\data-config.xml.
00:48:29 | Info | NorthWind | Categories.... | Copied solr-schema template output to C:\Solr\NorthWind\conf\schema.xml.
00:48:29 | Info | NorthWind | Categories.... | Made web request to http://localhost:8983/solr/NorthWind/dataimport?command=full-import&clean=true&commit=true&optimize=true.
00:48:29 | Info | NorthWind | Categories.... | Process completed in 00:00:04.8287386.

In this example, the template action "web" triggers SOLR to clean and re-import the index. In a production environment, you'd want to reload the schema when it changes, and make use of full and delta imports appropriately. If all goes well, you see something like this in the SOLR admin:

SOLR

Now, if you schedule Transformalize to run every couple minutes, you have near real-time OLAP and search engine services on top of your OLTP data.  An OLAP cube supports more performant and complex reporting requirements, and a search engine allows for lightning fast and fuzzy searches for specific records.  If your users want to see the data in different ways, and they will, all you have to do is add transforms and/or new calculated fields and re-initialize your output.  

When Transformalize reads your production databases, it attempts to do so introducing as little contention as possible. Using version fields, it can keep a star-schema copy of very large databases up to date very quickly.

Summary

The NorthWind data is fairly clean. In reality, you'll face more challenging data sources.


Transformalize uses several other open source projects including 

  1. Rhino ETL 
  2. Razor Engine
  3. Jint
  4. Ninject
  5. NLog
  6. fastJSON 
  7. Dapper-dot-net
  8. File Helpers
  9. Excel Data Reader
  10. Enterprise Library 6 Validation Block
  11. Elasticsearch.NET
  12. SolrNet

Where possible, I've included source code from these projects rather than the Nuget packages. The upside of doing this is I get to step into and learn from other people's code. The downside is it's a bit harder to keep these libraries up to date.

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)