Introduction
Transformalize aims to transform and denormalize relational data in near real-time. The resulting data may be used in several ways:
- As an OLAP cube data source
- To feed a SOLR or Elasticsearch index
- To provide faster, simpler, non-blocking access to regular SQL queries and reports
- Or, use your imagination... (e.g. to feed a Redis cache server, to load a NoSql database, etc.)
Transformalize is an open source .NET 4.5 class library. It may be referenced and run directly in code, or run with an included console application. It's source is hosted on GitHub.
Demo
Start with an XML configuration:
<transformalize>
<processes>
<add name="NorthWind">
<connections>
<add name="input" />
<add name="output" />
</connections>
<entities/>
<relationships/>
</add>
</processes>
</transformalize>
Specify the NorthWind database as an input, and another as the output. The default input and output connection provider is SQL Server 2005+.
<connections>
<add name="input" connection-string="server=localhost;Database=NorthWind;Trusted_Connection=True;"/>
<add name="output" connection-string="server=localhost;Database=NorthWindOutput;Trusted_Connection=True;"/>
</connections>
The NorthWind database schema:
Add an Order Details
entity and save the file as NorthWind.xml.
<entities>
<add name="Order Details"/>
</entities>
Using the console application, run Transformalize (aka tfl) in "metadata" mode:
tfl NorthWind.xml {'mode':'metadata'}
Metadata mode reads the information schema of the database. Then, it writes and opens an XML file with Order Detail's primary key and field definitions. Copy them into NorthWind.xml:
<entities>
<add name="Order Details">
<fields>
<add name="OrderID" type="System.Int32" primary-key="true" />
<add name="ProductID" type="System.Int32" primary-key="true" />
<add name="Discount" type="System.Single" />
<add name="Quantity" type="System.Int16" />
<add name="UnitPrice" type="System.Decimal" precision="19" scale="4"/>
</fields>
</add>
</entities>
Now, run Transformalize in Initialize mode:
tfl NorthWind.xml {'mode':'init'}
23:38:57 | Info | NorthWind | All | Initialized TrAnSfOrMaLiZeR.
23:38:57 | Info | NorthWind | All | Initialized NorthWindOrderDetails in NorthWindOutput on localhost.
23:38:57 | Info | NorthWind | All | Process completed in 00:00:00.5585967.
Initialize mode initializes the output, preparing a place to store the data. Now run Tfl without specifying a mode:
tfl NorthWind.xml
23:43:01 | Info | NorthWind | Order Details....... | Processed 2155 inserts, and 0 updates in Order Details.
23:43:01 | Info | NorthWind | Order Details....... | Process completed in 00:00:00.7455880.
Transformalize copied the data that is configured in Northwind.xml. If we run it again, this happens:
tfl NorthWind.xml
23:44:18 | Info | NorthWind | Order Details....... | Processed 0 inserts, and 2155 updates in Order Details.
23:44:18 | Info | NorthWind | Order Details....... | Process completed in 00:00:01.0926105.
It updates the data. It copies new and updates existing data, but it is inefficient. The 2155 records have not been modified in the source, but they have been updated unnecessarily in the destination. So, we need to add a version column to Order Details
entity. A version column should be a value that will increment anytime a record is inserted or updated. Conveniently, SQL Server offers a ROWVERSION type that gives us a version column without having to modify the application or add a trigger.
ALTER TABLE [Order Details] ADD RowVersion ROWVERSION;
Update the Order Details
entity to use RowVersion:
<entities>
<add name="Order Details" version="RowVersion">
<fields>
<add name="OrderID" type="System.Int32" primary-key="true" />
<add name="ProductID" type="System.Int32" primary-key="true" />
<add name="Discount" type="System.Single" />
<add name="Quantity" type="System.Int16" />
<add name="RowVersion" type="System.Byte[]" length="8" />
<add name="UnitPrice" type="System.Decimal" precision="19" scale="4"/>
</fields>
</add>
</entities>
Re-initialize and run twice:
tfl NorthWind.xml {'mode':'init'}
23:58:52 | Info | NorthWind | All | Initialized TrAnSfOrMaLiZeR.
23:58:52 | Info | NorthWind | All | Initialized NorthWindOrderDetails in NorthWindOutput on localhost.
23:58:52 | Info | NorthWind | All | Process completed in 00:00:00.5504415.
tfl NorthWind.xml
00:00:18 | Info | NorthWind | Order Details....... | Processed 2155 inserts, and 0 updates in Order Details.
00:00:18 | Info | NorthWind | Order Details....... | Process completed in 00:00:00.7417452.
tfl NorthWind.xml
00:00:23 | Info | NorthWind | Order Details....... | Processed 0 inserts, and 0 updates in Order Details.
00:00:23 | Info | NorthWind | Order Details....... | Process completed in 00:00:00.6042720.
Now it doesn't update data unnecessarily. It's using the version field to sense that the data hasn't been updated. Let's view the output.
SELECT TOP 10 Discount, OrderID, ProductID, Quantity, UnitPrice
FROM NorthWindStar;
Discount OrderID ProductID Quantity UnitPrice
------------- ----------- ----------- -------- ---------
0.2 10248 11 12 14.0000
0 10248 42 10 9.8000
0 10248 72 5 34.8000
0 10249 14 9 18.6000
0 10249 51 40 42.4000
0 10250 41 10 7.7000
0.15 10250 51 35 42.4000
0.15 10250 65 15 16.8000
0.05 10251 22 6 16.8000
0.05 10251 57 15 15.6000
Review the NorthWind diagram. The next closest tables to Order Details
are Orders
and Products
. Add the Orders
entity. Hint: Add entity <add name="Orders"/> and run Tfl in metadata mode.
<add name="Orders" version="RowVersion">
<fields>
<add name="OrderID" type="System.Int32" primary-key="true" ></add>
<add name="Discount" type="System.Single" ></add>
<add name="Quantity" type="System.Int16" ></add>
<add name="RowVersion" type="System.Byte[]" length="8" ></add>
<add name="UnitPrice" type="System.Decimal" precision="19" scale="4"></add>
<add name="CustomerID" type="System.Char" length="5" ></add>
<add name="EmployeeID" type="System.Int32" ></add>
<add name="Freight" type="System.Decimal" precision="19" scale="4"></add>
<add name="OrderDate" type="System.DateTime" ></add>
<add name="RequiredDate" type="System.DateTime" ></add>
<add name="RowVersion" type="System.Byte[]" length="8" ></add>
<add name="ShipAddress" length="60" ></add>
<add name="ShipCity" length="15" ></add>
<add name="ShipCountry" length="15" ></add>
<add name="ShipName" length="40" ></add>
<add name="ShippedDate" type="System.DateTime" ></add>
<add name="ShipPostalCode" length="10" ></add>
<add name="ShipRegion" length="15" ></add>
<add name="ShipVia" type="System.Int32" ></add>
</fields>
</add>
Re-initialize.
tfl NorthWind.xml {'mode':'init'}
22:32:14 | Error | NorthWind | The entity Orders must have a relationship to the master entity Order Details.
When another table is added, it must be related to the master table. The master table is the first table defined. In this case, it's Order Details
. So, we have to add a relationship:
</entities>
<relationships>
<add left-entity="Order Details" left-field="OrderID"
right-entity="Orders" right-field="OrderID"/>
</relationships>
</process>
Re-initialize.
tfl NorthWind.xml {'mode':'init'}
23:13:31 | Error | NorthWind | field overlap error in Orders. The field: RowVersion is already defined in a previous entity. You must alias (rename) it.
Just like in SQL views, multiple entities (or tables) joined together can introduce identical field names. So, you have to re-name (or alias) any columns that have the same name. In this case, it's our RowVersion column that we're using to detect changes. So, alias the RowVersion in the Orders entity to OrdersRowVersion like this:
<add name="Orders" version="RowVersion">
<fields>
<add name="RowVersion" alias="OrdersRowVersion" type="System.Byte[]" length="8" />
</fields>
</add>
Re-initialize and run twice.
tfl NorthWind.xml {'mode':'init'}
23:23:47 | Info | NorthWind | All | Initialized TrAnSfOrMaLiZeR.
23:23:47 | Info | NorthWind | All | Initialized NorthWindOrderDetails in NorthWindOutput on localhost.
23:23:47 | Info | NorthWind | All | Initialized NorthWindOrders in NorthWindOutput on localhost.
23:23:47 | Info | NorthWind | All | Process completed in 00:00:00.6609756.
tfl NorthWind.xml
23:24:30 | Info | NorthWind | Order Details....... | Processed 2155 inserts, and 0 updates in Order Details.
23:24:30 | Info | NorthWind | Orders.............. | Processed 830 inserts, and 0 updates in Orders.
23:24:30 | Info | NorthWind | Orders.............. | Process completed in 00:00:00.9719255.
tfl NorthWind.xml
23:24:35 | Info | NorthWind | Order Details....... | Processed 0 inserts, and 0 updates in Order Details.
23:24:35 | Info | NorthWind | Orders.............. | Processed 0 inserts, and 0 updates in Orders.
23:24:35 | Info | NorthWind | Orders.............. | Process completed in 00:00:00.7284382.
View the output:
SELECT TOP 10 Discount, OrderID, ProductID, Quantity, UnitPrice,
CustomerID, EmployeeID, Freight, OrderDate, RequiredDate,
ShipAddress, ShipCity, ShippedDate, ShipPostalCode, ShipRegion, ShipVia
FROM NorthWindStar;
Discount OrderID ProductID Quantity UnitPrice CustomerID EmployeeID Freight OrderDate RequiredDate ShipAddress ShipCity ShippedDate ShipPostalCode ShipRegion ShipVia
-------- --------- ----------- -------- --------- ---------- ----------- -------- ---------- ------------ ------------------------- --------------- ----------- -------------- ---------- -----------
0.2 10248 11 12 14.0000 VINET 5 32.3800 1996-07-04 1996-08-01 59 rue de l'Abbaye Reims 1996-07-16 51100 3
0 10248 42 10 9.8000 VINET 5 32.3800 1996-07-04 1996-08-01 59 rue de l'Abbaye Reims 1996-07-16 51100 3
0 10248 72 5 34.8000 VINET 5 32.3800 1996-07-04 1996-08-01 59 rue de l'Abbaye Reims 1996-07-16 51100 3
0 10249 14 9 18.6000 TOMSP 6 11.6100 1996-07-05 1996-08-16 Luisenstr. 48 Münster 1996-07-10 44087 1
0 10249 51 40 42.4000 TOMSP 6 11.6100 1996-07-05 1996-08-16 Luisenstr. 48 Münster 1996-07-10 44087 1
0 10250 41 10 7.7000 HANAR 4 65.8300 1996-07-08 1996-08-05 Rua do Paço, 67 Rio de Janeiro 1996-07-12 05454-876 RJ 2
0.15 10250 51 35 42.4000 HANAR 4 65.8300 1996-07-08 1996-08-05 Rua do Paço, 67 Rio de Janeiro 1996-07-12 05454-876 RJ 2
0.15 10250 65 15 16.8000 HANAR 4 65.8300 1996-07-08 1996-08-05 Rua do Paço, 67 Rio de Janeiro 1996-07-12 05454-876 RJ 2
0.05 10251 22 6 16.8000 VICTE 3 41.3400 1996-07-08 1996-08-05 2, rue du Commerce Lyon 1996-07-15 69004 1
0.05 10251 57 15 15.6000 VICTE 3 41.3400 1996-07-08 1996-08-05 2, rue du Commerce Lyon 1996-07-15 69004 1
Now, rinse and repeat. That is, consult the NorthWind diagram and continue adding related entities until the relationships configuration look like this:
<relationships>
<add left-entity="Order Details" left-field="OrderID" right-entity="Orders" right-field="OrderID" />
<add left-entity="Order Details" left-field="ProductID" right-entity="Products" right-field="ProductID" />
<add left-entity="Orders" left-field="CustomerID" right-entity="Customers" right-field="CustomerID" />
<add left-entity="Orders" left-field="EmployeeID" right-entity="Employees" right-field="EmployeeID" />
<add left-entity="Orders" left-field="ShipVia" right-entity="Shippers" right-field="ShipperID" />
<add left-entity="Products" left-field="SupplierID" right-entity="Suppliers" right-field="SupplierID" />
<add left-entity="Products" left-field="CategoryID" right-entity="Categories" right-field="CategoryID" />
</relationships>
As you might expect, adding all these entities creates many duplicate field names. Instead of renaming each one, we can add a prefix to the entity. A prefix aliases all the fields as prefix + name.
<add name="Employees" version="RowVersion" prefix="Employee">
<fields>
</fields>
</add>
Initialize, and run twice. Console output should look like this:
tfl NorthWind.xml {'mode':'init'}
19:41:53 | Info | NorthWind | All | Initialized TrAnSfOrMaLiZeR.
19:41:53 | Info | NorthWind | All | Initialized NorthWindOrderDetails in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Initialized NorthWindOrders in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Initialized NorthWindProducts in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Initialized NorthWindCustomers in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Initialized NorthWindEmployees in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Initialized NorthWindShippers in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Initialized NorthWindSuppliers in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Initialized NorthWindCategories in NorthWindOutput on localhost.
19:41:53 | Info | NorthWind | All | Process completed in 00:00:01.1828232.
tfl NorthWind.xml
19:42:06 | Info | NorthWind | Order Details....... | Processed 2155 inserts, and 0 updates in Order Details.
19:42:07 | Info | NorthWind | Orders.............. | Processed 830 inserts, and 0 updates in Orders.
19:42:07 | Info | NorthWind | Products............ | Processed 77 inserts, and 0 updates in Products.
19:42:07 | Info | NorthWind | Customers........... | Processed 91 inserts, and 0 updates in Customers.
19:42:07 | Info | NorthWind | Employees........... | Processed 9 inserts, and 0 updates in Employees.
19:42:07 | Info | NorthWind | Shippers............ | Processed 3 inserts, and 0 updates in Shippers.
19:42:07 | Info | NorthWind | Suppliers........... | Processed 29 inserts, and 0 updates in Suppliers.
19:42:07 | Info | NorthWind | Categories.......... | Processed 8 inserts, and 0 updates in Categories.
19:42:07 | Info | NorthWind | Orders.............. | Processed 2155 rows. Updated Order Details with Orders.
19:42:07 | Info | NorthWind | Products............ | Processed 2155 rows. Updated Order Details with Products.
19:42:07 | Info | NorthWind | All................. | Process completed in 00:00:01.2583563.
tfl NorthWind.xml
19:42:13 | Info | NorthWind | Order Details....... | Processed 0 inserts, and 0 updates in Order Details.
19:42:13 | Info | NorthWind | Orders.............. | Processed 0 inserts, and 0 updates in Orders.
19:42:13 | Info | NorthWind | Products............ | Processed 0 inserts, and 0 updates in Products.
19:42:13 | Info | NorthWind | Customers........... | Processed 0 inserts, and 0 updates in Customers.
19:42:13 | Info | NorthWind | Employees........... | Processed 0 inserts, and 0 updates in Employees.
19:42:13 | Info | NorthWind | Shippers............ | Processed 0 inserts, and 0 updates in Shippers.
19:42:13 | Info | NorthWind | Suppliers........... | Processed 0 inserts, and 0 updates in Suppliers.
19:42:13 | Info | NorthWind | Categories.......... | Processed 0 inserts, and 0 updates in Categories.
19:42:13 | Info | NorthWind | All................. | Process completed in 00:00:00.7708553.
Now there are 81 fields available in the output NorthWindStar
:
SELECT COUNT(*) AS FieldCount
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'NorthWindStar'
FieldCount
-----------
81
As you can see in the diagram above, I haven't completely de-normalized the data. Instead, I have created a star schema, where every related table has a single join to the master table. In addition, I have created a view (e.g. NorthWindStar) so users may query it as if it we're a single table. Having this single view of the duplicated data allows us to quickly create simple cubes or integrate with search engines:
Leveraging SQL Server Analysis Services
Open up BIDS, and create an Analysis Services Project (or cube) to browse the data.
- Set the data source to your NorthWindOutput database
- Set the data view to the NorthWindStar view
- Create a dimension using all the fields (except the binary ones) in the view. Call it Properties.
- Create a cube with a couple measures (e.g. UnitPrice, Quantity) and connect it to the Properties dimension.
- Process the cube and browse it. Note: You may have to grant NT AUTHORITY\LOCAL SERVICE read writes to the NorthWindOutput database.
As you can see, slicing the measures by order date isn't ideal. Moreover, the unit price and quantity measures don't help much by themselves. This cube needs a time hierarchy and revenue calculation. We can add them with Transformalize. First, add three calculated fields based on "order date" to create a time hierarchy:
<add name="Orders" version="RowVersion" prefix="Orders">
<fields>
</fields>
<calculated-fields>
<add name="TimeDate" length="10" default="9999-12-31">
<transforms>
<add method="toString" format="yyyy-MM-dd" parameter="OrderDate" />
</transforms>
</add>
<add name="TimeMonth" length="6" default="12-DEC">
<transforms>
<add method="toString" format="MM-MMM" parameter="OrderDate" />
<add method="toUpper" />
</transforms>
</add>
<add name="TimeYear" type="System.Int16" default="9999">
<transforms>
<add method="toString" format="yyyy" parameter="OrderDate" />
</transforms>
</add>
</calculated-fields>
</add>
Calculated fields project new fields based on the values of other fields and previously defined other calculated fields. They are used at the entity level, or at the process level. In an entity, they have access to any field within their entity. In a process, they have access to all of the data. To control which fields they have access to, use parameters like this:
<transform method="format" format="{0} is a big city!">
<parameters>
<add field="City" />
</parameters>
</transform>
You may add multiple parameters in this way. However, if you only have a single parameter, you can specify it in the parameter attribute in the transform element itself, like this:
<transform method="format" format="{0} is a big city!"
parameter="City" />
Another short-cut is to set the parameter attribute to "*" to include all fields.
There are many built-in Transforms. If you can't find one that fits your needs, you can use the C#, JavaScript, or the Razor template transforms to define your own. Let's use a JavaScript transform to calculate revenue:
<calculated-fields>
<add name="Revenue" type="System.Decimal" >
<transforms>
<add method="javascript" script="(UnitPrice * (1 - Discount)) * Quantity" parameter="*" />
</transforms>
</add>
</calculated-fields>
Re-initialize and run Tfl. Then, using the new time fields and revenue, see if it improves the cube browsing experience.
The cube looks better now, but we'll need it to update whenever Transformalize runs. So, add a connection to Analysis Services and a corresponding template action:
<connections>
<add name="input" connection-string="server=localhost;Database=NorthWind;Trusted_Connection=True;"/>
<add name="output" connection-string="Server=localhost;Database=NorthWindOutput;Trusted_Connection=True;"/>
<add name="cube" connection-string="Data Source=localhost;Catalog=NorthWind;" provider="AnalysisServices"/>
</connections>
<templates path="C:\Tfl\">
<add name="process-cube" file="process-cube.xmla">
<settings>
<add name="DatabaseID" value="NorthWind2"></add>
</settings>
<actions>
<add action="run" connection="cube"></add>
</actions>
</add>
</templates>**
Transformalize "templates" use C# Razor syntax. Settings are passed into the template and used like this:
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
<Process xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200">
<Object>
<DatabaseID>@(Model.Settings.DatabaseID)</DatabaseID>
</Object>
<Type>ProcessFull</Type>
<WriteBackTableCreation>UseExisting</WriteBackTableCreation>
</Process>
</Batch>
The @(Model.Settings.DatabaseID)
will be replaced with NorthWind2
. Transformalize's template manager will render the template, and subsequently run defined "actions." The "run" action executes the rendered content against the designated connection. This allows you to dynamically build data manipulation queries, or XMLA commands in this case, and execute them.
tfl NorthWind.xml
00:14:28 | Info | NorthWind | Order Details.. | Processed 2155 inserts, and 0 updates in Order Details.
00:14:28 | Info | NorthWind | Orders......... | Processed 830 inserts, and 0 updates in Orders.
00:14:28 | Info | NorthWind | Products....... | Processed 77 inserts, and 0 updates in Products.
00:14:28 | Info | NorthWind | Customers...... | Processed 91 inserts, and 0 updates in Customers.
00:14:28 | Info | NorthWind | Employees...... | Processed 9 inserts, and 0 updates in Employees.
00:14:28 | Info | NorthWind | Shippers....... | Processed 3 inserts, and 0 updates in Shippers.
00:14:28 | Info | NorthWind | Suppliers...... | Processed 29 inserts, and 0 updates in Suppliers.
00:14:28 | Info | NorthWind | Categories..... | Processed 8 inserts, and 0 updates in Categories.
00:14:28 | Info | NorthWind | Orders......... | Processed 2155 rows. Updated Order Details with Orders.
00:14:29 | Info | NorthWind | Products....... | Processed 2155 rows. Updated Order Details with Products.
00:14:31 | Info | NorthWind | Categories..... | process-cube ran successfully.
00:14:31 | Info | NorthWind | All............ | Process completed in 00:00:03.8312882.
Leveraging Apache SOLR
With more complex templates, and an Apache SOLR server, it is possible to integrate full text search into the process as well. Transformalize comes with a pair of templates that can build the necessary SOLR configuration files for schema, and data import handling.
<templates>
<add name="solr-data-handler" file="solr-data-handler.cshtml" cache="true">
<actions>
<add action="copy" file="C:\Solr\NorthWind\conf\data-config.xml"/>
</actions>
</add>
<add name="solr-schema" file="solr-schema.cshtml" cache="true">
<actions>
<add action="copy" file="C:\Solr\NorthWind\conf\schema.xml"/>
<add action="web" url="http://localhost:8983/solr/NorthWind/dataimport?command=full-import&clean=true&commit=true&optimize=true"/>
</actions>
</add>
</templates>
<search-types>
<add name="default" />
<add name="facet" analyzer="lowercase" store="true" index="true" />
<add name="standard" analyzer="standard_lowercase" store="false" index="true"/>
</search-types>
The Razor templates "solr-data-handler.cshtml" and "solr-schema.cshtml" render the SOLR configuration files. This is possible because the template manager passes the entire NorthWind configuration (that we've built with XML above) into the templates.
To control how the fields are handled in SOLR, "search types" are applied to each <field/>
or <calculated-field/>
. By default, each field is indexed and stored in the search index according to it's data type. To assign more complex text analysis, you can set the search-type attribute to facet, or standard, or any others you define. To exclude a field from search, set search-type to "none."
Running Tfl now produces:
tfl NorthWind.xml
...
00:48:25 | Info | NorthWind | Products...... | Processed 2155 rows. Updated Order Details with Products.
00:48:28 | Info | NorthWind | Categories.... | process-cube ran successfully.
00:48:28 | Info | NorthWind | Categories.... | Copied solr-data-handler template output to C:\Solr\NorthWind\conf\data-config.xml.
00:48:29 | Info | NorthWind | Categories.... | Copied solr-schema template output to C:\Solr\NorthWind\conf\schema.xml.
00:48:29 | Info | NorthWind | Categories.... | Made web request to http:
00:48:29 | Info | NorthWind | Categories.... | Process completed in 00:00:04.8287386.
In this example, the template action "web" triggers SOLR to clean and re-import the index. In a production environment, you'd want to reload the schema when it changes, and make use of full and delta imports appropriately. If all goes well, you see something like this in the SOLR admin:
Now, if you schedule Transformalize to run every couple minutes, you have near real-time OLAP and search engine services on top of your OLTP data. An OLAP cube supports more performant and complex reporting requirements, and a search engine allows for lightning fast and fuzzy searches for specific records. If your users want to see the data in different ways, and they will, all you have to do is add transforms and/or new calculated fields and re-initialize your output.
When Transformalize reads your production databases, it attempts to do so introducing as little contention as possible. Using version fields, it can keep a star-schema copy of very large databases up to date very quickly.
Summary
The NorthWind data is fairly clean. In reality, you'll face more challenging data sources.
Transformalize uses several other open source projects including
- Rhino ETL
- Razor Engine
- Jint
- Ninject
- NLog
- fastJSON
- Dapper-dot-net
- File Helpers
- Excel Data Reader
- Enterprise Library 6 Validation Block
- Elasticsearch.NET
- SolrNet
Where possible, I've included source code from these projects rather than the Nuget packages. The upside of doing this is I get to step into and learn from other people's code. The downside is it's a bit harder to keep these libraries up to date.