Introduction
Imagine that we are developing an application that needs to support multiple database systems.
What options do we have?
Basically, I would say that we have three options:
- Stick to ANSI SQL and make sure that all SQL runs on all supported databases.
- Keep separate versions of each statement for each target database.
- Use an OR mapper that abstracts the SQL away from your code.
Let’s quickly review each option.
ANSI SQL
The ANSI standard is a standard that aims to define a common way of expressing SQL syntax.
What this means is that we should be able to write one query and live happily ever after regardless of the target DBMS.
Sweet, but is this really the case?
Let’s put this to the test and write a query that returns the string
length of some column.
We start of using Microsoft SQL Server:
SELECT LEN(ProductName) FROM Products
What if we take the same query and execute it against MySql?
"FUNCTION northwind.LEN does not exist"
Now that is not exactly the result we were hoping for. What could be wrong?
Obviously MySql does not understand the LEN
function, but why?
If we investigate further, we find out that the ANSI SQL version of LEN
would be CHAR_LENGTH
.
Okay, let's rewrite the query and hit MySql again.
SELECT CHAR_LENGTH(ProductName) FROM Products
Success!!!
We now have a query that is compliant with the ANSI standard.
So this should work if we switch back to Microsoft SQL Server again too, right?
Let’s find out.
"'CHAR_LENGTH' is not a recognized built-in function name."
No good, SQL Server does not support this function and we are pretty much back maintaining two versions of this query, one for Microsoft SQL Server and one for MySql.
And we see this all over the place. Most database systems implement the functionality defined by the standard, but under different names.
You would be surprised to see how much they differ although they might claim to be ANSI compliant to some extent.
Keep Separate Versions of the Query for Each DBMS
I don't think so. :)
Use an OR Mapper
Most object relation mappers, such as NHibernate and Entity framework support a wide range of database systems.
The abstraction is built into the mapper itself so that we don’t need to think about the semantics of each database.
Back in the days, each OR mapper provides their own query language that pretty much tied your code to that specific mapper, such as NHibernate.
With Linq (Language Integrated Query Language), we are able to abstract ourselves from the underlying provider and work with data through a common interface (Linq).
The bits and pieces are starting to fall into place.
The only problem is that this abstraction is built into the Linq provider itself and cannot be used in other scenarios.
And what if we don’t wish to use an OR mapper and we still want to be DBMS independent?
Introducing DbExpressions
In an effort to solve the problems described above, the idea was to abstract the implementation details for each DBMS into an abstract syntax tree and provide a fluent interface to help build this tree.
The following requirements were set for this library:
Support all four query types (SELECT
, UPDATE
, DELETE
AND INSERT
).
Support most common basic string
, numeric
and datetime
functions.
Support most common language constructs, such as JOIN
, GROUP BY
, ORDER BY
and so on
Support a provider model that makes it easy to implement new providers for new database systems.
Support Microsoft SQL Server and MySql out of the box.
How Does This Work?
If you are familiar with the way Expressions works in .NET, you will soon get the idea behind this library.
While for instance, an Expression<T>
delegate compiles into IL, a DbExpression
“compiles“ into the vendor specific SQL dialect.
The query is represented by a set of nodes in a tree that when visited (Visitor pattern), translates each DbExpression
to the expected syntax.
There are a lot of different expression types and they are available using the static DbExpressionFactory
class.
This means that we can construct an expression tree manually like this:
var dbExpressionFactory = new DbExpressionFactory();
var columnExpression = dbExpressionFactory.Column("SomeColumn");
var constantExpression = dbExpressionFactory.Constant(1);
var binaryExpression = dbExpressionFactory.MakeBinary
(DbBinaryExpressionType.Equal, columnExpression, constantExpression);
Console.WriteLine(binaryExpression);
The output from this DbExpression
when translated(Microsoft SQL) would be:
"([SomeColumn] = @p0)"
While constructing the expression tree manually can be useful in some scenarios, it can be pretty cumbersome if all we want to do is to write a query and execute it.
The library provides a fluent interface that enables us to write a query like this:
selectQuery.Select(f => f.Length(f.Column("ProductID"))) _
.From(f => f.Table("Products"));
The Select
method has two overloads defined like this:
DbSelectQuery Select(this DbSelectQuery dbSelectQuery, _
params Func<dbexpressionfactory, dbexpression>[] expressionSelector)
DbSelectQuery Select(this DbSelectQuery dbSelectQuery, DbExpression expression)
An instance of the DbExpressionFactory
is passed into the Select
method so that it makes it easy to construct the query.
As we can see, there is also an overload that just takes a DbExpression
instance and that is very powerful if we are using a combination of manually constructing pieces of the query and the fluent API.
var factory = new DbExpressionFactory();
var projection = factory.List(new[] {factory.Column("SomeColumn"), _
factory.Column("SomeOtherColumn")});
var selectQuery = new DbSelectQuery();
selectQuery.Select(projection).From(f => f.Table("SomeTable"));
Now, that looks a lot better than string
concatenation and it is DBMS independent.
Paging
Paging is the ability to select a "page" of data that means that out of 1000 possible rows, we may want to return row 10 through 20.
The SQL syntax for this can vary a lot between the different SQL dialects.
MySql solves this very elegantly with the LIMIT
and OFFSET
keywords, while in SQL Server this becomes a little more complicated.
The DbExpressions
library provides Skip
and Take
methods similar to the ones we find in Linq.
selectQuery.Select(f => f.Column("ProductID"))
.From(f => f.Table("Products")).Skip(10).Take(10);
This means that we now can write queries with paging without worrying about the actual syntax needed.
Translating the Expression into SQL
Translating the query into SQL is just a simple matter of visiting every node in the expression tree and returning the appropriate syntax according to the target DBMS.
Some of you may recall the ExpressionVisitor
class first discussed by Matt Warren and later made available to us in the MSDN documentation.
The ExpressionVisitor
class takes care of visiting every node and rewrites the tree if something has changed.
This class is somewhat essential when it comes to translating a Expression<T>
delegate into another representation, such as SQL.
The DbExpressions
library has a similar class that is used as a base class whenever we need to inspect (or rewrite) the expression tree.
Derived from the DbExpressionVisitor
class, we find the abstract DbQueryTranslator
that serves as the base class for provider specific query translators.
If we take a look at the MySqlQueryTranslator
class, we find the following code that translates a DbColumnExpression
into something that MySql can understand.
protected override DbExpression VisitColumnExpression(DbColumnExpression columnExpression)
{
var sqlFragment = QuoteIdentifier(columnExpression.ColumnName);
return ExpressionFactory.Sql(sqlFragment);
}
If you want more information about how to implement custom query translators, take a look here.
Operator Overloading
The DbExpressons
library uses operator overloading so that we can write our queries in a more natural way.
So instead of writing this:
selectQuery.Select(f => f.Column("ProductID"))
.From(f => f.Table("Order Details")).Where_
(f => f.GreaterThan(f.Column("ProductID"),f.Constant(10)));
we can write:
selectQuery.Select(f => f.Column("ProductID"))
.From(f => f.Table("Order Details")).Where(e => e.Column("ProductID") > 10);
Parameterized Queries
We all know the danger of SQL injection and we need to make sure that every constant such as a string literal or number is represented as a parameter in the query.
The DbExpressions
library takes care of this by translating DbConstantExpression
instances into data parameters.
This means that if we write:
selectQuery.Select(f => f.Column("ProductID"))
.From(f => f.Table("Order Details")).Where(e => e.Column("ProductID") > 10);
We end up with the following output (Microsoft SQL):
SELECT [ProductID] FROM [Order Details] WHERE ([ProductID] > @p0)
Query Execution
So we have our query and we need to get it executed.
The first thing we need to do is to “compile” the syntax tree into the vendor specific SQL statement.
While the Expression<T>
delegate has the Compile
method, the counterpart for DbExpression
is called Translate
.
And what does it translate into? It translates into an IDbCommand
instance with all the parameters already populated.
var command = selectQuery.Translate();
using (IDbConnection connection = CreateConnection() )
{
var reader = command.ExecuteReader();
while(reader.Read())
{
}
}
Configuration
The configuration for the DbExpressions
library is very simple and consists of the ability to add new providers and specify the default provider.
="1.0"="utf-8"
<configuration>
<configSections>
<section name="dbExpressions"
type ="DbExpressions.Configuration.DbExpressionSettings, DbExpressions"/>
</configSections>
<dbExpressions defaultProvider="System.Data.SqlClient">
<queryTranslators>
<queryTranslator providerName="System.Data.SqlClient"
type ="SqlQueryTranslator, DbExpressions"></queryTranslator>
</queryTranslators>
</dbExpressions>
</configuration>
Note: This is just a sample configuration and is not necessary since support for MySql and Microsoft SQL Server is already bundled with the DbExpressions
library.
We can also add new providers or set the default provider using methods on the DbQueryTranslatorFactory
class.
DbQueryTranslatorFactory.RegisterQueryTranslator
("System.Data.SqlClient", typeof (SqlQueryTranslator));
DbQueryTranslatorFactory.SetDefaultProvider("System.Data.SqlClient");
Analyzing Expression Trees
In addition to translating the expression tree into SQL, we can also inspect, analyze and even alter the tree before it gets translated into SQL.
You might wonder how and when such needs may arise, but I will show you an example.
If you should want to rewrite parts of the expression tree, the DbExpressionVisitor
is the "man" for the job.
It takes care of visiting every node and rewrite the (sub)tree if anything changes.
This would involve inheriting from the DbExpressionVisitor
class and override the appropriate Visitxxx
method.
To make this a little easier, the library supports search and replace for DbExpressions
.
Given the following query:
dbSelectQuery.Select(f => f.Column("SomeColumn")).From(f => f.Table("SomeTable"));
Although this is a very simple example, we could find all column references by using the Find
method.
var result = expression.Find<DbColumnExpression>(c => c.ColumnName == "SomeColumn");
The Find
method searches for all DbColumnExpression
instances with the given predicate and returns the results.
Another feature is the ability to replace one expression with another.
When implementing the MySqlQueryTranslator
, I actually put this to good use.
As one of the supported mathematical functions, we find the Square
method that really just returns the square of any given number.
SQL Server already supports that function so that was just a matter of generating the syntax for calling the Square
method.
In MySql on the other hand, there is no function that does this. Well, at least not directly.
If we take the square of the number 3 that would be 9, right?
I figured it would also be the same if we took 3 and raised it to the power of 2.
Luckily we have a function in MySql that does exactly that (Power
).
So what I basically needed to do was replace any references to the Square
method with a call to the Power
function instead.
Using the Replace
method, this is like walking in the park.
The following code is extracted from the MySqlQueryTranslator
:
private DbExpression ReplaceSquareWithPower(DbExpression dbExpression)
{
return dbExpression.Replace<DbMathematicalFunctionExpression>(
dme => dme.MathematicalFunctionExpressionType ==
DbMathematicalFunctionExpressionType.Square,
te => ExpressionFactory.Power(te.Arguments[0], ExpressionFactory.Constant(2)));
}
What we are doing here is we are searching the tree, no matter how complex, for DbMathematicalFunctionExpression
instances that represent the Square
method.
For every occurrence, we replace that function call with a call to the Power
method instead.
MySql
In order to use this library with MySql, you need to install the MySql Connector for .NET. This is the native .NET data provider for MySql.
The library has been tested using version 6.3.5
Final Words
The DbExpressions
library is hosted on CodePlex and updates and fixes will be made available there.
If you come across any problems or have questions, please feel free to post in the discussion list or in the issue tracker.
If you would like to contribute to the project, that would be greatly appreciated and in particular someone with Oracle knowledge so that we can get support for that too.