Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server / SQL-Server-2008

DbExpressions - A Step Towards Independency

4.24/5 (12 votes)
2 Feb 2011CPOL9 min read 76.7K   317  
An abstract syntax tree implementation for SQL

Introduction

Imagine that we are developing an application that needs to support multiple database systems.

What options do we have?

Basically, I would say that we have three options:

  • Stick to ANSI SQL and make sure that all SQL runs on all supported databases.
  • Keep separate versions of each statement for each target database.
  • Use an OR mapper that abstracts the SQL away from your code.

Let’s quickly review each option.

ANSI SQL

The ANSI standard is a standard that aims to define a common way of expressing SQL syntax.
What this means is that we should be able to write one query and live happily ever after regardless of the target DBMS.

Sweet, but is this really the case?

Let’s put this to the test and write a query that returns the string length of some column.

We start of using Microsoft SQL Server:

SQL
SELECT LEN(ProductName) FROM Products    

What if we take the same query and execute it against MySql?

"FUNCTION northwind.LEN does not exist" 

Now that is not exactly the result we were hoping for. What could be wrong?

Obviously MySql does not understand the LEN function, but why?

If we investigate further, we find out that the ANSI SQL version of LEN would be CHAR_LENGTH.

Okay, let's rewrite the query and hit MySql again.

SQL
SELECT CHAR_LENGTH(ProductName) FROM Products

Success!!!

We now have a query that is compliant with the ANSI standard.

So this should work if we switch back to Microsoft SQL Server again too, right?

Let’s find out.

"'CHAR_LENGTH' is not a recognized built-in function name."

No good, SQL Server does not support this function and we are pretty much back maintaining two versions of this query, one for Microsoft SQL Server and one for MySql.

And we see this all over the place. Most database systems implement the functionality defined by the standard, but under different names.

You would be surprised to see how much they differ although they might claim to be ANSI compliant to some extent.

Keep Separate Versions of the Query for Each DBMS

I don't think so. :)

Use an OR Mapper

Most object relation mappers, such as NHibernate and Entity framework support a wide range of database systems.

The abstraction is built into the mapper itself so that we don’t need to think about the semantics of each database.

Back in the days, each OR mapper provides their own query language that pretty much tied your code to that specific mapper, such as NHibernate.

With Linq (Language Integrated Query Language), we are able to abstract ourselves from the underlying provider and work with data through a common interface (Linq).

The bits and pieces are starting to fall into place.

The only problem is that this abstraction is built into the Linq provider itself and cannot be used in other scenarios.

And what if we don’t wish to use an OR mapper and we still want to be DBMS independent?

Introducing DbExpressions

In an effort to solve the problems described above, the idea was to abstract the implementation details for each DBMS into an abstract syntax tree and provide a fluent interface to help build this tree.

The following requirements were set for this library:

Support all four query types (SELECT, UPDATE, DELETE AND INSERT).

Support most common basic string, numeric and datetime functions.

Support most common language constructs, such as JOIN, GROUP BY, ORDER BY and so on

Support a provider model that makes it easy to implement new providers for new database systems.

Support Microsoft SQL Server and MySql out of the box.

How Does This Work?

If you are familiar with the way Expressions works in .NET, you will soon get the idea behind this library.

While for instance, an Expression<T> delegate compiles into IL, a DbExpression “compiles“ into the vendor specific SQL dialect.

The query is represented by a set of nodes in a tree that when visited (Visitor pattern), translates each DbExpression to the expected syntax.

There are a lot of different expression types and they are available using the static DbExpressionFactory class.

This means that we can construct an expression tree manually like this:

C#
var dbExpressionFactory = new DbExpressionFactory(); 
var columnExpression = dbExpressionFactory.Column("SomeColumn"); 
var constantExpression = dbExpressionFactory.Constant(1); 
var binaryExpression = dbExpressionFactory.MakeBinary
    (DbBinaryExpressionType.Equal, columnExpression, constantExpression); 
Console.WriteLine(binaryExpression);

The output from this DbExpression when translated(Microsoft SQL) would be:

"([SomeColumn] = @p0)"

While constructing the expression tree manually can be useful in some scenarios, it can be pretty cumbersome if all we want to do is to write a query and execute it.

The library provides a fluent interface that enables us to write a query like this:

SQL
selectQuery.Select(f => f.Length(f.Column("ProductID"))) _
    .From(f => f.Table("Products")); 

The Select method has two overloads defined like this:

SQL
DbSelectQuery Select(this DbSelectQuery dbSelectQuery, _
params Func<dbexpressionfactory, dbexpression>[] expressionSelector)

DbSelectQuery Select(this DbSelectQuery dbSelectQuery, DbExpression expression)

An instance of the DbExpressionFactory is passed into the Select method so that it makes it easy to construct the query.

As we can see, there is also an overload that just takes a DbExpression instance and that is very powerful if we are using a combination of manually constructing pieces of the query and the fluent API.

var factory = new DbExpressionFactory(); 
var projection = factory.List(new[] {factory.Column("SomeColumn"), _
        factory.Column("SomeOtherColumn")}); 
var selectQuery = new DbSelectQuery(); 
selectQuery.Select(projection).From(f => f.Table("SomeTable"));

Now, that looks a lot better than string concatenation and it is DBMS independent.

Paging

Paging is the ability to select a "page" of data that means that out of 1000 possible rows, we may want to return row 10 through 20.

The SQL syntax for this can vary a lot between the different SQL dialects.

MySql solves this very elegantly with the LIMIT and OFFSET keywords, while in SQL Server this becomes a little more complicated.

The DbExpressions library provides Skip and Take methods similar to the ones we find in Linq.

SQL
selectQuery.Select(f => f.Column("ProductID")) 
    .From(f => f.Table("Products")).Skip(10).Take(10);

This means that we now can write queries with paging without worrying about the actual syntax needed.

Translating the Expression into SQL

Translating the query into SQL is just a simple matter of visiting every node in the expression tree and returning the appropriate syntax according to the target DBMS.

Some of you may recall the ExpressionVisitor class first discussed by Matt Warren and later made available to us in the MSDN documentation.

The ExpressionVisitor class takes care of visiting every node and rewrites the tree if something has changed.

This class is somewhat essential when it comes to translating a Expression<T> delegate into another representation, such as SQL.

The DbExpressions library has a similar class that is used as a base class whenever we need to inspect (or rewrite) the expression tree.

Derived from the DbExpressionVisitor class, we find the abstract DbQueryTranslator that serves as the base class for provider specific query translators.

If we take a look at the MySqlQueryTranslator class, we find the following code that translates a DbColumnExpression into something that MySql can understand.

C#
protected override DbExpression VisitColumnExpression(DbColumnExpression columnExpression)
{ 
    var sqlFragment = QuoteIdentifier(columnExpression.ColumnName); 
    return ExpressionFactory.Sql(sqlFragment); 
} 

If you want more information about how to implement custom query translators, take a look here.

Operator Overloading

The DbExpressons library uses operator overloading so that we can write our queries in a more natural way.

So instead of writing this:

SQL
selectQuery.Select(f => f.Column("ProductID")) 
    .From(f => f.Table("Order Details")).Where_
    (f => f.GreaterThan(f.Column("ProductID"),f.Constant(10))); 

we can write:

SQL
selectQuery.Select(f => f.Column("ProductID")) 
    .From(f => f.Table("Order Details")).Where(e => e.Column("ProductID") > 10);

Parameterized Queries

We all know the danger of SQL injection and we need to make sure that every constant such as a string literal or number is represented as a parameter in the query.

The DbExpressions library takes care of this by translating DbConstantExpression instances into data parameters.

This means that if we write:

SQL
selectQuery.Select(f => f.Column("ProductID")) 
    .From(f => f.Table("Order Details")).Where(e => e.Column("ProductID") > 10);

We end up with the following output (Microsoft SQL):

SQL
SELECT [ProductID] FROM [Order Details] WHERE ([ProductID] > @p0)

Query Execution

So we have our query and we need to get it executed.

The first thing we need to do is to “compile” the syntax tree into the vendor specific SQL statement.

While the Expression<T> delegate has the Compile method, the counterpart for DbExpression is called Translate.

And what does it translate into? It translates into an IDbCommand instance with all the parameters already populated.

C#
var command = selectQuery.Translate(); 
using (IDbConnection connection = CreateConnection() ) 
{ 
    var reader = command.ExecuteReader(); 
    while(reader.Read()) 
    { 
        //Do stuff 
    } 
}   

Configuration

The configuration for the DbExpressions library is very simple and consists of the ability to add new providers and specify the default provider.

XML
<?xml version="1.0" encoding="utf-8" ?> 
<configuration> 
    <configSections> 
        <section name="dbExpressions" 
	type ="DbExpressions.Configuration.DbExpressionSettings, DbExpressions"/> 
    </configSections> 
    <dbExpressions defaultProvider="System.Data.SqlClient"> 
        <queryTranslators> 
            <queryTranslator providerName="System.Data.SqlClient" 
		type ="SqlQueryTranslator, DbExpressions"></queryTranslator> 
        </queryTranslators> 
    </dbExpressions> 
</configuration> 

Note: This is just a sample configuration and is not necessary since support for MySql and Microsoft SQL Server is already bundled with the DbExpressions library.

We can also add new providers or set the default provider using methods on the DbQueryTranslatorFactory class.

C#
DbQueryTranslatorFactory.RegisterQueryTranslator
("System.Data.SqlClient", typeof (SqlQueryTranslator)); 

DbQueryTranslatorFactory.SetDefaultProvider("System.Data.SqlClient");

Analyzing Expression Trees

In addition to translating the expression tree into SQL, we can also inspect, analyze and even alter the tree before it gets translated into SQL.

You might wonder how and when such needs may arise, but I will show you an example.

If you should want to rewrite parts of the expression tree, the DbExpressionVisitor is the "man" for the job.

It takes care of visiting every node and rewrite the (sub)tree if anything changes.

This would involve inheriting from the DbExpressionVisitor class and override the appropriate Visitxxx method.

To make this a little easier, the library supports search and replace for DbExpressions.

Given the following query:

SQL
dbSelectQuery.Select(f => f.Column("SomeColumn")).From(f => f.Table("SomeTable")); 

Although this is a very simple example, we could find all column references by using the Find method.

C#
var result = expression.Find<DbColumnExpression>(c => c.ColumnName == "SomeColumn");  

The Find method searches for all DbColumnExpression instances with the given predicate and returns the results.

Another feature is the ability to replace one expression with another.

When implementing the MySqlQueryTranslator, I actually put this to good use.

As one of the supported mathematical functions, we find the Square method that really just returns the square of any given number.

SQL Server already supports that function so that was just a matter of generating the syntax for calling the Square method.

In MySql on the other hand, there is no function that does this. Well, at least not directly.

If we take the square of the number 3 that would be 9, right?

I figured it would also be the same if we took 3 and raised it to the power of 2.

Luckily we have a function in MySql that does exactly that (Power).

So what I basically needed to do was replace any references to the Square method with a call to the Power function instead.

Using the Replace method, this is like walking in the park.

The following code is extracted from the MySqlQueryTranslator:

C#
private DbExpression ReplaceSquareWithPower(DbExpression dbExpression)
{                                    
    return dbExpression.Replace<DbMathematicalFunctionExpression>(
        dme => dme.MathematicalFunctionExpressionType == 
		DbMathematicalFunctionExpressionType.Square,
        te => ExpressionFactory.Power(te.Arguments[0], ExpressionFactory.Constant(2))); 
} 

What we are doing here is we are searching the tree, no matter how complex, for DbMathematicalFunctionExpression instances that represent the Square method.

For every occurrence, we replace that function call with a call to the Power method instead.

MySql

In order to use this library with MySql, you need to install the MySql Connector for .NET. This is the native .NET data provider for MySql.

The library has been tested using version 6.3.5

Final Words

The DbExpressions library is hosted on CodePlex and updates and fixes will be made available there.

If you come across any problems or have questions, please feel free to post in the discussion list or in the issue tracker.

If you would like to contribute to the project, that would be greatly appreciated and in particular someone with Oracle knowledge so that we can get support for that too.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)