Using Information from the .NET DataProvider

Alaric Dailey

4.80/5 (7 votes)

14 Jan 2010CPOL7 min read

37.8K

Using information from the .NET DataProvider to improve dynamic SQL and allow the use of multiple databases

Introduction

The .NET data providers offer a lot of information about the database, so I want to show some neat tricks and share a little code.

Background

I see a lot of code in multiple languages, in multiple projects, that use dynamic SQL, and a few problems routinely surface:

Not using parameters, which provides better performance, and prevents SQL injection
Not properly wrapping column and table name identifiers
Hard-coding the database server. This can show up as hard-coding the provider, or relying on special features unique to that database server.

If you are lucky enough to be the designer of your database, you may simply avoid using anything that looks like it might be a SQL keyword, and hope that database upgrades or moves to a new database server won't break your code. This is so often the case, and so people are frightened of obvious keywords like user, default, table, and column.

In some environments, you are stuck making your "best guess". So, if you are coding against SQL Server or MySQL, and you choose to wrap your SQL, you end up wrapping your identifiers like this:

SQL

SELECT [user], [default] from [table]

The obvious problem with this is that this statement will not work if your database moves to a different server, say moving to Oracle or PostgreSQL. The first thought is to look at standards. The SQL92 standard, now 17 years old, says that the double quoting identifiers should actually work for all compliant databases.

SQL

SELECT "user", "default" from "table"

Unfortunately, SQL Server doesn't support this unless you set the Quoted Identifier option (on the connection, database, or server). Microsoft Access can't support quotes at all, and who can tell about other databases.

Often, for programmers of packaged software, these problems come down to a few choices:

Hardcode the application for a single database server.
Limit the number of databases supported, usually less than 3.
Use a tool like (N)Hibernate which hides the SQL behind its own language, and thus require support of that tool for your database. (Hope you aren't using an old, brand new, or obscure database, such as TimesTen. You can find the list of supported databases by clicking here.)
Do everything in Stored Procedures, which still limits the database options.
Put all the SQL in some external resource file that can be changed per database.

These options have their own benefits and drawbacks. However, the .NET data providers can make some of these choices a little easier. So, just like the choice to use XHTML rather than HTML to make your web pages available to more browsers, if you wish to make your code open to more databases, you have to make some choices.

In terms of moving from one database to another, the most flexible is carefully coded dynamic SQL. This is not to say that there aren't problems with this approach, not the least of which is certain SQL changes require a code redeployment. It does give us the most options, if it is done carefully. I will be looking into some ways to minimize some of the problems with this approach in future articles. So, the rules for writing are fairly straightforward:

Write your SQL as close to SQL92 standards, as possible.
Avoid server specific features.
When designing databases, tables, etc., assume the server is case insensitive, i.e., name things uniquely.
When writing your SQL, assume the DB is case sensitive; this in conjunction with the previous rule avoiding weird issues.
Don't hardcode to a single provider.
Wrap all database, schema, table, and column names properly.

Many of these simply require discipline, but the last two can be hard. So, let's handle the first one. First, a hardcoded example:

string connectionString = ConfigurationManager.AppSettings.Get("DSN");
using(var conn = new System.Data.SqlClient.SqlConnection())
{
    conn.ConnectionString = connectionString;
    conn.Open();
    using(var cmd = new System.Data.SqlClient.SqlCommand())
    {
        cmd.CommandText = "delete from table1";
        cmd.CommandType = System.Data.CommandType.Text;
        cmd.ExecuteNonQuery();
    }
}

The first issue here is that we are reading the connection string from appSettings rather than the connectionStrings area. The biggest advantage of putting our connection string in the connectionStrings area is that we now have a providerName. By grabbing that provider name, we can create objects of the proper type.

ConnectionStringSettings css = ConfigurationManager.ConnectionStrings["DSN"];
string connectionString = css.ConnectionString;
string providerName = css.ProviderName;
var factory =  System.Data.Common.DbProviderFactories.GetFactory(providerName);
using(var conn = factory.CreateConnection())
{
    conn.ConnectionString = connectionString;
    conn.Open();
    // use the connection to create the command, that way its 
    //already associated properly, though could have used
    //factory.CreateCommand()
    using(var cmd = conn.CreateCommand())
    {
        cmd.CommandText = "delete from table1";
        cmd.CommandType = System.Data.CommandType.Text;
        cmd.ExecuteNonQuery();
    }
}

By changing the code in this manner, we have eliminated probably the single largest problem with porting most code from one database to another: the hard-coding of the database provider.

Next, getting the wrapping characters properly for the data source. To do this, we need the CommandBuilder object or the DataSourceInformation table. Since the DataSourceInformation table contains a regex rather than simple characters, it's better for validation than wrapping. So, let's get the CommandBuilder, and wrap some SQL properly using it.

DbCommandBuilder commandBuilder = factory.CreateCommandBuilder();
string prefix = commandBuilder.QuotePrefix;
string suffix = commandBuilder.QuoteSuffix;

StringBuilder stringBuilder = new StringBuilder();

stringBuilder.Append("SELECT ");
stringBuilder.AppendFormat("{0}{1}{2}", prefix, "ID", suffix);
stringBuilder.AppendLine(",");
stringBuilder.AppendFormat("{0}{1}{2}", prefix, "user", suffix);
stringBuilder.AppendLine(",");
stringBuilder.AppendFormat("{0}{1}{2}", prefix, "password", suffix);
stringBuilder.AppendLine(",");
stringBuilder.AppendFormat("{0}{1}{2}", prefix, "default", suffix);
stringBuilder.AppendLine("");
stringBuilder.Append("FROM ");
stringBuilder.AppendFormat("{0}{1}{2}", prefix, "table", suffix);

Debug.WriteLine(stringBuilder.ToString());

The output of this when using the System.Data.SqlClient provider to connect to SQL Server is:

SQL

SELECT [ID],
[user],
[password],
[default]
FROM [table]

When using Oracle.DataAccess.Client or Npgsql, it generates:

SQL

SELECT "ID",
"user",
"password",
"default"
FROM "table"

The major issue with this is when you are using the OLEDB or ODBC providers. Because the DbCommandBuilder doesn't know which database you are connecting to, you have to query the provider. Coincidentally, the best place to get that information is from the same place we need to get the parameter capabilities from. So, let's go get that data.

The System.Data.Common.DbMetaDataCollectionNames class provides a number of constant strings that we can pass to get DataTables that provide information.

using (var conn = factory.CreateConnection())
{
    conn.ConnectionString = connectionString;
    conn.Open();
    DataTable dt = conn.GetSchema(
      System.Data.Common.DbMetaDataCollectionNames.DataSourceInformation);
    return dt;
}

Personally, I hate writing parsing code, and since I am planning on using this data, I decided a class would be much more helpful. I pass in the DataTable we just retrieved, and the following class fills itself in, making the data easier to get to.

internal class DataSourceInformation
{
    private static readonly Type _Type = typeof(DataSourceInformation);
    private static readonly Type _IdentifierCaseType = 
       Enum.GetUnderlyingType(typeof(IdentifierCase));
    private static readonly Type _GroupByBehaviorType = 
       Enum.GetUnderlyingType(typeof(GroupByBehavior));

    private static readonly Type _SupportedJoinOperatorsType =
        Enum.GetUnderlyingType(typeof(SupportedJoinOperators));

    //These are filled within the "switch/case"
    //statement, either directly, or thru reflection.
    //since Resharper can't tell they are being filled 
    //thru reflection, it suggests to convert them to
    //constants. DO NOT do that!!!!!

    // ReSharper disable ConvertToConstant.Local
    private readonly string _compositeIdentifierSeparatorPattern = string.Empty;
    private readonly string _dataSourceProductName = string.Empty;
    private readonly string _dataSourceProductVersion = string.Empty;
    private readonly string _dataSourceProductVersionNormalized = string.Empty;
    private readonly GroupByBehavior _groupByBehavior;
    private readonly string _identifierPattern = string.Empty;
    private readonly IdentifierCase _identifierCase;
    private readonly bool _orderByColumnsInSelect = false;
    private readonly string _parameterMarkerFormat = string.Empty;
    private readonly string _parameterMarkerPattern = string.Empty;
    private readonly Int32 _parameterNameMaxLength = 0;
    private readonly string _parameterNamePattern = string.Empty;
    private readonly string _quotedIdentifierPattern = string.Empty;
    private readonly Regex _quotedIdentifierCase;
    private readonly string _statementSeparatorPattern = string.Empty;
    private readonly Regex _stringLiteralPattern;
    private readonly SupportedJoinOperators _supportedJoinOperators;
    // ReSharper restore ConvertToConstant.Local

    public DataSourceInformation(DataTable dt)
    {
        //DataTable dt = Connection.GetSchema(
        //   DbMetaDataCollectionNames.DataSourceInformation);
        foreach (DataRow r in dt.Rows)
        {
            foreach (DataColumn c in dt.Columns)
            {
                string s = c.ColumnName;
                object o = r[c.ColumnName];
                //just for safety
                if (o == DBNull.Value)
                {
                    o = null;
                }
                if (!string.IsNullOrEmpty(s) && o != null)
                {
                    switch (s)
                    {
                        case "QuotedIdentifierCase":
                                _quotedIdentifierCase = new Regex(o.ToString());
                            break;
                        case "StringLiteralPattern":
                            _stringLiteralPattern = new Regex(o.ToString());
                            break;
                        case "GroupByBehavior":
                            o = Convert.ChangeType(o, _GroupByBehaviorType);
                            _groupByBehavior = (GroupByBehavior)o;
                            break;
                        case "IdentifierCase":
                            o = Convert.ChangeType(o, _IdentifierCaseType);
                            _identifierCase = (IdentifierCase)o;
                            break;
                        case "SupportedJoinOperators":
                            o = Convert.ChangeType(o, _SupportedJoinOperatorsType);
                            _supportedJoinOperators = (SupportedJoinOperators)o;
                            // (o as SupportedJoinOperators?) ??
                            //    SupportedJoinOperators.None;
                            break;
                        default:
                            FieldInfo fi = _Type.GetField("_" + s, 
                              BindingFlags.IgnoreCase | BindingFlags.NonPublic | 
                              BindingFlags.Instance);
                            if (fi != null)
                            {
                                fi.SetValue(this, o);
                            }
                            break;
                    }
                }
            }
            //there should only ever be a single row.
            break;
        }
    }

    public string CompositeIdentifierSeparatorPattern
    {
        get { return _compositeIdentifierSeparatorPattern; }
    }

    public string DataSourceProductName
    {
        get { return _dataSourceProductName; }
    }

    public string DataSourceProductVersion
    {
        get { return _dataSourceProductVersion; }
    }

    public string DataSourceProductVersionNormalized
    {
        get { return _dataSourceProductVersionNormalized; }
    }

    public GroupByBehavior GroupByBehavior
    {
        get { return _groupByBehavior; }
    }

    public string IdentifierPattern
    {
        get { return _identifierPattern; }
    }

    public IdentifierCase IdentifierCase
    {
        get { return _identifierCase; }
    }

    public bool OrderByColumnsInSelect
    {
        get { return _orderByColumnsInSelect; }
    }

    public string ParameterMarkerFormat
    {
        get { return _parameterMarkerFormat; }
    }

    public string ParameterMarkerPattern
    {
        get { return _parameterMarkerPattern; }
    }

    public int ParameterNameMaxLength
    {
        get { return _parameterNameMaxLength; }
    }

    public string ParameterNamePattern
    {
        get { return _parameterNamePattern; }
    }

    public string QuotedIdentifierPattern
    {
        get { return _quotedIdentifierPattern; }
    }

    public Regex QuotedIdentifierCase
    {
        get { return _quotedIdentifierCase; }
    }

    public string StatementSeparatorPattern
    {
        get { return _statementSeparatorPattern; }
    }

    public Regex StringLiteralPattern
    {
        get { return _stringLiteralPattern; }
    }

    public SupportedJoinOperators SupportedJoinOperators
    {
        get { return _supportedJoinOperators; }
    }
}

Without going into too much of an explanation, this class takes the DataTable and fills itself via Reflection. There are a few exceptions, which are shown in the switch statement. If we fill this class and keep it around, we now have a good deal of help to build SQL for each individual database. This data includes, but is not limited to:

the characters to wrap identifiers with
if the database supports named parameters
what the character is that denotes named parameters
the maximum length of the name of a named parameter

Named parameters are wonderful. They enable you to write SQL like this:

SQL

DELETE FROM table1 WHERE ID=@ID and column2=@column2

which is much easier to deal with than:

SQL

DELETE FROM table1 WHERE ID=? and column2=?

For many people, this second one isn't clear as to what is going on at all. Named parameters obviously make things clearer, and you can reuse the values.

SQL

SELECT a.* FROM table1 a
     INNER JOIN table2 b
        ON a.ID = b.A_ID
     WHERE a.NAME = @name
     AND b.SOMEOTHERCOLUMN = @name

There are a few problems here, besides the fact that I didn't wrap any of my table or column names. By using named parameters, I have drastically reduced the number of databases my application can support. There are only three that I know of: MS SQL Server, Oracle, and PostgreSQL. I don't know if MySQL can; it may in the future, if it doesn't now. This means using named parameters is only a convenience. It also means we don't get to take advantage of being able to reuse parameters. Another thing to be aware of is that the lengths of names vary radically; SQL Server supports names of up to 128 characters, while Oracle only supports names of up to 15 characters. I will come back to this point in my next article.

Using the Code

At this point, we have all the bits in place, so let's look at how this code can all be put together into a a block that generates a statement that is likely to be able to be used in any database provider for .NET. In another article, I will talk about making this cleaner, and get into fixing the shortcoming with OLEDB and ODBC providers.

ConnectionStringSettings css = 
    ConfigurationManager.ConnectionStrings["DSN"];
string connectionString = css.ConnectionString;
string providerName = css.ProviderName;
DataSourceInformation dsi = null;
DbProviderFactory factory = DbProviderFactories.GetFactory(providerName);

using (var conn = factory.CreateConnection())
{
    conn.ConnectionString = connectionString;
    conn.Open();
    dsi = new DataSourceInformation(conn.GetSchema(
               DbMetaDataCollectionNames.DataSourceInformation));
}

DbCommandBuilder commandBuilder = factory.CreateCommandBuilder();
string prefix = commandBuilder.QuotePrefix;
string suffix = commandBuilder.QuoteSuffix;

var parameter = factory.CreateParameter();
parameter.DbType = DbType.Int32;
parameter.Value = 10;
parameter.ParameterName = "ID";

StringBuilder stringBuilder = new StringBuilder();
stringBuilder.Append("SELECT ");
stringBuilder.AppendFormat("{0}{1}{2}", prefix, 
                           "ID", suffix);
stringBuilder.AppendLine(",");
stringBuilder.AppendFormat("{0}{1}{2}", prefix, 
                           "user", suffix);
stringBuilder.AppendLine(",");
stringBuilder.AppendFormat("{0}{1}{2}", prefix, 
                           "password", suffix);
stringBuilder.AppendLine(",");
stringBuilder.AppendFormat("{0}{1}{2}", prefix, 
                           "default", suffix);
stringBuilder.AppendLine("");
stringBuilder.Append("FROM ");
stringBuilder.AppendFormat("{0}{1}{2}", prefix, 
                           "table", suffix);
stringBuilder.AppendLine();
stringBuilder.Append("WHERE ID=");
if (dsi.ParameterNameMaxLength < 1)
{
    stringBuilder.Append("?");
}
else
{
    stringBuilder.AppendFormat("{0}{1}", 
      dsi.ParameterMarkerPattern[0], parameter.ParameterName);
}

Debug.WriteLine(stringBuilder.ToString());
using (DbConnection conn = factory.CreateConnection())
{
    conn.ConnectionString = connectionString;
    conn.Open();

    using (DbCommand cmd = conn.CreateCommand())
    {
        cmd.CommandText = stringBuilder.ToString();
        cmd.CommandType = CommandType.Text;
        cmd.Parameters.Add(parameter);
        using (DbDataReader dr = cmd.ExecuteReader())
        {
            while (dr.Read())
            {
                //do something with the data
            }
        }
    }
}

Notice I am using the default DbType enumeration to specify the data type of the parameter; this is very important if you want to make your code more portable.

Points of Interest

This code now does exactly what it set out to do. It generates SQL in a way that:

Can be used with all providers, even when there are keywords being used as table and column names.
Uses parameters, making the execution of the SQL both safer and faster.

If you look into the other collections that are available, you can even find a table that maps the databases types to System.Type and DbType.

This type of coding prevents your code from being tied to a specific version of a driver, which is the side effect of directly referencing a provider's DLL.

No amount of careful coding can completely handle every database; there are too many differences. However, this kind of careful work can open your application to be used with a greater number of databases, and thus a greater number of clients.

History

14^th January, 2010: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)