Introduction
The .NET data providers offer a lot of information about the database, so I want to show some neat tricks and share a little code.
Background
I see a lot of code in multiple languages, in multiple projects, that use dynamic SQL, and a few problems routinely surface:
- Not using parameters, which provides better performance, and prevents SQL injection
- Not properly wrapping column and table name identifiers
- Hard-coding the database server. This can show up as hard-coding the provider, or relying on special features unique to that database server.
If you are lucky enough to be the designer of your database, you may simply avoid using anything that looks like it might be a SQL keyword, and hope that database upgrades or moves to a new database server won't break your code. This is so often the case, and so people are frightened of obvious keywords like user
, default
, table
, and column
.
In some environments, you are stuck making your "best guess". So, if you are coding against SQL Server or MySQL, and you choose to wrap your SQL, you end up wrapping your identifiers like this:
SELECT [user], [default] from [table]
The obvious problem with this is that this statement will not work if your database moves to a different server, say moving to Oracle or PostgreSQL. The first thought is to look at standards. The SQL92 standard, now 17 years old, says that the double quoting identifiers should actually work for all compliant databases.
SELECT "user", "default" from "table"
Unfortunately, SQL Server doesn't support this unless you set the Quoted Identifier option (on the connection, database, or server). Microsoft Access can't support quotes at all, and who can tell about other databases.
Often, for programmers of packaged software, these problems come down to a few choices:
- Hardcode the application for a single database server.
- Limit the number of databases supported, usually less than 3.
- Use a tool like (N)Hibernate which hides the SQL behind its own language, and thus require support of that tool for your database. (Hope you aren't using an old, brand new, or obscure database, such as TimesTen. You can find the list of supported databases by clicking here.)
- Do everything in Stored Procedures, which still limits the database options.
- Put all the SQL in some external resource file that can be changed per database.
These options have their own benefits and drawbacks. However, the .NET data providers can make some of these choices a little easier. So, just like the choice to use XHTML rather than HTML to make your web pages available to more browsers, if you wish to make your code open to more databases, you have to make some choices.
In terms of moving from one database to another, the most flexible is carefully coded dynamic SQL. This is not to say that there aren't problems with this approach, not the least of which is certain SQL changes require a code redeployment. It does give us the most options, if it is done carefully. I will be looking into some ways to minimize some of the problems with this approach in future articles. So, the rules for writing are fairly straightforward:
- Write your SQL as close to SQL92 standards, as possible.
- Avoid server specific features.
- When designing databases, tables, etc., assume the server is case insensitive, i.e., name things uniquely.
- When writing your SQL, assume the DB is case sensitive; this in conjunction with the previous rule avoiding weird issues.
- Don't hardcode to a single provider.
- Wrap all database, schema, table, and column names properly.
Many of these simply require discipline, but the last two can be hard. So, let's handle the first one. First, a hardcoded example:
string connectionString = ConfigurationManager.AppSettings.Get("DSN");
using(var conn = new System.Data.SqlClient.SqlConnection())
{
conn.ConnectionString = connectionString;
conn.Open();
using(var cmd = new System.Data.SqlClient.SqlCommand())
{
cmd.CommandText = "delete from table1";
cmd.CommandType = System.Data.CommandType.Text;
cmd.ExecuteNonQuery();
}
}
The first issue here is that we are reading the connection string from appSettings
rather than the connectionStrings
area. The biggest advantage of putting our connection string in the connectionStrings
area is that we now have a providerName
. By grabbing that provider name, we can create objects of the proper type.
ConnectionStringSettings css = ConfigurationManager.ConnectionStrings["DSN"];
string connectionString = css.ConnectionString;
string providerName = css.ProviderName;
var factory = System.Data.Common.DbProviderFactories.GetFactory(providerName);
using(var conn = factory.CreateConnection())
{
conn.ConnectionString = connectionString;
conn.Open();
using(var cmd = conn.CreateCommand())
{
cmd.CommandText = "delete from table1";
cmd.CommandType = System.Data.CommandType.Text;
cmd.ExecuteNonQuery();
}
}
By changing the code in this manner, we have eliminated probably the single largest problem with porting most code from one database to another: the hard-coding of the database provider.
Next, getting the wrapping characters properly for the data source. To do this, we need the CommandBuilder
object or the DataSourceInformation
table. Since the DataSourceInformation
table contains a regex rather than simple characters, it's better for validation than wrapping. So, let's get the CommandBuilder
, and wrap some SQL properly using it.
DbCommandBuilder commandBuilder = factory.CreateCommandBuilder();
string prefix = commandBuilder.QuotePrefix;
string suffix = commandBuilder.QuoteSuffix;
StringBuilder stringBuilder = new StringBuilder();
stringBuilder.Append("SELECT ");
stringBuilder.AppendFormat("{0}{1}{2}", prefix, "ID", suffix);
stringBuilder.AppendLine(",");
stringBuilder.AppendFormat("{0}{1}{2}", prefix, "user", suffix);
stringBuilder.AppendLine(",");
stringBuilder.AppendFormat("{0}{1}{2}", prefix, "password", suffix);
stringBuilder.AppendLine(",");
stringBuilder.AppendFormat("{0}{1}{2}", prefix, "default", suffix);
stringBuilder.AppendLine("");
stringBuilder.Append("FROM ");
stringBuilder.AppendFormat("{0}{1}{2}", prefix, "table", suffix);
Debug.WriteLine(stringBuilder.ToString());
The output of this when using the System.Data.SqlClient
provider to connect to SQL Server is:
SELECT [ID],
[user],
[password],
[default]
FROM [table]
When using Oracle.DataAccess.Client
or Npgsql
, it generates:
SELECT "ID",
"user",
"password",
"default"
FROM "table"
The major issue with this is when you are using the OLEDB or ODBC providers. Because the DbCommandBuilder
doesn't know which database you are connecting to, you have to query the provider. Coincidentally, the best place to get that information is from the same place we need to get the parameter capabilities from. So, let's go get that data.
The System.Data.Common.DbMetaDataCollectionNames
class provides a number of constant strings that we can pass to get DataTable
s that provide information.
using (var conn = factory.CreateConnection())
{
conn.ConnectionString = connectionString;
conn.Open();
DataTable dt = conn.GetSchema(
System.Data.Common.DbMetaDataCollectionNames.DataSourceInformation);
return dt;
}
Personally, I hate writing parsing code, and since I am planning on using this data, I decided a class would be much more helpful. I pass in the DataTable
we just retrieved, and the following class fills itself in, making the data easier to get to.
internal class DataSourceInformation
{
private static readonly Type _Type = typeof(DataSourceInformation);
private static readonly Type _IdentifierCaseType =
Enum.GetUnderlyingType(typeof(IdentifierCase));
private static readonly Type _GroupByBehaviorType =
Enum.GetUnderlyingType(typeof(GroupByBehavior));
private static readonly Type _SupportedJoinOperatorsType =
Enum.GetUnderlyingType(typeof(SupportedJoinOperators));
private readonly string _compositeIdentifierSeparatorPattern = string.Empty;
private readonly string _dataSourceProductName = string.Empty;
private readonly string _dataSourceProductVersion = string.Empty;
private readonly string _dataSourceProductVersionNormalized = string.Empty;
private readonly GroupByBehavior _groupByBehavior;
private readonly string _identifierPattern = string.Empty;
private readonly IdentifierCase _identifierCase;
private readonly bool _orderByColumnsInSelect = false;
private readonly string _parameterMarkerFormat = string.Empty;
private readonly string _parameterMarkerPattern = string.Empty;
private readonly Int32 _parameterNameMaxLength = 0;
private readonly string _parameterNamePattern = string.Empty;
private readonly string _quotedIdentifierPattern = string.Empty;
private readonly Regex _quotedIdentifierCase;
private readonly string _statementSeparatorPattern = string.Empty;
private readonly Regex _stringLiteralPattern;
private readonly SupportedJoinOperators _supportedJoinOperators;
public DataSourceInformation(DataTable dt)
{
foreach (DataRow r in dt.Rows)
{
foreach (DataColumn c in dt.Columns)
{
string s = c.ColumnName;
object o = r[c.ColumnName];
if (o == DBNull.Value)
{
o = null;
}
if (!string.IsNullOrEmpty(s) && o != null)
{
switch (s)
{
case "QuotedIdentifierCase":
_quotedIdentifierCase = new Regex(o.ToString());
break;
case "StringLiteralPattern":
_stringLiteralPattern = new Regex(o.ToString());
break;
case "GroupByBehavior":
o = Convert.ChangeType(o, _GroupByBehaviorType);
_groupByBehavior = (GroupByBehavior)o;
break;
case "IdentifierCase":
o = Convert.ChangeType(o, _IdentifierCaseType);
_identifierCase = (IdentifierCase)o;
break;
case "SupportedJoinOperators":
o = Convert.ChangeType(o, _SupportedJoinOperatorsType);
_supportedJoinOperators = (SupportedJoinOperators)o;
break;
default:
FieldInfo fi = _Type.GetField("_" + s,
BindingFlags.IgnoreCase | BindingFlags.NonPublic |
BindingFlags.Instance);
if (fi != null)
{
fi.SetValue(this, o);
}
break;
}
}
}
break;
}
}
public string CompositeIdentifierSeparatorPattern
{
get { return _compositeIdentifierSeparatorPattern; }
}
public string DataSourceProductName
{
get { return _dataSourceProductName; }
}
public string DataSourceProductVersion
{
get { return _dataSourceProductVersion; }
}
public string DataSourceProductVersionNormalized
{
get { return _dataSourceProductVersionNormalized; }
}
public GroupByBehavior GroupByBehavior
{
get { return _groupByBehavior; }
}
public string IdentifierPattern
{
get { return _identifierPattern; }
}
public IdentifierCase IdentifierCase
{
get { return _identifierCase; }
}
public bool OrderByColumnsInSelect
{
get { return _orderByColumnsInSelect; }
}
public string ParameterMarkerFormat
{
get { return _parameterMarkerFormat; }
}
public string ParameterMarkerPattern
{
get { return _parameterMarkerPattern; }
}
public int ParameterNameMaxLength
{
get { return _parameterNameMaxLength; }
}
public string ParameterNamePattern
{
get { return _parameterNamePattern; }
}
public string QuotedIdentifierPattern
{
get { return _quotedIdentifierPattern; }
}
public Regex QuotedIdentifierCase
{
get { return _quotedIdentifierCase; }
}
public string StatementSeparatorPattern
{
get { return _statementSeparatorPattern; }
}
public Regex StringLiteralPattern
{
get { return _stringLiteralPattern; }
}
public SupportedJoinOperators SupportedJoinOperators
{
get { return _supportedJoinOperators; }
}
}
Without going into too much of an explanation, this class takes the DataTable
and fills itself via Reflection. There are a few exceptions, which are shown in the switch
statement. If we fill this class and keep it around, we now have a good deal of help to build SQL for each individual database. This data includes, but is not limited to:
- the characters to wrap identifiers with
- if the database supports named parameters
- what the character is that denotes named parameters
- the maximum length of the name of a named parameter
Named parameters are wonderful. They enable you to write SQL like this:
DELETE FROM table1 WHERE ID=@ID and column2=@column2
which is much easier to deal with than:
DELETE FROM table1 WHERE ID=? and column2=?
For many people, this second one isn't clear as to what is going on at all. Named parameters obviously make things clearer, and you can reuse the values.
SELECT a.* FROM table1 a
INNER JOIN table2 b
ON a.ID = b.A_ID
WHERE a.NAME = @name
AND b.SOMEOTHERCOLUMN = @name
There are a few problems here, besides the fact that I didn't wrap any of my table or column names. By using named parameters, I have drastically reduced the number of databases my application can support. There are only three that I know of: MS SQL Server, Oracle, and PostgreSQL. I don't know if MySQL can; it may in the future, if it doesn't now. This means using named parameters is only a convenience. It also means we don't get to take advantage of being able to reuse parameters. Another thing to be aware of is that the lengths of names vary radically; SQL Server supports names of up to 128 characters, while Oracle only supports names of up to 15 characters. I will come back to this point in my next article.
Using the Code
At this point, we have all the bits in place, so let's look at how this code can all be put together into a a block that generates a statement that is likely to be able to be used in any database provider for .NET. In another article, I will talk about making this cleaner, and get into fixing the shortcoming with OLEDB and ODBC providers.
ConnectionStringSettings css =
ConfigurationManager.ConnectionStrings["DSN"];
string connectionString = css.ConnectionString;
string providerName = css.ProviderName;
DataSourceInformation dsi = null;
DbProviderFactory factory = DbProviderFactories.GetFactory(providerName);
using (var conn = factory.CreateConnection())
{
conn.ConnectionString = connectionString;
conn.Open();
dsi = new DataSourceInformation(conn.GetSchema(
DbMetaDataCollectionNames.DataSourceInformation));
}
DbCommandBuilder commandBuilder = factory.CreateCommandBuilder();
string prefix = commandBuilder.QuotePrefix;
string suffix = commandBuilder.QuoteSuffix;
var parameter = factory.CreateParameter();
parameter.DbType = DbType.Int32;
parameter.Value = 10;
parameter.ParameterName = "ID";
StringBuilder stringBuilder = new StringBuilder();
stringBuilder.Append("SELECT ");
stringBuilder.AppendFormat("{0}{1}{2}", prefix,
"ID", suffix);
stringBuilder.AppendLine(",");
stringBuilder.AppendFormat("{0}{1}{2}", prefix,
"user", suffix);
stringBuilder.AppendLine(",");
stringBuilder.AppendFormat("{0}{1}{2}", prefix,
"password", suffix);
stringBuilder.AppendLine(",");
stringBuilder.AppendFormat("{0}{1}{2}", prefix,
"default", suffix);
stringBuilder.AppendLine("");
stringBuilder.Append("FROM ");
stringBuilder.AppendFormat("{0}{1}{2}", prefix,
"table", suffix);
stringBuilder.AppendLine();
stringBuilder.Append("WHERE ID=");
if (dsi.ParameterNameMaxLength < 1)
{
stringBuilder.Append("?");
}
else
{
stringBuilder.AppendFormat("{0}{1}",
dsi.ParameterMarkerPattern[0], parameter.ParameterName);
}
Debug.WriteLine(stringBuilder.ToString());
using (DbConnection conn = factory.CreateConnection())
{
conn.ConnectionString = connectionString;
conn.Open();
using (DbCommand cmd = conn.CreateCommand())
{
cmd.CommandText = stringBuilder.ToString();
cmd.CommandType = CommandType.Text;
cmd.Parameters.Add(parameter);
using (DbDataReader dr = cmd.ExecuteReader())
{
while (dr.Read())
{
}
}
}
}
Notice I am using the default DbType
enumeration to specify the data type of the parameter; this is very important if you want to make your code more portable.
Points of Interest
This code now does exactly what it set out to do. It generates SQL in a way that:
- Can be used with all providers, even when there are keywords being used as table and column names.
- Uses parameters, making the execution of the SQL both safer and faster.
If you look into the other collections that are available, you can even find a table that maps the databases types to System.Type
and DbType
.
This type of coding prevents your code from being tied to a specific version of a driver, which is the side effect of directly referencing a provider's DLL.
No amount of careful coding can completely handle every database; there are too many differences. However, this kind of careful work can open your application to be used with a greater number of databases, and thus a greater number of clients.
History
- 14th January, 2010: Initial version