Background
I personally don't like to use databases directly, so I always use or create some kind of ORM. Another thing that I hate is being constrained by the database types, and that's why I usually avoid ready-made ORMs and create my own: to support user types (in .NET) without changing them (no need to implement a specific interface or base class).
This works great while the ORM does all the work, but what if the ORM is not capable of doing some kind of query? I end up falling back to plain ADO.NET calls and doing all the required conversions by hand. So, I decided it was time for something new.
If you prefer to understand this library by its utilization, you can go directly to the Basic Utilization topic.
What Does this Library Do?
Before explaining how to use this library, it is important to understand what it really does.
It:
- Uses readers as typed enumerators. Instead of moving to the next row and then reading data by index using GetInt32, GetString and so on, a full instance is filled. And, as the result is enumerable, all constructs that work over enumerables can be used, including foreach and in-memory LINQ methods.
- Can use typed objects to fill all the parameters of a query. That is, if your class has 3 properties, it will use those 3 properties, with the right types, to fill parameters with the same names.
- Is able to use user-defined .NET types without changes to the types themselves (that is, the type does not need to implement an interface or be based on a specific base class).
- Automatically caches recently prepared commands for faster reuse (which is configurable).
- Supports recursive transactions. That is, you can create a new transaction when one is already active without losing the all-or-nothing behavior of the outer transaction.
Note
I initially started writing this as documentation for the library but, as I always present things as articles, I adapted it a little to look like one. Unfortunately, at this moment, it is an article on how to use the library instead of how to create it. But I really think it is useful enough to be used as is, only adding your own mappers if required, so give it a try.
Why is This Library Better than Using ADO.NET Directly?
Maybe you are already thinking this is not really great. You can use DataTables, you know how to prepare commands, how to control transactions and, if performance is an issue, you can use a DataReader. Even if it is not really amazing, it is easy to fill an object doing something like:
SomeType record = new SomeType();
record.Id = reader.GetInt32(0);
record.Name = reader.GetString(1);
return record;
So, what is the advantage of having a library to automate this?
Well, there are many, but they may not be directly visible. Comparing a collection of typed objects to a DataTable: all the properties are accessible through IntelliSense; if we mistype something when accessing the object, we get a compile-time error instead of a run-time exception; we don't need to cast values that are seen as object to the right types, as they are already typed; and this also avoids boxing and consumes less memory than the DataTable.
So, it is obvious that this approach wins over DataTables. But as I already showed, it is not hard to fill an object using a DataReader... so, let's look at some "common practices" for filling objects from DataReaders to fully understand why you may want to use this library:
// 1. Column indexes + typed getters
List<SomeType> result = new List<SomeType>();
while(reader.Read())
{
    var record = new SomeType();
    record.Id = reader.GetInt32(0);
    record.Name = reader.GetString(1);
    result.Add(record);
}
return result;

// 2. Column indexes + Convert over the untyped indexer
List<SomeType> result = new List<SomeType>();
while(reader.Read())
{
    var record = new SomeType();
    record.Id = Convert.ToInt32(reader[0]);
    record.Name = Convert.ToString(reader[1]);
    result.Add(record);
}
return result;

// 3. Column names + typed getters
// (assumes name-based helpers or extension methods; IDataReader itself only exposes Get* by index)
List<SomeType> result = new List<SomeType>();
while(reader.Read())
{
    var record = new SomeType();
    record.Id = reader.GetInt32("Id");
    record.Name = reader.GetString("Name");
    result.Add(record);
}
return result;

// 4. Column names + Convert over the untyped indexer
List<SomeType> result = new List<SomeType>();
while(reader.Read())
{
    var record = new SomeType();
    record.Id = Convert.ToInt32(reader["Id"]);
    record.Name = Convert.ToString(reader["Name"]);
    result.Add(record);
}
return result;
Now, let's focus on the differences between these implementations. The first two use the column indexes while the last two use the column names. In many good-practice documents, the authors say to use the names, so the query can change to include or remove a column and we don't need to adjust all the indexes; we simply add or remove the equivalent field and that's enough. Such an advantage may not matter much with two fields, but getting the right indexes becomes much harder with 30 fields/columns.
But we should not stop there. Did you notice that the first and the third implementations use reader.GetInt32() and reader.GetString(), while the second and the fourth read the value through the untyped indexer and then call Convert.ToInt32() and Convert.ToString()? Are the ones that use Convert simply worse implementations?
The answer is no. Some databases don't keep the exact column types. It is possible that you write an int but, during reads, it comes back as a decimal and GetInt32() may throw an exception. For example, I was surprised that SQL Server accepts a char as a parameter to be stored in an NCHAR(1) column, but it throws an exception if I use the GetChar() method to read such a value, forcing me to use GetString()[0] or to use the indexer with the Convert solution to get the same result. Using the indexer, it is guaranteed that the value will be read, independent of the format used by the database, and then a conversion gives the result in the right type. So it is better prepared for different results from different databases.
So, is reading the values as object through the indexer and then using Convert better? Well, for such a situation it is, but in most cases it is simply slower. Before finding a real solution, let's return to the name-or-index question. We should remember that reading a column value by its name requires a "search" for that column. Doing that for a few rows (say 100) is not a problem. But what about 10 million?
So, a better solution, that is a mix of column names and column indexes, could be one of these:
// Typed getters, with the indexes resolved by name only once
List<SomeType> result = new List<SomeType>();
int idIndex = reader.GetOrdinal("Id");
int nameIndex = reader.GetOrdinal("Name");
while(reader.Read())
{
    var record = new SomeType();
    record.Id = reader.GetInt32(idIndex);
    record.Name = reader.GetString(nameIndex);
    result.Add(record);
}
return result;

// Convert over the untyped indexer, with the indexes resolved by name only once
List<SomeType> result = new List<SomeType>();
int idIndex = reader.GetOrdinal("Id");
int nameIndex = reader.GetOrdinal("Name");
while(reader.Read())
{
    var record = new SomeType();
    record.Id = Convert.ToInt32(reader[idIndex]);
    record.Name = Convert.ToString(reader[nameIndex]);
    result.Add(record);
}
return result;
In those examples, we read the indexes of the columns only once and then keep using them. But, as you can imagine, if we have more columns (like 30), we add 30 lines to get the indexes and still need 30 lines to do the actual reads using those indexes. It starts to get ugly, don't you think? And we still have to choose between a version that is slower but safer if the database types are different, and a version that is faster but less resistant to data-type changes.
I will continue with a few more problems before showing the solution. One of them is that, until now, I used a list to hold all the results. I did it that way because that's something I see a lot in systems that return objects to avoid handing the DataReader to the user. The problems with returning a list are:
- If we are doing a very large import, where we only need to deal with one record at a time, we end up having all the records in memory... or causing OutOfMemoryExceptions, as the list is in memory from start to end;
- If we break after using a small part of the records (like 10%), we still wasted time reading them all and putting them into memory. Think about the FirstOrDefault() method from LINQ: it will only read one record from the list, but the method that returns the list already wasted time reading all the records.
So, a better solution is to create an enumerator. Thanks to yield return, that can be pretty simple:
IEnumerable<SomeType> EnumerateRecords(IDataReader reader)
{
    int idIndex = reader.GetOrdinal("Id");
    int nameIndex = reader.GetOrdinal("Name");
    while(reader.Read())
    {
        var item = new SomeType();
        item.Id = Convert.ToInt32(reader[idIndex]);
        item.Name = Convert.ToString(reader[nameIndex]);
        yield return item;
    }
}
Using yield return, we have even less code. And, if someone wants the result as a list, an array or similar, it is always possible to call methods like ToList() on such an enumerable and get the list. So, if it is so simple, why did I start by showing the wrong code?
The answer is: many programmers aren't used to yield return yet and keep generating lists. When a library already does things the right way, even if it has a small overhead, you usually benefit from having code that performs almost as fast as the ideal code all the time. This is better than having code that performs best sometimes... but performs extremely badly other times. And that's what usually happens when users need to write the object-filling code every time.
But we are not done with the problems yet. I mentioned the SQL Server issue in which I can't read a char as a char, forcing me to read it as a string and then take the first character. In that situation, I can at least still fill the parameter with a char value. But what about a database that does not support some data type at all? This is especially common for the bool type.
So, instead of writing something like:
// writing a parameter and reading a value when the database supports booleans
parameter.Value = item.IsActive;
item.IsActive = reader.GetBoolean(columnIndexOrName);
We may have to write something like:
// the same write and read when booleans must be stored as integers
if (item.IsActive)
    parameter.Value = 1;
else
    parameter.Value = 0;
item.IsActive = reader.GetInt32(columnIndexOrName) != 0;
But this is not a good solution, as we may use two databases, one that supports booleans and another that does not, and we want to keep using real booleans on the one that has them. So, we may end up creating helper methods (or extension methods) like these:
public static void SmartSetBoolean(this IDbDataParameter parameter, bool value)
{
    // someInfoFromTheActualConnection is pseudo-code for "does the current
    // connection's database lack real boolean support?"
    if (someInfoFromTheActualConnection)
    {
        if (value)
            parameter.Value = 1;
        else
            parameter.Value = 0;
    }
    else
    {
        parameter.Value = value;
    }
}
public static bool SmartGetBoolean(this IDataReader reader, int columnIndex)
{
    if (someInfoFromTheActualConnection)
        return reader.GetInt32(columnIndex) != 0;

    return reader.GetBoolean(columnIndex);
}
And then, at each place where we write a boolean, we need to do something like this:
parameter.SmartSetBoolean(item.IsActive);
And at each place we read a boolean, we need to do something like this:
item.IsActive = reader.SmartGetBoolean(columnIndex);
But if you didn't plan to support other databases from the beginning, you may already be using the wrong read/write methods, so if there are 300 tables, it can become a very problematic change. If you used a solution that was prepared to deal with that kind of problem from the beginning, even if you didn't need to support many databases at the time, you will avoid problems when such a change becomes a must.
Also, doing the test at each read reduces speed again. The more tests and the more databases supported, the slower it gets.
So, is there a solution?
Yes. The same way we can get the column indexes only once before entering the while loop and then keep using them, we can determine whether the database uses real booleans only once and store a delegate that does the read without any extra validation. But doing this by hand makes the code even uglier (and more error prone).
That's why you may want to use this library. When opening a connection, it identifies the ADO.NET driver and creates the right "mappings" for that connection. That is, if it knows that booleans are not supported, it already maps booleans to something else, so when a read is done, it does not need to "check again" how to read a boolean. A delegate is already there to do the right read. So, instead of a test per column or a test per execute, there is only one test when the connection is opened.
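To make the idea concrete, here is a minimal sketch of the pattern (plain ADO.NET code, not the library's internals): the boolean-reading strategy is chosen once, when the connection is opened, and the resulting delegate is reused for every row without any further checks.
// Minimal sketch: resolve the read strategy once, reuse the delegate per row.
using System;
using System.Data;

static class BooleanReadStrategy
{
    // Resolved once per connection/driver.
    public static Func<IDataReader, int, bool> Create(bool databaseSupportsBooleans)
    {
        if (databaseSupportsBooleans)
            return (reader, index) => reader.GetBoolean(index);

        // The database stores booleans as integers (0/1).
        return (reader, index) => reader.GetInt32(index) != 0;
    }
}

// Usage: the test happens once, outside the loop.
// var readBoolean = BooleanReadStrategy.Create(databaseSupportsBooleans);
// while (reader.Read())
//     item.IsActive = readBoolean(reader, isActiveIndex);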
This initial mapping already covers other situations, like an int that comes back as a decimal: it registers a mapping to read ints as decimals (using the GetDecimal() method, avoiding boxing) and does a typed conversion (a conversion cast, not an unboxing cast). This is faster than reading values as object and then using the Convert class, and it also supports conversions the Convert class can't do (converting a boolean to the chars '0' and '1', and those chars back to a boolean, for example).
Then, when preparing commands, it gets the index of each column by name and builds a delegate that reads the column by index and sets the property using the right types (again, avoiding boxing). Setting the values does not use reflection; reflection is used only to generate the delegate, which then runs at full speed, so whether you read one row or one million rows, the column-to-property mapping is done only once.
And the mapping idea doesn't end there: you can create "mappers" for types that aren't supported by any database. That is, you can map a Color to an int and a Point to a long and, if there is another mapping (from int or long to decimal), it's not your problem, it will be applied. That means you are not constrained to the types supported by all your databases. You can use all the types you want, as long as they are convertible to another type (in the worst case, almost any type is convertible to string, but that can cause some problems that are discussed in the Limits topic), and you only need to register the conversion once. An extra advantage is that the type itself does not need to implement an interface or a base class, so you can convert third-party types that you are using in your objects.
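As an example of the kind of conversion involved (a sketch only, not the library's mapper API), a Point can be packed into a long. System.Drawing.Point is used here because its X and Y are 32-bit ints, so both fit in a single 64-bit value.
// Sketch of a Point <-> long conversion of the kind you could register as a user-type mapping.
using System.Drawing;

static class PointPacking
{
    public static long ToInt64(Point point)
    {
        // High 32 bits hold X, low 32 bits hold Y.
        return ((long)point.X << 32) | (uint)point.Y;
    }

    public static Point FromInt64(long value)
    {
        int x = (int)(value >> 32);
        int y = (int)(value & 0xFFFFFFFF);
        return new Point(x, y);
    }
}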
At this moment, I hope you like the idea. I compared performance with Dapper and this code performs better (even though I have more virtual calls, I avoid boxing, which Dapper doesn't). I mention that only in case you really care about performance: Dapper will not solve the problem of types that are incompatible with the database, but this library is surely bigger than Dapper. So, it's your choice.
Transactions
Another thing I consider very problematic is how transactions work in ADO.NET.
In theory, we can create three transactions in parallel on the same connection and each command may be bound to a different transaction. But most databases don't support more than one transaction per connection, so it is normal to use a secondary connection if a parallel transaction is really needed. The problem, then, is that some databases will not raise an error if you create a transaction and your command does not use it. In fact, this is one of the worst kinds of error: you create the transaction, execute the commands, commit and, when you check, the data is there. The problem only shows up when there is an exception, in which case you roll back, but the executed commands are not rolled back, as they were never using the transaction.
And there are plenty of other problems. For example, you create a method like InsertRecord. It starts and uses a transaction correctly. But then you want to import 1000 records, so at each call to InsertRecord, you should not allow a new transaction to be created (as this will throw an exception on some databases, or create a completely independent transaction on others).
So a method that initially created its own transaction may need to be changed to receive an external one. This is already a problematic situation when the method can either receive a transaction or create its own, as people may forget to pass the transaction. Also, writing transaction-using methods this way greatly increases the amount of code required.
What do you think about doing this?
void Insert1000Records(DatabaseConnection connection)
{
    using(var transaction = connection.StartTransaction())
    {
        for(int i=0; i<1000; i++)
            InsertRecord(connection, i);

        transaction.Commit();
    }
}
void InsertRecord(DatabaseConnection connection, int recordNumber)
{
    using(var transaction = connection.StartTransaction())
    {
        var command = connection.CreateCommand();
        var parameter = command.Parameters.Add("Number", recordNumber);
        command.CommandText = "INSERT INTO Record (Number) VALUES (" + parameter.PlaceHolder + ")";

        using(var preparedCommand = command.PrepareNonQuery())
            preparedCommand.Execute();

        Log("InsertRecord " + recordNumber);
        transaction.Commit();
    }
}
To understand this pseudo-code: a transaction is started and 1000 calls to InsertRecord are made. But the InsertRecord method itself starts a transaction again, inserts the record and writes a log. What should happen if an exception is thrown while inserting record 500? Are the first records already committed or not? After all, there is a transaction.Commit() inside the InsertRecord method.
Well, what really happens in this library is that inner transactions don't actually commit. Their call to Commit() only tells the outer transaction that there was no problem in that inner transaction. If an inner transaction does not commit, then the outer transaction can't be committed: trying to commit an outer transaction when an inner one did not commit throws an exception.
So, any time you need a transaction, simply call StartTransaction(). In fact, it does not create a new object every time; it only increments/decrements a counter for the inner transactions. With this, you are free to create a transaction every time you have two or more actions that should be transactional, even if you don't know whether those actions will be part of another transaction. Also, you never need to set the transaction on the command: if you started a transaction for the connection, all the executes will use it. Much simpler, isn't it?
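A minimal sketch of the counting idea follows (this is not the library's implementation, only an illustration of the behavior just described; the real library even avoids allocating a new object per inner call):
// Illustrative sketch of reference-counted ("recursive") transactions over ADO.NET.
// Only the outermost StartTransaction begins a real transaction; inner calls just
// change a counter, and the real commit/rollback happens when the outermost scope ends.
using System;
using System.Data;

sealed class TransactionManager
{
    readonly IDbConnection _connection;
    IDbTransaction _real;
    int _depth;
    bool _failed;

    public TransactionManager(IDbConnection connection)
    {
        _connection = connection;
    }

    public Scope StartTransaction()
    {
        if (_depth++ == 0)
        {
            _real = _connection.BeginTransaction();
            _failed = false;
        }
        return new Scope(this);
    }

    void End(bool committed)
    {
        if (!committed)
            _failed = true;

        if (--_depth == 0)
        {
            if (_failed) _real.Rollback();
            else _real.Commit();
            _real.Dispose();
            _real = null;
        }
    }

    public sealed class Scope : IDisposable
    {
        readonly TransactionManager _owner;
        bool _committed;

        internal Scope(TransactionManager owner)
        {
            _owner = owner;
        }

        public void Commit()
        {
            // Committing the outer scope after an inner scope failed is an error.
            if (_owner._failed)
                throw new InvalidOperationException("An inner transaction did not commit.");
            _committed = true;
        }

        public void Dispose()
        {
            _owner.End(_committed);
        }
    }
}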
Drivers, Factories, Connections and TypeMappings
ADO.NET drivers are not really standardized. Some write parameter placeholders as @ParameterName, others as :ParameterName. The ParameterName itself is also a problem, as some drivers expect the @ or : to be part of the parameter name while others do not.
This library does not parse or rewrite queries, but it tries to give a simple way to get the right name for the placeholders and for the parameters. When you create a Parameter, you must give the parameter name without any placeholder information. That is, the ParameterName is ParameterName, not @ParameterName. The driver automatically corrects the name of the IDbDataParameter to include any modification needed. But as you still need to build the query text, you can get the PlaceHolder as a property of the created parameter. So, whether the placeholder is @ParameterName, :ParameterName or even {ParameterName}, you can get it from the created parameter, as long as the driver provides it.
Drivers? What Drivers?
The drivers I am talking about are not the ADO.NET drivers; they are drivers made for this library. In fact, you can create your own drivers and register them directly in the application, and there is the option to use "default" drivers (that is, only the basic mappings, with parameter names and placeholders using @, as that's the most common approach). If you need or want to implement your own driver, implement the IDatabaseDriver interface. In it, you must provide methods to open connections from a connection string, to create the right IDbDataParameter, and methods that say how an unmodified name (like ParameterName) is adapted to become the IDbDataParameter name and how the placeholder is created.
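To give an idea of the shape of such a driver, here is a hypothetical sketch based only on the responsibilities listed above; the member names and signatures are my own illustrations, not the library's actual IDatabaseDriver interface.
// Hypothetical sketch of the responsibilities a driver has to cover. Member names and
// signatures are assumptions made for illustration; check the real interface in the source.
using System.Data;

interface IDatabaseDriverSketch
{
    // Opens the underlying ADO.NET connection from a connection string.
    IDbConnection OpenConnection(string connectionString);

    // Creates an ADO.NET parameter appropriate for this driver.
    IDbDataParameter CreateParameter();

    // How a plain name like "ParameterName" becomes the ADO.NET parameter name
    // (for example "@ParameterName" or just "ParameterName").
    string AdaptParameterName(string name);

    // How a plain name becomes the placeholder used inside the SQL text
    // (for example "@ParameterName" or ":ParameterName").
    string GetPlaceHolder(string name);
}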
Factory
A factory is bound to a driver and can be seen as the user view of that driver. By default, a factory of a given name uses a driver with the same name. The driver is "low level" and tightly coupled to ADO.NET itself; the factory is not. The factory is the ADO+.NET abstraction that lets you open typed database connections and holds factory-level configuration. The most important factory-level configuration is the type-mappings: here is where you say how a conversion to int, boolean or any other type is done for all the connections created by this factory. The driver itself already registers the basic conversions (and all the drivers I am providing support the same database primitive types), but it is the user who registers any conversion that is pertinent to the application. And this is why different databases can have different conversions: you may register a conversion from boolean to int for a Firebird factory, but keep a boolean as a boolean for a SQL Server factory.
Connection
An ADO+.NET connection is very similar to an ADO.NET connection (and in fact uses one internally). The difference is that the commands created by this connection are typed commands (effectively, you can foreach over them instead of calling ExecuteReader() and then dealing with the reader). On the connection itself, you can also further configure the type-mappings. It is done this way so you can have two databases of the same driver (for example, Firebird) that use different data-type conversions. This can be very useful when importing data from an old schema to a new one: maybe a boolean used to be an int and now it is a char. Who knows your specific needs?
Type Mappings
As you may have noticed, some type-mappings are already provided by the default drivers and some must be done by you.
All type-mappings are done by classes dedicated to a single data type. There are events where you can create type-mappings on the fly if you have code capable of generating many conversions (this is used for enums and nullable types, for example), but a type-mapping instance is always bound to a single conversion. It is this way for performance and simplicity.
There are two main classes that you may need to use to create your own type-mappings. The generic DataTypeMapper class is used if you are creating a type-mapper that knows how to get and set values directly from the data readers and db parameters; this is the one used by the drivers for the types directly supported by the database. But it is more likely that users will want to use the UserTypeMapper class. With it, you tell how to convert a .NET type that is not supported by the database into a .NET type that is already supported. For example, converting a Point to a long, when the database supports long but obviously does not support Point.
Here, we are not talking about .NET primitives or the types that all databases really support. We are talking about the types that are expected to be supported by all the ADO+.NET drivers (be they the ones coming with the article or the ones you decide to create). They are primitives in the sense that, by supporting all of them, any user mapping is expected to work correctly, as user mappings are usually made over one of these primitives. So, those primitives are:
- bool
- byte
- byte[]
- char
- Date
- DateTime
- decimal
- double
- Guid
- ImmutableArray<byte>
- Int16 (short)
- Int32 (int)
- Int64 (long)
- string
- Time
- Any nullable of those types (that is: bool?, byte?, int?, etc.)
- Any enum that has an underlying type of int
User Types
I have talked about user-types more than once. By user-types, I mean any type that is not expected to be supported by the database. A struct like the WPF Color is not created by the user of the application, but it is a user-type in the sense that it is not a database-ready one. As a "user type", you can create a conversion (like from Color to int) and then use Color as parameter values and when enumerating your queries.
Specialization
Generally, when reading data, we should not care about specific constraints (like whether a column is a VARCHAR(255) or a VARCHAR(100)), but some databases require such specialization when preparing parameters.
This library does not allow you to set the Size, Precision or Scale of a typed parameter. It is the responsibility of the type-mapper to fill those when initializing a parameter. So, if you need both VARCHAR(255) and VARCHAR(100) and want the drivers to use the specific size when preparing parameters, it is recommended that you choose one of them as the plain string and create a user-type (like a String100) that uses VARCHAR(100).
At first, you may consider this annoying, especially if you simply want to use this library to access old databases that use different sizes for all their strings. But this was done for simplicity: instead of looking at a type and then at the property and possible custom attributes, only the type is used to decide everything. And when combined with a real ORM and editors that work the same way, you may find this very good for keeping a strong pattern.
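For example, a hypothetical String100 user-type can be as simple as a small wrapper struct; the VARCHAR(100) specifics then live only in the type-mapping you register for it. The struct below is just a sketch, not a type shipped with the library.
// Sketch of a "specialized string" user-type. The struct itself knows nothing about
// the database; a user-type mapping registered for String100 would be the place to
// configure the parameter as VARCHAR(100) / Size = 100.
public struct String100
{
    private readonly string _value;

    public String100(string value)
    {
        if (value != null && value.Length > 100)
            throw new System.ArgumentException("Value cannot exceed 100 characters.");

        _value = value;
    }

    public string Value
    {
        get { return _value; }
    }

    public static implicit operator string(String100 value)
    {
        return value._value;
    }

    public static explicit operator String100(string value)
    {
        return new String100(value);
    }
}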
Full ORM
This library is not a full ORM, on purpose. That is, it does the mappings, so you can consider it an ORM, but it does not build the queries for you. Its purpose is to make it simpler to fill parameters and read data into typed objects, which are faster, consume less memory and are easier to use than DataTables.
Full ORMs can be built using this library (I am already doing this), but the problem with full ORMs is that they change a lot how you interact with the database and often have limited support for joins, aggregate values or even stored procedure calls. With this library, you are free to write any SQL you want, while having less trouble dealing with data-type conversions, transactions, parameter naming and command preparing/reuse.
Limits
This library solves many problems related to data-type conversions, yet it has limits... or rather, you must take care with the conversions you do.
When you register your own data-type mapper, especially when you redirect an unsupported type to a supported one, you must remember that reading and writing a field value is not the only thing you do with fields. You usually compare them, you can use them in ORDER BY clauses and you can use them to do math directly in the database.
For example, it is really easy to create a data-type mapper that stores int values as strings. Considering you are storing 2 as "2" and 10 as "10", you will:
- Cause ORDER BY or greater/less-than comparisons to work incorrectly (to a string, the order is "1", "10", "2", the same way it is "b", "ba", "c"), which is wrong for int values (see the snippet after this list);
- Prevent arithmetic operations. That is, normally you can do IntField1 + IntField2 in the SQL but, as those fields will in fact be strings, you can't do that.
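The following snippet demonstrates the ordering point from the first item. It is plain .NET code, simply to show the comparison behavior the database would also apply to string columns.
// The same values sort differently as numbers and as strings.
using System;

static class OrderingDemo
{
    static void Main()
    {
        int[] numbers = { 2, 10, 1 };
        string[] texts = { "2", "10", "1" };

        Array.Sort(numbers);                        // 1, 2, 10
        Array.Sort(texts, StringComparer.Ordinal);  // "1", "10", "2"

        Console.WriteLine(string.Join(", ", numbers));
        Console.WriteLine(string.Join(", ", texts));
    }
}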
So, we can say there is a limit. ADO+.NET will not guarantee that your conversions work but, if you simply do a bad conversion, you will probably notice it immediately. If your conversion works but loses some traits (like having the wrong order), then, well, you may only notice that something is wrong after a long time.
Is this a bug?
Surely not. ADO+.NET is already allowing you to use otherwise unsupported types. But it is up to you to use conversions that keep the same traits or, if that's not possible, to avoid doing the affected operations (like arithmetic and ORDER BY) in the database, doing them only in the .NET code if necessary.
For example, it is possible to store a Date, a Time or a DateTime as a string. I will pick a Date only to show how a conversion can be functional but with the wrong ORDER BY behavior, and how it can be functional with the good behavior.
Imagine that when you convert a Date to a string, you can store it in one of the following formats: MM-dd-yyyy or yyyy-MM-dd. I am keeping the - to make the dates easier to read; in fact you can eliminate it to use less space, but that does not change the behavior I want to show.
Independent of which format you choose, as long as it is consistent (that is, you save in that format and read in that format), inserting data, updating data and even searching data by equality will work. But ORDER BY is broken in the first format. Let's look at three dates, ordered the way the database will sort the strings:
01-01-2000, then 11-30-1998, then 12-31-1999
or
1998-11-30, 1999-12-31 then 2000-01-01
In the first case, the month becomes more important than the year for sorting. By simply changing the format, we solve that. Yet, storing a date as a string prevents any math from being done. So, if you want to support adding a Date and a Time to generate a DateTime, this may not be the right format. In fact, do all databases support doing Date + Time and getting a DateTime? Is this something that you need to do at the database level?
If your answers are no and then yes, you may consider storing DateTimes as Int64 (or some numeric type with a high enough precision). The format could be yyyyMMddHHmmss. Then, you would store a Date as yyyyMMdd000000 and a Time as HHmmss. In this case, if you simply add the numeric values of a Date and a Time, you end up with the numeric value of a DateTime. This is still not perfect: if you want to support adding and subtracting TimeSpans, you may add one month to December and end up in month 13. If you want to handle that too, you should probably store the Ticks of a DateTime or TimeSpan. In that case, all the arithmetic operations will work fine and you will have the full DateTime precision but, if you look at the values in the database, you will not see a comprehensible format. It all depends on how much you want to use the database directly and how much you prefer to simply read the data and do all the operations in .NET.
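A sketch of the yyyyMMddHHmmss packing just described follows. This is plain .NET code, not part of the library; it handles full DateTime values at second precision only and, as said above, Ticks would be the choice for full precision and safe arithmetic.
// Packs a DateTime into an Int64 in the yyyyMMddHHmmss layout and back.
// Note how the date part always ends in 000000 and a time-only value is just HHmmss,
// which is why adding the two numeric values produces a valid DateTime value.
using System;

static class DateTimePacking
{
    public static long ToInt64(DateTime value)
    {
        return value.Year   * 10000000000L
             + value.Month  * 100000000L
             + value.Day    * 1000000L
             + value.Hour   * 10000L
             + value.Minute * 100L
             + value.Second;
    }

    public static DateTime FromInt64(long value)
    {
        int second = (int)(value % 100); value /= 100;
        int minute = (int)(value % 100); value /= 100;
        int hour   = (int)(value % 100); value /= 100;
        int day    = (int)(value % 100); value /= 100;
        int month  = (int)(value % 100); value /= 100;
        int year   = (int)value;
        return new DateTime(year, month, day, hour, minute, second);
    }
}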
The Sample
The main sample is far from being a usable application. It is simply an application that creates its own database (you should have SQL Server CE installed, or you should edit the configuration file to use a real SQL Server instance or Firebird), creates a table using all the database primitives plus the Point type, inserts two records and reads those two records back to verify that all the conversions worked fine.
It can be useful if you create your own driver (for example, for Oracle) to test whether all the primitives are working. Also, in the sample you can see how to do the queries, how to fill the parameters and even how to register your own type-mapping (in this case, used to convert a Point to a long).
I am of course lacking better examples, and that's something I plan to add with time. So, I hope you don't use the actual sample as a reason not to believe in the library.
The second sample is even less usable. It uses a "fake" database (that is, I implement IDbCommand, IDataReader and similar interfaces) only to return records in the fastest possible manner. Then, I read those records using Dapper and using this library. As there is no real read, the measured time simply shows the overhead of the virtual calls used by this library versus the overhead of virtual calls plus boxing done by Dapper. The speed difference between the two will be smaller in real situations, as the time spent on the virtual calls is smaller than the time spent doing real queries.
Basic Utilization
using(var connection = DatabaseConnection.OpenFromConfig("connectionStringName"))
{
var command = connection.CreateCommand();
command.CommandText = "INSERT INTO Table(Id, Name) VALUES (@Id, @Name)";
command.Parameters.Add("Id", 1);
command.Parameters.Add("Name", "Test");
using(var preparedCommand = command.PrepareNonQuery())
preparedCommand.Execute();
}
This very basic example simply opens a connection using a connection string found in the application configuration file.
If no parameter is given to the OpenFromConfig() call, the name "Default" is used to search for a connection string.
With the connection in hand, a command is created. It is not an error that I am not disposing that command: the command itself is not disposable and is not yet linked to an ADO.NET command. On this command, we can set the SQL and add parameters. Then, we prepare the command. It is at this moment that an ADO.NET command is created, so the prepared command is the one that must be disposed.
As you can see, I used PrepareNonQuery(). A little different from ADO.NET, where you create a command, optionally prepare it and then call any of the different Execute() methods, in ADO+.NET we decide which kind of call we will be doing when we prepare the command. In this case, it is an insert, so we know there will be no result (hence the NonQuery version).
Finally, I execute the command. In this sample, I didn't use the result of the execute but, as with normal ADO.NET ExecuteNonQuery() calls, an int is returned telling how many records were affected.
This example works and it is probable that you will use something like this when doing your first tests. Yet, it is far from using the potential of the ADO+.NET library. So, let's see what we could do to make it better.
Parameter Placeholders
I used @Id and @Name as the parameter placeholders. This works with most ADO.NET drivers, yet there are some drivers that use : instead of @.
With those drivers, I should use :Id and :Name. So, how can we rewrite the CommandText to make it work independently of the database?
Let's see a corrected version of the previous example:
using(var connection = DatabaseConnection.OpenFromConfig("connectionStringName"))
{
    var command = connection.CreateCommand();
    var idParameter = command.Parameters.Add("Id", 1);
    var nameParameter = command.Parameters.Add("Name", "Test");
    command.CommandText =
        "INSERT INTO Table(Id, Name) VALUES (" + idParameter.PlaceHolder + ", " + nameParameter.PlaceHolder + ")";

    using(var preparedCommand = command.PrepareNonQuery())
        preparedCommand.Execute();
}
In this case, the parameters were created before filling the SQL; then idParameter.PlaceHolder and nameParameter.PlaceHolder were used to build the SQL instead of writing @Id and @Name directly.
Note 1: It is the PlaceHolder that is added to the SQL string, not the parameter.Value. There is no risk of SQL injection, as the data is still passed as a normal parameter. In fact, with the normal drivers, the SQL will still contain @Id and @Name but, for a driver that uses :, it will be :Id and :Name.
Note 2: It is not an error to concatenate strings with the + operator if you do all the operations in the same C# statement; the compiler recognizes that and uses the string.Concat() method. A StringBuilder is better when there are really many Append() calls done as separate statements. I also can't say that string.Format() is a better solution, as it needs to parse the format string and it requires the developer to count the parameters. Maybe in an insert (which already separates columns and values) there is not much difference but, in an update, writing "someColumn=" + someColumnParameter.PlaceHolder is definitely easier than writing "someColumn={23}" and then putting someColumnParameter.PlaceHolder at the right index.
Creating Parameters and Filling Them by Using Object Properties
As an alternative to getting the PlaceHolders from the already created parameters, it is possible to fill all the parameters using the public properties of a given object.
using(var connection = DatabaseConnection.OpenFromConfig("connectionStringName"))
{
    var command = connection.CreateCommand();
    command.CommandText =
        "INSERT INTO Table(A, B, C, D, E, F, G, H, I, J) VALUES (@A, @B, @C, @D, @E, @F, @G, @H, @I, @J)";
    command.Parameters.CreateFromTypeProperties(typeof(TypeWithManyProperties));

    using(var preparedCommand = command.PrepareNonQuery())
    {
        preparedCommand.Parameters.FillFromObject(objectWithManyProperties);
        preparedCommand.Execute();
    }
}
Well, the SQL is obviously not a real one; I will never create columns named A, B, C and so on. It is only an example to show that there are many properties. In this case, instead of creating many parameters by hand, I create all the parameters at once with a call to Parameters.CreateFromTypeProperties.
Maybe you find it a little strange that, in this situation, you must first create the parameters and then fill them when the command is prepared. But in fact, the strange one is the version that lets you create a parameter already filling its value: to prepare the query, only the parameter names and their types matter. The values are not important.
When the command is prepared, its Parameters are read-only. You can't add parameters to the preparedCommand anymore, and adding a new parameter to the command does not affect it. You can't remove parameters either. But you can fill their values. If you want to add many records, you only create the parameters once (on the command) and then you only need to fill them with new values (on the preparedCommand) and call Execute() many times.
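For example, to insert many objects, the parameters are created once and only the values change between executes. This is a sketch following the same API already shown above; manyObjectsWithManyProperties is just a placeholder collection.
// Inserting many records by reusing the same prepared command: parameters are created
// once on the command, then only their values are refilled before each Execute().
using(var connection = DatabaseConnection.OpenFromConfig("connectionStringName"))
{
    var command = connection.CreateCommand();
    command.CommandText =
        "INSERT INTO Table(A, B, C, D, E, F, G, H, I, J) VALUES (@A, @B, @C, @D, @E, @F, @G, @H, @I, @J)";
    command.Parameters.CreateFromTypeProperties(typeof(TypeWithManyProperties));

    using(var preparedCommand = command.PrepareNonQuery())
    {
        foreach(var item in manyObjectsWithManyProperties)
        {
            preparedCommand.Parameters.FillFromObject(item);
            preparedCommand.Execute();
        }
    }
}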
In this situation, I still wrote the parameter placeholders by hand. That's not really how you are expected to work: if you need to get the placeholders from the parameters anyway, it is usually more practical to simply create all the parameters by hand, and that's our next example. I only wanted to show that this option exists (it may be useful if you are migrating an ADO.NET application to ADO+.NET and, at first, don't want to rewrite all the SQL with fixed parameter names but still want to remove unnecessary and repetitive code).
Creating Parameters by Hand and Filling Values by Properties
using(var connection = DatabaseConnection.OpenFromConfig("connectionStringName"))
{
    var command = connection.CreateCommand();
    var idParameter = command.Parameters.Add<int>("Id");
    var nameParameter = command.Parameters.Add<string>("Name");
    command.CommandText =
        "INSERT INTO Table(Id, Name) VALUES (" + idParameter.PlaceHolder + ", " + nameParameter.PlaceHolder + ")";

    using(var preparedCommand = command.PrepareNonQuery())
    {
        preparedCommand.Parameters.FillFromObject(someObjectWithIdAndName);
        preparedCommand.Execute();
    }
}
In this situation, we have a mix of the previous two: we still create the parameters by hand, giving them the right names and types, but we no longer use the parameter objects to fill the values; we simply fill all the values from an existing object.
In fact, I think that if you create all the parameters by hand, you should also fill them by hand, as that's the fastest solution. But maybe you already have code that generates the SQL correctly for you. So, if you only want to fill all the parameters at once with an object, use the FillFromObject() method and all the work is done for you.
Reading Data
The previous examples focused on filling parameters, but you will probably find reading data much more useful. So, see this example:
using(var connection = DatabaseConnection.OpenFromConfig("connectionStringName"))
{
    var command = connection.CreateCommand();
    command.CommandText = "SELECT Id, Name FROM Table";

    using(var preparedCommand = command.Prepare<Record>())
    {
        foreach(Record record in preparedCommand)
        {
            // record.Id and record.Name are already filled and correctly typed here.
        }
    }
}
Enumerating a Single Column
The Prepare<T>() method expects to fill instances of type T. That is, each column fills a property of that type.
But what if you only want to select a single column? It is a little strange (and a performance killer) to create an instance just to fill a single property; it is better to return the column value directly.
That's what the PrepareSingleColumn<TSingleColumn>() method does. The following example shows how you can get all the Ids of a table:
using(var connection = DatabaseConnection.OpenFromConfig("connectionStringName"))
{
    var command = connection.CreateCommand();
    command.CommandText = "SELECT Id FROM Table";

    using(var preparedCommand = command.PrepareSingleColumn<int>())
    {
        foreach(int id in preparedCommand)
        {
            // Each id is read directly as an int; no intermediate object is created.
        }
    }
}
ToArray() and ToAddCollectionSlim()
By default, ADO.NET connections don't allow two active DataReaders at once, and ADO+.NET inherits this limitation.
For example, if you want to enumerate a secondary command while you are enumerating the first one, you will need either a secondary connection or to read all the records from the first query into memory.
I have seen many people solve this kind of problem in other ORMs by calling the ToList() extension method. That works, but the LINQ ToArray() method is usually faster.
Well, in ADO+.NET, ToArray() is faster still, as it is explicitly implemented and avoids generating the enumerator and its internal state machine. Also, this ToArray() doesn't do the many intermediate copies/resizes that Lists and even LINQ do when reading large collections: only one copy is done, from an AddCollectionSlim<T> to the resulting array.
But, if you only plan to iterate the result once, it is still faster to use such an AddCollectionSlim directly. So, if you need to buffer all the results in memory and then foreach over them, opt for the ToAddCollectionSlim() method of the ADO+.NET commands.
Note: As discussed earlier, putting all the records in memory can be problematic and throw OutOfMemoryExceptions, so use this resource with care. Maybe a ToAddCollectionSlim() solves your problem; maybe you really have to deal with a secondary connection.
How Does the AddCollectionSlim Work?
This topic is a little "misplaced" in this article, as it is not an ADO+.NET specific resource; it is a resource that ADO+.NET uses and that you can use without referencing ADO+.NET at all.
But as I don't think the AddCollectionSlim deserves its own article (it would be too short), I will present it here for those who want to understand how it works. To understand it, let's first review how a List<T> works.
A list allocates an internal array of a fixed size (in .NET 4.0, the first array is allocated only when adding the first item, with size 4, if you use the default constructor). Then, when you add items, the new item is put at position Count, and Count is increased by one.
But, when you try to add an item and the inner array is already full, a new array is created (doubling the size of the previous one), and all the items are copied from one array to the other before the new item is added.
Even though the ToArray() method from LINQ does not use a List to do its job, it uses exactly the same logic. Then, at the end, if the array is not completely filled (for example, an array of size 16 with 9 items), one more copy is needed (to return an array of size 9).
The AddCollectionSlim never does such copies. The type holds a reference to the first block and to the actual (current) block. The first block has a default size of 32. When the block is full and a new Add is called, a new block is allocated with double the size and becomes the actual one, but the previous block is kept, only having its next-block field set to reference the newly created block. So, there is no copy at all. New items keep being added to the "actual block" (that is, the latest created block) and, when iterating, it is simply a matter of starting at the first block and, after reaching the end of a block, moving on to the next one until the end.
This way, the more items you have, the faster the AddCollectionSlim will be compared to a List or to the actual LINQ ToArray implementation, because it avoids all the intermediate copies.
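To make the structure concrete, here is a minimal sketch of such a block-linked collection. It is an illustration of the idea just described, not the library's actual AddCollectionSlim.
// Minimal illustration of the block-linked idea: blocks double in size, a full block
// simply links to the next one, and enumeration walks the chain. No item is ever copied.
using System.Collections;
using System.Collections.Generic;

public sealed class AddCollectionSlimSketch<T> : IEnumerable<T>
{
    private sealed class Block
    {
        public T[] Items;
        public int Count;
        public Block Next;
    }

    private readonly Block _first = new Block { Items = new T[32] };
    private Block _actual;

    public AddCollectionSlimSketch()
    {
        _actual = _first;
    }

    public void Add(T item)
    {
        var block = _actual;
        if (block.Count == block.Items.Length)
        {
            // Block is full: create a new one with double the size and link it.
            var newBlock = new Block { Items = new T[block.Items.Length * 2] };
            block.Next = newBlock;
            _actual = newBlock;
            block = newBlock;
        }

        block.Items[block.Count] = item;
        block.Count++;
    }

    public IEnumerator<T> GetEnumerator()
    {
        for (var block = _first; block != null; block = block.Next)
            for (int i = 0; i < block.Count; i++)
                yield return block.Items[i];
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}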
Maximum Performance
There is one situation that is exactly the opposite of storing all the values in an array or an AddCollectionSlim, and it is relatively common in large exports.
You don't want to put all the records in memory, because that would consume too much memory. Also, you are completely sure that an item is not going to be used twice (that is, in a foreach, you don't copy the foreach variable into another variable that lives longer than the foreach scope).
In such a situation, instead of allocating a new object at each iteration (which will consequently need to be collected later), you can allocate a single object and, at each iteration, simply let the reader fill the same object again and again.
This is a relatively "unsafe" technique because, if you store that object in another variable (or give it to another thread), then, well, it will hold a different value at the next iteration. But if you are sure you will not use it in the wrong manner, it is probably the fastest way to read really large resultsets.
To use such resource, simply give a delegate that always returns the same object when preparing the command. So, the code could look like:
using(var connection = DatabaseConnection.OpenFromConfig("connectionStringName"))
{
    var command = connection.CreateCommand();
    command.CommandText = "SELECT Id, Name FROM Table";

    Record sameRecord = new Record();
    using(var preparedCommand = command.Prepare<Record>(() => sameRecord))
    {
        foreach(Record record in preparedCommand)
        {
            // record is always the same instance (sameRecord), refilled at each iteration.
        }
    }
}
Adding Support to User Types
With ADO+.NET, filling the parameters can be easier, but that alone is not yet great.
Reading data is probably more interesting, yet you don't want to live using only the ADO+.NET primitives. You want a Color to be a Color, a DriverLicenseNumber to be a DriverLicenseNumber or, simply, you want any type you create to be seen as that type; you don't want to be limited to database-supported types (be they the types specific to your database or the ADO+.NET primitive types). So, to support all the types you need, you have some options:
- You can create an extra class that does the conversion and redirects to the database-stored property. In this case, you will need two POCO classes, one that is database-bound (even if the theory says a POCO class is independent of its database representation) and another that really represents the type you want, and you will need to convert from one to the other (this is almost as bad as using ADO.NET directly, so I don't recommend it);
- You can use two properties, one in the database format that is the persisted one, and one in the converted (.NET specific) format, doing the conversions when reading and writing (so you have deviated POCO classes again);
- You can create a UserTypeMapper, which tells how to convert any specific format to a database-supported one (in this case, your POCO classes will be really ignorant of how they are persisted).
I think that by the comments inside the parentheses you know that you should use the third option. So, how does it work?
Creating the Mapper
To create the mapper, you should inherit from UserTypeMapper<TUser, TDatabase>, where TUser is the data type not supported by the database (in this case, Color) and TDatabase is the type that's already supported by the database (in this case, int, as a normal color is 32-bit and can be stored as an int).
In such a class, you will need to implement two methods:
ConvertFromDotNetToDatabase
ConvertFromDatabaseToDotNet
I think the names are pretty clear. Surely, you need to know how to do the conversion, but Color to int and int to Color is a well-known one (and our focus here is to understand how to create a converter, not the actual conversion).
Then, you should also create a constructor that receives a parameter of type DatabaseTypeMapper<TDatabase> (in this case, TDatabase is int).
You need to do this because the UserTypeMapper<TUser, TDatabase> will redirect the call to that mapper after the conversion. Yet, this is something you do only because you need to, as the constructor will be called for you if you register the converter globally.
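Putting it together, a Color mapper could look like the sketch below. The names UserTypeMapper<TUser, TDatabase>, DatabaseTypeMapper<TDatabase>, ConvertFromDotNetToDatabase and ConvertFromDatabaseToDotNet come from the description above; the exact access modifiers, override keywords and base-constructor call are my assumptions and may differ from the real library.
// Sketch of a Color <-> int user-type mapper, following the names described above.
// The exact base-class signatures may differ in the real library.
using System.Windows.Media;

public sealed class ColorTypeMapper : UserTypeMapper<Color, int>
{
    public ColorTypeMapper(DatabaseTypeMapper<int> databaseTypeMapper):
        base(databaseTypeMapper)
    {
    }

    protected override int ConvertFromDotNetToDatabase(Color value)
    {
        // Pack ARGB into a 32-bit value.
        return (value.A << 24) | (value.R << 16) | (value.G << 8) | value.B;
    }

    protected override Color ConvertFromDatabaseToDotNet(int value)
    {
        return Color.FromArgb(
            (byte)(value >> 24), (byte)(value >> 16), (byte)(value >> 8), (byte)value);
    }
}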
Registering the User-Type Mapper
To register the user-type mapper so it is supported by all connections of all factories, it is enough to call DatabaseTypeMappersShared.GlobalRegisterUserTypeMapper(), passing the typeof() of your newly created mapper type as the parameter.
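Using the hypothetical ColorTypeMapper from the previous sketch, the registration is a single call:
// Registers the mapper globally, so every factory/connection can use Color values.
DatabaseTypeMappersShared.GlobalRegisterUserTypeMapper(typeof(ColorTypeMapper));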
Version History
- May 5, 2013: Added the Adding Support to User Types topic.
- April 21, 2013: Added the Basic Utilization topic.
- April 16, 2013: Added an easier way to register a user-type mapper. Also renamed the DatabaseDotNetTypeMapper to UserTypeMapper, changed the generic arguments order to seem more natural and added examples of the use of alternative names. Now it is also possible to create plain ADO.NET commands from the ADO+.NET connection (you may need that if you want to use byte streaming from the readers, for example).
- April 12, 2013: Added support for CommandBehavior in the commands that return values and set the default CommandBehavior to SequentialAccess and SingleResult.
- April 11, 2013: Initial version.