(untagged)

Load Any Object From Most Any Database

Alaric Dailey

0.00/5 (No votes)

8 Feb 2010

Building An O/R Mapper Step 1

Introduction

It seems like everyone is now writing Object Relational Mappers, some good, some not so good. Opinions aside, (N)Hibernate is probably the most famous; however, sometimes it's more than is necessary, and sometimes it's just not what you are looking for. Sometimes you have to use databases other than what the host of existing O/R Mappers support. Whatever the reason, the premise of this article is filling your objects from most any database table. In fact, if you can connect to it with a .NET dataprovider, and write a query to get the data, then this code can be used to fill in your objects.

Background

I am not going to cover a few things, to keep things simple.

Here are some of the things I am not going to cover:

Getting the connection string and provider name from the config, and connecting to pretty much any database. Because I covered it in this article.
Inherited objects
Creating custom attributes. This could be used to solve some of the problems I am going to show. This is because this article is only a place to start, and not a complete solution.
Connection pooling to enhance performance
Threading to fill the objects, though if you want a good place to start, you will need a threadsafe list. So you may want to check out my articles on a Thread Safe List, and my other article on Simplifying ReaderWriterLocks
Tracking changes to the objects. Too much to cover in this article.
Saving the objects back to the databases.Too much to cover in this article.

For starters, I really don't want to mess around with Activator.CreateObject, so I will create a generic function. Next, I do not want to tie this to any particular DataProvider so my plan was to use DbDataReader, however Resharper was nice enough to point out that I could use the IDataReader interface to be even more generic. Because of the uniqueness of the function, I will touch on just the declaration first.

private static BindingList<T> 
	LoadObjectsFromDataReader<T>(IDataReader dr) where T : new()

I marked the function as static because I am not touching any member variable from the containing class. The BindingList<T> will give the most flexibility for binding to Windows controls. Probably the most confusing thing to people who haven't used Generics (or templates) much is the <T> at the beginning of the function declaration. That is going to allow me to use mostly any object I like and return it in a strongly typed BindingList. The where T : new(), tells the compiler that not just any object will do, but that the object must have a public parameterless constructor. So let's have a look at the body of the function.

private static BindingList<T> 
	LoadObjectsFromDataReader<T>(IDataReader dr) where T : new()
{
    //get the type of the object, without having to create one
    Type type = typeof(T);
    BindingList<T> retval = new BindingList<T>();

    //Mapping Block
    List<PropertyInfo> itemProperties = new List<PropertyInfo>();
    for (int i = 0; i < dr.FieldCount; i++)
    {
        string s = dr.GetName(i);
        var pi = type.GetProperty(s, BindingFlags.Instance | 
		BindingFlags.Public | BindingFlags.SetProperty);
        itemProperties.Add(pi);
    }

    //data block
    object[] oo = new object[itemProperties.Count];
    while (dr.Read())
    {
        dr.GetValues(oo);
        //could be threaded block                
        T item = new T();
        int fieldIndex = -1;
        foreach (var pi in itemProperties)
        {
            fieldIndex++;
            if (pi != null)
            {
                object o = oo[fieldIndex];
                if (DBNull.Value.Equals(o))
                {
                    o = null;
                }
                try
                {
                    pi.SetValue(item, o, null);
                }
                catch
                {
                    //eat data errors quietly
                }
            }
        }
        retval.Add(item);
        //end of could be threaded block
    }
    return retval;
}

I will break this down by the comments since the code is a little different than what most people are used to.

Mapping Block

To start this function, after declaring some variables, I loop through all the fields in the datareader, getting their names by ordinal (index position). Then using reflection, I find the property by that name. The BindingFlags makes sure I get the property if it has a public set function, and is not a static property. Regardless as to whether or not the property is found, I add it to my collection.

Data Block

Now I loop through all records in the datareader. Rather than looping through every field in the datareader, for every record, I get all fields into my array of objects with dr.GetValues(oo). At this point, I create a new object of the proper type.

Now comes the fun part, for each record, I loop through the collection of properties, because they are in ordinal order of the datareader, if a property was not found, or it did not have a set, I skip over it. This prevents me from doing any unnecessary work.

Because NULL values in the database are not returned as null in .NET, they are returned as DBNull. I check to see if it was returned for my columns value, and if so, I make it null.

At this point, I assign the value by calling the pi.SetValue, passing in the object that we are setting the value on, and the value I am setting. However, since there MAY be a data mismatch, I wrapped the code into a try/catch block.

Last thing, we add our newly filled in object to our BindingList<T>. When finished with all the records, I return the binding list with all the fresh data.

Using the Code

The first thing we need is a database to query. I am going to use my example database from one of my previous articles, here.

Next, we need a class to represent a row in the table, because the above code doesn't use anything other than properties, this class is nothing more than automatic properties.

public class Issuer
{
    public Guid ID { get; set; }
    public string Name { get; set; }
}

Ideally using the code would look a little like this, taking into consideration that you aren't going to want to spread your database code through your code. This block gives you what you need to make it work though. Thus this code may be spread out over several functions in different places, like a database class, or a base class for the business object.

Type type = typeof(Issuer); 
string name = type.Name;

string sql = "select * from " + name;
var factory = DbProviderFactories.GetFactory(providerName);
using (var connection = factory.CreateConnection())
{
    connection.ConnectionString = connectionString;
    connection.Open();
    using (var command = connection.CreateCommand())
    {
        command.CommandText = sql;
        command.CommandType = CommandType.Text;
        using (var dr = command.ExecuteReader(CommandBehavior.CloseConnection))
        {
            return LoadObjectsFromDataReader<T>(dr);
        }
    }
}

Using the poorly performing "select * from" statement will pull all data from the table by using the name of the class as the name of the table. After building the connection and command, we execute a datareader and pass it to our newly created function.

Shortcomings

This function has some serious shortcomings, so let's make them clear. This code assumes the column names and table names are all valid variable and class names in C#, so a column name like "Vehicle Identification Number" won't work at all. The block I use to call the code derives the table name from the class name, this will work in the majority of cases, most database designs don't take advantage of schemas, but this approach also means that things like multiple databases or linked servers also can't be used.

Wrapping the data assignment in a try/catch block isn't the best solution, it's just the quickest solution to preventing partially consuming the datareader, and throwing unexpectedly.

The select * from statement is always inefficient, it would be better to select the column names individually. However, because there is no way to know which properties on an object may be column names, or might have to be mapped because of column names that are not valid property names. A perfect example of this is in my example database, in Vehicle there is a column called Vehicle Identification Number, because it contains spaces, no property will ever be found with that name.

In this code, I am not wrapping the tablename in the proper characters to prevent issues with SQL and database specific keywords. I didn't include this because getting that information reliably is a little more than simply getting a commandbuilder. You could even get the command builder to fix your SQL statement using the "one last trick" from one of my previous articles here.

This code will be slower than necessary because we are setting properties. Because of this, we lose some benefit that we should have, the hiding and protecting of data. Do we really want the user to be able to overwrite the data in the primary key column(s)? Tracking changes and valid values on the object when the property is set is also much more difficult because of having to use the properties. This is because there is no easy way to know if the property is being set by this function, or from some other piece of code.

Had we chosen to fill the private fields, possibly from a base class, and used custom attributes we could fix some if not all of these problems.

Tracking changes to the class would be absolutely necessary to have the class be able to update the proper row in the database.

In the LoadObjectsfromDataReader function, you will see a block marked "could be threaded block", theoretically all the code within that block could be run on a separate thread, allowing faster reading of the datareader. The issue with doing it is that all of the fills have to finish before the return can happen. Something to think about though.

Points of Interest

Despite the shortcomings, this is still a reasonably efficient and very flexible way to get data from the database and into objects, objects that we can then query with LINQ to Objects.

Loading objects from multiple data sources, from different systems using this approach and querying the loaded data using "LINQ to Objects" is a great alternative to using some hard coded providers framework like LINQ to SQL, especially when you need to correlate data from multiple disparate sources.

In future articles, I will start going over how to address the shortcomings.

History

6^th February, 2010: Initial post
8^th February, 2010: Article updated

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here