Dynamic Table Mapping for LINQ-to-SQL

Zimin Max Yang

5.00/5 (17 votes)

21 May 2012CPOL7 min read

228.1K

2.3K

Dynamic table mapping for LINQ-to-SQL, suitable for data horizontal partitioning (Shard).

Introduction

Horizontal partitioning of large data sets is very popular in real life applications. Such partitioning, sometimes termed as “Shard”, can help reduce the amount of data a query has to go through before reaching its result. Tables participating in such partitioning differs from each other only in their names. They should have identical structure of data columns.

Using LINQ-to-SQL to access those tables poses one problem. LINQ-to-SQL requires a one class to one table mapping. That means for every possible table in the partition, there has to be a class defined. One may argue that it is not a problem to dynamically define classes and use them in DataContext.GetTable<TEntity>(). The result of such a method is not satisfactory. For those dynamically defined classes, only the ITable() interface can be used in code explicitly, thus Intelli-sense won’t be able to kick in to help identify element types, and the compiler won’t be able to run syntax checking, and the beautiful LINQ statement can not be used. It won’t help, in this case, even if those classes are derived from some base classes. The truth is ITable<Derived>() is assignable to ITable<Base>().

One failed attempt to solve this LINQ-to-SQL problem is trying to associate a table name with a class definition dynamically so that when its mapping information is needed, the associated table name gets returned, instead of the name specified by TableAttribute(). This is a dead end. Type information stored in Type() instances are singletons for all class definitions. And with the deferred execution nature of LINQ, there is no way to uniquely identify a table name assigned before.

There comes two solutions, one is only available on .NET 4.0.

Approach 1, using equivalent query, 3.5+

The method is to use a wrapper class and convert actions performed on it into actions to the underlying object. That is to say, pose the new class as a table and take CRUD operations down into the underlying table.

The wrapper class uses an equivalent query for the reading operation. To ultimately read the data, a new class with necessary mappings will need to be defined dynamically and used in DataContext.GetTable<TEntity>() as the TEntity. Since a LINQ-to-SQL reading operation uses IQueryable<TEntity>() interface only and does not explicitly make a different between tables or queries. The wrapper returns the equivalent query:

from r in DataContext.GetTable<DynamicEntity>() select (Base)r

whenever the table is needed, and further query composing is of no problem.

With the reading operation solved by the equivalent query, updating and deleting operations are also solved. The objects returned by the equivalent query are actually directly from the underlying table. Any change on those objects will be tracked by the data context and be committed to database upon submission. Deletion is the same case.

Creating a new record, or inserting an object, is a bit different. Because the object representing the new record is not of the same type as the row type of the underlying table. But those two types can be viewed all as the Base() type and a simple cloning could copy all the necessary values into the ultimate object being inserted into the underlying table. There is one fact that needs special attention. Since the values are copied from the input object to the table object, any change on the table object later are not reflected back into the input object. A method is provided in the following sections. However, horizontally partitioned tables do not always have DB generated values and insertion can be separated from other operations, this discrepancy issue can be avoided.

The IQueryable<TEntity>() and ITable() interfaces have not been implemented fully for the project. Implementing others should be trivial as the essence of the method is already presented.

The main implementation is shown below:

public static ATable<TEntity> GetTable<TEntity>(this DataContext context, string name) 
            where TEntity : class
{
    // Create the entity type
    Type type = DefineEntityType(typeof(TEntity), name);
	
    // Create the underlying table
    ITable refer = context.GetTable(type);

    // New instance of the wrapper
    return new ATable<TEntity>(refer, name);
}

public class ATable<TEntity> : IQueryable<TEntity>, ITable
            where TEntity : class
{
    /// <summary>
    /// Equivalent query
    /// </summary>
    private IQueryable _equivalent;
    /// <summary>
    /// Supporting table
    /// </summary>
    private ITable _table;

    public ATable(ITable inner, string name)
    {
        // Supporting table
        _table = inner;

        // Get the "Select" method
        MethodInfo select = GetGenericSelect();
        MethodInfo invokable = select.MakeGenericMethod(_table.ElementType, typeof(TEntity));

        // Prepare a conversion lambda
        ParameterExpression param = Expression.Parameter(_table.ElementType, "r");
        Expression body = Expression.Convert(param, typeof(TEntity));
        LambdaExpression lambda = Expression.Lambda(body, param);

        // Invoke the select and get the equivalent query from the supporting table
        _equivalent = (IQueryable)invokable.Invoke(null, new object[] { _table, lambda });
    }

    /// <summary>
    /// Retrieve the right "Select" method from the Queryable
    /// </summary>
    /// <returns></returns>
    private static MethodInfo GetGenericSelect()
    {
        foreach (var method in typeof(Queryable)
             .GetMethods(BindingFlags.Static | BindingFlags.Public))
        {
            if ((method.Name == "Select") &&
                (method.GetParameters()[1].ParameterType.GetGenericArguments()[0]
                     .GetGenericArguments().Length == 2))
                return method;
        }

        throw new Exception();
    }

    #region IQueryable interface implementation (all from equivalent)
    public IEnumerator<TEntity> GetEnumerator()
    {
        return (IEnumerator<TEntity>)_equivalent.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return _equivalent.GetEnumerator();
    }

    public Type ElementType
    {
        get { return _equivalent.ElementType; }
    }

    public Expression Expression
    {
        get { return _equivalent.Expression; }
    }

    public IQueryProvider Provider
    {
        get { return _equivalent.Provider; }
    }
    #endregion

    #region ITable interface implementation
    public void InsertOnSubmit(object entity)
    {
        // Input validation
        if ((entity == null) || (!typeof(TEntity).IsAssignableFrom(entity.GetType())))
            return;

        // Create a new table row
        TEntity instance = (TEntity)Activator.CreateInstance(_table.ElementType);

        // Transfer values into the table row
        foreach (var prop in typeof(TEntity).GetProperties())
            prop.SetValue(instance, prop.GetValue(entity, null), null);

        // Submission
        _table.InsertOnSubmit(instance);

        if (entity is Wrapper<TEntity>)
            ((Wrapper<TEntity>)entity).Inner = instance;
    }

    public void DeleteOnSubmit(object entity)
    {
        _table.DeleteOnSubmit(entity);
    }
    #endregion

Approach 2, 4.0+

Not sure how to name this solution. In short, the solution is to use a wrapper ITable<TEntity>() class that wraps around an ITable() object. In fact, the inner ITable() object is actually ITable<DynamicClass>() which holds the table name. The dynamic class is a derived class of TEntity() with no extra properties and fields. So the dynamic class can be treated as TEntity(), which will satisfy the .NET runtime check when results are returned back from the database. On the other hand, dynamic class instances can turn into TEntity() without losing any important information.

It is an accidental found, since in a ITable<TEntity>() implementation, TEntity() is referenced by Intelli-sense and used by the compiler, while LINQ-to-SQL uses property Expression and Provider. The decoupling of such information allows this trick to play.

Two blocks of implementation are of more importance than others in this solution. One being the creation of a dynamic class with TableAttribute() and ColumnAttribute() attached. The other one is an executable delegate to clone a dynamic class instance from a TEntity() instance, which is required only when insertion is performed. The implementation of both code blocks contains no tricky things and is easy to understand in the attached code.

The data object synchronization problem happens in this approach too. Please check the following code to see a solution to this.

Following is the implementation of this table wrapper:

/// <summary>
/// Retrieve a table from the data context which implements 
/// ITable&lt;TEntity&gt; uses specific backing table

/// </summary>
/// <typeparam name="TEntity">Entity Type</typeparam>
/// <param name="context">Data context</param>
/// <param name="name">Table name</param>
/// <returns></returns>
public static ATable<TEntity> GetTable<TEntity>(this DataContext context, string name)   
      where TEntity : class
    {
        // Create/Retrieve a type definition for the table using the TEntity type
        var type = DefineEntityType(typeof(TEntity), name);

        // Create the backup table using the new type
        var refer = context.GetTable(type);

        // Prepare the cloning method
        var cloneFrom = CompileCloning(typeof(TEntity), type);

        // Construct the table wrapper
        return new ATable<TEntity>(refer, cloneFrom);
    } 

    /// <summary>
    /// A table wrapper implements ITable&lt;TEntity&gt; backed by other ITable object
    /// </summary>
    /// <typeparam name="TEntity"></typeparam>
    public class ATable<TEntity> : ITable<TEntity> where TEntity : class
    {
        /// <summary>
        /// Backing table
        /// </summary>
        private readonly ITable _internal;
        /// <summary>
        /// Cloning method
        /// </summary>
        private readonly Delegate _clone;

        /// <summary>
        /// Construct from backing table
        /// </summary>
        /// <param name="inter"></param>
        /// <param name="from"></param>
        public ATable(ITable inter, Delegate from)
        {
            _internal = inter;
            _clone = from;
        }

        public void Attach(TEntity entity)
        {
            throw new NotImplementedException();
        }

        public void DeleteOnSubmit(TEntity entity)
        {
            // Directly invoke the backing table
            _internal.DeleteOnSubmit(entity);
        }

        public void InsertOnSubmit(TEntity entity)
        {
            // Input entity must be changed to backing type
            var v = _clone.DynamicInvoke(entity);

            // Invoke the backing table
            _internal.InsertOnSubmit(v);
        }

        public IEnumerator<TEntity> GetEnumerator()
        {
            throw new NotImplementedException();
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            throw new NotImplementedException();
        }

        public Type ElementType { get { return _internal.ElementType; } }

        public System.Linq.Expressions.Expression Expression { get { return _internal.Expression; } }

    public IQueryProvider Provider { get { return _internal.Provider; } }
}

It is not required to implement all ITable<TEntity>() interface methods for an ordinary CRUD operation.

How to use with common structure defined as interface

The common structure of such tables can be defined in the interface, with directly attached ColumnAttribute(). The dynamic class will have copies of all those ColumnAttribute()s attached to the corresponding properties. In order to perform an insertion, a class implementing the interface, with ColumnAttribute() attached, is needed.

Here is the example using the interface:

/// <summary>
/// Result table interface
/// </summary>
public interface IResult
{
    [Column(IsPrimaryKey = true)]
    int Id { get; set; }
    [Column]
    string Name { get; set; }
    [Column]
    double Value { get; set; }
}
/// <summary>
/// A implementation of the "result"
/// </summary>
public class ResultImp : IResult
{
    public int Id { get; set; }
    public string Name { get; set; }
    public double Value { get; set; }
}
// Sample reading 
var context = new DataContext(SQLTest);
var table = context.GetTable<iresult>("result2012");
var query = from r in table where r.Id == 108 select r;
var list = query.ToList();

// Insertion  
table.InsertOnSubmit(
    new ResultImp { Id = NewId, Name = "Newly added", Value = 230.4595 });
context.SubmitChanges();</iresult>

How to use with common structure defined as class

The common structure of such tables can also be defined in a class, with virtual properties. However, ColumnAttribute() can not be used on properties in the 4.0+ approach, since it will confuse the LINQ-to-SQL runtime as the dynamically derived class will have another property with the same property name and column name defined. Therefore, a clone of ColumnAttribute() is used to carry the mapping information. The 3.5+ approach can still use ColumnAttribute().

Here is an example using the base class in 4.0+. A 3.5+ base class differs only in the column attribute.

/// <summary>
/// Result table class
/// </summary>
public class AResult
{
    [AlterColumn(IsPrimaryKey = true)]
    public virtual int Id { get; set; }
    [AlterColumn]
    public virtual string Name { get; set; }
    [AlterColumn]
    public virtual double Value { get; set; }
}

// Read from database
var context = new DataContext(SQLTest);
var table = context.GetTable<AResult>("result2012");
var query = from r in table where r.Id == 108 select r;
var list = query.ToList();

// Insertion
table.InsertOnSubmit(
     new AResult { Id = NewId, Name = "Newly added", Value = 230.4595 });
context.SubmitChanges();

How to synchronize the inserted object

This is done through the use of a thin wrapper object for the base data object. The base data object, which is being instantiated for insertion, uses its own fields to store values before insertion happens. After it is passed into ATable<TEntity>InsertOnSubmit(), the actual table object is attached. All reading or writing of virtual properties are synchronized onto the underlying table object then.

Following is what a synchronized base object would look like:

public class VResult : Utility.Wrapper<VResult>
{
    private int _id;
    private string _name;
    private double _value;

    [Column(IsPrimaryKey = true, IsDbGenerated = true)]
    public virtual int Id
    {
        get { return Inner == null ? _id : Inner.Id; }
        set { if (Inner == null) _id = value; else Inner.Id = value; }
    }

    [Column]
    public virtual string Name
    {
        get { return Inner == null ? _name : Inner.Name; }
        set { if (Inner == null) _name = value; else Inner.Name = value; }
    }

    [Column]
    public virtual double Value
    {
        get { return Inner == null ? _value : Inner.Value; }
        set { if (Inner == null) _value = value; else Inner.Value = value; }
    }
}

The synchronization happens on the following lines in ATable<TEntity>.InsertOnSubmit():

if (entity is Wrapper<TEntity>)
    ((Wrapper<TEntity>)entity).Inner = instance;

Running the sample

The 3.5+ sample is written in Visual Studio 2008 with .NET 3.5. It works on .NET 4 and .NET 4.5 (tested in Visual Studio 2011 Beta). Simply changing the configuration could make it work under a new framework.

The 4.0+ sample is written in Visual Studio 2010, with .NET 4. It does not work with .NET 3.5 even if you reconfig the project, because LINQ-to-SQL in .NET 4 has different class structures.

In order to run all the tests, please make sure you have SQL Express installed and there is a database named "test" on it. Run the test cases in "Preparation" first to prepare the data.

Extra thinking

There is one solution, if provided, can greatly reduce the complexity introduced in this article. The inheritance of generic classes. It is understandable that List<Base>() can be assigned from List<Derived>() since all elements inside the children list do have all traits of a parent object, whatever performed on List<Base>() is applicable to List<Derived>(). The same reasoning applies for ITable<Base>() and ITable<Derived>(). If that is so, there is even no need for this article and the above-mentioned method. Retrieve a table using the derived classes and cast them into the table of the parent class, and done. Unfortunately in .NET 4.0, those classes are still treated as totally different classes. Similarly, delegates consuming child classes carry no relationship with delegates consuming parent classes. Hope the .NET team can address this issue some time soon.

Finally

Thanks for reading. All comments and questions are welcomed.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)