Introduction
Horizontal partitioning of large
data sets is very popular in real life applications. Such partitioning, sometimes termed as “Shard”, can help reduce the amount of data a query has to go
through before reaching its result. Tables participating in such partitioning
differs from each other only in their names. They should have identical
structure of data columns.
Using LINQ-to-SQL to access those tables poses one problem.
LINQ-to-SQL requires a one class to one table mapping. That means for every
possible table in the partition, there has to be a class defined. One may argue
that it is not a problem to dynamically define classes and use them in DataContext.GetTable<TEntity>()
.
The result of such a method is not satisfactory. For those dynamically defined
classes, only the ITable()
interface can be used in code explicitly,
thus Intelli-sense won’t be able to kick in to help identify element types,
and the compiler won’t be able to run syntax checking, and the beautiful LINQ
statement can not be used. It won’t help, in this case, even if those classes
are derived from some base classes. The truth is ITable<Derived>()
is assignable to ITable<Base>()
.
One failed attempt to solve this LINQ-to-SQL problem is
trying to associate a table name with a class definition dynamically so that
when its mapping information is needed, the associated table name gets returned,
instead of the name specified by TableAttribute()
. This is a dead end.
Type information stored in Type()
instances are singletons for all class
definitions. And with the deferred execution nature of LINQ, there is no way to
uniquely identify a table name assigned before.
There comes two solutions, one is only available on .NET 4.0.
Approach 1, using equivalent query, 3.5+
The method is to use a wrapper class and convert actions performed on it into actions to
the underlying object. That is to say, pose the new class as a table
and take CRUD operations down into the underlying table.
The wrapper class uses an equivalent query for the reading operation.
To ultimately read the data, a new class with necessary mappings will need to be defined dynamically and used
in DataContext.GetTable<TEntity>()
as the TEntity
. Since
a LINQ-to-SQL reading operation uses
IQueryable<TEntity>()
interface only and does not explicitly make a different between tables or queries. The wrapper returns the equivalent query:
from r in DataContext.GetTable<DynamicEntity>() select (Base)r
whenever the table is needed, and further query composing is of no problem.
With the reading operation solved by the equivalent query, updating and deleting operations are also solved. The objects returned by the equivalent query are actually directly from the underlying table. Any change on those objects will be tracked by the data context and be committed to database upon submission. Deletion is the same case.
Creating a new record, or inserting an object, is a bit different. Because the object representing the new record is not of the same type as the row type of the underlying table. But those two types can be viewed all as the
Base()
type and a simple cloning could copy all the necessary values into the ultimate object being inserted into the underlying table. There is one fact that needs special attention. Since the values are copied from
the input object to the table object, any change on the table object later are not reflected back into the input object. A method is provided in the following sections. However, horizontally partitioned tables do not always have
DB generated values and insertion can be separated from other operations, this
discrepancy issue can be avoided.
The
IQueryable<TEntity>()
and ITable()
interfaces have not been implemented fully for the project. Implementing others should be trivial as the essence of the method is already presented.
The main implementation is shown below:
public static ATable<TEntity> GetTable<TEntity>(this DataContext context, string name)
where TEntity : class
{
Type type = DefineEntityType(typeof(TEntity), name);
ITable refer = context.GetTable(type);
return new ATable<TEntity>(refer, name);
}
public class ATable<TEntity> : IQueryable<TEntity>, ITable
where TEntity : class
{
private IQueryable _equivalent;
private ITable _table;
public ATable(ITable inner, string name)
{
_table = inner;
MethodInfo select = GetGenericSelect();
MethodInfo invokable = select.MakeGenericMethod(_table.ElementType, typeof(TEntity));
ParameterExpression param = Expression.Parameter(_table.ElementType, "r");
Expression body = Expression.Convert(param, typeof(TEntity));
LambdaExpression lambda = Expression.Lambda(body, param);
_equivalent = (IQueryable)invokable.Invoke(null, new object[] { _table, lambda });
}
private static MethodInfo GetGenericSelect()
{
foreach (var method in typeof(Queryable)
.GetMethods(BindingFlags.Static | BindingFlags.Public))
{
if ((method.Name == "Select") &&
(method.GetParameters()[1].ParameterType.GetGenericArguments()[0]
.GetGenericArguments().Length == 2))
return method;
}
throw new Exception();
}
#region IQueryable interface implementation (all from equivalent)
public IEnumerator<TEntity> GetEnumerator()
{
return (IEnumerator<TEntity>)_equivalent.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return _equivalent.GetEnumerator();
}
public Type ElementType
{
get { return _equivalent.ElementType; }
}
public Expression Expression
{
get { return _equivalent.Expression; }
}
public IQueryProvider Provider
{
get { return _equivalent.Provider; }
}
#endregion
#region ITable interface implementation
public void InsertOnSubmit(object entity)
{
if ((entity == null) || (!typeof(TEntity).IsAssignableFrom(entity.GetType())))
return;
TEntity instance = (TEntity)Activator.CreateInstance(_table.ElementType);
foreach (var prop in typeof(TEntity).GetProperties())
prop.SetValue(instance, prop.GetValue(entity, null), null);
_table.InsertOnSubmit(instance);
if (entity is Wrapper<TEntity>)
((Wrapper<TEntity>)entity).Inner = instance;
}
public void DeleteOnSubmit(object entity)
{
_table.DeleteOnSubmit(entity);
}
#endregion
Approach 2, 4.0+
Not sure how to name this solution. In short, the solution is to use
a wrapper ITable<TEntity>()
class that wraps around an
ITable()
object. In fact, the inner ITable()
object is actually ITable<DynamicClass>()
which holds the table name. The dynamic class is a derived class of TEntity()
with no extra properties and fields. So the dynamic class can be treated as
TEntity()
,
which will satisfy the .NET runtime check when results are returned back from
the database. On the other hand, dynamic class instances can turn into TEntity()
without losing any important information.
It is an accidental found, since
in a ITable<TEntity>()
implementation, TEntity()
is
referenced by Intelli-sense and used by the compiler, while LINQ-to-SQL uses
property Expression and Provider. The decoupling of such information allows
this trick to play.
Two blocks of implementation are
of more importance than others in this solution. One being the creation of a
dynamic class with TableAttribute()
and ColumnAttribute()
attached. The other one is an executable delegate to clone a dynamic class
instance from a TEntity()
instance, which is required only when
insertion is performed. The implementation of both code blocks contains no
tricky things and is easy to understand in the attached code.
The data object synchronization problem happens in this approach too. Please check the following code to see a solution to this.
Following is the implementation of this table wrapper:
public static ATable<TEntity> GetTable<TEntity>(this DataContext context, string name)
where TEntity : class
{
var type = DefineEntityType(typeof(TEntity), name);
var refer = context.GetTable(type);
var cloneFrom = CompileCloning(typeof(TEntity), type);
return new ATable<TEntity>(refer, cloneFrom);
}
public class ATable<TEntity> : ITable<TEntity> where TEntity : class
{
private readonly ITable _internal;
private readonly Delegate _clone;
public ATable(ITable inter, Delegate from)
{
_internal = inter;
_clone = from;
}
public void Attach(TEntity entity)
{
throw new NotImplementedException();
}
public void DeleteOnSubmit(TEntity entity)
{
_internal.DeleteOnSubmit(entity);
}
public void InsertOnSubmit(TEntity entity)
{
var v = _clone.DynamicInvoke(entity);
_internal.InsertOnSubmit(v);
}
public IEnumerator<TEntity> GetEnumerator()
{
throw new NotImplementedException();
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
throw new NotImplementedException();
}
public Type ElementType { get { return _internal.ElementType; } }
public System.Linq.Expressions.Expression Expression { get { return _internal.Expression; } }
public IQueryProvider Provider { get { return _internal.Provider; } }
}
It is not required to implement all ITable<TEntity>()
interface methods for
an ordinary CRUD operation.
How to use with common structure defined as interface
The common structure of such tables can be defined in the interface,
with directly attached ColumnAttribute()
. The dynamic class will have copies
of all those ColumnAttribute()
s attached to the corresponding properties. In
order to perform an insertion, a class implementing the interface, with ColumnAttribute()
attached, is needed.
Here is the example using the interface:
public interface IResult
{
[Column(IsPrimaryKey = true)]
int Id { get; set; }
[Column]
string Name { get; set; }
[Column]
double Value { get; set; }
}
public class ResultImp : IResult
{
public int Id { get; set; }
public string Name { get; set; }
public double Value { get; set; }
}
var context = new DataContext(SQLTest);
var table = context.GetTable<iresult>("result2012");
var query = from r in table where r.Id == 108 select r;
var list = query.ToList();
table.InsertOnSubmit(
new ResultImp { Id = NewId, Name = "Newly added", Value = 230.4595 });
context.SubmitChanges();</iresult>
How to use with common structure defined as
class
The common structure of such tables
can also be defined in a class, with virtual properties. However, ColumnAttribute()
can not be used on properties in the 4.0+ approach, since it will confuse the LINQ-to-SQL runtime as
the dynamically derived class will have another property with the same
property name and column name defined. Therefore, a clone of ColumnAttribute()
is used to carry the mapping information. The 3.5+ approach can still use ColumnAttribute().
Here is an example using the base class in 4.0+. A 3.5+ base class differs only
in the column attribute.
public class AResult
{
[AlterColumn(IsPrimaryKey = true)]
public virtual int Id { get; set; }
[AlterColumn]
public virtual string Name { get; set; }
[AlterColumn]
public virtual double Value { get; set; }
}
var context = new DataContext(SQLTest);
var table = context.GetTable<AResult>("result2012");
var query = from r in table where r.Id == 108 select r;
var list = query.ToList();
table.InsertOnSubmit(
new AResult { Id = NewId, Name = "Newly added", Value = 230.4595 });
context.SubmitChanges();
How to synchronize the inserted object
This is done through the use of
a thin wrapper object for the base data object. The base data object, which is being instantiated for
insertion, uses its own fields to store values before insertion happens. After it is passed into
ATable<TEntity>InsertOnSubmit()
, the actual table object is attached. All reading or writing of virtual properties are synchronized onto
the underlying table object then.
Following is what a synchronized base object would look like:
public class VResult : Utility.Wrapper<VResult>
{
private int _id;
private string _name;
private double _value;
[Column(IsPrimaryKey = true, IsDbGenerated = true)]
public virtual int Id
{
get { return Inner == null ? _id : Inner.Id; }
set { if (Inner == null) _id = value; else Inner.Id = value; }
}
[Column]
public virtual string Name
{
get { return Inner == null ? _name : Inner.Name; }
set { if (Inner == null) _name = value; else Inner.Name = value; }
}
[Column]
public virtual double Value
{
get { return Inner == null ? _value : Inner.Value; }
set { if (Inner == null) _value = value; else Inner.Value = value; }
}
}
The synchronization happens on the following lines in ATable<TEntity>.InsertOnSubmit()
:
if (entity is Wrapper<TEntity>)
((Wrapper<TEntity>)entity).Inner = instance;
Running the sample
The 3.5+ sample is written in Visual Studio 2008 with .NET 3.5. It works on .NET 4 and .NET 4.5 (tested in Visual Studio 2011 Beta). Simply changing the configuration could make it work under a new framework.
The 4.0+ sample is written in Visual Studio 2010, with .NET 4. It does not work with .NET 3.5 even
if you reconfig the project,
because LINQ-to-SQL in .NET 4 has different class structures.
In order to run all the tests, please make sure you have SQL Express installed and there is a database named "test" on it. Run the test cases in "Preparation" first to prepare the data.
Extra
thinking
There is one solution, if provided, can greatly reduce the
complexity introduced in this article. The inheritance of generic classes. It
is understandable that List<Base>()
can be assigned from
List<Derived>()
since all elements inside the children list do have all traits of a parent
object, whatever performed on List<Base>()
is applicable to
List<Derived>()
.
The same reasoning applies for ITable<Base>()
and ITable<Derived>()
.
If that is so, there is even no need for this article and the above-mentioned
method. Retrieve a table using the derived classes and cast them into the table
of the parent class, and done. Unfortunately in .NET 4.0, those classes are still
treated as totally different classes. Similarly, delegates consuming child
classes carry no relationship with delegates consuming parent classes. Hope
the .NET team can address this issue some time soon.
Finally
Thanks for reading. All comments and questions are welcomed.