Summary
In this article, I'll show a working prototype of CLR extensions that provide an infrastructure for enforcing database-like data integrity constraints, such as entity integrity, domain integrity, referential integrity, and user-defined integrity.
The approach I describe is based on my previous article, where I introduced a set of new metadata tables, the so-called Metamodel Tables (Alex Mikunov, "The Implementation of Model Constraints in .NET"). The technical part of the article also makes use of the techniques I describe in the upcoming MSDN article (September issue, 2003): ".NET internals. Rewrite MSIL code on the fly with the .NET Profiling API" (http://msdn.microsoft.com/msdnmag/issues/03/09/default.aspx).
Introduction
In the previous part we mostly concentrated on the theoretical foundations of the .NET metadata extensions. In a nutshell, we introduced a set of new metadata tables (called Metamodel Tables), which extend the existing CLR metadata. These tables are populated by querying the associated metamodels and constraint definitions (business rules), and they describe various types of database-like constraints and rules, such as field- and method-level constraints (the FieldConstraint and MethodConstraint tables), referential integrity constraints (the TypeConstraint table), and so on.
Note that the proposed approach doesn't specify any particular format of the constraints description. It can be UML/OCL or an XML format. The only requirement is that these constraints have to be mapped properly to the metadata tables and MSIL (Microsoft Intermediate Language).
First, let me briefly describe the basic ideas from the previous article and show how it all works in the case of the method constraints. (I also assume that the reader of this article is already familiar with the basic concepts of the CLR, such as Metadata, MSIL, JIT compilation, .NET profiling. You should look into these a bit before you continue with the article although I will briefly cover some of the CLR basics here.)
Consider the following simple class C:
using System;
...
namespace SomeApplication
{
    public class C
    {
        public int foo( int nNumber )
        {
            ...
            return nSomeValue;
        }
    }
    ...
}
Let the method C::foo() have one precondition for the input parameter, of the form "0 < nNumber < 56", and one postcondition for the return value: "nSomeValue > 4". Those of you who are familiar with the Object Constraint Language (OCL) can describe it like this:
-- OCL code
C::foo(nNumber : int): int
pre : (nNumber > 0) and (nNumber < 56)
post: result > 4
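To make the OCL contract concrete, here is a minimal C++ sketch of the same two checks written by hand. The names checked_foo, foo_pre, foo_post, and CheckResult are made up for illustration, and foo_body is a hypothetical stand-in, because the article elides the real body of C::foo.

```cpp
#include <cassert>

// Hypothetical stand-in for the elided body of C::foo.
int foo_body(int nNumber) { return nNumber - 1; }

// pre: (nNumber > 0) and (nNumber < 56)
bool foo_pre(int nNumber) { return nNumber > 0 && nNumber < 56; }

// post: result > 4
bool foo_post(int result) { return result > 4; }

enum class CheckResult { Ok, PreconditionFailed, PostconditionFailed };

// Runs the body only when the precondition holds, then checks the
// postcondition on the value the body produced.
CheckResult checked_foo(int nNumber, int& result) {
    if (!foo_pre(nNumber)) return CheckResult::PreconditionFailed;
    result = foo_body(nNumber);
    if (!foo_post(result)) return CheckResult::PostconditionFailed;
    return CheckResult::Ok;
}
```

With this stand-in body, an argument of 6 passes both checks, 5 passes the precondition but fails the postcondition, and 0 or 56 are rejected before the body runs.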
Assuming that the method foo() is coded as the metadata token 0x06000002 (i.e. stored in the 2nd row of the Method table) and that tokens of the form 0x89XXXXXX represent method constraints (which are stored in the MethodConstraint table), we would have the following metadata layout. (Note that we've also added a new column to the Method table to point to the MethodConstraint table.)
Figure 1. Layout of the Method and MethodConstraints tables for the foo method
That is, foo's constraints are coded by the metadata tokens 0x89000001 and 0x89000002, respectively, and each row has a proper value in the Relative Virtual Address (RVA) column, which points to the actual IL implementation within the image file.
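The token arithmetic can be sketched in a few lines of C++. The split into a table byte and a row index follows the standard metadata token layout. The 0x89 table id and the ConstraintIndex type are assumptions taken from this proposal and invented for the sketch; they are not part of the shipping CLR.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// A metadata token packs the table id into the top byte and the
// 1-based row index (RID) into the low three bytes.
std::uint32_t table_of(std::uint32_t token) { return token >> 24; }
std::uint32_t row_of(std::uint32_t token)   { return token & 0x00FFFFFFu; }

const std::uint32_t kMethodTable           = 0x06; // Method table
const std::uint32_t kMethodConstraintTable = 0x89; // proposed table, hypothetical

// Hypothetical in-memory view of the extra Method-table column:
// method token -> tokens of its rows in MethodConstraint.
typedef std::map<std::uint32_t, std::vector<std::uint32_t> > ConstraintIndex;

ConstraintIndex sample_index() {
    ConstraintIndex index;
    index[0x06000002].push_back(0x89000001); // foo's first constraint
    index[0x06000002].push_back(0x89000002); // foo's second constraint
    return index;
}

// Returns the constraint tokens of a method, or an empty list if it has none.
std::vector<std::uint32_t> constraints_for(const ConstraintIndex& index,
                                           std::uint32_t methodToken) {
    ConstraintIndex::const_iterator it = index.find(methodToken);
    return it == index.end() ? std::vector<std::uint32_t>() : it->second;
}
```

Decoding 0x06000002 this way yields table 0x06 and row 2, which is exactly the "2nd row of the Method table" mentioned above.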
The general form of the Method-MethodConstraints relationship is shown here:
Figure 2. Method and MethodConstraints relationships
During JIT compilation the runtime encounters C::foo's metadata token (0x06000002) and uses it to consult the second row of the Method table. It then finds that this row has an index into the MethodConstraint table. The runtime examines the related records in MethodConstraint, uses them to get the RVAs of the MSIL implementations of the pre- and postconditions, and adds the corresponding IL to the method's body before it gets JIT compiled.
In other words, if the original method has the following IL:
C::foo(...)
{
method body
}
the CLR will add pre- and postconditions to the method's implementation as follows:
C::foo(...)
{
    IL for the preconditions
    method body
    IL for the postconditions
}
A generalization of this technique could use a generic function that gets called on the enter-function and exit-function events. Assume that the runtime has an internal class ConstraintsChecker, which has a method CheckMethod:
HRESULT ConstraintsChecker::CheckMethod ( ..., mdMethodDef md,
                                          CorConstraintType ConstraintType, ... )
{
    ...
    if ( ConstraintType & ctPreCondition )
    {
        if ( !(preconditions) )
            ...  // a precondition is violated
        else
            ...  // all preconditions hold
    }
    if ( ConstraintType & ctPostCondition )
    {
        ...  // check the postconditions
    }
    if ( ConstraintType & ctInvariant )
    {
        ...  // check the invariants
    }
    ...
}
where the CorConstraintType flags are described as follows:
typedef enum _CorConstraintType
{
    ctNone          = 0x0000,
    ctPreCondition  = 0x0001,
    ctPostCondition = 0x0002,
    ctInvariant     = 0x0004,
    ...
} CorConstraintType;
So, the resulting code will look like this:
C::foo(...)
{
HRESULT hr = ConstraintsChecker.CheckMethod ( ..., 0x06000002,
ctPreCondition | ctInvariant );
method body
hr = ConstraintsChecker.CheckMethod ( ..., 0x06000002,
ctPostCondition | ctInvariant );
}
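The flag-driven dispatch can be tried out in plain C++, without the CLR. The following is only a sketch under simplifying assumptions: constraints are ordinary predicates over a single int value, the result is a bool rather than an HRESULT, and the MethodConstraints struct and Register method are invented for the illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

enum CorConstraintType : unsigned {
    ctNone = 0x0000, ctPreCondition = 0x0001,
    ctPostCondition = 0x0002, ctInvariant = 0x0004
};

// Hypothetical per-method constraint record; an empty std::function
// means "no constraint of this kind".
struct MethodConstraints {
    std::function<bool(int)> pre;       // checked against the argument
    std::function<bool(int)> post;      // checked against the return value
    std::function<bool()>    invariant; // checked on entry and on exit
};

class ConstraintsChecker {
public:
    void Register(std::uint32_t md, MethodConstraints c) {
        table_[md] = std::move(c);
    }

    // Evaluates only the constraint kinds selected by the flags.
    // A method without registered constraints always passes.
    bool CheckMethod(std::uint32_t md, unsigned flags, int value) const {
        auto it = table_.find(md);
        if (it == table_.end()) return true;
        const MethodConstraints& c = it->second;
        if ((flags & ctPreCondition)  && c.pre       && !c.pre(value))   return false;
        if ((flags & ctPostCondition) && c.post      && !c.post(value))  return false;
        if ((flags & ctInvariant)     && c.invariant && !c.invariant())  return false;
        return true;
    }

private:
    std::map<std::uint32_t, MethodConstraints> table_;
};
```

A caller would register foo's predicates under the token 0x06000002, call CheckMethod with ctPreCondition | ctInvariant and the argument before running the body, and call it again with ctPostCondition | ctInvariant and the return value afterwards.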
The approach we've just described requires quite a few changes to the existing CLR architecture.
First of all, we have to modify the metadata tables (add a new column to the Method table) and add new ones (MethodConstraint). It also requires changes to the Metadata API to allow compilers and design tools to emit the additional metadata and constraint definitions. Secondly, we have to change the execution engine and the CLR assembly/class loaders ("fusion"). We should also take care of compatibility issues with the current version of .NET.
It would be nice to find an intermediate approach that doesn't require many of the previously mentioned changes.
In this article I'll describe a simple approach that is based on the .NET Profiling API and runtime MSIL code rewriting and allows us to avoid any changes in the existing CLR.
I call this technique ".NET metadata extensions" or just ".NET extensions".
The basic idea of this approach can be outlined as follows.
When the CLR loads a class and executes its method, the method's IL code is compiled to native instructions during the just-in-time (JIT) compilation process. The Profiling API provided as part of the CLR allows us to intercept this process. Before a method gets JIT-compiled we can modify its IL code. In the simplest scenario we can insert our customized prolog and epilog into the method's IL and give the resulting IL back to the JIT compiler. Depending on the application logic the newly generated IL could do some additional work before and after the original method's code is called.
In our case, these prologs and epilogs (emitted by our profiler) are simply calls into a special managed DLL, CCCore.dll (I call it the ".NET extension DLL"). In other words, for a given .NET module and one of its methods (let's say C::foo), the profiler instruments the method's IL by inserting an IL prolog that calls a special method implemented by CCCore.dll:
public static int CCC::__CheckMethodDefOnEnter(
int mdMethodDefToken, __arglist )
{
...
}
The first parameter is the method's metadata token (C::foo's token); the second is the collection of the method's actual parameters at the moment of the call.
__CheckMethodDefOnEnter validates the parameters against a special descriptor file, the Consistency Checker Descriptor (CCD) file, which is in fact an XML-encoded representation of the MethodConstraint table.
The profiler also inserts an epilog that calls another method implemented by CCCore.dll:
public static int CCC::__CheckMethodDefOnExit( __arglist )
{
...
}
So, the overall picture looks like this.
Before compilation to native code (class C, method foo) we have
C::foo(...)
{
method body
}
The profiler makes the following changes:
C::foo(...)
{
call [CCCore]CCC::__CheckMethodDefOnEnter( foo's token, params )
method body
call [CCCore]CCC::__CheckMethodDefOnExit( foo's token, return value )
}
As you can see, all the validation logic is moved to the .NET extension DLL (CCCore.dll), which does the actual job by analyzing the method's parameters and the corresponding CCD file (the XML-encoded MethodConstraint table). See Figure 3 for details.
Figure 3. Runtime IL code instrumentation and .NET extension
The major advantage of this approach is that our CLR/Rotor extensions live outside the runtime code, so we don't have to rebuild the "Rotor" source code every time we change them. Once we're done with our changes, we can merge our code into the CLR: the profiler code will become part of the runtime engine, CCCore.dll can be merged into mscorlib or remain a separate library, and the XML-encoded metadata tables will become CLR metadata.
Implementation Details
First of all, the approach I propose preserves the identity of the classes. Unlike many other techniques that use custom attributes, proxy assemblies, remote proxies (context-bound objects), etc., we make our changes at runtime only. There are no changes in the original source code whatsoever, so it's all completely transparent to the client.
The following sketch shows how I implement the IL rewriting (method instrumentation) and add a prolog and an epilog to the method.
A given method foo having N (< 255) parameters plus "this":
ReturnType C::foo ( C* this , type1 param1, type2 param2,
type3 param3, ..., typeN paramN )
{
IL method body
}
will be rewritten by the profiler like this:
ReturnType C::foo ( C* this , type1 param1, type2 param2,
type3 param3, ..., typeN paramN )
{
ldc.i4 tkMethodDef
ldarg 0
ldarg 1
ldarg 2
...
call vararg int32 CCCore.CCC.__CheckMethodDefOnEnter( tkMethodDef, __arglist )
pop
orig method body with replaced "ret" opcode goes here
dup
ldc.i4 tkMethodDef
call vararg int32 CCCore.CCC.__CheckMethodDefOnExit( __arglist )
pop
ret
}
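The byte-level shape of this rewriting can be sketched in C++ on a raw code buffer. This is a simplified model, not the profiler's actual code. It assumes that the method has at most four arguments including "this" (so the one-byte ldarg.0 to ldarg.3 forms suffice), that the body contains no branches and ends in its only ret, and that the method returns a value. The method header, max-stack value, and exception tables, which a real profiler must also update, are ignored. The two call tokens are hypothetical placeholders for member references to the CCCore methods.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef std::vector<std::uint8_t> Bytes;

// One-byte CIL opcodes used by the sketch.
const std::uint8_t kLdarg0 = 0x02; // ldarg.1..ldarg.3 follow as 0x03..0x05
const std::uint8_t kLdcI4  = 0x20;
const std::uint8_t kDup    = 0x25;
const std::uint8_t kPop    = 0x26;
const std::uint8_t kCall   = 0x28;
const std::uint8_t kRet    = 0x2A;

// Appends a 32-bit operand in little-endian order, as CIL requires.
void emit_u32(Bytes& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
}

Bytes instrument(const Bytes& body, std::uint32_t methodToken,
                 std::uint32_t onEnterToken, std::uint32_t onExitToken,
                 unsigned argCount) {
    // Outside the simplifying assumptions: leave the method untouched.
    if (body.empty() || body.back() != kRet || argCount > 4) return body;

    Bytes out;
    // Prolog: ldc.i4 token; ldarg.0 ...; call OnEnter; pop
    out.push_back(kLdcI4); emit_u32(out, methodToken);
    for (unsigned i = 0; i < argCount; ++i)
        out.push_back(static_cast<std::uint8_t>(kLdarg0 + i));
    out.push_back(kCall); emit_u32(out, onEnterToken);
    out.push_back(kPop);

    // Original body without its trailing ret.
    out.insert(out.end(), body.begin(), body.end() - 1);

    // Epilog: dup; ldc.i4 token; call OnExit; pop; ret
    out.push_back(kDup);
    out.push_back(kLdcI4); emit_u32(out, methodToken);
    out.push_back(kCall); emit_u32(out, onExitToken);
    out.push_back(kPop);
    out.push_back(kRet);
    return out;
}
```

The dup before the exit call keeps one copy of the return value on the stack for the final ret while the other copy is passed to the checker, which mirrors the listing above.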
Secondly, a module that gets "instrumented" doesn't have to be linked against CCCore.dll (the ".NET extension" DLL). Look at the CC.IL module provided as an example in the \Barracuda2\CChecker\IL folder to see that it doesn't reference CCCore.dll. So, we "dynamically link" this DLL at runtime.
Finally, the method's parameters at the moment of the call can be XML-serialized. Thus, instead of CCD-like files (Consistency Checker Descriptors) we could use XML schemas or XPath expressions (XPath assertions/rules; see the Schematron assertion language for an example) to validate the input/output:
public static int CCC::__CheckMethodDefOnEnter( int mdMethodDefToken,
                                                __arglist )
{
...
}
In any case it's a move toward a more standardized way of validation, which may also imply the creation of an infrastructure similar to the SoapExtension framework provided by the ASP.NET Web Services.
Example
The attached zip file includes two folders CChecker and CCCore. The first one contains the binary file which is a .NET profiler DLL (CChecker.DLL). The second folder contains a C# project implementing the .NET extension called CCCore.dll.
The CC.IL example module is provided in the \Barracuda2\CChecker\IL folder. To see how it all works together, follow these steps:
- Open an MS-DOS command prompt and change the current folder to \Barracuda2\CChecker\IL
- Run \Barracuda2\CChecker\IL\cc_on.bat to enable the profiler (make sure the profiler path is valid)
- Launch cc.exe (this test is written in IL)
It'll show some output displaying various information about the method parameters and their validity.
To turn off the .NET extension and to see the difference just run cc_off.bat.
CCCore.dll uses the CC.exe.CCD.config file (the Consistency Checker Descriptor file) to validate CC's methods. The descriptor file format is self-explanatory. We use the XPath expression "/ccdescriptor/methods/method/@token" to get all the method tokens in the descriptor file. To get a method's constraints we use "/ccdescriptor/methods/method[@token='sometokenvalue']/parameters/parameter".
History
25 Aug 2003 - updated source download