The Implementation of Model Constraints in .NET - II

Alex Mikunov

24 Aug 2003

Runtime MSIL Code Instrumentation and .NET Metadata Extensions

Download source files - 159 Kb

Summary

In this article, I'll present a working prototype of CLR extensions that provide an infrastructure for enforcing database-like data integrity constraints, such as entity integrity, domain integrity, referential integrity, and user-defined integrity.

The approach I describe is based on my previous article, where I introduced a set of new metadata tables, the so-called Metamodel Tables (Alex Mikunov, "The Implementation of Model Constraints in .NET"). The technical part of this article also makes use of various techniques I describe in an upcoming MSDN article (September issue, 2003): ".NET internals. Rewrite MSIL code on the fly with the .NET Profiling API" (http://msdn.microsoft.com/msdnmag/issues/03/09/default.aspx).

Introduction

In the previous part we concentrated mostly on the theoretical foundations of the .NET metadata extensions. In a nutshell, we introduced a set of new metadata tables (called Metamodel Tables) that extend the existing CLR metadata. These tables are populated by querying the associated metamodels and constraint definitions (business rules), and they describe various types of database-like constraints and rules, such as field- and method-level constraints (the FieldConstraint and MethodConstraint tables), referential integrity constraints (the TypeConstraint table), and so on.

Note that the proposed approach doesn't specify any particular format of the constraints description. It can be UML/OCL or an XML format. The only requirement is that these constraints have to be mapped properly to the metadata tables and MSIL (Microsoft Intermediate Language).

First, let me briefly describe the basic ideas from the previous article and show how it all works in the case of the method constraints. (I also assume that the reader of this article is already familiar with the basic concepts of the CLR, such as Metadata, MSIL, JIT compilation, .NET profiling. You should look into these a bit before you continue with the article although I will briefly cover some of the CLR basics here.)

Consider the following simple class C:

// C# code

using System;
...
namespace SomeApplication
{
    public class C
    {
      public int foo( int nNumber )
      {
         // code goes here

         ...
          
         return nSomeValue;
       
      } // foo()

    } // class C

    ...
}

Let the method C::foo() have one precondition on the input parameter, of the form "0 < nNumber < 56", and one postcondition on the return value: "nSomeValue > 4". Those of you who are familiar with the Object Constraint Language (OCL) can describe it like this:

-- OCL code
C::foo(nNumber : int): int
pre : (nNumber > 0) and (nNumber < 56)
post: result > 4

Assume that the method foo() is identified by the metadata token 0x06000002 (i.e. it is stored in the 2nd row of the Method table) and that tokens of the form 0x89XXXXXX represent method constraints (which are stored in the MethodConstraint table). We would then have the following metadata layout (note that we have also added a new column to the Method table that points to the MethodConstraint table):

Figure 1. Layout of Method and MethodConstraints tables for the foo method

That is, foo's constraints are identified by the metadata tokens 0x89000001 and 0x89000002, respectively, and each row has a value in the Relative Virtual Address (RVA) column that points to the actual IL implementation within the image file.
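As a small aside, the token arithmetic used above can be sketched in C++. A CLR metadata token is a 4-byte value whose top byte selects the table and whose lower three bytes give the 1-based row index, so 0x06000002 means row 2 of the Method table. The helper names SplitToken and MakeToken are made up for this sketch, and 0x89 is only the table id this article proposes for MethodConstraint, not an id the CLR defines.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not part of the article's download).
struct TokenParts {
    std::uint8_t  table;  // 0x06 = Method table; 0x89 = proposed MethodConstraint table
    std::uint32_t row;    // 1-based row index within that table
};

// Split a metadata token into its table id (top byte) and row id (low 24 bits).
inline TokenParts SplitToken(std::uint32_t token) {
    return TokenParts{static_cast<std::uint8_t>(token >> 24), token & 0x00FFFFFFu};
}

// Combine a table id and a row id back into a token.
inline std::uint32_t MakeToken(std::uint8_t table, std::uint32_t row) {
    assert(row <= 0x00FFFFFFu);  // the row id must fit into 24 bits
    return (static_cast<std::uint32_t>(table) << 24) | row;
}
```

With these helpers, SplitToken(0x06000002) yields table 0x06 and row 2, and MakeToken(0x89, 1) yields 0x89000001, the token of foo's first constraint row.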

The general form of the Method-MethodConstraints relationship is shown here:

Figure 2. Method and MethodConstraints relationships

During JIT compilation the runtime encounters the C::foo's metadata token (0x06000002) and uses this token to consult the second row of the Method table. After that it realizes that this row has an index into the MethodConstraint table. The runtime examines related records in MethodConstraint and uses them to get the RVAs of related MSIL implementation of pre- or postconditions and to add the corresponding IL to the method's body before it gets JIT compiled.

In other words, if the original method has the following IL:

// MSIL code

C::foo(...) // before JIT compilation

{
 method body //MSIL

}

the CLR will add pre- and postconditions to the method's implementation as follows:

// MSIL code

C::foo(...) // before JIT compilation

{
 // IL code for

 // if !(preconditions) throw an exception;

 
 method body // original IL code with replaced 'ret' opcodes 


 // IL code for 

 // if !(postconditions) throw an exception;

}

A generalization of this technique could use a generic function that gets called on the enter-function and on the exit-function events. Assume that the runtime has an internal class ConstraintsChecker, which has a method CheckMethod:

HRESULT ConstraintsChecker::CheckMethod ( ..., mdMethodDef md, 
                                     CorConstraintType ConstraintType, ...  )
{
// use Method and MethodConstraint tables

// to find constraints for a given mdMethodDef token and validate them

...
if ( ConstraintType & ctPreCondition )
{
 // check preconditions //{native code}

 // return result and/or throw an Exception

 // implementation could work like this

 if ( !(preconditions) ) 
   // set error code, and/or throw an exception;

 else
   // OK 

}

if ( ConstraintType & ctPostCondition )
{
 // check postconditions //{native code}

 // return result and/or throw an exception

}


if ( ConstraintType & ctInvariant )
{
 // check invariants //{native code}

 // return result and/or throw an exception

}
...

} // ConstraintsChecker::CheckMethod

where the CorConstraintType flags are described as follows:

typedef enum _CorConstraintType
{
    ctNone          = 0x0000,
    ctPreCondition  = 0x0001,
    ctPostCondition = 0x0002,
    ctInvariant     = 0x0004,
...
} CorConstraintType;

So, the resulting code will look like this:

//class C, Method foo (after JIT compilation to native code)

C::foo(...)
{

// call CLR's ConstraintsChecker class for method C::foo

// to check preconditions

HRESULT hr = ConstraintsChecker.CheckMethod ( ..., 0x06000002, 
                                           ctPreCondition | ctInvariant );

 method body //{native code}


// call CLR's ConstraintsChecker class for method C::foo

// to check postconditions

hr = ConstraintsChecker.CheckMethod ( ..., 0x06000002, 
                                      ctPostCondition | ctInvariant );

} // C::foo
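To make the enter/exit pattern above easier to follow, here is a minimal, self-contained C++ sketch. It is not the CLR's code: the in-memory table, the Register method, the InstrumentedFoo wrapper and its placeholder body are all assumptions made for illustration, and the checker returns a bool rather than an HRESULT. Invariants are accepted as a flag but not modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <stdexcept>
#include <utility>

enum CorConstraintType : unsigned {
    ctNone = 0x0000, ctPreCondition = 0x0001,
    ctPostCondition = 0x0002, ctInvariant = 0x0004
};

// Hypothetical stand-in for one method's rows in the MethodConstraint table.
struct MethodConstraints {
    std::function<bool(int)> pre;   // evaluated against the argument
    std::function<bool(int)> post;  // evaluated against the return value
};

class ConstraintsChecker {
public:
    void Register(std::uint32_t token, MethodConstraints c) {
        assert(c.pre || c.post);  // registering nothing is a programming error
        table_[token] = std::move(c);
    }

    // Returns true when every requested constraint holds for 'value'.
    bool CheckMethod(std::uint32_t token, unsigned flags, int value) const {
        auto it = table_.find(token);
        if (it == table_.end()) return true;  // no constraints recorded
        const MethodConstraints& c = it->second;
        if ((flags & ctPreCondition) && c.pre && !c.pre(value)) return false;
        if ((flags & ctPostCondition) && c.post && !c.post(value)) return false;
        return true;  // ctInvariant is not modelled in this sketch
    }

private:
    std::map<std::uint32_t, MethodConstraints> table_;
};

// Stand-in for C::foo after instrumentation: check, run the body, check again.
int InstrumentedFoo(const ConstraintsChecker& cc, int nNumber) {
    const std::uint32_t kFooToken = 0x06000002u;
    if (!cc.CheckMethod(kFooToken, ctPreCondition | ctInvariant, nNumber))
        throw std::invalid_argument("precondition failed");
    int result = nNumber + 4;  // placeholder for the original method body
    if (!cc.CheckMethod(kFooToken, ctPostCondition | ctInvariant, result))
        throw std::logic_error("postcondition failed");
    return result;
}
```

Registering foo's two constraints (0 < nNumber < 56 and result > 4) under token 0x06000002 and calling InstrumentedFoo(cc, 10) returns normally, while InstrumentedFoo(cc, 56) throws before the body runs.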

The approach we've just described requires quite a few changes to the existing CLR architecture.

First, we have to modify existing metadata tables (add a new column to the Method table) and add new ones (MethodConstraint). This also requires changes to the Metadata API so that compilers and design tools can emit the additional metadata and constraint definitions. Second, we have to change the execution engine and the CLR assembly/class loaders ("fusion"). We should also take care of compatibility with the current version of .NET.

It would be nice to find an intermediate approach that doesn't require many of the previously mentioned changes.

In this article I'll describe a simple approach that is based on the .NET Profiling API and runtime MSIL code rewriting and allows us to avoid any changes in the existing CLR.

I call this technique ".NET metadata extensions" or just ".NET extensions".

The basic idea of this approach can be outlined as follows.

When the CLR loads a class and executes its method, the method's IL code is compiled to native instructions during the just-in-time (JIT) compilation process. The Profiling API provided as part of the CLR allows us to intercept this process. Before a method gets JIT-compiled we can modify its IL code. In the simplest scenario we can insert our customized prolog and epilog into the method's IL and give the resulting IL back to the JIT compiler. Depending on the application logic the newly generated IL could do some additional work before and after the original method's code is called.

In our case, these prologs and epilogs (emitted by our profiler) are simply calls into a special managed DLL, CCCore.dll (I call it the ".NET extension DLL"). In other words, for a given .NET module and one of its methods (let's say C::foo), the profiler instruments the method's IL by inserting an IL prolog that calls a special method implemented by CCCore.dll:

public static int CCC::__CheckMethodDefOnEnter(
                                             int mdMethodDefToken, __arglist )
{
// first parameter is the method's metadata token
// second parameter is a collection of the method's actual
// parameters (at the moment of the call).

// checks the method's parameters based
// on the XML-encoded MethodConstraint table
...
// and returns the result
}

The first parameter is the method's metadata token (C::foo's token); the second parameter is a collection of the method's actual parameters (at the moment of the call).

The __CheckMethodDefOnEnter does the parameter validation based on a special descriptor file, which, in fact, is an XML encoded representation of the MethodConstraint table (Consistency Checker Descriptor file - CCD file.)

The profiler also inserts an epilog that calls another method implemented by CCCore.dll:

public static int CCC::__CheckMethodDefOnExit( __arglist )
{
// __arglist should have two parameters:

// 1) the orig method's return value

// 2) method token


// checks return value based

// on XML-encoded MethodConstraint table

    ...
// and returns result

}

So, the overall picture looks like this.

Before compilation to native code (class C, method foo) we have

C::foo(...) // ( before JIT compilation)

{
 method body //MSIL

}

The profiler makes the following changes:

C::foo(...) // (before JIT compilation, with the profiler's changes)

{
 // to check the method's preconditions/invariants

 call [CCCore]CCC::__CheckMethodDefOnEnter( foo's token, params )

 method body // orig IL code with replaced 'ret'

 // to check postconditions/invariants

 call [CCCore]CCC::__CheckMethodDefOnExit( return value, foo's token )

}

As you can see, all the validation logic is moved into the .NET extension DLL (CCCore.dll), which does the actual job by analyzing the method's parameters against the corresponding CCD file (the XML-encoded MethodConstraint table). See Figure 3 for details.

Figure 3. Runtime IL code instrumentation and .NET extension
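The "replaced 'ret'" step deserves a short illustration. The article does not spell out how the replacement is done, so the following C++ sketch shows just one plausible scheme over a symbolic instruction list: every original ret becomes a branch to a single epilog, so that all exits pass through the exit check. The Instr type, the Instrument function, and the EPILOG label are hypothetical names, and the instructions are mnemonics rather than real IL bytes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Symbolic instruction: a mnemonic plus an optional operand.
struct Instr {
    std::string op;       // e.g. "ldarg.1", "ret", "br", "label"
    std::string operand;  // empty when the opcode takes none
};

// Wrap 'body' with a prolog and an epilog. Each original 'ret' is replaced
// by a branch to the epilog; one final 'ret' is emitted after the epilog.
std::vector<Instr> Instrument(const std::vector<Instr>& body,
                              const std::vector<Instr>& prolog,
                              const std::vector<Instr>& epilog) {
    assert(!body.empty());  // a method body has at least one instruction
    std::vector<Instr> out(prolog);
    for (const Instr& i : body) {
        if (i.op == "ret")
            out.push_back(Instr{"br", "EPILOG"});
        else
            out.push_back(i);
    }
    out.push_back(Instr{"label", "EPILOG"});
    out.insert(out.end(), epilog.begin(), epilog.end());
    out.push_back(Instr{"ret", ""});
    return out;
}
```

A rewriter working on real IL bytes has more to do than this sketch suggests: inserting code shifts every branch offset, and exception-handling clauses that reference the body have to be adjusted as well.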

The major advantage of this approach is that our CLR/Rotor extensions live outside the runtime code. We don't have to rebuild the "Rotor" source code every time we change our code. Once we're done with our changes, we can merge our code into the CLR: the profiler code will become part of the runtime engine, the CCCore DLL can be merged into mscorlib or remain a separate library, and the XML-encoded metadata tables will become CLR metadata.

Implementation Details

First of all, the approach I propose preserves the identity of the classes. Unlike many other techniques, which rely on custom attributes, proxy assemblies, remote proxies (context-bound objects), etc., we make our changes at runtime only. No changes to the original source code are needed, so it's all completely transparent to the client.

The following listings show how I implement the IL rewriting (method instrumentation) and add a prolog and an epilog to a method.

A given method foo with N (< 255) parameters plus the "this" pointer:

ReturnType C::foo ( C* this /*invisible param*/, type1 param1, type2 param2, 
                    type3 param3, ..., typeN paramN )
{
     IL method body
}

will be rewritten by the profiler like this:

ReturnType C::foo ( C* this /*invisible param*/, type1 param1, type2 param2,
                    type3 param3, ..., typeN paramN )
{
// prolog >> 

// Arguments of the method are loaded on the stack in

// order of their appearance

// in the method signature, with the last signature param

// being loaded last.

// So for instance methods the "this" is always the first argument:

//    ----------

//   | paramN   |

//   | ...      |

//   | param3   |

//   | param2   |

//   | param1   |

//   | this     | <-- for instance methods, goes first ( slot 0 )

//      ----------

 ldc.i4    tkMethodDef    // load C::foo's token


 ldarg 0    // load param0 on the stack ( _param0 )

 ldarg 1    // load param1 on the stack ( _param1 )

 ldarg 2    // load param2 on the stack ( _param2 )

 ...

// analyze params by calling CChecker

 call vararg int32 CCCore.CCC.__CheckMethodDefOnEnter( tkMethodDef, __arglist )
 pop        // remove CCCore.CCC.Check's result

// prolog <<


 orig method body with replaced "ret" opcode goes here

// epilog >>

 dup    // to copy method's ret value and avoid adding

     // new local vars!!!

 ldc.i4    tkMethodDef    // load method's token


// analyze return value by calling CChecker

 call vararg int32 CCCore.CCC.__CheckMethodDefOnExit( __arglist )
// __arglist should have the orig method's return value (goes first!!!)

// + method token


 pop        // remove CCCore.CCC.Check's result

 ret        // return method's result

// epilog <<

}
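At the byte level, the prolog in the listing above is a short, fixed sequence. The C++ sketch below builds it from the standard single-byte IL opcodes (ldc.i4 = 0x20, ldarg.0 to ldarg.3 = 0x02 to 0x05, ldarg.s = 0x0E, call = 0x28, pop = 0x26). BuildProlog and EmitU32 are hypothetical helpers, not code from the attached profiler, and the checker token is assumed to be supplied by the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append a 32-bit operand in little-endian order, as IL encodes it.
static void EmitU32(std::vector<std::uint8_t>& il, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        il.push_back(static_cast<std::uint8_t>(v >> shift));
}

// Build: ldc.i4 <method token>; ldarg for each argument slot;
//        call <checker token>; pop.
std::vector<std::uint8_t> BuildProlog(std::uint32_t methodToken,
                                      unsigned argSlots,
                                      std::uint32_t checkerToken) {
    assert(argSlots <= 256);  // ldarg.s takes a one-byte slot index
    std::vector<std::uint8_t> il;
    il.push_back(0x20);       // ldc.i4
    EmitU32(il, methodToken);
    for (unsigned i = 0; i < argSlots; ++i) {
        if (i < 4) {
            il.push_back(static_cast<std::uint8_t>(0x02 + i));  // ldarg.0..3
        } else {
            il.push_back(0x0E);                                 // ldarg.s
            il.push_back(static_cast<std::uint8_t>(i));
        }
    }
    il.push_back(0x28);       // call
    EmitU32(il, checkerToken);
    il.push_back(0x26);       // pop: discard the checker's int32 result
    return il;
}
```

In a real profiler, the operand of call would have to be a MemberRef token defined in the instrumented module's metadata (for example through IMetaDataEmit), with a call-site signature describing the variable arguments. A value such as 0x0A000001 in BuildProlog(0x06000002, 3, 0x0A000001) is only a placeholder.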

Secondly, a module that gets "instrumented" doesn't have to be linked against the CCCore.dll DLL (".NET extension" dll). Look at the CC.IL module provided as an example in \Barracuda2\CChecker\IL folder to see that it doesn't refer to CCCore.dll. So, we "dynamically link" this dll at runtime.

Finally, the method's parameters at the moment of the call can be XML-serialized. Thus, instead of CCD-like files (Consistency Checker Descriptors) we could use XML schemas or XPath expressions (XPath assertions/rules; see the Schematron assertion language for an example) to validate the input/output:

public static int CCC::__CheckMethodDefOnEnter ( int mdMethodDefToken,
                                                 __arglist )
{
// first parameter is the method's metadata token
// second parameter is a collection of the method's actual
// parameters (at the moment of the call).

// serialize the input as XML (e.g. SOAP)
// and validate it against a schema file or a set of XPath expressions
...
// and returns the result
}

In any case it's a move toward a more standardized way of validation, which may also imply the creation of an infrastructure similar to the SoapExtension framework provided by the ASP.NET Web Services.

Example

The attached zip file includes two folders CChecker and CCCore. The first one contains the binary file which is a .NET profiler DLL (CChecker.DLL). The second folder contains a C# project implementing the .NET extension called CCCore.dll.

The CC.IL example module is provided in the \Barracuda2\CChecker\IL folder. To see how it all works together, follow these steps:

1. Open an MS-DOS command prompt and change the current folder to \Barracuda2\CChecker\IL
2. Run \Barracuda2\CChecker\IL\cc_on.bat to enable the profiler (make sure the profiler path is valid)
3. Launch cc.exe (this test is written in IL)

It'll show some output displaying various information about the method parameters and their validity.

To turn off the .NET extension and to see the difference just run cc_off.bat.

CCCore.dll uses the CC.exe.CCD.config file (Consistency Checker Descriptor file) to validate CC's methods. The Consistency Checker Descriptor file format is self-explanatory. We use the following XPath expression "/ccdescriptor/methods/method/@token" to get all the method tokens in the descriptor file. To get a method's constraints we use "/ccdescriptor/methods/method[@token='sometokenvalue']/parameters/parameter".

History

25 Aug 2003 - updated source download
