Summary
The first part of the article describes the .NET metadata extensions for enforcing database-like data integrity constraints, such as entity integrity, domain integrity, referential integrity, and user-defined integrity. (See the second part for the working prototype and additional details.)
A set of new metadata tables - the so-called Metamodel Tables - is introduced. These tables are populated by querying the associated meta models and constraint definitions expressed in modeling/constraint languages such as the Unified Modeling Language (UML) with the Object Constraint Language (OCL), or Object Role Modeling (ORM) with the Conceptual Query Language (ConQuer).
The Metadata API (Unmanaged and Reflection APIs) should be extended accordingly to allow compilers/design tools to emit the additional metadata/constraint definitions. The runtime/class loader should also be updated to handle the additional logic based on the new metadata.
Note that the proposed approach doesn't require the use of UML/OCL or ORM/ConQuer (or any other language) to describe constraints. They can be expressed in any format (e.g. they can be XML encoded). The only requirement is that they have to be mapped properly to the metadata tables and MSIL (Microsoft Intermediate Language).
For background information see:
- .NET metadata tables layout. http://msdn.microsoft.com/net/ecma/, TG3, CLI Partition II section, PartitionIIMetadataOct01.doc file.
- OMG Unified Modeling Language Specification. Version 1.3, June 1999.
- Object Constraint Language Specification. Version 1.1, September 1997.
- Halpin, T.A. and Bloesch, A.C. 1999, "Data modeling in UML and ORM: a comparison", Journal of Database Management, vol. 10, no. 4, Idea Group Publishing, Hershey, USA, pp. 4-13.
- Halpin, T.A. 2001, "Augmenting UML with Fact-orientation", in workshop proceedings: UML: a critical evaluation and suggested future, HICSS-34 conference (Maui, January 2001).
- Halpin, T.A. 1999, "Entity Relationship modeling from an ORM perspective: Parts 1-3".
- Halpin, T.A. 1999, "UML data models from an ORM perspective: Parts 1-10", Journal of Conceptual Modeling, InConcept, Minneapolis USA.
- Enterprise JavaBeans Specification. Version 2.0, August 2001.
- Bertrand Meyer: Object-Oriented Software Construction. 2nd edition, Prentice Hall PTR, Upper Saddle River (1997), 1260 pp.
- Eiffel on the Web: Integrating Eiffel Systems into the Microsoft .NET Framework http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndotnet/html/pdc_eiffel.asp
- Geoff Eldridge: Java and "Design by Contract" and the lack thereof... http://www.elj.com/eiffel/feature/dbc/java/ge/
1. Introduction
One of the most significant features of the CLR is the language-independent representation of important information about types and their members in the form of metadata. This information provides the runtime with a specification of the types and their members, as well as global information about the application.
Unfortunately, more complex forms of metadata, such as relationships between instances of classes and predefined/user-defined constraints, cannot be mapped to the existing metadata tables layout without extra work.
In some simple cases, for example, we could use custom .NET attributes (derived from ContextAttribute) and .NET interceptors (for pre-processing and post-processing calls to objects) - see http://msdn.microsoft.com/msdnmag/issues/02/03/AOP/AOP.asp for details. But it is still only a partial solution.
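For illustration, here is a minimal sketch of such an interceptor (the attribute and sink names are our own; the pattern only applies to classes derived from ContextBoundObject):

using System;
using System.Runtime.Remoting.Contexts;
using System.Runtime.Remoting.Messaging;

// Illustrative attribute: installs a message sink that intercepts calls.
[AttributeUsage(AttributeTargets.Class)]
public class PreconditionCheckAttribute : ContextAttribute, IContributeServerContextSink
{
    public PreconditionCheckAttribute() : base("PreconditionCheck") {}

    public IMessageSink GetServerContextSink(IMessageSink nextSink)
    {
        return new PreconditionSink(nextSink);
    }
}

public class PreconditionSink : IMessageSink
{
    private IMessageSink m_next;
    public PreconditionSink(IMessageSink next) { m_next = next; }
    public IMessageSink NextSink { get { return m_next; } }

    public IMessage SyncProcessMessage(IMessage msg)
    {
        IMethodCallMessage call = msg as IMethodCallMessage;
        if (call != null && call.MethodName == "foo")
        {
            // pre-processing: verify the arguments here before the call proceeds
        }
        IMessage ret = m_next.SyncProcessMessage(msg);
        // post-processing: inspect the return message here
        return ret;
    }

    public IMessageCtrl AsyncProcessMessage(IMessage msg, IMessageSink replySink)
    {
        return m_next.AsyncProcessMessage(msg, replySink);
    }
}

// Interception applies only to context-bound classes.
[PreconditionCheck]
public class C : ContextBoundObject
{
    public void foo() { /* ... */ }
}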
In this article we consider how to extend the .NET Framework to represent additional metadata related to meta models and constraints.
The existing metadata format is described in [1]. UML 1.3 and OCL 1.1 are used as examples of modeling/constraint languages [2, 3]. Relationships between UML and other approaches, such as ORM and Entity Relationship (ER) modeling, are discussed thoroughly in [4, 5, 6, 7].
Let's start with model constraints.
UML models (class diagrams, activity diagrams, ...) can have basic predefined constraints, such as multiplicity constraints on associations (like *, 1..n, 0..1), subset, XOR, aggregation/composition, etc. UML also allows users to add their own constraints, which can be expressed in OCL [3]. A typical example is declaring pre- and postconditions for a given method:
-- OCL code
ClassName::MethodName(param1 : Type1, ... ): ReturnType
pre : param1 > SomeValue ...
post: result = ...
Another simple example declares an invariant "a customer cannot be older than 75":
-- OCL code
Customer
self.age <= 75
Usually, these conditions are implemented as assertions in a programming language. Moreover, in Eiffel, for example, preconditions, postconditions, and class invariants can be expressed in the programming language itself. This corresponds to the notion of Design by Contract [9].
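For instance, a hand-coded C# check for the age invariant above might look like this (a sketch; the property shape and the exception choice are ours):

public class Customer
{
    private int m_age;

    public int Age
    {
        get { return m_age; }
        set
        {
            // invariant: self.age <= 75
            if (value > 75)
                throw new System.ArgumentOutOfRangeException(
                    "value", "a customer cannot be older than 75");
            m_age = value;
        }
    }
}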
For other languages targeting .NET, the process of adding preconditions, postconditions, and class invariants to a class without rebuilding the existing assemblies can be implemented as follows (note that a similar approach is used in the Contract Wizard tool; see [10] for details). Let's say we have an assembly A with a namespace A, which contains a class C with a method void foo():
...
.namespace A
...
[A]A.C::foo()
{
...
}
Then an appropriate tool will generate a new proxy assembly A1 and add something like this:
.assembly extern A
...
.namespace A1
...
[A1]A1.C::foo()
{
...
call [A]A.C::foo()
...
}
This solution has some obvious limitations: we have to use a proxy assembly that implements the contracts and then calls the non-contracted original component. Thus, the identity of the object is not preserved. It is also unclear how to implement constraints in more complex cases, such as referential integrity constraints or composing specifications (joins or subtypes, e.g. precondition weakening and postcondition strengthening).
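In C# terms, the proxy approach amounts to something like the following sketch (hypothetical code; the real Contract Wizard works at the IL level):

namespace A
{
    // the original, non-contracted component (from assembly A)
    public class C
    {
        public void foo() { /* ... */ }
    }
}

namespace A1
{
    // the generated proxy (assembly A1)
    public class C
    {
        private A.C m_inner = new A.C();

        public void foo()
        {
            // check preconditions here ...
            m_inner.foo();   // delegate to the original implementation
            // check postconditions here ...
        }
    }
}

Clients now hold an A1.C instance rather than the original A.C, which is exactly why the object identity is not preserved.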
Another solution, implemented in iContract-like tools (see [11]), is to add pre- and postconditions as special code comments. This approach also has obvious disadvantages.
In contrast, our approach is similar to the way constraints are implemented in relational database management systems.
Let's take a look at a typical database design and implementation process in the case of MS SQL Server.
An ER model and constraint descriptions can be translated into a Transact-SQL script (actually a Data Definition Language, DDL, script) and then, as a result of its execution, stored in the database catalog system tables: sysobjects, syscolumns, sysreferences, sysindexes, syscomments, etc. Consider a pair of tables A and B (part of the mentioned ER model) which are linked by the simplest relationship:
Figure 1. Entity Relationship Diagram - A is a referencing table, B is a referenced table.
Assume that they also have several additional constraints. The resulting DDL script could look like this (for the sake of simplicity we use PRIMARY KEY and FOREIGN KEY constraints to implement referential integrity and won't consider triggers):
create table B
(
B_id int identity,
f1 int,
CONSTRAINT PK_B PRIMARY KEY (B_id),
CONSTRAINT f1_check CHECK (f1 > 1)
)
go
create table A
(
A_id int identity,
B_id int NOT NULL,
f2 int,
CONSTRAINT PK_A PRIMARY KEY (A_id),
CONSTRAINT FK_A_B FOREIGN KEY (B_id) REFERENCES B(B_id)
ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT f2_check CHECK (f2 > 0)
)
go
Upon running this script, SQL Server automatically populates the system tables with the required records.
Let's say we also have a set of stored procedures, which do some standard things like add/update/delete operations on the tables A and B.
Since we've declared a referential integrity constraint (with the NO ACTION option), the database server ensures data integrity when we modify related records in both tables: we cannot delete a record in B if it has referencing records in A. Because the server checks integrity without any programming, this is often called declarative referential integrity. Another constraint (B_id int NOT NULL) means that records in table A cannot exist without corresponding records in table B (a mandatory role).
SQL Server also prevents the user from storing wrong values in the columns f1 and f2 (less than 2 and 1, respectively). These constraints and validation rules are "implemented dynamically" by the server, which uses the related meta information stored in the system tables. We won't discuss the details (which are complicated enough) of how the server does this work for us. Usually, CHECK constraints are used to enforce domain integrity, PRIMARY KEY and UNIQUE constraints enforce entity integrity, and FOREIGN KEY constraints provide referential integrity.
By looking at this approach we can notice several important details:
- constraints can be considered as part of the corresponding ER models;
- constraint definitions are stored in metadata tables and separated from stored procedures (in fact, SQL Server stores the Transact-SQL creation script in the syscomments table for each view, rule, default, trigger, CHECK constraint, DEFAULT constraint, and stored procedure); for instance, the CHECK constraint on the column f1 is stored in the syscomments.text field as a SQL expression: ([f1] > 1);
- the constraints implementation can be modified independently of the stored procedures implementation and, given a proper design, modifying the constraints does not affect the implementation of the stored procedures (or the related Transact-SQL scripts).
Moreover, our ER model and the corresponding constraints can be mapped to any other RDBMS that supports a similar metadata format (which is basically true for most database systems).
Thus, constraints are features of the associated meta models, and they are language independent. The CLR already provides a language-independent representation of important information about types and their members in the form of metadata tables (which, in some ways, look similar to the system tables). Therefore, we may expect that constraint definitions and their implementation can be effectively separated from the class implementation. We could also assume that the runtime is capable of supporting relationships between instances (equivalent to rows in tables) of related classes (equivalent to tables) in a database-like manner. Hence, we need to "explain" to the system what the constraints/business rules are, and then the system will enforce them automatically.
To add constraint descriptions and their implementation to the existing infrastructure, we introduce new metadata tables:
- MethodConstraint - describes method constraints, used for pre- and postconditions;
- FieldConstraint - describes field constraints, intended for implementing invariants;
- TypeConstraint - describes class constraints, intended for implementing referential integrity.
These tables are called the Metamodel Tables. Their layout is described in sections 2.5 and 2.6.
The Common Language Runtime should also provide an extension of the Metadata API to allow compilers/design tools to emit (at design/compile time) constraint definitions and their implementation in PE files, along with the existing metadata and MSIL ("PE" stands for Portable Executable, the format used for executable (EXE) and dynamic-link library (DLL) files). See Figure 2 for details.
The Metadata API extension can be used in both directions - retrieving and emitting meta information from/to the Metamodel Tables. The class loader will be able to retrieve the constraint definitions (along with the other metadata) during loading/JIT compilation. This API can also be used by applications at run time. We should be able to update the Metamodel Tables without rebuilding the entire assembly.
In some cases the CLR should also be able to dynamically generate an implementation of constraints based on their definitions - for example, to implement referential integrity constraints or constraints related to class inheritance (e.g. precondition weakening and postcondition strengthening), or to prevent chains of aggregate links from having cycles.
There should be a tool which is capable of translating model constraints (expressed in UML/ORM, etc.) either directly to .NET assemblies or to the corresponding repository engine formats (such as Microsoft Meta Data Services models), so that they can later be used by constraint emitters. This tool can be part of an appropriate modeling system (such as Rational Rose or Visio).
In the case of Microsoft Meta Data Services (formerly Microsoft Repository), information models can be compiled and stored in Repository databases using the Microsoft Model Development Kit (MDK) model compiler with an appropriate extension. The model compiler generates various preselected application files necessary to deploy the models using Meta Data Services. Since the information models are stored in Repository databases, they can be retrieved and used via the Repository API. This API also supports XML encoding, providing import and export of the stored meta data in XML format. In order to handle constraint definitions and their implementation, the Repository API has to be extended accordingly.
At compile time the source code compiler (targeting the CLR) consults the associated information model to place the metadata and MSIL instructions associated with the class constraints in the Metamodel Tables as part of the PE file, along with the regular metadata (such as type/class/field/method definitions) and the generated MSIL. To consult the meta information, compilers can be integrated with the appropriate tools. Notice that the proposed approach does not require the use of UML/OCL or ORM/ConQuer (or any other language) to describe constraints. Business rules can be expressed in any format; the only requirement is that they be mapped properly to the metadata tables and MSIL.
The "side effect" of the proposed method is that the constraint related features will be available for all languages targeting .NET. The very important detail is that our methodology is entirely based on the existing CLR infrastructure. Constraints are mapped to the CLR metadata tables (with some additional extensions) and managed by the runtime as well.
Note that the Enterprise JavaBeans 2.0 specification (EJB 2.0) also allows multiple entity beans to have container-managed relationships among themselves. The EJB container maintains the referential integrity of the entity bean relationships using an abstract persistence schema (provided by the Bean Provider) for each entity bean, which defines its container-managed fields and relationships and determines the methods for accessing them. The bean developer also defines the deployment descriptor, which specifies the relationships between entity beans. Unlike our approach, container-managed referential integrity is a feature of the related EJB container.
2. Implementation
2.1. Invariants, Pre- and Postconditions.
Consider the following class C:
using System;
...
namespace SomeApplication
{
public class C
{
public int m_f;
public C()
{
...
}
public void foo()
{
...
}
public static int Main(string[] args)
{
...
return 0;
}
}
...
}
Assume that the method foo() is encoded as metadata token 0x06000002, i.e. stored in the 2nd row of the Method table, and the m_f field as 0x04000001 (the 1st row of the Field table). Assume that we use tokens of the form 0x89XXXXXX to represent method constraints, which are stored in the MethodConstraint table (see sections 2.5 and 2.6 for the tables layout). Let the method C::foo() have one precondition and one postcondition constraint that were previously emitted by a constraint compiler and stored as sets of IL instructions referenced from the 1st and 2nd rows of the MethodConstraint table. That is, the related tokens are 0x89000001 and 0x89000002, respectively, and each row has a proper value in the Relative Virtual Address (RVA) column, which points to the actual IL implementation within the image file.
Obviously, the Method table should also have a MethodConstraintList column (an index into the MethodConstraint table). This relationship looks similar to the Method - Param link, which is implemented via the ParamList column (see PartitionIIMetadataOct01.doc for details):
During JIT compilation the runtime encounters the metadata token for the C::foo method (0x06000002) and uses it to consult the second row of the Method table. There it finds an index into the MethodConstraint table. The runtime examines the related records in MethodConstraint and uses them to get the RVAs of the MSIL implementations of the pre- and postconditions.
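As a reminder, metadata tokens encode the table in the top byte and the row index (RID) in the lower three bytes; a small C# illustration:

public class TokenExample
{
    public static void Main()
    {
        int token = 0x06000002;            // C::foo's MethodDef token
        int table = token >> 24;           // 0x06 selects the Method table
        int rid   = token & 0x00FFFFFF;    // row 2 of that table
        System.Console.WriteLine("table 0x{0:X2}, row {1}", table, rid);
    }
}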
Here's an example of how it could work:
- before compilation to native code (class C, method foo):
C::foo(...)
{
method body
}
- after JIT compilation to native code, the CLR adds the pre- and postconditions to the method's implementation using their RVAs found in the Metamodel Tables:
C::foo(...)
{
if ( !(preconditions && invariants) )
// raise a constraint violation and do not enter the method
method body
if ( !(postconditions && invariants) )
// raise a constraint violation
}
A more generic way to implement this is as follows. Assume that the runtime has an internal class ConstraintsChecker, which has a method CheckMethod:
HRESULT ConstraintsChecker::CheckMethod ( ..., mdMethodDef md,
CorConstraintType ConstraintType, ... )
{
...
if ( ConstraintType & ctPreCondition )
{
// find the MethodConstraint rows owned by md and run the precondition IL
if ( !(preconditions) )
// return a failure HRESULT
else
// continue with the next check
}
if ( ConstraintType & ctPostCondition )
{
// likewise for the postconditions
}
if ( ConstraintType & ctInvariant )
{
// likewise for the field-level invariants
}
...
}
where mdMethodDef is defined in CorHdr.h and constraint types are described as follows:
typedef enum _CorConstraintType
{
ctNone = 0x0000,
ctPreCondition = 0x0001,
ctPostCondition = 0x0002,
ctInvariant = 0x0004,
...
} CorConstraintType;
So, the resulting code could look like this:
C::foo(...)
{
HRESULT hr = ConstraintsChecker::CheckMethod ( ..., 0x06000002,
ctPreCondition | ctInvariant );
method body
hr = ConstraintsChecker::CheckMethod ( ..., 0x06000002,
ctPostCondition | ctInvariant );
}
We assume that a method can have any number of constraints, and a condition can include multiple logical expressions combined with AND and OR. The resulting condition must evaluate to a Boolean value.
An invariant of a type must be true for all instances of that type at any time. For the sake of simplicity we assume that invariants express consistency rules about required relationships between attributes within the same class. Therefore, for a given class, class invariants can be implemented as a set of field constraints.
In our considerations, a field within a class (a data member) can have any number of constraints, and a condition can include multiple logical expressions combined with AND and OR. The resulting condition must evaluate to a Boolean value and cannot reference another class. This is similar to the CHECK constraints, which are used to enforce domain integrity.
An invariant condition should evaluate to TRUE after any external invocation of a method on the object. To store these constraints we use the FieldConstraint table, whose layout is similar to that of the MethodConstraint table and which should have at least two columns. See section 2.5 for the tables layout.
So, in the previous example, if the m_f field (the 1st row in the Field table) had a constraint (or a set of constraints), its row would index into the corresponding row(s) of the FieldConstraint table.
Since constraint definitions are separated from the class implementation, we can change them independently. For instance, if we've modified a precondition for a given method foo() of the class C, we do not have to update the method's IL code. We only update the associated Metamodel Tables (and, possibly, the constraint-related IL code), which causes the runtime to add the necessary checks during JIT compilation.
2.2. Invariants, Pre- and Postconditions and subclassing.
In the case of inheritance we can use well-known rules (see [9]):
- Invariant Accumulation rule: the invariants of all the parents of a class apply to the class itself (logical AND of all invariants);
- Assertion Redeclaration rules: the new precondition must be weaker than or equal to the original precondition (logical OR of all preconditions), and the new postcondition must be stronger than or equal to the original postcondition (logical AND of all postconditions). (A sketch of how the results combine follows this list.)
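In C# terms, the effective result for an overriding method might be combined like this (a sketch; the Boolean arguments stand for the individual check results):

public class RedeclarationExample
{
    // preB/preD, postB/postD, invB/invD: results of the base and derived checks
    public static bool Verify(bool preB, bool preD,
                              bool postB, bool postD,
                              bool invB, bool invD)
    {
        bool pre  = preB || preD;    // preconditions are OR-ed (weakening)
        bool post = postB && postD;  // postconditions are AND-ed (strengthening)
        bool inv  = invB && invD;    // invariants are AND-ed (accumulation)
        return pre && post && inv;
    }
}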
For a given class's method, the CLR tracks the extension chain using the TypeDef.Extends field to find all the parent constraints for this method. After that, the runtime follows the Invariant Accumulation and Assertion Redeclaration rules to implement the proper verification of constraints.
For example, consider a class D which extends class B and overrides its virtual function foo(). The metadata tables layout could then look like this:
The runtime examines B::foo's and D::foo's constraints using the TypeDef, Method, and MethodConstraint tables. Assuming that 0x06000001 and 0x06000004 are the method tokens for B::foo and D::foo, respectively, it can work this way:
class B
{
...
virtual void B::foo(...)
{
...
method body
...
}
...
}
class D: public B
{
...
virtual void D::foo(...)
{
HRESULT hrPreB = ConstraintsChecker::CheckMethod ( ...,
0x06000001, ctPreCondition );
HRESULT hrPreD = ConstraintsChecker::CheckMethod ( ...,
0x06000004, ctPreCondition );
method body
HRESULT hrPostB = ConstraintsChecker::CheckMethod ( ...,
0x06000001, ctPostCondition );
HRESULT hrPostD = ConstraintsChecker::CheckMethod ( ...,
0x06000004, ctPostCondition );
}
...
}
2.3. Referential integrity constraints and cascading referential integrity constraints.
In databases, referential integrity indicates that the relationships between tables have been properly maintained: data in one table should only point to existing rows in another table; it should not point to rows that do not exist. In MS SQL Server we can declare a referential integrity constraint using the corresponding PRIMARY KEY and FOREIGN KEY constraints and specifying the ON DELETE { CASCADE | NO ACTION } or ON UPDATE { CASCADE | NO ACTION } options. CASCADE allows deletions or updates of key values to cascade through the tables defined to have foreign key relationships that can be traced back to the table on which the modification is performed.
In UML there are three types of associations (see [7], Part 8):
ordinary associations (no aggregation, no diamond), shared or simple aggregation (shown with a hollow diamond), and composite or strong aggregation (shown with a filled diamond).
For binary associations, there are four possible uniqueness constraint patterns:
- 1:1 (one-to-one);
- 1:n (one-to-many);
- n:1 (many-to-one);
- m:n (many-to-many).
Four possible mandatory role patterns:
- only the left role is mandatory;
- only the right role is mandatory;
- both roles are mandatory;
- both roles are optional.
Two types of directions:
- unidirectional;
- bidirectional.
We assume that all associations are bidirectional (this is not a very significant restriction). Thus, there are 16 possible combinations of the uniqueness and mandatory role patterns for a binary association (considering only bidirectional links):
typedef enum _eRelationTypeEnum
{
rtNone = 0x0000,
rtOneToOneBidirectionalLeftMandatory = 0x0001,
rtOneToOneBidirectionalRightMandatory = 0x0002,
rtOneToOneBidirectionalBothMandatory = 0x0004,
rtOneToOneBidirectionalBothOptional = 0x0008,
rtOneToManyBidirectionalLeftMandatory = 0x0010,
rtOneToManyBidirectionalRightMandatory = 0x0020,
rtOneToManyBidirectionalBothMandatory = 0x0040,
rtOneToManyBidirectionalBothOptional = 0x0080,
rtManyToOneBidirectionalLeftMandatory = 0x0100,
rtManyToOneBidirectionalRightMandatory = 0x0200,
rtManyToOneBidirectionalBothMandatory = 0x0400,
rtManyToOneBidirectionalBothOptional = 0x0800,
...
} eRelationTypeEnum;
In our example we are going to model these associations using Primary Key/Foreign Key-like attributes and Set/Get methods.
Constraint emitter:
- Populates the Metamodel Tables:
- Specifies the fields which represent the PK and FK constraints for a given pair of classes (say, A and B);
- Specifies the Set and Get methods used to access these fields (this is optional, unlike Container-Managed Persistence (CMP) in the Enterprise JavaBeans 2.0 specification);
- Sets the cascading referential integrity options: ON DELETE { CASCADE | NO ACTION } or ON UPDATE { CASCADE | NO ACTION }.
The runtime:
- Uses data members related to PK and FK attributes to handle constraints.
- Dynamically implements (in the absence of Setter and Getter methods) methods to access constraints-related fields.
- Provides the implementation of cascading referential integrity constraints such as ON DELETE (a sketch follows this list).
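For the last item, here is a minimal sketch, with our own names, of what the runtime-generated ON DELETE handling could look like (the real implementation would be derived from the TypeConstraint rows):

using System;
using System.Collections;

public enum DeleteRule { NoAction, Cascade }

// illustrative only: a parent ("referenced") instance with its children
public class ParentExample
{
    public ArrayList Children = new ArrayList(); // the referencing instances

    public void OnDelete(DeleteRule rule)
    {
        if (Children.Count > 0)
        {
            if (rule == DeleteRule.NoAction)
                throw new InvalidOperationException(
                    "referencing instances exist, cannot delete");
            Children.Clear(); // CASCADE: remove the referencing instances too
        }
        // the instance itself can now be released / garbage collected
    }
}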
Let's start from a one-to-many bidirectional association, which is also a composite aggregation, between classes A and B, where the role Ra is optional; that is, each B has zero or more instances of A, and each A belongs to exactly one B:
To illustrate some important details, consider a C++ implementation which uses the STL library:
#include <vector>
#include <iostream>
using namespace std;

class B; // forward declaration

class A
{
private:
B* m_pB;
...
public:
...
void Set_B( B* pB ) { m_pB = pB; }
B* Get_B() const { return m_pB; }
...
};
class B
{
private:
enum { N = 3 }; // some predefined number of children, for illustration
vector<A> m_vectorA;
public:
...
B()
{
for ( int i = 0; i < N; ++i )
{
A a;
a.Set_B( this );
m_vectorA.push_back( a );
}
}
void Set_A( const vector<A>& vectorA )
{
// must re-point the copied elements to this B; otherwise, after
// exchanging vectors, they will point to a wrong value of B
}
const vector<A>& Get_A() const
{
return m_vectorA;
}
static void ShowVectorA( const vector<A>& vectorA )
{
vector<A>::const_iterator iter = vectorA.begin();
while ( iter != vectorA.end() )
{
cout << "ShowVectorA " << (*iter).Get_B() << endl;
++iter;
}
}
};
Here's an example of use:
int main(...)
{
...
B* pB1 = new B();
B* pB2 = new B();
cout << "-- Show B1:vector<A> before ... " << endl;
B::ShowVectorA( pB1->Get_A() );
cout << "-- Show B2:vector<A> before ... " << endl;
B::ShowVectorA( pB2->Get_A() );
vector<A> vectorA1 = pB1->Get_A();
vector<A> vectorA2 = pB2->Get_A();
cout << "-- Set B1 ... " << endl;
pB1->Set_A( vectorA2 );
cout << "-- Set B2 ... " << endl;
pB2->Set_A( vectorA1 );
cout << "-- Show B1:vector<A> after ... " << endl;
B::ShowVectorA( pB1->Get_A() );
cout << "-- Show B2:vector<A> after ... " << endl;
B::ShowVectorA( pB2->Get_A() );
...
}
Note the comment we added to the B::Set_A method: without re-pointing the copied elements, after exchanging vectors they would point to a wrong value of B.
Basically, there are several ways to implement the B::Set_A method. One possible implementation could look like this:
void Set_A( const vector<A>& vectorA )
{
m_vectorA.clear();
vector<A>::const_iterator iterSrc = vectorA.begin();
while ( iterSrc != vectorA.end() )
{
m_vectorA.push_back( (*iterSrc) );
m_vectorA.back().Set_B( this ); // re-point the copy to this B
++iterSrc;
}
}
In this case the contents of the target collection are replaced with the contents of the source collection. We can also notice that, in the general case, A's copy constructor should implement deep-copy semantics.
What's going to happen with the source instance ( pB2 ) after its collection has been copied?
Let's first see how similar things work in SQL Server. Consider two records in table B whose IDs are b1 and b2, respectively (see Figure 1), which are referenced by the records {a1, a2} and {a3, a4} from table A. Assume that we need to "attach" the records {a3, a4} to the "instance" b1. Basically, there are four possible scenarios:
Case 1: replace {a1, a2}; that is, {a1, a2} are deleted and {a3, a4} are set to point to b1; b2 doesn't have any referencing records.
Case 2: merge {a1, a2} and {a3, a4}; that is, {a1, a2, a3, a4} point to b1; b2 doesn't have any referencing records.
Case 3: replace {a1, a2} with a copy of {a3, a4}; that is, {copy of a3, copy of a4} point to b1; {a3, a4} still point to b2.
Case 4: merge {a1, a2} and a copy of {a3, a4}; that is, {a1, a2, copy of a3, copy of a4} point to b1; {a3, a4} still point to b2.
We can say that these cases define different set policies (since they are related to the Set method). We can see that, in general, the type of a referential constraint doesn't fully determine the set policy, so we need to specify the policy type explicitly.
For example, in the last two cases (Case 3 and Case 4) copying a3 and a4 may not work because of entity integrity constraints (PRIMARY KEY and UNIQUE constraints).
In the case of unmanaged code we could change the implementation of the STL vector class by providing additional template parameters, which specify the parent class (container) and the type of relationship (and, possibly, some other information, say, which set policy to use):
template<class T, class Container, eRelationTypeEnum RelationType, ...,
class A = allocator<T> >
class vector
{
public:
typedef A allocator_type;
...
};
So, the class B now might look like this:
class B
{
private:
vector<A, B, rtOneToManyBidirectionalLeftMandatory, ...> m_vectorA;
...
};
Consider Case 1 and let both roles be mandatory; that is, the field A.B_id is NOT NULL and any record from B should be referenced by at least one record (in the real world this could be enforced by triggers). To assign b2's records to b1 we could do something like this:
SELECT * INTO #deleted_b1 FROM A WHERE B_id = b1   -- save b1's current children
UPDATE A SET B_id = b1 WHERE B_id = b2             -- re-point b2's children to b1
DELETE B WHERE B_id = b2                           -- b2 is no longer referenced
DELETE A WHERE A_id IN ( SELECT A_id FROM #deleted_b1 )  -- remove the replaced children
DROP TABLE #deleted_b1
So, in our previous example it could be written like this:
pB1->Set_A( pB2->Get_A() );
Before this call the following assertions are true:
assert ( {a1, a2} == pB1->Get_A() );
assert ( {a3, a4} == pB2->Get_A() );
After this call the following assertions are true:
assert ( {a3, a4} == pB1->Get_A() );
assert ( NULL == pB2 ); // pseudo-code: the source instance is gone
Likewise, in the CLR the pB2 instance becomes eligible for garbage collection. If there are other references to pB2, then the pB1->Set_A( pB2->Get_A() ) call should fail. The possible set policies can be represented like this:
typedef enum _CorSetPolicyTypeEnum
{
spNone = 0x0000,
spReplaceDelete = 0x0001,
spMergeDelete = 0x0002,
spReplaceCopy = 0x0004,
spMergeCopy = 0x0008,
...
} CorSetPolicyTypeEnum;
In the CLR we should have the appropriate meta model tables populated, and each class should have fields which implement the PK/FK relationships (referenced/referencing fields).
For the sake of simplicity, let's assume that in the case of a one-to-many relationship a referenced class (class B) holds the referencing instances (class A) using a collection type which implements either the ICloneable, ICollection, IEnumerable, and IList interfaces or the ICloneable, ICollection, IEnumerable, and IDictionary interfaces. That is, it is implemented via ArrayList- and Hashtable-like classes:
using System;
using System.Collections;
...
namespace SomeApplication
{
...
public class A
{
private B b_B;
private int m_n;
...
public B _B
{
get { return b_B; }
set { b_B = value; }
}
...
}
public class B
{
ArrayList vectorA;
...
public B()
{
vectorA = new ArrayList();
for ( int i = 0; i < N; ++i ) // N: some predefined number of children
{
A a = new A();
a._B = this;
...
vectorA.Add( a );
}
}
public ArrayList _A
{
get { return vectorA; }
set {
...
}
}
...
}
}
Depending on the Set operation policy, we may require that the referencing class also implement the ICloneable::Clone method as a deep copy. Note that if a referenced class itself references another class, it should implement the ICloneable interface using deep-copy semantics.
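A deep-copy Clone for the class A from the listing above might look like this (a sketch; B is the class defined above):

// a sketch: A implementing ICloneable with deep-copy semantics
public class A : System.ICloneable
{
    private B b_B;     // the referencing (FK-like) field
    private int m_n;

    public B _B { get { return b_B; } set { b_B = value; } }

    public object Clone()
    {
        // deep copy: value state is duplicated; if A referenced yet another
        // class, that object would have to be cloned recursively as well
        A copy = new A();
        copy.m_n = m_n;
        copy.b_B = b_B;  // the set policy re-points this field afterwards
        return copy;
    }
}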
The runtime environment controls:
- the collection that holds child (=referencing) instances of A;
- the A's data member field that stores a pointer to the corresponding instance of B.
This can be done via hooks into the implementation of the IDictionary::Item property, the IDictionary::Add and IDictionary::Remove methods, etc.
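As an illustration, a constrained child collection whose Add/Remove hooks keep the back-pointer consistent could look like this (our own class, assuming A and B from the listing above; the runtime would supply the equivalent logic transparently):

using System.Collections;

// Add/Remove maintain the A -> B back-pointer of the relationship
public class ChildCollection
{
    private ArrayList m_items = new ArrayList();
    private B m_owner;

    public ChildCollection(B owner) { m_owner = owner; }

    public int Add(A a)
    {
        a._B = m_owner;        // the referencing field must point to the owner
        return m_items.Add(a);
    }

    public void Remove(A a)
    {
        a._B = null;           // a mandatory role would make this an error instead
        m_items.Remove(a);
    }
}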
Every time we modify A and B instances, the environment should ensure that the relationships between class instances are properly maintained by enforcing corresponding constraints.
The deep-copy implementation of the ICloneable interface allows the CLR to use a mechanism similar to the deleted and inserted tables in SQL Server to verify constraints. If we were deleting records from table A, we could create a trigger to verify that none of those records reference table B:
CREATE TRIGGER tr_A_D
ON A FOR DELETE
AS
IF EXISTS ( select * from deleted d, B b where d.B_id = b.B_id )
begin
RAISERROR ('records from A point to B, cannot delete...', 16, 1)
ROLLBACK TRANSACTION
end
GO
Likewise, let a method foo modify the input collection of A:
public bool foo( ArrayList vectorA )
{
foreach ( A a in vectorA )
{
// ... modify or remove the child instances
}
...
}
During JIT compilation the runtime adds a "trigger" to verify the constraints. Before compilation to native code:
public bool foo( ArrayList vectorA )
{
method body
}
After JIT compilation to native code, the CLR adds the required implementation using information found in the TypeConstraint and TypeConstraint_Method tables (see 2.6.3 and 2.6.4 for the tables layout):
public bool foo( ArrayList vectorA )
{
{native code} // snapshot the affected instances ("deleted"/"inserted" copies)
{native code} // verify the referential integrity constraints
{native code}
method body
{native code} // re-verify after the body; roll back the changes on violation
}
In this case the runtime is able to "roll back" the changes if something goes wrong.
In more complex scenarios the deletion should cascade the removal (or any other changes) to all related instances with which the "deleted rows" had previously been in relationships for which the ON DELETE CASCADE option was specified. Otherwise, if ON DELETE NO ACTION is set, the CLR should prevent the method from deleting the corresponding instances of A:
public bool foo( ArrayList vectorA )
{
...
method body
...
}
2.4. Updated metadata hierarchy.
The extended metadata tree looks like this:
Assembly
Module
Type
TypeConstraint
MethodInfo
ParameterInfo
MethodConstraint
FieldInfo
FieldConstraint
EventInfo
PropertyInfo
...
We declare the new metadata tokens:
...
typedef mdToken mdMethodConstraintDef;
typedef mdToken mdFieldConstraintDef;
typedef mdToken mdTypeConstraintDef;
typedef enum CorTokenType
{
...
mdtTypeConstraintDef = 0x87000000,
mdtFieldConstraintDef = 0x88000000,
mdtMethodConstraintDef = 0x89000000,
} CorTokenType;
We also need to extend the types and functionality provided in the System.Reflection.Emit namespace to allow compilers/tools to emit constraints (metadata and the related MSIL instructions). For example, the TypeBuilder, MethodBuilder, and FieldBuilder classes have to be modified to support type, method, and field constraints.
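For example, the extended emit surface might look like this (a purely hypothetical sketch; none of these members exist in the real System.Reflection.Emit):

using System.Reflection.Emit;

// hypothetical sketch only
public abstract class MethodConstraintBuilder
{
    // returns a generator for the constraint body; the emitted IL would be
    // stored at an RVA referenced from the new MethodConstraint table row
    public abstract ILGenerator GetILGenerator();
}

public abstract class ExtendedMethodBuilder
{
    // adds a row to the MethodConstraint table owned by this method;
    // constraintType would be one of the CorConstraintType values
    public abstract MethodConstraintBuilder DefineMethodConstraint(int constraintType);
}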
2.5. Metadata tables changes.
2.5.1. Method table changes.
The MethodConstraintList column (index into MethodConstraint table) marks the first of a contiguous run of Method Constraints owned by the current method. This run continues to the smaller of:
- the last row of the MethodConstraint table
- the next run of Method Constraints, found by inspecting the MethodConstraintList column of the next row in the Method table (see the sketch after this list).
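In C# terms, the boundaries of such a run could be computed like this (a sketch; row indices are 1-based, as in the metadata specification):

public class RunExample
{
    // list[i] holds the MethodConstraintList value of Method row i (1-based);
    // the array must therefore have at least methodRows + 1 entries
    public static void GetRun(int[] list, int methodRows, int constraintRows,
                              int i, out int first, out int last)
    {
        first = list[i];
        last  = (i == methodRows)
              ? constraintRows        // the run extends to the end of the table
              : list[i + 1] - 1;      // or stops just before the next run
    }
}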
2.5.2. Field table changes.
The FieldConstraintList column (index into FieldConstraint table) marks the first of a contiguous run of Field Constraints owned by the current field. This run continues to the smaller of:
- the last row of the FieldConstraint table
- the next run of Field Constraints, found by inspecting the FieldConstraintList column of the next row in the Field table.
2.5.3. TypeDef table changes.
The TypeConstraintList column (index into TypeConstraint table) marks the first of a contiguous run of Type Constraints owned by this type (class). This run continues to the smaller of:
- the last row of the TypeConstraint table
- the next run of Type Constraints, found by inspecting the TypeConstraintList column of the next row in the TypeDef table.
Changes are summarized in this figure:
2.6 Metamodel Tables layout.
2.6.1. MethodConstraint describes constraints for each method.
Each row in the MethodConstraint table is owned by one, and only one, row in the Method table. The MethodConstraintList column (index into MethodConstraint table) in the Method table marks the first of a contiguous run of Method Constraints owned by the current method.
2.6.2. FieldConstraint describes constraints for each field of a class.
2.6.3. TypeConstraint describes constraints for each class type
Each row in the TypeConstraint table is owned by one, and only one, row in the TypeDef table. The TypeConstraintList column (an index into the TypeConstraint table) in the TypeDef table marks the first of a contiguous run of Type Constraints owned by the current type. This table looks similar to the sysforeignkeys/sysreferences tables in SQL Server.
Note that in order to describe the Setter/Getter methods we could use a MethodSemantics-like mechanism (as used by the Event/Property tables). In this case we also need to extend the MethodSemanticsAttributes type.
2.6.4. TypeConstraint_Method describes which methods of the referencing and referenced classes have to be verified in order to support referential integrity constraints between these classes.
Its role is similar to that of the MethodSemantics table when Events and Properties are implemented.
New tables are shown in this figure: