.NET Type Internals - From a Microsoft CLR Perspective

13 Sep 2007
Understand the internals of .NET types from a CLR perspective

Index

  1. Introduction
  2. User Defined Types (Value Types and Reference Types)
  3. Delegates
  4. Enumerations
  5. Arrays
  6. Generics
  7. References

1. Introduction

This article contains technical information about the implementation of different categories of types (Value Types, Reference Types, Delegates, etc.) in Microsoft CLR 2.0 (hereafter called the CLR). The concepts presented in this article are based on my analysis and study of type behavior in .NET using the Son Of Strike (SOS) debugging extensions for VS.NET 2005 and the C# compiler. These concepts are also discussed in various MSDN blogs, MSDN articles and books. However, they usually appear there in the context of a broader topic, where the finer and important points are not easily visible. I created this article to provide a single point of reference for these finer and important points about the inner workings of the CLR with regard to types.

This article assumes that readers have a working knowledge of the different categories of types in .NET. It does not discuss how to create and use these categories of types; rather, it progresses by examining the implementation details of different aspects of each category of types in the CLR. Initially I created this content as a document for my own reference, but felt like publishing it as an article so that it may be of some benefit to the .NET community. As such I would be very happy to receive any comments and feedback to improve the content of this article.

In .NET there are two main classifications of types and every type is derived from a Root Reference Type named System.Object (directly or indirectly through another base type).

  • Value Type
    • User Defined Value Types (Structures)
    • Enumeration
  • Reference Type
    • User Defined Types (Classes)
    • Array
    • Delegate

Apart from the simple difference of whether instances of these types are allocated on the stack or the heap, there are core internal design differences in the definition, behavior and instances of these types. Compilers and the CLR together create and maintain the differentiation between a Value Type and a Reference Type at compile time and at runtime. Understanding how the CLR implements these types and how it works with them will allow developers to design better .NET applications.

2. User Defined Types (Value Types and Reference Types)

Memory Location

Value Type instances declared as local variables are allocated on the stack (a Value Type that is a field of a Reference Type, or an element of an array, is stored inline inside the containing object on the GC heap). This is done primarily to reduce contention on the GC heap for types that simply represent basic data items. The contention could be due to heavy allocations, GC cycles and dynamic memory that needs to be requested from the OS. This in turn keeps the use of Value Types efficient. Reference Types are allocated on the GC heap.

Memory Layout

The instance of a Value Type contains JUST the values of its fields, whereas the instance of a Reference Type carries additional baggage to deal with GC, synchronization, AppDomain identity and type information. On a 32-bit CLR this extra baggage adds 8 bytes to every instance of a Reference Type. The variable referring to an instance of a Value Type represents the starting address of the Value Type instance allocated on the stack. The address of a local variable representing a Value Type instance is called a Managed Pointer and is normally used for reference parameters on the stack.

The variable referring to an instance of a Reference Type is called an Object Reference and is a pointer to the starting address of the Reference Type instance + 4 bytes. The first 4 bytes of any Reference Type instance hold the sync block index, which identifies an entry in a process-wide table that contains structures used for synchronizing access to instances of Reference Types. The next 4 bytes contain the address of an AppDomain specific memory location that holds the Method Table of the Reference Type for which the object is instantiated. This Method Table in turn contains a reference to another structure, which holds the Runtime Type Information of the corresponding Reference Type.

The presence of the pointer to the type's Method Table (and in turn the type's RTTI) in their instances is what makes Reference Types self describing types. Programs and the CLR can discover the type of a Reference Type instance and use that information in several runtime facilities, such as type casting, polymorphism and reflection. Value Type instances, on the other hand, lack the pointer to the type's Method Table and are simple chunks of memory that give no clue about what that memory is. For this reason Value Types are NOT self describing types.

The figures below show the memory layout of Value Type and Reference Type instances.

Figure-1: Value Type Instance Memory Layout - Block Diagram

Figure-2: Reference Type Instance Memory Layout - Block Diagram

Instantiation

An instance of a Value Type is created when it is declared. There is NO default constructor for a Value Type. During instantiation, all the fields of a Value Type are initialized to either 0 (Value Type fields) or null (Reference Type fields). Ideally, the fields of a Value Type instance should be used or accessed only after instantiating the Value Type using the new operator. Technically this is not required, as the fields are already zeroed out by the CLR when the Value Type instance is declared. However, languages like C# will NOT allow programs to use or access Value Type instance fields until they are explicitly set to some value or the instance is created using the new operator. This behavior saves additional constructor calls for simple Value Types that are heavily used in applications. Value Types can have parameterized constructors, which can be called explicitly to create an instance of a Value Type. In this case the parameterized constructor has to initialize all the fields of its Value Type.

Reference Types, on the other hand, MUST be allocated using the new operator and should have a default constructor or a parameterized constructor. While creating an instance of a Reference Type, the CLR first zero-initializes all the fields of the Reference Type and then calls either the default or a parameterized constructor, based on the constructor specified with the new operator.
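
The instantiation rules above can be sketched in C#. This is a minimal illustration, not taken from the original analysis; the names `Point` and `InstantiationDemo` are hypothetical and exist only for this example.

```csharp
using System;

// Hypothetical value type used only for illustration.
public struct Point
{
    public int X;
    public string Label;

    // Parameterized constructor: it has to assign every field.
    public Point(int x, string label)
    {
        X = x;
        Label = label;
    }
}

public static class InstantiationDemo
{
    // 'new' without arguments yields the zero-initialized instance.
    public static Point Zeroed()
    {
        return new Point();
    }

    // Setting every field by hand also satisfies the C# compiler,
    // without any constructor call.
    public static Point FieldByField()
    {
        Point p;
        p.X = 5;
        p.Label = "manual";
        return p;
    }

    public static void Main()
    {
        Point z = Zeroed();
        Console.WriteLine(z.X);             // 0
        Console.WriteLine(z.Label == null); // True

        Point m = FieldByField();
        Console.WriteLine(m.X + " " + m.Label); // 5 manual

        Point c = new Point(7, "ctor");
        Console.WriteLine(c.X + " " + c.Label); // 7 ctor
    }
}
```

Reading `p.X` in `FieldByField` before assigning it would be rejected by the C# compiler, even though the underlying memory is already zeroed.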

Variables and GC Roots

  • A variable of a Value Type is a direct representation of the address of the Value Type instance on the stack
  • A reference variable to a Value Type instance is called a Managed Pointer and is a pointer to the starting address of the Value Type instance on the stack
  • A variable of a Reference Type (UDT, Array, String, Delegate and Interface type variables) is a pointer to the Reference Type instance created on GC heap
  • CPU registers can contain Managed Pointers or Object References
  • AppDomain wide Handle tables contain GC handles that are pointers to the pinned Reference Type instances in memory. These Handle tables also contain Managed Pointers (or Object References?) to the static Value Type instances and Object References to the static Reference Type instances
  • Thread Local Storage (TLS) can contain Object References
  • FReachable Queue contains Object References of the Reference Types that are NOT referenced by any of the above variable types and for which finalize method call is pending

    CLR's Garbage Collector uses the above variables, also called GC roots, to track down the Object References during garbage collection phase. Any Reference Type instance located in GC heap, for which there is no Object Reference in any of the above variable types (except for the FReachable Queue), is considered a candidate for garbage collection and is removed from GC heap. If the Reference Type instance being removed implements the Finalize method, then the Object Reference is put on FReachable Queue for calling the Finalize method by a separate Finalizer thread. Once the Finalizer thread completes Finalize method call on an Object Reference, the corresponding Reference Type instance is removed from the GC heap.

    Constructors

    As discussed in the Instantiation section, Value Types cannot and do not have a default constructor. They can have parameterized constructors, which need to be called explicitly using the new operator. Constructors, for either Value Types or Reference Types, are just another instance method of a type. Compilers use them implicitly to initialize the type's fields and perform other initialization operations. Compilers do not allow programs to call a constructor explicitly on a type variable after it is instantiated. Technically, however, a constructor can be called after the instance is created (i.e. it can be called several times in the lifetime of a type instance). This can be done by cheating the compilers and writing some IL code: disassemble the assembly into IL using ILDASM, modify the IL to include calls to the constructor on the type instance, and reassemble the assembly using ILASM. However, this is not required. If the same type instance needs to be initialized multiple times in its lifetime, developers can define a public method on the type that does the same job as its constructor.

    The difference between a Value Type constructor and a Reference Type constructor is that a Reference Type constructor should always call a constructor of its base class (by default, the parameterless one). This is because the CLR will not do it automatically; it is the responsibility of the Reference Type. This is NOT required for a Value Type, because a Value Type does not have any default constructor and cannot act as a base class. For Reference Types, compilers (like C#) automatically insert the call to the base type constructor into the derived type constructor while compiling the source to MSIL.
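
The compiler-inserted base constructor call can be observed from C#. The sketch below is only an illustration; `CallLog`, `Base`, `Derived` and `ConstructorDemo` are hypothetical names, and the log is just a convenient way to record the call order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that records the order of constructor calls.
public static class CallLog
{
    public static readonly List<string> Entries = new List<string>();
}

public class Base
{
    public Base() { CallLog.Entries.Add("Base..ctor"); }
}

public class Derived : Base
{
    // No explicit ': base()' here. The C# compiler emits the call to
    // Base's parameterless constructor at the start of this constructor.
    public Derived() { CallLog.Entries.Add("Derived..ctor"); }
}

public static class ConstructorDemo
{
    public static string Run()
    {
        CallLog.Entries.Clear();
        new Derived();
        return string.Join(",", CallLog.Entries.ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // Base..ctor,Derived..ctor
    }
}
```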

    Inheritance (A Way to Distinguish Value Types from Reference Types)

    Value Types are derived from a special type named System.ValueType, which in turn is derived from System.Object. Any type derived from System.ValueType is treated as a Value Type by the CLR. The CLR uses this knowledge to work on the type's instances as discussed in the sections above and below. Reference Types do not have System.ValueType in their base class hierarchy. In fact, some compilers will not allow any type to derive from System.ValueType directly. Instead, compilers provide an indirect, enforceable way to specify that a type derives from System.ValueType (for example, the struct keyword in C#) and enforce certain definition rules on those types. Based on this enforceable language rule, compilers also generate MSIL code that is appropriate for Value Types. This is required because, when coding at the MSIL level (or in ILASM), any type can be derived from System.ValueType, and it can even contain a default constructor and incorrect MSIL instructions, which should NOT be present for a Value Type (because a default constructor is never called).

    Value Types cannot be used as base types for other types. A Value Type definitely cannot be a base class for a Reference Type, since the Value Type does not have a default constructor, its memory layout is different from that of a Reference Type, etc. But why can a Value Type not be a base class for another Value Type? The answer lies in the memory layout of Value Type instances. .NET uses the Method Table to achieve runtime polymorphism (virtual method dispatch). Since the instance of a Value Type does not contain a Method Table pointer, the CLR cannot use it to correctly dispatch virtual method calls (method dispatch internals are discussed in the following sections). Due to this, .NET cannot provide runtime polymorphism with Value Types, and without runtime polymorphism, inheritance is incomplete from an Object Oriented Design perspective. So all the compilers, including ILASM, mark any type derived from System.ValueType as sealed. Any type that is sealed CANNOT be used as a base, and this restriction is enforced by the CLR at runtime while loading a type. If the CLR finds that a type being loaded has a base type that is marked as sealed, it stops loading the derived type and throws a System.TypeLoadException.
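
The inheritance chain and the sealed flag described above can be checked with reflection. This is a small sketch; the `Money` struct and `InheritanceDemo` class are hypothetical names used only here.

```csharp
using System;

// Hypothetical user defined Value Type.
public struct Money
{
    public decimal Amount;
}

public static class InheritanceDemo
{
    public static void Main()
    {
        Type t = typeof(Money);

        // A struct derives from System.ValueType, which derives from System.Object.
        Console.WriteLine(t.BaseType == typeof(ValueType));              // True
        Console.WriteLine(typeof(ValueType).BaseType == typeof(object)); // True

        // The compiler marks the struct as sealed, so nothing can derive from it.
        Console.WriteLine(t.IsValueType); // True
        Console.WriteLine(t.IsSealed);    // True
    }
}
```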

    Equality and Hash code

    System.ValueType overrides the Equals and GetHashCode virtual methods of its base type System.Object. Any type derived from System.ValueType that uses Equals to compare its instances should override these two methods to improve performance. This is because the inherited Equals method uses reflection to compare two Value Type instances if it cannot compare their fields at bit level. To eliminate reflection and its associated performance penalty during equality comparison of two Value Type instances, it is better for a structure (a type derived from System.ValueType) to override these methods and perform a custom equality check and hash code generation.

    For Reference Type instances, which are directly or indirectly derived from System.Object, the Equals and GetHashCode methods are implemented in System.Object and work based on the Object Reference value, rather than the actual field values of the type instance.

    Most commonly these two methods are overridden together so that two instances of the Value Type that compare equal using Equals also return the same hash code from GetHashCode. Another reason to consider overriding the ToString, Equals and GetHashCode methods of System.Object for a user defined Value Type (struct) is that, without an override, when these methods are called using a Value Type variable the CLR has to box the Value Type instance and then call the method on the boxed instance.
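
A struct that overrides both methods might look like the following sketch. The `Point` type and the hash formula are assumptions chosen for illustration. The strongly typed `Equals(Point)` from `IEquatable<T>` is an addition that the text above does not mention; it lets callers compare without boxing the argument.

```csharp
using System;

// Hypothetical value type with custom equality.
public struct Point : IEquatable<Point>
{
    public readonly int X;
    public readonly int Y;

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }

    // Field by field comparison: no reflection and no boxing.
    public bool Equals(Point other)
    {
        return X == other.X && Y == other.Y;
    }

    // Override of System.Object.Equals, forwarding to the typed version.
    public override bool Equals(object obj)
    {
        return obj is Point && Equals((Point)obj);
    }

    // Equal instances must produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked { return (X * 397) ^ Y; }
    }
}

public static class EqualityDemo
{
    public static void Main()
    {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        Console.WriteLine(a.Equals(b));                         // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode());  // True
        Console.WriteLine(a.Equals(new Point(2, 1)));           // False
    }
}
```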

    Instance Copy Semantics (Method argument passing by value and variable assignment)

    A Value Type instance is always copied by value into another variable of the same Value Type. This copy occurs when a variable of a Value Type is passed as an argument to a method parameter expecting the same Value Type. It also occurs when a Value Type variable is assigned (NOT initialized during instantiation) to another variable of the same Value Type. A Value Type may contain fields that are primitive data types (which cannot be broken down further), Value Types and/or Reference Types. If a field is a primitive type or a Value Type, its value is copied as is. If a field is a Reference Type, the Object Reference present in the field is copied over to the corresponding field of the target Value Type instance.
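
The copy semantics above can be seen in a short C# sketch. The `Order` struct and `CopyDemo` class are hypothetical names. `StringBuilder` is used only because it is a mutable Reference Type, which makes the shared Object Reference visible.

```csharp
using System;
using System.Text;

// Hypothetical value type mixing a value field and a reference field.
public struct Order
{
    public int Quantity;        // copied by value
    public StringBuilder Notes; // only the Object Reference is copied
}

public static class CopyDemo
{
    // The parameter is a copy of the caller's instance.
    static void Mutate(Order o)
    {
        o.Quantity = 99;             // changes the copy only
        o.Notes.Append(" (edited)"); // changes the shared heap object
    }

    public static string Run()
    {
        Order original = new Order();
        original.Quantity = 1;
        original.Notes = new StringBuilder("note");

        Order copy = original; // assignment copies every field
        copy.Quantity = 2;     // does not affect 'original'

        Mutate(original);      // argument passing copies again

        return original.Quantity + "|" + copy.Quantity + "|" + copy.Notes;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // 1|2|note (edited)
    }
}
```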

    'this' Pointer

    The 'this' pointer available to the instance methods of a Value Type points to the address of the first instance field in the type's instance memory. So if an instance field is accessed within an instance method, it is accessed directly from the address pointed to by the 'this' pointer + the offset of the field as laid out by the CLR. This field offset is determined by the CLR while loading the type and is used for all instances of the type throughout the lifetime of the AppDomain.

    The 'this' pointer available to the instance methods of a Reference Type points to the address of the Method Table information block in the type's instance memory. So if an instance field is accessed within an instance method, it is accessed from the address pointed to by the 'this' pointer + 4 bytes + the offset of the field as laid out by the CLR. Adding 4 brings the pointer to the first instance field of the instance. This field offset is determined by the CLR while loading the type and is used for all instances of the type throughout the lifetime of the AppDomain.

    Boxing and Unboxing

    A Value Type instance on the stack, without any Method Table pointer, is called the unboxed value of the Value Type. If this value has to be assigned to a System.Object variable, then we need an instance of the Value Type in memory that has the Method Table pointer in its memory layout. This is required because System.Object is a Reference Type with virtual methods (Equals, GetHashCode and ToString), so any call to those methods requires a valid non-null Object Reference. But just hang on. Let us say that a program overrides the GetHashCode method and creates a custom one using the field data of the Value Type. Within the GetHashCode override, any fields of the type are accessed relative to the address in the 'this' pointer, which for a Value Type is the address of the first field. But if we box the Value Type instance and call GetHashCode through the resulting Object Reference, then the address in the Object Reference, which is the address of the Method Table pointer, would be passed as the 'this' pointer into the overridden method. This would cause problems during method execution, as the method expects 'this' to point at its field values instead of the Method Table pointer. To avoid this, when a Value Type instance is boxed the CLR makes sure that virtual methods called on the System.Object variable receive the address of the first field in the type instance. To this effect the CLR takes care of passing the correct starting address, either an Object Reference or a Managed Pointer, depending on whether the instance is a direct Value Type instance, a boxed Value Type instance or a direct Reference Type instance.

    Boxing involves allocating memory on the GC heap to hold the Value Type data (fields), copying the data from the Value Type instance into the field portion of the new instance, and creating an Object Reference that points to it. Unboxing involves creating a stack based instance of the type and copying the field data from the boxed instance to it. Unboxing happens when a System.Object variable is cast to a Value Type.
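
Because boxing and unboxing each copy the field data, the stack instance, the boxed instance and the unboxed instance are independent of one another. The following C# sketch illustrates this; `Counter` and `BoxingDemo` are hypothetical names.

```csharp
using System;

// Hypothetical value type with a single field.
public struct Counter
{
    public int Value;
}

public static class BoxingDemo
{
    public static string Run()
    {
        Counter c = new Counter();
        c.Value = 1;

        object boxed = c; // box: fields are copied to a new heap instance
        c.Value = 2;      // changes the stack instance only

        Counter unboxed = (Counter)boxed; // unbox: fields are copied back out
        unboxed.Value = 3;                // changes the new stack copy only

        return c.Value + "|" + ((Counter)boxed).Value + "|" + unboxed.Value;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // 2|1|3

        // The boxed instance is self describing; it knows its own type.
        object o = new Counter();
        Console.WriteLine(o.GetType() == typeof(Counter)); // True
    }
}
```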

    Non Virtual Instance Method Dispatch (Method Call)

    Non virtual instance method dispatch is the same for both Value Types and Reference Types. It is done using the call IL instruction, which requires that the type's instance pointer, the 'this' pointer, be available on the stack as the first method argument, before any other arguments of the method. While compiling the method's call site, the JIT burns the address of the method body (code), taken from the instance method slot, into the machine code. This method address is taken from the type's Method Table structure.

    As discussed in the Memory Layout section, the Object Reference points to a 4 byte address at the start of the type instance, which contains the address of the type's Method Table. The Method Table contains the following information, apart from other information.

    • Pointer into the AppDomain wide Interface Offset Table (IOT), giving the address of the starting slot where the IIDs of the interfaces implemented by the type are stored, in the sequence in which the type implements them. This is used during interface based dispatch, which is discussed in detail in a later section
    • Number of interfaces implemented by the type
    • Method table where each slot contains the address of the method code. The content of the addressed location contains a flag identifying the type of the code, MSIL or JITTED code (machine code). In case of MSIL the addressed location after the flag contains the MSIL code. In case of JITTED code the addressed location after the flag contains a JMP statement to the memory where the JITTED code is present. The Method table is arranged in the order of inherited virtual methods, implemented virtual methods, instance methods and static methods. CLR determines the offset of a method within this Method table based on its token, which is computed by the compiler during compilation of the assembly containing the type. Call site (method call instruction in the code using the type) contains reference to the method token. Since CLR knows the method slot offset based on its token, during JITTING phase all call sites are properly patched with method slot address based on the method token and the type of the variable (instance method calls) or the type pointed to by the Object Reference (virtual method calls)
    • Static fields. For primitive types each slot contains the value itself and for UDT the slot contains the address of the slot in the AppDomain wide Handle table. As described in the Variables and GC Roots section, each slot in the Handle table contains either Pinned Object Reference, Managed Pointer or Object Reference
    • Method slots for each interface implemented by the type

    The CLR does not check the validity of the instance pointer while making non virtual instance method calls with the call IL instruction. So it is practically possible to call an instance method using a type variable that is not yet instantiated or that holds a null Object Reference (you can try this by modifying the IL of an assembly). But remember: if the method being called using a null Object Reference accesses the type's fields, or calls other methods that access the type's fields, then a System.NullReferenceException will be thrown by the CLR while executing those instructions within the method. This is because the CLR checks the validity of the Object Reference while accessing instance fields. The fields are present in the memory allocated for the type's instance, and for a null reference no such memory exists.

    Allowing method calls on null Object References is considered dangerous and unpredictable. For this reason many .NET compilers, like C#, emit the callvirt IL instruction even when calling non virtual instance methods. The callvirt IL instruction instructs the CLR to generate machine code that first checks the validity of the Object Reference. If the Object Reference is null, the generated machine code raises an exception; otherwise the method is called. Remember, the use of callvirt by C# even for non virtual method calls does NOT carry a significant performance penalty. This is because callvirt on a non virtual instance method adds only one instruction, which compares the Object Reference with null, to the instruction that directly jumps to the method address for method execution. When callvirt is used on virtual instance methods, there are additional instructions that figure out the method address based on the type that is actually pointed to by the Object Reference. Virtual method dispatch is discussed in detail in the next section.

    The CLR converts the call IL instruction into the following machine code (Pseudocode)
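
The effect of the null check that callvirt adds can be observed from C#. In the sketch below, `Greeter` and `NullCallDemo` are hypothetical names. The method never touches an instance field, yet the call still fails on a null reference, which is consistent with the C# compiler emitting callvirt for this call.

```csharp
using System;

public class Greeter
{
    // Non virtual method that does not access any instance field.
    public string Hello()
    {
        return "hello";
    }
}

public static class NullCallDemo
{
    // Returns true if calling Hello() on a null reference throws.
    public static bool ThrowsOnNull()
    {
        Greeter g = null;
        try
        {
            return g.Hello() == null;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ThrowsOnNull()); // True
    }
}
```

If the same call were emitted with the call IL instruction instead (for example by editing the IL by hand), the method would be expected to run and return "hello", since it never dereferences 'this'.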

    • Move the Object Reference/Managed Pointer address into the ECX register. The 'this' pointer is always loaded into the ECX register and is the first hidden argument that needs to be passed while calling any instance method (virtual or non virtual). This is because the CLR uses the fastcall calling convention, which passes the first two method arguments in the ECX and EDX registers wherever possible for fast access
    • CLR burns the address of the method code and emits a call to that address. CLR does this only once for each call site during the JIT phase of the method containing the call site. CLR uses the Method Table of the type of the variable used for method call, NOT the Method Table of the object pointed to by the variable, while gathering the address of the method code

    Shown below is a close equivalent of the above pseudocode in IA32 instructions.

    mov ecx, esi                ; assuming esi has Object Reference
    call dword ptr ds:[567889h] ; call into the address where method code
                                ; resides
    

    Virtual Method Dispatch (Method Call)

    Virtual methods can only be instance methods. This is because the virtual method dispatch mechanism is used to call a method based on the type of the object pointed to by the variable, instead of the type of the variable itself. Since we need the object of a type, it makes sense to have virtual method dispatch mechanism on instance methods.

    In case of Value Types, virtual method dispatch (call) is similar in behavior to non virtual instance method dispatch. This is because Value Types cannot be inherited, and as such there is no need (enforced by the CLR) and no way (enforced by compilers like C#) to define polymorphic behavior (virtual methods) on Value Types. So languages like C# do not allow Value Types (struct) to define virtual methods. Also, the callvirt IL instruction is not required while calling methods on Value Types, because the instance memory is created and initialized for the Value Type variable at the declaration itself. So there is no point in checking the Value Type variable for null before calling methods using it. MSIL does allow a method on a Value Type to be defined as virtual and to be called using the callvirt IL instruction. Nevertheless, the CLR will optimize it to a regular instance method call while generating the machine code.

    The virtual dispatch mechanism applies to a Value Type only when the virtual methods of System.Object are called using the Value Type variable. In this case, if a System.Object method which is NOT overridden in the Value Type is called, the CLR will box the Value Type instance before calling the method. If a System.Object method is overridden in the Value Type and is called using the Value Type variable, then the CLR calls the method directly without any boxing or virtual dispatch. This difference in runtime behavior is achieved by the CLR using an IL instruction prefix named constrained. <Type Token>. The constrained. prefix checks whether the type represented by the token is a Value Type. If it is a Value Type and the Value Type implements the method being called by the following callvirt IL instruction, then a direct call instruction to the method is emitted. If the Value Type does not implement the method being called by the following callvirt IL instruction, then the Value Type instance is boxed and a virtual call to the method is emitted.

    In case of Reference Types virtual methods are dispatched based on the corresponding virtual method address present in the Method Table of the target object pointed to by the type variable.

    The CLR converts the callvirt IL instruction into the following machine code (Pseudocode) for virtual method calls.

    1. Compare the Object Reference value for null
    2. If null throw System.NullReferenceException, else continue
    3. Move the Object Reference into the ECX register
    4. Retrieve the address present in the method slot as identified by (Method Table address [first 4 bytes of value present in the memory pointed to by Object Reference] + Relative offset of the method from the starting address of the Method Table structure in memory)
    5. Call into the machine code present in the location pointed to by the address retrieved in step-4

    Shown below is a close equivalent of the above pseudocode in IA32 instructions.

    mov ecx, esi                ; move the Object Reference to ecx
    cmp dword ptr [ecx], ecx    ; try de-referencing ecx. If ecx is null, CLR
                                ; will catch the memory access violation and will
                                ; convert that to System.NullReferenceException
    mov eax, dword ptr [ecx]    ; move the Method Table structure address of the
                                ; type to eax
    call dword ptr [eax + 40h]  ; call into address where method code resides
    

    Remember the memory layout of the Method Table structure from the Non Virtual Instance Method Dispatch section. The method slots where the inherited virtual method addresses are stored are NOT duplicated if the virtual methods are overridden by derived types. Instead, if a virtual method is overridden in a derived class, the slot introduced by the top most parent type is filled, in the derived type's Method Table, with the address of the overriding method from the bottom most derived type. This technique allows the CLR to maintain the same offset for a method in the Method Table structure based on the method token.

    The CLR converts the callvirt IL instruction into the following machine code (Pseudocode) for non virtual instance method calls.
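
The observable difference between the two kinds of dispatch can be sketched in C#. `Animal`, `Dog` and `DispatchDemo` are hypothetical names. An overridden virtual method is selected by the type of the object, while a non virtual method (here hidden with `new`) is selected by the type of the variable.

```csharp
using System;

public class Animal
{
    public virtual string Speak() { return "Animal.Speak"; }
    public string Name() { return "Animal.Name"; }
}

public class Dog : Animal
{
    // Override: takes over the inherited virtual method slot.
    public override string Speak() { return "Dog.Speak"; }

    // Hiding: a separate non virtual method, unrelated to Animal.Name.
    public new string Name() { return "Dog.Name"; }
}

public static class DispatchDemo
{
    public static string Run()
    {
        Animal a = new Dog(); // variable type Animal, object type Dog
        return a.Speak() + "|" + a.Name();
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // Dog.Speak|Animal.Name
    }
}
```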

    • Compare the Object Reference value for null
    • If null throw System.NullReferenceException, else continue
    • Move the Object Reference into the ECX register
    • CLR burns the address of the method code and emits a call to that address. CLR does this only once for each call site during the JIT phase of the method containing the call site. CLR uses the Method Table of the type of the variable used for method call, NOT the Method Table of the object pointed to by the variable, while gathering the address of the method code

    Shown below is a close equivalent of the above pseudocode in IA32 instructions.

    mov ecx, esi                ; move the Object Reference to ecx
    cmp dword ptr [ecx], ecx    ; try de-referencing ecx. If ecx is null, CLR
                                ; will catch the memory access violation and will
                                ; convert that to System.NullReferenceException
    call dword ptr ds:[567889h] ; call into the address where method code resides
    

    Interface based Method Dispatch (Method Call)

    Interface based method dispatch happens when a method is called on a type instance using a variable of an interface type, where the interface is implemented by the type. Since any type can implement multiple interfaces in any order, the Method Table of a type instance is required while calling methods on it using a variable of one of the interfaces that the type implements. Since a Value Type instance does not have Method Table, it has to be first boxed before assigning it to an interface variable, so that any calls on the interface variable can use the Method Table of the Value Type from the boxed instance (Object Reference).
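
The boxing that happens when a Value Type is assigned to an interface variable has a visible consequence: calls made through the interface variable act on the boxed copy, not on the original instance. The sketch below illustrates this; `IResettable`, `Counter` and `InterfaceBoxingDemo` are hypothetical names.

```csharp
using System;

public interface IResettable
{
    void Reset();
    int Current { get; }
}

// Hypothetical value type implementing the interface.
public struct Counter : IResettable
{
    private int count;

    public Counter(int start) { count = start; }

    public void Reset() { count = 0; }

    public int Current { get { return count; } }
}

public static class InterfaceBoxingDemo
{
    public static string Run()
    {
        Counter c = new Counter(5);

        IResettable r = c; // boxes 'c'; 'r' refers to the heap copy
        r.Reset();         // resets the boxed copy only

        return c.Current + "|" + r.Current;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // 5|0
    }
}
```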

    Once boxed, interface based method dispatch on a boxed Value Type instance and on a Reference Type instance is the same. The callvirt IL instruction is used to dispatch interface method calls. The CLR converts the callvirt IL instruction into the following machine code (Pseudocode).

    1. Compare the Object Reference value for null
    2. If null throw System.NullReferenceException, else continue
    3. Move the Object Reference into ECX register
    4. Move the value of the Object Reference, which is the Method Table structure address for the type into EAX register
    5. Retrieve the address at 12 bytes offset within the Method Table structure. This memory slot contains the starting address of the interface slots for the interfaces implemented by the type in the AppDomain wide IOT. The IOT contains one interface slot for each interface implemented by each type loaded into the AppDomain. This means there will be multiple interface slots for the same interface, which is implemented by multiple types loaded into the AppDomain. This interface slot in turn contains the memory location within the Method Table structure of the type, which is the starting point of method table for that interface
    6. Starting from the address retrieved in step-5 find the interface slot offset that corresponds to the variable's interface type. This search is done based on the interface IID, which is unique across the process for distinct interface types implemented by loaded types within all AppDomains. This offset has to be computed during runtime by CLR every time the call site is executed because the offset of the variable's interface type may be different for different concrete types implementing the interface
    7. The address in this offset location contains the starting address of the interface method table for the variable's interface type, within the Method Table structure of the instance type. This address is moved to the EAX register
    8. CLR uses the interface method token and generates the offset of the method within the interface type. A call instruction is emitted to address that is computed using the address obtained in step-7 + offset of the method obtained at the beginning of this step. The method offset is burned into the machine code generated at the call site

    Shown below is a close IA32 equivalent of the above pseudocode.

    mov ecx, esi                    ; move the Object Reference to ecx
    cmp dword ptr [ecx], ecx        ; try de-referencing ecx. If ecx is null, CLR
                                    ; will catch the memory access violation and
                                    ; convert it to System.NullReferenceException
    mov eax, dword ptr [ecx]        ; move the Method Table structure address of
                                    ; the type to eax
    mov eax, dword ptr [eax + 0ch]  ; move the address of the starting IOT slot for
                                    ; this type into eax
    ...                             ; find the offset (02h) of the interface
                                    ; starting from eax
    mov eax, dword ptr [eax + 02h]  ; move the address of the IMT starting slot for
                                    ; the interface into eax
    call dword ptr [eax + 3h]       ; call into the address where the method code resides
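
    The null check in the first two steps is visible from managed code. The following minimal sketch declares its own stand-in IDisplay interface (an assumption made so that the sample is self-contained) and calls an interface method through a null reference.

    ```csharp
    using System;

    // Stand-in interface declared here only to keep the sample self-contained.
    public interface IDisplay
    {
        void Display();
    }

    public static class NullDispatchDemo
    {
        // Returns true if the interface call on a null reference raised
        // System.NullReferenceException.
        public static bool ThrowsOnNull()
        {
            IDisplay display = null;
            try
            {
                display.Display();   // callvirt on a null Object Reference
                return false;
            }
            catch (NullReferenceException)
            {
                return true;
            }
        }

        public static void Main()
        {
            Console.WriteLine(ThrowsOnNull());
        }
    }
    ```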

    JIT does some optimizations for interface-based dispatch. If the same type is repeatedly used for method dispatch at a given call site, JIT can detect this and reduce the above 8 steps to a direct jump to the method address. This involves the overhead of maintaining a call counter, checking for a specific number of calls, storing the current type's Method Table address and method address, comparing the stored Method Table address with the incoming type's Method Table address, and so on. CLR 2.0 uses this technique to dramatically improve the performance of interface-based method calls. However, it may also incur a severe performance penalty if the call site is supplied with instances of different types very frequently.
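
    The trade-off described above can be pictured with a small model written in C#. This is a conceptual sketch only and not how CLR implements the optimization: CallSiteModel, its fields and its caching policy are assumptions made for illustration, and the call counter mentioned above is left out. The model remembers the last type seen at a call site and counts how often the remembered entry can be reused.

    ```csharp
    using System;

    public interface IDisplay { string Display(); }
    public class TypeA : IDisplay { public string Display() { return "A"; } }
    public class TypeB : IDisplay { public string Display() { return "B"; } }

    // Conceptual model of one call site; names and policy are assumptions.
    public class CallSiteModel
    {
        private Type _cachedType;                      // stands in for the stored Method Table address
        private Func<IDisplay, string> _cachedTarget;  // stands in for the stored method address
        public int Hits;
        public int Misses;

        public string Invoke(IDisplay target)
        {
            Type incoming = target.GetType();
            if (incoming == _cachedType)
            {
                Hits++;                                // fast path: reuse the stored target
                return _cachedTarget(target);
            }
            Misses++;                                  // slow path: full lookup, then re-cache
            _cachedType = incoming;
            _cachedTarget = delegate(IDisplay d) { return d.Display(); };
            return _cachedTarget(target);
        }
    }

    public static class CallSiteDemo
    {
        // Returns { hits, misses } after feeding the inputs through one call site.
        public static int[] Run(IDisplay[] inputs)
        {
            CallSiteModel site = new CallSiteModel();
            foreach (IDisplay d in inputs) site.Invoke(d);
            return new int[] { site.Hits, site.Misses };
        }

        public static void Main()
        {
            int[] same = Run(new IDisplay[] { new TypeA(), new TypeA(), new TypeA(), new TypeA() });
            int[] mixed = Run(new IDisplay[] { new TypeA(), new TypeB(), new TypeA(), new TypeB() });
            Console.WriteLine("same type:   hits={0} misses={1}", same[0], same[1]);
            Console.WriteLine("mixed types: hits={0} misses={1}", mixed[0], mixed[1]);
        }
    }
    ```

    With a single type the model misses once and then always hits. With alternating types every call misses, which corresponds to the penalty case described above.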

    Delegates

    A delegate is a special kind of Reference Type. Each delegate type is a UDT (class). The signature of the Invoke method of a delegate UDT must match the signature of the method that the delegate is used to call. The bodies of the Invoke method and of its asynchronous counterpart, BeginInvoke, are generated at runtime by CLR during instantiation of the delegate UDT. The delegate UDT constructor takes two parameters: the target object and the address of the target method (the address where the method code resides) to invoke on the target object. The target object can be null, in which case the method is considered static and is invoked based on the type of the target object variable. The target object can be a Reference Type instance or a Value Type instance. A Value Type instance must be boxed before it is passed to the delegate instance.
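
    Both cases for the target object can be checked through the Delegate.Target property. In the sketch below, Point and DelegateTargetDemo are hypothetical names chosen for illustration. A delegate created over a static method reports a null target, and a delegate created over a struct instance method holds a boxed copy of the struct.

    ```csharp
    using System;

    // Hypothetical value type used only for this illustration.
    public struct Point
    {
        public int X;
        public void MoveRight() { X++; }
    }

    public static class DelegateTargetDemo
    {
        public static void StaticMethod() { }

        // A delegate over a static method has no target object.
        public static bool StaticTargetIsNull()
        {
            Action a = new Action(StaticMethod);
            return a.Target == null;
        }

        // Returns { X of the original struct, X of the boxed copy held by the delegate }.
        public static int[] InvokeOnStruct()
        {
            Point p = new Point();
            Action move = new Action(p.MoveRight);  // p is boxed; the delegate keeps the box
            move();                                 // mutates the boxed copy only
            Point boxedCopy = (Point)move.Target;
            return new int[] { p.X, boxedCopy.X };
        }

        public static void Main()
        {
            Console.WriteLine(StaticTargetIsNull());
            int[] xs = InvokeOnStruct();
            Console.WriteLine("original X = {0}, boxed X = {1}", xs[0], xs[1]);
        }
    }
    ```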

    Delegate based Method Dispatch (Method Call)

    But how can we get the address of the target method in the managed world? MSIL has two opcodes that load the address of a type's method onto the stack: ldftn and ldvirtftn. ldftn loads the method address onto the stack from a method token. The type that declares the method can be retrieved from the method token; CLR uses it to look up the type's Method Table and gets the address of the method from there. ldvirtftn is used to load the address of a virtual method. It requires an Object Reference on the stack, which it uses to get the Method Table and then determine the method address based on the method token. Compilers (like C#) use one of these two IL instructions when a method name is specified as the argument to a delegate UDT constructor, depending on whether the method is virtual, non-virtual or static. This process of retrieving and storing the method address within a delegate instance is called delegate binding. It is the costly operation in the whole process of executing a method through a delegate instance. Once a method is bound to a delegate instance, CLR generates the Invoke method body, which is very efficient and consists of the following machine code (pseudocode) instructions.

    • Copy the target Object Reference into ECX register
    • Call into the method address stored inside the delegate instance

    Shown below is a close IA32 equivalent of the above pseudocode.

    mov ecx, [ecx + 0ch]           ; ecx contains the delegate instance address; the
                                   ; slot 12 bytes into the delegate instance holds
                                   ; the target Object Reference
    call dword ptr ds:[567889h]    ; call into the address where the method
                                   ; code resides
    

    Also, the Invoke method of a delegate type created by CLR at runtime uses the jmp IL instruction to jump to the target method. The jmp instruction simply jumps from the current method to the destination method without clearing the stack, so the arguments passed at the call site are available to the method being jumped to. The return address on the stack is the return address of the original caller of the Invoke method, so when the target method completes, it returns directly to that caller, bypassing the Invoke method. Once bound, the performance of delegate-based method dispatch therefore almost equals the performance of a virtual method call.
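
    Delegate binding over a virtual method can be illustrated as follows. Animal, Dog and BindingDemo are hypothetical names. Because the method address is resolved from the Object Reference when the delegate is created, a delegate created through a base-typed variable ends up calling the override of the actual instance type.

    ```csharp
    using System;

    // Hypothetical class hierarchy used only for this illustration.
    public class Animal
    {
        public virtual string Speak() { return "Animal"; }
    }

    public class Dog : Animal
    {
        public override string Speak() { return "Dog"; }
    }

    public static class BindingDemo
    {
        public static string StaticName() { return "static"; }

        // Virtual method: the C# compiler emits ldvirtftn, so the address is
        // taken from the Method Table of the runtime type (Dog).
        public static string BindVirtual()
        {
            Animal animal = new Dog();
            Func<string> bound = new Func<string>(animal.Speak);
            return bound();
        }

        // Static method: the C# compiler emits ldftn with a null target.
        public static string BindStatic()
        {
            Func<string> bound = new Func<string>(StaticName);
            return bound();
        }

        public static void Main()
        {
            Console.WriteLine(BindVirtual());
            Console.WriteLine(BindStatic());
        }
    }
    ```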

    Multiple Method Dispatch using Delegate Chain

    Delegate instances can be combined to form a chain, where the last added instance is at the head of the chain. When the delegate instance at the head of a delegate chain is invoked, it passes the invocation on to the next delegate instance in the chain. This passing continues until there is no next delegate instance. The delegate-based method calls start at this final delegate instance (the one that was added first and is last in the chain) and work back until the delegate instance at the head of the chain calls the method it wraps.

    When a delegate chain is executed, the return value and the output parameter values are the ones set by the last delegate handler added to the chain (that is, the first delegate in the chain, the head of the chain).
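
    The invocation order and the return value rule can be checked with a short sample. ChainDemo and its two handler methods are hypothetical names used only here. The handlers run in the order they were added, and the caller receives the value returned by the handler that was added last.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class ChainDemo
    {
        // Records the order in which the handlers actually ran.
        public static readonly List<string> Log = new List<string>();

        private static int First() { Log.Add("first"); return 1; }
        private static int Second() { Log.Add("second"); return 2; }

        // Builds a two-element chain, invokes it and returns the result.
        public static int Run()
        {
            Log.Clear();
            Func<int> chain = First;
            chain += Second;          // Second is added last
            return chain();           // only the value of the last handler is returned
        }

        public static void Main()
        {
            int result = Run();
            Console.WriteLine("order = {0}, result = {1}", string.Join(",", Log.ToArray()), result);
        }
    }
    ```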

    Enumerations

    An Enumeration is a special category of Value Type. Each Enumeration type contains a single instance field that holds the value of the enumeration and has the same data type as the enumeration's underlying type (by default, System.Int32 in C#). Each Enumeration should also contain one or more static literal values representing the named data constants exposed by the Enumeration type. CLR allows explicit conversion between the named data constants of an Enumeration type and the underlying data type of the Enumeration. So, if an Enumeration is of type System.Int32, one of its named data constants can be assigned to a variable of type System.Int32 using type casting, and vice versa. The reverse conversion is dangerous and needs to be used judiciously. This is because CLR does not throw any exception if an integer value that does not correspond to one of the data constants exposed by the Enumeration type is assigned to an Enumeration variable using explicit type casting. Rather, it simply assigns the value to the internal instance field of the Enumeration instance. After this cast, the Enumeration instance contains an invalid value, which is a purely logical error that may cause unpredictable results in the program.
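
    The unchecked reverse conversion is easy to reproduce. In this sketch, Color and EnumCastDemo are hypothetical names. The cast from an undefined integer succeeds silently, and Enum.IsDefined is one way a program can guard against such values.

    ```csharp
    using System;

    // Hypothetical enumeration used only for this illustration.
    public enum Color { Red = 0, Green = 1, Blue = 2 }

    public static class EnumCastDemo
    {
        // The explicit cast never throws, even for values with no named constant.
        public static Color FromInt(int raw)
        {
            return (Color)raw;
        }

        // Guard that reports whether the raw value matches a named constant.
        public static bool IsValid(int raw)
        {
            return Enum.IsDefined(typeof(Color), raw);
        }

        public static void Main()
        {
            Color bogus = FromInt(42);
            Console.WriteLine(bogus);          // prints 42, since no name matches
            Console.WriteLine(IsValid(42));    // prints False
            Console.WriteLine(IsValid(1));     // prints True
        }
    }
    ```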

    The named data constants of Enumeration types, when assigned to a variable or used in expressions (such as switch cases), are replaced by the compiler with the actual constant value at compile time. This is because MSIL, and thus CLR, has no mechanism to load a constant or default value for a field at runtime. All constant values are stored in metadata against the corresponding field. Compilers read the value from metadata and replace the name of the constant with the actual value during compilation. This has an interesting side effect if you ship an assembly with modified enumeration data constants and the client application is run without being recompiled against the modified assembly. In such cases the client application has a value X hard-coded in its code for a named enumeration data constant, whereas the same value X may correspond to a different named data constant in the modified assembly. This leads to unpredictable program behavior.
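
    That the named constants live in metadata as literal fields can be confirmed with reflection. Level and EnumLiteralDemo below are hypothetical names. The sample reads the static literal field for a named constant and retrieves the raw value stored against it.

    ```csharp
    using System;
    using System.Reflection;

    // Hypothetical enumeration used only for this illustration.
    public enum Level { Low = 1, High = 2 }

    public static class EnumLiteralDemo
    {
        // True if the named constant is a static literal field.
        public static bool IsStaticLiteral(string name)
        {
            FieldInfo field = typeof(Level).GetField(name);
            return field.IsLiteral && field.IsStatic;
        }

        // Reads the constant value recorded in metadata for the named constant.
        public static object RawValue(string name)
        {
            FieldInfo field = typeof(Level).GetField(name);
            return field.GetRawConstantValue();
        }

        public static void Main()
        {
            Console.WriteLine(IsStaticLiteral("Low"));
            Console.WriteLine(RawValue("High"));
        }
    }
    ```

    Because a compiler copies this value into the calling code, changing Low from 1 to another number in a new version of the assembly would not affect clients that were compiled against the old one.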

    Arrays

    Arrays are a special category of Reference Type. For each distinct array of a Value Type (including primitive types like int, byte, etc.) defined in a program, CLR creates and maintains a Reference Type at runtime, which is derived from the System.Array Reference Type. This synthesized array type has a constructor that takes the size of the array as its argument for a single dimensional or jagged array, and the dimension sizes for a multidimensional array. .NET programming languages expose this constructor to programs in different ways. For instance, C# exposes it using the typical C/C++ array index syntax, int[] i = new int[10];. For all arrays of Reference Types, CLR creates and maintains a single Reference Type named System.Object[] at runtime, which is derived from System.Array. This is because every element of a Reference Type array just holds an Object Reference, and all Object References have the same size and the same internal memory representation.
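
    The relationship between array types and System.Array can be inspected at runtime. ArrayTypeDemo is a hypothetical name used for this sketch. It checks the base type of a few array types and creates an int array through Array.CreateInstance, which is another way of reaching the same runtime array type.

    ```csharp
    using System;

    public static class ArrayTypeDemo
    {
        // True if the given array type derives directly from System.Array.
        public static bool DerivesFromArray(Type arrayType)
        {
            return arrayType.BaseType == typeof(Array);
        }

        // Creates an int array without the C# array syntax and returns its runtime type.
        public static Type CreateViaApi(int length)
        {
            Array created = Array.CreateInstance(typeof(int), length);
            return created.GetType();
        }

        public static void Main()
        {
            int[] numbers = new int[10];
            Console.WriteLine(DerivesFromArray(numbers.GetType()));   // int[]
            Console.WriteLine(DerivesFromArray(typeof(byte[])));      // byte[]
            Console.WriteLine(typeof(int[]) == typeof(byte[]));       // distinct types
            Console.WriteLine(CreateViaApi(10) == typeof(int[]));
        }
    }
    ```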

    Elements of single dimensional and jagged arrays are accessed or modified directly in the memory of the runtime array instance, based on their index. For this, CLR provides the special IL instructions ldelem and stelem, which retrieve and modify an array element based on its index and the array Object Reference available on the stack. Elements of multidimensional arrays, in contrast, are accessed or modified using method calls on the synthesized runtime type of the array. This makes jagged arrays much faster than multidimensional arrays, so jagged arrays are the recommended choice when multiple dimensions are required. Jagged arrays are CLS compliant but have been mistakenly documented as NOT CLS compliant. This fact is acknowledged in MSDN2.
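
    For reference, the two array shapes look as follows in C#. ArrayShapeDemo is a hypothetical name, and the sample makes no timing claims of its own. It only shows that both shapes hold the same data while being indexed with different syntax, which the compiler translates into the different access mechanisms described above.

    ```csharp
    using System;

    public static class ArrayShapeDemo
    {
        // Jagged array: an array of single dimensional arrays, indexed as [r][c].
        public static int SumJagged(int rows, int cols)
        {
            int[][] jagged = new int[rows][];
            for (int r = 0; r < rows; r++)
            {
                jagged[r] = new int[cols];
                for (int c = 0; c < cols; c++) jagged[r][c] = r * cols + c;
            }

            int sum = 0;
            for (int r = 0; r < rows; r++)
                for (int c = 0; c < cols; c++) sum += jagged[r][c];
            return sum;
        }

        // Multidimensional (rectangular) array: one object, indexed as [r, c].
        public static int SumRectangular(int rows, int cols)
        {
            int[,] grid = new int[rows, cols];
            for (int r = 0; r < rows; r++)
                for (int c = 0; c < cols; c++) grid[r, c] = r * cols + c;

            int sum = 0;
            for (int r = 0; r < rows; r++)
                for (int c = 0; c < cols; c++) sum += grid[r, c];
            return sum;
        }

        public static void Main()
        {
            Console.WriteLine(SumJagged(3, 4));
            Console.WriteLine(SumRectangular(3, 4));
        }
    }
    ```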

    Generics

    Generic Reference Types

    A Generic type can be an open type, where some or all of the Type Parameters are not yet replaced with a non-generic type or a closed generic type. A Generic type can also be a closed type, where all of its Type Parameters are replaced with non-generic types or closed generic types. Only closed Generic types can be instantiated. The reason is that methods of the Generic type may call methods on variables of the Type Parameter types, so without knowing the type of such a variable CLR cannot instantiate a type for it. For each closed Generic Type that is supplied with a Value Type for a Type Parameter, CLR creates a new type at runtime and uses it for instantiation and other purposes. For all closed Generic Types of a given Generic Type that are supplied with a Reference Type, CLR creates one type in which the Type Parameter that received a Reference Type is replaced with a special type named System.__Canon. Let this type be named Generic<T.System.__Canon>.
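
    The difference between open and closed Generic types can be checked with reflection. OpenClosedDemo is a hypothetical name, and List<T> is used simply as a familiar generic type. The open type reports unbound Type Parameters and cannot be instantiated, while the closed type can.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class OpenClosedDemo
    {
        // True if the type still has Type Parameters that are not bound.
        public static bool IsOpen(Type type)
        {
            return type.ContainsGenericParameters;
        }

        // Tries to create an instance; an open Generic type is rejected
        // with an ArgumentException.
        public static bool CanInstantiate(Type type)
        {
            try
            {
                Activator.CreateInstance(type);
                return true;
            }
            catch (ArgumentException)
            {
                return false;
            }
        }

        public static void Main()
        {
            Type open = typeof(List<>);
            Type closed = open.MakeGenericType(typeof(int));   // same as typeof(List<int>)

            Console.WriteLine("open:   IsOpen={0}, CanInstantiate={1}", IsOpen(open), CanInstantiate(open));
            Console.WriteLine("closed: IsOpen={0}, CanInstantiate={1}", IsOpen(closed), CanInstantiate(closed));
        }
    }
    ```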

    For each closed Generic Type of a given Generic Type with a distinct Reference Type Parameter, CLR creates a distinct type with the Type Parameter replaced by the supplied Reference Type. This type is a dummy type that serves type safety when instances are passed around. Its EEClass contains a pointer to the EEClass structure of the type Generic<T.System.__Canon>. CLR dispatches any method calls on the closed Generic Type instance using the Method Table of the Generic<T.System.__Canon> type. Note that EEClass is a companion structure to the Method Table and is created by CLR for each type loaded into the AppDomain. EEClass has all the type information for a given type, and the Method Table has a pointer to the EEClass.

    // IDisplay is assumed to be declared elsewhere in the Test namespace
    // with a single method: void Display();
    public class DisplayClass : IDisplay
    {
        public void Display()
        {
            Console.WriteLine("From DisplayClass");
        }
    }
    
    public struct DisplayStruct : IDisplay
    {
        public void Display()
        {
            Console.WriteLine("From DisplayStruct");
        }
    }
    
    public class Test<T> where T : IDisplay
    {
        public T _field;
        public void Show(T temp)
        {
            temp.Display();
        }
    }
    
    public class Program
    {
        public static void Main(string[] args)
        {
            // Reference Type as Type Parameter
            Test<DisplayClass> objTestClass = new Test<DisplayClass>();
            objTestClass.Show(new DisplayClass());
    
            // Value Type as Type Parameter
            Test<DisplayStruct> objTestStruct = new Test<DisplayStruct>();
            objTestStruct.Show(new DisplayStruct());
        }
    }

    When the above code is compiled using a C# compiler, it creates a type named Test`1<(Test.IDisplay) T>, which acts as a placeholder for the types that CLR creates at runtime. At runtime CLR creates a type named Test.Test`1[[Test.DisplayClass, ConsoleApplication5]] for the closed Generic Type Test<DisplayClass>. The prefix Test. is the namespace name and ConsoleApplication5 is the assembly name. This type is an empty type without any methods. All the methods that are defined in the open Generic Type Test<T> are defined in a class named Test.Test`1[[System.__Canon, mscorlib]].

    When the method Show is called on the variable objTestClass as shown in the above code snippet, CLR at runtime calls into the Show method defined by the type Test.Test`1[[System.__Canon, mscorlib]]. This design of keeping all the method definitions of closed Generic Types whose Type Parameter is a Reference Type in a single type reduces code bloat. The design works because every method call on a Type Parameter variable inside a Generic Type goes only through known interfaces or a known base class, so CLR need not rewrite the method IL for each Reference Type provided as a parameter. Also, any method arguments or return values involving Type Parameters are all Object References, which have the same memory layout and copy semantics for any Reference Type Parameter.

    For the closed Generic Type Test<DisplayStruct>, in contrast, CLR creates a non-empty type named Test.Test`1[[Test.DisplayStruct, ConsoleApplication5]]. This type has all the methods defined in the open Generic Type Test<T>. Though method call behavior on the Type Parameters is the same for a Value Type Parameter and a Reference Type Parameter, the copy semantics and memory layout are different. So if a method in the open Generic Type Test<T> returns a Type Parameter variable, the method must return the exact Value Type instance. For this reason it is not possible to have a common base class for all closed Generic Types constructed from an open Generic Type whose Type Parameter is a Value Type.

    There can be scenarios where an open Generic Type has multiple Type Parameters and the program creates closed Generic Types with a mix of Reference Types and Value Types for the Type Parameters. In this case, CLR creates one base type for each set of closed Generic Types that have the same combination of Value Types and any combination of Reference Types for the Type Parameters.

    When the method Show calls the method Display using the Type Parameter variable (temp.Display() as shown in the above code snippet), C# emits a callvirt on the appropriate interface method, callvirt instance void Test.IDisplay::Display() (or a callvirt on the base class method if the Type Parameter has a base class constraint). The callvirt IL instruction emitted by C# is also preceded by the constrained IL instruction. This is because C# does not know the type of the Type Parameter at compile time. The constrained IL instruction allows CLR to check the type of the Type Parameter variable at runtime, and if it is a Reference Type, a virtual dispatch to the method is done. If the type of the Type Parameter variable is a Value Type, the call is dispatched as explained in the Virtual Method Dispatch (Method Call) section.
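
    One observable consequence of the constrained call is that a Value Type argument is not boxed when its interface method is called through the Type Parameter. In the sketch below, ITick, TickStruct and ConstrainedDemo are hypothetical names. A call through the generic method changes the original struct, whereas a call through an interface-typed parameter only changes a boxed copy.

    ```csharp
    using System;

    // Hypothetical interface and value type, used only for this illustration.
    public interface ITick
    {
        void Tick();
    }

    public struct TickStruct : ITick
    {
        public int Ticks;
        public void Tick() { Ticks++; }
    }

    public static class ConstrainedDemo
    {
        // item.Tick() compiles to a constrained callvirt on ITick::Tick, so a
        // struct argument is called in place without boxing.
        public static void TickGeneric<T>(ref T item) where T : ITick
        {
            item.Tick();
        }

        // The struct is boxed when converted to ITick; the call mutates the box.
        public static void TickBoxed(ITick item)
        {
            item.Tick();
        }

        // Returns { Ticks after the generic call, Ticks after the boxed call }.
        public static int[] Run()
        {
            TickStruct s = new TickStruct();
            TickGeneric(ref s);
            int afterGeneric = s.Ticks;
            TickBoxed(s);
            return new int[] { afterGeneric, s.Ticks };
        }

        public static void Main()
        {
            int[] ticks = Run();
            Console.WriteLine("after generic = {0}, after boxed = {1}", ticks[0], ticks[1]);
        }
    }
    ```

    Both values are 1: the generic call incremented the struct in place, and the later call through the interface parameter left it unchanged.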

    Though the type of the Type Parameter is known at runtime, CLR does not modify the IL of the method. This makes sense when a Type Parameter is a Reference Type, because all the closed Generic Types with any Reference Type share the same method code for a method of the open Generic Type. But even when a closed Generic Type is constructed using only Value Types, CLR does not modify the IL of the closed Generic Type method to make non-constrained calls. This is one of the questions for which I do not have an answer yet.

    Generic Value Types

    CLR handles the Generic Value Types similar to that of the Generic Reference Types, like using System.__Canon type, using constrained virtual method calls, etc.

    References

  • License

    This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
