Introduction to CLR World

Pedro Benevides

4 Nov 2015 | CPOL | 5 min read

It is important to know how things work behind the scenes in .NET.

Introduction

The .NET Framework, and especially the C# language, is widely used all over the world because it is easy to learn, powerful, and productive. Yet only a few developers take the time to learn how .NET works behind the scenes. So let's start diving into this world.

Background

Languages in the framework are interoperable. This is possible, in part, thanks to a set of rules and patterns that every language must follow, which are specified in the CTS (Common Type System) and the CLS (Common Language Specification). In addition, every language in the framework is compiled into an intermediate language whose syntax is common to all of the framework's high-level languages. This language is MSIL (Microsoft Intermediate Language). That is, when we write code like this in C#:

public void FooSum()
{
    var number1 = 100;
    var number2 = 400;

    Console.WriteLine(number1 + number2);
}

The compiler turns your code into this IL:

.method public hidebysig 
	instance void FooSum () cil managed 
{
	// Method begins at RVA 0x2070
	// Code size 18 (0x12)
	.maxstack 2
	.locals init (
		[0] int32 number1,
		[1] int32 number2
	)

	IL_0000: ldc.i4.s 100
	IL_0002: stloc.0
	IL_0003: ldc.i4 400
	IL_0008: stloc.1
	IL_0009: ldloc.0
	IL_000a: ldloc.1
	IL_000b: add
	IL_000c: call void [mscorlib]System.Console::WriteLine(int32)
	IL_0011: ret
}
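To make the stack-based nature of these instructions easier to follow, here is a minimal sketch that emits roughly the same sequence at run time with System.Reflection.Emit. The names IlSumDemo and BuildSum are hypothetical, and the sketch differs from the listing above in two ways: it uses ldc.i4 for both constants, and it returns the sum instead of calling Console.WriteLine.

```csharp
using System;
using System.Reflection.Emit;

public static class IlSumDemo
{
    // Builds a small method at run time from the same kind of stack
    // instructions the C# compiler produced for FooSum.
    public static Func<int> BuildSum()
    {
        var method = new DynamicMethod("Sum", typeof(int), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();

        il.DeclareLocal(typeof(int)); // local 0, plays the role of number1
        il.DeclareLocal(typeof(int)); // local 1, plays the role of number2

        il.Emit(OpCodes.Ldc_I4, 100); // push the constant 100
        il.Emit(OpCodes.Stloc_0);     // pop it into local 0
        il.Emit(OpCodes.Ldc_I4, 400); // push the constant 400
        il.Emit(OpCodes.Stloc_1);     // pop it into local 1
        il.Emit(OpCodes.Ldloc_0);     // push local 0
        il.Emit(OpCodes.Ldloc_1);     // push local 1
        il.Emit(OpCodes.Add);         // pop both values, push their sum
        il.Emit(OpCodes.Ret);         // return the value on top of the stack

        return (Func<int>)method.CreateDelegate(typeof(Func<int>));
    }

    public static void Main()
    {
        Func<int> sum = BuildSum();
        Console.WriteLine(sum()); // prints 500
    }
}
```

Every instruction either pushes a value onto the stack or pops one off it, which is why the method never needs more than two stack slots.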

The basic workflow that a program goes through:

The code is written.
When the code is built, the C# compiler turns the C# code into IL.
As the code is executed, the CLR turns the IL code into machine code.

It is important to note that the conversion of IL into machine code happens on demand (just in time): only the code that is about to be executed is converted, method by method, the first time each method is called.
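One rough way to observe on-demand compilation is to time the first and second call to the same method. The sketch below is only an illustration: JitOnDemandDemo, Work, and TimeCall are hypothetical names, and the measured numbers vary by machine and runtime version, so no specific output should be expected.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public static class JitOnDemandDemo
{
    // NoInlining keeps Work a separate method, so it has to be compiled on its own.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int Work(int x)
    {
        return x * 2 + 1;
    }

    // Measures a single call to Work, in Stopwatch ticks.
    public static long TimeCall()
    {
        Stopwatch watch = Stopwatch.StartNew();
        Work(21);
        watch.Stop();
        return watch.ElapsedTicks;
    }

    public static void Main()
    {
        long first = TimeCall();  // Work is converted to machine code during this call
        long second = TimeCall(); // Work has already been compiled at this point
        Console.WriteLine("first call:  " + first + " ticks");
        Console.WriteLine("second call: " + second + " ticks");
    }
}
```

The first call usually takes noticeably longer, because it includes the time spent compiling Work.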

Note: At the time of writing, Microsoft is about to release a precompilation technology for building Windows apps, known as .NET Native. A big advantage is that it automatically compiles the release version of apps written in managed code directly to native code, so the .NET runtime does not have to be installed on the client's machine.

Memory Management

To manage this whole flow, the CLR works with an abstraction of the computer's memory, which is divided into several parts:

Local Variables: This part keeps all variables created inside the method that is currently executing.
Parameter Variables: As the name suggests, this part of memory is where the parameters passed to the method stay.
Static Attributes: Every static variable.
Stack: This is the most important part, because all the others "communicate" with it. Value-type objects used by the executing method stay here. In fact, everything passes through the stack, either as the object itself or as a reference to it. The stack size is declared up front, beginning with the entry point such as the Main method; in the IL above, this is the .maxstack directive.
Heap: All reference-type objects stay here. When we create a new object, the CLR places it in the heap, and when the object is about to be used, a reference to it (a pointer, to be more specific) is put on the stack. Every time the object is needed, the CLR reads its address from the stack and goes there to get the information.
Dynamic: Objects whose size is not known at compile time, only at run time, stay here, as is the case for most collections.

Types in .NET

The framework classifies types into two basic categories: value types and reference types.

A value type is the cheapest kind of object to use, because it directly contains its value (that is, the object carries its own data), and it is typically allocated on the stack. It is not possible to derive from a value type. Every value type derives from System.ValueType and has an implicit constructor that initializes it to the default value of that type. The most commonly used value types are bool, int, float, long, char, and any struct. When the program needs a value-type variable, the CLR reserves space of the right size and pushes the object onto the stack.

A reference type is kept in heap memory, which is more expensive to access because of the way the CLR treats reference types. Each object carries an overhead of 8 bytes (in a 32-bit process) for two extra fields: the Sync Block Index and the Object Type Pointer. The more important of the two is the Object Type Pointer, which holds a reference to a data structure describing the object's type. So, to get an element from the heap, the CLR reads the address held by the pointer on the stack, goes to that address in the heap to get the values, and can check the object's type by following the address saved in the Object Type Pointer.
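The practical difference between the two categories shows up when a variable is copied. The following is a minimal sketch; PointValue, PointReference, and CopyDemo are hypothetical names used only for illustration.

```csharp
using System;

public struct PointValue // value type: the variable holds the data itself
{
    public int X;
}

public class PointReference // reference type: the variable holds a reference to the object
{
    public int X;
}

public static class CopyDemo
{
    public static int CopyStruct()
    {
        var a = new PointValue { X = 1 };
        var b = a;  // copies the whole value
        b.X = 2;    // changes only the copy
        return a.X; // a is untouched
    }

    public static int CopyClass()
    {
        var a = new PointReference { X = 1 };
        var b = a;  // copies only the reference
        b.X = 2;    // changes the single shared object
        return a.X; // a sees the change
    }

    public static void Main()
    {
        Console.WriteLine(CopyStruct()); // prints 1
        Console.WriteLine(CopyClass());  // prints 2
    }
}
```

Assigning a struct duplicates its data, while assigning a class instance duplicates only the pointer to the same heap object.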

To see this in practice, we can debug a simple application in Visual Studio and watch how the CLR works. Let's do it:

Using the Code

Let's create a simple class called Person:

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

Now, in the Program.cs file of our console application, we create two instances of Person and give each one an Id and a Name, just to make things easier to follow.

static void Main(string[] args)
{
    var person1 = new Person();
    person1.Name = "Pedro";
    person1.Id = 1;

    var person2 = new Person();
    person2.Name = "John";
    person2.Id = 2;
}

Before running the application, let's put a breakpoint on the first line of Main and open two debug windows: the Memory window and the Registers window. Once the first person has been instantiated, let's copy the value of the EAX register and paste it into the Address textbox of the Memory window.

Now we can see our instance in heap memory.

Here we can see that our object person1 starts at address 0x025017D4 and ends at address 0x025017E0.

(0x025017D4): The initial address keeps the Sync Block Index, which is used when the object needs some type of synchronization.
(0x025017D8): This address contains the position in memory of the data structure that represents the Person type. If we have two objects of the same type, their Object Type Pointers have the same value, i.e., they point to the same structure.
(0x025017DC): Contains the position in heap memory of the value of the Name field, since string is a reference type.
(0x025017E0): Keeps the value of Id itself, since it is an integer, which is a value type.

After stepping through all the lines, we can see that the two objects are distinct but have the same type, since their Object Type Pointers hold the same address.
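The same observation can be checked from code, without the debugger. This is a minimal sketch: SameTypeDemo and its two helper methods are hypothetical names, and the comparison relies on GetType returning the same Type for every instance of a class.

```csharp
using System;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class SameTypeDemo
{
    // True when the two references point to different objects in the heap.
    public static bool AreDistinct(object a, object b)
    {
        return !ReferenceEquals(a, b);
    }

    // True when both objects are described by the same type information.
    public static bool HaveSameType(object a, object b)
    {
        return a.GetType() == b.GetType();
    }

    public static void Main()
    {
        var person1 = new Person { Id = 1, Name = "Pedro" };
        var person2 = new Person { Id = 2, Name = "John" };

        Console.WriteLine(AreDistinct(person1, person2));  // prints True
        Console.WriteLine(HaveSameType(person1, person2)); // prints True
    }
}
```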

So that's it. The CLR is a much bigger world, and this was just an introduction to what happens when we run our code.

History

5th November, 2015: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)