Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

A Possible Enhancement of the MSIL

0.00/5 (No votes)
24 Oct 2002 1  
Microsoft Intermediate Language (MSIL) may be improved by adding a few new instructions

Abstract

Microsoft’s .NET Framework is a new platform designed to simplify the development of distributed applications. At the bottom of the .NET lies the CLR, the so called common language runtime. CLR provides an environment that controls the .NET code, such as memory management, thread execution, code checking, security services and garbage collection. CLR also implements a strict infrastructure called CTS (common type subsystem) for checking the code and types, i.e. types created in one language can be safely used and referred to in another language. The managed execution environment solves certain problems related to the traditional native code execution. The CLR automatically handles the arrangement of object and their references, freeing them when they are no longer necessary, thus avoiding invalid references and memory leaks. The CLR does not rely on the native x86 instruction set. Instead, Microsoft has created an intermediate language (IL) that is generated by all compilers with support for the .NET platform. Microsoft Intermediate Language (MSIL) is a processor independent instruction set and includes instructions for execution control, direct memory access, exception handling, logic and arithmetic operations, as well as instructions for loading, storing and initialization of object. The MSIL code cannot be directly executed; it must be converted to a processor-specific code by a JIT (just-in-time) compiler. Any supported architecture must provide its specific JIT compiler.

When I was writing the code generator for the Delta Forth .NET compiler (http://www.dataman.ro/dforth) I discovered that the IL could be improved to a certain extent. This paper is about a proposed enhancement for the Microsoft Intermediate Language.

Microsoft Intermediate Language

MSIL is different from what we know about the traditional assembly languages. Although endowed with low level instructions, the language also works with high level concepts such as exception handling and memory management. The IL also has a virtual parameter stack that can hold objects of different types, such as strings, integers, floats, pointers, etc. The stack discipline is stricter, meaning that at the time of execution of a command, the stack must hold the exact number of parameters needed. At the opposite side, the x86 assembly language allows a random number of parameters on the stack and even provides and instruction for cleaning the stack upon leaving a procedure (see ret n).

Most of the instructions deal with stack manipulation:

  • LOAD instructions (ldarg, ldloc, ldnull, etc.) which push values onto the stack
  • STORE instructions (starg, stloc, stind, etc.) which pop values off the stack
  • DUP instruction – duplicates the element on top of stack
  • POP instruction – removes the element on top of stack

For more information about the IL instruction set, have a look at [2].

Although sufficient for achieving their goal, the IL instruction set seems to lead to the generation of inefficient code. In the development process of a Forth compiler for the .NET platform I discovered that many times the generated code is longer than it should be fact due to the lack of some simple instructions. In order for the reader to understand the code sequences that follow, it is appropriate to get to know some of the stack-handling primitives of the Forth programming language:

Forth Primitive

Stack Transition

Explanation

DUP

( A – A A )

Duplicates the element on top of stack

OVER

( A B – A B A )

Duplicates the second element on top of stack

SWAP

( A B – B A )

Swaps the two top-most elements on the stack

ROT

( A B C – B C A )

Rotates the three top-most elements on the stack

1+

(A – A+1)

Increments by 1 the element on top of stack

2+

(A – A+2)

Increments by 2 the element on top of stack

The stack transition column shows the stack status before and after the execution of the primitive. The two statuses are delimited by the dash sign.

Solving mathematical expressions in Forth is quite complicated for the programmer used to traditional imperative languages, however there’s nothing special except for the use of the Reverse Polish Notation (RPN). For further reading I recommend [3].

Let’s take for example the expression:

(A + B) * (A – B):

The postfix equivalent is:

A B + A B - *

The sequence of Forth primitives that solves the expression (assuming that the values A and B are already on the stack in this order) is:

OVER (A B – A B A)
OVER (A B A – A B A B)
+ (A B A B – A B C), where C = A + B
ROT (A B C – B C A)
ROT (B C A – C A B)
- (C A B – C D), where D = C – B
* (C D – E), where E = C * D

Understanding the code above will help you understand the IL code snippets that follow. Let’s move on and look…

Under the Hood

It is interesting to see the IL code generated at compilation of a program written in a language with support for .NET. My language of choice is C# which seems very promising. To repeat the experiments below you need Visual Studio .NET and .NET Framework IL Disassembler.

The class we will use for testing is very simple and is easily understood even by beginners:

class UnderTheHood
{
    private static int[] x = new int [] {1, 2, 3};
    private static int i = 0, j = 0, t = 0;
  
    static void Main(string[] args)
    {
        // Here we will insert code snippets

    }
}

Let’s take the first example:

x[i + 1] = i + 2;

The code generated by the compiler in the function Main is:

IL_0000:  ldsfld     int32[] CallTest.UnderTheHood::x
IL_0005:  ldsfld     int32 CallTest.UnderTheHood::i
IL_000a:  ldc.i4.1
IL_000b:  add
IL_000c:  ldsfld     int32 CallTest.UnderTheHood::i
IL_0011:  ldc.i4.2
IL_0012:  add
IL_0013:  stelem.i4
IL_0014:  ret

The first line and the second lines push on the stack the value of the static variable x and I respectively. The value of i gets incremented (by pushing 1 and performing the addition). The increment operation (this time with 2) is repeated in lines IL_000c and IL_0012. Eventually, the result is stored at its destination in line IL_0013.

At a first glance, we see that the code sequence computes the two indexes i+1 and i +2 respectively. If we recall the 1+ Forth primitive from the previous paragraph, we see that using it would lead to a greater efficiency especially since most modern processors have INC and DEC instructions. Translation from the IL to native code would be immediate. The modified sequence could look as follows:

IL_0000:	ldsfld		int32[] CallTest.UnderTheHood::x
IL_0005:	ldsfld		int32 CallTest.UnderTheHood::i
IL_000a:	ldc.i4.1
IL_000b:	add
IL_000c:	dup
IL_000d:	inc.1		// Non-existent mnemonic, see text

IL_000e:	stelem.i4
IL_000f:	ret

I chose the mnemonic inc.1, meaning that the top value on the stack is incremented by 1. Similarly we can write inc.2, dec.1, dec.2 for other operations. The gain is 7 bytes.

Let’s redo the experiment, this time with the following sequence:

x[i + 1] += 1;

The IL sequence in this case is:

.locals (	[0] int32[] CS$00000002$00000000,
	[1] int32 CS$00000002$00000001)
	
IL_0000:  ldsfld		int32[] CallTest.UnderTheHood::x
IL_0005:  dup
IL_0006:  stloc.0
IL_0007:  ldsfld		int32 CallTest.UnderTheHood::i
IL_000c:  ldc.i4.1
IL_000d:  add
IL_000e:  dup
IL_000f:  stloc.1
IL_0010:  ldloc.0
IL_0011:  ldloc.1
IL_0012:  ldelem.i4
IL_0013:  ldc.i4.1
IL_0014:  add
IL_0015:  stelem.i4
IL_0016:  ret

We see that the compiler has automatically generated two anonymous local variables to keep the values for x (line IL_0006) and i+1 (line IL_000f). This sequence can be substantially optimized using the OVER Forth primitive, which duplicates the second top-most element on the stack. The rewritten IL code looks like this:

IL_0000:  ldsfld		int32[] CallTest.UnderTheHood::x
IL_0005:  dup
IL_0006:  stloc.0
IL_0007:  ldsfld		int32 CallTest.UnderTheHood::i
IL_000c:  inc.1		// Non-existent mnemonic, see text

IL_000d:  over		// Non-existent mnemonic, see text

IL_000e:  over
IL_000f:  ldelem.i4
IL_0010:  ldc.i4.1
IL_0011:  add
IL_0012:  stelem.i4
IL_0013:  ret

The gain is 6 bytes.

A rather usual sequence is swapping elements found in two variables. The C# program looks like this:

t = i;
i = j;
j = t;

The equivalent IL code is:

IL_0000:	ldsfld	int32 CallTest.UnderTheHood::i
IL_0005:	stsfld	int32 CallTest.UnderTheHood::t
IL_000a:	ldsfld	int32 CallTest.UnderTheHood::j
IL_000f:	stsfld	int32 CallTest.UnderTheHood::i
IL_0014:	ldsfld	int32 CallTest.UnderTheHood::t
IL_0019:	stsfld	int32 CallTest.UnderTheHood::j
IL_001e:	ret

Note that the IL sequence follows the C# program closely. It would be interesting if the high-level language natively provided the programmer a SWAP instruction, so that previous code snippet could be rewritten as follows:

IL_0000:	ldsfld	int32 CallTest.UnderTheHood::i
IL_0005:	ldsfld	int32 CallTest.UnderTheHood::j
IL_000a:	swap	// Non-existent mnemonic, see text

IL_000f:	stsfld	int32 CallTest.UnderTheHood::j
IL_0014:	stsfld	int32 CallTest.UnderTheHood::i
IL_0019:	ret

The gain is here 9 bytes, without taking into account the space saved by not declaring an intermediate temporary variable (t). The code can be further refined by replacing the SWAP operation completely:

IL_0000:  ldsfld     int32 CallTest.UnderTheHood::i
IL_0005:  ldsfld     int32 CallTest.UnderTheHood::j
IL_000a:  stsfld     int32 CallTest.UnderTheHood::i
IL_000f:  stsfld     int32 CallTest.UnderTheHood::j
IL_0014:  ret

Conclusion

Even though at a first glance the gain is space is small, in reality things are slightly different. Such code sequences are relatively frequent in every-day applications. If we consider that a medium-sized application may contain approximately 200 of such optimizable constructs and that the medium gain is around 8 bytes per construct, we save 1.6KB of code space which is not negligible for embedded systems.

The space gain is not the only argument in the favor of using the new IL instructions. The native code generated by the JIT may benefit from using these instructions by a more efficient conversion, since some of the dedicated processor instructions can be used.

Biography

[1] Microsoft Developer Network, http://msdn.microsoft.com
[2] Common Language Infrastructure, Partition III (CIL Instruction Set)
[3] Forth Expressions, Valer BOCAN, NET Report, January 2002 (Romanian)
[4] Delta Forth .NET Development System, (C)1997-2002 Valer BOCAN, http://www.dataman.ro/dforth

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here