Abstract
Microsoft’s .NET Framework is a new platform designed to simplify the
development of distributed applications. At the bottom of the .NET lies the
CLR, the so called common language runtime. CLR provides an environment
that controls the .NET code, such as memory management, thread execution, code
checking, security services and garbage collection. CLR also implements a strict
infrastructure called CTS (common type subsystem) for checking the code
and types, i.e. types created in one language can be safely used and referred
to in another language. The managed execution environment solves certain problems
related to the traditional native code execution. The CLR automatically handles
the arrangement of object and their references, freeing them when they are no
longer necessary, thus avoiding invalid references and memory leaks. The CLR
does not rely on the native x86 instruction set. Instead, Microsoft has created
an intermediate language (IL) that is generated by all compilers with support
for the .NET platform. Microsoft Intermediate Language (MSIL) is a processor
independent instruction set and includes instructions for execution control,
direct memory access, exception handling, logic and arithmetic operations, as
well as instructions for loading, storing and initialization of object. The
MSIL code cannot be directly executed; it must be converted to a processor-specific
code by a JIT (just-in-time) compiler. Any supported architecture must
provide its specific JIT compiler.
When I was writing the code generator for the Delta Forth .NET compiler (http://www.dataman.ro/dforth)
I discovered that the IL could be improved to a certain extent. This paper is
about a proposed enhancement for the Microsoft Intermediate Language.
Microsoft Intermediate Language
MSIL is different from what we know about the traditional assembly languages.
Although endowed with low level instructions, the language also works with
high level concepts such as exception handling and memory management. The
IL also has a virtual parameter stack that can hold objects of different types,
such as strings, integers, floats, pointers, etc. The stack discipline is
stricter, meaning that at the time of execution of a command, the stack must
hold the exact number of parameters needed. At the opposite side, the x86
assembly language allows a random number of parameters on the stack and even
provides and instruction for cleaning the stack upon leaving a procedure (see
ret n).
Most of the instructions deal with stack manipulation:
- LOAD instructions (ldarg, ldloc, ldnull, etc.) which push values
onto the stack
- STORE instructions (starg, stloc, stind, etc.) which pop values off
the stack
- DUP instruction – duplicates the element on top of stack
- POP instruction – removes the element on top of stack
For more information about the IL instruction set, have a look at [2].
Although sufficient for achieving their goal, the IL instruction set seems
to lead to the generation of inefficient code. In the development process
of a Forth compiler for the .NET platform I discovered that many times the
generated code is longer than it should be fact due to the lack of some simple
instructions. In order for the reader to understand the code sequences that
follow, it is appropriate to get to know some of the stack-handling primitives
of the Forth programming language:
Forth Primitive |
Stack Transition |
Explanation |
DUP |
( A – A A ) |
Duplicates the element on top
of stack |
OVER |
( A B – A B A ) |
Duplicates the second element
on top of stack |
SWAP |
( A B – B A ) |
Swaps the two top-most elements
on the stack |
ROT |
( A B C – B C A ) |
Rotates the three top-most elements
on the stack |
1+ |
(A – A+1) |
Increments by 1 the element
on top of stack |
2+ |
(A – A+2) |
Increments by 2 the element
on top of stack |
The stack transition column shows the stack status before and after the execution
of the primitive. The two statuses are delimited by the dash sign.
Solving mathematical expressions in Forth is quite complicated for the programmer
used to traditional imperative languages, however there’s nothing special
except for the use of the Reverse Polish Notation (RPN). For further reading
I recommend [3].
Let’s take for example the expression:
(A + B) * (A – B):
The postfix equivalent is:
A B + A B - *
The sequence of Forth primitives that solves the expression (assuming that
the values A and B are already on the stack in this order) is:
OVER (A B – A B A)
OVER (A B A – A B A B)
+ (A B A B – A B C), where C = A + B
ROT (A B C – B C A)
ROT (B C A – C A B)
- (C A B – C D), where D = C – B
* (C D – E), where E = C * D
Understanding the code above will help you understand the IL code snippets
that follow. Let’s move on and look…
Under the Hood
It is interesting to see the IL code generated at compilation of a program
written in a language with support for .NET. My language of choice is C# which
seems very promising. To repeat the experiments below you need Visual Studio
.NET and .NET Framework IL Disassembler.
The class we will use for testing is very simple and is easily understood
even by beginners:
class UnderTheHood
{
private static int[] x = new int [] {1, 2, 3};
private static int i = 0, j = 0, t = 0;
static void Main(string[] args)
{
}
}
Let’s take the first example:
x[i + 1] = i + 2;
The code generated by the compiler in the function Main is:
IL_0000: ldsfld int32[] CallTest.UnderTheHood::x
IL_0005: ldsfld int32 CallTest.UnderTheHood::i
IL_000a: ldc.i4.1
IL_000b: add
IL_000c: ldsfld int32 CallTest.UnderTheHood::i
IL_0011: ldc.i4.2
IL_0012: add
IL_0013: stelem.i4
IL_0014: ret
The first line and the second lines push on the stack the value of the static
variable x and I respectively. The value of i gets incremented (by pushing
1 and performing the addition). The increment operation (this time with 2)
is repeated in lines IL_000c and IL_0012. Eventually, the result is stored
at its destination in line IL_0013.
At a first glance, we see that the code sequence computes the two indexes i+1
and i +2 respectively. If we recall the 1+ Forth primitive from the previous
paragraph, we see that using it would lead to a greater efficiency especially
since most modern processors have INC and DEC instructions. Translation from
the IL to native code would be immediate. The modified sequence could look as
follows:
IL_0000: ldsfld int32[] CallTest.UnderTheHood::x
IL_0005: ldsfld int32 CallTest.UnderTheHood::i
IL_000a: ldc.i4.1
IL_000b: add
IL_000c: dup
IL_000d: inc.1
IL_000e: stelem.i4
IL_000f: ret
I chose the mnemonic inc.1, meaning that the top value on the stack is incremented
by 1. Similarly we can write inc.2, dec.1, dec.2 for other operations. The
gain is 7 bytes.
Let’s redo the experiment, this time with the following sequence:
x[i + 1] += 1;
The IL sequence in this case is:
.locals ( [0] int32[] CS$00000002$00000000,
[1] int32 CS$00000002$00000001)
IL_0000: ldsfld int32[] CallTest.UnderTheHood::x
IL_0005: dup
IL_0006: stloc.0
IL_0007: ldsfld int32 CallTest.UnderTheHood::i
IL_000c: ldc.i4.1
IL_000d: add
IL_000e: dup
IL_000f: stloc.1
IL_0010: ldloc.0
IL_0011: ldloc.1
IL_0012: ldelem.i4
IL_0013: ldc.i4.1
IL_0014: add
IL_0015: stelem.i4
IL_0016: ret
We see that the compiler has automatically generated two anonymous local
variables to keep the values for x (line IL_0006) and i+1 (line IL_000f).
This sequence can be substantially optimized using the OVER Forth primitive,
which duplicates the second top-most element on the stack. The rewritten IL
code looks like this:
IL_0000: ldsfld int32[] CallTest.UnderTheHood::x
IL_0005: dup
IL_0006: stloc.0
IL_0007: ldsfld int32 CallTest.UnderTheHood::i
IL_000c: inc.1
IL_000d: over
IL_000e: over
IL_000f: ldelem.i4
IL_0010: ldc.i4.1
IL_0011: add
IL_0012: stelem.i4
IL_0013: ret
The gain is 6 bytes.
A rather usual sequence is swapping elements found in two variables. The C#
program looks like this:
t = i;
i = j;
j = t;
The equivalent IL code is:
IL_0000: ldsfld int32 CallTest.UnderTheHood::i
IL_0005: stsfld int32 CallTest.UnderTheHood::t
IL_000a: ldsfld int32 CallTest.UnderTheHood::j
IL_000f: stsfld int32 CallTest.UnderTheHood::i
IL_0014: ldsfld int32 CallTest.UnderTheHood::t
IL_0019: stsfld int32 CallTest.UnderTheHood::j
IL_001e: ret
Note that the IL sequence follows the C# program closely. It would be interesting
if the high-level language natively provided the programmer a SWAP instruction,
so that previous code snippet could be rewritten as follows:
IL_0000: ldsfld int32 CallTest.UnderTheHood::i
IL_0005: ldsfld int32 CallTest.UnderTheHood::j
IL_000a: swap
IL_000f: stsfld int32 CallTest.UnderTheHood::j
IL_0014: stsfld int32 CallTest.UnderTheHood::i
IL_0019: ret
The gain is here 9 bytes, without taking into account the space saved by
not declaring an intermediate temporary variable (t). The code can be further
refined by replacing the SWAP operation completely:
IL_0000: ldsfld int32 CallTest.UnderTheHood::i
IL_0005: ldsfld int32 CallTest.UnderTheHood::j
IL_000a: stsfld int32 CallTest.UnderTheHood::i
IL_000f: stsfld int32 CallTest.UnderTheHood::j
IL_0014: ret
Conclusion
Even though at a first glance the gain is space is small, in reality things
are slightly different. Such code sequences are relatively frequent in every-day
applications. If we consider that a medium-sized application may contain approximately
200 of such optimizable constructs and that the medium gain is around 8 bytes
per construct, we save 1.6KB of code space which is not negligible for embedded
systems.
The space gain is not the only argument in the favor of using the new IL instructions.
The native code generated by the JIT may benefit from using these instructions
by a more efficient conversion, since some of the dedicated processor instructions
can be used.
Biography
[1] Microsoft Developer Network, http://msdn.microsoft.com
[2] Common Language Infrastructure, Partition III (CIL Instruction Set)
[3] Forth Expressions, Valer BOCAN, NET
Report, January 2002 (Romanian)
[4] Delta Forth .NET Development System, (C)1997-2002 Valer BOCAN, http://www.dataman.ro/dforth