Whilst writing a previous blog post I stumbled across the .NET Interpreter, tucked away in the source code. Although, if I'd made even the smallest amount of effort to look for it, I'd have easily found it via the GitHub "magic" file search.
Usage Scenarios
Before we look at how to use it and what it does, it's worth pointing out that the Interpreter is not really meant for production code. As far as I can tell, its main purpose is to allow you to get the CLR up and running on a new CPU architecture. Without the interpreter you wouldn't be able to test any C# code until you had a fully functioning JIT that could emit machine code for you. For instance, see [ARM32/Linux] Initial bring up of FEATURE_INTERPRETER and [aarch64] Enable the interpreter on linux as well.
It also lacks a few key features, most notably debugging support. That is, you can't debug through C# code that has been interpreted, although you can of course debug the interpreter itself. From Tiered Compilation step 1:
... the interpreter is not in good enough shape to run production code as-is. There are also some significant issues if you want debugging and profiling tools to work (which we do).
You can see an example of this in Interpreter: volatile ldobj appears to have incorrect semantics? (thanks to alexrp for telling me about this issue). There are also a fair number of TODO comments in the code, although I haven't verified what (if any) specific C# code breaks due to the missing functionality.
However, I think another really useful scenario for the Interpreter is to help you learn about the inner workings of the CLR. It's only 8,000 lines long, it's all in one file and, most significantly, it's written in C++. The code that the CLR/JIT uses when compiling for real is spread across many files (the JIT on its own is over 200,000 L.O.C, spread across 100s of files) and large amounts of it are hand-written in raw assembly.
In theory the Interpreter should work in the same way as the full runtime, albeit not as optimised. This means that it is much simpler, and those of us who aren't CLR and/or assembly experts have a chance of working out what's going on!
Enabling the Interpreter
The Interpreter is disabled by default, so you have to build the CoreCLR from source to make it work (it used to be the fallback for ARM64 but that's no longer the case). Here's the diff of the changes you need to make:
--- a/src/inc/switches.h
+++ b/src/inc/switches.h
@@ -233,5 +233,8 @@
#define FEATURE_STACK_SAMPLING
#endif // defined (ALLOW_SXS_JIT)
+
+#define FEATURE_INTERPRETER
+
#endif // !defined(CROSSGEN_COMPILE)
You also need to enable some environment variables; the ones that I used are in the table below. For the full list, take a look at Host Configuration Knobs and search for Interpreter.
Name | Description |
---|---|
Interpret | Selectively uses the interpreter to execute the specified methods |
InterpreterDoLoopMethods | If set, don't check for loops; start by interpreting all methods |
InterpreterPrintPostMortem | Prints summary information about the execution to the console |
DumpInterpreterStubs | Prints all interpreter stubs that are created to the console |
TraceInterpreterEntries | Logs entries to interpreted methods to the console |
TraceInterpreterIL | Logs individual instructions of interpreted methods to the console |
TraceInterpreterVerbose | Logs interpreter progress with detailed messages to the console |
TraceInterpreterJITTransition | Logs when the interpreter determines a method should be JITted |
To test out the Interpreter, I will be using the code below:
public static void Main(string[] args)
{
    // Default to 1,000,000 iterations unless a count is passed on the command line
    var max = 1000 * 1000;
    if (args.Length > 0)
        int.TryParse(args[0], out max);
    var timer = Stopwatch.StartNew();
    for (int i = 1; i <= max; i++)
    {
        // Print a progress message every 100,000 iterations
        if (i % (1000 * 100) == 0)
            Console.WriteLine(string.Format("Completed {0,10:N0} iterations", i));
    }
    timer.Stop();
    Console.WriteLine(string.Format("Performed {0:N0} iterations", max));
    Console.WriteLine(string.Format("Took {0:N0} msecs", timer.ElapsedMilliseconds));
    Console.WriteLine();
}
On my machine, this gives the following results for 100,000 iterations:
Run | Compiled (msecs) | Interpreted (msecs) |
---|---|---|
1 | 11 | 4,393 |
2 | 11 | 4,089 |
3 | 9 | 4,416 |
So yeah, you don't want to be using the interpreter for any performance-sensitive code!!
Diagnostic Output
In addition, diagnostic output is produced. Note that this is from a single iteration of the loop, otherwise it becomes too verbose to read.
Generating interpretation stub (# 1 = 0x1, hash = 0x91b7d02e) for ConsoleApplication.Program:Main.
Skipping ConsoleApplication.Program:.cctor
Entering method #1 (= 0x1): ConsoleApplication.Program:Main(class).
arguments:
0: class: 0x0000000002C50568 (System.String[]) [...]
START 1, ConsoleApplication.Program:Main(class)
0: nop
0x1: call
Skipping ConsoleApplication.Stopwatch:.cctor
Skipping DomainBoundILStubClass:IL_STUB_PInvoke
Skipping ConsoleApplication.Stopwatch:StartNew
Skipping ConsoleApplication.Stopwatch:.ctor
Skipping ConsoleApplication.Stopwatch:Reset
Skipping ConsoleApplication.Stopwatch:Start
Skipping ConsoleApplication.Stopwatch:GetTimestamp
Returning to method ConsoleApplication.Program:Main(class), stub num 1.
0x6: stloc.0
loc0 : class: 0x0000000002C50580 (ConsoleApplication.Stopwatch) [...]
loc1 : int: 0
loc2 : bool: false
0x7: ldc.i4.1
0x8: stloc.1
loc0 : class: 0x0000000002C50580 (ConsoleApplication.Stopwatch) [...]
loc1 : int: 1
loc2 : bool: false
0x9: br.s
0x27: ldloc.1
0x28: ldc.i4.2
0x29: clt
0x2b: stloc.2
loc0 : class: 0x0000000002C50580 (ConsoleApplication.Stopwatch) [...]
loc1 : int: 1
loc2 : bool: true
0x2c: ldloc.2
0x2d: brtrue.s
0xb: nop
0xc: ldstr
0x11: ldloc.1
0x12: box
0x17: call
Returning to method ConsoleApplication.Program:Main(class), stub num 1.
0x1c: call
Completed 1 iterations
Returning to method ConsoleApplication.Program:Main(class), stub num 1.
0x21: nop
0x22: nop
0x23: ldloc.1
0x24: ldc.i4.1
0x25: add
0x26: stloc.1
loc0 : class: 0x0000000002C50580 (ConsoleApplication.Stopwatch) [...]
loc1 : int: 2
loc2 : bool: true
0x27: ldloc.1
0x28: ldc.i4.2
0x29: clt
0x2b: stloc.2
loc0 : class: 0x0000000002C50580 (ConsoleApplication.Stopwatch) [...]
loc1 : int: 2
loc2 : bool: false
0x2c: ldloc.2
0x2d: brtrue.s
0x2f: ldloc.0
0x30: callvirt
Skipping ConsoleApplication.Stopwatch:Stop
Returning to method ConsoleApplication.Program:Main(class), stub num 1.
0x35: nop
0x36: ldstr
0x3b: ldloc.0
0x3c: callvirt
Skipping ConsoleApplication.Stopwatch:get_ElapsedMilliseconds
Skipping ConsoleApplication.Stopwatch:GetElapsedDateTimeTicks
Skipping ConsoleApplication.Stopwatch:GetRawElapsedTicks
Returning to method ConsoleApplication.Program:Main(class), stub num 1.
0x41: box
0x46: call
Returning to method ConsoleApplication.Program:Main(class), stub num 1.
0x4b: call
Took 33 msecs
Returning to method ConsoleApplication.Program:Main(class), stub num 1.
0x50: nop
0x51: ret
So you can clearly see the interpreter in action, executing the individual IL instructions and showing the current values of any local variables as it goes along. Then, once the entire program has run, you also get some nice summary statistics (this time from a full run, with 100,000 iterations):
IL instruction profiling:
Instructions (24000085 total, 20000083 1-byte):
Instruction | execs | % | cum %
-------------------------------------------
ldloc.1 | 3000011 | 12.50% | 12.50%
ceq | 3000001 | 12.50% | 25.00%
ldc.i4.0 | 3000001 | 12.50% | 37.50%
nop | 2000013 | 8.33% | 45.83%
stloc.2 | 2000001 | 8.33% | 54.17%
ldc.i4 | 2000001 | 8.33% | 62.50%
brtrue.s | 2000001 | 8.33% | 70.83%
ldloc.2 | 2000001 | 8.33% | 79.17%
ldc.i4.1 | 1000001 | 4.17% | 83.33%
cgt | 1000001 | 4.17% | 87.50%
stloc.1 | 1000001 | 4.17% | 91.67%
rem | 1000000 | 4.17% | 95.83%
add | 1000000 | 4.17% | 100.00%
call | 23 | 0.00% | 100.00%
ldstr | 11 | 0.00% | 100.00%
box | 11 | 0.00% | 100.00%
ldloc.0 | 2 | 0.00% | 100.00%
callvirt | 2 | 0.00% | 100.00%
br.s | 1 | 0.00% | 100.00%
stloc.0 | 1 | 0.00% | 100.00%
ret | 1 | 0.00% | 100.00%
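A table like this falls out naturally from a set of per-instruction counters. The sketch below is not the interpreter's own reporting code; it is a minimal illustration, with made-up names (ProfileRow, BuildProfile), of how the "%" and "cum %" columns can be derived from raw execution counts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration; none of these names come from CoreCLR.
using OpCount = std::pair<std::string, std::uint64_t>;

struct ProfileRow {
    std::string opcode;
    std::uint64_t execs;
    double percent;     // share of all executed instructions
    double cumulative;  // running total of the 'percent' column
};

// Sort the counters by descending execution count, then compute both
// percentage columns in a single pass.
std::vector<ProfileRow> BuildProfile(std::vector<OpCount> counts) {
    std::uint64_t total = 0;
    for (const OpCount& c : counts) total += c.second;

    std::stable_sort(counts.begin(), counts.end(),
        [](const OpCount& a, const OpCount& b) { return a.second > b.second; });

    std::vector<ProfileRow> rows;
    std::uint64_t running = 0;
    for (const OpCount& c : counts) {
        running += c.second;
        assert(running <= total);  // the cumulative count can never exceed the total
        const double pct = total ? 100.0 * c.second / total : 0.0;
        const double cum = total ? 100.0 * running / total : 0.0;
        rows.push_back({c.first, c.second, pct, cum});
    }
    return rows;
}
```

Feeding it the counters for a run would give one row per op-code, most frequent first, with the last row always reaching 100% cumulative.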
Main sections of the Interpreter code
Now we've seen it in action, let's take a look at the code within the Interpreter and see how it works.
Top-level dispatcher
At the heart of the Interpreter is a giant switch statement (in Interpreter::ExecuteMethod(..)) that is almost 1,200 lines long! In it you'll find lots of code like this:
switch (*m_ILCodePtr)
{
case CEE_NOP:
    m_ILCodePtr++;
    continue;
case CEE_BREAK:
    m_ILCodePtr++;
    continue;
case CEE_LDARG_0:
    LdArg(0);
    break;
case CEE_LDARG_1:
    LdArg(1);
    break;
...
}
In total, there are 199 case statements, corresponding to all the available CLR Intermediate Language (IL) op-codes, in all their different combinations, for instance CEE_LDC_??, i.e. CEE_LDC_I4, CEE_LDC_I8, CEE_LDC_R4 and CEE_LDC_R8. The large majority of the case statements just call out to another function that does the actual work, although there are some exceptions, such as CEE_RET.
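To get a feel for the pattern without reading all 1,200 lines, here is a toy stack machine built around the same kind of switch. It is only a sketch: the op-codes (TOY_NOP, TOY_LDC_I4_S and so on), the ToyInterpreter class and its helpers are all invented for this example and are far simpler than the real CEE_* handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical op-codes for a toy stack machine; NOT the real CEE_* values.
enum ToyOp : std::uint8_t { TOY_NOP, TOY_LDC_I4_S, TOY_ADD, TOY_MUL, TOY_RET };

class ToyInterpreter {
public:
    explicit ToyInterpreter(std::vector<std::uint8_t> code) : m_code(std::move(code)) {}

    std::int32_t Execute() {
        std::size_t ip = 0;  // plays the role of m_ILCodePtr
        for (;;) {
            assert(ip < m_code.size());
            switch (m_code[ip]) {
            case TOY_NOP:
                ip++;
                continue;
            case TOY_LDC_I4_S:  // the opcode is followed by a one-byte signed operand
                assert(ip + 1 < m_code.size());
                Push(static_cast<std::int8_t>(m_code[ip + 1]));
                ip += 2;
                continue;
            case TOY_ADD:  // most cases just call a helper that does the work
                Add();
                break;
            case TOY_MUL:
                Mul();
                break;
            case TOY_RET:
                return Pop();
            default:
                assert(false && "unknown opcode");
                return 0;
            }
            ip++;  // reached via 'break': step past a one-byte instruction
        }
    }

private:
    void Push(std::int32_t v) { m_stack.push_back(v); }
    std::int32_t Pop() {
        assert(!m_stack.empty());
        const std::int32_t v = m_stack.back();
        m_stack.pop_back();
        return v;
    }
    void Add() { const std::int32_t b = Pop(); const std::int32_t a = Pop(); Push(a + b); }
    void Mul() { const std::int32_t b = Pop(); const std::int32_t a = Pop(); Push(a * b); }

    std::vector<std::uint8_t> m_code;    // the "IL" byte stream
    std::vector<std::int32_t> m_stack;   // the operand stack
};
```

Running the byte sequence TOY_LDC_I4_S 2, TOY_LDC_I4_S 3, TOY_ADD, TOY_LDC_I4_S 4, TOY_MUL, TOY_RET through Execute() evaluates (2 + 3) * 4. The mix of `continue` (the case has already advanced the instruction pointer) and `break` (fall out of the switch to a shared increment) is a choice made for this toy, loosely echoing the excerpt above.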
Method calls
The other task that takes up lots of code in the interpreter is handling method calls, over 2,500 L.O.C in total! This is spread across several methods, each doing a particular part of the work:
In summary, this work involves dynamically generating stubs and ensuring that method arguments are in the right registers (hence the assembly code). It handles virtual methods, static and instance calls, delegates, intrinsics and probably a few other scenarios as well! In addition, if the method being called needs to be interpreted, it also has to make sure that happens.
Creating objects and arrays
The interpreter needs to handle some of the key functionality of a runtime, that is creating and initialising objects. To do this it has to call into the GC, before finally calling the constructor:
Boxing and Unboxing
Another large chunk of code is dedicated to boxing/unboxing, that is converting value types (structs) into object references when needed. The .NET IL provides specific op-codes to handle this:
Loading and Storing data
That is, reading/writing fields in an object or elements in an array:
Other Specific IL Op Codes
There is also a significant amount of code (over 1,000 lines) that just deals with low-level operations, that is comparisons, branching and basic arithmetic:
- INT32 Interpreter::CompareOpRes(..)
  - CEQ, CGT, CGT_UN, CLT & CLT_UN, called via Interpreter::CompareOp()
  - BEQ, BGE, BGT, BLE, BLT, BNE_UN, BGE_UN, BGT_UN, BLE_UN, BLT_UN, called via Interpreter::BrOnComparison()
- void Interpreter::BinaryArithOp()
- void Interpreter::BinaryArithOvfOp()
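The split between signed and unsigned comparisons, and between plain and overflow-checked arithmetic, is easier to see in a small example. The following is a sketch only, restricted to 32-bit integers; ToyCompare, ToyCompareOpRes and ToyAddOvf are hypothetical names and do not mirror the real function signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical comparison kinds, loosely modelled on the IL op-codes
// ceq, cgt, cgt.un, clt and clt.un.
enum class ToyCompare { Eq, Gt, GtUn, Lt, LtUn };

// Returns 1 or 0, since IL comparison instructions push an int32 result.
std::int32_t ToyCompareOpRes(ToyCompare op, std::int32_t a, std::int32_t b) {
    // The *.un variants look at the same bits, but treat them as unsigned.
    const std::uint32_t ua = static_cast<std::uint32_t>(a);
    const std::uint32_t ub = static_cast<std::uint32_t>(b);
    switch (op) {
    case ToyCompare::Eq:   return a == b ? 1 : 0;
    case ToyCompare::Gt:   return a > b ? 1 : 0;
    case ToyCompare::GtUn: return ua > ub ? 1 : 0;
    case ToyCompare::Lt:   return a < b ? 1 : 0;
    case ToyCompare::LtUn: return ua < ub ? 1 : 0;
    }
    assert(false && "unreachable");
    return 0;
}

// Overflow-checked addition in the spirit of add.ovf: do the sum in a
// wider type and reject results that do not fit back into 32 bits.
std::int32_t ToyAddOvf(std::int32_t a, std::int32_t b) {
    const std::int64_t wide = static_cast<std::int64_t>(a) + b;
    if (wide > std::numeric_limits<std::int32_t>::max() ||
        wide < std::numeric_limits<std::int32_t>::min()) {
        throw std::overflow_error("arithmetic overflow");
    }
    return static_cast<std::int32_t>(wide);
}
```

Note how -1 is less than 1 under a signed comparison but greater than 1 under the unsigned variant, because its bit pattern is 0xFFFFFFFF. A branch-on-comparison op-code can then reuse the same result: compute it, and jump if it is non-zero.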
Working with the Garbage Collector (GC)
In addition, the interpreter has to provide the GC with the information it needs. This happens when the GC calls Interpreter::GCScanRoots(..), with additional work taking place in Interpreter::GCScanRootAtLoc(..). Very simply, the interpreter has to let the GC know about any root objects that are currently live. This includes static variables and any local variables in the function that is currently executing.
When the interpreter locates a root object, it notifies the GC via a callback (pf(..)):
void Interpreter::GCScanRootAtLoc(Object** loc, InterpreterType it, promote_func* pf, ScanContext* sc, bool pinningRef)
{
    switch (it.ToCorInfoType())
    {
    case CORINFO_TYPE_CLASS:
    case CORINFO_TYPE_STRING:
        {
            DWORD flags = 0;
            if (pinningRef) flags |= GC_CALL_PINNED;
            (*pf)(loc, sc, flags);
        }
        break;
    ....
    }
}
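The same callback idea can be shown in a few lines of standalone code. This is a deliberately simplified sketch, not the runtime's implementation: ToyObject, ToySlot, ToyPromoteFn and ToyScanRoots are invented for the example, and the toy only distinguishes integers from object references.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct ToyObject { int id; };            // stand-in for a managed object

enum class ToySlotType { Int, Class };   // hypothetical, far smaller than the real type enum

// One local variable slot: either a plain integer or an object reference.
struct ToySlot {
    ToySlotType type;
    std::int64_t intValue;
    ToyObject* ref;
};

// Callback type, playing the role of promote_func in the excerpt above.
using ToyPromoteFn = void (*)(ToyObject** loc, void* context);

// Walk the locals of one frame and report every reference-typed slot.
// The address of the slot is passed (not the object itself), so a
// collector that moves objects could update the slot in place.
void ToyScanRoots(std::vector<ToySlot>& locals, ToyPromoteFn pf, void* context) {
    assert(pf != nullptr);
    for (ToySlot& slot : locals) {
        switch (slot.type) {
        case ToySlotType::Class:
            (*pf)(&slot.ref, context);
            break;
        case ToySlotType::Int:
            break;  // not a reference, so there is nothing to report
        }
    }
}

// Example callback: simply records each location that was reported.
void ToyCollect(ToyObject** loc, void* context) {
    static_cast<std::vector<ToyObject**>*>(context)->push_back(loc);
}
```

Calling ToyScanRoots(locals, &ToyCollect, &found) on a frame with one integer local and one object local leaves exactly one entry in found, pointing at the object slot.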
Integration with the Virtual Machine (VM)
Finally, whilst the Interpreter is fairly self-contained, there are times where it needs to work with the rest of the runtime.
The post The .NET IL Interpreter first appeared on my blog Performance is a Feature!