Background
Within any .NET assembly you have a mix of meta-data and instructions. The meta-data describes the various entities: assemblies, modules, types, fields, methods, properties, etc. For example, the meta-data for a method provides its name, return type, parameters, and calling convention. The instructions are exactly that: instructions. They tell your assembly / application what to do.
While not perfect, .NET provides a feature-rich framework for accessing meta-data (in the System.Reflection namespace). It also provides a fairly decent framework for creating new instructions (in the System.Reflection.Emit namespace). It even allows you to access existing instructions, via the System.Reflection.MethodBody.GetILAsByteArray() method.
Curiously, while creating new instructions is made easy, reflecting over existing instructions is made quite difficult. All that is provided for existing instructions is a byte array. The .NET framework provides no real support (reflection-wise) for decoding that byte array.
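To make the starting point concrete, here is a minimal sketch that retrieves the raw byte array using only standard reflection calls. The IlDump class and its Add method are hypothetical names used purely for illustration; they are not part of the framework described in this article.

```csharp
using System;
using System.Reflection;

public static class IlDump
{
    // Hypothetical sample method whose body we inspect.
    public static int Add(int a, int b) => a + b;

    // Returns the raw IL bytes for the named public static method of IlDump.
    public static byte[] GetRawIL(string methodName)
    {
        MethodInfo method = typeof(IlDump).GetMethod(methodName);
        MethodBody body = method.GetMethodBody();
        return body.GetILAsByteArray();
    }

    // Formats the bytes as space-separated hexadecimal pairs.
    public static string ToHex(byte[] il) =>
        BitConverter.ToString(il).Replace('-', ' ');
}
```

Calling IlDump.ToHex(IlDump.GetRawIL("Add")) prints the undecoded bytes. The exact bytes depend on the compiler and its settings, which is why no expected output is shown here.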
Introduction
I thought it might be a fun exercise to try to bridge that curious gap in coverage (between System.Reflection and System.Reflection.Emit). This article will describe a framework that can be used for that purpose. It will also describe some of the techniques used to create that framework.
This framework is not intended as a disassembler. There are already some excellent tools out there for that purpose. I have little interest in re-inventing that particular wheel.
The goals for this framework were as follows:
- Fit seamlessly into the pre-existing .NET reflection framework. Wherever possible, use pre-existing .NET types and methods.
- Provide a means of visualizing instructions. Due to limitations of the .NET framework, true fidelity is not easily achieved. That said, it is possible to get pretty darn close, using what .NET reflection does provide. That is the goal of this framework: close but not perfect.
- Provide a decent set of test cases. This should help if anyone wants to take this code any further.
For a more extensive, robust solution, one reader suggested Mono.Cecil. While it's not reflection-based, and I haven't personally used it, I do hear mostly good things. A link is included in the Additional Reading section at the end of this article.
Using the Code
To use this framework, take advantage of the GetIL() extension method. This method extends the System.Reflection.MethodBase class so that it will return a list of instructions. An example of its usage (found in Program.cs) is as follows:
var instructions = typeof(Program)
    .GetMethod("Main")
    .GetIL();
Console.WriteLine("********** Main (all instructions) **********");
foreach (Instruction instruction in instructions)
    Console.WriteLine(instruction);
The common interface for all instructions is IInstruction. It has the following members:
Member | Description |
IsTarget | A value indicating if the instruction is the target of a branch or switch instruction. |
Label | A label for the instruction. |
Offset | The byte offset of the start of the instruction. |
OpCode | The operation code (opcode) for the instruction. |
Parent | The list of instructions containing the instruction. |
GetOperand() | The operand for the instruction. |
GetValue() | The resolved value of the operand for the instruction. For example, for method instructions, GetOperand() returns a meta-data token and GetValue() returns an instance of System.Reflection.MethodBase . |
Resolve() | INTERNAL ONLY: Where possible, resolves an operand into a more meaningful value. |
If you're familiar with .NET reflection and the Common Intermediate Language (CIL), this should be all you need. If not, later in the article a brief primer on these topics is provided.
Decoding the Data
Most of the decoding is fairly straightforward.
Every instruction starts with an operation code (opcode). For example, if code is calling a method, you might expect a call opcode (System.Reflection.Emit.OpCodes.Call).
Within the data, this is currently represented as either an 8-bit or a 16-bit code. If the first byte is 0xFE (System.Reflection.Emit.OpCodes.Prefix1.Value), it's a 16-bit code (two bytes); otherwise, it's an 8-bit code (one byte). At the time this article was first written, only 27 of the total of 226 opcodes required two bytes.
The two byte opcodes are as follows: arglist (FE 00), ceq (FE 01), cgt (FE 02), cgt.un (FE 03), clt (FE 04), clt.un (FE 05), ldftn (FE 06), ldvirtftn (FE 07), ldarg (FE 09), ldarga (FE 0A), starg (FE 0B), ldloc (FE 0C), ldloca (FE 0D), stloc (FE 0E), localloc (FE 0F), endfilter (FE 11), unaligned. (FE 12), volatile. (FE 13), tail. (FE 14), initobj (FE 15), constrained. (FE 16), cpblk (FE 17), initblk (FE 18), rethrow (FE 1A), sizeof (FE 1C), refanytype (FE 1D), readonly. (FE 1E).
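The one-byte versus two-byte rule can be sketched as follows. OpCodeReader is a hypothetical helper, not a type from the framework; it returns two-byte opcodes with the 0xFE prefix in the high byte, and it assumes the data is well formed (a lone 0xFE at the end of the array would throw).

```csharp
public static class OpCodeReader
{
    // Same numeric value as System.Reflection.Emit.OpCodes.Prefix1.Value.
    private const byte TwoBytePrefix = 0xFE;

    // Reads the opcode at 'offset' and advances 'offset' past it.
    // Two-byte opcodes are returned as 0xFExx (for example, 0xFE01 for ceq).
    public static ushort ReadOpCode(byte[] il, ref int offset)
    {
        byte first = il[offset++];
        if (first != TwoBytePrefix)
            return first;

        byte second = il[offset++];
        return (ushort)((first << 8) | second);
    }
}
```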
For many instructions, those with an operand type of OperandType.InlineNone, this is the entirety of the data for the instruction. For other instructions, with other types of operands, the data for the operand immediately follows the opcode. The full set of operand types, described by the System.Reflection.Emit.OperandType enumeration, is as follows:
OperandType | Count | Description |
InlineBrTarget | 14 | The operand is a 32-bit integer branch target. |
InlineField | 6 | The operand is a 32-bit metadata token. |
InlineI | 1 | The operand is a 32-bit integer. |
InlineI8 | 1 | The operand is a 64-bit integer. |
InlineMethod | 6 | The operand is a 32-bit metadata token. |
InlineNone | 147 | No operand. |
InlinePhi | 0 | The operand is reserved and should not be used. |
InlineR | 1 | The operand is a 64-bit IEEE floating point number. |
InlineSig | 1 | The operand is a 32-bit metadata signature token. |
InlineString | 1 | The operand is a 32-bit metadata string token. |
InlineSwitch | 1 | The operand is the 32-bit integer argument to a switch instruction. |
InlineTok | 1 | The operand is a 32-bit FieldRef, MethodRef, or TypeRef metadata token. |
InlineType | 17 | The operand is a 32-bit metadata token. |
InlineVar | 6 | The operand is a 16-bit integer containing the ordinal of a local variable or an argument. |
ShortInlineBrTarget | 14 | The operand is an 8-bit integer branch target. |
ShortInlineI | 2 | The operand is an 8-bit integer. |
ShortInlineR | 1 | The operand is a 32-bit IEEE floating point number. |
ShortInlineVar | 6 | The operand is an 8-bit integer containing the ordinal of a local variable or an argument. |
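Since the operand type determines how many bytes follow the opcode, a decoder needs a size lookup along these lines. This is a sketch under my own naming (OperandSizes.GetSize is hypothetical); it returns -1 for InlineSwitch, whose length depends on the data, and it does not handle the reserved InlinePhi value.

```csharp
using System;
using System.Reflection.Emit;

public static class OperandSizes
{
    // Returns the fixed operand size in bytes, or -1 for InlineSwitch (variable length).
    public static int GetSize(OperandType type)
    {
        switch (type)
        {
            case OperandType.InlineNone:
                return 0;
            case OperandType.ShortInlineBrTarget:
            case OperandType.ShortInlineI:
            case OperandType.ShortInlineVar:
                return 1;
            case OperandType.InlineVar:
                return 2;
            case OperandType.InlineBrTarget:
            case OperandType.InlineField:
            case OperandType.InlineI:
            case OperandType.InlineMethod:
            case OperandType.InlineSig:
            case OperandType.InlineString:
            case OperandType.InlineTok:
            case OperandType.InlineType:
            case OperandType.ShortInlineR:
                return 4;
            case OperandType.InlineI8:
            case OperandType.InlineR:
                return 8;
            case OperandType.InlineSwitch:
                return -1;
            default:
                throw new ArgumentOutOfRangeException(nameof(type));
        }
    }
}
```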
The InstructionList.TryCreate method creates IInstruction instances from the data in the byte array.
A singleton type AllOpCodes reflects over the fields in System.Reflection.Emit.OpCodes to build a table of information for all available operation codes (opcodes). This information includes the numeric value of the opcode and its operand type.
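The following is not the article's AllOpCodes type; it is a minimal sketch of the same idea, assuming a plain dictionary keyed by OpCode.Value is sufficient. The OpCodeTable name is hypothetical.

```csharp
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

public static class OpCodeTable
{
    // Builds a lookup from numeric opcode value to OpCode by reflecting
    // over the public static fields of System.Reflection.Emit.OpCodes.
    public static Dictionary<short, OpCode> Build()
    {
        var table = new Dictionary<short, OpCode>();
        FieldInfo[] fields = typeof(OpCodes).GetFields(BindingFlags.Public | BindingFlags.Static);

        foreach (FieldInfo field in fields)
        {
            if (field.FieldType != typeof(OpCode))
                continue;

            var opCode = (OpCode)field.GetValue(null);
            table[opCode.Value] = opCode;
        }

        return table;
    }
}
```

Note that OpCode.Value is a signed 16-bit value, so a two-byte opcode such as ceq (FE 01) is keyed as unchecked((short)0xFE01). Each entry exposes both Name and OperandType.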
It is worth noting that almost all of the data is serialized as little endian values, where the least significant byte is found at the lowest offset. However, there are a couple of notable exceptions. Both opcodes and compressed integral values are stored in big endian format. The complexity of deserialization is handled by the extension methods in the Transeric.Reflection.ReadOnlyListExtensions class.
There is a clear intent by the architects of Intermediate Language serialization to favor compactness. This is where most of the deserialization complexity arises.
This intent is evident in simple examples, like providing a single byte ldloc.1 instruction, when the four byte ldloc instruction (a two byte opcode followed by a 16-bit operand) can provide the same functionality.
It is evident in slightly more complex examples, like metadata tokens. For example, instead of repeating the parameters for a method, for each call instruction, a four byte metadata token that references those parameters is provided.
This complexity reaches its highest point with the calli instruction. Here, a four byte metadata token (for the signature) is provided. This token in turn references a compressed representation of the method's signature. The focus on compactness elsewhere is laudable. However, I wonder about the trade-off of complexity versus compactness in this particular instance. More honestly, it's a real chore to write a description of signature tokens :)
Metadata Tokens
A number of opcodes have an operand that is a "32-bit metadata token". This token is essentially a unique number that can be used to locate the metadata information. The high order 8 bits of the token indicate the type of token (field, method, type, etc.). The low order 24 bits provide a unique identity within that pool of tokens.
The Transeric.Reflection.Token type, included in the code accompanying this article, makes it easy to separate the parts of a metadata token.
By itself, a metadata token is not very useful. It is necessary to "resolve" the token into its associated metadata information. The System.Reflection.Module class provides the following methods to accomplish this goal: ResolveField, ResolveMember, ResolveMethod, ResolveSignature, ResolveString, and ResolveType. Consider the following example:
Offset | Data | Description |
00 | 02 00 00 | The unique identity is 2. |
03 | 70 | The token type indicates a string token (TokenType.String ). |
Taken as a whole, this data indicates a reference to the second string in the string metadata table. The System.Reflection.Module.ResolveString method is used to resolve this token as follows:
string text = module.ResolveString(metadataToken);
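To make the layout of the token concrete, here is a small sketch that assembles the four bytes from the example and splits the result into its two parts. TokenParts is a hypothetical helper; the article's own Transeric.Reflection.Token type may be organized differently.

```csharp
public static class TokenParts
{
    // Assembles a 32-bit token from four little-endian bytes starting at 'offset'.
    public static int ReadToken(byte[] data, int offset) =>
        data[offset]
        | (data[offset + 1] << 8)
        | (data[offset + 2] << 16)
        | (data[offset + 3] << 24);

    // The high-order 8 bits identify the kind of token (0x70 for strings).
    public static int GetTokenType(int token) => (int)((uint)token >> 24);

    // The low-order 24 bits identify the item within that kind.
    public static int GetIdentity(int token) => token & 0x00FFFFFF;
}
```

For the bytes 02 00 00 70, ReadToken yields 0x70000002, which is the value that would be passed to ResolveString.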
With entities that can take generic arguments, the situation is a bit more complex. Consider the following call to the DoSomething method:
public class MyType<T1>
{
    public static void MyMethod<T2>(T2 arg) =>
        DoSomething<T1, T2>();
}
To fully resolve the token for the DoSomething method, it is necessary to know both the type arguments for the enclosing type (MyType) and the type arguments for the method containing the instruction (MyMethod). Since the method containing the instruction is known, this information is easy to obtain.
To obtain the type arguments for the enclosing type (MyType), the following call is necessary:
Type[] typeArguments = parentMethod.DeclaringType.GetGenericArguments();
To obtain the type arguments for the enclosing method (MyMethod), the following call is necessary:
Type[] methodArguments = parentMethod.GetGenericArguments();
With this information, we can resolve the token for DoSomething into its corresponding method information as follows:
MethodBase method = parentMethod.Module.ResolveMethod(metadataToken, typeArguments, methodArguments);
InlineBrTarget
Type: BranchInstruction<int>
The operand is a 32-bit signed integer that specifies the byte offset from the end of the instruction. It is initially decoded by the ReadOnlyListExtensions.ReadInt32 method, which reads the four bytes containing this value.
Later, using Transeric.Reflection.MethodIL.ResolveInstruction, this offset is resolved into an instance of Transeric.Reflection.IInstruction. This is accomplished by conducting a binary search for the instruction that occurs at that offset. Consider the following example:
Offset | Data | Description |
00 | 38 | The opcode indicates a branch instruction (OpCodes.Br ). |
01 | 0F 00 00 00 | The offset is 15 bytes (0F ) from the end of the instruction. Here, the offset from the beginning of the data is 20: 5 (end of instruction) plus 15 (branch). |
05 | | The end of the instruction. |
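The arithmetic in the table above can be sketched as follows. BranchTargets is a hypothetical helper that assumes a one-byte opcode followed by a 32-bit little-endian operand, which holds for every opcode with this operand type.

```csharp
public static class BranchTargets
{
    // Computes the absolute target of a long-form branch that starts at
    // 'instructionOffset' (one-byte opcode followed by a 32-bit operand).
    public static int ResolveLongBranch(byte[] il, int instructionOffset)
    {
        int operandOffset = instructionOffset + 1;

        // Little endian: the least significant byte comes first.
        int relative =
            il[operandOffset]
            | (il[operandOffset + 1] << 8)
            | (il[operandOffset + 2] << 16)
            | (il[operandOffset + 3] << 24);

        int endOfInstruction = operandOffset + 4;
        return endOfInstruction + relative;
    }
}
```

Because the operand is signed, the same calculation handles backward branches: an operand of FA FF FF FF (-6) on a branch that ends at offset 6 targets offset 0.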
Opcodes (14): br (38), brfalse (39), brtrue (3A), beq (3B), bge (3C), bgt (3D), ble (3E), blt (3F), bne.un (40), bge.un (41), bgt.un (42), ble.un (43), blt.un (44), leave (DD)
InlineField
Type: FieldInstruction
The operand is a 32-bit metadata token. It is initially decoded by the ReadOnlyListExtensions.ReadToken method, which reads the four bytes containing this value. Later, using System.Reflection.Module.ResolveField, the token is resolved into an instance of System.Reflection.FieldInfo. Consider the following example:
Offset | Data | Description |
00 | 7B | The opcode indicates a load field instruction (OpCodes.Ldfld ). |
01 | 02 00 00 | The unique identity of the metadata token is 2. |
04 | 04 | The token type indicates a field definition (TokenType.FieldDef ). |
Opcodes (6): ldfld (7B), ldflda (7C), stfld (7D), ldsfld (7E), ldsflda (7F), stsfld (80)
InlineI
Type: Instruction<int>
The operand is a signed 32-bit integer. It is decoded by the ReadOnlyListExtensions.ReadInt32 method, which reads the four bytes containing this value. Because no metadata is referenced, no resolution is required. Consider the following example:
Offset | Data | Description |
00 | 20 | The opcode indicates a load 32-bit constant instruction (OpCodes.Ldc_I4 ). |
01 | 00 01 00 00 | The value 256 is loaded. |
Opcode (1): ldc.i4 (20)
InlineI8
Type: Instruction<long>
The operand is a signed 64-bit integer. It is decoded by the ReadOnlyListExtensions.ReadInt64 method, which reads the eight bytes containing this value. Because no metadata is referenced, no resolution is required. Consider the following example:
Offset | Data | Description |
00 | 21 | The opcode indicates a load 64-bit constant instruction (OpCodes.Ldc_I8 ). |
01 | 00 01 00 00 00 00 00 00 | The value 256 is loaded. |
Opcode (1): ldc.i8 (21)
InlineMethod
Type: MethodInstruction
The operand is a 32-bit metadata token. It is initially decoded by the ReadOnlyListExtensions.ReadToken method, which reads the four bytes containing this value. Later, using System.Reflection.Module.ResolveMethod, the token is resolved into an instance of System.Reflection.MethodBase. Consider the following example:
Offset | Data | Description |
00 | 28 | The opcode indicates a call instruction (OpCodes.Call ). |
01 | 02 00 00 | The unique identity of the metadata token is 2. |
04 | 06 | The token type indicates a method definition (TokenType.MethodDef ). |
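The resolution step can be tried without decoding any IL, because reflection also exposes a method's own token. The sketch below round-trips a MethodDef token through Module.ResolveMethod. MethodTokenDemo and Target are hypothetical names, and the example deliberately avoids generics, which require the overload that accepts type and method arguments.

```csharp
using System.Reflection;

public static class MethodTokenDemo
{
    // Hypothetical method whose token we look up.
    public static void Target() { }

    // Returns the MethodDef token of Target(). A call to Target() from within
    // the same module carries this value as its operand.
    public static int GetTargetToken() =>
        typeof(MethodTokenDemo).GetMethod(nameof(Target)).MetadataToken;

    // Resolves a method token against the module that defines this class.
    public static MethodBase Resolve(int metadataToken) =>
        typeof(MethodTokenDemo).Module.ResolveMethod(metadataToken);
}
```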
Opcodes (6): jmp (27), call (28), callvirt (6F), newobj (73), ldftn (FE 06), ldvirtftn (FE 07)
InlineNone
Types: Instruction, ParameterInstruction<byte>, or VariableInstruction<byte>
Since there is no operand, this type is the simplest to decode. It also has the largest number of opcodes (147) associated with it. Consider the following example:
Offset | Data | Description |
00 | 2A | The opcode indicates a return instruction (OpCodes.Ret ). |
While none of these opcodes have an operand, many do have an implied operand. To simplify reflection, in the cases where the instruction is associated with a parameter or local variable, the code will create an instance of ParameterInstruction<byte> or VariableInstruction<byte>, behaving as if the implied operand were present.
For example, the ldloc.1 instruction (below) implies an operand of "1". For this reason, the code will create an instance of VariableInstruction<byte>, so that the operand's value will resolve into an instance of System.Reflection.LocalVariableInfo.
Offset | Data | Description |
00 | 07 | The opcode indicates a load local variable instruction (OpCodes.Ldloc_1 ). |
Similarly, the ldarg.1 instruction (below) also implies an operand of "1". For this reason, the code will create an instance of ParameterInstruction<byte>, so that the operand's value will resolve into an instance of System.Reflection.ParameterInfo.
Offset | Data | Description |
00 | 03 | The opcode indicates a load argument instruction (OpCodes.Ldarg_1 ). |
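One way to recover the implied operand is to exploit the fact that these short forms occupy consecutive opcode values. The sketch below is my own illustration (ImpliedOperands and its Kind enumeration are hypothetical) and is not necessarily how the framework does it.

```csharp
public static class ImpliedOperands
{
    public enum Kind { None, Argument, Local }

    // Maps the one-byte short-form opcodes to the index they imply.
    public static Kind TryGetImpliedIndex(byte opCode, out byte index)
    {
        if (opCode >= 0x02 && opCode <= 0x05) // ldarg.0 .. ldarg.3
        {
            index = (byte)(opCode - 0x02);
            return Kind.Argument;
        }

        if (opCode >= 0x06 && opCode <= 0x09) // ldloc.0 .. ldloc.3
        {
            index = (byte)(opCode - 0x06);
            return Kind.Local;
        }

        if (opCode >= 0x0A && opCode <= 0x0D) // stloc.0 .. stloc.3
        {
            index = (byte)(opCode - 0x0A);
            return Kind.Local;
        }

        index = 0;
        return Kind.None;
    }
}
```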
For information on how local variables and parameters are resolved, see the description of the operand type ShortInlineVar.
Opcodes (147): nop (00), break (01), ldarg.0 (02), ldarg.1 (03), ldarg.2 (04), ldarg.3 (05), ldloc.0 (06), ldloc.1 (07), ldloc.2 (08), ldloc.3 (09), stloc.0 (0A), stloc.1 (0B), stloc.2 (0C), stloc.3 (0D), ldnull (14), ldc.i4.m1 (15), ldc.i4.0 (16), ldc.i4.1 (17), ldc.i4.2 (18), ldc.i4.3 (19), ldc.i4.4 (1A), ldc.i4.5 (1B), ldc.i4.6 (1C), ldc.i4.7 (1D), ldc.i4.8 (1E), dup (25), pop (26), ret (2A), ldind.i1 (46), ldind.u1 (47), ldind.i2 (48), ldind.u2 (49), ldind.i4 (4A), ldind.u4 (4B), ldind.i8 (4C), ldind.i (4D), ldind.r4 (4E), ldind.r8 (4F), ldind.ref (50), stind.ref (51), stind.i1 (52), stind.i2 (53), stind.i4 (54), stind.i8 (55), stind.r4 (56), stind.r8 (57), add (58), sub (59), mul (5A), div (5B), div.un (5C), rem (5D), rem.un (5E), and (5F), or (60), xor (61), shl (62), shr (63), shr.un (64), neg (65), not (66), conv.i1 (67), conv.i2 (68), conv.i4 (69), conv.i8 (6A), conv.r4 (6B), conv.r8 (6C), conv.u4 (6D), conv.u8 (6E), conv.r.un (76), throw (7A), conv.ovf.i1.un (82), conv.ovf.i2.un (83), conv.ovf.i4.un (84), conv.ovf.i8.un (85), conv.ovf.u1.un (86), conv.ovf.u2.un (87), conv.ovf.u4.un (88), conv.ovf.u8.un (89), conv.ovf.i.un (8A), conv.ovf.u.un (8B), ldlen (8E), ldelem.i1 (90), ldelem.u1 (91), ldelem.i2 (92), ldelem.u2 (93), ldelem.i4 (94), ldelem.u4 (95), ldelem.i8 (96), ldelem.i (97), ldelem.r4 (98), ldelem.r8 (99), ldelem.ref (9A), stelem.i (9B), stelem.i1 (9C), stelem.i2 (9D), stelem.i4 (9E), stelem.i8 (9F), stelem.r4 (A0), stelem.r8 (A1), stelem.ref (A2), conv.ovf.i1 (B3), conv.ovf.u1 (B4), conv.ovf.i2 (B5), conv.ovf.u2 (B6), conv.ovf.i4 (B7), conv.ovf.u4 (B8), conv.ovf.i8 (B9), conv.ovf.u8 (BA), ckfinite (C3), conv.u2 (D1), conv.u1 (D2), conv.i (D3), conv.ovf.i (D4), conv.ovf.u (D5), add.ovf (D6), add.ovf.un (D7), mul.ovf (D8), mul.ovf.un (D9), sub.ovf (DA), sub.ovf.un (DB), endfinally (DC), stind.i (DF), conv.u (E0), prefix7 (F8), prefix6 (F9), prefix5 (FA), prefix4 (FB), prefix3 (FC), prefix2 (FD), prefix1 (FE), prefixref (FF), arglist (FE 00), ceq (FE 01), cgt (FE 02), cgt.un (FE 03), clt (FE 04), clt.un (FE 05), localloc (FE 0F), endfilter (FE 11), volatile. (FE 13), tail. (FE 14), cpblk (FE 17), initblk (FE 18), rethrow (FE 1A), refanytype (FE 1D), readonly. (FE 1E)
InlinePhi
Type: None
None of the opcodes in System.Reflection.Emit.OpCodes reference this operand type. According to the documentation for OperandType.InlinePhi: "The operand is reserved and should not be used".
InlineR
Type: Instruction<double>
The operand is a 64-bit IEEE floating point number. It is decoded by the ReadOnlyListExtensions.ReadDouble method, which reads the eight bytes containing this value. Because no metadata is referenced, no resolution is required. Consider the following example:
Offset | Data | Description |
00 | 23 | The opcode indicates a load 64-bit constant instruction (OpCodes.Ldc_R8 ). |
01 | 00 00 00 00 00 00 F0 3F | The 64-bit floating point number 1.0 is loaded. |
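As a sketch of how such an operand might be read, the helper below assembles the eight little-endian bytes into a 64-bit pattern and reinterprets it as a double. FloatOperands is a hypothetical name; assembling the bits by hand keeps the result independent of the host's byte order.

```csharp
using System;

public static class FloatOperands
{
    // Reads a little-endian IEEE 754 double starting at 'offset'.
    public static double ReadDouble(byte[] il, int offset)
    {
        long bits = 0;

        // Walk from the most significant byte (highest offset) down to the least.
        for (int i = 7; i >= 0; i--)
            bits = (bits << 8) | il[offset + i];

        return BitConverter.Int64BitsToDouble(bits);
    }
}
```

For the example above, the operand starts at offset 1 and the assembled bit pattern is 0x3FF0000000000000, which is 1.0.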
Opcode (1): ldc.r8 (23)
InlineSig
Type: SignatureInstruction
The operand is a 32-bit metadata token. It is initially decoded by the ReadOnlyListExtensions.ReadToken method, which reads the four bytes containing this value. Later, using System.Reflection.Module.ResolveSignature, the token is resolved into a byte array containing the signature's data. Since .NET does not provide much help decoding this byte array, it is further resolved by the Transeric.Reflection.MethodSignature class.
Offset | Data | Description |
00 | 29 | The opcode indicates an indirect call instruction (OpCodes.Calli ). |
01 | 02 00 00 | The unique identity of the metadata token is 2. |
04 | 11 | The token type indicates a signature (TokenType.Signature ). |
Bear with me, this one is difficult to explain. It took me a very long time to figure it all out.
Here, System.Reflection.Module.ResolveSignature simply returns a byte array that is a compressed representation of the target method's signature. We're largely on our own when we try to decode this byte array into something meaningful.
At a high level, the method signature is simple. It provides: a calling convention, a parameter count, a return type, and a sequence of zero or more parameter types.
The calling convention is simple to decode. It's always a single byte, so no worries about compression. The possible values are described in the Transeric.Reflection.CilCallingConvention enumeration.
The framework described by this article further simplifies interaction by converting this value into the .NET standard System.Runtime.InteropServices.CallingConvention and System.Reflection.CallingConventions enumerations.
For other parts of the method signature, all integral values, we need to worry about compression. To maximize compactness the architects of IL serialization devised a fairly simple compression scheme. Integral values can be stored in one, two, or four bytes. The bytes are serialized in big endian order. The high order bits of the first byte describe the length of the value. The remaining bits provide data. There are three possible forms of the value, which are as follows:
Bit Pattern | Description |
0XXXXXXX | The first bit is clear indicating a single byte value. The remaining bits contain the data for that value. |
10XXXXXX XXXXXXXX | The first two bits indicate that this is a two byte value. The remaining bits contain the data for that value. |
110XXXXX XXXXXXXX XXXXXXXX XXXXXXXX | The first three bits indicate that this is a four byte value. The remaining bits contain the data for that value. |
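The scheme can be sketched as follows. This follows the ECMA-335 definition of compressed unsigned integers, in which the four byte form begins with the bits 110 and carries 29 bits of data. CompressedIntegers is a hypothetical helper, not the framework's own reader.

```csharp
using System;

public static class CompressedIntegers
{
    // Reads a compressed unsigned integer at 'offset' and advances 'offset' past it.
    public static int Read(byte[] data, ref int offset)
    {
        byte first = data[offset++];

        if ((first & 0x80) == 0x00) // 0XXXXXXX: one byte, 7 bits of data
            return first;

        if ((first & 0xC0) == 0x80) // 10XXXXXX: two bytes, 14 bits of data
            return ((first & 0x3F) << 8) | data[offset++];

        if ((first & 0xE0) == 0xC0) // 110XXXXX: four bytes, 29 bits of data
        {
            int value = (first & 0x1F) << 24;
            value |= data[offset++] << 16;
            value |= data[offset++] << 8;
            value |= data[offset++];
            return value;
        }

        throw new FormatException("Invalid compressed integer prefix.");
    }
}
```

Note the big endian order: the byte carrying the length bits also carries the most significant data bits.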
Using the above scheme, we first decode the parameter count. This is simply the number of parameter types that are provided with the method signature.
Next we decode the return type and each of the parameter types. The process for both is identical and (regrettably) complex.
Types come in two broad flavors: simple and complex.
To decode a simple type, we first read a byte representing the type. This byte is interpreted using the enumeration Transeric.Reflection.ElementType. The enumeration recognizes the following common/simple types.
ElementType | Value | Description |
Void | 01 | A "void" type (System.Void ). |
Boolean | 02 | A Boolean type (System.Boolean ). |
Char | 03 | A character type (System.Char ). |
SByte | 04 | A signed 8-bit integer type (System.SByte ). |
Byte | 05 | An unsigned 8-bit integer type (System.Byte ). |
Int16 | 06 | A signed 16-bit integer type (System.Int16 ). |
UInt16 | 07 | An unsigned 16-bit integer type (System.UInt16 ). |
Int32 | 08 | A signed 32-bit integer type (System.Int32 ). |
UInt32 | 09 | An unsigned 32-bit integer type (System.UInt32 ). |
Int64 | 0A | A signed 64-bit integer type (System.Int64 ). |
UInt64 | 0B | An unsigned 64-bit integer type (System.UInt64 ). |
Single | 0C | A 32-bit IEEE floating point number type (System.Single ). |
Double | 0D | A 64-bit IEEE floating point number type (System.Double ). |
String | 0E | A character string type (System.String ). |
TypedReference | 16 | A typed reference type (System.TypedReference ). |
IntPtr | 18 | A platform-specific signed integral type that is used to represent a pointer or a handle (System.IntPtr ). |
UIntPtr | 19 | A platform-specific unsigned integral type that is used to represent a pointer or a handle (System.UIntPtr ). |
Object | 1C | An object type that can be used to pass any type (System.Object ). |
For these simple types, this single byte is all that is necessary to serialize the type.
For complex types (where the byte value is ElementType.Class or ElementType.ValueType) we need to do more work. In these cases, an additional value is provided: an encoded metadata token for the type. We begin by decompressing an integral value.
Regrettably, the work doesn't end there. The integral value is further encoded. The two low-order bits indicate the type of token and are interpreted as follows:
Bits | Description |
00 | A type definition metadata token (TokenType.TypeDef ). |
01 | A type reference metadata token (TokenType.TypeRef ). |
10 | A type specification metadata token (TokenType.TypeSpec ). |
11 | This is not an expected, valid code. |
The remaining bits (when shifted down) provide the unique identity of the token. After we have decoded the metadata token, we can then resolve it into a System.Type, by using the System.Reflection.Module.ResolveType method. Let's consider a not-so-simple example:
Offset | Data | Description |
00 | 00 | Indicates a standard call (CilCallingConvention.Standard ). |
01 | 02 | There are two parameters / parameter types. |
02 | 08 | The return type is a signed 32-bit integer (ElementType.Int32 ). |
03 | 0E | The first parameter is a character string (ElementType.String ). |
04 | 12 | The second parameter is a class (ElementType.Class ). |
05 | 08 | The encoded metadata token for the class is 08 . |
To decode the token in the above example, we consider the bits in the encoded token (00001000). The two low-order bits (00) indicate that the token is a type definition (TokenType.TypeDef). The remaining bits (000010) provide the unique identity of that type definition (2). The decoded metadata token is as follows:
Offset | Data | Description |
00 | 02 00 00 | The unique identity of the metadata token is 2. |
03 | 02 | The token type indicates a type definition (TokenType.TypeDef ). |
We can then use the System.Reflection.Module.ResolveType method to resolve this metadata token into a System.Type.
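The bit manipulation involved in decoding the encoded token can be sketched as follows. TypeDefOrRef.Decode is a hypothetical helper that takes the already decompressed integral value and returns an ordinary 32-bit metadata token. The token type values are the standard ones: 0x02 for TypeDef, 0x01 for TypeRef, and 0x1B for TypeSpec.

```csharp
using System;

public static class TypeDefOrRef
{
    // Token type bytes, indexed by the two low-order bits of the encoded value.
    private static readonly int[] TokenTypes =
    {
        0x02, // 00: TypeDef
        0x01, // 01: TypeRef
        0x1B  // 10: TypeSpec
    };

    // Converts a decompressed, encoded type token into a 32-bit metadata token.
    public static int Decode(int encoded)
    {
        int tag = encoded & 0x03;
        if (tag == 3)
            throw new FormatException("Unexpected token type bits (11).");

        int identity = encoded >> 2;
        return (TokenTypes[tag] << 24) | identity;
    }
}
```

For the encoded value 08 this yields 0x02000002, the token that would be passed to ResolveType.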
One last bit of additional complexity comes into play. It is possible to indicate that some of the parameters are optional. This is accomplished by placing a value of ElementType.Sentinel before the first optional parameter. So, modifying the previous example, we make the second parameter optional as follows:
Offset | Data | Description |
00 | 00 | Indicates a standard call (CilCallingConvention.Standard ). |
01 | 02 | There are two parameters / parameter types. |
02 | 08 | The return type is a signed 32-bit integer (ElementType.Int32 ). |
03 | 0E | The first parameter is a character string (ElementType.String ). |
04 | 41 | The sentinel value (ElementType.Sentinel ) indicates that all subsequent parameters are optional. |
05 | 12 | The second parameter is a class (ElementType.Class ). |
06 | 08 | The encoded metadata token for the class is 08 . |
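Putting the pieces together, a deliberately limited walk over such a signature might look like the sketch below. It is not the article's MethodSignature class. SignatureSketch is a hypothetical type that understands only the simple element types plus Class and ValueType; arrays, pointers, by-reference types, generic instantiations, and custom modifiers are all out of scope.

```csharp
using System;
using System.Collections.Generic;

public static class SignatureSketch
{
    private const byte ValueType = 0x11, Class = 0x12, Sentinel = 0x41;

    public sealed class Result
    {
        public byte CallingConvention;
        public byte ReturnType;
        public List<byte> ParameterTypes = new List<byte>();
        public int FirstOptionalIndex = -1; // -1 when no sentinel is present
    }

    public static Result Parse(byte[] sig)
    {
        int offset = 0;
        var result = new Result();
        result.CallingConvention = sig[offset++];
        int count = ReadCompressed(sig, ref offset);
        result.ReturnType = ReadType(sig, ref offset);

        while (result.ParameterTypes.Count < count)
        {
            if (sig[offset] == Sentinel)
            {
                // Every parameter after the sentinel is optional.
                result.FirstOptionalIndex = result.ParameterTypes.Count;
                offset++;
                continue;
            }
            result.ParameterTypes.Add(ReadType(sig, ref offset));
        }
        return result;
    }

    // Reads one element type; for Class/ValueType it also skips the encoded token.
    private static byte ReadType(byte[] sig, ref int offset)
    {
        byte elementType = sig[offset++];
        if (elementType == Class || elementType == ValueType)
            ReadCompressed(sig, ref offset);
        return elementType;
    }

    // Compressed unsigned integer: 1, 2, or 4 bytes, big endian.
    private static int ReadCompressed(byte[] sig, ref int offset)
    {
        byte first = sig[offset++];
        if ((first & 0x80) == 0x00)
            return first;
        if ((first & 0xC0) == 0x80)
            return ((first & 0x3F) << 8) | sig[offset++];
        if ((first & 0xE0) == 0xC0)
        {
            int value = (first & 0x1F) << 24;
            value |= sig[offset++] << 16;
            value |= sig[offset++] << 8;
            return value | sig[offset++];
        }
        throw new FormatException("Invalid compressed integer prefix.");
    }
}
```

The sentinel does not count toward the parameter count, which is why the loop is driven by the number of parameter types collected rather than by the number of bytes consumed.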
Opcode (1): calli (29)
InlineString
Type: StringInstruction
The operand is a 32-bit metadata token. It is initially decoded by the ReadOnlyListExtensions.ReadToken method, which reads the four bytes containing this value. Later, using System.Reflection.Module.ResolveString, the token is resolved into an instance of System.String. Consider the following example:
Offset | Data | Description |
00 | 72 | The opcode indicates a load string instruction (OpCodes.Ldstr ). |
01 | 02 00 00 | The unique identity of the metadata token is 2. |
04 | 70 | The token type indicates a string (TokenType.String ). |
Opcode (1): ldstr (72)
InlineSwitch
Type: SwitchInstruction
This is one of the few cases where the operand is of variable size and consists of multiple parts.
The first part of the operand is a signed 32-bit integer, which indicates the number of branches associated with this switch instruction. It is decoded by the ReadOnlyListExtensions.ReadInt32 method, which reads the four bytes containing the value.
After this, one or more branch offsets are provided. Each branch offset is a signed 32-bit integer that provides the byte offset from the end of the instruction. Each of these is initially decoded by the ReadOnlyListExtensions.ReadInt32 method, which reads the four bytes containing the value.
Later, using Transeric.Reflection.MethodIL.ResolveInstruction, each offset is resolved into an instance of Transeric.Reflection.IInstruction. This is accomplished by conducting a binary search for the instruction that occurs at that offset. Consider the following example:
Offset | Data | Description |
00 | 45 | The opcode indicates a switch instruction (OpCodes.Switch ). |
01 | 02 00 00 00 | There are 2 branch offsets for this switch instruction. |
05 | 0E 00 00 00 | The first branch offset is 14 bytes (0E ) from the end of the instruction. Here, the offset from the beginning of the data is 27: 13 (end of instruction) plus 14 (branch). |
09 | 0F 00 00 00 | The second branch offset is 15 bytes (0F ) from the end of the instruction. Here, the offset from the beginning of the data is 28: 13 (end of instruction) plus 15 (branch). |
0D | | End of instruction (13 bytes from the beginning of the data). |
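The key detail is that every branch offset is relative to the end of the whole instruction, not to the position of the individual offset. A minimal sketch of that calculation follows; SwitchTargets is a hypothetical helper.

```csharp
public static class SwitchTargets
{
    // Returns the absolute target offsets for a switch instruction
    // that starts at 'instructionOffset'.
    public static int[] Resolve(byte[] il, int instructionOffset)
    {
        int offset = instructionOffset + 1; // skip the one-byte opcode (45)
        int count = ReadInt32(il, ref offset);

        // The instruction ends after the last of the 'count' branch offsets.
        int endOfInstruction = offset + (4 * count);

        var targets = new int[count];
        for (int i = 0; i < count; i++)
            targets[i] = endOfInstruction + ReadInt32(il, ref offset);

        return targets;
    }

    // Reads a little-endian 32-bit integer and advances 'offset' past it.
    private static int ReadInt32(byte[] il, ref int offset)
    {
        int value =
            il[offset]
            | (il[offset + 1] << 8)
            | (il[offset + 2] << 16)
            | (il[offset + 3] << 24);
        offset += 4;
        return value;
    }
}
```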
Opcode (1): switch (45)
InlineTok
Types: FieldInstruction, MemberInstruction, MethodInstruction, SignatureInstruction, StringInstruction, TypeInstruction
The operand is a 32-bit metadata token. It is initially decoded by the ReadOnlyListExtensions.ReadToken method, which reads the four bytes containing this value. Depending on the metadata token type, an instance of one of the following types will be created: FieldInstruction, MemberInstruction, MethodInstruction, SignatureInstruction, StringInstruction, or TypeInstruction. The later resolution of the token into its corresponding metadata information is dependent on that type. Consider the following example:
Offset | Data | Description |
00 | D0 | The opcode indicates a load token instruction (OpCodes.Ldtoken ). |
01 | 02 00 00 | The unique identity of the metadata token is 2. |
04 | 04 | The token type indicates a field definition (TokenType.FieldDef ). |
Since the token type is TokenType.FieldDef, an instance of FieldInstruction is created.
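The selection can be pictured as a switch over the high-order byte of the token. The mapping below is my assumption about how token types would pair with the instruction types named above; the framework's actual rules may differ. TokenDispatch is a hypothetical helper that returns type names as strings so the sketch stays self-contained.

```csharp
public static class TokenDispatch
{
    // Returns the name of the instruction type that a token would plausibly map to.
    // NOTE: this mapping is an assumption made for illustration only.
    public static string GetInstructionTypeName(int token)
    {
        switch ((uint)token >> 24)
        {
            case 0x01: // TypeRef
            case 0x02: // TypeDef
            case 0x1B: // TypeSpec
                return "TypeInstruction";
            case 0x04: // FieldDef
                return "FieldInstruction";
            case 0x06: // MethodDef
            case 0x2B: // MethodSpec
                return "MethodInstruction";
            case 0x0A: // MemberRef (may be a field or a method)
                return "MemberInstruction";
            case 0x11: // Signature
                return "SignatureInstruction";
            case 0x70: // String
                return "StringInstruction";
            default:
                return "Unknown";
        }
    }
}
```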
Opcode (1): ldtoken (D0)
InlineType
Type: TypeInstruction
The operand is a 32-bit metadata token. It is initially decoded by the ReadOnlyListExtensions.ReadToken method, which reads the four bytes containing this value. Later, using System.Reflection.Module.ResolveType, the token is resolved into an instance of System.Type. Consider the following example:
Offset | Data | Description |
00 | 8C | The opcode indicates a box instruction (OpCodes.Box ). |
01 | 02 00 00 | The unique identity of the metadata token is 2. |
04 | 02 | The token type indicates a type definition (TokenType.TypeDef) . |
Opcodes (17): cpobj (70), ldobj (71), castclass (74), isinst (75), unbox (79), stobj (81), box (8C), newarr (8D), ldelema (8F), ldelem (A3), stelem (A4), unbox.any (A5), refanyval (C2), mkrefany (C6), initobj (FE 15), constrained. (FE 16), sizeof (FE 1C)
InlineVar
Types: ParameterInstruction<ushort> or VariableInstruction<ushort>
The operand is an unsigned 16-bit integer. It is initially decoded by the ReadOnlyListExtensions.ReadUInt16 method, which reads the two bytes containing this value. Depending on the instruction, an instance of either ParameterInstruction<ushort> or VariableInstruction<ushort> will be created.
Note: It would be nice if System.Reflection.Emit.OperandType defined separate operand types (e.g. InlineArg and InlineVar) for these two distinct operand types. Regrettably, it does not.
ParameterInstruction<ushort>
A ParameterInstruction<ushort> instance is created for the instructions ldarg, ldarga, and starg. Later the operand is resolved via Transeric.Reflection.MethodIL.ResolveParameter into an instance of ParameterInfo. It accomplishes this by using the System.Reflection.MethodBase.GetParameters method. Since GetParameters does not return the this argument, the index value is interpreted according to the containing method's calling convention (notably CallingConventions.HasThis). Consider the following example:
Offset | Data | Description |
00 | FE 09 | The opcode indicates a load argument instruction (OpCodes.Ldarg ). |
02 | 01 00 | The zero-based index value (1) indicates the second parameter. |
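The adjustment for the this argument can be sketched as follows. ParameterLookup and its two sample methods are hypothetical; the sketch returns null when the index refers to this, since no ParameterInfo exists for it.

```csharp
using System.Reflection;

public sealed class ParameterLookup
{
    // Hypothetical sample methods used to exercise the lookup.
    public static void StaticSample(int first, string second) { }
    public void InstanceSample(int first, string second) { }

    // Returns the ParameterInfo for an IL argument index,
    // or null when the index refers to the 'this' argument.
    public static ParameterInfo Resolve(MethodBase method, int argumentIndex)
    {
        bool hasThis = (method.CallingConvention & CallingConventions.HasThis) != 0;

        if (hasThis)
        {
            if (argumentIndex == 0)
                return null; // argument 0 is 'this'

            argumentIndex--; // shift past 'this' to index into GetParameters()
        }

        return method.GetParameters()[argumentIndex];
    }
}
```

With these samples, index 1 resolves to the parameter named second for the static method, but to the parameter named first for the instance method.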
VariableInstruction<ushort>
A VariableInstruction<ushort> instance is created for the instructions ldloc, ldloca, and stloc. Later the operand is resolved via Transeric.Reflection.MethodIL.ResolveVariable into an instance of LocalVariableInfo. It accomplishes this by using the MethodBody.LocalVariables property. Consider the following example:
Offset | Data | Description |
00 | FE 0C | The opcode indicates a load local variable instruction (OpCodes.Ldloc ). |
02 | 01 00 | The zero-based index value (1) indicates the second local variable. |
Opcodes (6): ldarg (FE 09), ldarga (FE 0A), starg (FE 0B), ldloc (FE 0C), ldloca (FE 0D), stloc (FE 0E)
ShortInlineBrTarget
Type: BranchInstruction<sbyte>
The operand is an 8-bit signed integer that specifies the byte offset from the end of the instruction. It is initially decoded by the ReadOnlyListExtensions.ReadSByte method, which reads the byte containing this value.
Later, using Transeric.Reflection.MethodIL.ResolveInstruction, this offset is resolved into an instance of Transeric.Reflection.IInstruction. This is accomplished by conducting a binary search for the instruction that occurs at that offset. Consider the following example:
Offset | Data | Description |
00 | 2B | The opcode indicates a branch instruction (OpCodes.Br_S). |
01 | 0F | The offset is 15 bytes (0F) from the end of the instruction. Here, the offset from the beginning of the data is 17: 2 (end of instruction) plus 15 (branch). |
02 | | The end of the instruction. |
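The arithmetic in the table, and the binary search mentioned above, can be sketched as follows. The names BranchTargetSketch, ComputeShortBranchTarget, and FindInstruction are hypothetical; the sketch assumes the instruction start offsets are already available as a sorted array, which may differ from how the framework stores its instructions.

```csharp
using System;

// Hypothetical sketch; not the Transeric.Reflection implementation.
public static class BranchTargetSketch
{
    // Computes the absolute target of a short branch whose opcode is at 'offset'.
    public static int ComputeShortBranchTarget(byte[] il, int offset)
    {
        const int instructionSize = 2; // one opcode byte plus one operand byte
        sbyte relative = unchecked((sbyte)il[offset + 1]); // operand is signed
        return offset + instructionSize + relative;
    }

    // Finds the index of the instruction that starts exactly at 'target'.
    public static int FindInstruction(int[] sortedOffsets, int target)
    {
        int index = Array.BinarySearch(sortedOffsets, target);
        if (index < 0)
            throw new ArgumentException("No instruction starts at the target offset.");
        return index;
    }
}
```

Because the operand is signed, a value such as FD (-3) branches backward: a br.s at offset 1 with that operand targets offset 0.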
Opcodes (14): br.s (2B), brfalse.s (2C), brtrue.s (2D), beq.s (2E), bge.s (2F), bgt.s (30), ble.s (31), blt.s (32), bne.un.s (33), bge.un.s (34), bgt.un.s (35), ble.un.s (36), blt.un.s (37), leave.s (DE)
ShortInlineI
Types: Instruction<byte> or Instruction<sbyte>
The operand is an 8-bit integer. Depending on the instruction, it is decoded by either ReadOnlyListExtensions.ReadByte (for unaligned.) or ReadOnlyListExtensions.ReadSByte (for ldc.i4.s), which reads the byte containing this value. Because no metadata is referenced, no resolution is required. Consider the following example:
Offset | Data | Description |
00 | 1F | The opcode indicates a load 8-bit constant instruction (OpCodes.Ldc_I4_S). |
01 | FF | The value -1 is loaded (FF interpreted as a signed 8-bit integer). |
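The difference between the two interpretations of the same byte can be sketched as follows. The class ShortInlineISketch and its methods are hypothetical stand-ins; the signatures of the framework's own ReadByte and ReadSByte extensions are not shown in this section and may differ.

```csharp
// Hypothetical sketch; not the Transeric.Reflection implementation.
public static class ShortInlineISketch
{
    // Unsigned interpretation (as used for the unaligned. prefix).
    public static byte ReadByte(byte[] il, int offset) => il[offset];

    // Signed interpretation (as used for ldc.i4.s).
    public static sbyte ReadSByte(byte[] il, int offset) =>
        unchecked((sbyte)il[offset]);
}
```

For the bytes 1F FF, reading offset 1 as signed gives -1, while reading the same byte as unsigned gives 255.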
Opcodes (2): ldc.i4.s (1F), unaligned. (FE 12)
ShortInlineR
Type: Instruction<float>
The operand is a 32-bit IEEE floating-point number. It is decoded by the ReadOnlyListExtensions.ReadSingle method, which reads the four bytes containing this value. Because no metadata is referenced, no resolution is required. Consider the following example:
Offset | Data | Description |
00 | 22 | The opcode indicates a load 32-bit floating-point constant instruction (OpCodes.Ldc_R4). |
01 | 00 00 80 3F | The 32-bit IEEE floating-point number 1.0 (stored little-endian) is loaded. |
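One way to decode such an operand with the standard library is BitConverter.ToSingle. The class ShortInlineRSketch and its ReadSingle method are hypothetical and only illustrate the byte order; they are not the framework's ReadSingle extension.

```csharp
using System;

// Hypothetical sketch; not the Transeric.Reflection implementation.
public static class ShortInlineRSketch
{
    // Reads a little-endian 32-bit IEEE float starting at 'offset'.
    public static float ReadSingle(byte[] il, int offset)
    {
        byte[] buffer = new byte[4];
        Array.Copy(il, offset, buffer, 0, 4);
        if (!BitConverter.IsLittleEndian)
            Array.Reverse(buffer); // IL operands are stored little-endian
        return BitConverter.ToSingle(buffer, 0);
    }
}
```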
Opcodes (1): ldc.r4 (22)
ShortInlineVar
Types: ParameterInstruction<byte> or VariableInstruction<byte>
The operand is an unsigned 8-bit integer. It is initially decoded by the ReadOnlyListExtensions.ReadByte method, which reads the byte containing this value. Depending on the instruction, an instance of either ParameterInstruction<byte> or VariableInstruction<byte> is created.
Note: It would be nice if System.Reflection.Emit.OperandType defined separate operand types (e.g. ShortInlineArg and ShortInlineVar) for these two distinct kinds of operand. Regrettably, it does not.
ParameterInstruction<byte>
A ParameterInstruction<byte> instance is created for the instructions ldarg.s, ldarga.s, and starg.s. Later the operand is resolved via Transeric.Reflection.MethodIL.ResolveParameter into an instance of ParameterInfo. It accomplishes this by using the System.Reflection.MethodBase.GetParameters method. Since GetParameters does not return the this argument, the index value is interpreted according to the containing method's calling convention (notably CallingConventions.HasThis): for an instance method, index 0 refers to this and index 1 refers to the first declared parameter. Consider the following example (for a static method):
Offset | Data | Description |
00 | 0E | The opcode indicates a load argument instruction (OpCodes.Ldarg_S). |
01 | 01 | The zero-based index value (1), stored as a single byte, indicates the second parameter. |
VariableInstruction<byte>
A VariableInstruction<byte> instance is created for the instructions ldloc.s, ldloca.s, and stloc.s. Later the operand is resolved via Transeric.Reflection.MethodIL.ResolveVariable into an instance of LocalVariableInfo. It accomplishes this by using the MethodBody.LocalVariables property. Consider the following example:
Offset | Data | Description |
00 | 11 | The opcode indicates a load local variable instruction (OpCodes.Ldloc_S). |
01 | 01 | The zero-based index value (1), stored as a single byte, indicates the second local variable. |
Opcodes (6): ldarg.s (0E), ldarga.s (0F), starg.s (10), ldloc.s (11), ldloca.s (12), stloc.s (13)
Introduction to .NET Reflection
It's a bit unlikely that a reader interested in this topic will also be a beginner with reflection. If so, there are undoubtedly far better articles on this topic. That said, I would feel bad if I didn't at least provide a brief introduction.
As mentioned earlier in the article, .NET provides a comparatively feature-rich framework for examining the metadata associated with an assembly / application. The easiest way to explore this framework is probably to step through the Program.ReflectionPrimer method (provided with the source code for this article) in the debugger. There, I've provided examples of many common use cases.
Below we cover some of the major types in the framework:
Type | Description |
Assembly | Each application consists of one or more assemblies. From a Visual Studio perspective, building a Project basically results in the creation of an Assembly (.exe or .dll). Some common elements of interest provided by the Assembly include: name, version, product, title, copyright information, file location, modules, and other referenced assemblies. |
Module | Each assembly consists of one or more modules. In most cases, there is only a single module. Some common elements of interest provided by the Module include: name, types, and a means of resolving metadata tokens. |
Type | Each module consists of one or more types. Every time you create a class or a struct, you create a Type. There are also a large number of system types (e.g. System.Int32 and System.String). Some common elements of interest provided by the Type include: name, base type, fields, methods, and properties. |
MethodInfo | Each Type may include methods. When you create a function/method, there is corresponding metadata (MethodInfo) for that method. Some common elements of interest provided by MethodInfo include: name, return type, parameters, calling convention, declaring type, and method body. In this article we extend MethodBase (from which MethodInfo is derived) to also provide the Intermediate Language instructions in the method. |
PropertyInfo | Each Type may also include properties. When you create a property, there is corresponding metadata (PropertyInfo) for that property. Some common elements of interest provided by PropertyInfo include: name, type, get method, and set method. |
FieldInfo | Each Type may also include fields. When you create a field, there is corresponding metadata (FieldInfo) for that field. Some common elements of interest provided by FieldInfo include: name and type. |
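To make the relationships in the table concrete, here is a minimal sketch that walks from a Type to its Assembly, Module, and methods, and reports the size of each method's raw IL byte array. The class ReflectionWalkSketch and its DescribeType method are hypothetical names used only for illustration; this is not the Program.ReflectionPrimer method from the article's source code.

```csharp
using System;
using System.Reflection;

// Hypothetical sketch used only for illustration.
public static class ReflectionWalkSketch
{
    // Prints each method declared by 'type' along with the size of its IL.
    // Returns the number of methods visited.
    public static int DescribeType(Type type)
    {
        Assembly assembly = type.Assembly;
        Module module = type.Module;
        Console.WriteLine($"Assembly: {assembly.GetName().Name}, Module: {module.Name}");

        const BindingFlags flags = BindingFlags.Public | BindingFlags.NonPublic |
            BindingFlags.Instance | BindingFlags.Static | BindingFlags.DeclaredOnly;

        int count = 0;
        foreach (MethodInfo method in type.GetMethods(flags))
        {
            // GetMethodBody returns null for methods without IL (e.g. abstract).
            MethodBody body = method.GetMethodBody();
            int length = body?.GetILAsByteArray()?.Length ?? 0;
            Console.WriteLine($"  {method.ReturnType.Name} {method.Name}: {length} bytes of IL");
            count++;
        }
        return count;
    }
}
```

Calling ReflectionWalkSketch.DescribeType(typeof(ReflectionWalkSketch)) visits a single method, DescribeType itself.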
Introduction to Common Intermediate Language (CIL)
The topic of Common Intermediate Language (CIL) is far too broad to cover in this article. Also, I claim no expertise on the topic. IL simply resembles an assembly language. Because I am a bit of a dinosaur (dating back to a time when understanding assembly language was a critical skill), I understand just enough IL to read it with some proficiency. It's a bit of an innate skill for me.
I assume there are entire books on this topic. Regrettably, I haven't read one and can't in good conscience personally recommend one. That said, I've noticed people in forums recommending "Expert .NET 2.0 IL Assembler" by Serge Lidin. However, I worry that, since we've progressed to .NET 4.7.1, that recommendation might be a bit dated. The same author seems to have more recent books.
Before we start describing some IL, for comparison, let's consider a simple HelloWorld program in C#:
using System;

namespace HelloWorld
{
    public class Program
    {
        public static void Main(string[] args) =>
            Console.WriteLine("Hello World!");
    }
}
The corresponding instructions, contained in the Main method, would be as follows:
IL_0000: ldstr "Hello World!"
IL_0005: call void [mscorlib]System.Console::WriteLine(string)
IL_000a: nop
IL_000b: ret
Let's consider the first line:
IL_0000: ldstr "Hello World!"
This instruction simply pushes a string onto the stack. Among other things, the stack is used to pass arguments to methods. Stacks are rather important things in Intermediate Language (and most assembly languages). Most instructions modify the stack in some fashion.
The different parts of this instruction are as follows:
Part | Description |
IL_0000 | This is simply a label for the location of the instruction within the method. It doesn't actually contribute to the byte array that stores the instructions. The IL part stands for Intermediate Language. The 0000 is the byte offset (in hexadecimal) from the start of the method. While the label can be arbitrarily chosen (within reason), most of the disassemblers seem to prefer this naming convention. |
: | This indicates that the text before the colon is a label. |
ldstr | This is the opcode for this instruction (OpCodes.Ldstr). |
"Hello World!" | This is the string literal that is pushed onto the stack (and later passed to the method). |
Moving onto the second line:
IL_0005: call void [mscorlib]System.Console::WriteLine(string)
This instruction simply calls the specified method. It is assumed the arguments were previously pushed onto the stack. Notable portions of this instruction include the following:
Part | Description |
call | This is the opcode for the instruction (OpCodes.Call). |
void | This is the return type for the method that is called. In this case, nothing is returned. |
[mscorlib] | The name of the assembly (mscorlib) that contains the method. |
System. | The namespace (System) that contains the class. |
Console:: | The name of the class (Console) that contains the method. |
WriteLine | The name of the method. |
(string) | The types of the parameters for the method. In this case, there is a single parameter of type string (System.String). |
Moving onto the third line:
IL_000a: nop
This instruction does nothing ("no operation").
Moving onto the fourth and final line:
IL_000b: ret
This instruction simply returns from the current method.
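The IL_xxxx labels in the listing follow directly from the instruction sizes: ldstr and call each take one opcode byte plus a four-byte metadata token, while nop and ret are a single byte each. The sketch below walks a byte array and reproduces those labels using the real System.Reflection.Emit.OpCodes fields. The class OffsetWalkSketch is hypothetical, the token bytes in the usage note are made up for illustration, and the variable-length switch instruction is deliberately not handled; this is not the framework's decoder.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical sketch; not the Transeric.Reflection implementation.
public static class OffsetWalkSketch
{
    // Builds a lookup from opcode value to OpCode using the fields of OpCodes.
    private static Dictionary<ushort, OpCode> BuildOpCodeTable()
    {
        var table = new Dictionary<ushort, OpCode>();
        foreach (FieldInfo field in typeof(OpCodes).GetFields(BindingFlags.Public | BindingFlags.Static))
        {
            if (field.GetValue(null) is OpCode opCode)
                table[unchecked((ushort)opCode.Value)] = opCode;
        }
        return table;
    }

    // Operand sizes in bytes; 'switch' is omitted to keep the sketch short.
    private static int OperandSize(OperandType type)
    {
        switch (type)
        {
            case OperandType.InlineNone: return 0;
            case OperandType.ShortInlineBrTarget:
            case OperandType.ShortInlineI:
            case OperandType.ShortInlineVar: return 1;
            case OperandType.InlineVar: return 2;
            case OperandType.InlineI8:
            case OperandType.InlineR: return 8;
            case OperandType.InlineSwitch:
                throw new NotSupportedException("switch has a variable-length operand.");
            default: return 4; // tokens, InlineI, InlineBrTarget, ShortInlineR
        }
    }

    // Returns one "IL_xxxx: name" line per instruction in the byte array.
    public static List<string> Walk(byte[] il)
    {
        Dictionary<ushort, OpCode> table = BuildOpCodeTable();
        var lines = new List<string>();
        int offset = 0;
        while (offset < il.Length)
        {
            int start = offset;
            ushort value = il[offset++];
            if (value == 0xFE) // prefix of a two-byte opcode
                value = (ushort)(0xFE00 | il[offset++]);
            OpCode opCode = table[value];
            offset += OperandSize(opCode.OperandType);
            lines.Add($"IL_{start:x4}: {opCode.Name}");
        }
        return lines;
    }
}
```

For example, walking the bytes 72 01 00 00 70 28 0F 00 00 0A 00 2A (where the two token values are invented) yields the labels IL_0000, IL_0005, IL_000a, and IL_000b for ldstr, call, nop, and ret respectively.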
To actually build the program (with ILASM) and run it we would need a bit of extra metadata. The following minimal program can be built and will run:
.assembly HelloWorld
{
}

.method static void Main()
{
    .entrypoint
    .maxstack 1
    ldstr "Hello World!"
    call void [mscorlib]System.Console::WriteLine(string)
    ret
}
Note: If we truly disassembled the example C# program it would include a whole bunch more metadata. This was omitted for simplicity.
Additional Reading
Below is a collection of links to reference materials on some of the concepts covered in this article:
Metadata and Self-Describing Components
https://docs.microsoft.com/en-us/dotnet/standard/metadata-and-self-describing-components
ECMA C# and Common Language Infrastructure Standards
https://www.visualstudio.com/license-terms/ecma-c-common-language-infrastructure-standards/
ECMA Common Language Infrastructure (CLI) Partitions I to VI
http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-335.pdf
Common Intermediate Language
https://en.wikipedia.org/wiki/Common_Intermediate_Language
Mono.Cecil
http://www.mono-project.com/docs/tools+libraries/libraries/Mono.Cecil/
History
- 5/15/2018 - The original version was uploaded
- 5/20/2018 - Added a reference to Mono.Cecil and a couple more useful links in Additional Reading.
- 6/15/2018 - Fixed a typo at the end of the MetadataTokens section (where ".ResolveMethod" was missing).