Introduction
This article is meant to be a small tutorial on runtime IL-rewriting. I don't pretend that it is a complete one,
since IL-rewriting can be a big and sometimes difficult topic. The target audience is developers who have no prior experience with IL-rewriting but are curious to try it out.
Background
My past articles about the CLR hosting API and a mixed-mode profiler have primarily been aimed at diagnostics and testing of running applications. This article also has its origin in the testing field.
Once I saw a video by Roy Osherove (a unit testing guru) speaking about the implementation of some mocking frameworks.
Mocking frameworks are used in unit testing to isolate a class and remove its dependencies on other objects. Some of them do it by using IL-rewriting to fake and stub objects.
A cool tool to check out is Moles from Microsoft Research, which allows you to override return values in the System libraries. Let's say that you have a legacy system with a calendar implementation to test, and the problem is that it behaves differently depending on the time the test is run. Using IL-rewriting, you will be able to override System.DateTime.Now in your tests.
The benefit is that your tests will be deterministic and always give the same result.
Code Rewriting
Modifying the code of a running system is an old practice with native binaries.
An author could write self-modifying code. For license checking, one could ship a crippled binary and, if the license check succeeded, decrypt some important code hidden elsewhere in the binary and write it back to the place it had been removed from.
To circumvent a license check or get unlimited lives in a computer game, some people made permanent changes to the EXE by changing a conditional jmp into an unconditional jmp.
Some EXEs were encrypted on disk, which made this approach impossible, so instead a small loader was used to apply the patch to the running process.
To add code to a native application, one must find a free memory region (a "code cave"), either on disk or in the memory space of the running process.
To hook the new code in, you copy some instructions from the beginning of function A to the code cave
and overwrite the beginning of function A with a call to the code cave. At the end of the code cave you add a return (ret), and execution continues from where it was called.
IL-rewriting works more or less in the same way, but it is much easier:
- No need for code caves. Free space can be allocated directly in the CLR.
- IL assembler is more readable than native assembler code, because it can often also be viewed as C# or VB.NET.
- Metadata contains the full type information of classes and types.
Rewriting through the ICorProfiler API
Fortunately, there is a profiling API which makes it possible to interact with the CLR and get notifications from it.
It is an unmanaged API. This is unfortunately necessary, because otherwise the profiler code itself would also be profiled.
Let's look at the interfaces we need to build a profiler.
ICorProfilerCallback Interfaces
Your own profiler must implement this interface in order to get notifications from the CLR.
The first callback we will look at is Initialize, which is called at startup.
MIDL_INTERFACE("176FBED1-A55C-4796-98CA-A9DA0EF883E7")
ICorProfilerCallback : public IUnknown
{
public:
    virtual HRESULT STDMETHODCALLTYPE Initialize(
        IUnknown *pICorProfilerInfoUnk) = 0;
    // ... remaining callbacks omitted
};
The parameter we get is a pointer to an object implementing ICorProfilerInfo, ICorProfilerInfo2, and/or ICorProfilerInfo3.
The object you get depends on the version of the CLR that is running. ICorProfilerInfo and ICorProfilerInfo2 are the most important.
The ICorProfilerInfo3 interface is implemented in .NET 4.0 and adds attach and detach capabilities.
ICorProfilerInfo3 inherits from ICorProfilerInfo2, which inherits from ICorProfilerInfo.
So if you get an ICorProfilerInfo2 object, there is no need to query for the ICorProfilerInfo object.
The first thing we should do is therefore to query for ICorProfilerInfo2.
ICorProfilerInfo2* m_corProfilerInfo2;
HRESULT hr = pICorProfilerInfoUnk->QueryInterface(IID_ICorProfilerInfo2, (LPVOID*)&m_corProfilerInfo2);
Subscribing to Events
You can tell the CLR which events you want notifications for by setting an event mask.
The mask is constructed by OR-ing together some enum values and passing the result to SetEventMask on the ICorProfilerInfo object.
Do this inside the Initialize method: several flags, including the two COR_PRF_DISABLE_* flags used below, are immutable and can only be set there. Trying to change them from other functions later will result in an error.
For doing IL-rewriting, the following enum values are recommended.
DWORD eventMask = COR_PRF_MONITOR_NONE;
eventMask |= COR_PRF_MONITOR_JIT_COMPILATION;
eventMask |= COR_PRF_MONITOR_MODULE_LOADS;
eventMask |= COR_PRF_DISABLE_INLINING;
eventMask |= COR_PRF_DISABLE_OPTIMIZATIONS;
m_corProfilerInfo2->SetEventMask(eventMask);
JITCompilationStarted
Concerning IL-rewriting, the most important callback to implement is JITCompilationStarted. It is declared on ICorProfilerCallback, so it is also available through the derived ICorProfilerCallback2.
MIDL_INTERFACE("176FBED1-A55C-4796-98CA-A9DA0EF883E7")
ICorProfilerCallback : public IUnknown
{
public:
    // ... other callbacks omitted
    virtual HRESULT STDMETHODCALLTYPE JITCompilationStarted(
        FunctionID functionId,
        BOOL fIsSafeToBlock) = 0;
};
This callback is invoked for every managed method when its IL code is about to be JIT-compiled into native code.
This is our window of opportunity to do some IL-rewriting.
Steps to Follow
What we get from the JITCompilationStarted callback is a FunctionID.
By using the FunctionID as a parameter to ICorProfilerInfo::GetFunctionInfo, we can obtain its ClassID and ModuleID.
A call to ICorProfilerInfo::GetModuleInfo with the ModuleID will return the module name and its AssemblyID.
IMetaDataImport Interface
This interface is for doing lookups in the metadata. You can for example iterate over all methods of a class, or find the parent class or interfaces of a class.
IMetaDataEmit
This interface is for emitting/generating new modules, assemblies, classes, methods, strings, etc. If you are interested in using methods from other assemblies,
you will have to generate an mdMethodRef to that method in the module you will call it from (in the SDK headers the token type is named mdMemberRef, which is what the code in this article uses). It is sort of like a forward declaration or external declaration in C.
The loading of an external assembly is automatically taken care of by the CLR. Note that it will be loaded when the method is executed, not when the MethodRef is created.
Internal Structures
The IL code of a method is preceded by a header that describes it.
In its simplest form, this header is just 1 byte: 6 bits for the code length and 2 bits for flags.
This structure is called IMAGE_COR_ILMETHOD_TINY. A tiny method must fulfill the following requirements.
- Small method - IL code is max 63 bytes.
- No exception handling
- No local variables
- Evaluation stack depth of at most 8
The other structure is IMAGE_COR_ILMETHOD_FAT. It is a more complicated header, containing the stack size, type information of local variables, and information about sub sections.
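To make the two layouts concrete, here is a minimal standalone sketch that decodes either header from raw bytes. It does not use the SDK headers: MethodHeaderInfo and DecodeMethodHeader are names made up for this sketch, and the flag values are the ones documented in ECMA-335 (Partition II, method header formats). In a real profiler the accessors in corhlpr.h remain the more convenient route.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decoded view of a method header (hypothetical helper type for this sketch).
struct MethodHeaderInfo {
    bool     isFat;
    bool     initLocals;     // only set for fat headers
    bool     moreSections;   // extra sections (e.g. exception handling) follow the code
    uint32_t headerSize;     // in bytes
    uint32_t codeSize;       // bytes of IL following the header
    uint16_t maxStack;
    uint32_t localVarSigTok; // 0 when the method has no locals
};

// Flag values as documented in ECMA-335.
const uint8_t  kTinyFormat = 0x2;
const uint8_t  kFatFormat  = 0x3;
const uint16_t kMoreSects  = 0x8;
const uint16_t kInitLocals = 0x10;

static uint16_t ReadU16(const uint8_t* p) {
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

static uint32_t ReadU32(const uint8_t* p) {
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Returns false if the buffer is too short or the format bits are unknown.
bool DecodeMethodHeader(const uint8_t* p, size_t size, MethodHeaderInfo* out)
{
    if (size < 1) return false;
    const uint8_t format = p[0] & 0x3;
    if (format == kTinyFormat) {
        out->isFat = false;
        out->initLocals = false;
        out->moreSections = false;
        out->headerSize = 1;
        out->codeSize = p[0] >> 2;  // upper 6 bits
        out->maxStack = 8;          // implied for tiny methods
        out->localVarSigTok = 0;
        return true;
    }
    if (format != kFatFormat || size < 12) return false;
    const uint16_t flagsAndSize = ReadU16(p);
    const uint16_t flags = flagsAndSize & 0x0FFF;  // lower 12 bits
    out->isFat = true;
    out->initLocals = (flags & kInitLocals) != 0;
    out->moreSections = (flags & kMoreSects) != 0;
    out->headerSize = (flagsAndSize >> 12) * 4u;   // upper 4 bits, counted in DWORDs
    out->maxStack = ReadU16(p + 2);
    out->codeSize = ReadU32(p + 4);
    out->localVarSigTok = ReadU32(p + 8);
    return true;
}
```

Applied to the first 12 bytes of the FatDateNow byte string shown later in this article (13 30 02 00 22 00 00 00 01 00 00 11), this gives a fat header with InitLocals set, a max stack of 2, a code size of 0x22 and the local variable signature token 0x11000001.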
Use of exception handling results in one or more extra sections after the code. If you add prelude code, you will have to update the offsets stored in the exception handling clauses.
When only prelude code has been added, the offsets are easily adjusted by adding the size of the new IL.
Adding a few new IL instructions in the middle is more complicated.
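The prelude case can be sketched as follows. EhClause is a simplified stand-in whose fields mirror IMAGE_COR_ILMETHOD_SECT_EH_CLAUSE_FAT, and ShiftClauses is a hypothetical helper name; reading and writing the real section (including its small-clause variant) is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for a fat exception handling clause.
struct EhClause {
    uint32_t flags;
    uint32_t tryOffset;
    uint32_t tryLength;
    uint32_t handlerOffset;
    uint32_t handlerLength;
    uint32_t classTokenOrFilterOffset; // a type token, or an IL offset for filter clauses
};

const uint32_t kClauseFilter = 0x0001; // COR_ILEXCEPTION_CLAUSE_FILTER

// After inserting preludeSize bytes at IL offset 0, every offset moves by that
// amount. Lengths stay the same, because the protected code itself is unchanged.
void ShiftClauses(std::vector<EhClause>& clauses, uint32_t preludeSize)
{
    for (EhClause& clause : clauses) {
        clause.tryOffset += preludeSize;
        clause.handlerOffset += preludeSize;
        if (clause.flags & kClauseFilter)
            clause.classTokenOrFilterOffset += preludeSize; // an offset only for filters
    }
}
```

Besides shifting the clauses, the code size in the method header has to grow by the same amount, and the extra section must still start on a 4-byte boundary after the enlarged body.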
Adding Prelude Code
There is already a CodeProject article describing how to add prelude code to managed methods,
called Really Easy Logging using IL Rewriting and the .NET Profiling API.
The good part is that it is a simple and working sample. What the author does is dynamically allocate a string and create an mdMethodRef that points to System.Console.WriteLine.
In the prelude code, he puts the string on the stack and calls the mdMethodRef. Unfortunately, not much actual "rewriting" is done.
My own contribution in the area of IL-rewriting is to show how to replace existing calls with calls to methods in external assemblies.
Replacing Existing Method Calls
Let's start! Below is the IL code from the method FatDateNow in SampleApp1.exe:
"133002002200000001000011281400000A0A723B000070281300000A1200725B000070281500000A281200000A2A"
How do we begin? The first exercise is to open the managed application with ILDasm,
navigate to the method, and open it.
.method public hidebysig instance void FatDateNow() cil managed
{
.maxstack 2
.locals init ([0] valuetype [mscorlib]System.DateTime dt)
IL_0000: nop
IL_0001: call valuetype [mscorlib]System.DateTime [mscorlib]System.DateTime::get_Now()
IL_0006: stloc.0
IL_0007: ldstr "DateTime.Now : "
IL_000c: call void [mscorlib]System.Console::Write(string)
IL_0011: nop
IL_0012: ldloca.s dt
IL_0014: ldstr "HH:mm"
IL_0019: call instance string [mscorlib]System.DateTime::ToString(string)
IL_001e: call void [mscorlib]System.Console::WriteLine(string)
IL_0023: nop
IL_0024: ret
}
The lines starting with "IL_00XX" are the method body. Everything before that is information coming from the header.
I recommend accessing the header through COR_ILMETHOD_TINY and COR_ILMETHOD_FAT (from corhlpr.h in the SDK).
These two structures contain accessor methods for the individual fields, so you don't have to worry so much about bit shifting.
If we look to the left at the lines containing method calls, we can see that the IL opcode for a method call is 0x28.
It takes one operand, a token, which is of type mdMethodRef.
If you look closely, the mdMethodRef has its first byte in parentheses. The mdMethodRef is an encoded token: the first byte identifies the metadata table the token points into (0x0A is the MemberRef table),
and the rest of the number is the row index within that table. This means that if you want to add a call to a method that is already referenced by the module,
it is possible to reuse an mdMethodRef. Otherwise you will have to create one. Fortunately, duplicates are also accepted if you don't care to do lookups.
References to methods in other assemblies must be of type mdMethodRef. If you call a method within the same assembly where it is implemented/defined, you can use the mdMethodDef.
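As a small illustration of how a token splits into a table byte and a row index, here is a standalone sketch. The function names are made up for this example; the SDK offers the TypeFromToken and RidFromToken macros in cor.h for the same purpose (TypeFromToken keeps the table byte in the top eight bits instead of shifting it down).

```cpp
#include <cassert>
#include <cstdint>

// Table identifiers found in the high byte of a metadata token.
const uint32_t kTableTypeRef   = 0x01;
const uint32_t kTableMethodDef = 0x06;
const uint32_t kTableMemberRef = 0x0A;
const uint32_t kTableString    = 0x70; // user string tokens, as used by ldstr

// High byte: which metadata table the token points into.
inline uint32_t TokenTable(uint32_t token) { return token >> 24; }

// Low three bytes: the row index (RID) within that table.
inline uint32_t TokenRid(uint32_t token) { return token & 0x00FFFFFF; }

// Reads the 4-byte little-endian token operand that follows an opcode such as
// call (0x28) or ldstr (0x72).
inline uint32_t ReadToken(const uint8_t* operand)
{
    return static_cast<uint32_t>(operand[0]) |
           (static_cast<uint32_t>(operand[1]) << 8) |
           (static_cast<uint32_t>(operand[2]) << 16) |
           (static_cast<uint32_t>(operand[3]) << 24);
}
```

For the first instruction of FatDateNow, the bytes 28 14 00 00 0A decode to the token 0x0A000014: table 0x0A (MemberRef), row 0x14.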
Looking Up an Existing mdMethodRef
I have added a function that looks up the mdTypeRef (class/struct) and the mdMethodRef. In my example, I call it to look up the getter accessor for System.DateTime.Now.
HRESULT hr = MetadataHelper::ResolveMethodRef(
info, moduleId,
L"System.DateTime", L"get_Now",
&dateTimeTypeRef, &getNowMemberRef);
Below is the implementation of the function.
HRESULT MetadataHelper::ResolveMethodRef(
    ICorProfilerInfo* info, ModuleID moduleID,
    const WCHAR* typeName, const WCHAR* methodName,
    mdTypeRef* outTypeRef, mdMemberRef* outMemberRef)
{
    HRESULT hr = S_OK;
    IMetaDataImport* pMetaDataImport = NULL;
    // Declared up front so that "goto cleanup" does not skip an initialization.
    mdTypeRef typeRef = mdTypeRefNil;
    mdMemberRef methodRef = mdMemberRefNil;
    *outTypeRef = mdTypeRefNil;
    *outMemberRef = mdMemberRefNil;
    hr = info->GetModuleMetaData(moduleID, ofRead, IID_IMetaDataImport,
        (IUnknown**)&pMetaDataImport);
    if (FAILED(hr))
    {
        return hr;
    }
    // Find the type reference first, then the member reference that belongs to it.
    hr = FindTypeRef(pMetaDataImport, typeName, &typeRef);
    if (hr != S_OK)
        goto cleanup;
    hr = FindMemberRef(pMetaDataImport, methodName, typeRef, &methodRef);
    if (hr != S_OK)
        goto cleanup;
    *outTypeRef = typeRef;
    *outMemberRef = methodRef;
cleanup:
    pMetaDataImport->Release();
    return hr;
}
FindTypeRef and FindMemberRef are methods that simply iterate over all types and all methods until they find the token we are looking for.
The full source is included in the attachment, since it would take up unnecessary space here.
Creating a New mdMethodRef
const BYTE keyEMCAInterceptLib[] = { 0xca, 0xf2, 0x7b, 0x24, 0xca, 0xa5, 0xa1, 0x88 };
mdMemberRef newMemberRef = mdMemberRefNil;
MetadataHelper::DeclareZeroParamsFunctionReturningObject(info,
moduleId,
L"InterceptLib", keyEMCAInterceptLib,
keyEMCAInterceptLibSize,
L"InterceptLib.DebugLogger",
L"get_Now",
dateTimeTypeRef,
&newMemberRef);
The code for creating a new mdMethodRef is very specific to the method you are creating it for.
First of all, it depends on the types of the parameters and the type of the return value.
The types are encoded into a method signature that identifies the method. It is similar to a function pointer type:
casting a function pointer to a pointer type that doesn't accept the same parameter types makes no sense.
Secondly, if strong naming is used, one must also supply the public key token of the assembly (caf27b24caa5a188 in this example).
Below is the signature for System.DateTime.Now:
BYTE rSig[] = {IMAGE_CEE_CS_CALLCONV_DEFAULT,
0, ELEMENT_TYPE_VALUETYPE, 0, 0, 0, 0, 0 };
The method doesn't take parameters, but it does return a value. If the value is a primitive type like int or double,
there is no need to specify a complementing type token. In this case the return type is a DateTime, i.e. a struct (a class would use ELEMENT_TYPE_CLASS instead).
This is described as ELEMENT_TYPE_VALUETYPE. Since this is too little information for the CLR,
the element type must be followed by a compressed token reference (replacing the four zeros).
The function CorSigCompressToken is available in the SDK in corhlpr.h:
ULONG ulTokenLength = CorSigCompressToken(retType, &rSig[3]);
ULONG ulSigLength = 3 + ulTokenLength;
mdMemberRef memberRef = mdMemberRefNil;
Check(metaDataEmit->DefineMemberRef(typeRef, methodName, rSig, ulSigLength, &memberRef));
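In the example I encoded just the return value, because System.DateTime.Now doesn't have parameters. If you want to call a function with parameters,
those need to be encoded too. For example, a method that takes two value-type parameters and returns a value type starts from a template like this: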
BYTE rSig[] = {IMAGE_CEE_CS_CALLCONV_DEFAULT,
2, ELEMENT_TYPE_VALUETYPE, 0, 0, 0, 0, ELEMENT_TYPE_VALUETYPE, 0, 0, 0, 0, ELEMENT_TYPE_VALUETYPE, 0, 0, 0, 0, 0 };
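Because a compressed token can take one, two or four bytes, the positions of the later entries in such a template are not fixed. The sketch below builds the signature incrementally instead. It is a standalone re-implementation of the compression scheme for illustration, not the SDK code, and the helper names (AppendCompressedData, AppendTypeToken, BuildValueTypeSignature) are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

const uint8_t kCallConvDefault      = 0x00; // IMAGE_CEE_CS_CALLCONV_DEFAULT
const uint8_t kElementTypeValueType = 0x11; // ELEMENT_TYPE_VALUETYPE

// Appends an unsigned integer in the compressed form used inside signatures:
// 1 byte below 0x80, 2 bytes below 0x4000, otherwise 4 bytes (big-endian).
void AppendCompressedData(std::vector<uint8_t>& sig, uint32_t value)
{
    assert(value <= 0x1FFFFFFF);
    if (value < 0x80) {
        sig.push_back(static_cast<uint8_t>(value));
    } else if (value < 0x4000) {
        sig.push_back(static_cast<uint8_t>((value >> 8) | 0x80));
        sig.push_back(static_cast<uint8_t>(value & 0xFF));
    } else {
        sig.push_back(static_cast<uint8_t>((value >> 24) | 0xC0));
        sig.push_back(static_cast<uint8_t>((value >> 16) & 0xFF));
        sig.push_back(static_cast<uint8_t>((value >> 8) & 0xFF));
        sig.push_back(static_cast<uint8_t>(value & 0xFF));
    }
}

// Appends a type token: the row index is shifted left by two bits and the low
// bits select the table (0 = TypeDef, 1 = TypeRef, 2 = TypeSpec).
bool AppendTypeToken(std::vector<uint8_t>& sig, uint32_t token)
{
    const uint32_t rid = token & 0x00FFFFFF;
    uint32_t tag;
    switch (token >> 24) {
    case 0x02: tag = 0; break; // TypeDef
    case 0x01: tag = 1; break; // TypeRef
    case 0x1B: tag = 2; break; // TypeSpec
    default:   return false;   // not a type token
    }
    AppendCompressedData(sig, (rid << 2) | tag);
    return true;
}

// Builds: default calling convention, parameter count, value-type return type,
// then one value-type entry per parameter. Returns an empty vector on a bad token.
std::vector<uint8_t> BuildValueTypeSignature(uint32_t returnType,
                                             const std::vector<uint32_t>& paramTypes)
{
    std::vector<uint8_t> sig;
    sig.push_back(kCallConvDefault);
    AppendCompressedData(sig, static_cast<uint32_t>(paramTypes.size()));
    sig.push_back(kElementTypeValueType);
    if (!AppendTypeToken(sig, returnType)) return std::vector<uint8_t>();
    for (uint32_t param : paramTypes) {
        sig.push_back(kElementTypeValueType);
        if (!AppendTypeToken(sig, param)) return std::vector<uint8_t>();
    }
    return sig;
}
```

With a TypeRef token of 0x01000012 as the return type and no parameters, the result is the four bytes 00 00 11 49. The token value is only an example; in the profiler it would be the dateTimeTypeRef found earlier.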
Now we can put it all together:
static const BYTE keyEMCAInterceptLib[] = { 0xca, 0xf2, 0x7b, 0x24, 0xca, 0xa5, 0xa1, 0x88 };
ULONG InterceptAPI::ReplaceDateTimeNowCalls(ICorProfilerInfo* info,
    ModuleID moduleId, BYTE* ILBytes, ULONG ILBytesSize)
{
    ULONG keyEMCAInterceptLibSize = sizeof(keyEMCAInterceptLib);
    mdTypeRef dateTimeTypeRef = mdTypeRefNil;
    mdMemberRef getNowMemberRef = mdMemberRefNil;
    HRESULT hr = S_OK;
    // Nothing to replace if this module never references DateTime.get_Now.
    hr = MetadataHelper::ResolveMethodRef(info, moduleId,
        L"System.DateTime", L"get_Now", &dateTimeTypeRef, &getNowMemberRef);
    if (hr != S_OK)
        return 0;
    // Declare the replacement method in the module being rewritten.
    mdMemberRef newMemberRef = mdMemberRefNil;
    MetadataHelper::DeclareZeroParamsFunctionReturningObject(info,
        moduleId,
        L"InterceptLib", keyEMCAInterceptLib,
        keyEMCAInterceptLibSize,
        L"InterceptLib.DebugLogger",
        L"get_Now",
        dateTimeTypeRef,
        &newMemberRef);
    // Swap the tokens in the IL body.
    ULONG count = OpCodeParser::ReplaceFunctionCall(ILBytes,
        ILBytesSize, getNowMemberRef, newMemberRef);
    return count;
}
Below is the implementation of ReplaceFunctionCall:
ULONG OpCodeParser::ReplaceFunctionCall(BYTE* opCodeBytes, ULONG length,
    mdMemberRef fromMemberRef, mdMemberRef toMemberRef)
{
    ULONG count = 0;
    ULONG index = 0;
    while (index < length)
    {
        BYTE opCode = opCodeBytes[index];
        // A call is one opcode byte followed by a 4-byte token; make sure both fit.
        if (IsFunctionCall(opCode) && index + 1 + sizeof(mdMemberRef) <= length)
        {
            BYTE* address = opCodeBytes + index + 1;
            mdMemberRef* memberRefAddress = reinterpret_cast<mdMemberRef*>(address);
            mdMemberRef memberRef = *memberRefAddress;
            if (fromMemberRef == memberRef)
            {
                *memberRefAddress = toMemberRef;
                count++;
            }
        }
        ULONG opCodeSize = InstructionSize(opCodeBytes + index);
        if (opCodeSize == 0)
            break; // unknown instruction: stop instead of looping forever
        index += opCodeSize;
    }
    return count;
}
The code tests whether the current opcode is a function call (i.e. whether the opcode is equal to 0x28).
If so, we test whether its token is the one we are looking for (fromMemberRef) and replace it with the new one (toMemberRef).
In either case, we then skip to the next instruction.
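To see the replacement at work without a running profiler, here is a standalone sketch that can be applied to the 34 body bytes of FatDateNow. It is a simplified variant, not the code from the attachment: SampleInstructionSize only knows the opcodes that occur in that method, ReplaceCallToken is a made-up name, and the memcpy-based token access assumes a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Sizes for just the opcodes that occur in FatDateNow; 0 means "unknown".
// A real implementation needs the complete opcode table.
static size_t SampleInstructionSize(uint8_t opCode)
{
    switch (opCode) {
    case 0x00: return 1; // nop
    case 0x0A: return 1; // stloc.0
    case 0x2A: return 1; // ret
    case 0x12: return 2; // ldloca.s <1-byte local index>
    case 0x28: return 5; // call <4-byte token>
    case 0x72: return 5; // ldstr <4-byte token>
    default:   return 0;
    }
}

// Replaces the token of every "call fromToken" with toToken and returns the
// number of replacements.
size_t ReplaceCallToken(uint8_t* il, size_t length, uint32_t fromToken, uint32_t toToken)
{
    size_t count = 0;
    size_t index = 0;
    while (index < length) {
        const size_t size = SampleInstructionSize(il[index]);
        if (size == 0 || index + size > length)
            break; // unknown or truncated instruction: stop rather than misparse
        if (il[index] == 0x28) {
            uint32_t token;
            std::memcpy(&token, il + index + 1, sizeof(token)); // alignment-safe read
            if (token == fromToken) {
                std::memcpy(il + index + 1, &toToken, sizeof(toToken));
                ++count;
            }
        }
        index += size;
    }
    return count;
}
```

Walking instruction by instruction matters because the byte 0x28 can also occur inside an operand. In the same way, the 0x0A at offset 5 of the sample is the stloc.0 opcode, while the 0x0A at offset 4 is the last byte of a token. The replacement token used when trying this out (for example 0x0A000020) is arbitrary; in the profiler it is the newMemberRef created above.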
The implementation of InstructionSize can initially be tricky. Instructions may have operands and are therefore not all of the same size.
In the appendix of the book Expert .NET 2.0 IL Assembler,
I found a list of all the instructions with information about which operands they expect.
With this information in hand, I wrote a function with a switch statement and some if statements testing for certain bytecode ranges.
Probably not the nicest implementation you will see, but it served my purpose at the time.
Recently I found out that the information from the appendix of the book is also available in the .NET SDK.
On my machine I found the file at the following location: "C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Include\opcode.def".
OPDEF(CEE_NOP, "nop", Pop0, Push0, InlineNone, IPrimitive, 1, 0xFF, 0x00, NEXT)
OPDEF(CEE_BREAK, "break", Pop0, Push0, InlineNone, IPrimitive, 1, 0xFF, 0x01, BREAK)
OPDEF(CEE_LDARG_0,"ldarg.0", Pop0, Push1, InlineNone, IMacro, 1, 0xFF, 0x02, NEXT)
By including this file in your project and defining your own OPDEF macro, it should be possible to convert the file into an array of structs.
Someone who has done this describes it in Thoughts on writing an IL Disassembler.
Unfortunately, he did that for a company and cannot release the source code. At some point in the future, I might do it myself too.
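The macro idea could look roughly like the sketch below. To keep it self-contained, a short list of entries written in the OPDEF layout stands in for the real file; only the operand kind, the length and the opcode byte columns are used, so the other columns should be treated as illustrative. OpCodeInfo, MAKE_ENTRY, SAMPLE_OPCODE_LIST and InstructionSizeFromTable are names made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Operand sizes in bytes, named so that the operand column can be token-pasted.
enum {
    OperandSize_InlineNone     = 0,
    OperandSize_ShortInlineVar = 1,
    OperandSize_InlineMethod   = 4,
    OperandSize_InlineString   = 4
};

// Stand-in for opcode.def: a handful of entries in the same column layout.
#define SAMPLE_OPCODE_LIST(OPDEF) \
    OPDEF(CEE_NOP,      "nop",      Pop0,   Push0,   InlineNone,     IPrimitive, 1, 0xFF, 0x00, NEXT)   \
    OPDEF(CEE_STLOC_0,  "stloc.0",  Pop1,   Push0,   InlineNone,     IMacro,     1, 0xFF, 0x0A, NEXT)   \
    OPDEF(CEE_LDLOCA_S, "ldloca.s", Pop0,   PushI,   ShortInlineVar, IMacro,     1, 0xFF, 0x12, NEXT)   \
    OPDEF(CEE_CALL,     "call",     VarPop, VarPush, InlineMethod,   IPrimitive, 1, 0xFF, 0x28, CALL)   \
    OPDEF(CEE_RET,      "ret",      VarPop, Push0,   InlineNone,     IPrimitive, 1, 0xFF, 0x2A, RETURN) \
    OPDEF(CEE_LDSTR,    "ldstr",    Pop0,   PushRef, InlineString,   IObjModel,  1, 0xFF, 0x72, NEXT)

struct OpCodeInfo {
    const char* name;
    uint8_t     opCode;      // last byte column; the one before it is 0xFF for one-byte opcodes
    uint8_t     length;      // length of the opcode itself in bytes
    uint8_t     operandSize; // bytes that follow the opcode
};

// Turns one OPDEF line into one array element; unused columns are simply dropped.
#define MAKE_ENTRY(id, name, pop, push, operand, kind, len, byte1, byte2, flow) \
    { name, byte2, len, OperandSize_##operand },

static const OpCodeInfo kOpCodes[] = { SAMPLE_OPCODE_LIST(MAKE_ENTRY) };

// Returns the total instruction size, or 0 if the opcode is not in the table.
size_t InstructionSizeFromTable(const uint8_t* instruction)
{
    for (const OpCodeInfo& info : kOpCodes) {
        if (info.opCode == instruction[0])
            return static_cast<size_t>(info.length) + info.operandSize;
    }
    return 0;
}
```

With the real file, the stand-in list would be replaced by defining OPDEF as MAKE_ENTRY and including opcode.def inside the array initializer, with one OperandSize_ constant per operand kind the file uses. Two-byte opcodes and the switch instruction, whose operand length depends on the number of targets, need extra handling that this sketch leaves out.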
Running the Demo
The demo consists of three files:
- SampleApp1.exe - Prints the current time and a fixed time (18:15)
- InterceptApp.exe - Takes a filename as a parameter and launches a managed app with profiling enabled
- RunIt.bat - Launches SampleApp1.exe using only environment variables (update it first, since it needs absolute file names)
Running SampleApp1.exe without IL-rewriting will give the following output.
Running SampleApp1.exe with IL-rewriting will intercept calls to System.DateTime.Now, so that the current time is also printed as 18:15 (6:15 pm).
Points of Interest
Below are some resources that I have found useful and interesting
History
- 4th Sept, 2012, Initial post
- 9th Sept, 2012, Added missing Op-codes
- 27th Sept, 2012, Minor spelling corrections, added history section