The Problem
I've never been a huge fan of writing logging into my applications. There are a few reasons. One, it’s just never been helpful in my personal code development. It’s just easier to use a debugger or look at a stack trace. Of course in a production application you can't use a debugger and that’s where it’s a help to have some logging. However, that brings us to reason number two that I'm not a huge fan of logging. It always seems that there is no logging code in the part of the application that is breaking. Of course when this happens there’s no easy way to go back and add the needed logging code. Usually this requires recompiling and reinstalling this new temporary version on the system in question… not really the best situation.
The Idea
This got me thinking, wouldn't it be nice if I could just alter the production code by adding the logging I needed to diagnose the problem and then remove the added code when I was done? A while back I had stumbled on an article talking about the .NET code profiling API and the idea of IL rewriting. I'll get into the technical aspects of what exactly that means and how it works in a little bit, but for now the basic idea of it is that you can grab the assembly code at run time just before it gets compiled into machine code and rewrite the code first (like maybe to add some logging for example).
The Starting Point
Ideally what I would want to be able to do is log every method entry and exit point, log the times they entered and exited, be able to log the parameters passed to methods, and be able to enter specific logging commands at various points in the method (something like “log what the value of x is after the second switch
statement”). Obviously that is a lot to bite off, so I thought I'd start with just the first one, the ability to log every method as it is entered and the time it was entered.
The Profiling API
This is all made possible by the .NET Profiling API that is built into the .NET runtime. The profiling API is basically a bunch of hooks into the runtime that you can use to find out what is happening when an application is running. It can provide information about everything from Garbage Collection to Exceptions that are being thrown to when Assemblies/Classes/Methods are being loaded. This last part is what we will use in this example. Specifically we will use JITCompilationStarted
to find code that is about to be compiled into machine code by the runtime and add our logging code first.
One problem with the profiling API is that it must be implemented as a COM component. At first glance this seems like a strange requirement, but the reason for this becomes clear if you think about it for a second. If your profiler were written in .NET, it would constantly be trying to profile itself.
As a side note, there are also a handful of methods that you can hook into that will notify you every time a method is entered/exited. These are much easier to use. In fact my first attempt used these methods, but I ended up using the IL rewriting path for two reasons. One, having the .NET runtime call into your profiling code every time a method is entered or exited is VERY slow. And two, this would not facilitate my ultimate goal of actually being able to add more specific logging code at any point within a method.
The Research
As with any new undertaking, I went straight to Google and started my research. I found a small handful of articles dealing with this topic, but by far the most helpful was one by Aleksandr Mikunov. Aleksandr gives a great primer on the basics of IL rewriting and even includes some sample code that is close to what I wanted to do. If you would like to work with IL rewriting you should definitely read his article.
The Messy Details
So here’s how IL rewriting works in a nutshell. When you compile your .NET code, be it C# or VB or whatever, your code is converted into what is called Intermediate Langue or IL code (which looks a little bit like assembly code). When you run your compiled application, the first time each method is used, the .NET runtime converts the IL to actual machine code. This is known as Just In Time (or JIT) compilation. JIT compilation is the reason every part of your application runs more slowly the first time you use it. What we are doing is getting a notification when the runtime is about to compile any piece of IL to machine code altering the IL code first. In our case, we're inserting a call to a logging class as the first instruction in every method.
Since this article is primarily about IL rewriting, I'll only give a quick overview of the peripheral assemblies and then give a more detailed explanation of the code that actually handles the rewriting. The Logger.Controller
assembly is responsible for starting the application you would like to profile. It handles all the details of hooking .NET Windows applications or ASP.NET applications (Windows services coming soon…) up to our profiler. It then opens a remoting channel to listen for callbacks from the project being logged.
The Logger.Core
assembly contains a static log
method. This is the call that our IL rewriter will add to the beginning of every method being profiled. In order to make sure that all applications can find this assembly, it’s registered with the GAC as part of the install process (or as part of the post build process if you are compiling the source code).
The other projects in the solution are:
Logger.Common
: Defines an interface used for remoting
Logger.Gui
: The graphical user interface which operates the Logger.Controller
Logger.Profiler
and Logger.ProfilerPS
: The C++ COM implementation of the .NET profiling API. This is the meat of the application.
TestApplication
: A simple test harness
Logger
: The install package
The Code
For the sake of the length of this article, I'll limit my explanation of the code to the Logger.Profiler
project. Logger.Profiler
is a C++ COM component that implements the .NET profiler API. The real entry point for our discussion is the implementation of JITCompilationStarted
in the Profiler
class. It looks like this:
HRESULT Profiler::JITCompilationStarted(FunctionID functionID, BOOL fIsSafeToBlock)
{
FunctionInfo* functionInfo =
FunctionInfo::CreateFunctionInfo(profilerInfo, functionID);
if (NULL == functionInfo) { return S_OK; }
ILWriterBase *ilWriter = ILWriterBase::CreateILWriter(profilerInfo, functionInfo);
if(ilWriter->CanRewrite())
{
Check(profilerInfo->SetILFunctionBody(functionInfo->GetModuleID(),
functionInfo->GetToken(), (LPCBYTE)ilWriter->GetNewILBytes()));
}
return S_OK;
}
The first line is a call to the static
method CreateFunctionInfo
on the FunctionInfo
class. CreateFunctionInfo
is really a factory method that returns either a FunctionInfo
class, which contains all the information about the signature of the function that is about to be JIT compiled, or the value NULL
if this function should not be profiled.
The next code block starts with a call to another factory method called CreateILWriter
. This method returns one of three concrete implementations of an ILWriterBase
class. The reason for this is that there are two different flavors of an IL function, tiny and fat. Basically a tiny method format is used for very small methods that don't take any arguments, and fat format is used for everything else. Each of these formats requires a slightly different approach to IL rewriting.
At this point, you may be doing the math in your head… wait, there are two types of IL function formats but three implementations of ILWriterBase
. The reason for this is that we run into three scenarios when modifying IL, a fat function that needs to be rewritten, a tiny function that needs to be rewritten, and a tiny format that will need to be converted to a fat function because the additional code makes the rewritten function large.
The last thing that JITCompilationStarted
does is call SetILFunctionBody
on a reference to the profiling engines ICorProfilerInfo
(we got a reference to this by querying for the interface in Initialize
). We get the IL to pass to this call by calling GetNewILBytes
on the implementation of ILWriterBase
that we got back when we called CreateILWriter
. Here’s the code:
void *ILWriterBase::GetNewILBytes()
{
void* newMethodBytes = AllocateNewMethodBody(functionInfo);
WriteHeader(newMethodBytes);
WriteNewIL(newMethodBytes);
WriteOldIL(newMethodBytes);
WriteExtra(newMethodBytes);
Diagnostics::GetInstance()->PrintIL((BYTE*)GetOldMethodBytes(), GetOldMethodSize());
Diagnostics::GetInstance()->PrintIL((BYTE*)newMethodBytes, GetNewMethodSize());
return newMethodBytes;
}
The first thing we do is create space for the new code using AllocateNewMethodBody
, which uses ICorProfilerInfo
’s GetILFunctionBodyAllocator
function to get us a pointer to memory allocated for our new function. We then start by writing the header information. A “header” is basically some metadata about the method. WriteHeader
is a pure abstract method because headers are written completely differently for tiny and fat formats. We then write in a call to our logging method by calling WriteNewIL
, which simply uses memcpy
to copy the bytes of the new IL directly after the header information. Next we write in the original bytes by calling WriteOldIL
, again using memcpy
to copy the old function body in directly after our new IL instructions. Lastly, we call WriteExtra
. WriteExtra
is only implemented by FatILWriter
and is used to write in any try
/catch
block information from the original function. It also updates the location of the catch
handlers, which I won't go into detail here. For more information on that, refer to the article by Aleksandr Mikunov referenced above.
And that’s all there is to it. At this point our code exits and the .NET runtime compiles our altered IL code, including our logging calls, into machine code.
The Wrap Up
I realize this is only the 30,000 foot view of what is going on, but that’s a bit on purpose. First, most of what I've learned about what IL rewriting is and how it works comes from the article by Aleksandr Mikunov, and to restate it here would be nothing short of plagiarism. Secondly, I've tried very hard to write “self documenting” code. Personally, I'd rather look at well written source code than documentation any day. I hope that what I've provided is just that, reasonable well written source code that will help you get started with writing your own IL rewriting.
The Next Steps
I hope that what I've provided is actually the beginning of a useful tool for people, but I admit that as it stands now it’s only of limited value. The next step is to log the exit points of the code. This involves a lot more work, but I think it’s well on its way. At that point this actually becomes a reasonably well performing performance tuning tool. I've used a few of the ones on the market which work really well, but perform very slowly. I believe all of the ones that I've seen are using the COM callback approach that I mentioned earlier.
The final step is to actually allow users to insert any custom code they want at any point in the application. For example, imagine being able to actually modify method MyMethod
in class MyClass
to log the value of MyProperty
every fourth time the code is called on a live system without having to do a recompile and reinstall.