Accessing Assembly Metadata with Reflection or Mono Cecil (Part 6)

KenBeckett

2 Dec 2012


Loading type metadata from assemblies with Reflection or Mono Cecil.

Introduction

This article is about loading type metadata from .NET assemblies using Reflection or Mono Cecil, with the specific goal of supporting the resolution of symbolic references in a codeDOM. Sources are included. This is Part 6 of a series on codeDOMs, but it may be useful for anyone who wishes to retrieve type metadata from .NET assemblies for any reason. In the previous parts, I’ve discussed “CodeDOMs”, provided a C# codeDOM, a WPF UI and IDE, a C# parser, and solution/project codeDOM classes.

Background

The codeDOM discussed in this series handles symbolic references to named objects with various classes derived from the SymbolicRef base class. These references might be resolved in a manually generated object tree, but when parsed from a C# source file, most of them will be unresolved (represented by the UnresolvedRef object). Only references to built-in types will be resolved, since they are parsed from keywords. Symbolic references in a codeDOM which refer to declarations inside the codeDOM (local variables, members of the same type, types declared within the codeDOM, etc) can be resolved by searching for the declaration within the codeDOM. However, symbolic references to external declarations in other projects or assemblies require a Project object to be resolved, as it contains a collection of references to such external sources of types (these references derive from the Reference class, not SymbolicRef, and they aren’t related to symbolic references). This article will cover only the loading of assemblies and the types in them – the actual resolving of symbolic references using this metadata will be covered in the next article.

Loading Types From References

The first step in loading type data referenced by a Project is to validate and locate (or “resolve”) the project and assembly references. For project references, public types are simply imported from the referenced project into the namespaces of the local project (if an “InternalsVisibleTo” attribute specifies the local project, non-public types must also be imported). For assembly references, the referenced assemblies must be located and loaded before the type metadata is retrieved, and this can be a rather complicated process. A referenced assembly can have a “hint path”, but this is not a guaranteed location – a search has to be done to locate the assembly. Some assemblies might be located in the GAC, and .NET framework (BCL) assemblies have their own locations which must be determined with the help of the targeted framework version specified for the project.
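
The "hint path first, then search" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `AssemblyLocatorSketch` and `Locate` are hypothetical names that do not come from the attached source, and the search order (hint path, then a caller-supplied list of directories that could include framework or GAC locations) is a guess at a reasonable policy rather than the article's actual logic.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not part of the article's source.
public static class AssemblyLocatorSketch
{
    // Returns the full path of the first existing candidate, or null if none is found.
    public static string Locate(string assemblyName, string hintPath,
                                IEnumerable<string> searchDirectories)
    {
        // The hint path is tried first, but it is only a hint and may not exist.
        if (!string.IsNullOrEmpty(hintPath) && File.Exists(hintPath))
            return Path.GetFullPath(hintPath);

        // Otherwise fall back to each search directory in the order given.
        foreach (string directory in searchDirectories)
        {
            string candidate = Path.Combine(directory, assemblyName + ".dll");
            if (File.Exists(candidate))
                return Path.GetFullPath(candidate);
        }
        return null;
    }
}
```

A caller would pass the project's output directory, the targeted framework's directories, and so on as `searchDirectories`; a `null` result corresponds to an unresolved reference.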

As assemblies are located and loaded, they must be tracked to prevent duplicate loads and also, if possible, to unload them at some point. This functionality has been placed in a class named ApplicationContext, which is used to load and track all assemblies loaded by the current application (into the primary AppDomain). A class named FrameworkContext is used to track all assemblies for each framework version or profile (such as .NET 2.0, Silverlight, Client Profiles, etc.). Some frameworks are partial, meaning they build on a previous release (such as .NET 3.0 and 3.5), and so their FrameworkContext instances are chained together. Each project has a targeted framework version, and assemblies are loaded with Project.LoadAssembly(), which calls FrameworkContext.LoadAssembly() for the targeted framework, which in turn loads them from the global ApplicationContext instance. Loaded assemblies are represented by the class LoadedAssembly, a base class that is subclassed for each loading method and that also has a subclass, ErrorLoadedAssembly, to represent assemblies that failed to load for some reason. When the context objects are looking for assemblies in the GAC, they make use of the GACUtil helper class, which also includes a method for comparing assembly versions.
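
To make the "prevent duplicate loads" part concrete, here is a minimal sketch of a path-keyed cache. It is not the ApplicationContext class from the attached source: `AssemblyCacheSketch` is a hypothetical name, the loader is passed in as a delegate so the same cache could sit in front of either loading method, and the case-insensitive key is an assumption that suits Windows file systems.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical illustration only; not the article's ApplicationContext.
public sealed class AssemblyCacheSketch<TAssembly>
{
    // Keyed by full path; case-insensitive comparison is an assumption (Windows paths).
    private readonly Dictionary<string, TAssembly> _loaded =
        new Dictionary<string, TAssembly>(StringComparer.OrdinalIgnoreCase);
    private readonly Func<string, TAssembly> _loader;

    public AssemblyCacheSketch(Func<string, TAssembly> loader)
    {
        _loader = loader;
    }

    public int Count { get { return _loaded.Count; } }

    // Loads the assembly on first request and returns the cached instance afterwards.
    public TAssembly GetOrLoad(string path)
    {
        string key = Path.GetFullPath(path);
        TAssembly assembly;
        if (!_loaded.TryGetValue(key, out assembly))
        {
            assembly = _loader(key);
            _loaded.Add(key, assembly);
        }
        return assembly;
    }
}
```

A chained arrangement like the one described for FrameworkContext could then be approximated by having each context consult its own cache before delegating to the context it builds on.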

When a Project is going to be resolved, ResolveReferences() is called on it first, which calls Resolve() on each Reference. For an AssemblyReference, this attempts to find the referenced “.dll” and also verifies that it’s valid for the targeted framework. For a ProjectReference, the referenced project is located in the solution by filename. If the project isn’t found in the codeDOM or the project type isn’t supported, then the output filename is determined and the project reference is treated as an assembly reference in an attempt to still resolve the types. Once the references are resolved, LoadReferencedAssemblies() is called on the project to load all of the referenced assemblies into memory (by calling Load() on each Reference), and finally the type metadata is loaded from each assembly for public types (by calling LoadTypes() on each LoadedAssembly).

When an entire Solution is going to be resolved, ResolveReferences() is called on each project, and then UpdateProjectDependencies() is called on the solution, which determines the order in which the projects should be resolved based upon their references to each other. Then, LoadReferencedAssemblies() is called on each project in dependency order to load the referenced assemblies, and finally the type metadata is loaded from each assembly.
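
The dependency ordering step can be illustrated with a depth-first topological sort. The algorithm inside UpdateProjectDependencies() is not shown in this article, so this is a stand-in technique, and `DependencyOrderSketch` and its dictionary-based input are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: orders projects so that each one follows the projects it references.
public static class DependencyOrderSketch
{
    // references: project name -> names of the projects it references.
    public static List<string> Order(IDictionary<string, string[]> references)
    {
        var ordered = new List<string>();
        var state = new Dictionary<string, int>(); // 1 = being visited, 2 = finished
        foreach (string project in references.Keys)
            Visit(project, references, state, ordered);
        return ordered;
    }

    private static void Visit(string project, IDictionary<string, string[]> references,
                              IDictionary<string, int> state, List<string> ordered)
    {
        int current;
        if (state.TryGetValue(project, out current))
        {
            if (current == 1)
                throw new InvalidOperationException("Circular project reference: " + project);
            return; // already placed in the output
        }

        state[project] = 1;
        string[] dependencies;
        if (references.TryGetValue(project, out dependencies))
            foreach (string dependency in dependencies)
                Visit(dependency, references, state, ordered);

        state[project] = 2;
        ordered.Add(project); // added only after everything it depends on
    }
}
```

Projects that appear only as a dependency (with no entry of their own) are still included in the result, ahead of the projects that reference them.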

Loading Type Metadata With Reflection

The most obvious method of retrieving type metadata from assemblies in .NET is using Reflection. This is done by calling Assembly.Load() to load an assembly, which returns an Assembly instance. The LoadedAssembly class is subclassed as ReflectionLoadedAssembly for such assemblies. Then, LoadTypes() is called on the resulting assembly object to get an array of Type objects. Members of types are represented by MethodInfo, PropertyInfo, FieldInfo, etc. Helper classes providing some useful static methods for working with these classes are located in Utilities/Reflection.

Feature	Reflection Class
Assemblies	`Assembly`
Members of types	`MemberInfo`
Types	`Type`
All Methods	`MethodBase`
Methods	`MethodInfo`
Constructors	`ConstructorInfo`
Properties	`PropertyInfo`
Events	`EventInfo`
Fields	`FieldInfo`
Parameters	`ParameterInfo`
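
As a rough illustration of the classes in the table, the sketch below loads an assembly with plain Reflection and lists its public types and their declared members. It uses only standard `System.Reflection` calls; `ReflectionLoadSketch` and `Describe` are hypothetical names and this is not the ReflectionLoadedAssembly class from the attached source.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class ReflectionLoadSketch
{
    // Lists each public type followed by the public members it declares itself.
    public static List<string> Describe(Assembly assembly)
    {
        const BindingFlags declared = BindingFlags.Public | BindingFlags.Instance |
                                      BindingFlags.Static | BindingFlags.DeclaredOnly;
        var lines = new List<string>();
        foreach (Type type in assembly.GetExportedTypes())
        {
            lines.Add(type.FullName);
            foreach (MemberInfo member in type.GetMembers(declared))
                lines.Add("  " + member.MemberType + " " + member.Name);
        }
        return lines;
    }

    public static void Main(string[] args)
    {
        // Assembly.LoadFrom takes a file path; with no argument, fall back to the core library.
        Assembly assembly = args.Length > 0
            ? Assembly.LoadFrom(args[0])
            : typeof(object).Assembly;
        foreach (string line in Describe(assembly))
            Console.WriteLine(line);
    }
}
```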

Reflection doesn’t just load the metadata for examination – it loads assemblies with the intention of allowing them to execute, and this means that they must pass various security and validation checks, or the assembly will fail to load. Some of these checks occur while browsing the types and their members, and can cause exceptions to be thrown.

In using Reflection for this project, I experienced the following issues:

1. You can’t load different versions of an assembly into the same AppDomain, and different projects can and do reference different versions of the same assembly – this can even occur in the same project due to chained references. This also means an app running on .NET 4 can’t load older .NET libraries. These are serious problems for a tool like Nova CodeDOM.
2. You can’t unload an assembly from an AppDomain once it’s loaded, you can only unload the entire AppDomain, but you can’t do that to the primary AppDomain. This leads to continuous memory growth when loading a series of different projects that reference different assemblies.
3. Trying to work around #1 and/or #2 above by using multiple AppDomains is a lot more difficult than might be expected – you can’t reference types loaded into one AppDomain by code in another one, or they will be silently loaded into the other AppDomain. You would have to locate all code that references them locally, or create your own types to marshal the data between the domains. This is not obvious – I’ve seen code where people think they have solved this problem, but they actually haven’t due to referencing the types across the domains.
4. Exceptions can be thrown when loading assemblies or browsing types, including security (CAS) exceptions, and various other types of exceptions for many different reasons. This can be a major problem that prevents the loading of certain types or entire assemblies.
5. Performance is not very good.
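
One common way to soften the exception problem when browsing types is to catch `ReflectionTypeLoadException` and keep whatever types did load. This is a general Reflection idiom offered as a sketch, not necessarily what the attached source does; `SafeTypeLoadingSketch` is a hypothetical name.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class SafeTypeLoadingSketch
{
    // Returns every type that could be loaded, reporting the ones that could not.
    public static Type[] GetLoadableTypes(Assembly assembly)
    {
        try
        {
            return assembly.GetTypes();
        }
        catch (ReflectionTypeLoadException ex)
        {
            // LoaderExceptions describes why individual types failed to load.
            foreach (Exception loaderException in ex.LoaderExceptions)
            {
                if (loaderException != null)
                    Console.Error.WriteLine("Type load problem: " + loaderException.Message);
            }
            // ex.Types contains null in place of each type that failed.
            return ex.Types.Where(t => t != null).ToArray();
        }
    }
}
```

This only covers failures raised by `GetTypes()`; exceptions thrown later, while browsing members of an individual type, still need their own handling.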

It turns out that while using Reflection might be fine for a program to analyze its own assemblies at runtime, it’s actually not a good choice at all for loading types from unrelated assemblies in order to do static analysis. So, what other option is there?

Loading Type Metadata With Reflection Using Reflection-Only Loads

Back in .NET 2.0, a “read-only” capability was added to reflection to get around many of the problems with using reflection for static analysis of assemblies. The Assembly.ReflectionOnlyLoad() method is specifically designed to allow for reflection of metadata for static analysis – meaning that you do not intend to execute any code in the assembly, but only inspect it. This would seem to be exactly what is needed for this project, and so I added a CodeDOM.ApplicationContext.UseReflectionOnlyLoad config file option to enable this mode, and set it to be on by default.

Using Reflection with assemblies that were loaded in reflection-only mode gets around the really big problem of not being able to load different versions of the same assembly in the same AppDomain. It also avoids many possible exceptions, because it bypasses strong name verification, CAS policy checks, processor architecture loading rules, binding policies, doesn’t execute any initialization code, and prevents automatic loading of dependent assemblies.
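
A minimal sketch of a reflection-only load is shown below. It assumes the classic .NET Framework (on .NET Core and later runtimes these calls throw `PlatformNotSupportedException`) and that a path to an assembly is passed on the command line; `ReflectionOnlySketch` is a hypothetical name.

```csharp
using System;
using System.Reflection;

public static class ReflectionOnlySketch
{
    public static void Main(string[] args)
    {
        // Assumption: args[0] is the path of the assembly to inspect (.NET Framework only).
        Assembly assembly = Assembly.ReflectionOnlyLoadFrom(args[0]);
        Console.WriteLine("{0} (ReflectionOnly = {1})", assembly.FullName, assembly.ReflectionOnly);

        // The referenced assemblies are listed from metadata; none of them is loaded here.
        foreach (AssemblyName reference in assembly.GetReferencedAssemblies())
            Console.WriteLine("  references " + reference.FullName);
    }
}
```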

However, reflection-only loads still have some issues:

1. You still can’t load a different version of the ‘mscorlib’ assembly into the primary AppDomain (you can try, and it will pretend to work, but it won’t – you’ll still have only the version that the running app was compiled with in memory). This can be worked around by “hiding” newer types in mscorlib when an older version was desired in order to prevent resolve conflicts.
2. You still can’t unload an assembly from an AppDomain once it’s loaded.
3. So, #1 is mostly fixed, but #2 is the same, and using multiple AppDomains to get around these issues is just as difficult as before.
4. Although many exceptions are avoided, new problems can occur that are specific to reflection-only mode. Bypassing the binding policies can cause attempts to load older framework assemblies that aren’t compatible with newer and/or 64-bit OSes. Cross-references between normally loaded assemblies (such as the resident ‘mscorlib’ that you can’t replace) and reflection-only loaded assemblies can cause problems. Exceptions are reduced, but they are far from eliminated.
5. Just like normal Reflection, performance is not very good.
6. Unlike normal Reflection, when using reflection-only mode dependent assemblies aren’t handled automatically. A callback is fired whenever an assembly references another one (see OnReflectionOnlyAssemblyResolve), and the assembly must be manually located and loaded. This provides some flexibility, but finding the correct assembly can take a lot of work, especially because the callback can occur at any time (such as when type metadata is being browsed) and there isn’t any easy way to determine the precise context (such as which project the callback is related to when an entire solution is being loaded).
7. Because no code in the assembly can be executed, anything that instantiates types will throw an exception. For example, you can’t retrieve custom attributes using the normal GetCustomAttributes() method (on Assembly, MemberInfo, or ParameterInfo) because it instantiates them. This particular issue can be worked around by using the CustomAttributeData class, which provides static methods for retrieving custom attributes from these types.
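
The last two points can be sketched together: a handler for the `AppDomain.ReflectionOnlyAssemblyResolve` event that locates dependent assemblies, and `CustomAttributeData.GetCustomAttributes()` to read attributes without instantiating them. This is an illustration only, not the article's OnReflectionOnlyAssemblyResolve. The lookup policy (try the directory of the main assembly, then fall back to a load by display name) is an assumption, and the code requires the classic .NET Framework.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class ReflectionOnlyAttributesSketch
{
    public static void Main(string[] args)
    {
        // Assumption: args[0] is the path of the assembly to inspect.
        string path = Path.GetFullPath(args[0]);
        string directory = Path.GetDirectoryName(path);

        // Fired whenever a dependent assembly is needed; it must be loaded manually.
        AppDomain.CurrentDomain.ReflectionOnlyAssemblyResolve += (sender, e) =>
        {
            string candidate = Path.Combine(directory, new AssemblyName(e.Name).Name + ".dll");
            return File.Exists(candidate)
                ? Assembly.ReflectionOnlyLoadFrom(candidate)
                : Assembly.ReflectionOnlyLoad(e.Name);
        };

        Assembly assembly = Assembly.ReflectionOnlyLoadFrom(path);
        foreach (Type type in assembly.GetExportedTypes())
        {
            // CustomAttributeData describes each attribute without creating an instance of it.
            foreach (CustomAttributeData data in CustomAttributeData.GetCustomAttributes(type))
            {
                Console.WriteLine("{0}: [{1}]", type.FullName, data.Constructor.DeclaringType.FullName);
                foreach (CustomAttributeTypedArgument argument in data.ConstructorArguments)
                    Console.WriteLine("    " + argument.Value);
            }
        }
    }
}
```

Note that the handler has no idea which project triggered it, which is exactly the context problem described above.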

It’s possible to work around some of these issues, and I’ve added logic to do that where I could, but it’s still not too hard to come across a project that you just can’t load without some problems.

In summary, it turns out that although using Reflection in reflection-only mode avoids some problems, it still has some serious issues in certain cases. It’s still likely to fail to load some assemblies or types, so it’s not a great choice for loading metadata from assemblies for static analysis. So, we could use an alternative, and one such possibility is Mono Cecil.

Loading Type Metadata With Mono Cecil

Mono Cecil is an open source library for reading metadata from .NET assemblies (it’s part of the Mono project). Based upon my problems with using Reflection, I decided to add support for using Mono Cecil (version 0.9.5) to load metadata and see how it compared, and so I added a CodeDOM.ApplicationContext.UseMonoCecilLoads config file option (which is on by default). In this case, assemblies are loaded by calling AssemblyDefinition.ReadAssembly(), which returns an AssemblyDefinition instance. The LoadedAssembly class is subclassed as MonoCecilLoadedAssembly for such assemblies, and a MonoCecilAssemblyResolver class is used to resolve dependent assemblies during loading or type browsing. Then, TypeDefinition objects are loaded from the assembly definition object. Members of types are represented by MethodDefinition, PropertyDefinition, FieldDefinition, etc. Helper classes providing some useful static methods for working with these classes are located in Utilities/Mono.Cecil.

Feature	Reflection Class	Mono Cecil Class
Assemblies	`Assembly`	`AssemblyDefinition`
Members of types	`MemberInfo`	`IMemberDefinition`
Types	`Type`	`TypeDefinition`
Generic types	`Type`	`GenericInstanceType`
Type parameters	`Type`	`GenericParameter`
All Methods	`MethodBase`	`MethodDefinition`
Methods	`MethodInfo`	`MethodDefinition`
Constructors	`ConstructorInfo`	`MethodDefinition`
Properties	`PropertyInfo`	`PropertyDefinition`
Events	`EventInfo`	`EventDefinition`
Fields	`FieldInfo`	`FieldDefinition`
Parameters	`ParameterInfo`	`ParameterDefinition`
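
For comparison with the Reflection classes, here is a rough sketch of reading the same kind of metadata with Mono Cecil. It assumes a reference to the Mono.Cecil library and a path passed on the command line. Cecil's stock `DefaultAssemblyResolver` stands in for the article's MonoCecilAssemblyResolver, and `CecilLoadSketch` is a hypothetical name.

```csharp
using System;
using System.IO;
using Mono.Cecil;

public static class CecilLoadSketch
{
    public static void Main(string[] args)
    {
        // Assumption: args[0] is the path of the assembly to inspect.
        string path = Path.GetFullPath(args[0]);

        // Let Cecil look for dependent assemblies beside the main assembly.
        var resolver = new DefaultAssemblyResolver();
        resolver.AddSearchDirectory(Path.GetDirectoryName(path));
        var parameters = new ReaderParameters { AssemblyResolver = resolver };

        AssemblyDefinition assembly = AssemblyDefinition.ReadAssembly(path, parameters);

        // MainModule.Types holds top-level types; nested types hang off TypeDefinition.NestedTypes.
        foreach (TypeDefinition type in assembly.MainModule.Types)
        {
            if (!type.IsPublic)
                continue;
            Console.WriteLine(type.FullName);
            foreach (MethodDefinition method in type.Methods)
                Console.WriteLine("  {0} {1}", method.IsConstructor ? "Constructor" : "Method", method.Name);
            foreach (PropertyDefinition property in type.Properties)
                Console.WriteLine("  Property {0}", property.Name);
            foreach (FieldDefinition field in type.Fields)
                Console.WriteLine("  Field {0}", field.Name);
        }
    }
}
```

Nothing is loaded into the AppDomain for execution here, which is why different versions of the same assembly can be read side by side.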

Performance of Mono Cecil is generally very good, apparently partly because of deferred loading (which moves some CPU time from assembly load time to later browsing of the type data). On average in my experience, it takes about 1/3 the time that Reflection takes to load assemblies and types. However, memory usage is actually much higher than Reflection – about twice as much on average. The table below shows some example times of loading assemblies and types for some solutions along with memory usage.

Solution	Projects	Files	Load Refl. (secs)	Load Cecil (secs)	Load Diff (Cecil as % of Refl.)	Memory Refl. (MB)	Memory Cecil (MB)	Memory Diff (Cecil as % of Refl.)
Nova	8	687	1.1	0.5	45%	27	49	181%
SubText 2.5.2	7	849	1.1	0.2	18%	24	76	317%
MS EntLib Tests	70	2,445	2.2	0.7	32%	76	155	204%
Large Proprietary	43	4,677	4.6	1.2	26%	129	223	173%

Issues with Mono Cecil include:

Despite some comments on the web to the contrary (perhaps for older versions) it uses a lot more memory than Reflection – roughly twice as much on average (it varies from 50% more to twice as much, or in some cases 3-4 times as much).
It isn’t thread safe – not even if you’re only reading type data. This is a rather shocking omission for a library that tends to load and process a lot of data. As a workaround for this, the ILSpy project on GitHub has a forked version that has been made thread safe for reading only.
It has a questionable object model, with objects that represent definitions actually deriving from objects that represent references (TypeDefinition derives from TypeReference, MethodDefinition from MethodReference, etc.). This seems to be a trick to allow the definitions to also be treated as references, or perhaps just to inherit similar functionality as a possible consequence of the metadata format. In any case, it’s not a logical “is-a” relationship, so it’s somewhat confusing and precludes the use of normal inheritance, such as a common base class for all member definitions (an interface is implemented instead). In some cases, it gets downright ugly. For example, generic type instances represented by GenericInstanceType have a GenericArguments collection of TypeReferences (and a HasGenericArguments property), but they also inherit an unused GenericParameters collection of GenericParameters (and a HasGenericParameters property) from the TypeReference base class. It just doesn’t seem very clean to me, and it also seems that it could have been more similar to Reflection in order to reduce the learning curve.
It hard-codes the use of ‘mscorlib’ for built-in types, so it won’t work correctly with assemblies that supply their own built-in types instead of using mscorlib (this is relatively rare). It’s possible to work around this limitation by modifying the source.
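
Two of these points can be illustrated in one small sketch: reading the type arguments of a `GenericInstanceType` from `GenericArguments` (rather than the inherited `GenericParameters`), and serializing access with a single lock. The lock is a coarse alternative of my own to the ILSpy fork mentioned above, not something taken from the attached source, and `CecilGenericsSketch` is a hypothetical name. The code assumes a reference to Mono.Cecil.

```csharp
using System;
using System.Collections.Generic;
using Mono.Cecil;

public static class CecilGenericsSketch
{
    // Assumption: every Cecil read in the application goes through this one lock.
    private static readonly object CecilLock = new object();

    // Builds a readable name for a type reference, expanding generic type arguments.
    public static string Describe(TypeReference type)
    {
        lock (CecilLock)
        {
            var instance = type as GenericInstanceType;
            if (instance == null)
                return type.FullName;

            // GenericArguments holds the actual type arguments of the instance.
            var arguments = new List<string>();
            foreach (TypeReference argument in instance.GenericArguments)
                arguments.Add(Describe(argument)); // safe: C# locks are re-entrant on the same thread

            return instance.ElementType.FullName + "<" + string.Join(", ", arguments.ToArray()) + ">";
        }
    }
}
```

A single global lock gives up any parallelism while reading metadata, so it only makes sense where simplicity matters more than throughput.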

Mono Cecil (version 0.9.5) uses a lot more memory than Reflection, isn’t thread safe, and has a somewhat confusing object model. This is unfortunate, because otherwise it would be truly great. But, it certainly gets the job done when Reflection sometimes can’t, it’s faster, and it’s open source – so there’s still plenty to like about it. I should also mention that it allows you to read and modify IL, so it’s a great option if you need to do that.

I’ve left the Reflection capability in Nova CodeDOM as a fallback primarily because Mono Cecil uses so much more memory, but I don’t really expect it to get used much – Mono Cecil just works better overall. Are there any other alternatives? Yes – there is CCI (Common Compiler Infrastructure) on CodePlex and perhaps others, but the word on the web seems to be that Mono Cecil is generally easier to use. If anyone has direct experience otherwise, please let me know. In the meantime, I think Mono Cecil is good enough for this project.

Using the “Reference Assemblies” for the .NET BCL

Starting with .NET 3.0, “reference assemblies” have been provided for the .NET framework for design and build time use, and are preferred to the runtime assemblies in the GAC. These assemblies were added to avoid conflicts due to minor changes in the runtime assemblies, and they contain metadata only (no IL code). They are located in “%ProgramFiles%\Reference Assemblies\Microsoft\Framework\...”, with separate subdirectories for different versions and profiles. If you look at a BCL assembly reference (such as “System”) in a VS project, you’ll see that it points to these “reference” assemblies. The code included with this article will attempt to use these assemblies instead of the runtime assemblies if possible (they won’t exist if VS is not installed on the machine, and they don’t exist for frameworks prior to 3.0).
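
A small sketch of looking for a reference assembly under that root follows. Because the version and profile subdirectories are not spelled out here, the code searches the whole tree instead of assuming a layout. `ReferenceAssemblySketch` and `FindCandidates` are hypothetical names, and on 64-bit Windows the folder may live under the 32-bit Program Files directory instead, which this sketch does not handle.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ReferenceAssemblySketch
{
    // Returns every file named "<assemblyName>.dll" found anywhere below the given root.
    public static List<string> FindCandidates(string root, string assemblyName)
    {
        var matches = new List<string>();
        if (Directory.Exists(root))
            matches.AddRange(Directory.GetFiles(root, assemblyName + ".dll", SearchOption.AllDirectories));
        return matches;
    }

    public static void Main()
    {
        // Root taken from the location quoted in the text.
        string programFiles = Environment.GetEnvironmentVariable("ProgramFiles") ?? "";
        string root = Path.Combine(programFiles, "Reference Assemblies", "Microsoft", "Framework");
        foreach (string path in FindCandidates(root, "System"))
            Console.WriteLine(path);
    }
}
```

Choosing among the candidates would still require matching the project's targeted framework version and profile, and falling back to the runtime assemblies when nothing is found.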

Using the Attached Source Code

A new Projects/Assemblies folder has been added with new classes used to load assemblies and their type metadata, and some existing classes have been updated to support the new functionality (such as Solution and Project for things related to loading). Loading solutions/projects will now show output messages regarding the loading of referenced assemblies and their types (set Log.LogLevel to Detailed in the config file to have all loaded assemblies listed as they are loaded). Other than the loading of assemblies and type data, there is little change in functionality from the previous article – this has been necessary preparation for resolving symbolic references, which will be implemented in the next article. As usual, a separate ZIP file containing binaries is provided so that you can run them without having to build them first.

Summary

My codeDOM is now able to load type metadata from the various assemblies referenced by projects, and it also has knowledge of references between projects in a solution. I now have everything needed to tackle the next big part of this project: resolving. In my next article, I’ll undertake the big task of resolving all symbolic references within a codeDOM.

License

This article, along with any associated source code and files, is licensed under The Common Development and Distribution License (CDDL)