Introduction
This article does not present a complete Orca replacement. Instead, it shows how combining Reflection.Emit with the reflection-based discovery mechanism of WPF helps display data whose structure is not known at compile time.
Background
The MSI SDK defines close to 100 tables and the columns they contain. One possible approach to building a viewer application would be to model all known tables as classes. This approach has two serious disadvantages. First, it is a lot of work. Second, and more important, many MSI files contain custom tables that could not be shown this way.
A better solution (and the one presented here) is to use the MSI database functions to retrieve the schema of the database and create the required classes dynamically.
Using the Code
The sample project contains three classes:
TypeBuilder
This is a helper class for creating dynamic types on the fly. The instance method GetTypeFromPropertyList takes an array of Name/Type pairs as input and returns the type of the created class. The class internally caches created types so that each dynamic type is only created once. The criterion for a cache hit is equality of all input Name/Type pairs. This is the only class that internally uses Reflection.Emit.
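The caching rule can be sketched as follows. This is only an illustration of "equality on all Name/Type pairs", not the sample's actual implementation. The names DynamicTypeCache, MakeKey, and GetOrCreate are hypothetical, and the key format assumes that column names contain neither '|' nor ':'.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the sample project.
public static class DynamicTypeCache
{
    private static readonly Dictionary<string, Type> cache =
        new Dictionary<string, Type>();

    // Builds a string key from all Name/Type pairs in their given order.
    // Assumption: names contain neither '|' nor ':'.
    public static string MakeKey(Tuple<string, Type>[] properties)
    {
        return string.Join("|",
            properties.Select(p => p.Item1 + ":" + p.Item2.FullName));
    }

    // Returns the cached type, or calls the factory once and stores its result.
    public static Type GetOrCreate(Tuple<string, Type>[] properties,
        Func<Tuple<string, Type>[], Type> factory)
    {
        string key = MakeKey(properties);
        Type type;
        if (!cache.TryGetValue(key, out type))
        {
            type = factory(properties);
            cache.Add(key, type);
        }
        return type;
    }
}
```

Two calls with equal property lists invoke the factory only once. The sketch is not thread-safe, which is probably acceptable for a single-threaded viewer but would need a lock or a ConcurrentDictionary otherwise.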
MsiReader
This class uses the automation interface of Windows Installer to retrieve the data found in an MSI database. It uses the TypeBuilder instance passed to it to create dynamic types on the fly. The class exposes three public methods:
GetTableNames: returns a list of the tables contained in the installer file.
GetTableContent: returns an array of dynamically created objects that are populated with the content of the table. The method requires the name of the table as input.
GetBinaryContent: returns an array of bytes representing binary data found in the database.
Binary data is handled differently from the basic types (int and string) for reasons of size and performance. Simple types are read directly from the database and copied into the dynamically created object. Because binary blobs can be huge, it makes no sense to read all the data and copy it into the model. Instead, a reference to the location in the database is copied into the model. This reference is represented by the BinaryConentDescription class (a private class nested within MsiReader) and by IBinaryConentDescription, its publicly exposed interface. The reference can then be passed to the GetBinaryContent method to retrieve the actual data.
MainWindow
This class's main responsibility is displaying the content of the loaded database and reacting to user input. The ListBox (left pane) shows the tables found in the database. The GridView (right pane) shows the content of the selected table. The class has a single private method (ProcessMsiFile) that creates an instance of the MsiReader class, retrieves the data, and uses it to set the DataContext for WPF. The rest of the processing is done by WPF's data-binding magic.
The image above shows the controls that are data-bound to the model. The model is an anonymous class created within the ProcessMsiFile method.
The following sequence diagram shows how the three classes and the Windows installer interact:
Dynamic Types
The following shows how one of the database tables (TextStyle) is mapped to C# and finally to the IL code that is emitted by the TypeBuilder class:
Column | Type | Key | Nullable
TextStyle | Identifier | Y | N
FaceName | Text | N | N
Size | Integer | N | N
Color | DoubleInteger | N | Y
StyleBits | Integer | N | Y
The generated dynamic type representing the TextStyle table looks as follows in C#:
class dyn_<guid>
{
    public dyn_<guid>(string p1, string p2, int p3, int? p4, int? p5)
    {
        m_TextStyle = p1;
        m_FaceName = p2;
        m_Size = p3;
        m_Color = p4;
        m_StyleBits = p5;
    }

    string m_TextStyle;
    string m_FaceName;
    int m_Size;
    int? m_Color;
    int? m_StyleBits;

    public string TextStyle
    {
        get
        {
            return m_TextStyle;
        }
    }

    public string FaceName
    {
        get
        {
            return m_FaceName;
        }
    }

    public int Size
    {
        get
        {
            return m_Size;
        }
    }

    public int? Color
    {
        get
        {
            return m_Color;
        }
    }

    public int? StyleBits
    {
        get
        {
            return m_StyleBits;
        }
    }
}
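Note how the database types are mapped to .NET types: Identifier and Text columns become string, Integer and DoubleInteger columns become int, and nullable integer columns become int?.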
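Such a mapping could be sketched as below. The sketch rests on an assumption the article does not confirm: that the reader obtains column definitions in the Windows Installer column-info format (for example "s72", "i2", or "I4"), where the first letter gives the kind of column and an uppercase letter marks a nullable column. The names MsiTypeMapping and ToClrType are hypothetical, and binary stream columns are deliberately left out.

```csharp
using System;

// Hypothetical helper, not part of the sample project.
public static class MsiTypeMapping
{
    // Maps a column definition such as "s72", "i2" or "I4" to a .NET type.
    // Assumption: an uppercase first letter means the column is nullable.
    public static Type ToClrType(string columnDefinition)
    {
        char kind = columnDefinition[0];
        bool nullable = char.IsUpper(kind);
        switch (char.ToLowerInvariant(kind))
        {
            case 's': // string
            case 'l': // localizable string
            case 'g': // temporary string
                return typeof(string); // reference type, nullable anyway
            case 'i': // integer
            case 'j': // temporary integer
                return nullable ? typeof(int?) : typeof(int);
            default:
                // Binary streams and unknown kinds are not covered here.
                throw new NotSupportedException(
                    "Unsupported column definition: " + columnDefinition);
        }
    }
}
```

With this sketch, "i2" maps to int and "I4" to int?, which matches the Size and Color properties of the class above.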
Finally, here is the generated TextStyle class in IL (FaceName and StyleBits removed to keep the listing short):
.field private string m_TextStyle
.field private int32 m_Size
.field private valuetype [mscorlib]System.Nullable`1<int32> m_Color

.method public hidebysig specialname rtspecialname
    instance void .ctor (
        string p1,
        int32 p3,
        valuetype [mscorlib]System.Nullable`1<int32> p4
    ) cil managed
{
    IL_0000: ldarg.0
    IL_0001: call instance void [mscorlib]System.Object::.ctor()
    IL_0006: ldarg.0
    IL_0007: ldarg.1
    IL_0008: stfld string MSIExplorer.dyn_abc::m_TextStyle
    IL_0014: ldarg.0
    IL_0015: ldarg.3
    IL_0016: stfld int32 MSIExplorer.dyn_abc::m_Size
    IL_001b: ldarg.0
    IL_001c: ldarg.s p4
    IL_001e: stfld valuetype [mscorlib]System.Nullable`1<int32>
        MSIExplorer.dyn_abc::m_Color
    IL_002b: ret
}

.method public hidebysig specialname
    instance string get_TextStyle () cil managed
{
    IL_0000: ldarg.0
    IL_0001: ldfld string MSIExplorer.dyn_abc::m_TextStyle
    IL_0006: ret
}

.method public hidebysig specialname
    instance int32 get_Size () cil managed
{
    IL_0000: ldarg.0
    IL_0001: ldfld int32 MSIExplorer.dyn_abc::m_Size
    IL_0006: ret
}

.method public hidebysig specialname
    instance valuetype [mscorlib]System.Nullable`1<int32>
    get_Color () cil managed
{
    IL_0000: ldarg.0
    IL_0001: ldfld valuetype [mscorlib]System.Nullable`1<int32>
        MSIExplorer.dyn_abc::m_Color
    IL_0006: ret
}

.property instance string TextStyle()
{
    .get instance string MSIExplorer.dyn_abc::get_TextStyle()
}

.property instance int32 Size()
{
    .get instance int32 MSIExplorer.dyn_abc::get_Size()
}

.property instance valuetype [mscorlib]System.Nullable`1<int32> Color()
{
    .get instance valuetype [mscorlib]System.Nullable`1<int32>
        MSIExplorer.dyn_abc::get_Color()
}
And this is what the type looks like in the grid:
Going from C# to IL is also the approach I took while writing the IL code in the TypeBuilder class. I first wrote the above C# class, used ILSpy to look at the generated IL, and copied that to the source file. Converting the original IL code to Reflection.Emit calls is then straightforward.
With the above knowledge, it should be easy to understand what the code below does:
private Type CreateTypeFromPropertyList
    (Tuple<string, Type>[] properties, string typeName)
{
    Emit.TypeBuilder tb = mb.DefineType(typeName, TypeAttributes.Public);
    var fields = new List<Emit.FieldBuilder>();

    // First loop: define the backing field, the property, and its getter.
    foreach (var prop in properties)
    {
        Emit.FieldBuilder fb = tb.DefineField("m_" + prop.Item1,
            prop.Item2, FieldAttributes.Private);
        fields.Add(fb);
        Emit.PropertyBuilder pb = tb.DefineProperty(prop.Item1,
            PropertyAttributes.HasDefault, prop.Item2, null);
        Emit.MethodBuilder mbPropGetAccessor = tb.DefineMethod("get_" + prop.Item1,
            MethodAttributes.Public | MethodAttributes.SpecialName |
            MethodAttributes.HideBySig, prop.Item2, Type.EmptyTypes);
        Emit.ILGenerator propGetIL = mbPropGetAccessor.GetILGenerator();
        propGetIL.Emit(Emit.OpCodes.Ldarg_0);   // load 'this'
        propGetIL.Emit(Emit.OpCodes.Ldfld, fb); // load the backing field
        propGetIL.Emit(Emit.OpCodes.Ret);
        pb.SetGetMethod(mbPropGetAccessor);
    }

    // Constructor with one parameter per property, in declaration order.
    Emit.ConstructorBuilder ctor2 = tb.DefineConstructor(
        MethodAttributes.Public, CallingConventions.Standard,
        properties.Select(a => a.Item2).ToArray());
    Emit.ILGenerator ctor2IL = ctor2.GetILGenerator();
    ctor2IL.Emit(Emit.OpCodes.Ldarg_0);
    ctor2IL.Emit(Emit.OpCodes.Call, typeof(object).GetConstructor(Type.EmptyTypes));

    // Second loop: store each constructor argument in its field.
    for (int i = 0; i < fields.Count; i++)
    {
        ctor2IL.Emit(Emit.OpCodes.Ldarg_0);
        // Argument 0 is 'this'. Ldarg takes a 2-byte operand, so pass a short.
        ctor2IL.Emit(Emit.OpCodes.Ldarg, (short)(i + 1));
        ctor2IL.Emit(Emit.OpCodes.Stfld, fields[i]);
    }
    ctor2IL.Emit(Emit.OpCodes.Ret);
    return tb.CreateType();
}
Note the changed ordering compared to the original IL. I wanted to reduce the number of required loops without compromising readability. The first loop creates the field, the property, and the property getter (in that order), while the second loop emits the code that sets the fields in the constructor body.
For initial testing, I used Reflection.Emit's ability to save the generated assembly to disk and checked its content with ILSpy and PEVerify.exe.
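To see such a type in use from start to finish, here is a reduced, self-contained sketch that emits a class with a single int property, creates an instance, and reads the property back through reflection. It follows the field/getter/constructor pattern described above but is not the sample's code. The names DynamicTypeDemo, BuildSizeType, and ReadSize are hypothetical, and the dynamic assembly is created with AssemblyBuilderAccess.Run only, so nothing is written to disk.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical demo class, not part of the sample project.
public static class DynamicTypeDemo
{
    // Emits a class with one read-only int property named "Size".
    public static Type BuildSizeType()
    {
        AssemblyBuilder ab = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("DemoDynamicAssembly"), AssemblyBuilderAccess.Run);
        ModuleBuilder mb = ab.DefineDynamicModule("DemoModule");
        TypeBuilder tb = mb.DefineType("dyn_demo", TypeAttributes.Public);

        FieldBuilder fb = tb.DefineField("m_Size", typeof(int),
            FieldAttributes.Private);
        PropertyBuilder pb = tb.DefineProperty("Size",
            PropertyAttributes.None, typeof(int), null);

        // Getter: return this.m_Size;
        MethodBuilder getter = tb.DefineMethod("get_Size",
            MethodAttributes.Public | MethodAttributes.SpecialName |
            MethodAttributes.HideBySig, typeof(int), Type.EmptyTypes);
        ILGenerator il = getter.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldfld, fb);
        il.Emit(OpCodes.Ret);
        pb.SetGetMethod(getter);

        // Constructor: call object's constructor, then m_Size = p1;
        ConstructorBuilder ctor = tb.DefineConstructor(MethodAttributes.Public,
            CallingConventions.Standard, new[] { typeof(int) });
        ILGenerator ctorIL = ctor.GetILGenerator();
        ctorIL.Emit(OpCodes.Ldarg_0);
        ctorIL.Emit(OpCodes.Call, typeof(object).GetConstructor(Type.EmptyTypes));
        ctorIL.Emit(OpCodes.Ldarg_0);
        ctorIL.Emit(OpCodes.Ldarg_1);
        ctorIL.Emit(OpCodes.Stfld, fb);
        ctorIL.Emit(OpCodes.Ret);

        return tb.CreateType();
    }

    // Creates an instance of the emitted type and reads "Size" via reflection.
    public static int ReadSize(int value)
    {
        Type type = BuildSizeType();
        object row = Activator.CreateInstance(type, value);
        return (int)type.GetProperty("Size").GetValue(row, null);
    }
}
```

Reading the property through GetProperty is the same kind of reflection-based lookup that lets WPF bind to the generated properties without knowing the type at compile time.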
Points of Interest
.NET Arrays
One of the biggest surprises for me was how arrays behave with regard to Reflection and data binding. My first, naive approach was to collect the data in a list and call ToArray to return it. To my surprise, WPF's data binding did not work with this approach.
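public object[] GetTableContent(string tableName)
{
    var resList = new List<object>();
    ...
    // First attempt: returns a plain object[], which did not work with data binding.
    // return resList.ToArray();

    // Working version: an array whose element type is the dynamic row type.
    var arr = Array.CreateInstance(rowType, resList.Count);
    Array.Copy(resList.ToArray(), arr, resList.Count);
    return (object[])arr;
}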
The difference between the two arrays cannot be seen in the debugger. Both look like they are of type object[], but the one that works is actually an array of the dynamic type.
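The difference does show up when the runtime type of the array is queried. The following sketch uses string as a stand-in for the dynamic row type. The names TypedArrayDemo and ToTypedArray are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class, not part of the sample project.
public static class TypedArrayDemo
{
    // Copies the rows into an array whose element type is rowType.
    // Array.Copy throws InvalidCastException if a row is not a rowType.
    public static object[] ToTypedArray(List<object> rows, Type rowType)
    {
        Array typed = Array.CreateInstance(rowType, rows.Count);
        Array.Copy(rows.ToArray(), typed, rows.Count);
        // Array covariance allows this cast for reference types only.
        return (object[])typed;
    }
}
```

For a list that contains only strings, rows.ToArray().GetType() is System.Object[], while ToTypedArray(rows, typeof(string)).GetType() is System.String[], even though both values are held in object[] variables.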
DataGrid and Nullable Types
Another annoying bug/feature is DataGrid's inability to properly handle nullable types when sorting. Some of the integers are nullable in the database and are therefore represented as int? in the dynamic classes. For those columns, sorting just stops working!
LINQ and Memory Consumption
The third issue I had was one of memory consumption. When saving large binary files (> 100 MB), an OutOfMemoryException may be thrown. This happens when running the x86 debug build. When running the program on x64, the limit is way above anything I could find in my MSI files.
The reason why this happens so early is probably how I convert the stream data (returned as a string from the installer API) to a byte array. I use the following LINQ code:
<string buffer>.SelectMany(c => BitConverter.GetBytes(c)).ToArray();
This is probably one of the areas where unsafe code would be justifiable to improve performance and to reduce the risk of running out of memory.
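Before resorting to unsafe code, one option that stays in safe code is a plain loop that fills a single preallocated array. The sketch below is an untested alternative with respect to the memory problem itself, since I have not measured whether it avoids the exception. It assumes little-endian byte order, which is what BitConverter.GetBytes produces on x86 and x64. The names StreamStringConverter and ToBytes are hypothetical.

```csharp
using System;

// Hypothetical helper, not part of the sample project.
public static class StreamStringConverter
{
    // Copies every UTF-16 code unit of the string into two bytes,
    // low byte first, without creating a temporary array per character.
    public static byte[] ToBytes(string data)
    {
        byte[] bytes = new byte[data.Length * 2];
        for (int i = 0; i < data.Length; i++)
        {
            char c = data[i];
            bytes[2 * i] = (byte)(c & 0xFF);   // low byte
            bytes[2 * i + 1] = (byte)(c >> 8); // high byte
        }
        return bytes;
    }
}
```

The loop copies code units as they are, so character values that are not valid text (such as unpaired surrogates) pass through unchanged, which matters because the string carries binary data.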
Lazy Loading
Another interesting observation is how easy it was to introduce delayed loading of the table content into the application by inserting a Lazy<T> member at the proper location.
private void ProcessMsiFile(string fileName)
{
    MsiReader msiReader = new MsiReader(builder, fileName);
    this.DataContext = new
    {
        FileName = Path.GetFileName(fileName),
        Tables =
            (from tableName in msiReader.GetTableNames()
             select new
             {
                 TableName = tableName,
                 Rows = new Lazy<object[]>
                     (() => msiReader.GetTableContent(tableName))
             }).ToArray()
    };
}
The reason for this trick was the relatively slow loading of large MSI files. Note also that with Lazy<T> inserted into the ProcessMsiFile method, the sequence diagram above no longer reflects how long the MsiReader instance lives: the lambda captures the instance, which therefore stays alive for as long as the model does.
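The deferred behavior can be observed in isolation with a small sketch. LoadRows is a hypothetical stand-in for an expensive call such as GetTableContent, and the counter exists only to make the effect visible.

```csharp
using System;

// Hypothetical demo class, not part of the sample project.
public static class LazyDemo
{
    // Counts how often the expensive load actually ran.
    public static int LoadCount;

    // Stand-in for an expensive call such as reading a table.
    private static object[] LoadRows()
    {
        LoadCount++;
        return new object[] { "row1", "row2" };
    }

    // Wraps the load in a Lazy<T>; nothing is loaded at this point.
    public static Lazy<object[]> CreateRows()
    {
        return new Lazy<object[]>(() => LoadRows());
    }
}
```

After calling CreateRows, IsValueCreated is false and LoadCount is unchanged. The load runs on the first access to Value, and later accesses reuse the stored result. In the viewer, that first access would presumably come from a binding to the Value property when a table is selected, which is an assumption about the sample's XAML.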
Conclusion
The code shown here is by no means ready to be used in a real-world program. The goal is to show how a relatively complex problem can be solved with a few hundred lines of code by leveraging some of the more exotic APIs available in the .NET Framework. This is also why the sample program lacks many features that would make it really useful. Some of the more obvious ones are support for modifying the database and a find/replace facility. Although these two features would be really useful, I felt that adding them would defeat the purpose of the article by sacrificing the simplicity of the current implementation.