
A simple Orca clone using Reflection.Emit

Reto Ravasio


16 Aug 2011

A WPF app that uses dynamic types and databinding for displaying MSI files.

Introduction

The idea of this article is not to present a complete Orca replacement. It shows how Reflection.Emit, combined with WPF's Reflection-based property discovery, helps display data whose structure is not known at compile time.

Background

The MSI SDK defines close to 100 tables and the columns they contain. One possible approach to build a viewer application would be to model all known tables as classes. This approach has two grave disadvantages. The first is that it's a lot of work and the second and more important one is that a lot of MSI files contain custom tables that couldn't be shown with this approach. A better solution (and the one presented here) is to use the MSI database functions to retrieve the schema of the database and create the required classes dynamically.

Using the Code

The sample project contains three classes:

TypeBuilder

This is a helper class for creating dynamic types on the fly. The instance method GetTypeFromPropertyList takes an array of Name/Type pairs as input and returns the type of the created class. The class caches created types internally so that each dynamic type is only created once. A cached type is reused only if all input Name/Type pairs are equal. This is the only class that uses Reflection.Emit internally.
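The caching rule can be sketched roughly as follows. This is not the code from the sample project: TypeCache, MakeKey and the createType delegate are hypothetical names, and the string key assumes that property names contain neither ';' nor ':' (which should hold for MSI column names).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of a type cache keyed on all Name/Type pairs.
public sealed class TypeCache
{
    private readonly Dictionary<string, Type> cache = new Dictionary<string, Type>();
    private readonly Func<Tuple<string, Type>[], string, Type> createType;

    // createType stands in for the Reflection.Emit based factory.
    public TypeCache(Func<Tuple<string, Type>[], string, Type> createType)
    {
        this.createType = createType;
    }

    // Builds a key from every Name/Type pair, in order. Two property lists
    // produce the same key only if all names and all types match.
    public static string MakeKey(Tuple<string, Type>[] properties)
    {
        return string.Join(";", properties.Select(
            p => p.Item1 + ":" + p.Item2.AssemblyQualifiedName));
    }

    public Type GetTypeFromPropertyList(Tuple<string, Type>[] properties)
    {
        string key = MakeKey(properties);
        Type type;
        if (!cache.TryGetValue(key, out type))
        {
            // Cache miss: create the type once under a unique name.
            type = createType(properties, "dyn_" + Guid.NewGuid().ToString("N"));
            cache.Add(key, type);
        }
        return type;
    }
}
```

Two tables with identical column names and types would share one dynamic type under this scheme, which is harmless because the type carries no table-specific state.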

MsiReader

This class internally uses the automation interface of Windows Installer to retrieve the data found in an MSI database. The class uses the passed in TypeBuilder instance to create dynamic types on the fly. The class exposes three public methods:

GetTableNames: returns a list of tables contained in the installer file.
GetTableContent: returns an array of dynamically created objects that are populated with the content of the table. The method requires the name of the table as input.
GetBinaryContent: returns an array of bytes representing binary data found in the database.

The reason why binary data is handled differently than the basic types (int and string) is one of size and performance. Simple types are directly read from the database and copied into the dynamically created object. Because binary data blobs can be huge, it makes no sense to read all the data and copy its entire content into the model. Instead, a reference to the location in the database is copied into the model. This database reference is represented by the BinaryConentDescription class (which is a nested private class within MsiReader) and IBinaryConentDescription which is its publicly exposed representation. The reference can then be passed into the GetBinaryContent method to retrieve the actual data.

MainWindow

This class's main responsibility is displaying the content of the loaded database and reacting to user input. The ListBox (left pane) shows the tables found in the database. The GridView (right pane) shows the content of the selected table. The class has a single private method (ProcessMsiFile) that creates an instance of the MsiReader class, retrieves the data, and uses it to set the DataContext for WPF. The rest of the processing is done by WPF's data-binding magic.

The image above shows the controls that are data-bound to the model. The model is an anonymous class created within the ProcessMsiFile method.

The following sequence diagram shows how the three classes and Windows Installer interact:

Dynamic Types

The following shows how one of the database tables (TextStyle) is mapped to C# and finally to the IL code that is emitted by the TypeBuilder class:

Column	Type	Key	Nullable
`TextStyle`	`Identifier`	Y	N
`FaceName`	`Text`	N	N
`Size`	`Integer`	N	N
`Color`	`DoubleInteger`	N	Y
`StyleBits`	`Integer`	N	Y

The generated dynamic type representing the TextStyle table looks as follows in C#:

class dyn_<guid>
{
    public dyn_<guid>(string p1, string p2, int p3, int? p4, int? p5)
    {
        m_TextStyle = p1;
        m_FaceName = p2;
        m_Size = p3;
        m_Color = p4;
        m_StyleBits = p5;
    }

    string m_TextStyle;
    string m_FaceName;
    int m_Size;
    int? m_Color;
    int? m_StyleBits;

    public string TextStyle
    {
        get
        {
            return m_TextStyle;
        }
    }
    public string FaceName
    {
        get
        {
            return m_FaceName;
        }
    }

    public int Size
    {
        get
        {
            return m_Size;
        }
    }
    public int? Color
    {
        get
        {
            return m_Color;
        }
    }
    public int? StyleBits
    {
        get
        {
            return m_StyleBits;
        }
    }
}

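Note how the database types are mapped to .NET types: Identifier and Text columns become string, Integer and DoubleInteger columns become int, and nullable integer columns become int?.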
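The mapping visible in the TextStyle example can be written down as a small helper. This is only a sketch: ColumnTypeMapper and ToClrType are hypothetical names, the column type strings are taken from the table above, and the helper covers only the column types that appear in this example (binary columns, for instance, are handled differently by MsiReader).

```csharp
using System;

// Hypothetical helper covering only the column types of the TextStyle table.
public static class ColumnTypeMapper
{
    public static Type ToClrType(string msiType, bool nullable)
    {
        switch (msiType)
        {
            case "Identifier":
            case "Text":
                // Strings are reference types, so null needs no special type.
                return typeof(string);
            case "Integer":
            case "DoubleInteger":
                // Nullable integer columns need int? to represent a missing value.
                return nullable ? typeof(int?) : typeof(int);
            default:
                throw new NotSupportedException("Unmapped column type: " + msiType);
        }
    }
}
```

With this helper, the Color column (DoubleInteger, nullable) maps to int? and the Size column (Integer, not nullable) maps to int, matching the generated class shown above.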

And finally, the same in IL (FaceName and StyleBits removed to keep listing short):

// Fields
.field private string m_TextStyle
.field private int32 m_Size
.field private valuetype [mscorlib]System.Nullable`1<int32> m_Color

// Methods
.method public hidebysig specialname rtspecialname
 instance void .ctor (
  string p1,
  int32 p3,
  valuetype [mscorlib]System.Nullable`1<int32> p4
 ) cil managed
{
 IL_0000: ldarg.0
 IL_0001: call instance void [mscorlib]System.Object::.ctor()
 IL_0006: ldarg.0
 IL_0007: ldarg.1
 IL_0008: stfld string MSIExplorer.dyn_abc::m_TextStyle
 IL_0014: ldarg.0
 IL_0015: ldarg.3
 IL_0016: stfld int32 MSIExplorer.dyn_abc::m_Size
 IL_001b: ldarg.0
 IL_001c: ldarg.s p4
 IL_001e: stfld valuetype [mscorlib]System.Nullable`1<int32>
                        MSIExplorer.dyn_abc::m_Color
 IL_002b: ret
} // end of method dyn_abc::.ctor

.method public hidebysig specialname
 instance string get_TextStyle () cil managed
{
 IL_0000: ldarg.0
 IL_0001: ldfld string MSIExplorer.dyn_abc::m_TextStyle
 IL_0006: ret
} // end of method dyn_abc::get_TextStyle


.method public hidebysig specialname
 instance int32 get_Size () cil managed
{
 IL_0000: ldarg.0
 IL_0001: ldfld int32 MSIExplorer.dyn_abc::m_Size
 IL_0006: ret
} // end of method dyn_abc::get_Size

.method public hidebysig specialname
 instance valuetype [mscorlib]System.Nullable`1<int32>
                        get_Color () cil managed
{
 IL_0000: ldarg.0
 IL_0001: ldfld valuetype [mscorlib]System.Nullable`1<int32>
                        MSIExplorer.dyn_abc::m_Color
 IL_0006: ret
} // end of method dyn_abc::get_Color

// Properties
.property instance string TextStyle()
{
 .get instance string MSIExplorer.dyn_abc::get_TextStyle()
}
.property instance int32 Size()
{
 .get instance int32 MSIExplorer.dyn_abc::get_Size()
}
.property instance valuetype [mscorlib]System.Nullable`1<int32> Color()
{
 .get instance valuetype [mscorlib]System.Nullable`1<int32>
                        MSIExplorer.dyn_abc::get_Color()
}

And this is what the type looks like in the grid:

This is also the approach I took while writing the IL code in the TypeBuilder class. I first wrote the above C# class, used ILSpy to look at the generated IL, and copied that into the source file. Converting the IL code to Reflection.Emit calls is then straightforward.

With the above knowledge, it should be easy to understand what the code below does:

private Type CreateTypeFromPropertyList
                (Tuple<string, Type>[] properties, string typeName)
{
    Emit.TypeBuilder tb = mb.DefineType(typeName, TypeAttributes.Public);

    var fields = new List<Emit.FieldBuilder>();
    foreach (var prop in properties)
    {
        // field
        Emit.FieldBuilder fb = tb.DefineField("m_" + prop.Item1,
                                prop.Item2, FieldAttributes.Private);
        fields.Add(fb);

        // property
        Emit.PropertyBuilder pb = tb.DefineProperty(prop.Item1,
                                PropertyAttributes.HasDefault, prop.Item2, null);
        Emit.MethodBuilder mbPropGetAccessor = tb.DefineMethod("get_" + prop.Item1,
                                MethodAttributes.Public | MethodAttributes.SpecialName |
                                MethodAttributes.HideBySig, prop.Item2, Type.EmptyTypes);
        Emit.ILGenerator propGetIL = mbPropGetAccessor.GetILGenerator();
        propGetIL.Emit(Emit.OpCodes.Ldarg_0);
        propGetIL.Emit(Emit.OpCodes.Ldfld, fb);
        propGetIL.Emit(Emit.OpCodes.Ret);
        pb.SetGetMethod(mbPropGetAccessor);
    }

    // constructor ctor(f1, f2, f3, ...)
    Emit.ConstructorBuilder ctor2 = tb.DefineConstructor(
                                MethodAttributes.Public, CallingConventions.Standard,
                                properties.Select(a => a.Item2).ToArray());
    Emit.ILGenerator ctor2IL = ctor2.GetILGenerator();
    ctor2IL.Emit(Emit.OpCodes.Ldarg_0);
    ctor2IL.Emit(Emit.OpCodes.Call, typeof(object).GetConstructor(Type.EmptyTypes));

    for (int i = 0; i < fields.Count; i++)
    {
        ctor2IL.Emit(Emit.OpCodes.Ldarg_0);
        // Ldarg takes a 16-bit operand; argument 0 is 'this', so parameters start at 1
        ctor2IL.Emit(Emit.OpCodes.Ldarg, (short)(i + 1));
        ctor2IL.Emit(Emit.OpCodes.Stfld, fields[i]);
    }
    ctor2IL.Emit(Emit.OpCodes.Ret);

    return tb.CreateType();
}

Note the changed ordering compared to the original IL. The reason for that is that I wanted to reduce the number of required loops without compromising on readability. The first loop creates the field, the property, and the property getter (in that order), while the second loop emits the code to set the field in the constructor body.

For initial testing, I used Reflection.Emit's capability to generate an assembly and checked its content with ILSpy and PEVerify.exe.
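As a complement to inspecting the emitted assembly, a type can also be exercised at run time. The following is a minimal, self-contained sketch (not code from the sample project) that emits a one-property type in the same way as CreateTypeFromPropertyList, instantiates it, and reads the property through Reflection. The names EmitDemo, DemoDynamic and dyn_demo are made up for illustration, and the static AssemblyBuilder.DefineDynamicAssembly method assumes .NET Framework 4.5 or later.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class EmitDemo
{
    // Emits: class dyn_demo { int m_Size; public dyn_demo(int p1); public int Size { get; } }
    public static Type BuildRowType()
    {
        AssemblyBuilder ab = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("DemoDynamic"), AssemblyBuilderAccess.Run);
        ModuleBuilder mb = ab.DefineDynamicModule("DemoDynamic");
        TypeBuilder tb = mb.DefineType("dyn_demo", TypeAttributes.Public);

        FieldBuilder fb = tb.DefineField("m_Size", typeof(int), FieldAttributes.Private);

        // Property with a getter that returns the backing field.
        PropertyBuilder pb = tb.DefineProperty(
            "Size", PropertyAttributes.None, typeof(int), null);
        MethodBuilder getter = tb.DefineMethod("get_Size",
            MethodAttributes.Public | MethodAttributes.SpecialName | MethodAttributes.HideBySig,
            typeof(int), Type.EmptyTypes);
        ILGenerator getIL = getter.GetILGenerator();
        getIL.Emit(OpCodes.Ldarg_0);
        getIL.Emit(OpCodes.Ldfld, fb);
        getIL.Emit(OpCodes.Ret);
        pb.SetGetMethod(getter);

        // Constructor: call object::.ctor, then store the argument in the field.
        ConstructorBuilder ctor = tb.DefineConstructor(
            MethodAttributes.Public, CallingConventions.Standard, new[] { typeof(int) });
        ILGenerator ctorIL = ctor.GetILGenerator();
        ctorIL.Emit(OpCodes.Ldarg_0);
        ctorIL.Emit(OpCodes.Call, typeof(object).GetConstructor(Type.EmptyTypes));
        ctorIL.Emit(OpCodes.Ldarg_0);
        ctorIL.Emit(OpCodes.Ldarg_1);
        ctorIL.Emit(OpCodes.Stfld, fb);
        ctorIL.Emit(OpCodes.Ret);

        return tb.CreateType();
    }

    // Creates an instance and reads the property back via Reflection,
    // similar in spirit to how a data-binding engine discovers properties.
    public static object CreateAndRead(int size)
    {
        Type rowType = BuildRowType();
        object row = Activator.CreateInstance(rowType, new object[] { size });
        return rowType.GetProperty("Size").GetValue(row, null);
    }
}
```

If the emitted IL were invalid, the call to CreateAndRead would typically fail with an InvalidProgramException or similar, so a check like this catches gross mistakes early without replacing PEVerify.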

Points of Interest

.NET Arrays

One of the biggest surprises for me was how arrays behave with regard to Reflection and data binding. My first, naive approach was to collect the rows in a list and return the result of ToArray. With that approach, WPF's data binding simply didn't work.

public object[] GetTableContent(string tableName)
{
    var resList = new List<object>();

    // fill resList with content
    ...

    // this compiles but doesn't work
    return resList.ToArray();

    // this works as expected
    var arr = Array.CreateInstance(rowType, resList.Count);
    Array.Copy(resList.ToArray(), arr, resList.Count);
    return (object[])arr;
}

The difference between the two arrays cannot be seen in the debugger. Both look like they are of type object[], but the one that works is actually an array of the dynamic type. Array covariance is what allows it to be returned as object[].
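The difference becomes visible when asking each array for its runtime type. The sketch below uses string as a stand-in for the dynamic row type. ArrayTypeDemo and its method names are illustrative only, and the cast to object[] assumes that the row type is a reference type.

```csharp
using System;
using System.Collections.Generic;

public static class ArrayTypeDemo
{
    // Naive version: the runtime element type is object.
    public static object[] Naive(List<object> rows)
    {
        return rows.ToArray();
    }

    // Typed version: the runtime element type is rowType, even though
    // the static type of the return value is still object[].
    public static object[] Typed(List<object> rows, Type rowType)
    {
        Array arr = Array.CreateInstance(rowType, rows.Count);
        Array.Copy(rows.ToArray(), arr, rows.Count);
        return (object[])arr;
    }

    public static void Main()
    {
        var rows = new List<object> { "a", "b" };
        Console.WriteLine(Naive(rows).GetType());                 // System.Object[]
        Console.WriteLine(Typed(rows, typeof(string)).GetType()); // System.String[]
    }
}
```

Code that inspects the element type of the array through Reflection, as WPF presumably does when generating columns, sees only object in the first case and the full set of properties in the second.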

DataGrid and Nullable Types

Another annoying bug/feature is DataGrid's inability to properly handle nullable types in regard to sorting. Some of the integers are nullable in the database and are therefore represented as int? in the dynamic classes. For those columns, sorting just stops working!

LINQ and Memory Consumption

The third issue I had was one of memory consumption. When saving large binary files (> 100 MB), an OutOfMemoryException may be thrown. This happens when running the x86 debug build. When running the program on x64, the limit is way above anything I could find in my MSI files.

The reason why this happens so early is probably how I convert the stream data (returned as string from the installer API) to a byte array. I use the following LINQ code:

<string buffer>.SelectMany(c => BitConverter.GetBytes(c)).ToArray();

This is probably one of the areas where unsafe code would be justifiable to improve performance and to reduce the risk of running out of memory.
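A safe-code alternative that might already reduce the memory pressure is to copy the characters in one block instead of allocating a small byte array per character. This is a sketch that was not measured against the sample project. StringBytes and ToRawBytes are hypothetical names, and the approach still needs a temporary char array in addition to the result.

```csharp
using System;

public static class StringBytes
{
    // Copies the raw UTF-16 code units of a string into a byte array.
    // The byte order matches BitConverter.GetBytes(char) on the same machine.
    public static byte[] ToRawBytes(string buffer)
    {
        char[] chars = buffer.ToCharArray();
        byte[] bytes = new byte[chars.Length * sizeof(char)];
        // Buffer.BlockCopy works on arrays of primitive types and counts in bytes.
        Buffer.BlockCopy(chars, 0, bytes, 0, bytes.Length);
        return bytes;
    }
}
```

Unlike a text encoding, a block copy does not interpret the characters, so arbitrary binary data packed into the string is preserved as it is with the LINQ version.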

Lazy Loading

Another interesting observation is how easy it was to introduce delayed loading of the table content into the application by inserting a Lazy<T> member at the proper location.

private void ProcessMsiFile(string fileName)
{
    MsiReader msiReader = new MsiReader(builder, fileName);

    this.DataContext = new
    {
        FileName = Path.GetFileName(fileName),
        Tables =
            (from tableName in msiReader.GetTableNames()
                select new
                {
                    TableName = tableName,
                    Rows = new Lazy<object[]>
                        (() => msiReader.GetTableContent(tableName))
                    //TODO:Testing - no lazy loading
                    //Rows = new { Value =
                    //    msiReader.GetTableContent(tableName)},
                }).ToArray()
    };
}

The reason for this trick was the relatively slow loading of large MSI files. Note also that with Lazy<T> inserted into the ProcessMsiFile function, the sequence diagram above no longer reflects how long the MsiReader instance lives.
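The deferred behavior of Lazy<T> can be observed in isolation. In the sketch below, LoadRows is a stand-in for msiReader.GetTableContent and returns made-up data. LazyDemo and LoadCount are illustrative names. The commented-out variant in ProcessMsiFile suggests that the view binds to Rows.Value, so the table is presumably read the first time the binding touches that property.

```csharp
using System;

public static class LazyDemo
{
    // Counts how often the expensive load actually runs.
    public static int LoadCount;

    // Stand-in for msiReader.GetTableContent (hypothetical data).
    private static object[] LoadRows(string tableName)
    {
        LoadCount++;
        return new object[] { tableName + " row 1", tableName + " row 2" };
    }

    public static Lazy<object[]> CreateRows(string tableName)
    {
        // Nothing is loaded here; the delegate runs on first access to Value
        // and its result is cached for later accesses.
        return new Lazy<object[]>(() => LoadRows(tableName));
    }
}
```

Creating the Lazy<object[]> for every table is cheap, and only tables that are actually selected pay the loading cost, once each.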

Conclusion

The code shown here is by no means ready to be used in a real-world program. The goal is to show how a relatively complex problem can be solved with a few hundred lines of code by leveraging some of the more exotic APIs available in the .NET Framework. This is also why the sample program lacks many features that would make it really useful. Some of the more obvious ones are support for modifying the database and a find/replace facility. Although these two features would be really useful, I felt that I'd miss the goal of the article by losing the simplicity of the current implementation.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
