
CodeDOM Go Kit: The CodeDOM is Dead, Long Live the CodeDOM

11 Dec 2019, MIT License, 9 min read
If you use the CodeDOM, here's an indispensable package to make it awesome.

Introduction

I really hate the CodeDOM, and you should, too. Well, let's back up. I like the idea of the CodeDOM, but not the execution. It's clunky, kludgy and extremely limited. If I didn't care about language-agnostic code generation in .NET, I'd never use it.

However, I do care about those things. I also care about tapping into things like Microsoft's XML serialization code generator and mutilating its output into something less awful.

Furthermore, Roslyn looks to be limited in both platform availability and language support, which seems set in stone at VB and C#. Meanwhile, the F# team produced a CodeDOM provider for their own language, so it looks like the CodeDOM can still generate code in languages Roslyn can't, and on devices where Roslyn can't currently run. I might be wrong about the platforms, and maybe .NET 5 will change everything, but this is where we are now.

Here are a few of the things the CodeDOM always needed:

  • A Parse() function. Gosh, that would have been cool.
  • An Eval() function, even if it could only handle certain expressions.
  • A Visit() function, or some other way to search the graph.

I decided to finally do something about that, and with it comes a really quick and clean way to build and edit CodeDOM trees.

Roll up your sleeves, because we're about to flip the switch on this Frankenstein.

Building this Mess

The source distribution contains two binaries in the project directory. They are not used at runtime; they are used solely during a build step to generate SlangTokenizer.cs. The tool is called Rolex, and I provide a link to its CodeProject article in the "Further Reading" section. These binaries aren't strictly required for the project to build, but they allow edits to SlangTokenizer.rl to be reflected in the associated source file. Without this build step, SlangTokenizer.cs will not change even if SlangTokenizer.rl does.

Using this Mess

First, let's talk about parsing. No, I didn't implement a parser for every language the CodeDOM has ever been rendered to. Instead, I made a parser for a CodeDOM-compatible subset of C# that I call Slang. Code written in Slang parses to a CodeDOM graph and can therefore be rendered to any language for which an adequate CodeDOM provider exists. Think of it as a subset of C# you can convert to other languages, like VB.

Slang eliminates most of the grunt work of building the CodeDOM tree. With this little bit of witchcraft, you can create an expression simply by writing:

C#
CodeExpression expr = SlangParser.ParseExpression("(a + b * c - d) > e");

You can parse whole compile units too, an entire file at a time, so in that case you never need to deal with the CodeDOM directly. That takes care of parsing.
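For comparison, here is roughly what building that same expression, (a + b * c - d) > e, looks like by hand with the standard System.CodeDom types. HandBuiltExpression is a made-up name for illustration, and the tree follows ordinary C# precedence and left-to-right associativity; SlangParser's actual output may differ in detail.

```csharp
using System.CodeDom;

// Hypothetical illustration: the tree for (a + b * c - d) > e, built without a parser.
public static class HandBuiltExpression
{
    public static CodeExpression Build()
    {
        // shorthand for a variable reference node
        CodeExpression Var(string name) => new CodeVariableReferenceExpression(name);

        // b * c binds tightest; + and - then associate left to right:
        // ((a + (b * c)) - d) > e
        var product = new CodeBinaryOperatorExpression(
            Var("b"), CodeBinaryOperatorType.Multiply, Var("c"));
        var sum = new CodeBinaryOperatorExpression(
            Var("a"), CodeBinaryOperatorType.Add, product);
        var difference = new CodeBinaryOperatorExpression(
            sum, CodeBinaryOperatorType.Subtract, Var("d"));
        return new CodeBinaryOperatorExpression(
            difference, CodeBinaryOperatorType.GreaterThan, Var("e"));
    }
}
```

Rendering it is a matter of handing the result to a provider, for example CodeDomProvider.CreateProvider("cs").GenerateCodeFromExpression(expr, writer, new CodeGeneratorOptions()); the C# generator typically wraps each binary operation in its own parentheses.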

Next up is evaluation. On its own, this is only practical for a limited number of uses (unless, of course, you want to make a calculator app), but our CodeDomResolver object has an Evaluate() method that takes code in the form of a CodeExpression, interprets it, and gives you the result. It can compute values, call methods, access properties, and so on, but it has no notion of variables or arguments, so it is not a full interpreter, although with some work it could be. Combining it with what we just saw above, we can do something like this:

C#
var res = new CodeDomResolver();
Console.WriteLine(res.Evaluate(SlangParser.ParseExpression("5*4-7")));
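Evaluate() works on CodeDOM graphs, but the parse-then-evaluate idea is easy to see in isolation. The following is a minimal, hypothetical recursive descent evaluator for integer arithmetic (+, -, *, / and parentheses). It evaluates while it parses instead of building a tree, so treat it as an illustration of the concept, not as how CodeDomResolver is implemented.

```csharp
using System;

// Hypothetical illustration: evaluates integer expressions such as "5*4-7"
// directly while parsing (no CodeDOM tree is built).
public sealed class TinyEvaluator
{
    private readonly string _text;
    private int _pos;

    private TinyEvaluator(string text) { _text = text; }

    public static long Evaluate(string text)
    {
        var e = new TinyEvaluator(text);
        long value = e.ParseSum();
        e.SkipSpace();
        if (e._pos != e._text.Length)
            throw new FormatException($"Unexpected '{e._text[e._pos]}' at {e._pos}");
        return value;
    }

    // sum := product (('+' | '-') product)*, folded left to right
    private long ParseSum()
    {
        long value = ParseProduct();
        while (true)
        {
            SkipSpace();
            if (TryEat('+')) value += ParseProduct();
            else if (TryEat('-')) value -= ParseProduct();
            else return value;
        }
    }

    // product := unary (('*' | '/') unary)*
    private long ParseProduct()
    {
        long value = ParseUnary();
        while (true)
        {
            SkipSpace();
            if (TryEat('*')) value *= ParseUnary();
            else if (TryEat('/')) value /= ParseUnary();
            else return value;
        }
    }

    // unary := '-' unary | '(' sum ')' | digits
    private long ParseUnary()
    {
        SkipSpace();
        if (TryEat('-')) return -ParseUnary();
        if (TryEat('('))
        {
            long inner = ParseSum();
            SkipSpace();
            if (!TryEat(')')) throw new FormatException("Expected ')'");
            return inner;
        }
        int start = _pos;
        while (_pos < _text.Length && _text[_pos] >= '0' && _text[_pos] <= '9') _pos++;
        if (start == _pos) throw new FormatException($"Expected a number at {start}");
        return long.Parse(_text.Substring(start, _pos - start));
    }

    private bool TryEat(char c)
    {
        if (_pos < _text.Length && _text[_pos] == c) { _pos++; return true; }
        return false;
    }

    private void SkipSpace()
    {
        while (_pos < _text.Length && char.IsWhiteSpace(_text[_pos])) _pos++;
    }
}
```

TinyEvaluator.Evaluate("5*4-7") returns 13, matching the demo, and because each loop folds left to right, "10-4-3" is evaluated as (10-4)-3.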

Putting together what we just went over, and adding a couple of new tricks, the following program prints 13, followed by its own source code rendered in VB:

C#
using CD;
using System;
using System.CodeDom;

namespace CodeDomDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            // evaluates a simple expression 
            var res = new CodeDomResolver();
            Console.WriteLine(res.Evaluate(SlangParser.ParseExpression("5*4-7")));

            // takes this file and converts it to vb
            var ccu = SlangParser.ReadCompileUnitFrom("..\\..\\Program.cs");
            ccu.ReferencedAssemblies.Add("CodeDomGoKit.dll");
            ccu.ReferencedAssemblies.Add(typeof(CodeObject).Assembly.GetName().ToString());
            SlangPatcher.Patch(ccu);
            Console.WriteLine(CodeDomUtility.ToString(ccu, "vb"));
        }
    }
}

Like this:

VB.NET
Option Strict Off
Option Explicit On

Imports CD
Imports System
Imports System.CodeDom

Namespace CodeDomDemo
    Friend Class Program
        Public Shared Sub Main()
            'evaluates a simple expression
            Dim res As CodeDomResolver = New CodeDomResolver()
            System.Console.WriteLine(res.Evaluate(CD.SlangParser.ParseExpression("5*4-7")))
            'takes this file and converts it to vb
            Dim ccu As System.CodeDom.CodeCompileUnit = _
                         CD.SlangParser.ReadCompileUnitFrom("..\..\Program.cs")
            ccu.ReferencedAssemblies.Add("CodeDomGoKit.dll")
            ccu.ReferencedAssemblies.Add(GetType(CodeObject).Assembly.GetName.ToString)
            CD.SlangPatcher.Patch(ccu)
            System.Console.WriteLine(CD.CodeDomUtility.ToString(ccu, "vb"))
        End Sub
    End Class
End Namespace

Now you have even less of an excuse to ever write anything in VB again. More importantly, now you have a breezy way to create CodeDOM graphs.

In my best Billy Mays voice, "but wait, there's more!" The above doesn't really demonstrate any dynamism, but obviously if we're generating code, we need the generation to be dynamic.

For this, I've provided a preprocessor that handles simple T4 text templates. You can use T4-style (ASP-like) context switches, <# #>, to render dynamically from a template. Your generation project can store the template as an embedded resource and use it to generate code. Here's an example of using the preprocessor on the file Test.tt. First, the template:

ASP.NET
using System;
class Program 
{
    static void Main() 
    {
    <# 
    for(var i = 0;i<5;++i) {
    #>
        Console.WriteLine("Hello World! #<#=i+1#>");
    <#
    }
    #>
    }
}

After preprocessing, it yields this:

C#
using System;
class Program
{
        static void Main()
        {
                Console.WriteLine("Hello World! #1");
                Console.WriteLine("Hello World! #2");
                Console.WriteLine("Hello World! #3");
                Console.WriteLine("Hello World! #4");
                Console.WriteLine("Hello World! #5");
        }
}

Note that the output's formatting does not matter. Watch what happens when we run this through Slang and output to C#:

C#
using System;

internal class Program {
    public static void Main() {
        System.Console.WriteLine("Hello World! #1");
        System.Console.WriteLine("Hello World! #2");
        System.Console.WriteLine("Hello World! #3");
        System.Console.WriteLine("Hello World! #4");
        System.Console.WriteLine("Hello World! #5");
    }
}

Not only is the formatting better, but Slang has also made some changes to the program. For one, our types have been fully qualified. For another, our Main() method was made public! That's because it was detected as an entry point method, so SlangPatcher.Patch() represents it with a CodeEntryPointMethod. When the CodeDOM renders an entry point method, it always makes it public, and we have no direct control over that. Unfortunately, the CodeDOM also does not support arguments or return values for entry point methods.

Anyway, here's the code to make that happen, assuming Test.tt is in our project directory:

C#
var sw = new StringWriter();
using (var sr = new StreamReader(@"..\..\Test.tt"))
    SlangPreprocessor.Preprocess(sr, sw);
var ccu = SlangParser.ParseCompileUnit(sw.ToString());
SlangPatcher.Patch(ccu);
Console.WriteLine(CodeDomUtility.ToString(ccu));
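The template syntax boils down to three kinds of segments: literal text, code blocks (<# ... #>) and expression blocks (<#= ... #>). A T4-style preprocessor generally turns those into a small program that writes the text, runs the code, and writes each expression's value. The sketch below covers only the first step, splitting a template into segments. SegmentKind and TemplateSplitter are hypothetical names, and this is not necessarily how SlangPreprocessor works internally.

```csharp
using System;
using System.Collections.Generic;

public enum SegmentKind { Text, Code, Expression }

// Hypothetical helper: splits a T4-style template into text, code and expression segments.
public static class TemplateSplitter
{
    public static List<(SegmentKind Kind, string Body)> Split(string template)
    {
        if (template == null) throw new ArgumentNullException(nameof(template));
        var result = new List<(SegmentKind Kind, string Body)>();
        int pos = 0;
        while (pos < template.Length)
        {
            int open = template.IndexOf("<#", pos, StringComparison.Ordinal);
            if (open < 0)
            {
                // no more blocks: the rest is literal text
                result.Add((SegmentKind.Text, template.Substring(pos)));
                break;
            }
            if (open > pos)
                result.Add((SegmentKind.Text, template.Substring(pos, open - pos)));

            // "<#=" starts an expression block, a plain "<#" starts a code block
            bool isExpr = open + 2 < template.Length && template[open + 2] == '=';
            int bodyStart = open + (isExpr ? 3 : 2);
            int close = template.IndexOf("#>", bodyStart, StringComparison.Ordinal);
            if (close < 0)
                throw new FormatException($"Unterminated block starting at {open}");

            result.Add((isExpr ? SegmentKind.Expression : SegmentKind.Code,
                        template.Substring(bodyStart, close - bodyStart)));
            pos = close + 2;
        }
        return result;
    }
}
```

From there, turning the segments into a generator is mechanical: each Text segment becomes a write of a string literal, each Code segment is copied verbatim, and each Expression segment becomes a write of its value.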

If we need to spruce up the tree we got, and maybe do a bit of search and replace on some code, we have the CodeDomVisitor:

C#
// now let's take our code and modify it
CodeDomVisitor.Visit(ccu,(ctx) => {
    // we're looking for a method invocation
    var mi = ctx.Target as CodeMethodInvokeExpression;
    if (null != mi)
    {
        // ... calling WriteLine
        if ("WriteLine" == mi.Method?.MethodName)
        {
            // replace the passed in expression with "Hello world!"
            mi.Parameters.Clear();
            mi.Parameters.Add(new CodePrimitiveExpression("Hello world!"));
            // done after the first WriteLine so we cancel
            ctx.Cancel = true;
        }
    }
});
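CodeDomVisitor's own source is the authority on how it walks the graph. To give a rough idea of how a general-purpose visitor with a Cancel flag can be built on reflection, here is a simplified, hypothetical sketch that walks any object graph through public properties and list items. VisitContext and ReflectionVisitor are made-up names; a real CodeDOM visitor would also need to decide which properties (UserData, for instance) it should not descend into.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

// Hypothetical context passed to the callback for each visited object.
public sealed class VisitContext
{
    public object Target { get; internal set; }
    public object Parent { get; internal set; }  // for list items, this is the list itself
    public bool Cancel { get; set; }
}

public static class ReflectionVisitor
{
    // Depth-first, pre-order walk. Follows list items and public instance
    // properties (except indexers). Strings and value types are leaves and are not reported.
    public static void Visit(object root, Action<VisitContext> action)
    {
        var seen = new HashSet<object>(new ReferenceComparer());
        Walk(root, null, new VisitContext(), seen, action);
    }

    // returns true when the callback set Cancel, so every level stops
    private static bool Walk(object obj, object parent, VisitContext ctx,
        HashSet<object> seen, Action<VisitContext> action)
    {
        if (obj == null || obj is string || obj.GetType().IsValueType || !seen.Add(obj))
            return false;
        ctx.Target = obj;
        ctx.Parent = parent;
        action(ctx);
        if (ctx.Cancel) return true;

        if (obj is IList list)
        {
            // copy first so the callback may modify the list while we walk it
            var items = new object[list.Count];
            list.CopyTo(items, 0);
            foreach (var item in items)
                if (Walk(item, obj, ctx, seen, action)) return true;
        }
        foreach (var prop in obj.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (!prop.CanRead || prop.GetIndexParameters().Length != 0) continue;
            if (Walk(prop.GetValue(obj), obj, ctx, seen, action)) return true;
        }
        return false;
    }

    // tracks visited nodes by identity, even if a type overrides Equals
    private sealed class ReferenceComparer : IEqualityComparer<object>
    {
        bool IEqualityComparer<object>.Equals(object x, object y) => ReferenceEquals(x, y);
        int IEqualityComparer<object>.GetHashCode(object obj) => RuntimeHelpers.GetHashCode(obj);
    }
}
```

The seen set keeps the walk from looping on back-references, and returning a bool from Walk is one simple way to make Cancel stop the whole traversal rather than just the current branch.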

I didn't show it above because we didn't need it, but for cases where you need to replace the target itself, CodeDomVisitor has the oh-so-creatively named ReplaceTarget() method, which takes your current context and the new object. It uses reflection to work its "replace myself" magic, setting the appropriate property on the parent or replacing the item in the parent's collection, as needed.
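The "replace myself" trick can be sketched in a few lines of reflection: if the parent is a list, find the old object in it and swap it in place; otherwise, look for a writable property on the parent that currently holds the old object and assign the new one. ObjectReplacer.ReplaceChild is a hypothetical helper written for illustration, not ReplaceTarget()'s actual code.

```csharp
using System;
using System.Collections;
using System.Reflection;

public static class ObjectReplacer
{
    // Hypothetical helper: swaps oldChild for newChild inside parent.
    // Returns false if parent does not directly reference oldChild.
    // SetValue throws ArgumentException if newChild has the wrong type for the property.
    public static bool ReplaceChild(object parent, object oldChild, object newChild)
    {
        if (parent == null) throw new ArgumentNullException(nameof(parent));

        // case 1: the parent is a list that contains the child (compared by reference)
        if (parent is IList list)
        {
            for (int i = 0; i < list.Count; i++)
            {
                if (ReferenceEquals(list[i], oldChild))
                {
                    list[i] = newChild;
                    return true;
                }
            }
        }

        // case 2: a readable, writable, non-indexed property on the parent holds the child
        foreach (var prop in parent.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (!prop.CanRead || !prop.CanWrite || prop.GetIndexParameters().Length != 0)
                continue;
            if (ReferenceEquals(prop.GetValue(parent), oldChild))
            {
                prop.SetValue(parent, newChild);
                return true;
            }
        }
        return false;
    }
}
```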

We've briefly touched on CodeDomResolver, but if you're going to do anything super fancy with the CodeDOM, like writing a compiler or an interpreter that uses it to house your code, you can use this class to get type and scope information about the tree. SlangPatcher uses this class extensively:

C#
// create one of these lil guys
var res = new CodeDomResolver();
// add our code to it
res.CompileUnits.Add(ccu);
// give it a chance to build its information over our code
res.Refresh();
CodeDomVisitor.Visit(ccu, (ctx) => {
    // for every expression...
    var expr = ctx.Target as CodeExpression;
    if(null!=expr)
    {
        // except method reference expressions...
        var mri = expr as CodeMethodReferenceExpression;
        if (null != mri)
            return;
        // get the expression type
        var type = res.TryGetTypeOfExpression(expr);
        // write it along with the expression itself
        Console.WriteLine(
            "Expression type {0}: {1} is {2}",
            expr.GetType().Name,
            CodeDomUtility.ToString(expr),
            null!=type?CodeDomUtility.ToString(type):"unresolvable");
    }
});

This outputs the following:

Expression type CodeObjectCreateExpression: new CodeDomResolver() is CodeDomResolver
Expression type CodeMethodInvokeExpression: System.Console.WriteLine("Hello world!") is void
Expression type CodeTypeReferenceExpression: System.Console is System.Console
Expression type CodePrimitiveExpression: "Hello world!" is string
Expression type CodeMethodInvokeExpression: 
  CD.SlangParser.ReadCompileUnitFrom("..\\..\\Demo1.cs") is System.CodeDom.CodeCompileUnit
Expression type CodeTypeReferenceExpression: CD.SlangParser is CD.SlangParser
Expression type CodePrimitiveExpression: "..\\..\\Demo1.cs" is string
Expression type CodeMethodInvokeExpression: 
  ccu.ReferencedAssemblies.Add("CodeDomGoKit.dll") is int
Expression type CodePropertyReferenceExpression: 
  ccu.ReferencedAssemblies is System.Collections.Specialized.StringCollection
Expression type CodeVariableReferenceExpression: ccu is System.CodeDom.CodeCompileUnit
Expression type CodePrimitiveExpression: "CodeDomGoKit.dll" is string
Expression type CodeMethodInvokeExpression: 
  ccu.ReferencedAssemblies.Add(typeof(CodeObject).Assembly.GetName().ToString()) is int
Expression type CodePropertyReferenceExpression: 
  ccu.ReferencedAssemblies is System.Collections.Specialized.StringCollection
Expression type CodeVariableReferenceExpression: ccu is System.CodeDom.CodeCompileUnit
Expression type CodeMethodInvokeExpression: 
  typeof(CodeObject).Assembly.GetName().ToString() is string
Expression type CodeMethodInvokeExpression: 
  typeof(CodeObject).Assembly.GetName() is System.Reflection.AssemblyName
Expression type CodePropertyReferenceExpression: 
  typeof(CodeObject).Assembly is System.Reflection.Assembly
Expression type CodeTypeOfExpression: typeof(CodeObject) is System.Type
Expression type CodeMethodInvokeExpression: CD.SlangPatcher.Patch(ccu) is unresolvable
Expression type CodeTypeReferenceExpression: CD.SlangPatcher is CD.SlangPatcher
Expression type CodeVariableReferenceExpression: ccu is System.CodeDom.CodeCompileUnit
Expression type CodeMethodInvokeExpression: 
  System.Console.WriteLine(CD.CodeDomUtility.ToString(ccu, "vb")) is void
Expression type CodeTypeReferenceExpression: System.Console is System.Console
Expression type CodeMethodInvokeExpression: CD.CodeDomUtility.ToString(ccu, "vb") is string
Expression type CodeTypeReferenceExpression: CD.CodeDomUtility is CD.CodeDomUtility
Expression type CodeVariableReferenceExpression: ccu is System.CodeDom.CodeCompileUnit
Expression type CodePrimitiveExpression: "vb" is string
Expression type CodeMethodInvokeExpression: 
  System.Console.WriteLine("Press any key...") is void
Expression type CodeTypeReferenceExpression: System.Console is System.Console
Expression type CodePrimitiveExpression: "Press any key..." is string
Expression type CodeMethodInvokeExpression: System.Console.ReadKey() is System.ConsoleKeyInfo
Expression type CodeTypeReferenceExpression: System.Console is System.Console
Expression type CodeMethodInvokeExpression: System.Console.Clear() is void
Expression type CodeTypeReferenceExpression: System.Console is System.Console
Expression type CodeVariableReferenceExpression: ccu is System.CodeDom.CodeCompileUnit

There was one expression it couldn't resolve: a call to SlangPatcher.Patch(), which takes a params array argument. I haven't written the support code to make that work, which is harder than it sounds, even if it already sounds hard.

We also need a way to resolve CodeTypeReference objects, which is why the resolver has TryResolveType(). It attempts to retrieve the type that a CodeTypeReference represents, which might be a runtime Type or a CodeTypeDeclaration, depending on where it comes from. It returns null if the type couldn't be resolved. Currently, there isn't an alternative that throws.
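For the runtime half of that job, a reasonable approximation is to ask every loaded assembly for the type by its full name and return null when none has it, mirroring the "null if it couldn't be resolved" behavior described above. RuntimeTypeLookup is a hypothetical sketch: it doesn't consult CodeTypeDeclarations, using directives, generic type arguments or C# keyword aliases such as string, all of which a real resolver has to deal with.

```csharp
using System;
using System.Reflection;

public static class RuntimeTypeLookup
{
    // Hypothetical sketch: resolves a full type name (e.g. "System.Text.StringBuilder")
    // against the assemblies loaded in the current AppDomain. Returns null if not found.
    public static Type TryResolveRuntimeType(string fullName)
    {
        // Type.GetType covers the core library and assembly-qualified names
        var type = Type.GetType(fullName, throwOnError: false);
        if (type != null) return type;

        // otherwise, try every assembly that is already loaded
        foreach (Assembly asm in AppDomain.CurrentDomain.GetAssemblies())
        {
            type = asm.GetType(fullName, throwOnError: false);
            if (type != null) return type;
        }
        return null;
    }
}
```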

This is fine, but what if we need to pull members off of declared types? That's what CodeDomReflectionBinder is for. It works somewhat like Microsoft's DefaultBinder, but it handles CodeDOM objects as well as runtime types. Essentially, this class performs member discovery and selection based on name and signature:

C#
// once again, we need one of these
var res = new CodeDomResolver();
res.CompileUnits.Add(ccu);
res.Refresh();
// we happen to know Program is the 1st type in the 2nd namespace
var prg = ccu.Namespaces[1].Types[0];
// we need the scope where we're at
var scope = res.GetScope(prg);
// because our binder attaches to it
var binder = new CodeDomReflectionBinder(scope);
// now get all the methods with the specified name and flags
var members = binder.GetMethodGroup
              (prg, "TestOverload",BindingFlags.Public | BindingFlags.Static);
Console.WriteLine("There are {0} TestOverload method overloads.", members.Length);
// try selecting one that takes a single string parameter
var argTypes1 = new CodeTypeReference[] { new CodeTypeReference(typeof(string)) };
var m = binder.SelectMethod
        (BindingFlags.Public | BindingFlags.Static, members, argTypes1, null);
if (null != m)
{
    Console.WriteLine("Select TestOverload(string) returned:");
    _DumpMethod(m);
}
else
    Console.WriteLine("Unable to bind to method");
// try selecting one that takes a single int parameter
var argTypes2 = new CodeTypeReference[] { new CodeTypeReference(typeof(int)) };
m = binder.SelectMethod(BindingFlags.Public | BindingFlags.Static, members, argTypes2, null);
if (null != m)
{
    Console.WriteLine("Select TestOverload(int) returned:");
    _DumpMethod(m);
}
else
    Console.WriteLine("Unable to bind to method");

This outputs the following (see Demo1.cs for reference):

There are 2 TestOverload method overloads.
Select TestOverload(string) returned:
public static int TestOverload(string val) {
    System.Console.WriteLine(val);
    return val.GetHashCode();
}

Select TestOverload(int) returned:
public static string TestOverload(int val) {
    System.Console.WriteLine(val);
    return val.ToString();
}

As you can see, we managed to select each of the method overloads based on signature.

You'll notice that the binder deals in object a lot. This is due to the dual nature of the class: it operates on both reflected runtime types and code objects, so in order to accept or return either, it must present them as object. You have to test and cast each one to figure out what it is, which is all _DumpMethod() above was doing, though it wasn't shown. In this case, both results happen to be methods declared in code, but had our class derived from a runtime type with those methods on it, we would have received them as MethodInfo instances instead of CodeMemberMethod instances.
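For purely runtime types, the BCL already performs this kind of selection: Type.GetMethod accepts a name, binding flags, a binder and an array of argument types, and with a null binder it falls back to Type.DefaultBinder. The sketch below mirrors the TestOverload demo with a hypothetical OverloadDemo class, so only runtime MethodInfo objects are involved. That limitation, seeing only compiled types, is the gap CodeDomReflectionBinder is described as filling.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for the demo's Program class, compiled as ordinary C#.
public static class OverloadDemo
{
    public static int TestOverload(string val) => val.GetHashCode();
    public static string TestOverload(int val) => val.ToString();

    // Picks an overload by argument types, the way the demo picks one by
    // CodeTypeReference. Returns null when no overload matches.
    public static MethodInfo Select(params Type[] argTypes) =>
        typeof(OverloadDemo).GetMethod(
            "TestOverload",
            BindingFlags.Public | BindingFlags.Static,
            binder: null,      // null means Type.DefaultBinder is used
            types: argTypes,
            modifiers: null);
}
```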

Finally, we've been using CodeDomUtility (CDU in the snippet below) quite a bit, but we haven't really covered it. It's just a collection of abbreviations for creating various CodeDOM constructs. If Slang is an automatic transmission, CodeDomUtility is the manual: harder to use, but it gives you more control:

C#
var state = CDU.FieldRef(CDU.This, "_state");
var current = CDU.FieldRef(CDU.This, "_current");
var line = CDU.FieldRef(CDU.This, "_line");
var column = CDU.FieldRef(CDU.This, "_column");
var position = CDU.FieldRef(CDU.This, "_position");

var enumInterface = CDU.Type(typeof(IEnumerator<>));
enumInterface.TypeArguments.Add("Token");
var result = CDU.Method(typeof(bool), "MoveNext", MemberAttributes.Public);
result.ImplementationTypes.Add(enumInterface);
result.ImplementationTypes.Add(typeof(System.Collections.IEnumerator));
result.Statements.AddRange(new CodeStatement[]
{
    CDU.If(CDU.Lt(state,CDU.Literal(_Enumerating)),
        CDU.If(CDU.Eq(state,CDU.Literal(_Disposed)),
            CDU.Call(CDU.TypeRef("TableTokenizerEnumerator"),"_ThrowDisposed")),
        CDU.If(CDU.Eq(state,CDU.Literal(_AfterEnd)),CDU.Return(CDU.False))),
    CDU.Let(current,CDU.Default("Token")),
    CDU.Let(CDU.FieldRef(current,"Line"),line),
    CDU.Let(CDU.FieldRef(current,"Column"),column),
    CDU.Let(CDU.FieldRef(current,"Position"),position),
    CDU.Call(CDU.FieldRef(CDU.This,"_buffer"),"Clear"),
    CDU.Let(CDU.FieldRef(current,"SymbolId"),CDU.Invoke(CDU.This,"_Lex")),
    CDU.Var(CDU.Type(typeof(bool)),"done",CDU.False),
    CDU.While(CDU.Not(CDU.VarRef("done")),
        CDU.Let(CDU.VarRef("done"),CDU.True),
        CDU.If(CDU.Lt(CDU.Literal(_ErrorSymbol),CDU.FieldRef(current,"SymbolId")),
            CDU.Var(typeof(string),"be",CDU.ArrIndexer(CDU.FieldRef(CDU.This,"_blockEnds"),
                    CDU.FieldRef(current,"SymbolId"))),
            CDU.If(CDU.Not(CDU.Invoke
                  (CDU.TypeRef(typeof(string)),"IsNullOrEmpty",CDU.VarRef("be"))),
                CDU.If(CDU.Not(CDU.Invoke(CDU.This,"_TryReadUntilBlockEnd",CDU.VarRef("be"))),
                    CDU.Let(CDU.FieldRef(current,"SymbolId"),CDU.Literal(_ErrorSymbol)))
                )),
                CDU.If(CDU.And(CDU.Lt(CDU.Literal(_ErrorSymbol),
                       CDU.FieldRef(current,"SymbolId")),CDU.NotEq
                       (CDU.Zero,CDU.BitwiseAnd(CDU.ArrIndexer
                       (CDU.FieldRef(CDU.This,"_nodeFlags"),
                        CDU.FieldRef(current,"SymbolId")),CDU.One))),
                CDU.Let(CDU.VarRef("done"),CDU.False),
                CDU.Let(CDU.FieldRef(current,"Line"),line),
                CDU.Let(CDU.FieldRef(current,"Column"),column),
                CDU.Let(CDU.FieldRef(current,"Position"),position),
                CDU.Call(CDU.FieldRef(CDU.This,"_buffer"),"Clear"),
                CDU.Let(CDU.FieldRef(current,"SymbolId"),CDU.Invoke(CDU.This,"_Lex")))
        ),
    CDU.Let(CDU.FieldRef(current,"Value"),CDU.Invoke(CDU.FieldRef
           (CDU.This,"_buffer"),"ToString")),
    CDU.If(CDU.Eq(CDU.FieldRef(current,"SymbolId"),CDU.Literal(_EosSymbol)),
        CDU.Let(state,CDU.Literal(_AfterEnd))),
    CDU.Return(CDU.NotEq(state,CDU.Literal(_AfterEnd))) });
return result;

If you squint at it, you can kind of see the CodeDOM constructs being created in there. Basically, what it's doing is implementing an IEnumerator<Token>.MoveNext() method.
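Each CDU call is shorthand for a System.CodeDom constructor. To make that concrete, here is a hypothetical miniature of a few of the helpers used above (This, FieldRef, Literal, Lt, Let, If), written against the standard CodeDOM types. The names mirror the calls in the snippet, but MiniCdu and its signatures are guesses for illustration, not CodeDomUtility's actual code; in particular, this Literal handles primitives only.

```csharp
using System.CodeDom;

// Hypothetical miniature of an abbreviation helper class.
public static class MiniCdu
{
    // this
    public static CodeExpression This => new CodeThisReferenceExpression();

    // target.name (a field access)
    public static CodeFieldReferenceExpression FieldRef(CodeExpression target, string name) =>
        new CodeFieldReferenceExpression(target, name);

    // a primitive constant such as 42, "text" or true
    public static CodePrimitiveExpression Literal(object value) =>
        new CodePrimitiveExpression(value);

    // left < right
    public static CodeBinaryOperatorExpression Lt(CodeExpression left, CodeExpression right) =>
        new CodeBinaryOperatorExpression(left, CodeBinaryOperatorType.LessThan, right);

    // target = value (an assignment statement)
    public static CodeAssignStatement Let(CodeExpression target, CodeExpression value) =>
        new CodeAssignStatement(target, value);

    // if (condition) { trueStatements }
    public static CodeConditionStatement If(CodeExpression condition, params CodeStatement[] trueStatements) =>
        new CodeConditionStatement(condition, trueStatements);
}
```

With these, MiniCdu.If(MiniCdu.Lt(MiniCdu.FieldRef(MiniCdu.This, "_state"), MiniCdu.Literal(3)), ...) builds the same kind of node as the outer If at the top of MoveNext() above.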

There are two methods of general interest on this class, even if you never use it to generate code like the above: ToString(), which renders nearly any CodeDOM object to a string, and Literal(), which can serialize primitives, arrays, and objects** to CodeDOM structures. This is profoundly useful for generated table code, like DFA state tables and parse tables, which are typically stored in nested arrays. Your generator just needs to instantiate a "live" version of the array and pass it to Literal() to get a CodeExpression that recreates it. That expression can become a static field initializer that stores your pregenerated tables. Again, this is very useful, and I use it in many of my code generation projects. If you're generating huge arrays, it's more efficient and easier to do it this way than with Slang.

(**with the appropriate InstanceDescriptor/TypeConverter setup)
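The array half of Literal() is easy to picture: walk the live array and emit an array-creation expression whose initializers are the elements, recursing into nested (jagged) arrays. ArrayLiteral.ToCode is a hypothetical, cut-down sketch that handles only null, strings, primitives and one-dimensional arrays of those; it skips the InstanceDescriptor/TypeConverter path for objects entirely and throws for multidimensional arrays, which the CodeDOM can't instantiate anyway.

```csharp
using System;
using System.CodeDom;

public static class ArrayLiteral
{
    // Hypothetical cut-down serializer: turns a live value into a CodeExpression
    // that recreates it. Supports null, strings, primitive values and
    // one-dimensional (including jagged) arrays of those.
    // Note: IsPrimitive also admits IntPtr/UIntPtr, which CodeDOM generators reject.
    public static CodeExpression ToCode(object value)
    {
        if (value == null || value is string || value.GetType().IsPrimitive)
            return new CodePrimitiveExpression(value);

        if (value is Array array)
        {
            if (array.Rank != 1)
                throw new NotSupportedException("The CodeDOM cannot initialize multidimensional arrays.");
            var initializers = new CodeExpression[array.Length];
            for (int i = 0; i < array.Length; i++)
                initializers[i] = ToCode(array.GetValue(i)); // recurse into nested arrays
            // pass the full array type (e.g. int[]), not just the element type
            return new CodeArrayCreateExpression(new CodeTypeReference(array.GetType()), initializers);
        }

        throw new NotSupportedException($"No literal form for {value.GetType()}");
    }
}
```

To store a table this way, the resulting expression would be assigned to a static CodeMemberField's InitExpression.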

Limitations

While Slang is getting better and better, it still has outstanding issues and is basically experimental. If it works for your code, then great. If not, then hopefully a later revision will fix that. The error handling needs a ton of work as well.

There are certain things Slang can never support, like the postfix increment and decrement operators, entry point methods with arguments, try-casts, instantiating multidimensional arrays, and calling most operator overloads. These stem from limitations of the CodeDOM itself, so there's nothing I can do to remedy them.
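When all you need is the effect of i++ as a standalone statement (a loop increment, say), the usual workaround is to spell it as an assignment, which any CodeDOM provider can render. IncrementWorkaround is a hypothetical helper; it does not recover the value of a postfix expression used inside a larger expression, which is the part the CodeDOM genuinely can't express.

```csharp
using System.CodeDom;

public static class IncrementWorkaround
{
    // Hypothetical helper: builds the statement  name = name + 1
    // as a stand-in for a standalone name++ (the CodeDOM has no increment operator).
    public static CodeAssignStatement Increment(string variableName)
    {
        var target = new CodeVariableReferenceExpression(variableName);
        return new CodeAssignStatement(
            target,
            new CodeBinaryOperatorExpression(
                target, CodeBinaryOperatorType.Add, new CodePrimitiveExpression(1)));
    }
}
```

Because CodeIterationStatement takes an increment statement, IncrementWorkaround.Increment("i") can be passed as that argument to produce a for loop in any target language.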

The binder and resolver aren't complete yet. They may have trouble with nested types and certain uses of generics, and they don't support binding with optional arguments or params arrays.

Further Reading

History

  • 11th December, 2019 - Initial submission

License

This article, along with any associated source code and files, is licensed under The MIT License