Introduction
I really hate the CodeDOM, and you should, too. Well, let's back up. I like the idea of the CodeDOM, but not the execution. It's clunky, kludgy and extremely limited. If I didn't care about language-agnostic code generation in .NET, I'd never use it.
However, I do care about those things. I also care about tapping into things like Microsoft's XML serialization code generator and mutilating its output into something less awful.
Furthermore, Roslyn looks to be limited in terms of its platform availability and its language support, which seems set in stone at VB and C#. Meanwhile, the F# team produced a CodeDOM provider for their own language, so it looks like the CodeDOM can still generate code in languages Roslyn can't, and on devices where Roslyn can't currently operate. I might be wrong about the platforms, and maybe .NET 5 will change everything, but this is where we are now.
Here are a few of the things the CodeDOM always needed:
- A Parse() function. Gosh, that would have been cool.
- An Eval() function, even if it could only handle certain expressions.
- A Visit() function, or otherwise a way to search the graph.
I decided to finally do something about that, and with it comes a really quick and clean way to build and edit CodeDOM trees.
Roll up your sleeves, because we're about to flip the switch on this Frankenstein.
Building this Mess
The source distribution contains two binaries in the project directory. They are not used at runtime. They are used only during a build step to generate SlangTokenizer.cs. The tool is called Rolex, and I provide a link to its CodeProject article in the "Further Reading" section. These binaries aren't strictly required for the project to build, but they allow edits to SlangTokenizer.rl to be reflected in the associated source code file. Without this build step, SlangTokenizer.cs will not change even if SlangTokenizer.rl does.
Using this Mess
First, let's talk about parsing. No, I didn't implement a parser for every language the CodeDOM has ever been rendered to. What I did instead was make a parser for a CodeDOM-compatible subset of C# that I call Slang. Code written in Slang will parse to a CodeDOM graph and can therefore render to any language for which there is an adequate CodeDOM provider. Think of it as a subset of C# you can convert to other languages, like VB.
Slang eliminates most of the grunt work of building the CodeDOM tree. This little bit of witchcraft lets you create an expression simply by writing:
CodeExpression expr = SlangParser.ParseExpression("(a + b * c - d) > e");
You can parse whole compile units too, entire files at a time, so in that case you never need to deal with the CodeDOM directly. So we've got our parsing.
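To show the grunt work Slang saves, here is roughly what that one-liner looks like when built by hand with the stock System.CodeDom types. System.CodeDom ships with .NET Framework and is available as a NuGet package on .NET Core and later. This is a sketch, not Slang's actual output. It assumes each identifier becomes a CodeVariableReferenceExpression, and the class name HandBuiltExpression is hypothetical. The tree is then rendered with CSharpCodeProvider so you can see the result.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

// Hand-built CodeDOM equivalent of "(a + b * c - d) > e".
// Assumption: identifiers are modeled as CodeVariableReferenceExpression.
public static class HandBuiltExpression
{
    public static CodeExpression Build()
    {
        // b * c binds tighter than + and -, so it is the innermost node.
        var bTimesC = new CodeBinaryOperatorExpression(
            new CodeVariableReferenceExpression("b"),
            CodeBinaryOperatorType.Multiply,
            new CodeVariableReferenceExpression("c"));
        // a + (b * c)
        var sum = new CodeBinaryOperatorExpression(
            new CodeVariableReferenceExpression("a"),
            CodeBinaryOperatorType.Add,
            bTimesC);
        // (a + b * c) - d
        var difference = new CodeBinaryOperatorExpression(
            sum,
            CodeBinaryOperatorType.Subtract,
            new CodeVariableReferenceExpression("d"));
        // (...) > e is the root of the tree
        return new CodeBinaryOperatorExpression(
            difference,
            CodeBinaryOperatorType.GreaterThan,
            new CodeVariableReferenceExpression("e"));
    }

    // Renders any CodeExpression to C# text using the stock provider.
    public static string ToCSharp(CodeExpression expr)
    {
        using (var provider = new CSharpCodeProvider())
        using (var writer = new StringWriter())
        {
            provider.GenerateCodeFromExpression(expr, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```

The C# generator parenthesizes every binary operator and may split nested ones across lines. Ignoring whitespace, the rendered text is (((a + (b * c)) - d) > e).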
Next, and this is only practical for a limited number of uses unless you want to make a calculator app, our CodeDomResolver object has an Evaluate() method. It takes some code in the form of a CodeExpression, interprets it, and gives you the result. It can compute, call methods, access properties, and so on. However, it has no notion of variables or arguments, so it is not a full interpreter, although with some work it could be. Combining this with what we just saw, we can do something like this:
var res = new CodeDomResolver();
Console.WriteLine(res.Evaluate(SlangParser.ParseExpression("5*4-7")));
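Evaluate()'s implementation isn't shown here, but the core idea of interpreting a CodeDOM expression tree can be sketched as a recursive walk. The ToyEvaluator below is hypothetical and much more limited than CodeDomResolver. It only understands CodePrimitiveExpression and a handful of integer operators, and it cannot call methods or access properties.

```csharp
using System;
using System.CodeDom;

// A toy evaluator for constant integer expressions (hypothetical sketch,
// not CodeDomResolver.Evaluate()). It handles literals and a few
// CodeBinaryOperatorExpression operators with Int32-convertible operands.
public static class ToyEvaluator
{
    public static object Evaluate(CodeExpression expr)
    {
        switch (expr)
        {
            case CodePrimitiveExpression p:
                // Literals evaluate to their stored value.
                return p.Value;
            case CodeBinaryOperatorExpression b:
                // Evaluate both operands first, then apply the operator.
                int left = Convert.ToInt32(Evaluate(b.Left));
                int right = Convert.ToInt32(Evaluate(b.Right));
                switch (b.Operator)
                {
                    case CodeBinaryOperatorType.Add: return left + right;
                    case CodeBinaryOperatorType.Subtract: return left - right;
                    case CodeBinaryOperatorType.Multiply: return left * right;
                    case CodeBinaryOperatorType.Divide: return left / right;
                    case CodeBinaryOperatorType.Modulus: return left % right;
                    case CodeBinaryOperatorType.LessThan: return left < right;
                    case CodeBinaryOperatorType.GreaterThan: return left > right;
                    case CodeBinaryOperatorType.ValueEquality: return left == right;
                }
                throw new NotSupportedException("Operator " + b.Operator + " is not supported.");
            default:
                throw new NotSupportedException(expr.GetType().Name + " is not supported.");
        }
    }
}
```

If you pass it the tree for 5*4-7 (Subtract at the root, Multiply on the left), it returns 13, which matches what the resolver prints.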
Putting together what we just went over, and adding a couple of new tricks, the following program prints 13, followed by its own code rendered in VB:
using CD;
using System;
using System.CodeDom;
namespace CodeDomDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            var res = new CodeDomResolver();
            Console.WriteLine(res.Evaluate(SlangParser.ParseExpression("5*4-7")));
            var ccu = SlangParser.ReadCompileUnitFrom("..\\..\\Program.cs");
            ccu.ReferencedAssemblies.Add("CodeDomGoKit.dll");
            ccu.ReferencedAssemblies.Add(typeof(CodeObject).Assembly.GetName().ToString());
            SlangPatcher.Patch(ccu);
            Console.WriteLine(CodeDomUtility.ToString(ccu, "vb"));
        }
    }
}
Like this:
Option Strict Off
Option Explicit On
Imports CD
Imports System
Imports System.CodeDom
Namespace CodeDomDemo
    Friend Class Program
        Public Shared Sub Main()
            Dim res As CodeDomResolver = New CodeDomResolver()
            System.Console.WriteLine(res.Evaluate(CD.SlangParser.ParseExpression("5*4-7")))
            Dim ccu As System.CodeDom.CodeCompileUnit = _
                CD.SlangParser.ReadCompileUnitFrom("..\..\Program.cs")
            ccu.ReferencedAssemblies.Add("CodeDomGoKit.dll")
            ccu.ReferencedAssemblies.Add(GetType(CodeObject).Assembly.GetName.ToString)
            CD.SlangPatcher.Patch(ccu)
            System.Console.WriteLine(CD.CodeDomUtility.ToString(ccu, "vb"))
        End Sub
    End Class
End Namespace
Now you have even less of an excuse to ever write anything in VB again. More importantly, now you have a breezy way to create CodeDOM graphs.
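Rendering a finished CodeCompileUnit in another language comes down to choosing the right stock provider. The helper below is a hypothetical sketch of what a call like CodeDomUtility.ToString(ccu, "vb") might do. The library's actual implementation may differ. Renderer and its short language names are assumptions.

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;
using Microsoft.VisualBasic;

// Hypothetical helper: render a compile unit with a provider picked by name.
public static class Renderer
{
    public static string Render(CodeCompileUnit ccu, string language)
    {
        CodeDomProvider provider;
        switch (language.ToLowerInvariant())
        {
            case "cs":
            case "c#":
            case "csharp":
                provider = new CSharpCodeProvider();
                break;
            case "vb":
            case "visualbasic":
                provider = new VBCodeProvider();
                break;
            default:
                throw new ArgumentException("Unknown language: " + language, nameof(language));
        }
        using (provider)
        using (var writer = new StringWriter())
        {
            // The same graph produces different source text per provider.
            provider.GenerateCodeFromCompileUnit(ccu, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }

    // Builds: namespace Demo { using System; class Program { } }
    public static CodeCompileUnit Sample()
    {
        var ns = new CodeNamespace("Demo");
        ns.Imports.Add(new CodeNamespaceImport("System"));
        ns.Types.Add(new CodeTypeDeclaration("Program"));
        var ccu = new CodeCompileUnit();
        ccu.Namespaces.Add(ns);
        return ccu;
    }
}
```

Rendering Renderer.Sample() with "vb" yields a Namespace Demo ... End Namespace block. Rendering it with "cs" yields namespace Demo { ... }.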
In my best Billy Mays voice, "but wait, there's more!" The above doesn't really demonstrate any dynamism, but obviously if we're generating code, we need the generation to be dynamic.
For this, I've provided a preprocessor that handles simple T4 text template processing. You can use T4 (ASP-like) context switches, <# #> for code and <#= #> for expressions, to render dynamically from a template. Your generation project can store that template as an embedded resource and use it to generate code. Here's an example of using the preprocessor on the file Test.tt. First, here's the template:
using System;
class Program
{
    static void Main()
    {
    <#
        for(var i = 0;i<5;++i) {
    #>
        Console.WriteLine("Hello World! #<#=i+1#>");
    <#
        }
    #>
    }
}
After preprocessing, it yields this:
using System;
class Program
{
    static void Main()
    {
        Console.WriteLine("Hello World! #1");
        Console.WriteLine("Hello World! #2");
        Console.WriteLine("Hello World! #3");
        Console.WriteLine("Hello World! #4");
        Console.WriteLine("Hello World! #5");
    }
}
Note that the output's formatting does not matter. Watch what happens when we run this through Slang and output to C#:
using System;
internal class Program {
    public static void Main() {
        System.Console.WriteLine("Hello World! #1");
        System.Console.WriteLine("Hello World! #2");
        System.Console.WriteLine("Hello World! #3");
        System.Console.WriteLine("Hello World! #4");
        System.Console.WriteLine("Hello World! #5");
    }
}
Not only is the formatting better, but Slang has also made some changes to the program. For one, our types have been fully qualified. For another, our Main() method was made public! That's because it was detected as an entry point method, so SlangPatcher.Patch() ends up representing it with a CodeEntryPointMethod. When the CodeDOM renders this class, it always makes that method public, and we have no direct control over that. Unfortunately, the CodeDOM also does not support arguments or return values for entry point methods.
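You can reproduce this with plain System.CodeDom. The sketch below asks for a private CodeEntryPointMethod and renders it with CSharpCodeProvider, which writes its own public static signature regardless. EntryPointDemo is a hypothetical name. The behavior shown comes from the stock C# generator.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

public static class EntryPointDemo
{
    public static string RenderProgram()
    {
        // Ask for a private entry point...
        var main = new CodeEntryPointMethod
        {
            Attributes = MemberAttributes.Private | MemberAttributes.Static
        };
        main.Statements.Add(new CodeMethodInvokeExpression(
            new CodeTypeReferenceExpression("System.Console"),
            "WriteLine",
            new CodePrimitiveExpression("Hello World!")));

        var program = new CodeTypeDeclaration("Program");
        program.Members.Add(main);

        using (var provider = new CSharpCodeProvider())
        using (var writer = new StringWriter())
        {
            // ...but the C# generator ignores Attributes for entry points
            // and always emits "public static <return type> Main()".
            provider.GenerateCodeFromType(program, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```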
Anyway, here's the code to make that happen, assuming Test.tt is in our project directory:
var sw = new StringWriter();
using (var sr = new StreamReader(@"..\..\Test.tt"))
    SlangPreprocessor.Preprocess(sr, sw);
var ccu = SlangParser.ParseCompileUnit(sw.ToString());
SlangPatcher.Patch(ccu);
Console.WriteLine(CodeDomUtility.ToString(ccu));
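This article doesn't cover the preprocessor's internals. T4-style processing generally happens in two steps: first turn the template into generator source code, then run that code to produce the output. The sketch below covers only the first step, under simple assumptions:
- text runs become Write(@"...") calls,
- <#= expr #> becomes Write(expr),
- <# code #> is copied through unchanged.

TemplateTransformer and Write are hypothetical names. This is not SlangPreprocessor's actual implementation.

```csharp
using System;
using System.Text;

// Hypothetical first half of a T4-style preprocessor: template -> generator source.
// Directives such as <#@ ... #> are not handled.
public static class TemplateTransformer
{
    public static string ToGeneratorSource(string template)
    {
        var sb = new StringBuilder();
        int pos = 0;
        while (pos < template.Length)
        {
            int open = template.IndexOf("<#", pos, StringComparison.Ordinal);
            // Emit literal text up to the next context switch (or the end).
            int textEnd = open < 0 ? template.Length : open;
            if (textEnd > pos)
            {
                string text = template.Substring(pos, textEnd - pos);
                // Verbatim strings escape a double quote by doubling it.
                sb.Append("Write(@\"").Append(text.Replace("\"", "\"\"")).Append("\");\n");
            }
            if (open < 0)
                break;
            int close = template.IndexOf("#>", open + 2, StringComparison.Ordinal);
            if (close < 0)
                throw new FormatException("Unterminated <# block at offset " + open + ".");
            if (template[open + 2] == '=')
            {
                // <#= expr #>: write the value of the expression.
                string expr = template.Substring(open + 3, close - open - 3).Trim();
                sb.Append("Write(").Append(expr).Append(");\n");
            }
            else
            {
                // <# code #>: control code passes through as-is.
                sb.Append(template.Substring(open + 2, close - open - 2).Trim()).Append('\n');
            }
            pos = close + 2;
        }
        return sb.ToString();
    }
}
```

For Test.tt, the <# for(...) { #> and <# } #> blocks pass straight through, so the Write calls for the Console.WriteLine line end up inside the generated loop. The second step is left out here: compiling and running that code with Write bound to a TextWriter.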
If we need to spruce up the tree we got, and maybe do a bit of search and replace on some code, we have the CodeDomVisitor:
CodeDomVisitor.Visit(ccu,(ctx) => {
    var mi = ctx.Target as CodeMethodInvokeExpression;
    if (null != mi)
    {
        if ("WriteLine" == mi.Method?.MethodName)
        {
            mi.Parameters.Clear();
            mi.Parameters.Add(new CodePrimitiveExpression("Hello world!"));
            ctx.Cancel = true;
        }
    }
});
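One general way to build a Visit() over an arbitrary CodeDOM graph is reflection. Read every public property, and recurse into anything that is a CodeObject or a collection of them. CodeWalker below is a hypothetical sketch of that idea. The real CodeDomVisitor exposes a context object with Cancel and ReplaceTarget() support that this sketch leaves out.

```csharp
using System;
using System.CodeDom;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical reflection-based walker over a CodeDOM graph (not CodeDomVisitor).
public static class CodeWalker
{
    public static void Walk(CodeObject root, Action<CodeObject> visit)
    {
        Walk(root, visit, new HashSet<CodeObject>());
    }

    private static void Walk(CodeObject node, Action<CodeObject> visit, HashSet<CodeObject> seen)
    {
        // CodeObject doesn't override Equals, so this set compares by reference
        // and guards against visiting a shared node twice.
        if (node == null || !seen.Add(node))
            return;
        visit(node);
        foreach (PropertyInfo prop in node.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (prop.GetIndexParameters().Length > 0)
                continue; // skip indexers
            object value = prop.GetValue(node);
            if (value is CodeObject child)
            {
                Walk(child, visit, seen);
            }
            else if (value is IEnumerable items && !(value is string))
            {
                // Snapshot first so a callback that edits this collection
                // doesn't invalidate the enumerator.
                var snapshot = new List<CodeObject>();
                foreach (object item in items)
                    if (item is CodeObject element)
                        snapshot.Add(element);
                foreach (CodeObject element in snapshot)
                    Walk(element, visit, seen);
            }
        }
    }
}
```

With this sketch, the search-and-replace above becomes a CodeWalker.Walk(ccu, node => ...) call whose callback tests each node for CodeMethodInvokeExpression.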
I didn't use it in that example because we didn't need it, but in cases where you need to replace the target itself, CodeDomVisitor has the oh-so-creatively named ReplaceTarget() method, which takes your current context and the new object. It uses reflection to work its "replace myself" magic, setting the appropriate property on the parent or replacing the item in the parent's collection, as needed.
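The reflection trick described here can be sketched as follows. Scan the parent's public properties for the old child. Then either reassign that property, or overwrite the child's slot in whichever IList contains it. CodeReplacer is a hypothetical helper and not the actual ReplaceTarget() implementation. Note that CollectionBase-derived CodeDOM collections don't type-check items set through IList, so the caller must pass a compatible object.

```csharp
using System.CodeDom;
using System.Collections;
using System.Reflection;

// Hypothetical "replace myself" helper (not CodeDomVisitor.ReplaceTarget()).
public static class CodeReplacer
{
    public static bool Replace(CodeObject parent, CodeObject oldChild, CodeObject newChild)
    {
        foreach (PropertyInfo prop in parent.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (prop.GetIndexParameters().Length > 0)
                continue;
            object value = prop.GetValue(parent);
            if (ReferenceEquals(value, oldChild))
            {
                // Direct child: reassign if the property is settable and type-compatible.
                if (prop.CanWrite && prop.PropertyType.IsInstanceOfType(newChild))
                {
                    prop.SetValue(parent, newChild);
                    return true;
                }
                return false;
            }
            if (value is IList list)
            {
                // Child stored in a collection such as CodeExpressionCollection.
                for (int i = 0; i < list.Count; i++)
                {
                    if (ReferenceEquals(list[i], oldChild))
                    {
                        list[i] = newChild;
                        return true;
                    }
                }
            }
        }
        return false; // oldChild is not a direct child of parent
    }
}
```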
Now, we've briefly touched on CodeDomResolver. If you're going to do anything super fancy with the CodeDOM, like writing a compiler or an interpreter that uses it to house your code, you can use this class to get type and scope information about the tree. SlangPatcher uses this class extensively:
var res = new CodeDomResolver();
res.CompileUnits.Add(ccu);
res.Refresh();
CodeDomVisitor.Visit(ccu, (ctx) => {
    var expr = ctx.Target as CodeExpression;
    if(null!=expr)
    {
        var mri = expr as CodeMethodReferenceExpression;
        if (null != mri)
            return;
        var type = res.TryGetTypeOfExpression(expr);
        Console.WriteLine(
            "Expression type {0}: {1} is {2}",
            expr.GetType().Name,
            CodeDomUtility.ToString(expr),
            null!=type?CodeDomUtility.ToString(type):"unresolvable");
    }
});
This outputs the following:
Expression type CodeObjectCreateExpression: new CodeDomResolver() is CodeDomResolver
Expression type CodeMethodInvokeExpression: System.Console.WriteLine("Hello world!") is void
Expression type CodeTypeReferenceExpression: System.Console is System.Console
Expression type CodePrimitiveExpression: "Hello world!" is string
Expression type CodeMethodInvokeExpression: CD.SlangParser.ReadCompileUnitFrom("..\\..\\Demo1.cs") is System.CodeDom.CodeCompileUnit
Expression type CodeTypeReferenceExpression: CD.SlangParser is CD.SlangParser
Expression type CodePrimitiveExpression: "..\\..\\Demo1.cs" is string
Expression type CodeMethodInvokeExpression: ccu.ReferencedAssemblies.Add("CodeDomGoKit.dll") is int
Expression type CodePropertyReferenceExpression: ccu.ReferencedAssemblies is System.Collections.Specialized.StringCollection
Expression type CodeVariableReferenceExpression: ccu is System.CodeDom.CodeCompileUnit
Expression type CodePrimitiveExpression: "CodeDomGoKit.dll" is string
Expression type CodeMethodInvokeExpression: ccu.ReferencedAssemblies.Add(typeof(CodeObject).Assembly.GetName().ToString()) is int
Expression type CodePropertyReferenceExpression: ccu.ReferencedAssemblies is System.Collections.Specialized.StringCollection
Expression type CodeVariableReferenceExpression: ccu is System.CodeDom.CodeCompileUnit
Expression type CodeMethodInvokeExpression: typeof(CodeObject).Assembly.GetName().ToString() is string
Expression type CodeMethodInvokeExpression: typeof(CodeObject).Assembly.GetName() is System.Reflection.AssemblyName
Expression type CodePropertyReferenceExpression: typeof(CodeObject).Assembly is System.Reflection.Assembly
Expression type CodeTypeOfExpression: typeof(CodeObject) is System.Type
Expression type CodeMethodInvokeExpression: CD.SlangPatcher.Patch(ccu) is unresolvable
Expression type CodeTypeReferenceExpression: CD.SlangPatcher is CD.SlangPatcher
Expression type CodeVariableReferenceExpression: ccu is System.CodeDom.CodeCompileUnit
Expression type CodeMethodInvokeExpression: System.Console.WriteLine(CD.CodeDomUtility.ToString(ccu, "vb")) is void
Expression type CodeTypeReferenceExpression: System.Console is System.Console
Expression type CodeMethodInvokeExpression: CD.CodeDomUtility.ToString(ccu, "vb") is string
Expression type CodeTypeReferenceExpression: CD.CodeDomUtility is CD.CodeDomUtility
Expression type CodeVariableReferenceExpression: ccu is System.CodeDom.CodeCompileUnit
Expression type CodePrimitiveExpression: "vb" is string
Expression type CodeMethodInvokeExpression: System.Console.WriteLine("Press any key...") is void
Expression type CodeTypeReferenceExpression: System.Console is System.Console
Expression type CodePrimitiveExpression: "Press any key..." is string
Expression type CodeMethodInvokeExpression: System.Console.ReadKey() is System.ConsoleKeyInfo
Expression type CodeTypeReferenceExpression: System.Console is System.Console
Expression type CodeMethodInvokeExpression: System.Console.Clear() is void
Expression type CodeTypeReferenceExpression: System.Console is System.Console
Expression type CodeVariableReferenceExpression: ccu is System.CodeDom.CodeCompileUnit
There was one expression it couldn't resolve: the call to SlangPatcher.Patch(). That method takes a params array argument, and I haven't written the support code to make that work. That's harder than it sounds, even if it sounds hard.
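For intuition, some expression kinds can be typed without any scope information: literals, typeof, object creation, and comparisons. ToyTypeOf below is a hypothetical sketch that covers only those cases. CodeDomResolver.TryGetTypeOfExpression() also consults scope, declared members and reflection, which is why it can type variables and method calls and this sketch cannot.

```csharp
using System;
using System.CodeDom;

// Hypothetical, scope-free type inference for a few CodeExpression kinds.
// Returns null for anything it can't determine.
public static class ToyTypeOf
{
    public static CodeTypeReference TryGetType(CodeExpression expr)
    {
        switch (expr)
        {
            case CodePrimitiveExpression p:
                // A null literal has no type of its own.
                return p.Value == null ? null : new CodeTypeReference(p.Value.GetType());
            case CodeTypeOfExpression _:
                return new CodeTypeReference(typeof(Type));
            case CodeObjectCreateExpression create:
                return create.CreateType;
            case CodeBinaryOperatorExpression bin:
                switch (bin.Operator)
                {
                    case CodeBinaryOperatorType.ValueEquality:
                    case CodeBinaryOperatorType.IdentityEquality:
                    case CodeBinaryOperatorType.IdentityInequality:
                    case CodeBinaryOperatorType.LessThan:
                    case CodeBinaryOperatorType.LessThanOrEqual:
                    case CodeBinaryOperatorType.GreaterThan:
                    case CodeBinaryOperatorType.GreaterThanOrEqual:
                    case CodeBinaryOperatorType.BooleanAnd:
                    case CodeBinaryOperatorType.BooleanOr:
                        return new CodeTypeReference(typeof(bool));
                    default:
                        // Arithmetic: only the easy case where both operands agree.
                        // (This ignores C#'s promotion of small integer types to int.)
                        var left = TryGetType(bin.Left);
                        var right = TryGetType(bin.Right);
                        return left != null && right != null && left.BaseType == right.BaseType ? left : null;
                }
            default:
                return null; // e.g. variables: needs scope information
        }
    }
}
```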
We also need a way to resolve CodeTypeReference objects. That's why the resolver has TryResolveType(), which attempts to retrieve the type that a CodeTypeReference represents. Depending on where the type comes from, the result might be a runtime Type or a CodeTypeDeclaration. It returns null if the type couldn't be resolved. Currently, there isn't an alternative that throws.
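The "declared in code or runtime type" duality can be sketched like this. ToyTypeResolver first looks for a matching CodeTypeDeclaration in the given compile units and then falls back to Type.GetType(). It is hypothetical and much simpler than the resolver: it ignores namespace imports, nested types, generics and arrays.

```csharp
using System;
using System.CodeDom;
using System.Collections.Generic;

// Hypothetical sketch of resolving a CodeTypeReference to either a
// CodeTypeDeclaration (declared in code) or a runtime Type. Null if neither.
public static class ToyTypeResolver
{
    public static object TryResolveType(CodeTypeReference type, IEnumerable<CodeCompileUnit> compileUnits)
    {
        foreach (CodeCompileUnit ccu in compileUnits)
        {
            foreach (CodeNamespace ns in ccu.Namespaces)
            {
                foreach (CodeTypeDeclaration decl in ns.Types)
                {
                    string fullName = string.IsNullOrEmpty(ns.Name) ? decl.Name : ns.Name + "." + decl.Name;
                    if (fullName == type.BaseType)
                        return decl; // declared in the CodeDOM graph
                }
            }
        }
        // Fall back to the runtime. Type.GetType searches the calling assembly
        // and the core library, and returns null here instead of throwing.
        return Type.GetType(type.BaseType, throwOnError: false);
    }
}
```

As with the real binder and resolver, the caller must test the returned object to see which kind it got.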
This is fine, but what if we need to pull members off of declared types? That's what CodeDomReflectionBinder is for. It works somewhat like Microsoft's DefaultBinder, but it handles CodeDOM objects as well as runtime types. It performs member discovery and selection based on name and signature:
var res = new CodeDomResolver();
res.CompileUnits.Add(ccu);
res.Refresh();
var prg = ccu.Namespaces[1].Types[0];
var scope = res.GetScope(prg);
var binder = new CodeDomReflectionBinder(scope);
var members = binder.GetMethodGroup(prg, "TestOverload", BindingFlags.Public | BindingFlags.Static);
Console.WriteLine("There are {0} TestOverload method overloads.", members.Length);
var argTypes1 = new CodeTypeReference[] { new CodeTypeReference(typeof(string)) };
var m = binder.SelectMethod(BindingFlags.Public | BindingFlags.Static, members, argTypes1, null);
if (null != m)
{
    Console.WriteLine("Select TestOverload(string) returned:");
    _DumpMethod(m);
}
else
    Console.WriteLine("Unable to bind to method");
var argTypes2 = new CodeTypeReference[] { new CodeTypeReference(typeof(int)) };
m = binder.SelectMethod(BindingFlags.Public | BindingFlags.Static, members, argTypes2, null);
if (null != m)
{
    Console.WriteLine("Select TestOverload(int) returned:");
    _DumpMethod(m);
}
else
    Console.WriteLine("Unable to bind to method");
This outputs the following (see Demo1.cs for reference):
There are 2 TestOverload method overloads.
Select TestOverload(string) returned:
public static int TestOverload(string val) {
    System.Console.WriteLine(val);
    return val.GetHashCode();
}
Select TestOverload(int) returned:
public static string TestOverload(int val) {
    System.Console.WriteLine(val);
    return val.ToString();
}
As you can see, we managed to select each of the method overloads based on signature.
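At its simplest, selecting an overload by signature means comparing argument types against each candidate's parameter types. ToyOverloadPicker below is a hypothetical sketch that only accepts exact matches on BaseType and ArrayRank for methods declared in code. CodeDomReflectionBinder also handles runtime MethodInfo candidates and binding flags, which this sketch ignores.

```csharp
using System.CodeDom;
using System.Collections.Generic;

// Hypothetical exact-match overload selection over CodeMemberMethod candidates.
public static class ToyOverloadPicker
{
    public static CodeMemberMethod SelectMethod(IEnumerable<CodeMemberMethod> candidates, params CodeTypeReference[] argTypes)
    {
        CodeMemberMethod match = null;
        foreach (var m in candidates)
        {
            if (m.Parameters.Count != argTypes.Length)
                continue;
            bool same = true;
            for (int i = 0; i < argTypes.Length && same; i++)
            {
                CodeTypeReference p = m.Parameters[i].Type;
                same = p.BaseType == argTypes[i].BaseType && p.ArrayRank == argTypes[i].ArrayRank;
            }
            if (!same)
                continue;
            if (match != null)
                return null; // ambiguous: more than one exact match
            match = m;
        }
        return match; // null if nothing matched
    }
}
```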
You'll note the binder deals in object types a lot. This is due to the dual nature of the class. It operates on both reflected runtime types and code objects, so to accept or return either kind, it must present them as object. You must test and cast each one to figure out what it is. That is all _DumpMethod() above was doing, though it wasn't shown. In this case, both results happen to be methods declared in code. Had our class derived from a runtime type with those methods on it, we would have received them as MethodInfo instances instead of CodeMemberMethod instances.
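Since _DumpMethod() isn't shown, here is a guess at its test-and-cast pattern. MemberDumper is hypothetical. It switches on the object the binder returned, describes a reflected MethodInfo via ToString(), and renders a CodeMemberMethod as C# with CSharpCodeProvider.GenerateCodeFromMember().

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using System.Reflection;
using Microsoft.CSharp;

// Hypothetical stand-in for _DumpMethod(): test the object, then cast.
public static class MemberDumper
{
    public static string Describe(object member)
    {
        switch (member)
        {
            case MethodInfo runtimeMethod:
                // Reflected from a compiled type.
                return "runtime: " + runtimeMethod;
            case CodeMemberMethod codeMethod:
                // Declared in a CodeDOM graph: render it as C# source.
                using (var provider = new CSharpCodeProvider())
                using (var writer = new StringWriter())
                {
                    provider.GenerateCodeFromMember(codeMethod, writer, new CodeGeneratorOptions());
                    return "code: " + writer.ToString();
                }
            case null:
                return "unbound";
            default:
                return "other: " + member.GetType().Name;
        }
    }
}
```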
Finally, we've been using CodeDomUtility quite a bit, but we haven't really covered it. It's essentially a set of shorthand methods for creating various CodeDOM constructs. If Slang is an automatic transmission, CodeDomUtility is the manual version: harder to use, but with more control. In the following code, CDU refers to CodeDomUtility:
var state = CDU.FieldRef(CDU.This, "_state");
var current = CDU.FieldRef(CDU.This, "_current");
var line = CDU.FieldRef(CDU.This, "_line");
var column = CDU.FieldRef(CDU.This, "_column");
var position = CDU.FieldRef(CDU.This, "_position");
var enumInterface = CDU.Type(typeof(IEnumerator<>));
enumInterface.TypeArguments.Add("Token");
var result = CDU.Method(typeof(bool), "MoveNext", MemberAttributes.Public);
result.ImplementationTypes.Add(enumInterface);
result.ImplementationTypes.Add(typeof(System.Collections.IEnumerator));
result.Statements.AddRange(new CodeStatement[]
{
CDU.If(CDU.Lt(state,CDU.Literal(_Enumerating)),
CDU.If(CDU.Eq(state,CDU.Literal(_Disposed)),
CDU.Call(CDU.TypeRef("TableTokenizerEnumerator"),"_ThrowDisposed")),
CDU.If(CDU.Eq(state,CDU.Literal(_AfterEnd)),CDU.Return(CDU.False))),
CDU.Let(current,CDU.Default("Token")),
CDU.Let(CDU.FieldRef(current,"Line"),line),
CDU.Let(CDU.FieldRef(current,"Column"),column),
CDU.Let(CDU.FieldRef(current,"Position"),position),
CDU.Call(CDU.FieldRef(CDU.This,"_buffer"),"Clear"),
CDU.Let(CDU.FieldRef(current,"SymbolId"),CDU.Invoke(CDU.This,"_Lex")),
CDU.Var(CDU.Type(typeof(bool)),"done",CDU.False),
CDU.While(CDU.Not(CDU.VarRef("done")),
CDU.Let(CDU.VarRef("done"),CDU.True),
CDU.If(CDU.Lt(CDU.Literal(_ErrorSymbol),CDU.FieldRef(current,"SymbolId")),
CDU.Var(typeof(string),"be",CDU.ArrIndexer(CDU.FieldRef(CDU.This,"_blockEnds"),
CDU.FieldRef(current,"SymbolId"))),
CDU.If(CDU.Not(CDU.Invoke
(CDU.TypeRef(typeof(string)),"IsNullOrEmpty",CDU.VarRef("be"))),
CDU.If(CDU.Not(CDU.Invoke(CDU.This,"_TryReadUntilBlockEnd",CDU.VarRef("be"))),
CDU.Let(CDU.FieldRef(current,"SymbolId"),CDU.Literal(_ErrorSymbol)))
)),
CDU.If(CDU.And(CDU.Lt(CDU.Literal(_ErrorSymbol),
CDU.FieldRef(current,"SymbolId")),CDU.NotEq
(CDU.Zero,CDU.BitwiseAnd(CDU.ArrIndexer
(CDU.FieldRef(CDU.This,"_nodeFlags"),
CDU.FieldRef(current,"SymbolId")),CDU.One))),
CDU.Let(CDU.VarRef("done"),CDU.False),
CDU.Let(CDU.FieldRef(current,"Line"),line),
CDU.Let(CDU.FieldRef(current,"Column"),column),
CDU.Let(CDU.FieldRef(current,"Position"),position),
CDU.Call(CDU.FieldRef(CDU.This,"_buffer"),"Clear"),
CDU.Let(CDU.FieldRef(current,"SymbolId"),CDU.Invoke(CDU.This,"_Lex")))
),
CDU.Let(CDU.FieldRef(current,"Value"),CDU.Invoke(CDU.FieldRef
(CDU.This,"_buffer"),"ToString")),
CDU.If(CDU.Eq(CDU.FieldRef(current,"SymbolId"),CDU.Literal(_EosSymbol)),
CDU.Let(state,CDU.Literal(_AfterEnd))),
CDU.Return(CDU.NotEq(state,CDU.Literal(_AfterEnd))) });
return result;
If you squint, you can kind of see the CodeDOM constructs being created in there. The code implements an IEnumerator<Token>.MoveNext() method.
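To see how such shorthands pay off, here is a hypothetical miniature in the same spirit. MiniCdu's member names mirror a few of the calls above, but the bodies are illustrative guesses and not CodeDomUtility's source. For example, this Literal() only handles primitives. Each property returns a fresh node, so no CodeDOM object is accidentally shared between two places in the tree.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

// Hypothetical shorthand helpers (illustrative, not the library's implementation).
public static class MiniCdu
{
    public static CodeThisReferenceExpression This => new CodeThisReferenceExpression();
    public static CodePrimitiveExpression False => new CodePrimitiveExpression(false);
    public static CodePrimitiveExpression Literal(object value) => new CodePrimitiveExpression(value);
    public static CodeFieldReferenceExpression FieldRef(CodeExpression target, string name)
        => new CodeFieldReferenceExpression(target, name);
    public static CodeBinaryOperatorExpression Lt(CodeExpression left, CodeExpression right)
        => new CodeBinaryOperatorExpression(left, CodeBinaryOperatorType.LessThan, right);
    public static CodeAssignStatement Let(CodeExpression target, CodeExpression value)
        => new CodeAssignStatement(target, value);
    public static CodeMethodReturnStatement Return(CodeExpression value)
        => new CodeMethodReturnStatement(value);
    public static CodeConditionStatement If(CodeExpression condition, params CodeStatement[] trueStatements)
        => new CodeConditionStatement(condition, trueStatements);

    // Renders a single statement as C#.
    public static string ToCSharp(CodeStatement statement)
    {
        using (var provider = new CSharpCodeProvider())
        using (var writer = new StringWriter())
        {
            provider.GenerateCodeFromStatement(statement, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```

For example, MiniCdu.If(MiniCdu.Lt(MiniCdu.FieldRef(MiniCdu.This, "_state"), MiniCdu.Literal(0)), MiniCdu.Return(MiniCdu.False)) renders as an if statement testing (this._state < 0) that returns false.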
Two methods on this class are of general interest even if you never use it to generate code like the MoveNext() example. ToString() renders nearly any CodeDOM object to a string. Literal() can serialize primitives, arrays, and objects** to CodeDOM structures.

The latter is profoundly useful for generated table code, like DFA state tables and parse tables, which are typically stored in nested arrays. Your generator just needs to instantiate a "live" version of the array and pass it to Literal(array). That returns a CodeExpression that recreates the array, which can become a static field initializer holding your pregenerated tables. Again, this is very useful, and I use it in many of my code generation projects. If you're generating huge arrays, doing it this way is more efficient and easier than using Slang.
(**with the appropriate InstanceDescriptor/TypeConverter setup)
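For the array case, the technique can be sketched as a recursive conversion from a live value to a CodeExpression. LiteralSketch is hypothetical and covers only the following:
- primitives, strings and null (as CodePrimitiveExpression),
- one-dimensional and jagged arrays (as CodeArrayCreateExpression).

The object serialization via InstanceDescriptor/TypeConverter mentioned above is out of scope. Multidimensional arrays are rejected because the CodeDOM can't create them.

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

// Hypothetical, stripped-down Literal(): live value -> CodeExpression that recreates it.
public static class LiteralSketch
{
    public static CodeExpression Literal(object value)
    {
        if (value == null || value is string || value is decimal || value.GetType().IsPrimitive)
            return new CodePrimitiveExpression(value);
        if (value is Array array)
        {
            if (array.Rank != 1)
                throw new NotSupportedException("Only one-dimensional (or jagged) arrays are supported.");
            // CreateType is the full array type, e.g. int[] or int[][].
            var create = new CodeArrayCreateExpression { CreateType = new CodeTypeReference(array.GetType()) };
            foreach (object element in array)
                create.Initializers.Add(Literal(element)); // recurses into jagged rows
            return create;
        }
        throw new NotSupportedException("Cannot serialize " + value.GetType().FullName + ".");
    }

    // Renders the expression as C#.
    public static string ToCSharp(CodeExpression expr)
    {
        using (var provider = new CSharpCodeProvider())
        using (var writer = new StringWriter())
        {
            provider.GenerateCodeFromExpression(expr, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```

The resulting expression can be assigned to a CodeMemberField's InitExpression to bake the table into generated code.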
Limitations
While Slang is getting better and better, it still has outstanding issues and is basically experimental. If it works for your code, then great. If not, then hopefully a later revision will fix that. The error handling needs a ton of work as well.
Some things Slang can never support:
- the postfix increment and decrement operators,
- entry point methods with arguments,
- try-cast,
- instantiating multidimensional arrays,
- calling most operator overloads.

These stem from limitations of the CodeDOM itself, so there's nothing I can do to remedy them.
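When the result of the expression isn't used, a standalone i++; can still be expressed through the CodeDOM as the assignment i = i + 1, as sketched below. This is only a workaround for statement-level increments. It is not equivalent when the old value of i is consumed inside a larger expression, which is exactly the case the CodeDOM can't represent. IncrementRewrite is a hypothetical name.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

// Hypothetical rewrite of a standalone "name++;" into "name = name + 1;".
public static class IncrementRewrite
{
    public static CodeStatement Increment(string name)
    {
        // Use separate reference nodes so the tree has no shared objects.
        return new CodeAssignStatement(
            new CodeVariableReferenceExpression(name),
            new CodeBinaryOperatorExpression(
                new CodeVariableReferenceExpression(name),
                CodeBinaryOperatorType.Add,
                new CodePrimitiveExpression(1)));
    }

    public static string ToCSharp(CodeStatement statement)
    {
        using (var provider = new CSharpCodeProvider())
        using (var writer = new StringWriter())
        {
            provider.GenerateCodeFromStatement(statement, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```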
The binder and resolver aren't complete yet. They may have trouble with nested types and certain uses of generics, and they don't support binding with optional arguments or params arrays.
Further Reading
History
- 11th December, 2019 - Initial submission