Introducing TaHoGen – An Open Source Implementation of a CodeSmith-Style Code Generation Engine

14 Mar 2005
Looking for multiple file output support from a single template in one pass? Then look no further.

Introduction

So, what is TaHoGen, you may ask?

TaHoGen is a 100% free, Open Source code generation engine (licensed under the GPL) with the following features:

  • CodeSmith Template Compatibility: The parser itself can parse almost any CodeSmith template. The only thing it cannot parse (for legal reasons) is anything specific to Eric Smith’s object model (such as SchemaExplorer and the like).
  • Multiple Language Support. Like CodeSmith, TaHoGen will work with any CodeDom language such as C#, VB.NET, J#, and JScript.
  • CodeBehind in Multiple Languages. You can write a template in VB.NET, and include codebehind files written in other languages such as C#. The template compiler will automatically compile the C# file, and include it in the assembly of the compiled VB.NET template.
  • Native SubTemplate Support. This allows you to do things such as compile templates within templates, run the template, and then pass that newly compiled template to another template so that it can reuse it again. You can even use a template to generate another template which, in turn, can generate another template. Anyway, you get the idea.
  • Single Template, Multiple Outputs. You can send the output of a single template to one or more of the following targets at the same time:
    • Console Output (StdOut)
    • Debug Window
    • Trace Window
    • Clipboard (experimental)
    • File Output
    • Stream Output
    • String Output (send the results to a target string)
  • Multiple Templates, One Compiled Assembly. TaHoGen allows you to take multiple template files and compile them into a single assembly. You can even name your templates and separate them into different namespaces within the same assembly, if you wish.
  • Composite Templates. Any template can be “chained” together with another template at runtime to form even more complex templates. For example, you can combine a class generator template with an SQL template to generate both your business classes and database in one pass.
  • Shared Property Sets. This allows you to set a group of properties on a single object once, and then pass that property object around to all of your templates so that they can all read from that property set. You can even use this property set to pass subtemplates to other templates!
  • Very, Very Fast. The parser backend is written using C++ expression templates, making parsing time extremely fast. In addition, both the parser and the CodeDom compilers are cached, so no unique template is parsed and compiled more than once, which makes template build time even faster.

Like CodeSmith, TaHoGen is a code generator generator. It parses the text from template files and converts that text into a CodeDom graph which, in turn, is compiled (using the .NET language of your choice) into a customized text generator which outputs exactly the text that you specify.

What TaHoGen is not

Unlike its commercial counterpart, TaHoGen doesn’t come with a fancy GUI out of the box. In fact, in its rawest form, it doesn’t even have a GUI at all. In this first article, I will show you how to use the engine itself. In the next article in this series, we’ll create a simple GUI for running our templates, and we’ll even integrate it into the VS.NET IDE so that we can send the output of the template directly to the code window—but that’s for later. For now, sit tight, and I promise that I will make this current article worth your while.

Background

At first, I was fascinated with how Eric Smith (author of CodeSmith) was able to parse ASP-style text files and convert them into templates. I wanted to learn how to do this myself without having to learn regular expressions, so I looked at a few alternatives such as ANTLR, Bison/Lex, and GoldParser. For me, ANTLR-generated code was a nightmare to behold, and Bison/Lex was a little too immortally cryptic for my mere mortal reach. GoldParser, on the other hand, had problems parsing context-sensitive grammars such as ASP.NET, so that was out of the question as well. I needed something that would allow me to build an ASP tag parser incrementally, while at the same time affording me the speed of C++ generated native code. That’s where The Spirit Parser came in handy. After tinkering with it for nearly three months, I finally got the right BNF grammar to parse most CodeSmith template files without a hitch. Once the grammar was done, like any other curious coder with a lot of time on his hands, I figured, “If I can write a parser for this, why not go all the way and write my own implementation from scratch?”

...And so I did. Now, after a year and nearly four rewrites, it is finally ready for public consumption! :)

“I still don’t get it – why would you write another engine if CodeSmith is free anyway?”

The main reason why I spent this much time writing another engine is that I felt that CodeSmith, while extremely useful, didn’t fit all of my needs. Like many users, I wanted to send the output of a single template to multiple locations at once, and I also wanted to generate multiple files from a single template without resorting to a few hacks in the template source (no offense, Eric). Secondly, I wanted to use the engine in my own personal projects, but I certainly didn’t have the funds to pay for a license. Third, even if I had the money, I wasn’t going to pay for a product that didn’t have all the features that I was looking for. Lastly, I wanted to write something that would be useful for the developer community at large as a way of saying “Thank you” for all of the help they’ve given me in the past, especially at CodeProject. For me, this was a labor of love, and I hope you enjoy it as much as I had fun coding it!

Using the code

Since this is only an introductory article (and my very first article ever!), I’ll show you just enough to get you started in using this library. Hopefully (once I get more time), we can delve further into the internals of this library in future articles of this series. For now, the following sections will have to suffice. This article will be divided into two parts—first, I’ll show you what the template code looks like, and second, I will show you how you can use the engine in your own applications.

Minimum Requirements for Building

You are going to need:

  • The Boost Libraries and the Spirit Parser Framework. Spirit ships as part of Boost, so both are available from the Boost website (boost.org).
  • Visual Studio 2003. This project makes heavy use of expression templates for the parser code, so you’re going to need VC++ 7.1 or higher to handle the templates used by the Spirit Library. In addition, you’ll also need a C# compiler to build the portion of the library that is written in .NET.
  • Windows 2000/XP. TaHoGen uses COM Interop to bridge the gap between native C++ code and .NET, so for now, this implementation will only run on the standard Microsoft version of the .NET Framework. (I haven’t tested it on Mono just yet, but if you get it to work on that platform, let me know!)
  • A lot of patience. The parser uses some pretty complicated C++ templates from the Spirit library, which significantly slows down build time. Be prepared to wait.

What You Need to Know Before We Begin

This article assumes that you’ve had some experience with writing CodeSmith templates, and that you are familiar with coding in either VB.NET or C#. With that out of the way, let’s get started.

Template Language Differences from CodeSmith

The differences between the template languages of TaHoGen and CodeSmith are minimal. For the most part, they are identical, with a few notable exceptions. Let’s start with a simple example – a property get/set generator for C#:

<%@ CodeTemplate ClassName="PropertyGenerator" 
   Namespace="MyTemplateNamespace" Language="C#" TargetLanguage="C#"%>
<%@ Property Name="Name" Type="System.String" Category="Options" %>
<%@ Property Name="Type" Type="System.String" Category="Options" %>
<%@ Property Name="ReadOnly" Type="System.Boolean" 
                Default="true" Category="Options" %>
public <%=Type%> <%=Name%>
{
 get { return _<%=Name.Substring(0, 1).ToLower() + 
       Name.Substring(1)%>; }<%if (!ReadOnly) {%>
 set { _<%=Name.Substring(0, 1).ToLower() + 
       Name.Substring(1)%> = value; }<%}%>
}

For those of us who are familiar with CodeSmith’s template syntax (or ASP.NET), this is self-explanatory. The template above reads two strings, Name and Type, and a Boolean, ReadOnly, and generates text similar to the following (shown here for Name set to “MyProperty”, Type set to “string”, and ReadOnly set to false):

public string MyProperty
{
 get { return _myProperty; }
 set { _myProperty = value; }
}

The difference between the two syntaxes is found in the following line:

<%@ CodeTemplate ClassName="PropertyGenerator" 
  Namespace="MyTemplateNamespace" Language="C#" TargetLanguage="C#"%>

The ClassName attribute sets the name of the generated template to “PropertyGenerator”, while the Namespace attribute assigns that template to a namespace named “MyTemplateNamespace”. TaHoGen will then generate a template class that looks like this:

namespace MyTemplateNamespace
{
    using System;
    using System.ComponentModel;
    using System.Data;
    using System.Diagnostics;
    using System.Drawing.Design;
    using System.IO;
    using TaHoGen.Generators;
    using TaHoGen.Targets;
    
    public class PropertyGenerator : TaHoGen.Generators.TextGenerator
    {
        private string _name;
        private string _type;
        private bool _readOnly;
        public PropertyGenerator()
        {
        }
        public PropertyGenerator(TaHoGen.PropertyBag propertyBag) : 
                this()
        {
            this.LoadProperties(propertyBag);
        }
        [Category("Options")]
        public string Name
        {
            get
            {
                return this._name;
            }
            set
            {
                this._name = value;
            }
        }
        [Category("Options")]
        public string Type
        {
            get
            {
                return this._type;
            }
            set
            {
                this._type = value;
            }
        }
        [Category("Options")]
        public bool ReadOnly
        {
            get
            {
                return this._readOnly;
            }
            set
            {
                this._readOnly = value;
            }
        }
        protected override void GenerateImpl(System.IO.TextWriter writer)
        {
            writer.Write("\r\npublic ");
            writer.Write(Type);
            writer.Write(" ");
            writer.Write(Name);
            writer.Write("\r\n\t\t{\r\n\t\t\tget { return _");
            writer.Write(Name.Substring(0, 1).ToLower() + Name.Substring(1));
            writer.Write("; }");
            if (!ReadOnly) {
            writer.Write("\r\n\t\t\tset { _");
            writer.Write(Name.Substring(0, 1).ToLower() + Name.Substring(1));
            writer.Write(" = value; }");
            }
            writer.Write("\r\n\t\t}\r\n\t\t");
        }
    }
}

As you can see from the example above, setting the name of the template and assigning it to a particular namespace is trivial. No need to worry if you decide not to define a class name and a namespace for your template—if you omit either one of these attributes, then the default values will be entered for you.
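To make the translation into GenerateImpl() more concrete, here is a minimal sketch of how template text could be turned into a sequence of writer.Write() statements. It is not TaHoGen's parser, which is written in C++ with Spirit and emits a CodeDom graph. This sketch swaps in plain string scanning, and the names TemplateSketch, ToGeneratorBody and EmitLiteral are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of TaHoGen: translates ASP-style template
// text into C# statements that write to a TextWriter named "writer".
public static class TemplateSketch
{
    public static string ToGeneratorBody(string template)
    {
        StringBuilder body = new StringBuilder();
        int pos = 0;
        while (pos < template.Length)
        {
            int open = template.IndexOf("<%", pos, StringComparison.Ordinal);
            if (open < 0)
            {
                // No more tags: everything that is left is literal text
                EmitLiteral(body, template.Substring(pos));
                break;
            }
            EmitLiteral(body, template.Substring(pos, open - pos));

            int close = template.IndexOf("%>", open + 2, StringComparison.Ordinal);
            if (close < 0)
                throw new FormatException("Unclosed <% tag at offset " + open);

            string inner = template.Substring(open + 2, close - open - 2);
            if (inner.StartsWith("@") || inner.StartsWith("--"))
            {
                // Directives and template comments produce no output
            }
            else if (inner.StartsWith("="))
            {
                // Expressions (<%= ... %>) are written to the output
                body.Append("writer.Write(").Append(inner.Substring(1).Trim()).Append(");\n");
            }
            else
            {
                // Code blocks (<% ... %>) are copied through unchanged
                body.Append(inner.Trim()).Append("\n");
            }
            pos = close + 2;
        }
        return body.ToString();
    }

    // Emits a Write call for literal text, escaped as a C# string literal
    private static void EmitLiteral(StringBuilder body, string text)
    {
        if (text.Length == 0) return;
        string escaped = text.Replace("\\", "\\\\").Replace("\"", "\\\"")
                             .Replace("\r", "\\r").Replace("\n", "\\n")
                             .Replace("\t", "\\t");
        body.Append("writer.Write(\"").Append(escaped).Append("\");\n");
    }
}
```

Calling TemplateSketch.ToGeneratorBody("public <%=Type%> <%=Name%>") yields four statements: writer.Write("public "); writer.Write(Type); writer.Write(" "); and writer.Write(Name); which is the same shape as the first lines of the generated GenerateImpl() method.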

CodeBehind in Other Languages

The syntax for including a codebehind file in TaHoGen is the same as the one in CodeSmith. The only difference here is that TaHoGen supports building codebehind files in languages other than the one currently being used in the template, as shown in this example:

<%@ CodeTemplate ClassName="MyTemplate" 
        Namespace="MyNamespace" Language="C#" TargetLanguage="C#"%>
<%@ Assembly Src="myclass1.vb" %> <%-- Build a VB.NET source file --%>
<%@ Assembly Src="myclass2.js" %> <%-- Build a JScript source file --%>

<%-- The rest of your template would go here --%>

SubTemplate Support

Compiling Templates within Templates at Build Time

This library allows you to compile templates within templates and have those externally compiled templates automatically included as part of the main assembly of your template. The directives:

<%@ Compile Template="Property.tgt" outputfilename="MyExternalAssembly.dll" %>
<%@ Import Namespace="MyTemplateNamespace"%>

…will compile the property template shown earlier (Property.tgt) into “MyExternalAssembly.dll”, include that compiled assembly as part of the build output, and import its “MyTemplateNamespace” namespace into your main template.

Compiling Templates (within Templates) at Run Time

You can also build templates at runtime from within your application (and even from within your own templates). Here is a complete example in VB.NET:

Imports System
Imports System.Diagnostics
Imports System.IO
Imports System.Reflection
Imports TaHoGen
Imports TaHoGen.Targets
Module Module1

    Sub Main()
        Dim text As String

        'Read the contents of the template
        Dim reader As New StreamReader("property.tgt")
        text = reader.ReadToEnd()
        'Release the file handle once the text has been read
        reader.Close()

        'Compile it into a single assembly
        Dim templateAssembly As [Assembly] = TemplateCompiler.Compile(text)

        'Did it succeed?
        If templateAssembly Is Nothing Then
            Console.WriteLine("Template Compilation Failed!")
            Return
        End If

        'There's only going to be one template in this assembly
        'so it's safe to return just the first one 
        Dim templateType As Type = templateAssembly.GetTypes()(0)

        'This *should* work
        Debug.Assert(Not (templateType Is Nothing))

        'Set the properties for the template
        Dim properties As New PropertyTable

        properties("Type") = "string"
        properties("Name") = "MyProperty"
        properties("ReadOnly") = False

        'Instantiate the template and assign the properties at the same time
        Dim args As Object() = {properties}
        Dim generator As ITextGenerator = _
          CType(Activator.CreateInstance(templateType, args), ITextGenerator)

        'We should have a valid generator at this point
        Debug.Assert(Not (generator Is Nothing))

        
        'Write to the console
        Dim output As New ConsoleTarget

        'Attach the output of the generator to the console
        output.Attach(generator)

        'Generate the output itself
        output.Write()
    End Sub
End Module

Most of the code above is pretty straightforward. The call to TemplateCompiler.Compile() builds the template into an assembly, and once that assembly is compiled, the only thing left to do is to create an instance of that template and run it. Notice that although the template itself was instantiated, the code never calls the template directly to generate the text:

        Dim generator As ITextGenerator = _
           CType(Activator.CreateInstance(templateType, args), ITextGenerator)

        'Write to the console
        Dim output As New ConsoleTarget

        'Attach the output of the generator to the console
        output.Attach(generator)

        'Generate the output itself
        output.Write()

Instead, the output of the template is attached to the console, and the template executes as soon as output.Write() is invoked. The template itself never knows which target it will be writing to. In short, the output targets and the templates never refer to each other except through their respective interfaces. This approach makes it very easy to attach many outputs to the same template, and vice versa, and for me, it has come in handy on many occasions.
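The decoupling described here can be modeled in a few lines of C#. This is only a sketch of the idea, not TaHoGen's implementation: ISketchGenerator, GreetingGenerator, SketchTarget and TargetDemo are hypothetical stand-ins, and the real ITextGenerator and target classes may look quite different.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for a template: it only ever sees a TextWriter.
public interface ISketchGenerator
{
    void Generate(TextWriter writer);
}

public class GreetingGenerator : ISketchGenerator
{
    public void Generate(TextWriter writer)
    {
        writer.Write("Hello from a template");
    }
}

// Hypothetical stand-in for an output target: it owns the destination
// and runs every attached generator when Write() is called.
public class SketchTarget
{
    private readonly TextWriter _destination;
    private readonly List<ISketchGenerator> _generators = new List<ISketchGenerator>();

    public SketchTarget(TextWriter destination)
    {
        _destination = destination;
    }

    public void Attach(ISketchGenerator generator)
    {
        _generators.Add(generator);
    }

    public void Write()
    {
        foreach (ISketchGenerator generator in _generators)
            generator.Generate(_destination);
        _destination.Flush();
    }
}

public static class TargetDemo
{
    // Sends the output of one generator to two independent targets
    public static string[] Run()
    {
        ISketchGenerator generator = new GreetingGenerator();
        StringWriter first = new StringWriter();
        StringWriter second = new StringWriter();

        SketchTarget targetA = new SketchTarget(first);
        SketchTarget targetB = new SketchTarget(second);
        targetA.Attach(generator);
        targetB.Attach(generator);

        targetA.Write();
        targetB.Write();
        return new string[] { first.ToString(), second.ToString() };
    }
}
```

Because the generator depends only on TextWriter, swapping a StringWriter for Console.Out or a StreamWriter changes the destination without touching the generator.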

SubTemplates as Template Properties

There might be times when you want to leave a portion of your template open, or define a region in that template where you can insert the output of another template. Here’s one way you can do it:

<%@ CodeTemplate ClassName="SubTemplateSample" Language="C#" TargetLanguage="C#"%>
<%@ Property Name="FirstRegion" Type="ITextGenerator" Category="SubTemplates" %>
<%@ Property Name="SecondRegion" Type="ITextGenerator" Category="SubTemplates" %>

#region This is the First Region
<%=RunTemplate(FirstRegion)%>
#endregion

#region This is the Second Region
<%=RunTemplate(SecondRegion)%>
#endregion

Now, you might be wondering how to assign a template to a property of another template. We’ll handle that in the Shared Properties section below. For now, you can think of template properties as being no different from other properties based on primitive types.

Executing Property SubTemplates from within Your Template

Notice that both the FirstRegion and SecondRegion subtemplates from the examples above are template properties of the type ITextGenerator. (This is the base interface that all templates must implement in order to be recognized as templates). In this case, we’re going to use the ITextGenerator properties to act as stubs. Each call to the RunTemplate() method checks if there is a template attached to the current property, and if there is a template attached, it will run that template and insert its output into that particular section of the template. Otherwise, that section will remain blank as if the template never existed.
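The null check described above might look something like the following sketch. It is an illustration under assumptions rather than TaHoGen's code: ISubTemplate, FixedText, SketchTemplate and RegionTemplate are hypothetical names, and the real RunTemplate() may be implemented differently.

```csharp
using System.IO;

// Hypothetical stand-in for a subtemplate
public interface ISubTemplate
{
    void Generate(TextWriter writer);
}

public class FixedText : ISubTemplate
{
    private readonly string _text;
    public FixedText(string text) { _text = text; }
    public void Generate(TextWriter writer) { writer.Write(_text); }
}

public abstract class SketchTemplate
{
    // Returns the subtemplate's output, or an empty string when
    // no subtemplate is attached to the property
    protected static string RunTemplate(ISubTemplate subTemplate)
    {
        if (subTemplate == null) return string.Empty;
        StringWriter buffer = new StringWriter();
        subTemplate.Generate(buffer);
        return buffer.ToString();
    }
}

public class RegionTemplate : SketchTemplate
{
    private ISubTemplate _firstRegion;
    private ISubTemplate _secondRegion;

    public ISubTemplate FirstRegion
    {
        get { return _firstRegion; }
        set { _firstRegion = value; }
    }

    public ISubTemplate SecondRegion
    {
        get { return _secondRegion; }
        set { _secondRegion = value; }
    }

    public string Render()
    {
        return "#region First\n" + RunTemplate(FirstRegion) + "\n#endregion\n"
             + "#region Second\n" + RunTemplate(SecondRegion) + "\n#endregion\n";
    }
}
```

If only FirstRegion is assigned, the second region is still emitted but stays empty, which matches the stub behavior described above.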

Engine Usage

Now that we’re done with some of the basics of the template syntax, I’m going to show you how to integrate the engine into your own application. Once you’ve built the entire solution, you’re going to need to reference the TaHoGen.Core.dll assembly in your project in order to use the engine.

A Few Notes before We Begin

If you are going to build or rebuild the binaries for the engine, make sure you have the latest version of the Boost libraries. (At the time of this writing, TaHoGen uses Boost v1.31). You’re going to need it to compile the template parser. Since the template parser is written in C++/COM, you also might need to run regsvr32.exe on CodeSmithParser.dll to register it with COM Interop once it has been built. Anyway, let’s go on to the discussion.

The TemplateCompiler Class

This is the class that does most of the work. It has a single static method, Compile(), which has the following overloads:

// Methods for compiling a single template
public static Assembly Compile(string text)
public static Assembly Compile(string text, bool addDebugSymbols)
public static Assembly Compile(string text, 
       string outputFileName, bool addDebugSymbols)
public static Assembly Compile(string text, string outputFileName, 
       bool addDebugSymbols, ICompilerCallback compilerCallback)

// Methods for compiling multiple templates into one assembly
public static Assembly Compile(string[] fileList)
public static Assembly Compile(string[] fileList, string outputFileName)
public static Assembly Compile(string[] fileList, 
       string outputFileName, bool addDebugSymbols)
public static Assembly Compile(string[] fileList, 
       string outputFileName, bool addDebugSymbols, 
       ICompilerCallback compilerCallback)

Most of the parameters above are self-explanatory, with the exception of the last parameter, compilerCallback. The template compiler uses the ICompilerCallback interface to report the results of a compilation attempt. The interface is defined as follows:

public interface ICompilerCallback
{
  void BeginCompile(CompilerArgs args);
  void EndCompile(CompilerArgs args);
}

The CompilerArgs class, in turn, is defined as:

public class CompilerArgs 
{
    private string _source;
    private CompilerErrorCollection _errors = new CompilerErrorCollection();
    public CompilerArgs(string source, CompilerErrorCollection errors)
    {
        _source = source;

        if (errors != null)
            _errors.AddRange(errors);
    }
    public string CompiledCode
    {
        get { return _source; }
    }
    public CompilerErrorCollection Errors
    {
        get { return _errors; }
    }
}

This interface can come in handy if you want to implement a GUI that displays the result of a compilation. For now, since we aren’t going to make a GUI in this article, we can safely ignore it.
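Even without a GUI, a callback can be useful for logging. The sketch below implements the interface with a hypothetical TextCompilerCallback class that writes results to any TextWriter. The ICompilerCallback and CompilerArgs declarations are repeated from the listings above only so the sketch compiles on its own (omit them when referencing the real assembly). It also assumes that BeginCompile() is called before compilation and EndCompile() afterwards, which the names suggest but the article does not state, and that System.CodeDom.Compiler is available, as it is on the .NET Framework.

```csharp
using System.CodeDom.Compiler;
using System.IO;

// Repeated from the article so that the sketch is self-contained
public interface ICompilerCallback
{
    void BeginCompile(CompilerArgs args);
    void EndCompile(CompilerArgs args);
}

public class CompilerArgs
{
    private string _source;
    private CompilerErrorCollection _errors = new CompilerErrorCollection();
    public CompilerArgs(string source, CompilerErrorCollection errors)
    {
        _source = source;
        if (errors != null)
            _errors.AddRange(errors);
    }
    public string CompiledCode { get { return _source; } }
    public CompilerErrorCollection Errors { get { return _errors; } }
}

// Hypothetical callback that reports compilation results as plain text
public class TextCompilerCallback : ICompilerCallback
{
    private readonly TextWriter _log;

    public TextCompilerCallback(TextWriter log)
    {
        _log = log;
    }

    public void BeginCompile(CompilerArgs args)
    {
        _log.WriteLine("Compiling...");
    }

    public void EndCompile(CompilerArgs args)
    {
        // List every error and warning, then summarize the outcome
        foreach (CompilerError error in args.Errors)
        {
            string kind = error.IsWarning ? "warning" : "error";
            _log.WriteLine("{0} {1} (line {2}): {3}",
                kind, error.ErrorNumber, error.Line, error.ErrorText);
        }
        _log.WriteLine(args.Errors.HasErrors ? "Build failed." : "Build succeeded.");
    }
}
```

An instance created with new TextCompilerCallback(Console.Out) could then be passed as the compilerCallback argument of the four-parameter Compile() overloads listed earlier.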

Compiling a Single Template into an Assembly

Building a template into a compiled assembly is easy. A single call to TemplateCompiler.Compile() does the job:

Dim templateAssembly As [Assembly] = TemplateCompiler.Compile(text)

…where text is a string variable that holds the contents of a single template file.

Compiling Multiple Template Files into a Single Compiled Assembly

Combining multiple template files into a single assembly is just as easy as compiling a single template:

'Define the list of files
Dim files As String() = {"template1.tgt", "template2.tgt"}

'Build it! 
Dim combinedAssembly As [Assembly] = TemplateCompiler.Compile(files)

Note that all of the template files in that list of files must share the same language. For example, if one of the templates is written in C#, then the rest of the templates have to be written in C# as well.

Shared Properties

As I mentioned earlier, TaHoGen allows you to set multiple property values on a single, shared object that you can pass to multiple templates, so that you only have to set the properties for all of the templates once. For example, here is how you would insert the property generator template into the FirstRegion property of the SubTemplateSample template from the earlier example:

PropertyTable properties = new PropertyTable();
properties["FirstRegion"] = new PropertyGenerator();

// The sample objects will automatically be initialized with
// the new property generator with the same values.
// Notice that we only have to set the properties once. Cool, eh? 
SubTemplateSample sample1 = new SubTemplateSample(properties);
SubTemplateSample sample2 = new SubTemplateSample(properties);

. . .

// (This is where you would tell the sample templates to output the code, etc)

Alternatively, you can use the LoadProperties() method to assign the property values in the same manner:

…
sample1.LoadProperties(properties);
sample2.LoadProperties(properties);
…

Notice that templates sample1 and sample2 share the same set of properties. I tried to keep the design of the library as simple as possible, and hopefully this feature will help keep things simple.

(Note: The example above assumes that you’ve compiled both the subtemplate sample and the property generator templates into their respective assemblies and referenced them in your project. If you have built your templates from within another template, you will have to instantiate those templates through reflection, similar to what we did above in the complete VB.NET example.)

Another important thing to note is that the PropertyTable is type safe. It will only assign a property value to a template if the property type of that template and the value of whatever is stored in that property table are compatible with each other. Otherwise, the current value stored in the property table will be ignored. (You can also connect a property table to a PropertyGrid control and edit its properties directly—but I’ll save that little detail for the next article.)

Handling Changes in Property Table Values

At this point, you might be wondering what happens if you change a value in a property table object that is shared among two or more templates. If a property value changes in a PropertyTable, does that mean that you have to assign that same PropertyTable to all of those templates all over again? Not at all! Once you change a value on that particular PropertyTable object, that same change will be propagated to all templates that are attached to that object. You only have to set that property value once.
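One way to picture both behaviors, the type check and the automatic propagation, is a table that remembers which templates are attached to it and pushes each new value to them through reflection. The sketch below is a guess at the general idea, not TaHoGen's PropertyTable: SketchPropertyTable, its Attach() method and SampleTemplate are all hypothetical.

```csharp
using System.Collections.Generic;
using System.Reflection;

// Hypothetical model of a shared property table
public class SketchPropertyTable
{
    private readonly Dictionary<string, object> _values = new Dictionary<string, object>();
    private readonly List<object> _attached = new List<object>();

    // Attach a template and push the current values onto it
    public void Attach(object template)
    {
        _attached.Add(template);
        foreach (KeyValuePair<string, object> pair in _values)
            Apply(template, pair.Key, pair.Value);
    }

    public object this[string name]
    {
        get
        {
            object value;
            return _values.TryGetValue(name, out value) ? value : null;
        }
        set
        {
            _values[name] = value;
            // Propagate the change to every attached template
            foreach (object template in _attached)
                Apply(template, name, value);
        }
    }

    // Assign only when the template has a writable property whose type
    // is compatible with the value; otherwise the value is ignored
    private static void Apply(object template, string name, object value)
    {
        PropertyInfo property = template.GetType().GetProperty(name);
        if (property == null || !property.CanWrite) return;
        if (value == null || !property.PropertyType.IsInstanceOfType(value)) return;
        property.SetValue(template, value, null);
    }
}

public class SampleTemplate
{
    private string _name;
    public string Name
    {
        get { return _name; }
        set { _name = value; }
    }
}
```

With two SampleTemplate objects attached to the same table, assigning table["Name"] once updates both of them, while assigning an integer to "Name" leaves them unchanged because the types do not match.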

Single Template, Multiple Outputs

There might be times when you want to send the output of a template to more than one location. Suppose that you wanted to send the output of the PropertyGenerator template to the debug window, the console, and an external file all at the same time. This is how you do it:

// Set the property values as we did before
PropertyTable properties = new PropertyTable();
properties["Type"] = "string";
properties["Name"] = "MyProperty";
properties["ReadOnly"] = false;

// Instantiate the property generator and assign the property values
PropertyGenerator generator = new PropertyGenerator(properties);

DebugTarget debugOut = new DebugTarget();
ConsoleTarget consoleOut = new ConsoleTarget();
FileTarget fileOut = new FileTarget("output.txt", FileMode.Create);

// Connect the generator to its respective outputs
debugOut += generator;
consoleOut += generator;
fileOut += generator;

// Generate the output, and we’re done
debugOut.Write();
consoleOut.Write();
fileOut.Write();

…It doesn’t get any easier than that. :)

Chaining Templates Together

You can also combine templates together to form more complex templates. Here is a trivial, but useful example with the built-in SimpleTextGenerator template:

SimpleTextGenerator hello = new SimpleTextGenerator("Hello, ");
SimpleTextGenerator world = new SimpleTextGenerator("World!");

TextGenerator helloWorld = hello + world;

// Say “Hello, World!” to the console
ConsoleTarget console = new ConsoleTarget();
console.Attach(helloWorld);
console.Write();

You can even combine the helloWorld template with another template to form another composite template. As you can see, there’s absolutely no limit to the templates you can make using this feature. The rest I leave to your imagination.

If your favorite .NET language doesn’t support operator overloading

Alternatively, if the .NET language you’re using does not support operator overloading (e.g., VB.NET), you can use the TextGenerator.Combine() method instead:

Dim helloWorld As TextGenerator = TextGenerator.Combine(hello, world)

In C#, the signature of the Combine() method is defined as follows:

public static TextGenerator Combine(params TextGenerator[] generators);

No matter which method you choose (whether it is operator overloading, or the Combine() method), both methods will produce the same result.
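The equivalence follows naturally if the + operator simply forwards to Combine(), as in this sketch of the composite pattern. It models the idea only: SketchGenerator, LiteralGenerator and CompositeGenerator are hypothetical classes, and TaHoGen's TextGenerator may be built differently.

```csharp
using System.IO;

// Hypothetical model of chaining; not TaHoGen's TextGenerator
public abstract class SketchGenerator
{
    public abstract void Generate(TextWriter writer);

    // Named method for languages without operator overloading
    public static SketchGenerator Combine(params SketchGenerator[] generators)
    {
        return new CompositeGenerator(generators);
    }

    // The operator forwards to Combine(), so both give the same result
    public static SketchGenerator operator +(SketchGenerator left, SketchGenerator right)
    {
        return Combine(left, right);
    }

    // Convenience helper that captures the output as a string
    public string Render()
    {
        StringWriter buffer = new StringWriter();
        Generate(buffer);
        return buffer.ToString();
    }
}

public class LiteralGenerator : SketchGenerator
{
    private readonly string _text;
    public LiteralGenerator(string text) { _text = text; }
    public override void Generate(TextWriter writer) { writer.Write(_text); }
}

// Runs each part in order against the same writer
public class CompositeGenerator : SketchGenerator
{
    private readonly SketchGenerator[] _parts;
    public CompositeGenerator(SketchGenerator[] parts) { _parts = parts; }
    public override void Generate(TextWriter writer)
    {
        foreach (SketchGenerator part in _parts)
            part.Generate(writer);
    }
}
```

Since a composite is itself a generator, the result of hello + world can be combined again with a third generator, which is what makes arbitrarily deep chains possible.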

Features That Are Not Supported

Inherits Attribute

I chose not to implement the inherits="" attribute in order to keep things simple. Allowing the user to insert their own custom TextGenerator-derived template into the object model unnecessarily complicates the CodeDom portion of the engine, and I think that many of the features that inheritance affords in this particular case can be easily replaced by other methods, such as containment and delegation.

CodeTemplateInfo Object

Since I do not have (nor intend to purchase) a CodeSmith source license, I can’t include anything that might be a part of Eric J. Smith’s object model from CodeSmith. On the other hand, although it’s really easy to implement our own custom CodeTemplateInfo object, I fail to see any immediate use for it at this point.

Features That Will Be Implemented in the Future

There are a lot of things that I would love to implement in TaHoGen, but I either don’t have the time or don’t have the skill to implement them at this point. Among these are:

  • Merge Targets. This is the ability to send the output of a template to a region of an existing source file. So far, I have a prototype for a region parser in place, but I can’t figure out how to build the parse tree that will split the regions from the rest of the source text. If you have any experience with working with trees in Spirit and would like to contribute, I would really appreciate it if you could help! :)
  • Compiling Multiple Template Files Written in Different .NET Languages into a Single Assembly. At some point, I want to make it possible for any two templates (regardless of the .NET language that they were written in) to be combined into a single .NET assembly. This will make templates much more reusable, since you won’t have to rewrite a template in another .NET language just to combine it with an otherwise incompatible template.
  • Our Own SchemaExplorer. The use of SchemaExplorer requires a license from Eric Smith, so using it in TaHoGen is definitely out of the question. We’re going to need an open source equivalent, but that will take some time to implement.

Features That Will Never Be Implemented

  • Intellisense. TaHoGen is a pure code generation engine, not a text editor. Enough said. :)
  • Syntax Highlighting. TaHoGen is a pure code generation engine, not a text editor. Enough said. :)

Points of Interest

I think that it goes without saying that an engine like this one has a huge number of potential uses. You can plug this engine into an ASP.NET page and use it to emulate Master Pages, or you can use it to act as an SQL generator for an object-relational mapping layer. The possibilities are endless. I think the best part about templates in this engine (and CodeSmith to some extent) is that they don’t impose any sort of methodology on you. You decide what your code should look like, and not the other way around. In the end, the only limit is your imagination.

Special Thanks

I have to thank the following people who made this project possible:

  • Eric Smith - I have to thank him for writing CodeSmith, and getting me frustrated enough with it to write my own version. Necessity truly is the mother of (re)invention!
  • James Crowley from developerfusion.uk - Thanks for reminding me to write this article!
  • J. Conwell - I drew a lot of my ideas from your dotNetScript project. Excellent work!
  • Marc Clifton - The big man on the mountain top who has inspired me with the quality of articles that he has written. Keep up the good work, Marc!
  • Tony Allowatt – Your implementation of the PropertyBag really made it easy for me to reuse property values across multiple templates. Thanks, Tony!

License

  • This library is licensed under the terms and conditions defined in the GNU General Public License (GPL). You are free to use it for non-commercial purposes. If you are going to use any part of it in a commercial application, contact me and we can work something out.

History

  • First published on 3/14/2005.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
