Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / VB

Flee - Fast Lightweight Expression Evaluator

4.91/5 (47 votes)
11 Oct 2007LGPL310 min read 2   3.7K  
A .NET expression evaluator that compiles to IL and is designed for speed.

Demo showing a practical usage of the library

Introduction

Flee is a .NET library that allows you to parse and evaluate arbitrary expressions. The main feature that distinguishes it from other expression evaluators is that it uses a custom compiler to convert expressions into IL and then, using Lightweight CodeGen, emits the IL to a dynamic method at runtime. This means that expression evaluation is several orders of magnitude faster than when using interpretive expression evaluators. In fact, the entire design of the library and the expression language is geared towards making evaluation as fast as possible.

Features

Here a list of the major features:

  • Fast and efficient expression evaluation
  • Compiles expressions to IL using a custom compiler
  • Small, lightweight library
  • Code generated for expressions can be garbage-collected
  • Does not create any dynamic assemblies that stay in memory
  • Supports all arithmetic operations including the power (^) operator
  • Supports string, char, boolean, and floating-point literals
  • Supports 32/64 bit, signed/unsigned, and hex integer literals
  • Features a true conditional operator
  • Emits short-circuited logical operations
  • Allows for customization of expression compilation

About Lightweight CodeGen

Lightweight CodeGen (LCG) is a feature introduced in .NET 2.0 to allow for efficient runtime code generation. Using its DynamicMethod class, you are able to create a method at runtime, set the code of the method body, and finally call the created method. Its main advantages are that the generated method and its code can be garbage collected when no longer referenced, and it does not generate any dynamic assemblies that stay in memory. One of its weaknesses is that the only way to create the method body is to emit IL. Since IL is basically a form of assembly language, performing even simple tasks requires lots of instructions and an understanding of the low-level workings of the CLR. For example, here is the IL required to evaluate the expression DateTime.Now.ToString():

MSIL
L_0001: call valuetype [mscorlib]System.DateTime [mscorlib]System.DateTime::get_Now()
L_0006: stloc.0 
L_0007: ldloca.s dt
L_0009: constrained [mscorlib]System.DateTime
L_000f: callvirt instance string [mscorlib]System.Object::ToString()

Automating the translation of expressions to IL is where this library comes in. Essentially, it does the same thing as a C# compiler except that it compiles at runtime, emits its IL to memory instead of an assembly, and is designed to work with expressions (which are a subset of a complete programming language).

Walkthrough: Creating and Evaluating an Expression

For this walkthrough, we are going to compute the hypotenuse of a triangle with sides a and b, using the expression sqrt(a^2 + b^2). To start, we will need to add a reference to the ciloci.Flee.dll and import the ciloci.Flee namespace. First, we need to declare our expression owner. This is a class to which the dynamic method generated by the expression will be attached. Once attached, the dynamic method will behave as a static function on the class and will have access to all of its members (even non-public ones). We will declare a class and give it two fields to represent the a and b sides of the triangle:

C#
public class ExpressionOwner
{
   public int a;
   public int b;
}

To be able to use the sqrt function in our expression, we have to import the Math class. Importing determines what members, besides the ones on the owner, are visible to an expression. By default, no types are imported, and references to types using their fully-qualified names are not allowed. This is a security feature: since expressions are compiled and evaluated at runtime, and could be entered by users, it would not be a good idea to allow them to call arbitrary methods on any type loaded into the program. Once we import the Math class, we can use all of its constants and methods in the expression. We will use the sqrt function, but we don't need the pow function since the power operator is built in to the expression language. We will need to create an instance of the ExpressionOptions class and use its Imports property to do the actual import:

C#
ExpressionOptions options = new ExpressionOptions();
options.Imports.AddType(typeof(System.Math));

Next, we create an instance of our expression owner and initialize the a and b fields:

C#
ExpressionOwner owner = new ExpressionOwner();
owner.a = 4;
owner.b = 5;

Now, we create the expression, passing it our owner and the options:

C#
try
{
    Expression e = new Expression("sqrt(a^2 + b^2)", owner, options);
}
catch (ExpressionCompileException e)
{
    // This exception will be thrown
    // if for any reason the expression cannot be compiled            
}

There is now a method in memory that contains the code for the expression. The expression has an Evaluator property that exposes a delegate pointing to this method. The delegate will always be an instance of ExpressionEvaluator, which is a generic delegate defined in the library. To evaluate our expression, we need to retrieve the delegate and cast it to an ExpressionEvaluator with the correct return type. Although this seems more work than just returning an object, it ensures that there will be no boxing for expressions that evaluate to value types. Applying the above, we can now evaluate our expression:

C#
ExpressionEvaluator<double> evaluator = (ExpressionEvaluator<double>)e.Evaluator;
double result = evaluator();

That's it! You can now update the fields on the owner, and evaluate the expression again to get the updated result. If you are curious as to what IL was generated by the expression, you can use its EmitToAssembly() method to have it dump the IL to an assembly on disk. You can then use ILDasm (or better yet, Reflector) to inspect it. Here is the IL for our hypotenuse expression:

MSIL
L_0000: ldarg.0 
L_0001: ldfld int32 [ConsoleTester]ConsoleTester.ExpressionOwner::a
L_0006: conv.r8 
L_0007: ldc.i4.2 
L_0008: conv.r8 
L_0009: call float64 [mscorlib]System.Math::Pow(float64, float64)
L_000e: ldarg.0 
L_000f: ldfld int32 [ConsoleTester]ConsoleTester.ExpressionOwner::b
L_0014: conv.r8 
L_0015: ldc.i4.2 
L_0016: conv.r8 
L_0017: call float64 [mscorlib]System.Math::Pow(float64, float64)
L_001c: add 
L_001d: call float64 [mscorlib]System.Math::Sqrt(float64)
L_0022: ret

Note how the a and b fields, which are integers, were implicitly converted to doubles for the ^ operator, and how the ^ operator was translated to a call to the Pow function. The IL stream above is only 35 bytes long, and will be JITed to native processor instructions. Compare this with interpretive evaluators that have to go through several function (and possibly Reflection) calls and if statements to evaluate the same expression.

Demo Application

There is also a demo application that shows a practical usage of the library. The concept was inspired by Pascal Ganaye's demo for his expression evaluator. Essentially, the demo generates an image by evaluating expressions for the red, green, and blue components where the expressions can reference the x and y co-ordinates of the current pixel. This is, basically, a color version of plotting a graph for a function. Using the demo will give you a good sense of how fast expressions can be evaluated. Even on my outdated computer at home, I still manage over a million evaluations per second. This means that, on a modern computer, you could generate an 800x600 image in under a second.

The Expression Language

The expression language that this library parses is a mix of elements of C# and VB.NET. Since the aim of this library is speed, the language is strongly typed (same rules as C#) and there is no late binding. Unlike C#, the language is not case-sensitive. Here is a breakdown of the language elements:

ElementDescriptionExample
+, -Additive100 + a
*, /, %Multiplicative100 * 2 / (3 % 2)
^Power2 ^ 16
-Negation-6 + 10
+Concatenation"abc" + "def"
<<, >>Shift0x80 >> 2
=, <>, <, >, <=, >=Comparison2.5 > 100
And, Or, Xor, NotLogical(1 > 10) and (true or not false)
And, Or, Xor, Not Bitwise100 And 44 or (not 255)
IfConditionalIf(a > 100, "greater", "less")
CastCast and conversioncast(100.25, int)
[]Array index1 + arr[i+1]
.MembervarA.varB.function("a")
String literal"string!"
Char literal'c'
Boolean literaltrue AND false
Real literalDouble and single100.25 + 100.25f
Integer literalSigned/unsigned 32/64 bit100 + 100U + 100L + 100LU
Hex literal0xFF + 0xABCDU + 0x80L + 0xC9LU

License

The library is licensed under the LGPL. This means that as long as you dynamically link (i.e., add a reference) to the assemblies from the official releases, you are free to use the library for any purpose.

Implementation Details

Implementing an expression evaluator requires two things: a parser that converts an expression string into a syntax tree, and a compiler to take the tree and convert it into IL.

The Parser

I chose not to write my own parser, and used Grammatica instead because it is written by a guy who specializes in parsing and language theory. Grammatica is a parser generator: a program that takes a grammar (set of rules for a language) and outputs a class that will parse it. Writing grammars for it is not difficult if you are comfortable with recursion and Regular Expressions. It has great error handling, and will report any invalid grammatical constructs. The source code package for this library includes the grammar file for the expression language, so you can see how it all works. Grammatica is a great tool if you want to do parsing that requires more power than Regular Expressions can provide. Don't be fooled by its alpha status, it is solid as a rock, and has never given me problems.

Once I wrote the grammar for the expression language, I had Grammatica generate a parser for me. The parser class has callback methods that I can hook into to let me know when the parser has hit a certain language element. From there on, it's up to me to decide what to do with that element. As it turns out, the best way to represent the language elements is via a tree structure. An expression like 1 + 2 will be represented as an ADD element with two children representing the operands. Every element validates its children as they are set to make sure that they are valid for its particular operation, and throws an ExpressionCompileException if they are not. As you keep parsing, the tree gets built up until you reach the end of the expression and you are left with one root node representing the whole expression. At this point, we have a valid expression and its compiled representation; we are finished with parsing and have to move on to code generation.

The Compiler

We now have a tree consisting of various operators and operands that we have to translate to IL. Every element in our tree derives from the ExpressionElement class, which requires that the element be able to emit IL. From here, the main challenge is to emit IL that is as compact as possible and follows the type rules of the CLR. This means that you always have to watch for whether you are working with a value type or reference type. The IL emitted for the expression mystring.func() is different than the IL for myint.func(). To generate the IL for our method, we let the root element emit its IL and the IL of all its children. We then call the CreateDelegate method on our DynamicMethod to "burn" the code into memory and get a delegate that points to it. From here, the method's code is set and cannot be changed.

Testing

One thing that is essential for a project like this is unit testing. As you tweak the grammar for your expression, you have to make sure that it doesn't invalidate previously valid expressions and that the result of evaluating an expression is correct. Fortunately, this type of project lends itself perfectly to unit testing. All you have to do is create a test consisting of a text file with an expression on each line and a loop that feeds the expression into the evaluator. If you don't get an exception and the result is correct, the test passes. As you build up your language and add new elements, you keep adding to your text file, so that at any point, you can be sure that you haven't broken any previous expressions.

Conclusion

The idea for this project came from the lessons I learned from my previous project. It turns out that people mainly want to be able to define variables and evaluate them in expressions. The previous project lets them do that, but since it is Excel compatible, it has to evaluate expressions in a particular way and support a more dynamic type system. For this project, I wanted to remove those limitations and try to create a fast and efficient expression evaluator. I also wanted to try my hand at building a compiler since it's something every programmer should do. I look forward to your feedback, and plan on actively maintaining and improving this library based on it. Thanks!

History

  • Jul 26, 2007: Initial article publication.
  • Jul 30, 2007: Revisions to make article better.
  • Aug 19, 2007: Updates for dynamic variables and CalculationEngine.
  • Aug 26, 2007: Added paragraph about ExpressionOptions.
  • Oct 11, 2007: Added operator precedence table.

License

This article, along with any associated source code and files, is licensed under The GNU Lesser General Public License (LGPLv3)