Introduction
Converting user input to strongly typed values is part of almost any application. Even servers, which do not have any user interface, often have to parse text/XML configuration files and interpret text values written by humans into something more usable.
For example, imagine a simplistic application that downloads files from the Internet and saves them to a local disk. File attributes of the saved files are read from configuration files or the command line, and then converted to a FileAttributes enumeration value. For "read only + normal" attributes an administrator may have typed "0x081", or "129", or "ReadOnly|Normal", or "normal Readonly". All of these are unambiguous and make sense, but parsing them is surprisingly difficult and verbose. Something simpler is needed, ideally as simple as:
FileInfo f=new FileInfo("x.txt");
f.Attributes=Utils.To<FileAttributes>("ReadOnly | 0x80");
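To show why the hand-written version gets verbose, here is a minimal sketch of such a parser using only the base class library. FlexibleEnum and ParseAttributes are hypothetical names made up for this illustration; this is not how Utils.To is implemented, just one way the same inputs could be handled.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical helper, for illustration only (not part of the Eval library).
public static class FlexibleEnum
{
    // Accepts inputs such as "0x081", "129", "ReadOnly|Normal" or "normal Readonly".
    public static FileAttributes ParseAttributes(string text)
    {
        int result = 0;
        char[] separators = { '|', ',', ' ', '\t' };
        foreach (string token in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            int number;
            if (token.StartsWith("0x", StringComparison.OrdinalIgnoreCase) &&
                int.TryParse(token.Substring(2), NumberStyles.AllowHexSpecifier,
                             CultureInfo.InvariantCulture, out number))
            {
                result |= number;            // hexadecimal literal
            }
            else if (int.TryParse(token, NumberStyles.Integer,
                                  CultureInfo.InvariantCulture, out number))
            {
                result |= number;            // decimal literal
            }
            else
            {
                // Case-insensitive name lookup; throws ArgumentException for unknown names.
                result |= (int)(FileAttributes)Enum.Parse(typeof(FileAttributes), token, true);
            }
        }
        return (FileAttributes)result;
    }
}
```

With this helper, FlexibleEnum.ParseAttributes("ReadOnly | 0x80") returns FileAttributes.ReadOnly | FileAttributes.Normal. Even this small version handles only one enumeration type, which is the verbosity a generic conversion helper is meant to hide.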
Another related and common task is parsing and evaluating expressions. These come in handy in many scenarios, from COM interop to passing a sorting expression from client to server, from interpreting complex configuration files to hiding System.Reflection verbosity.
There are many partial solutions to expression evaluation built into the .NET Framework: from using CSharpCodeProvider to LINQ, from calculated columns in an ADO.NET DataTable to using reflection and Reflection.Emit. Of course, it's also possible to add IronPython, etc. While they solve the problem, these solutions add hundreds of milliseconds to execution time (like calling the C# compiler), or are not easy to extend/modify, or are complex, or are not open source, or eat lots of memory, or use syntax not intuitive to C# programmers, or add unwanted dependencies. Things get a bit easier with .NET 4.0 and dynamics, but the XSharper framework/scripting language needed a solution that works with only .NET 2.0.
Writing code for evaluating expressions and even programming languages is certainly a lot of fun, thousands of books and examples exist, and there are many good CodeProject articles devoted to the topic. Better yet, there are mature projects like ANTLR or Coco/R that can be used to generate lexers and parsers for any language.
Yet I wanted something simpler, smaller and concentrated more on the evaluation part: a complete expression evaluation engine capable of parsing and evaluating "normal-looking" C# expressions (yet with more relaxed syntax) that works with normal .NET libraries. Something much simpler than a full C# interpreter, and definitely not a complete programming language. Something that is compact and easy to embed into any project, as an assembly or directly as a bunch of source code files, and that has no dependencies at all except the C# compiler.
So the idea of doing the parser "properly", by compiling a grammar into thousands of lines of generated C# code, was abandoned. And it's more fun to write from scratch anyway :). The core is based on the shunting-yard algorithm, with some C#-specific additions such as typecasts, array initializers, short-circuiting, etc. A rather messy part was dealing with evaluation, which was particularly hard because I wanted to separate parsing and type binding completely. This adds flexibility, but the parser cannot determine whether x.y.z is a static property z of type x.y, or a property z of field y of object x, or some other combination of the above, so this decision is postponed until runtime.
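For readers unfamiliar with the shunting-yard algorithm, the following is a minimal sketch of its two-stack form, limited to decimal numbers, the binary operators + - * / and parentheses. ShuntingYardSketch is a hypothetical name; unlike the library's parser, this sketch evaluates immediately instead of building an expression tree, and it handles none of the C#-specific additions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Illustrative two-stack shunting-yard evaluator (not the library's parser).
public static class ShuntingYardSketch
{
    static int Precedence(char op)
    {
        return (op == '*' || op == '/') ? 2 : 1;
    }

    // Applies one binary operator to the top two values on the stack.
    static void Apply(Stack<double> values, char op)
    {
        double right = values.Pop();
        double left = values.Pop();
        switch (op)
        {
            case '+': values.Push(left + right); break;
            case '-': values.Push(left - right); break;
            case '*': values.Push(left * right); break;
            case '/': values.Push(left / right); break;
            default: throw new FormatException("Unknown operator " + op);
        }
    }

    // Malformed input (e.g. a missing operand) surfaces as an exception from Pop().
    public static double Evaluate(string text)
    {
        var values = new Stack<double>();
        var operators = new Stack<char>();
        int i = 0;
        while (i < text.Length)
        {
            char c = text[i];
            if (char.IsWhiteSpace(c)) { i++; continue; }
            if (char.IsDigit(c) || c == '.')
            {
                int start = i;
                while (i < text.Length && (char.IsDigit(text[i]) || text[i] == '.')) i++;
                values.Push(double.Parse(text.Substring(start, i - start),
                                         CultureInfo.InvariantCulture));
                continue;
            }
            if (c == '(')
            {
                operators.Push(c);
            }
            else if (c == ')')
            {
                while (operators.Count > 0 && operators.Peek() != '(')
                    Apply(values, operators.Pop());
                if (operators.Count == 0) throw new FormatException("Unbalanced ')'");
                operators.Pop(); // discard the matching '('
            }
            else if ("+-*/".IndexOf(c) >= 0)
            {
                // Left-associative: first apply operators of equal or higher precedence.
                while (operators.Count > 0 && operators.Peek() != '(' &&
                       Precedence(operators.Peek()) >= Precedence(c))
                    Apply(values, operators.Pop());
                operators.Push(c);
            }
            else
            {
                throw new FormatException("Unexpected character " + c);
            }
            i++;
        }
        while (operators.Count > 0)
        {
            char op = operators.Pop();
            if (op == '(') throw new FormatException("Unbalanced '('");
            Apply(values, op);
        }
        return values.Pop();
    }
}
```

For example, ShuntingYardSketch.Evaluate("1 + 2 * 3") returns 7 and Evaluate("(1 + 2) * 3") returns 9. A tree-building parser would push nodes onto the value stack instead of numbers, which is what makes it possible to defer type binding until evaluation.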
In the end, there is an expression evaluation engine that gets relaxed type conversion and Eval done and is simple to use:
var context=new BasicEvaluationContext();
context.Objects["a"]=20;
context.Objects["b"]="Hello";
Console.WriteLine(context.Eval("a+b.Length"));
It can be used for COM interop too, hiding reflection verbosity and without requiring .NET 4.0 or a typelib import. For example, the following prints information about the current SSL certificate of the local Internet Information Server:
var context=new BasicEvaluationContext();
context.SetVariable("iis", Utils.CreateComObject("IIS.CertObj"));
var cert=context.Eval<string>(@"
$iis.set_ServerName('localhost'); // set ServerName property to local IIS server
$iis.set_InstanceName('w3svc/1'); // set InstanceName property to the first website
Encoding.Unicode.GetString($iis.GetCertInfo());
");
Console.WriteLine("Current certificate is:\n{0}",cert);
Expression Syntax
Writing a complete standard-compliant C# interpreter or compiler was not the plan (why bother if a real compiler is a csc.exe call away?), so some features were cut and simplifications made to keep complexity under control.
Unsupported Features
- Assignment, lvalues, postfix increment, etc. Writing a=b+c is not possible. Neither is b++ nor a[n=b++]=-8. Properties can still be set using x.set_PropertyName(value) syntax.
- Overloaded operators. Sorry, DateTime and TimeSpan cannot be added together as date+span. date.Add(span) syntax has to be used instead.
- C# 3 initializers, like new X{ a=3 }
- Anonymous classes, LINQ, delegates, events, etc.
- Escaped characters. "Hello\n\r" is a string consisting of 9 characters.
- Generics (it is possible to call methods of generic objects, but not to create them).
- Multidimensional arrays, like new int[4,2]. Jagged arrays, like a[b][c], are fine.
Supported Features
- All the usual +,-,*,/, ||, &&, as, is, typecasts, etc.
- Conditional operator, X?Y():Z(), which properly short-circuits.
- Null-coalescing operator a??b.
- Logical operators || and &&, which short-circuit as well.
- Comments (// and /* */)
- Arrays, with initializers. For example, int[] { 1,2,3}
- new and throw operators, like new FileInfo("C:\File.exe")
- Namespaces
There are also additional features that will be explained in more detail below.
Using the Code
The most important classes in the Eval library are:
- ParsingReader, derived from TextReader, with some useful methods for parsing text input: reading quoted strings and numbers from the stream (in all the variety of 0x3233l, 0.21e31 and 211.2m syntaxes), skipping white space, etc. It also maintains history, so if a ParsingException is thrown, it can show exactly where the problem occurred.
- Utils class contains a bunch of static utility methods, with To<T>(object) being particularly useful for easy conversions between types.
- Parser class converts input text to an expression tree.
- IOperation interface represents a single node in the expression tree.
- IEvaluationContext interface is a context used for expression tree evaluation.
- BasicEvaluationContext is a simple implementation of the IEvaluationContext interface.
Generally the code is supposed to be used as below, with parser created, expression parsed, and then evaluated on stack:
TextReader input = ...;
using (ParsingReader reader=new ParsingReader(input))
{
    // Parse the text into an expression tree
    Parser parser=new Parser();
    IOperation expressionTree=parser.Parse(reader);

    // Evaluate the tree; the result is left on top of the stack
    IEvaluationContext context=new SomeClassImplementingIEvaluationContext();
    Stack<object> stack=new Stack<object>();
    expressionTree.Eval(context,stack);
    object resultValue=stack.Pop();
    Console.WriteLine("Result: "+resultValue);
}
If just something simple is needed, BasicEvaluationContext can make it as brief as:
new BasicEvaluationContext().Eval("Console.WriteLine('Hello, world')");
IEvaluationContext
IEvaluationContext provides values to objects referenced in expressions and gives access to the type system and external methods. This interface has six members (five methods and one property):
public interface IEvaluationContext
{
    bool TryGetValue(string name, out object value);
    Type FindType(string name);
    object CallExternal(string name, object[] parameters);
    bool TryGetExternal(string name, out object value);
    IEnumerable<TypeObjectPair> GetNonameObjects();
    bool AccessPrivate { get; }
}
A simple and ready-to-use implementation of IEvaluationContext is included as BasicEvaluationContext; see its source code for more details.
First of all, the interpretation engine has two different concepts, external objects and variables:
- External objects are accessed through normal C#-like identifiers that cannot contain spaces or special characters. For example, evaluation of a+b will call TryGetExternal twice, with "a" and "b". BasicEvaluationContext adds null, true and false to its list of objects by default.
- Variables are similar to objects but are referenced in expressions using Perl/PHP-like syntax, with a $ prefix and optional {} (useful for spaces and non-ASCII characters in the name). In the case of $price * ${number of items} the engine will call TryGetValue with "price" and "number of items". A variable name can also be an empty string: for example, evaluating $.ToString().Length+${}.ToString().Length would call TryGetValue("") twice.
Resolution of type names to .NET types is up to the implementation of the FindType method. This can be used as a basic security feature: the list of types available to an expression may be restricted without using Code Access Security.
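As a rough illustration of the restriction idea, the sketch below resolves only explicitly allowed types. WhitelistTypeResolver and Allow are hypothetical names, the class does not implement the real interface, and returning null for an unknown name is an assumption about what FindType is expected to do on failure.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-alone resolver; a real one would back IEvaluationContext.FindType.
public class WhitelistTypeResolver
{
    private readonly Dictionary<string, Type> _allowed =
        new Dictionary<string, Type>(StringComparer.OrdinalIgnoreCase);

    // Registers a type under both its short and its namespace-qualified name.
    public void Allow(Type type)
    {
        _allowed[type.Name] = type;
        _allowed[type.FullName] = type;
    }

    // Assumption: unknown or disallowed names resolve to null.
    public Type FindType(string name)
    {
        Type type;
        return _allowed.TryGetValue(name, out type) ? type : null;
    }
}
```

After calling Allow(typeof(Math)), both "Math" and "System.Math" resolve, while a name such as "System.IO.File" does not, so an expression cannot reach the file system through a type name.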
Obviously, an expression can call methods and access properties of any object using the usual obj.method() syntax. When the object is not specified, as in cos(x)/sin(x), the CallExternal interface method is responsible for finding and calling an appropriate method, or throwing an exception if that is not possible:
class EvalContext : BasicEvaluationContext
{
    // Handles sin(x) and cos(x); everything else is passed to the base class
    public override object CallExternal(string name, object[] parameters)
    {
        switch (name.ToUpperInvariant())
        {
            case "SIN":
                if (parameters.Length==1)
                    return Math.Sin(Utils.To<double>(parameters[0]));
                break;
            case "COS":
                if (parameters.Length==1)
                    return Math.Cos(Utils.To<double>(parameters[0]));
                break;
        }
        return base.CallExternal(name,parameters);
    }
}
void Main()
{
    var context=new EvalContext();
    context.Objects["pi"]=Math.PI;
    Console.WriteLine(context.Eval("cos(pi)/sin(pi/4)"));
}
An expression may also automatically call methods of certain objects or classes via a special dot-syntax. For example, suppose trigonometry functions are defined in a class MyTrig. If the implementation of GetNonameObjects returns an instance of MyTrig, the expression may be written as ".Cos(x)/.Sin(x)", and it will call the appropriate methods of that MyTrig object. This syntax difference may be used to avoid name collisions between built-in and user-defined functions.
class MyTrig
{
    private bool _useDegrees=false;
    public MyTrig(bool useDegrees) { _useDegrees=useDegrees; }
    // Arguments are converted from degrees to radians when _useDegrees is set
    public double Sin(double x) { return Math.Sin(_useDegrees?(x/180)*Pi:x); }
    public double Cos(double x) { return Math.Cos(_useDegrees?(x/180)*Pi:x); }
    public double Pi { get { return Math.PI; }}
}
void Main()
{
    var context=new BasicEvaluationContext();
    context.AddNonameObject(new MyTrig(true));
    Console.WriteLine("Cos of 45 degrees is "+context.Eval(".cos(45)"));
}
Finally, if AccessPrivate returns true, expressions will be allowed to access non-public methods and properties of the objects.
A Few Additional Notes
Case Sensitivity
Variable and object names may be case sensitive or not, depending on the IEvaluationContext implementation. Method and property names, unlike in C#, are case insensitive.
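Case-insensitive member lookup, and the optional access to non-public members, map naturally onto reflection binding flags. The following is a minimal sketch assuming plain System.Reflection; LooseInvoker and its accessPrivate parameter are hypothetical names, and the library's actual binding logic (with its relaxed argument conversions) is certainly more involved.

```csharp
using System;
using System.Reflection;

// Illustrative late-bound call helper (not the library's implementation).
public static class LooseInvoker
{
    // Finds an instance method by name, ignoring case, matching exact argument types.
    // Null arguments are not supported in this sketch because GetType() is called on each.
    public static object Invoke(object target, string name, bool accessPrivate,
                                params object[] args)
    {
        BindingFlags flags = BindingFlags.Public | BindingFlags.Instance |
                             BindingFlags.IgnoreCase;
        if (accessPrivate)
            flags |= BindingFlags.NonPublic;

        Type[] argTypes = Array.ConvertAll(args, a => a.GetType());
        MethodInfo method = target.GetType().GetMethod(name, flags, null, argTypes, null);
        if (method == null)
            throw new MissingMethodException(target.GetType().FullName, name);
        return method.Invoke(target, args);
    }
}
```

For instance, LooseInvoker.Invoke("Hello", "toupper", false) returns "HELLO", and LooseInvoker.Invoke("Hello", "substring", false, 1) returns "ello", even though C# itself would reject both spellings.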
Type Conversions
Type casts are much more relaxed than in C#. For example, (string)(bool)0x21 is valid (evaluates to string true), and so is (FileAttributes)'0x32|normal'. Even (char)"a" works, and returns the first character of the string.
Comma and Semicolon Operators
Comma and semicolon have the same meaning as the comma operator in C, returning the last value in the list. So a();b();c() calls 3 functions and returns the result of c().
Multi Expressions
In addition to single expressions, there is also a concept of multi-expression, with syntax like ${a|b|c|=5}. It will return value of variable a if it is defined. If it's not, it will try to get value of b. If that fails too, value of c, and finally an integer 5. This is a convenient way of providing default values to variables.
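The fallback order described above can be sketched in a few lines. This is only an illustration of the semantics, under the assumption that variables live in a dictionary; MultiExpressionSketch is a hypothetical name, the input is the text between ${ and }, and for simplicity the default is returned as text rather than being parsed into an integer as the engine does.

```csharp
using System;
using System.Collections.Generic;

// Illustrates the a|b|c|=default lookup order only (not the library's parser).
public static class MultiExpressionSketch
{
    public static object Resolve(string spec, IDictionary<string, object> variables)
    {
        foreach (string part in spec.Split('|'))
        {
            string name = part.Trim();
            if (name.Length > 0 && name[0] == '=')
                return name.Substring(1);      // default value, left as text here
            object value;
            if (variables.TryGetValue(name, out value))
                return value;                  // first defined variable wins
        }
        throw new KeyNotFoundException("No variable in '" + spec + "' is defined");
    }
}
```

With only b defined as 42, Resolve("a|b|c|=5", variables) returns 42; with no variables defined, it returns the default text "5".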
Character Types and Strings
Strings may be quoted using " (double quote), ' (single quote) and ` (backquote) interchangeably, and the resulting type is always string and not char. If the char type is needed, an explicit conversion must be made, like (char)`a` or new string((char)8,21).
Also, there are no escape sequences in strings. To create a string with the \x08 character, for example, concatenation should be used instead: "AAA"+(char)0x8+(char)0x8+(char)0x8+'BBB'.
Arrays
An array can be created without new, just by using a {} block. For example, {1,2,3} evaluates to an array of 3 integers, {1,2.4,3} to an array of 3 doubles, and {1,'2.4',3} to an array of 3 objects.
Alternative Syntaxes for Operators
<, > , & characters are inconvenient and difficult to read when expression is embedded into XML. The following may be used instead:
Alternative   Operator
#OR#          ||
#AND#         &&
#BOR#         |
#BAND#        &
#BXOR#        ^
#EQ#          ==
#NEQ#         !=
#LT#          <
#GT#          >
#LE#          <=
#GE#          >=
#NOT#         !
#NEG#         ~
For example, a && b and a #AND# b are equivalent.
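The mapping can be pictured as a simple substitution table. The sketch below rewrites the alternative spellings into the regular operators as a preprocessing step; OperatorAliases is a hypothetical name, and this is an assumption for illustration only, since the engine presumably recognizes these tokens directly while parsing. Note that this naive version would also rewrite aliases that appear inside string literals.

```csharp
using System;
using System.Collections.Generic;

// Naive text substitution of alternative operator spellings (illustration only).
public static class OperatorAliases
{
    private static readonly KeyValuePair<string, string>[] Map =
    {
        new KeyValuePair<string, string>("#OR#", "||"),
        new KeyValuePair<string, string>("#AND#", "&&"),
        new KeyValuePair<string, string>("#BOR#", "|"),
        new KeyValuePair<string, string>("#BAND#", "&"),
        new KeyValuePair<string, string>("#BXOR#", "^"),
        new KeyValuePair<string, string>("#EQ#", "=="),
        new KeyValuePair<string, string>("#NEQ#", "!="),
        new KeyValuePair<string, string>("#LT#", "<"),
        new KeyValuePair<string, string>("#GT#", ">"),
        new KeyValuePair<string, string>("#LE#", "<="),
        new KeyValuePair<string, string>("#GE#", ">="),
        new KeyValuePair<string, string>("#NOT#", "!"),
        new KeyValuePair<string, string>("#NEG#", "~"),
    };

    public static string Normalize(string expression)
    {
        foreach (KeyValuePair<string, string> pair in Map)
            expression = expression.Replace(pair.Key, pair.Value);
        return expression;
    }
}
```

For example, OperatorAliases.Normalize("a #AND# b #LE# c") returns "a && b <= c", which is the form that would otherwise need escaping as &amp;&amp; and &lt;= inside an XML attribute.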
Dates
Dates can be created by wrapping them into #. For example, #2009-1-2 12:08GMT#.
Assignment
While assignment using '=' to properties and variables is not implemented, it still can be done indirectly. For example:
var context=new BasicEvaluationContext();
context.SetObject("this",context);
context.Eval(@"
this.SetVariable('x',20);
this.SetVariable('y',150.2m);
Console.WriteLine('x+y={0}, and type of result is {1}',$x+$y,($x+$y).GetType());
");
Debug Dumps
There is a unary operator ##, which converts the object to human-readable output (a shortcut to XSharper.Core.Dump.ToDump(value)). For example:
Console.WriteLine(new BasicEvaluationContext().Eval<string>
("##Environment.GetLogicalDrives()"));
Please see a dedicated article for more details.
Performance was not a priority so far. However, it was still interesting to get at least a ballpark figure of how slow it is compared to advanced interpreters, such as modern Internet browsers and PHP. For testing purposes, a rather complex expression is parsed and evaluated 200,000 times (the corresponding test scripts are in the EvalBenchmarks directory in the sample file):
var context=new BasicEvaluationContext();
context.Variables["v_t"]="T";

// Measure parsing: build the expression tree from text on every iteration
IOperation tree=null;
var timer=System.Diagnostics.Stopwatch.StartNew();
int loops=200000;
for (int i=0;i<loops;++i)
{
    tree=new Parser().Parse(new ParsingReader(@"
        ( ( $v_t == 'B' ) ? 'bus'.Length :
        ( $v_t == 'A' ) ? 'airplane'.Length+10 :
        ( $v_t == 'T' ) ? 'train'.Length+100 :
        ( $v_t == 'C' ) ? 'car'.Length +1000:
        ( $v_t == 'H' ) ? 'horse'.Length +10000:
        'feet'.Length+100000 );"));
}
Console.WriteLine("Parsing took {0}, or {1} parsings per second",
    timer.Elapsed, (long)(loops/timer.Elapsed.TotalSeconds) );

// Measure evaluation: reuse the last parsed tree
timer=System.Diagnostics.Stopwatch.StartNew();
string res=null;
var stack=new Stack<object>();
for (int i=0;i<loops;++i)
{
    tree.Eval(context, stack);
    res=Utils.To<string>(stack.Pop());
}
Console.WriteLine("Evaluation took {0}, or {1} evaluations per second",
    timer.Elapsed, (long)(loops/timer.Elapsed.TotalSeconds) );
Console.WriteLine("Result="+res);
Testing showed that the parser, when compiled to x64, is 2-5 times slower than Chrome/Firefox/IE8 (IE8 is the fastest, 2500ms vs 10000+ms in Chrome4), and execution is 5-10 times slower (Chrome is the fastest, 170ms). Compared to PHP 5.2.3, the engine is about 3 times slower parsing, and execution about 6 times slower.
Interestingly, compiling to x86 instead of AnyCPU/x64 speeds up parsing by a whopping 250%, and parsing becomes even faster than in Chrome4 and pretty much on par with PHP (with the test expression above). Execution speed is also faster by about 20%. Apparently optimizations in the x64 version of the .NET 3.5 JIT are not as good as in the x86 version.
In conclusion, considering that highly optimized engines written in native code were compared to a non-optimized library written in .NET + reflection, the library seems to be doing reasonably well. Its performance (in x86 mode particularly), with tens/hundreds of thousands of expressions parsed/evaluated on a single core of Intel Core CPU per second, should be adequate for many purposes.
To put things into perspective, strongly typed compiled .NET code evaluating the same conditional expression is 300-500 times faster. So when execution speed is critical and expressions are evaluated millions of times, it is much better to avoid interpretation altogether: compile frequently used expressions into a .NET assembly using the CSharpCodeProvider class (even if it launches the C# compiler behind the scenes for half a second), or convert the generated expression tree into a .NET 3.5 expression tree and compile it, and so on.
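As an illustration of the second option, here is a minimal sketch that builds a shortened version of the benchmark's conditional by hand with System.Linq.Expressions (available from .NET 3.5) and compiles it into a delegate. CompiledConditional is a hypothetical name, and the automatic conversion from an IOperation tree is not shown; a real converter would walk the parsed tree and emit the corresponding nodes.

```csharp
using System;
using System.Linq.Expressions;

// Hand-built expression tree for a shortened version of the benchmark expression.
public static class CompiledConditional
{
    // Equivalent to:
    //   v => v == "B" ? "bus".Length
    //      : v == "T" ? "train".Length + 100
    //      : "feet".Length + 100000
    public static Func<string, int> Build()
    {
        ParameterExpression v = Expression.Parameter(typeof(string), "v");
        Expression body = Expression.Condition(
            Expression.Equal(v, Expression.Constant("B")),
            Expression.Constant("bus".Length),
            Expression.Condition(
                Expression.Equal(v, Expression.Constant("T")),
                Expression.Constant("train".Length + 100),
                Expression.Constant("feet".Length + 100000)));

        // Compile() emits IL once; the delegate then runs as ordinary compiled code.
        return Expression.Lambda<Func<string, int>>(body, v).Compile();
    }
}
```

Calling CompiledConditional.Build() once and reusing the returned delegate keeps the compilation cost out of the hot loop; the delegate returns 105 for "T", 3 for "B" and 100004 for any other input.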
History
The latest version of the library (either as the Eval library in the samples, or as the complete XSharper.Core library) can be downloaded from XSharper.com.