Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server

DynamicParser: How to parse Delegates and Dynamic Lambda Expressions and convert them into Expression Trees

4.78/5 (29 votes)
29 Aug 2012CPOL14 min read 2   872  
Describes how to use C# dynamics to convert a delegate into an expression tree

Introduction

Unless you were living in a cave, concepts like "delegates", "lambda expressions", and "expression trees" should not be new to you any longer. Among many other interesting things they permit you to represent a bunch of code as a tree of logical expressions that, for instance, you can later compile and execute. As an example, in the following code the first line creates a logical expression tree for the code written in the lambda expression at the right, and the second line compiles and executes it, printing the line "Value = 7" in the console as expected:

C#
Expression<Action<int>> func = x => Console.WriteLine( "Value= {0}", x );
func.Compile().Invoke( 7 );

Ok, but... why do you need a logical expression tree instead of its actual compiled code? Because its real power comes when you realize that those expression trees can be very useful by themselves: they provide you with a representation of your code’s logic that can be used for many other purposes.

This is great, really. And it opens the door to many cool things. For instance, this is precisely one of the major things a Linq provider does: translating those expression trees into something the database (or alike) can understand.

But... the way this functionality is implemented today has a major restriction: dynamic objects are not supported in expression trees. For instance, the following code will not compile:

Expression<Action<dynamic>> func = x => Console. WriteLine( "Value = {0}", x );

The compiler is not happy with the above and will raise the error: "An expression tree may not contain a dynamic operation". So we are constrained to use the properties and methods of the actual non-dynamic types we have used when written the expression.

And this is bad if we want to express a logical expression in such an agnostic way that it is not bound to any specific type. For instance, this is precisely the need I had when writing the "Kerosene" library, a dynamic ORM that permit you to write any arbitrary SQL code in plain C#, without specifying in advance any knowledge of the schema or types used in the database. So essentially I needed a solution that permits me to write and compile something like this:

Action<dynamic> func = x => x.Where( x.FirstName >= "John" );

This example does compile because it is not an expression tree, but rather it is a perfectly valid dynamic delegate. Note that I was not assuming that in any other part of the code there is class with a property named "FirstName". Just the opposite. What I wanted was the ability of writing such expression regardless what properties and methods I was using, and parse them afterwards in a dynamic and late-bound fashion.

Another interesting thing to note here is that we have written, conceptually, a comparison among strings using standard C# operators, something the compiler is not supposed to allow. This is another benefit of using this late-bound mechanism: these kind of checks are done at run-time instead of at compile-time (and you will see that in our case they are not even enforced, as we are going to deal only with "conceptual" variables, instead of with real ones).

Ok, cool, first objective achieved. But now, how can we obtain the expression tree out from such a "dynamic delegate" (defined as a delegate with at least one dynamic argument)?

The DynamicParser solution

As I have not found anywhere any solution to those needs, I developed the DynamicParser class.

Its static method Parse() takes a dynamic delegate as its argument, parses it, and returns back an object that happens to be an actual instance of this DynamicParser class. The mission in live of this created instance is to to store the results of the parsing. This result can be found in its Result property.

Typically this property will store the representation of your logical expression tree in the form of an instance of a class derived from DynamicNode, a specialized and agnostic class designed precissely for this purposes. I said "typically" because there are some specialized scenarios where the object stored can be of other types, but we will see them later.

Another interesting property is the Arguments one. It permits the enumeration of the dynamic arguments that were found while parsing the expression, that are stored as instances of the DynamicNode.Argument class. DynamicParser will create this enumeration and its contents for you. Note that there is no hard limit to the number of dynamic arguments you can use with DynamicParser – but if you use the Func<> or Action<> forms to create your lambda expression there is a limit of 15 dynamic arguments (this is a C# limit).

To understand how its usage, let’s see a first example:

Func< dynamic, object > fn = x => x.Id >= "Foo";
var parser = DynamicParser.Parse( fn );
Console.WriteLine( "Expression: {0}", parser.Result );

It prints the logical expression that you can expect:

(x.Id GreaterThanOrEqual Foo)

This is good: we have obtained a logical representation out from a dynamic delegate by parsing it, in an agnostic way, and not bound to any specific type. This logical representation is the analogous of a standard C# Expression tree. Let me remember again that in order to register the logic in such agnostic way, and to permit later an easier manipulation, this representation is made out of instances of classes that derive from DynamicNode, as mentioned above.

Do also remember that the result is stored in the Result property. The reason is because, potentially, DynamicParser can return an arbitrary object from its execution. Also, by doing it this way, we will have an object that has other properties with valuable information.

Now, before explaining how to manipulate those trees, or how DynamicParser works, let’s see a second example of a dynamic expression we can parse:

int num = 0;

Func<dynamic, object> fn =
   x => !( x.Aplha[num++,"Hello"].Beta["ZZ"+num] >= x.Beta || x( num ) == null );

for( int i = 0; i < 5; i++ ) {
   var parser = DynamicParser.Parse( fn );
   Console.WriteLine( "Parsed: {0}", parser.Result );
}

This code will produce, for instance in its fourth iteration, the following line:

(Not ((x.Aplha[ 3, Hello ].Beta[ ZZ4 ] GreaterThanOrEqual x.Beta) Or (x( 4 ) Equal null)))

The interesting thing to note is how we have used an external variable in the lambda expression, á la closure way. We could have also used an external method or property of another object. In these cases, when those properties or methods are not attached to a dynamic argument, are going to be considered as "external" ones, and their values captured in order to be stored in the results' tree.

Note that this values are captured in the very moment the expression is parsed (or invoked if you wish). This is why the value used from the variable "num" changes in each iteration.

Another thing that is also interesting is how have we invoked "directly" the dynamic argument, as in "x( num )". This a perfectly valid and supported expression in DynamicParser, recorded in a specific node, and that can be used for any purpose you may later want.

Let see now another interesting example:

Func<dynamic, object> func = x => new { x.FirstName, x.FamilyName, Salary = x.Base + x.Variable };

var parser = DynamicParser.Parse( func );
Console.WriteLine( "Type: {0}", parser.Result.GetType().Name );
Console.WriteLine( "Parsed: {0}", parser.Result );

It will produce:

Type: <>f__AnonymousType0`3
Parsed: ( x ) => { FirstName = x.FirstName, FamilyName = x.FamilyName, Salary = (x.Base Add x.Variable) }

This is very interesting because, as you can see, the result of the parsing is not necessarily an expression tree. In this case, the result is an anonymous type with three properties, each of them being a DynamicNode instance: either a "Get Member" expression for the first and second ones, or a "Bynary Expression" for the last addition operation. Also note that the first two ones are named automatically (and dynamically), whereas the third one has a specific name we have given to it.

Limitations

  • The DynamicParser solution can only parse delegates that have at least one dynamic argument. Indeed, if you are not using dynamic arguments, you might be better served by using regular C# expression trees, for instance.
  • The ternary operator "?:" is not supported. Because the way the solution works, it can only parse either the "then" part or the "else" part, but not both at the same time.
  • There is only a limited support for conversion operators, as in "(string)x.Name". Actually, strings, nullable types, and those with parameterless constructors are completely supported. All others are converted to straight objects, something that may impose a limitation on the expressions to parse.

How it works

The basic idea is to actually execute the delegate, using its DynamicInvoke() method, and intercepting and taking note of the dynamic invocations and bindings done by the DLR (Dynamic Language Runtime) against the dynamic arguments used.

The tricky part was about how to maintain the ball rolling, so to speak, once a given binding is performed. If the dynamic argument we are about to bind just derives from DynamicObject I have found no way to make it happens. But after a lot of research and experimentation I have devise a way to make it work by using the IDynamicMetaObjectProvider interface.

When a class implements this interface, its GetMetaObject() method is supposed to return instances of a class derived from DynamicMetaObject, that in our solution will be the DynamicMetaNode class. What DynamicParser does is to override the BindXXX() methods in such a very specific way that the execution of the dynamic delegate flows and it is not interrupted. So I am able to intercept and annotate each logical binding, and use this knowledge to build the logical tree and its nodes.

I have implemented this mechanism in the abstract DynamicNode class, which is the parent class of all the node types that will form a logical tree. The dynamic arguments used to invoke the delegate are instances of the DynamicNode.Argument class, so that they can use the mechanism explained above to start the dynamic binding and its annotations.

Now, what is that way? Essentially, it means that each BindXXX() operation should return a new DelegateMetaNode object built in such a way that its Expression and Restrictions properties permit the newly created object to be used in the next binding operation. An actual (simplified) example is as follows:

public override DynamicMetaObject BindGetMember( GetMemberBinder binder )
{
   var obj = (DynamicNode)this.Value;
   var node = new DynamicNode.GetMember( obj, binder.Name ) { _Parser = obj._Parser };
   obj._Parser.LastNode = node;

   var par = Expression.Variable( typeof( DynamicNode ), "ret" );
   var exp = Expression.Block(
      new ParameterExpression[] { par },
      Expression.Assign( par, Expression.Constant( node ) )
   );
   return new DynamicMetaNode( exp, this.Restrictions, node );
}

If you take a look at the code in the download, you'll see that the trickiest and hardest ones to implement were the BindConvert() and BindUnary() methods.

The first one because I had to return a DynamicMetaNode that needs to be compatible with the type expected by the conversion operation, and to that end, I have to create an appropriate "real" object of the correct type. But strings, for instance, don't have a default parameterless constructor. I ended up treating strings as a special case and trying for any other types some default possibilities. This is why the support for conversion operators is limited.

The second one was even harder, because the binding mechanism injects the IsFalse and IsTrue expression types that initially I was not aware of. So the overall schema kept failing till I realized that those unary expressions where injected into the flow by the binding mechanism without my knowledge or control, and I learnt how to deal with them.

Manipulation of a Logical Tree

So now that we have obtained a logical representation, how can we manipulate it? The best way to do it is to use an implementation of the Visitors pattern using the fact that any node returned is an instance of a class derived from DynamicNode. I'm not going to go deeper into this pattern as there is a lot of information in the Internet about it.

Each of those nodes have a number of properties that store the information you will need to understand how they were created, what other objects they refer to, and so on. Note that, on top of the properties each derived class may have, there is also one property provided by the base abstract DynamicNode class, which is named Host. When this property is not null, it means the instance is part of a broader logical expression, maintained by the node this property returns.

And what are those specialized classes?

  • DynamicNode.Argument: it represents the dynamic arguments used in the delegate or lambda expression. Its Tag property maintains its actual name as used in the expression. For instance, when parsing the dynamic lambda expression "x => x.Id > 7", this property maintains the string "x". You can obtain the list of the dynamic arguments used from the Arguments property of the instance of DynamicParser returned by its static Parse() method, as explained above.
  • DynamicNode.GetMember: it represents a member access operation, as in "x => x.BadgeId". In this case its Name property maintains the string "BadgeId", and its Host property refers to a DynamicNode.Argument instance as explained above. Let's see another example: "x => x.Employee.BadgeID". In this case, the Name property has also the string "BadgeID", but its Host property refers to another instance of the GetMember class, the one maintaining the "x.Employee" access operation.
  • DynamicNode.SetMember: it represents a set member operation, as in "x => x.BadgeID = "007"". It is basically the same class as the GetMember one, but it has an additional property named Value, of type object, that stores the value that has been (conceptually) set to this member.
  • DynamicNode.GetIndex: it represents the indexed get operation, as in "x => x.Id[ 27 ]". In this case, its Host is the GetMeber instance of the "x.Id" operation, and instead of having a Name property it has an Indexes property of type array of objects, where the actual values of the indexes are stored. Note that these indexes can be regular objects, but can perfectly be also dynamic expressions as well, as in "x => x[ x.Alpha ]".
  • DynamicNode.SetIndex: it represents the indexed set operation, as in "x => x.Id[ 27 ] = DateTime.Now". And yes, on top of GetIndex’s ones it has its own Value property to store the value to set to this indexed property.
  • DynamicNode.Method: it represents the call of a method, as in "x => x.Property.Method( args )". In this case its Host property maintains the "x.Property" access operation, and it has two additional properties: Name, which maintains the name of the method invoked, in this case the string "Method", and the Arguments property, which maintains an array with the arguments used. This array can be empty if no arguments were used.
  • DynamicNode.Invoke: it represents a direct invocation of the dynamic argument, as in "x => x( args )". Its Host property represents the dynamic argument, and in its Arguments property there is the array (potentially empty) of the arguments used.
  • DynamicNode.Binary: it represents a binary operation. For instance a logical comparison one (as in "x => x.Alpha >= x.Beta"), or a numeric-like operator (as in "x => x.Alpha + x.Beta"), or many other possibilities. Its Host property is null as it is not part of a broader expression. Its Left property maintains its left operand, and it is always a DynamicNode instance (actually the one that initiated the binding of the bynary operation). Its Right property maintains the right operand, and is of type object as it can be any value or object. Its Operation property is of type ExpressionType, and maintains what binary operation has been bound.
  • DynamicNode.Unary: it represents an unary operation, as in "x => !x.BadgeID". Its Host property is null as it is not part of a broader expression. It has also the same Operation property as above, and a Target property that maintains the DynamicNode instance to what this unary operation is bound to.
  • Finally, the DynamicNode.Convert class: it represents a conversion operation as in "x => (string)x.BadgeID". Its Target property maintains the DynamicNode instance to what this conversion operation is bound to, and a ConvertTo property, maintaining the type to convert to.

An interesting point to mention is that you can instantiate the above classes directly, so you can create your own logical tree from scratch if you need so, or to include whatever result you may have obtained into a more complex one, for example.

Points of Interest

One first thing to mention is that the CLR/DLR behaves, by default, in such a way that when a dynamic delegate is executed once, its compiled code is cached and it is not parsed (bound) again. Obviously this is for performance reasons but for our needs this default mode of operation is very unfortunate. Yes, suppose that in your expression you are using an external variable, or invoking an external method. Then its value would be cached and not fetched again.

The good news are that DynamicParser generates its internal DynamicMetaNodes instances built in such a way that the compiled code of the delegates is not cached. So each time a parsing is invoked the external variables and methods are invoked as well.

A second thing to mention is that, when parsing unary operations, the CLR/DLR can inject automatically IsTrue or IsFalse logical operations. You may not be aware of this, but they exist and they shall resolve to a Boolean value. DynamicParser always resolve these scenarios to false in order to keep the ball rolling and parse the rest of the expression, if any.

History

  • [v4, August 2012]: this version is about improvements, cleaner architecture, and performance.
  • [v3, October 2010]: avoids the caching of the parsed delegate the DLR was performing before.
  • [v2, May 2010]: cleaner solutions and some bugs resolved.
  • [v1, April 2010]: initial version.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)