Introduction
Many applications need to evaluate a formula at run-time. The .NET Framework, despite its advanced compilation support, does not offer a quick and lightweight eval function. This article introduces a usable eval function with some rarely available features:
- Fast, single-pass parser
- Highly extensible; you can add your own variables and functions without having to change the library itself
- Common priorities are supported: 2+3*5 is evaluated as 2+(3*5)=17
- Boolean operations are supported, e.g. "and", "not", "or"
- Comparison operators are supported, e.g. <=, >=, <>
- Supports numbers, dates, strings, and objects
- Supports calls to object properties, fields and methods
- Runs an expression multiple times without needing reparsing
- Can automatically detect when an expression needs to be re-evaluated
- The expression syntax is completely checked before starting the evaluation
- Fully human-readable code -- not generated using Lex/Yacc tools -- and therefore permits amendments to the core syntax of the evaluator, if needed
This article also attempts to explain how the whole thing works.
Why an interpreter?
People often tell me that there is no place in their application for an evaluator because it is too complicated for their users. I do not agree with this view. An evaluator is a cheap way to hide complexity from the average user while providing powerful features to the advanced user. Let's take an example.
In your application, you let the users choose the title of the window. This is convenient and simple; it's just a textbox where they can type what they want. The difficulty comes when some users want more. Let's say that they want to see their User IDs or the time. Then you have 3 alternatives:
- You add a form in your program and give your user more options on what they can show.
- You don't do it.
- You use an evaluator (for example, mine).
The first option requires lots of work on your side and can potentially confuse the more basic users. The second option won't confuse your users, but might lose you the more advanced ones. The third option is ideal because you can keep your textbox and let power users type what they want:
Title of the window for user : %[USERID], the time is %[NOW]
And you're done. The interface is still using a regular textbox and is not complicated. On the coding side, it is really not much to add. In terms of power, you can add a new variable every day and as long as you document it all, your users stay satisfied.
Why not use the .NET Framework built-in compiler?
Using the .NET Framework compilation capabilities seems to be the most obvious way to make an evaluator. However, in practice this technique has a nasty side effect: it appears to create a new DLL in memory each time you evaluate your function, and it seems nearly impossible to unload that DLL. You can refer to the remarks at the end of the article Evaluating Mathematical Expressions by Compiling C# Code at Runtime for more details.
Using other engines or application domains is an option if you want a full VBScript or C# syntax. If you need to write classes and loops, this is probably the way to go. This evaluator is neither using CodeDOM nor trying to compile VB source. It parses an expression character-by-character and evaluates its value without using any third party DLL.
Using the code
The evaluator can be run with just two lines of code:
In VB
Dim ev As New Eval3.Evaluator
MsgBox(ev.Parse("1+2+3").value)
In C#
Eval3.Evaluator ev = new Eval3.Evaluator(
Eval3.eParserSyntax.c,/*caseSensitive*/ false);
MessageBox.Show(ev.Parse("1+2+3").value.ToString());
Providing variables or functions to the evaluator
By default, the evaluator does not define any function or variable anymore. This way, you can really decide which function you want your evaluator to understand. To extend the evaluator, you need to create a class. Below is a VB Sample; a C# version is available in the Zip file.
Public Class class1
    Public field1 As Double = 2.3

    Public Function method2() As Double
        Return 3.4
    End Function

    ' Declared As Double so that 4.5 is not rounded to an Integer
    Public ReadOnly Property prop3() As Double
        Get
            Return 4.5
        End Get
    End Property
End Class
Note that only public members will be visible.
Usage
Dim ev As New Eval3.Evaluator
ev.AddEnvironmentFunctions(New class1)
MsgBox(ev.Parse("field1*method2*prop3").value.ToString)
You can also use a more dynamic version. I don't really like this method, but it can be useful. Note that the value of an extension can change once parsed, but its type should not.
Public Class class2
    Implements Eval3.iVariableBag

    Public Function GetVariable( _
            ByVal varname As String) As Eval3.iEvalTypedValue _
            Implements Eval3.iVariableBag.GetVariable
        Select Case varname
            Case "dyn1"
                Return New Eval3.EvalVariable("dyn1", 1.1, _
                    "Not used yet", GetType(Double))
        End Select
        ' Unknown names fall through and return Nothing
        Return Nothing
    End Function
End Class
ev.AddEnvironmentFunctions(New class2)
Dim code As opCode = ev.Parse("dyn1*field1")
MsgBox(code.value & " " & code.value)
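The two ways of extending the evaluator (an object whose public members become variables, and a bag that resolves names on demand) can be pictured as a chain of lookups. The following is a minimal, language-neutral sketch in Python, not the library's VB code: the names StaticEnvironment, DynamicBag and Resolver are hypothetical, and the lookup order (first environment that knows the name wins) is an assumption.

```python
class StaticEnvironment:
    """Exposes the public attributes of an ordinary object as variables."""

    def __init__(self, target):
        self._target = target

    def get_variable(self, name):
        # Names starting with "_" are treated as private and stay hidden.
        if name.startswith("_") or not hasattr(self._target, name):
            return None
        return getattr(self._target, name)


class DynamicBag:
    """Resolves names on demand, in the spirit of iVariableBag.GetVariable."""

    def get_variable(self, name):
        if name == "dyn1":
            return 1.1
        return None  # unknown name: let the next environment try


class Resolver:
    """Asks each registered environment in turn (assumed lookup order)."""

    def __init__(self):
        self._environments = []

    def add_environment(self, environment):
        self._environments.append(environment)

    def resolve(self, name):
        for environment in self._environments:
            value = environment.get_variable(name)
            if value is not None:
                return value
        raise NameError("unknown variable: %s" % name)


class Sample:
    def __init__(self):
        self.field1 = 2.3
        self._hidden = 0


resolver = Resolver()
resolver.add_environment(StaticEnvironment(Sample()))
resolver.add_environment(DynamicBag())
product = resolver.resolve("dyn1") * resolver.resolve("field1")
```

The static environment is the convenient default; the bag is only needed when the set of names is not known at compile time.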
Recognized types
The evaluator can work with any object, but it will allow common operators (+ - * / and or) only on usual types. Internally, I use these types:
enum evalType
    number     converts from Integer, Double, Single, Byte, Int16...
    boolean
    string
    date       equivalent to DateTime
    object     anything else
There is a shared function in the evaluator to return all those types as string:
Evaluator.ConvertToString(res)
This function will return every type using a default format.
How does this all work?
If you just want to use the library, please refer to the 'Using the code' section. The following sections are for curious people who want to know how it works. The techniques I used are rather traditional and can, I hope, serve as a good introduction to compiler theory.
The evaluator is made of a classic Tokenizer followed by a classic Parser. I wrote both of them in VB, without using any Lex or Bison tools. The aim was readability over speed. Tokenizing, parsing and execution are all done in one pass. This is elegant and, at the same time, quite efficient because the evaluator never looks ahead or backwards more than one character.
The tokenization
The first thing the evaluator needs to do is split the string you provide into a set of tokens. This operation is called tokenization, and in my library it is done by a class called tokenizer.
The tokenizer reads the characters one by one and changes its state according to the characters it encounters. When it recognizes one of the Token types, it returns it to the parser. If it does not recognize a character, it will raise a syntax error exception. Once the class is created with this command,
tokenizer = new Tokenizer("1+2*3+V1")
...the evaluator will just access tokenizer.type to read the type of the first token of the string. The type returned is one of those listed in the chart below. Note that the tokenizer is not reading the entire string. To improve performance, it will only read a single token at a time and return its type. To access the next token, the evaluator will call the method tokenizer.nextToken(). When the tokenizer reaches the end of the string, it returns a special token end_of_formula.
Recognized token types
enum eTokenType
    operator_plus        +
    operator_minus       -
    operator_mul         *
    operator_div         /
    operator_percent     %
    open_parenthesis     (
    comma                ,
    dot                  .
    close_parenthesis    )
    operator_ne          <>
    operator_gt          >
    operator_ge          >=
    operator_eq          =
    operator_le          <=
    operator_lt          <
    operator_and         AND
    operator_or          OR
    operator_not         NOT
    operator_concat      &
    value_identifier     any word starting with a letter or _
    value_true           TRUE
    value_false          FALSE
    value_number         any number starting with 0-9 or .
    value_string         any string starting with ' or "
    open_bracket         [
    close_bracket        ]
    none                 initial state
    end_of_formula       state once the last character is reached
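The pull-style tokenizer described above (expose the current token type, advance only on request) can be sketched as follows. This is a minimal illustration in Python rather than the library's VB code. The class layout, the reduced set of token types and the rule that a number must start with a digit are simplifying assumptions.

```python
from enum import Enum, auto


class TokenType(Enum):
    NONE = auto()
    NUMBER = auto()
    IDENTIFIER = auto()
    STRING = auto()
    OPERATOR = auto()
    END_OF_FORMULA = auto()


class Tokenizer:
    """Reads exactly one token per call to next_token()."""

    TWO_CHAR_OPERATORS = ("<=", ">=", "<>")
    ONE_CHAR_OPERATORS = "+-*/%(),.<>=&[]"

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.type = TokenType.NONE
        self.value = None
        self.next_token()  # position on the first token

    def next_token(self):
        text = self.text
        while self.pos < len(text) and text[self.pos].isspace():
            self.pos += 1  # skip blanks between tokens
        if self.pos >= len(text):
            self.type, self.value = TokenType.END_OF_FORMULA, None
            return
        start = self.pos
        ch = text[start]
        if ch.isdigit():
            seen_dot = False
            while self.pos < len(text):
                c = text[self.pos]
                if c == "." and not seen_dot:
                    seen_dot = True
                elif not c.isdigit():
                    break
                self.pos += 1
            self.type, self.value = TokenType.NUMBER, float(text[start:self.pos])
        elif ch.isalpha() or ch == "_":
            while self.pos < len(text) and (
                text[self.pos].isalnum() or text[self.pos] == "_"
            ):
                self.pos += 1
            self.type, self.value = TokenType.IDENTIFIER, text[start:self.pos]
        elif ch in "'\"":
            end = text.find(ch, start + 1)  # closing quote of the same kind
            if end < 0:
                raise SyntaxError("unterminated string at position %d" % start)
            self.pos = end + 1
            self.type, self.value = TokenType.STRING, text[start + 1:end]
        elif text[start:start + 2] in self.TWO_CHAR_OPERATORS:
            self.pos += 2
            self.type, self.value = TokenType.OPERATOR, text[start:self.pos]
        elif ch in self.ONE_CHAR_OPERATORS:
            self.pos += 1
            self.type, self.value = TokenType.OPERATOR, ch
        else:
            raise SyntaxError("unexpected character %r at position %d" % (ch, start))


def all_tokens(text):
    """Drain the tokenizer, for demonstration only; a parser would pull lazily."""
    tok = Tokenizer(text)
    out = []
    while tok.type is not TokenType.END_OF_FORMULA:
        out.append((tok.type.name, tok.value))
        tok.next_token()
    return out
```

Calling all_tokens("1+2*3+V1") walks through the same formula used in the text. A real parser would never materialize the list; it would read tok.type and call next_token() as it goes.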
The parser
The parser has been completely rewritten in this version. The parser is using the information provided by the tokenizer (the big brown box) to build a set of objects out of it (the stack on the right). In my library, each of these objects is called an OpCode. Each OpCode returns a value and can have parameters or not.
[Figure: the OpCodes built for the expression: OpCode 1, OpCode 2, OpCode 3, OpCode *, and two OpCode +]
The OpCodes + and * have two parameters. The rest of the OpCodes have none. One of the more complicated concepts of the parser is that of priorities. In our expression...
1 + 2 * 3 + v1
...the evaluator has to understand that what we really mean is:
1 + (2 * 3) + v1
In other words, we need to do the multiplication first. So, how can this be done in one pass? At any time, the parser knows its level of priority:
enum ePriority
none = 0
concat = 1
or = 2
and = 3
not = 4
equality = 5
plusminus = 6
muldiv = 7
percent = 8
unaryminus = 9
When the parser encounters an operator, it recursively calls itself to get the right part. When that call returns, the operator can apply its operation (for example, +) and the parsing continues. The interesting part is that while calculating the right part, the parser already knows its current level of priority. Therefore, if it detects an operator with a higher priority while parsing the right part, it continues parsing and returns only the resulting value.
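The priority mechanism can be sketched in a few lines. The Python below is an illustration of the general technique (often called precedence climbing), not a port of the library's parser. It evaluates numbers directly instead of building OpCodes, handles only + - * / and parentheses, and uses a regular expression as a stand-in tokenizer. The priority numbers are the ones from the ePriority list; everything else is an assumption.

```python
import re

# Priorities from the ePriority list (a higher number binds tighter).
PRIORITY = {"+": 6, "-": 6, "*": 7, "/": 7}
UNARY_MINUS_PRIORITY = 9
APPLY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def tokenize(text):
    # Stand-in tokenizer; characters it does not know are silently skipped.
    return re.findall(r"\d+(?:\.\d+)?|[-+*/()]", text)


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse_expr(self, priority=0):
        """Consume operators only while they bind tighter than `priority`."""
        left = self.parse_operand()
        while True:
            op = self.peek()
            if op not in PRIORITY or PRIORITY[op] <= priority:
                return left  # the caller's operator binds at least as tightly
            self.advance()
            # The recursive call knows the current priority, so it keeps
            # going only for operators with a higher priority than `op`.
            right = self.parse_expr(PRIORITY[op])
            left = APPLY[op](left, right)

    def parse_operand(self):
        token = self.advance()
        if token == "(":
            value = self.parse_expr(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token == "-":
            return -self.parse_expr(UNARY_MINUS_PRIORITY)
        if token is None or not token[0].isdigit():
            raise SyntaxError("operand expected")
        return float(token)


def evaluate(text):
    parser = Parser(text)
    result = parser.parse_expr()
    if parser.peek() is not None:
        raise SyntaxError("unexpected token %r" % parser.peek())
    return result
```

With this sketch, evaluate("2+3*5") multiplies first because * has priority 7 against 6 for +, while evaluate("8-3-2") stays left-associative because an operator of equal priority ends the recursive call.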
The interpretation
The last part of the evaluation process is the interpretation. This part now runs a lot faster thanks to the OpCodes.
To get the result out of the stack of OpCodes, you just need to call the root OpCode value. In our sample, the root OpCode is a + operator. The property Value will in turn call the value of each of the operands and the result will be added and returned. As you can see from this picture, the speed of evaluation is now quite acceptable. The program below needs 3 full expression evaluations for every single pixel in the image. For this image, it required 196,608 evaluations and, despite that, it returned in less than a second.
The class at the core of this new project is the OpCode class. The key property in the opCode class is the property 'value'.
Public MustInherit Class opCode
    Public Overridable ReadOnly Property value() As Object _
        Implements iEvalValue.value
    MustOverride ReadOnly Property ReturnType() As evalType _
        Implements iEvalValue.evalType
    ...
End Class
Each OpCode returns its value through it. For the operator +, the value is calculated this way:
Return DirectCast(mParam1.value, Double) + DirectCast(mParam2.value, Double)
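The idea of a tree of objects that each expose a value can be sketched independently of the library. The Python below is a minimal illustration, not the Eval3 classes: the node names Constant, Variable, Add and Mul, and the dictionary used as a variable store, are hypothetical.

```python
class OpCode:
    """Base node: every node exposes its result through `value`."""

    @property
    def value(self):
        raise NotImplementedError


class Constant(OpCode):
    def __init__(self, number):
        self._number = number

    @property
    def value(self):
        return self._number


class Variable(OpCode):
    def __init__(self, env, name):
        self._env, self._name = env, name

    @property
    def value(self):
        return self._env[self._name]  # read at evaluation time, not parse time


class Add(OpCode):
    def __init__(self, left, right):
        self._left, self._right = left, right

    @property
    def value(self):
        return self._left.value + self._right.value


class Mul(OpCode):
    def __init__(self, left, right):
        self._left, self._right = left, right

    @property
    def value(self):
        return self._left.value * self._right.value


env = {"v1": 10.0}
# Tree for 1 + (2 * 3) + v1, built once (this is what a parser would produce).
root = Add(
    Add(Constant(1.0), Mul(Constant(2.0), Constant(3.0))),
    Variable(env, "v1"),
)
first = root.value    # evaluates the tree with v1 = 10
env["v1"] = 20.0
second = root.value   # same tree, no reparsing, v1 = 20
```

Reading root.value twice shows why reparsing is unnecessary: the structure is fixed after parsing, and only the leaves that read variables see new data.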
Is that really faster?
It is faster if you need to evaluate an expression more than once. If you need to evaluate it only once, you might not care about speed anyway. So, I would recommend this new version in either case. As you can see from the picture above, 3 formulas are evaluated for every pixel of the image. The image being 256x256 pixels, the evaluator had to calculate 256 x 256 x 3 = 196,608 expressions in under a second. So, simple expressions are evaluated in roughly 5 microseconds each. I think this is acceptable for most applications.
Dynamic variables
Dynamic variables are an interesting concept. The idea is that if you use several formulas in your application, you don't want to recalculate all the formulas when a variable changes. The evaluator has a built-in ability to do that. On this page, the program uses the dynamic ability:
To use this ability once you have parsed your expression:
mFormula3 = ev.Parse(tbExpression3.Text)
You only have to handle the event mFormula3.ValueChanged (for the Handles clause below to compile, mFormula3 must be declared WithEvents):
Private Sub mFormula3_ValueChanged( _
ByVal Sender As Object, _
ByVal e As System.EventArgs) _
Handles mFormula3.ValueChanged
Dim v As String = Evaluator.ConvertToString(mFormula3.value)
lblResults3.Text = v
LogBox3.AppendText(Now.ToLongTimeString() & ": " & v & vbCrLf)
End Sub
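How a formula can know that it needs re-evaluation is easiest to see in a small observer sketch. The Python below is an assumption about the general mechanism, not the library's implementation: each variable notifies the formulas that depend on it, and each formula forwards the notification to its own listeners. The class and method names are hypothetical.

```python
class Variable:
    """A value that notifies subscribers when it actually changes."""

    def __init__(self, value):
        self._value = value
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        if new_value != self._value:  # ignore assignments that change nothing
            self._value = new_value
            for callback in list(self._listeners):
                callback()


class Formula:
    """Recomputes on demand and signals when one of its inputs changed."""

    def __init__(self, compute, dependencies):
        self._compute = compute
        self._listeners = []
        for variable in dependencies:
            variable.subscribe(self._notify)

    def subscribe(self, callback):
        self._listeners.append(callback)

    @property
    def value(self):
        return self._compute()

    def _notify(self):
        for callback in list(self._listeners):
            callback(self)


price = Variable(10.0)
quantity = Variable(3)
unrelated = Variable(0)
total = Formula(lambda: price.value * quantity.value, [price, quantity])

log = []
total.subscribe(lambda formula: log.append(formula.value))

price.value = 12.0    # total depends on price: listener runs
unrelated.value = 99  # total does not depend on it: nothing happens
quantity.value = 3    # same value as before: nothing happens
```

Only formulas that reference the changed variable are touched, which is the point of the feature when an application holds many formulas.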
You said it supports objects?
Yes, the evaluator supports the . operator. If you enter the expression theForm.text, the evaluator will return the title of the form. If you enter the expression theForm.left, it will return its runtime left position. This feature is only experimental and has not been tested yet. That is why I have put this code here, hoping that others will find its features valuable and submit their improvements.
How does this work?
In fact, object support came for free. I used System.Reflection to evaluate the custom functions, and the same code is used to access an object's methods and properties. When the parser encounters an identifier that it does not recognize as a keyword, it tries to reflect on the CurrentObject to see if it can find a method or a property with the same name.
mi = CurrentObject.GetType().GetMethod(func, _
    Reflection.BindingFlags.IgnoreCase _
    Or Reflection.BindingFlags.Public _
    Or Reflection.BindingFlags.Instance)
If a method or a property is found, the evaluator passes it the parsed parameters and invokes it.
valueleft = mi.Invoke(CurrentObject, _
    System.Reflection.BindingFlags.Default, Nothing, _
    DirectCast(parameters.ToArray(GetType(Object)), Object()), Nothing)
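The same lookup-then-invoke idea can be shown with Python's own introspection. This is a loose analogue of the reflection calls above, not a translation of them: find_member, call_member and the Form class are hypothetical, and treating a leading underscore as "not public" is a Python convention standing in for BindingFlags.Public.

```python
def find_member(target, name):
    """Return the public attribute name matching `name` ignoring case, or None."""
    wanted = name.lower()
    for candidate in dir(target):
        if not candidate.startswith("_") and candidate.lower() == wanted:
            return candidate
    return None


def call_member(target, name, *args):
    """Read a field or property, or call a method with the given arguments."""
    found = find_member(target, name)
    if found is None:
        raise AttributeError("unknown member: %s" % name)
    member = getattr(target, found)
    return member(*args) if callable(member) else member


class Form:
    """Stand-in for the form object used in the article's examples."""

    def __init__(self):
        self.text = "My window"
        self.left = 120

    def area(self, width, height):
        return width * height


the_form = Form()
title = call_member(the_form, "TEXT")          # case-insensitive field access
surface = call_member(the_form, "Area", 3, 4)  # method call with parameters
```

Because the lookup happens by name at run-time, any new public member added to the object becomes available to expressions without touching the evaluator.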
Are there any known bugs or requests?
The following are requests/bugs from the original project:
Someone reported that you need the option 'Compare Text' for the evaluator to work properly. I think this is fixed now. If you want the evaluator to be case-sensitive you can ask for it in the evaluator constructor.
Someone also reported that the evaluator did not like having a comma as the decimal separator in the Windows international settings. This is fixed, too, I believe.
My request: If you find this library useful or interesting, don't forget to vote for me. :-)
Points of interest
Speed Tests: I wish I had the time to compare various eval methods. If someone wants to help, please contact me. To my knowledge, this is the only formula evaluator available on CodeProject with a separate Tokenizer, Parser and Interpreter. Extending it is extremely easy thanks to the internal use of System.Reflection.
History
18th May 2007
- Article edited and posted to the main CodeProject.com article base.
4th May 2006
- Fix a bug introduced in the last version where the functions were not recognized properly.
- Add a few more samples in both C# and VB sample programs using Arrays and default members (Controls.Item).
27th April 2006
- Implements Array
- Starts differentiating C# and VB
20th April 2006
- Try to Improve the article with more pictures.
19th April 2006
- C# compatibility (a few variables and members were renamed to avoid c# keywords conflicts).
- C# sample
- Move the core evaluator within a DLL
- Allow 'on the fly variable' through a new interface called 'iVariableBag'
13th April 2006
- New article (as the original was not editable).
- Entirely new parser using OpCodes
10th Feb 2005
- Greatly increased the length and detail of the article
7th Feb 2005