Introduction
The Interpreter pattern is used to model a grammar using a set of classes that represent possible expressions within the grammar. For instance, for the algebraic expression 4 + 4, an AddExpression
class can be used to represent the expression. Included in the AddExpression
class would be a reference to the two operands and the capacity to evaluate the expression and return a value based on the two operands.
Without getting too tied down by words, you can see how the possible variety of algebraic expressions can be represented using classes. Additionally, expressions themselves can be used as operands to other expressions based on a defined order of operations.
When it comes to representing variable expressions, things get a bit trickier since the evaluation of a variable expression changes based on the current value of the variable. The Interpreter pattern provides a clean solution to this problem by defining a context class that provides the current value of variables that may appear within the expression being evaluated.
In this article, I will show how the Interpreter pattern can be used to develop a full-featured algebraic expression parser. It is a tractable, non-trivial problem, which makes it a great candidate for demonstrating the Interpreter pattern. Furthermore, instances naturally arise for the utilization of other design patterns such as factory, template method, and composite, which makes it an even better candidate for demonstrating the disciplined use of design patterns in general.
The standard warning regarding design patterns should not be forgotten. They should only be applied where appropriate. It is almost always the case that good solutions to non-trivial problems will not follow an ideal object-oriented design. Trade-offs are often necessary. More often than not, simpler designs are better than complex designs and designs that included a tangle of design patterns are usually the most complex.
General Structure
The structure of the project follows the general structure of a parser in compiler terms. At a high level, a parser transforms a character sequence in a well-defined grammar into an intermediate data structure that can be evaluated at a later point while varying its input. At a lower level, the first step in the parsing process is to transform the character sequence into a token sequence. This step is called lexical analysis. The next step is to transform the token sequence into the intermediate data structure that can be evaluated.
The AXTokenizer
class is responsible for transforming the initial character sequence into a token sequence. If the character sequence is not a valid token sequence, an exception is thrown. If the character sequence is a valid token sequence, the next step is general validation performed by the AXValidator
class.
Once a valid token sequence is obtained, the ExpressionFactory
is used to create a sequence of Expression
objects from the token sequence. The expression sequence is then evaluated by code in the AXParser
class to produce an Expression
object that represents the input algebraic expression by means of a tree of Expression
objects. This expression tree is the intermediate data structure produced by the parser. The ExpressionContext
class serves as the input to the expression tree and is passed down to the leaf expressions for evaluation by means of substitution.
Expression Base Class
All Expression
classes are contained in the Expressions.cs file. Each Expression
class represents a possible expression in our grammar, which in this case is algebra. All Expression
classes are derived from the abstract base class Expression
. I chose an abstract base class over an interface since I wanted to provide some common implementation while at the same time requiring certain functions to be implemented in derived classes.
The Expression
class provides an implementation for the IsBound
, Bind
and Evaluate
methods, and expects the InnerBind
and InnerEvaluate
methods to be defined in derived classes. The InnerBind
method takes arguments and binds them to operands to be evaluated by the InnerEvaluate
method. The IsBound
method indicates whether or not arguments have been bound to the operands. The utility of the IsBound
method will be apparent when it comes to parsing. As for the Evaluate
and InnerEvaluate
methods, both take an ExpressionContext
object as an argument. This is ultimately utilized by VariableExpression
objects, as will be discussed below.
One obstacle is the way the System.Math
library operates. In general, transcendental functions -- functions that cannot be evaluated by algebraic means, like the trigonometric functions -- are approximated by a summation of terms. From calculus, exact values of trigonometric functions can be obtained from corresponding infinite series. Since computers cannot calculate the result of an infinite series, values for trigonometric functions have to be approximated. The problem is, that results in Math.Cos(Math.PI/2.0)
returning some very small non-zero number, which is incorrect.
I choose to use the Template Method pattern. In this case, the template method is InnerEvaluate
. Each Expression
derived class can implement whatever mathematical operation it chooses and return a result without having to worry about the zero issue. The Evaluate
method itself converts all values below the ZERO_THRESHOLD
constant to zero. The Bind
method follows the same pattern with the InnerBind
method serving as the template method. This provides a bit of symmetry and also ensures that the _isBound
member is properly set.
The following is the implementation of the Expression
class.
public abstract class Expression
{
protected readonly double ZERO_THRESHOLD = 1.0 * Math.Pow(10.0, -14.0);
protected bool _isBound;
public Expression()
{
_isBound = false;
}
public bool IsBound()
{
return _isBound;
}
public void Bind(params object[] arguments)
{
InnerBind(arguments);
_isBound = true;
}
public double Evaluate(ExpressionContext context)
{
double result = InnerEvaluate(context);
if (Math.Abs(result) <= ZERO_THRESHOLD)
return 0;
else
return result;
}
protected abstract void InnerBind(params object[] arguments);
protected abstract double InnerEvaluate(ExpressionContext context);
}
Here is where things are not quite perfect. A professional algebraic expression parser would have to use a better math library that returns known zeros from trigonometric functions. That could be done in the individual expression classes too, but that makes the job unnecessarily difficult.
Additionally, the design of the Expression
classes can be questioned as well. The Decorator pattern is a good pattern for wrapping a class in another class that implements the same interface, but simply delegates to the inner class and then adds its own bit of behavior to the result. In this case, the Decorator pattern cannot be used because it hides the actual type of the class it decorates. Typically, OOD sources warn against writing code tied to the types of classes. In this case, however, utilization of ExpressionFactory
would not be as clean as it is. Here is an area where decisions are not black and white, and trade-offs are being made.
Expression Derived Classes
Among the Expression
derived classes are ConstantExpression
, NumericExpression
, BinaryExpression
and FunctionExpression
. Each of these classes implements the InnerBind
method that takes a variable number of arguments and binds them to internally maintained operands. When the Bind
method is called, the InnerBind
method is called in turn and the Expression
is considered to be bound. When the Evaluate
method is called, the InnerEvaluate
method is called in turn, at which point the bound operands are operated upon to produce a result.
For instance, the following is the definition of the AddExpression
class, which derives from BinaryExpression
.
public sealed class AddExpression : BinaryExpression
{
protected override double InnerEvaluate(ExpressionContext context)
{
return _operand1.Evaluate(context) + _operand2.Evaluate(context);
}
}
As another example, the following is the definition of the CosExpression
class, which derives from FunctionExpression
.
public sealed class CosExpression : FunctionExpression
{
protected override double InnerEvaluate(ExpressionContext context)
{
return Math.Cos(_operand.Evaluate(context));
}
}
ConstantExpression
objects are always considered bound and their InnerEvaluate
method always returns the same value. NumericExpression
objects are similarly simple in that only one operand is bound and is the value returned from the InnerEvaluate
method.
VariableExpression and ExpressionContext
For the most part, the Expression
derived classes are straightforward, with the exception of VariableExpression
, which is the only expression that utilizes the ExpressionContext
argument to the Evaluate
method rather than simply passing it on to the Evaluate
methods of the operands.
The ExpressionContext
class maintains a dictionary that associates a double value with a Variable
enumeration value. Although ExpressionContext
does not derive from Expression
, it has a Bind
method of its own to associate values with variables. The value associated with a variable can be obtained by performing a look-up on an ExpressionContext
object. Following is the code for the Lookup
method.
public double Lookup(Variable variable)
{
if (_bindings.ContainsKey(variable))
return _bindings[variable];
else
return double.NaN;
}
When the Evaluate
method is called on a VariableExpression
object, a look-up is performed on the ExpressionContext
argument and the value is subsequently returned. Following is the definition of the VariableExpression
class.
public sealed class VariableExpression : Expression
{
private Variable _variable;
protected override void InnerBind(params object[] arguments)
{
_variable = (Variable)arguments[0];
}
protected override double InnerEvaluate(ExpressionContext context)
{
return context.Lookup(_variable);
}
}
Tokenization
Having introduced the Expression
classes, it's time to look at exactly how a character sequence is translated into an expression sequence. The first step in this process, as mentioned above, is tokenization. The AXTokenizer
class is responsible for tokenizing the character sequence. All in all, the AXTokenizer
is such a simple class that it seems wrong to refer to this step as lexical analysis.
Tokens are defined as sequences of characters that match a given regular expression pattern. The AXTokenizer
class has two methods for adding and removing patterns, AddPattern
and RemovePattern
, respectively. It has an additional method for performing the actual tokenization based on the current set of patterns, Tokenize
. Tokenize
takes a string and returns a list of strings representing the sequence of tokens in the original string.
I wanted this step to be completely deterministic, regardless of caveats that may occur. For instance, distinguishing the negation operation on a number from a subtract operation. With this in mind, the Tokenize
method favors matching longer strings to patterns first and iteratively attempts to match smaller and smaller substrings to patterns. I saw no need to focus on performance at this stage, since Tokenize
is not a method to be called repeatedly.
The following is the implementation of the Tokenize
method.
public List<string> Tokenize(string text)
{
List<string> tokens = new List<string>();
for (int i = 0; i < text.Length; i++)
{
bool matched = false;
for (int j = text.Length - i; j > 0 && !matched; j--)
{
foreach(KeyValuePair<string,Regex> pair in _patterns)
{
if (pair.Value.IsMatch(text.Substring(i, j)))
{
matched = true;
tokens.Add(text.Substring(i, j));
i += j - 1;
break;
}
}
}
if (!matched)
{
throw new AXTokenException(i,
"Unrecognized character sequence starting at index " +
i.ToString() + ".");
}
}
return tokens;
}
The ExpressionFactory
Once a token sequence is obtained from the AXTokenizer.Tokenize
method, it needs to be converted to a sequence of corresponding Expression
derived class objects. This is accomplished with the help of the ExpressionFactory
class.
Just as the AXTokenizer
class has add and remove methods for patterns, the ExpressionFactory
class has methods for adding and removing associations between patterns and Expression
derived class types, AddAssociation
and RemoveAssociation
, respectively. Additionally, it defines a Create
method that creates an instance of an Expression
derived class type based on a token argument.
By iterating over the list of tokens returned from the AXTokenizer
and calling the Create
method on an appropriately initialized ExpressionFactory
object, an expression sequence can be obtained. Technically, ExpressionFactory
is an instance of the Simple Factory pattern. Following is the code for the Create
method.
public Expression Create(string token)
{
if (!string.IsNullOrEmpty(token))
{
foreach (KeyValuePair<string, KeyValuePair<Regex,Type>> pair in _associations)
{
if (pair.Value.Key.IsMatch(token))
{
ConstructorInfo info = pair.Value.Value.GetConstructor(Type.EmptyTypes);
return (Expression)info.Invoke(null);
}
}
}
return null;
}
Parsing
You may have noticed that the AXTokenizer
, ExpressionFactory
and AXValidator
classes are all marked internal. So, how are their facilities used? The AXParser
class ties it all together.
AXParser
has an AXTokenizer
member and an ExpressionFactory
member. Upon construction, these members are initialized to support the most common algebraic expressions. The AddAssociation
method allows a client to extend this functionality by adding new associations for ConstantExpression
and FunctionExpression
types.
Besides the AddAssociation
method, AXParser
has one other public method, Parse
. Parse
takes a string and returns an Expression
object. This Expression
object is the expression tree, representing the fully parsed algebraic expression passed as a parameter. The algebraic expression can now be evaluated for any assignment to the independent variables. This is done by constructing an ExpressionContext
object, calling the ExpressionContext.Bind
method for each independent variable, and passing the object as a parameter to the Evaluate
method of the returned Expression
object.
The first thing the Parse
method does is remove spaces from the input string. After that, it uses the AXTokenizer.Tokenize
method to convert the string into a token sequence. It subsequently converts the token sequence to an expression sequence as described above. Once an expression sequence is obtained, AXValidator
is used to validate it. At this point, the expression sequence is ready to be built up into a single expression tree.
Building of the expression tree takes place in a single while loop that iterates until there is only one Expression
in the list. Before entering the loop and after each iteration of the loop, excess parentheses are removed by calls to the RemoveExcessParens
method. During each iteration of the loop, the index of the highest precedence operation is determined by a call to the DetermineHighestPrecedence
method and then the index is passed as an argument to the CollapseExpression
method.
For a high-level view of the parsing process, following is the Parse
method definition.
public Expression Parse(string text)
{
string copy = text.Replace(" ", string.Empty).Trim();
List<string> tokens = _tokenizer.Tokenize(copy);
List<Expression> expressions = TokensToExpressions(tokens);
AXValidator.Validate(expressions);
RemoveExcessParens(expressions);
while (expressions.Count > 1)
{
int i = DetermineHighestPrecedence(expressions);
CollapseExpression(expressions, i);
RemoveExcessParens(expressions);
}
return expressions[0];
}
Earlier I mentioned the Expression.IsBound
function. It can now be understood why this function is necessary by looking at the implementation for the CollapseExpression
method below. First, once an expression is bound, you can make a valid call to its Evaluate
method. With that in mind, a bound expression becomes the equivalent of a ConstantExpression
object and should be considered as such in subsequent parsing. For instance, notice the call to the IsBound
method in the if statement.
private void CollapseExpression(List<Expression> expressions, int i)
{
Expression current = expressions[i];
Expression previous = new NullExpression();
Expression next = new NullExpression();
if (i - 1 >= 0)
previous = expressions[i - 1];
if (i + 1 < expressions.Count)
next = expressions[i + 1];
if (current is SubtractExpression && !previous.IsBound() && !(
previous is RightParenExpression))
{
SubtractExpression expression = (SubtractExpression)current;
NumericExpression zero = new NumericExpression();
zero.Bind(0.0);
expression.Bind(zero, next);
expressions.RemoveAt(i + 1);
}
else if (current is FunctionExpression)
{
FunctionExpression expression = (FunctionExpression)current;
expression.Bind(next);
expressions.RemoveAt(i + 1);
}
else if (current is BinaryExpression)
{
BinaryExpression expression = (BinaryExpression)current;
expression.Bind(previous, next);
expressions.RemoveAt(i + 1);
expressions.RemoveAt(i - 1);
}
}
The same considerations for the IsBound
method factor into the implementation for the DetermineHighestPrecedence
method. An Expression
object that has been bound is effectively treated as a ConstantExpression
.
Another important point to recognize is how the DetermineHighestPrecedence
and CollapseExpression
methods handle the case for negation. Negation is simply subtraction from zero. This decision follows the principle of Occam’s Razor, "Entities should not be multiplied beyond necessity." My belief is that what I have presented here is as close to optimally clean as possible.
How to Use
At the top of the article I have included a link to the project files. The project compiles into a class library with the name AXLibrary.dll. I have not included a demo project, since it is my hope to use the AXLibrary library in a WPF 3-D graphing calculator article for The Code Project. Additionally, using the library is very simple. Following is an example of how to use the library.
AXParser parser = new AXParser();
Expression expression = parser.Parse("cos(PI/4)*(5+x)");
ExpressionContext context = new ExpressionContext();
context.Bind(Variable.x, 4);
MessageBox.Show(expression.Evaluate(context).ToString());
The project is compiled in Visual Studio 2008 against the .NET framework version 3.5. However, the project can easily be converted to any other .NET framework version, since it uses vanilla features of the C# language. Unfortunately, you will have to create a container project first and copy the files over manually, as there is no tool for converting to an older version that I know of.
History
- 5 March, 2008 -- Original version posted