Introduction
The aim of this article is to demonstrate an interface-based programming paradigm in the design and implementation of a simple Expression Parser & Evaluator using RPN in C#. The intended audience skill set would be Beginner to Intermediate.
Background
For introduction about Expressions & Reverse Polish Notation, refer to this article - Expression Evaluator: using RPN by lallous, in the C++/MFC section.
Design Description
(All code referred here is inside RPNParser.cs file.)
Interfaces
Every expression is made of operands and operators. These are reflected in the interfaces - IOperand
and IOperator
.
IArithmeticOperations
defines methods for binary arithmetic operators (+, -, *, \, %).
IComparisonOperations
defines methods for all comparison operators (>, >=, <, <=, ==, !=).
ILogicalOperations
defines methods for logical operators (&& , ||).
The idea behind these interfaces is as follows -
Each Operand supports a set of Operations and the Operators evaluate a pair of Operands using the Operations supported by the two Operands.
E.g. A Long Operand supports Arithmetic and Comparison Operations, whereas a String might support only Comparison Operations. Boolean Operands support only the Logical Operations.
An Operator like + needs only to know that it has to call the Plus method on an IOperand
implementing the IArithmeticOperations
interface. This keeps the algorithm of expression parsing & evaluation (in this case the RPN sequencing) separated from the data type that does the actual evaluations and hence makes it very extensible to newer data types.
Operands
Operand
class is an abstract base class implementing the IOperand
interface and additionally providing data storage.
LongOperand
is an Operand
class specific to Long
& Int
(Int64
and Int32
) types of operands and implements the IArithmeticOperations
, IComparisonOperations
interfaces.
BoolOperand
is an Operand
class specific to bool
type of operands and implements only the ILogicalOperations
interface.
OperandHelper
is a static class providing object factory functionality to create appropriate Operand
objects based on the Type passed to it (OperandHelper.CreateOperand
).
For new types of Operands like Decimal/String or user-defined types, e.g. a Matrix, this factory method would have to be extended.
Operators
IOperator
defines one method called Eval
and is used to evaluate the LHS and RHS of the expression.
Operator
is an abstract base class implementing this interface and provides data storage for the operator.
ArithmeticOperator
, ComparisonOperator
, LogicalOperator
are operator classes to support IArithmeticOperations, IComparisonOperations, ILogicalOperations
respectively.
OperatorHelper
is a helper class providing object factory functionality for creation of appropriate Operator
objects (OperatorHelper.CreateOperator
). This class provides services to determine operator-precedence. This is based on the relative indexes of the operators defined in the m_AllOps
variable. It also provides Regular Expressions used during parsing of the input expression string. E.g. The regex [=<>!]{1,2} is used to parse for Comparison operators.
Expression Parsing
The Tokenizer
class provides functionality of parsing the input expression string using regular expressions.
The TokenEnumerator
class supports the IEnumerator
interface and is used to loop through the various tokens in the input expression string. The tokens in the expression are returned to the caller as Token
objects.
The given expression can be parsed as either Arithmetic or Logical or Comparison Expression Types. This is controlled by the enum
s ExpressionType::ET_ARITHMETIC, ExpressionType::ET_COMPARISON
and ExpressionType::ET_LOGICAL
. A combination of these enum
types can also be given.
E.g. To parse the expression as all of these, pass ExpressionType.ET_ARITHMETIC| ExpressionType.ET_COMPARISON|ExpressionType.ET_LOGICAL
to the Tokenizer c'tor.
The generation of the RPN sequence is in the RPNParser::GetPostFixNotation
method and it uses the algorithm as described in the above mentioned C++ article. The return value is an ArrayList
consisting of Operands and Operators in the RPN sequence.
This method takes an argument bFormula
that determines if the expression being parsed is a formula (in the form of x+y*z/t) or has direct values (3+6*7/2). If the expression is not a formula, then the values are extracted by this method and stored in the Token object for later evaluation. If the expression is a formula, then the values for the corresponding variables need to be passed separately as a hashtable for evaluation.
Expression Evaluation
RPNParser::EvaluateRPN
is the method performing actual evaluation of the RPN generated by the RPNParser::GetPostFixNotation
method and is again based on the algorithm described in the above mentioned C++ article. This method takes as input the ArrayList
with the RPN sequence and if the expression is a formula, a hashtable with values for the variables in the expression.
Usage
A typical usage of the parser would be as follows -
RPNParser parser = new RPNParser();
string szExpression = @"((2+3)*2<10 || 1!=1) && 2*2==4";
parser.EvaluateExpression( szExpression,
Type.GetType("System.Int64" ), false, null );
The attached code contains a small form-based application to test the logic.
It uses the RPN_Parser::Convert2String()
method to get a string representation of the RPN sequence.
Extensibility
The Parser can be easily extended to support new Operand types and Operators by subclassing the Operand
and Operator
classes. Since the Parsing and Evaluation logic works totally on interfaces, typically nothing would have to be changed there. Of course, the factory classes need to be extended to support the new types and if adding new operators, the regular expressions used & their precedence information present inside the Operatorhelper
class would have to be modified accordingly.
Hope someone out there finds this useful.
Current Shortcomings
- Unary operators (+,-,!) not handled.
- Does not yet support expressions having mixture of variables and values (e.g. x+y/3*z%2 ).
- Bitwise & and | not handled.
- Does not handle a combo expression with multiple braces as just one of the ExpressionTypes?
// E.g. ((2+3)*2<10 || 1!=1) && 2*2==4 as just a logical expression cannot be done.