Introduction
I remember the good old college days, in particular the first numerical algorithm class of mine. The first assignment of that class
was to create a mathematical calculator which took in expressions to evaluate. The application would then be used for other assignments during the semester
(note this was the first assignment of the semester on the first day of class) in which we would substitute common mathematical functions for algorithms we wrote.
With only taking one programming class so far in college, the assignment felt daunting. The first question that came to mind was, "If this is a math class,
why can't we just write the numerical algorithms in MatLab and just show its simple usages there". I felt the
task of creating a calculator just like my precious TI 92 to be totally overwhelming.
In retrospect, if I had taken the time to define requirements for the assignment, I would have realized the assignment was pretty trivial. The amount of work to be done
would not have been scary, and the code produced for the assignment would not have been as ugly and full of bugs, as it ended up being.
The issue I faced when I was given that assignment was that I had no base point at which
to begin thinking of a solution (yes, I remember BODMAS (bracket, off, division, multiplication, addition, and subtraction)). I however didn't know how to structure
the requirements needed into a suitable form or grammar, so that I could focus only on fulfilling the rules created for the grammar.
I was more bothered about functionality and how to parse every corner case I could think of.
A few years later, while reading the book "Algorithms and Data Structures in C++ by Leendert Ammeraal", specifically Chapter 11 (which discusses
about the fundamentals of interpreters and compilers), I decided to revisit that dreadful assignment of mine.
Background
There are two concepts I used in creating the mathematical expression evaluator detailed in this article. One is the use of BNF to come
up with a grammar suitable for the creation of the mathematical expression and the second is recursive decent parsing which I used to translate the grammar to code.
I will not claim to be a grammar expert, as such the rules I defined can be refined as the reader wishes (and any flaws I make can be forwarded to me and I will fix it).
I sort of followed the rules from the Wikipedia page on EBNF http://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form.
The reader should however note that the implementation in this article matches my grammar rules.
Below is the grammar implemented. I will presume the grammar is pretty straightforward, as such I will only mention a few assumptions I made within the grammar.
Firstly, I presume that Variables do not contain Digits as part of the name (Variables can be 1 or more Alphabetical letters).
I presume that the Unary operator "-" (e.g., -1 or -ab) is a factor defined by ( "(-Expression", e.g., (- 1) ,
((-a) + b), or (-ab)). I presume that a Function is part of Term and it can be decomposed back to a factor. The only other assumption
I believe worth mentioning is the factorial "!" operator, that is made optional for Terms. I had a choice of either moving this
to Factor or Term, my final decision was to place it within the definition of Term.
INPUT = EXPRESSION
EXPRESSION = TERM { "+" | "-" TERM }
TERM = (FUNCTION | FACTOR) { "*" | "/" | "%" | "^" FUNCTION | FACTOR } { "!" }
FUNCTION = ( "COS"| "SIN" | "TAN" | "EXP" |"LN" | "LOG" "(" FACTOR ")" )
FACTOR = NUMBER|VARIABLE| ( "(" ["-"] EXPRESSION ")" )
VARIABLE = ALPHABHET,{ALPHABHET}
NUMBER= DIGIT,{["."] DIGIT}
DIGIT = 0|1|..|9
ALPHABHET = a|b|..|z|A|B|..|Z
The technique used to transform the grammar to code is called recursive descent parsing (RDP), which is a form of top down parsing
built from a set of functions that are written in terms of each other. With recursion in mind, I created a function for all Non Terminal
identifiers that focuses on only aspects of their defined grammar.
Below are the main methods used to represent the grammar.
ISymbol Expression()
Symbol Term()
Symbol FunctionEvaluator()
Symbol Factor()
Utilizing the code discussed underneath, I created an example mathematical expression parser. As shown by the image below:
Using the Code
The first thing that came to mind while thinking of how to implement the expression parser was how I wanted the parser to be used.
I figured that the user of the Evaluator
class would want to be able to set and get the Expression
to parse,
the user would want to get the floating point Result
if the Expression could be evaluated. If it can not be evaluated (that is if it contains
undefined variables), the user would want to get a processed string result. With that in mind, I created the IExpression
interface. I then created
the ISymbol
interface for the intermediate types of information I would need as I processed each symbol. For numbers, the FloatingValue
variable is used;
for variables, operations, and functions StringValue
is used. To differentiate between all variables, operations, functions, and parenthesis,
I created the SymbolType
enumeration (in the Expression calculator example
attached, the symbol type helped with implementing situations such as "-1", "1--1", "1*-1" ).
public interface IExpression
{
string Expression { get; set; }
double Result { get; set; }
bool CanEvaluate { get; set; }
string StrResult { get; set; }
}
public enum SymbolType {expression, operation, function, variable, digit,brackets,endTag}
public interface ISymbol:IExpression
{
SymbolType SymbolType { get; set; }
double FloatingValue { get; set; }
string StringValue { get; set; }
}
The Symbol
class implements the ISymbol
interface and includes the method TypedClone()
which implements a strongly typed member-wise clone of the
Symbol
object itself. The clone is needed to maintain the original expression passed in. The extension method in the static class ExtensionLib
adds a means to clone an IExpression
type to a Symbol
type.
public class Symbol:ISymbol
{
public string Expression { get; set; }
public double Result { get; set; }
public bool CanEvaluate { get; set; }
public string StrResul { get; set; }
public SymbolType SymbolType { get; set; }
public double FloatingValue { get; set; }
public string StringValue { get; set; }
public Symbol TypedClone()
{
return this.MemberwiseClone() as Symbol;
}
}
public static class ExtensionLib
{
public static Symbol TypedClone(this IExpression v)
{
return ((Symbol) v).TypedClone();
}
}
The field _currentPositionCnt
maintains the position in the Expression
string indicating where to continue parsing from.
_processVariableExpression
is the flag that allows a check to be made to see if a Variable
can be mapped to a value. The _lastReadSymbol
holds
the last Symbol
parsed that is yet to be consumed. The _expression
object holds a copy of the original object that can then be modified.
The _originalExpression
object holds the original object with no modifications made to it. _listOfFunctions
holds the list of supported functions.
_listOfBinaryOperators
holds the operators that take the two parameters in performing calculations. _listOfUnaryOperators
holds the operators
that act on a single parameter (or Expression
) and the VariableMapper
dictionary holds a mapping of Variables to Values.
private int _currentPositionCnt;
private bool _processVariableExpression;
private Symbol _lastReadSymbol;
private IExpression _expression;
private IExpression _originalExpression;
private readonly List<string> _listOfFunctions = new List<string>();
private readonly List<string> _listOfBinaryOperators = new List<string>();
private readonly List<string> _listOfUnaryOperators = new List<string>();
public readonly Dictionary<string, double> VariableMapper { get; private set; }
The constructor initializes the _listOfFunctions
, _listOfBinaryOperators
, _listOfUnaryOperators
list
and VariableMapper
dictionary with their respective values (functions, operators, and a dictionary of variables to values).
public Evaluator()
{
_listOfFunctions.AddRange(new [] { "COS", "SIN", "TAN", "EXP", "LN", "LOG" });
_listOfBinaryOperators.AddRange(new [] { "*", "/", "^", "%", "+", "-" });
_listOfUnaryOperators.Add("!");
VariableMapper = new Dictionary<string, double> {{"PI", Math.PI}};
}
The Resolve()
method takes in an object which adheres to the IExpression
interface, and a boolean argument which tells us we should evaluate mappable variables.
The method trims out the space in between characters, it also factors expressions with variables to make for a compact calculation the next go around when we actually match variables
with values. The latter portion can be removed and the Factor()
method updated to always just check if a Variable
can be mapped to a value.
public IExpression Resolve(IExpression exp, bool resolveExp=false)
{
_currentPositionCnt = 0;
_originalExpression = exp;
_expression = exp.TypedClone();
_expression.Expression = _expression.Expression.Replace(" ", "");
_expression.Expression += "#";
var result = Expression();
if (!NextIs("#"))
throw new Exception
("An error occurred while procession the expression [ "
+ _originalExpression.Expression + " ] At position "
+ _currentPositionCnt.ToString());
if (resolveExp && !result.CanEvaluate && VariableMapper.Count > 0)
{
_processVariableExpression = true;
_lastSymbolType = SymbolType.expression;
_currentPositionCnt = 0;
_lastReadSymbol = null;
_expression.Expression = result.StrResult.Replace(" ", "");
result = Expression();
}
return result;
}
The ResolveAsString()
method takes in an object which adheres to the IExpression
interface, and a boolean argument which tells us we should evaluate
mappable variables. The method returns to the caller a string result of either a floating point value or a string expression.
public string ResolveAsString(IExpression exp, bool resolveExp=false)
{
var result = Resolve(exp, resolveExp);
return result.CanEvaluate ? result.Result.ToString() : result.StrResult;
}
The Expression()
method calls the Term()
method. The resultant from the call can either be a value that can
be evaluated, a Variable, or an Expression. A check is then made to see if the next following Symbol
retrieved
via the NextIs()
method call is an operator such as "+
", "-
",
in which case we call Term()
again to get a second parameter needed for evaluation. Else we return with the X Symbol
.
private ISymbol Expression()
{
var x = Term();
if (x == null) return null;
for (; ; )
{
if(NextIs("+"))
{
var y = Term();
if (y.CanEvaluate && x.CanEvaluate)
x.Result = x.FloatingValue = x.FloatingValue + y.FloatingValue;
else
{
x.StringValue =
"("+(x.CanEvaluate ? x.FloatingValue.ToString()
: x.StringValue) +
" + " + (y.CanEvaluate ? y.FloatingValue.ToString()
: y.StringValue) + ")";
x.CanEvaluate = false;
}
}
else if (NextIs("-"))
{
var y = Term();
if (y.CanEvaluate && x.CanEvaluate)
x.Result = x.FloatingValue = x.FloatingValue - y.FloatingValue;
else
{
x.StringValue = "("+(x.CanEvaluate ? x.FloatingValue.ToString()
: x.StringValue) +
" - "+ (y.CanEvaluate ? y.FloatingValue.ToString()
: y.StringValue) + ")";
x.CanEvaluate = false;
}
}
else
{
break;
}
}
x.Result = x.FloatingValue;
x.StrResult = x.StringValue;
return x;
}
The Term()
method calls the Factor()
method. The resultant can either be a value that can be evaluated, a variable, or an expression.
A check is then made to see if the next following Symbol
retrieved via the NextIs()
method call is an operator such as "*
",
"/
", "%
", "^
", "!
" or a Function, else we return with the X symbol.
If it is one of the above operators, we call the Factor()
method again to get the next Symbol
(which again can be a value, a variable, or an expression).
Note the factorial case does not call the Factor()
method again. Also worthy of pointing out is the fact that operator
precedence are all equal within this method. To raise precedence of any operation, parenthesis would need to be used, to encompass that portion of the expression.
private Symbol Term()
{
var x = Factor();
if (x == null) return null;
for (; ; )
{
if (NextIs("*"))
{
var y = Factor();
if (y.CanEvaluate && x.CanEvaluate)
x.Result = x.FloatingValue = x.FloatingValue * y.FloatingValue;
else
{
x.StringValue = "("+(x.CanEvaluate? x.FloatingValue.ToString()
: x.StringValue) +
" * " + (y.CanEvaluate ? y.FloatingValue.ToString()
: y.StringValue) + ")";
x.CanEvaluate = false;
}
}
else if (NextIs("/"))
{
var y = Factor();
if (y.CanEvaluate && x.CanEvaluate)
x.Result = x.FloatingValue = x.FloatingValue / y.FloatingValue;
else
{
x.StringValue = "("+(x.CanEvaluate ? x.FloatingValue.ToString()
: x.StringValue) +
" / " + (y.CanEvaluate ? y.FloatingValue.ToString()
: y.StringValue) + ")";
x.CanEvaluate = false;
}
}
else if (NextIs("%"))
{
var y = Factor();
if (y.CanEvaluate && x.CanEvaluate)
{
x.Result = x.FloatingValue = x.FloatingValue % y.FloatingValue;
}
else
{
x.StringValue = "("+(x.CanEvaluate ? x.FloatingValue.ToString()
: x.StringValue) +
" % " + (y.CanEvaluate ? y.FloatingValue.ToString()
: y.StringValue) + ")";
x.CanEvaluate = false;
}
}
else if (NextIs("^"))
{
var y = Factor();
if (y.CanEvaluate && x.CanEvaluate)
{
x.Result = x.FloatingValue = Math.Pow(x.FloatingValue , y.FloatingValue);
}
else
{
x.StringValue = "( (" + (x.CanEvaluate ? x.FloatingValue.ToString()
: x.StringValue) +
")^ " + (y.CanEvaluate ? y.FloatingValue.ToString()
: y.StringValue) + ")";
x.CanEvaluate = false;
}
}
else if (NextIs("!"))
{
if (x.CanEvaluate)
{
x.Result = x.FloatingValue = Factorial(x.FloatingValue);
}
else
{
x.CanEvaluate = false;
x.StringValue = "(" + (x.CanEvaluate ? x.FloatingValue.ToString()
: x.StringValue) +")! ";
}
}
else if (_listOfFunctions.Contains(x.StringValue))
{
x = FunctionEvaluator(x.StringValue);
}
else
{
break;
}
}
return x;
}
The Factor()
method calls the Next()
method to get the next Symbol
. If the Symbol
can be evaluated, it is immediately
returned else we check if the Symbol
is a parenthesis. If the Symbol
is a parenthesis, we go on to call the expression again.
Note the PreviewNextifTrueMoveTrackerByOne()
method. This is used to handle the case where a negative sign follows the opening parenthesis.
If we can process Variables, we look up the variable names in the VariableMapper
dictionary and get the corresponding value. Note: one of the things
I would have loved to have implemented here is, the ability for Variables to point to Variables.
private Symbol Factor()
{
var x = Next();
_lastReadSymbol = null;
if (x != null && x.CanEvaluate)
return x;
if (x != null)
{
if (x.StringValue == "(")
{
bool isunaryMinus = PreviewNextifTrueMoveTrackerByOne("-");
x = Expression() as Symbol;
if (x == null || !NextIs(")"))
{
throw new
Exception("Expression needs to end with a parenthesis "
+ "[ " + _originalExpression.Expression
+ " ] Error at position "
+ _currentPositionCnt.ToString());
}
if (isunaryMinus)
{
if (x.CanEvaluate)
x.FloatingValue *= -1;
else
x.StringValue = "((-1) * " + x.StringValue+")";
}
}
if(_processVariableExpression && x.SymbolType == SymbolType.variable)
{
if (VariableMapper.ContainsKey(x.StringValue))
{
x.FloatingValue = VariableMapper[x.StringValue];
x.StringValue = string.Empty;
x.CanEvaluate = true;
}
}
}
_lastReadSymbol = null;
return x;
}
The FunctionEvaluator()
method calls the Factor()
method to get the expected value, variable, or expression that is the argument passed into
the function. If the Symbol
returned by the Factor()
method can be evaluated, the appropriate Math function is called, else it is presumed
that the argument composition includes Variables and a string combination of the supposed Math function and the argument is composed.
public Symbol FunctionEvaluator(string value)
{
Symbol x;
if (NextIs("("))
{
_lastReadSymbol = new Symbol
{
SymbolType = SymbolType.brackets,
StringValue = "("
};
x = Factor();
}
else
{
throw new Exception(
"Function must be imediately followed by parenthesis. Error : ["
+ _originalExpression.Expression
+ " ] At position "
+ _currentPositionCnt.ToString());
}
if (x != null && x.CanEvaluate)
{
switch(value.ToLower())
{
case "sin":
{
x.FloatingValue = Math.Sin(Math.PI/180 * x.FloatingValue);
break;
}
case "cos":
{
x.FloatingValue = Math.Cos(Math.PI / 180 * x.FloatingValue);
break;
}
case "tan":
{
x.FloatingValue = Math.Tan(Math.PI / 180 * x.FloatingValue);
break;
}
case "exp":
{
x.FloatingValue = Math.Exp(x.FloatingValue);
break;
}
case "ln":
{
x.FloatingValue = Math.Log(x.FloatingValue, Math.E);
break;
}
case "log":
{
x.FloatingValue = Math.Log10(x.FloatingValue);
break;
}
}
}
else if (x != null && !x.CanEvaluate)
{
x.StringValue = "( " + value + "( " + x.StringValue + " ))";
}
return x;
}
The Next()
method is what handles the parsing of the string characters to symbols. It deals with the separation of string letters to either
Functions or Variables. Note: you will see that the restriction of Variables to only string characters is enforced by how
we parse for Variables and Integers. The beginning If
statement is put in place so as to avoid advancing to the next
Symbol
if the previous set Symbol
has yet to be claimed.
private Symbol Next()
{
if (_lastReadSymbol != null)
{
var sym = _lastReadSymbol;
_lastReadSymbol = null;
return sym;
}
if (_expression.Expression != null &&
_expression.Expression.Length > _currentPositionCnt)
{
var length = _expression.Expression.Length;
if (IsNumber())
{
var val = _expression
.Expression[_currentPositionCnt]
.ToString();
_currentPositionCnt++;
while (_currentPositionCnt < length && IsNumber())
{
val += _expression.Expression[_currentPositionCnt++];
}
_lastReadSymbol =
new Symbol
{
SymbolType = SymbolType.digit,
FloatingValue = double.Parse(val),
CanEvaluate = true
};
return _lastReadSymbol;
}
if (IsString())
{
var val = _expression
.Expression[_currentPositionCnt]
.ToString();
_currentPositionCnt++;
while (_currentPositionCnt < length && IsString())
{
val += _expression.Expression[_currentPositionCnt++];
}
if (_listOfFunctions.Contains(val))
{
_lastReadSymbol =
new Symbol
{
SymbolType = SymbolType.function,
StringValue = val
};
return _lastReadSymbol;
}
_lastReadSymbol =
new Symbol
{
SymbolType = SymbolType.variable,
StringValue = val ,
CanEvaluate = false
};
return _lastReadSymbol;
}
if ((_expression.Expression[_currentPositionCnt] == '.'
&& (_expression.Expression[_currentPositionCnt + 1] < '0'
&& _expression.Expression[_currentPositionCnt + 1] > '9')))
{
throw new
Exception("Expression can not have a decimal point"
+" without an accompanying digits."
+" Expression [ "
+ _originalExpression.Expression
+ " ] At position "
+ _currentPositionCnt.ToString());
}
if (_expression.Expression[_currentPositionCnt] == '(' ||
_expression.Expression[_currentPositionCnt] == ')')
{
_lastReadSymbol = new Symbol
{
SymbolType = SymbolType.brackets,
StringValue = _expression
.Expression[_currentPositionCnt++]
.ToString()
};
return _lastReadSymbol;
}
if (_listOfBinaryOperators.Contains(_expression
.Expression[_currentPositionCnt]
.ToString()))
{
_lastReadSymbol =
new Symbol
{
SymbolType = SymbolType.operation,
StringValue = _expression
.Expression[_currentPositionCnt++]
.ToString()
};
return _lastReadSymbol;
}
if (_listOfUnaryOperators.Contains(_expression
.Expression[_currentPositionCnt]
.ToString()))
{
_lastReadSymbol =
new Symbol
{
SymbolType = SymbolType.operation,
StringValue = _expression
.Expression[_currentPositionCnt++]
.ToString()
};
return _lastReadSymbol;
}
if (_expression.Expression[_currentPositionCnt] == '#')
{
_lastReadSymbol =
new Symbol
{
SymbolType = SymbolType.endTag,
StringValue = _expression
.Expression[_currentPositionCnt++]
.ToString()
};
return _lastReadSymbol;
}
}
return null;
}
The Factorial()
method was written to allow me to add factorial as a function within the expression evaluator.
private double Factorial(double number)
{
var val = (long)number;
double retVal=1;
for (long i =val; i>0; i--)
{
retVal *= i;
}
return retVal;
}
The NextIs()
method picks the next Symbol
by calling Next()
and if its value (FloatingValue
for Symbol
s that can be evaluated
and StringValue
for Symbol
s that can not be evaluated) matches the argument passed in, _lastReadSymbol
is set to null, to allow advancement
of picking a new Symbol
via the Next()
method.
private bool NextIs(string val)
{
var t = Next();
if (t != null)
{
if (t.CanEvaluate)
if (t.FloatingValue.ToString() == val)
{
_lastReadSymbol = null;
return true;
}
else
{
_lastReadSymbol = t;
}
else
if (t.StringValue == val)
{
_lastReadSymbol = null;
return true;
}
else
_lastReadSymbol = t;
}
return false;
}
The preview next and advance tracker by one method was created to handle the case where a negative sign follows the start of a
parenthesis "(-" (it is used in the Factor()
function).
private bool PreviewNextifTrueMoveTrackerByOne(string val)
{
if (_expression.Expression != null
&& _expression.Expression.Length > _currentPositionCnt)
{
if (_expression.Expression[_currentPositionCnt].ToString() ==val)
{
_currentPositionCnt++;
return true;
}
}
return false;
}
The IsNumber()
method is used within the Next()
method to check that the current character is a number and not a
letter. The check done works for floating point numbers, by checking that if a dot occurs, the next character would have to be a number.
private bool IsNumber()
{
return ((_expression.Expression[_currentPositionCnt] >= '0'
&& _expression.Expression[_currentPositionCnt] <= '9') ||
(_expression.Expression[_currentPositionCnt] == '.' &&
(_expression.Expression[_currentPositionCnt + 1] >= '0'
&& _expression.Expression[_currentPositionCnt + 1] <= '9')));
}
The IsString()
method is used within the Next()
method to check that the current character is a letter and not a number.
private bool IsString()
{
return ((_expression.Expression[_currentPositionCnt] >= 'A'
&& _expression.Expression[_currentPositionCnt] <= 'Z') ||
(_expression.Expression[_currentPositionCnt] >= 'a'
&& _expression.Expression[_currentPositionCnt] <= 'z'));
}
Points of Interest
The example attached is a mathematical expression calculator, it utilizes a variant of the code above to implement a functioning calculator. It allows the user to write
mathematical expressions that can contain variables, and factors these expressions in term of variables. It also allows for mapping of variables to values so as to enable
a complete evaluation of the expression. It allows for the storing of expressions and the recalling of expressions (this functionality goes in a round robin form).
The Define button allows you to define variable mapping.
Some of the fun things that can be done to enhance the example code given is:
- Implement a differential calculator just like the one in the following
CodeProject work: http://www.codeproject.com/Articles/13731/Symbolic-Differentiation
- Change the implementation to use postfix.
- Implement graphing functionality (probably my favorite because it means you will have to handle the variable within variables case).
History
No revisions were made yet.