Mathematical Expression Parser Using Recursive Descent Parsing

Ty Anibaba

4.78/5 (45 votes)

6 Apr 2014CPOL10 min read

96.1K

1.2K

Mathematical Expression Parser Using Recursive Descent Parsing

Download ExpressionCalculator.zip - 211.4 KB

Introduction

I remember the good old college days, in particular the first numerical algorithm class of mine. The first assignment of that class was to create a mathematical calculator which took in expressions to evaluate. The application would then be used for other assignments during the semester (note this was the first assignment of the semester on the first day of class) in which we would substitute common mathematical functions for algorithms we wrote. With only taking one programming class so far in college, the assignment felt daunting. The first question that came to mind was, "If this is a math class, why can't we just write the numerical algorithms in MatLab and just show its simple usages there". I felt the task of creating a calculator just like my precious TI 92 to be totally overwhelming.

In retrospect, if I had taken the time to define requirements for the assignment, I would have realized the assignment was pretty trivial. The amount of work to be done would not have been scary, and the code produced for the assignment would not have been as ugly and full of bugs, as it ended up being.

The issue I faced when I was given that assignment was that I had no base point at which to begin thinking of a solution (yes, I remember BODMAS (bracket, off, division, multiplication, addition, and subtraction)). I however didn't know how to structure the requirements needed into a suitable form or grammar, so that I could focus only on fulfilling the rules created for the grammar. I was more bothered about functionality and how to parse every corner case I could think of.

A few years later, while reading the book "Algorithms and Data Structures in C++ by Leendert Ammeraal", specifically Chapter 11 (which discusses about the fundamentals of interpreters and compilers), I decided to revisit that dreadful assignment of mine.

Background

There are two concepts I used in creating the mathematical expression evaluator detailed in this article. One is the use of BNF to come up with a grammar suitable for the creation of the mathematical expression and the second is recursive decent parsing which I used to translate the grammar to code.

I will not claim to be a grammar expert, as such the rules I defined can be refined as the reader wishes (and any flaws I make can be forwarded to me and I will fix it). I sort of followed the rules from the Wikipedia page on EBNF http://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form. The reader should however note that the implementation in this article matches my grammar rules.

Below is the grammar implemented. I will presume the grammar is pretty straightforward, as such I will only mention a few assumptions I made within the grammar.

Firstly, I presume that Variables do not contain Digits as part of the name (Variables can be 1 or more Alphabetical letters). I presume that the Unary operator "-" (e.g., -1 or -ab) is a factor defined by ( "(-Expression", e.g., (- 1) , ((-a) + b), or (-ab)). I presume that a Function is part of Term and it can be decomposed back to a factor. The only other assumption I believe worth mentioning is the factorial "!" operator, that is made optional for Terms. I had a choice of either moving this to Factor or Term, my final decision was to place it within the definition of Term.

INPUT = EXPRESSION
EXPRESSION = TERM { "+" | "-" TERM }
TERM = (FUNCTION | FACTOR) { "*" | "/" | "%" | "^"  FUNCTION | FACTOR } { "!" }
FUNCTION = ( "COS"| "SIN" | "TAN" | "EXP" |"LN"  | "LOG"  "("  FACTOR  ")"  )
FACTOR = NUMBER|VARIABLE| ( "(" ["-"] EXPRESSION ")" )
VARIABLE = ALPHABHET,{ALPHABHET}
NUMBER= DIGIT,{["."] DIGIT}
DIGIT = 0|1|..|9
ALPHABHET = a|b|..|z|A|B|..|Z

The technique used to transform the grammar to code is called recursive descent parsing (RDP), which is a form of top down parsing built from a set of functions that are written in terms of each other. With recursion in mind, I created a function for all Non Terminal identifiers that focuses on only aspects of their defined grammar.

Below are the main methods used to represent the grammar.

ISymbol Expression() 
Symbol Term()
Symbol FunctionEvaluator()
Symbol Factor()

Utilizing the code discussed underneath, I created an example mathematical expression parser. As shown by the image below:

318667/ExpressionCalculator3.png

Using the Code

The first thing that came to mind while thinking of how to implement the expression parser was how I wanted the parser to be used. I figured that the user of the Evaluator class would want to be able to set and get the Expression to parse, the user would want to get the floating point Result if the Expression could be evaluated. If it can not be evaluated (that is if it contains undefined variables), the user would want to get a processed string result. With that in mind, I created the IExpression interface. I then created the ISymbol interface for the intermediate types of information I would need as I processed each symbol. For numbers, the FloatingValue variable is used; for variables, operations, and functions StringValue is used. To differentiate between all variables, operations, functions, and parenthesis, I created the SymbolType enumeration (in the Expression calculator example attached, the symbol type helped with implementing situations such as "-1", "1--1", "1*-1" ).

public interface IExpression
{
    string Expression { get; set; }
    double Result { get; set; }
    bool CanEvaluate { get; set; }
    string StrResult { get; set; }
}

public enum SymbolType {expression, operation, function, variable, digit,brackets,endTag}
 
public interface ISymbol:IExpression
{
    SymbolType SymbolType { get; set; }
    double FloatingValue { get; set; }
    string StringValue { get; set; }
}

The Symbol class implements the ISymbol interface and includes the method TypedClone() which implements a strongly typed member-wise clone of the Symbol object itself. The clone is needed to maintain the original expression passed in. The extension method in the static class ExtensionLib adds a means to clone an IExpression type to a Symbol type.

public class Symbol:ISymbol
{
    public string Expression { get; set; }
    public double Result { get; set; }
    public bool CanEvaluate { get; set; }
    public string StrResul { get; set; }
    public SymbolType SymbolType { get; set; }
    public double FloatingValue { get; set; }
    public string StringValue { get; set; }
    public Symbol TypedClone()
    {
        return this.MemberwiseClone() as Symbol;
    }
}

public static class ExtensionLib
{
    public static Symbol TypedClone(this IExpression v)
    {
        return ((Symbol) v).TypedClone();
    }
}

The field _currentPositionCnt maintains the position in the Expression string indicating where to continue parsing from. _processVariableExpression is the flag that allows a check to be made to see if a Variable can be mapped to a value. The _lastReadSymbol holds the last Symbol parsed that is yet to be consumed. The _expression object holds a copy of the original object that can then be modified. The _originalExpression object holds the original object with no modifications made to it. _listOfFunctions holds the list of supported functions. _listOfBinaryOperators holds the operators that take the two parameters in performing calculations. _listOfUnaryOperators holds the operators that act on a single parameter (or Expression) and the VariableMapper dictionary holds a mapping of Variables to Values.

private int _currentPositionCnt; 
private bool _processVariableExpression; 
private Symbol _lastReadSymbol; 
private IExpression _expression; 
private IExpression _originalExpression; 
private readonly List<string> _listOfFunctions = new List<string>();
private readonly List<string> _listOfBinaryOperators = new List<string>(); 
private readonly List<string> _listOfUnaryOperators = new List<string>();
public readonly Dictionary<string, double> VariableMapper { get; private set; }

The constructor initializes the _listOfFunctions, _listOfBinaryOperators, _listOfUnaryOperators list and VariableMapper dictionary with their respective values (functions, operators, and a dictionary of variables to values).

public Evaluator() 
{
    _listOfFunctions.AddRange(new [] { "COS", "SIN", "TAN", "EXP", "LN", "LOG" });
    _listOfBinaryOperators.AddRange(new [] { "*", "/", "^", "%", "+", "-" });
    _listOfUnaryOperators.Add("!");
    VariableMapper =  new Dictionary<string, double> {{"PI", Math.PI}};
}

The Resolve() method takes in an object which adheres to the IExpression interface, and a boolean argument which tells us we should evaluate mappable variables. The method trims out the space in between characters, it also factors expressions with variables to make for a compact calculation the next go around when we actually match variables with values. The latter portion can be removed and the Factor() method updated to always just check if a Variable can be mapped to a value.

public IExpression Resolve(IExpression exp, bool resolveExp=false)
{
    _currentPositionCnt = 0;
    _originalExpression = exp;
    _expression = exp.TypedClone();
    _expression.Expression = _expression.Expression.Replace(" ", "");
    _expression.Expression += "#";
    var result = Expression();
    if (!NextIs("#"))
       throw new Exception
               ("An error occurred while procession the expression [ " 
                + _originalExpression.Expression + " ] At position " 
                + _currentPositionCnt.ToString());
    if (resolveExp && !result.CanEvaluate && VariableMapper.Count > 0)
    {
        _processVariableExpression = true;

        _lastSymbolType = SymbolType.expression;
        _currentPositionCnt = 0;
        _lastReadSymbol = null;
        _expression.Expression = result.StrResult.Replace(" ", "");

        result = Expression();
    }
    return result;
}

The ResolveAsString() method takes in an object which adheres to the IExpression interface, and a boolean argument which tells us we should evaluate mappable variables. The method returns to the caller a string result of either a floating point value or a string expression.

public string ResolveAsString(IExpression exp, bool resolveExp=false)
{
    var result = Resolve(exp, resolveExp);
    return result.CanEvaluate ? result.Result.ToString() : result.StrResult;
}

The Expression() method calls the Term() method. The resultant from the call can either be a value that can be evaluated, a Variable, or an Expression. A check is then made to see if the next following Symbol retrieved via the NextIs() method call is an operator such as "+", "-", in which case we call Term() again to get a second parameter needed for evaluation. Else we return with the X Symbol.

private ISymbol Expression()
{
    var x = Term();
    if (x == null) return null;
    for (; ; )
    {
        if(NextIs("+"))
        {
            var y = Term();
            if (y.CanEvaluate && x.CanEvaluate)
                x.Result = x.FloatingValue = x.FloatingValue + y.FloatingValue;
            else
            {
                x.StringValue = 
                    "("+(x.CanEvaluate ? x.FloatingValue.ToString() 
                                       : x.StringValue) +
                      " + " + (y.CanEvaluate ? y.FloatingValue.ToString() 
                                             : y.StringValue) + ")";
                x.CanEvaluate = false;
            }
        }
        else if (NextIs("-"))
        {
            var y = Term();
            if (y.CanEvaluate && x.CanEvaluate)
                x.Result = x.FloatingValue = x.FloatingValue - y.FloatingValue;
            else
            {
                x.StringValue = "("+(x.CanEvaluate ? x.FloatingValue.ToString() 
                                                   : x.StringValue) +
                                " - "+ (y.CanEvaluate ? y.FloatingValue.ToString() 
                                                      : y.StringValue) + ")";
                x.CanEvaluate = false;
            }
        }
        else
        {
            break;
        }
    }
    x.Result = x.FloatingValue;
    x.StrResult = x.StringValue;
    return x;
}

The Term() method calls the Factor() method. The resultant can either be a value that can be evaluated, a variable, or an expression. A check is then made to see if the next following Symbol retrieved via the NextIs() method call is an operator such as "*", "/", "%", "^", "!" or a Function, else we return with the X symbol. If it is one of the above operators, we call the Factor() method again to get the next Symbol (which again can be a value, a variable, or an expression). Note the factorial case does not call the Factor() method again. Also worthy of pointing out is the fact that operator precedence are all equal within this method. To raise precedence of any operation, parenthesis would need to be used, to encompass that portion of the expression.

private Symbol Term()
{
    var x = Factor();
    if (x == null) return null;
    for (; ; )
    {
        if (NextIs("*"))
        {

            var y = Factor();
            if (y.CanEvaluate && x.CanEvaluate)
                x.Result = x.FloatingValue = x.FloatingValue * y.FloatingValue;
            else
            {
                x.StringValue =  "("+(x.CanEvaluate?  x.FloatingValue.ToString() 
                                                   : x.StringValue) +
                        " * " + (y.CanEvaluate ? y.FloatingValue.ToString() 
                                       : y.StringValue) + ")";
                x.CanEvaluate = false;
               
            }

        }
        else if (NextIs("/"))
        {

            var y = Factor();
            if (y.CanEvaluate && x.CanEvaluate)
                x.Result = x.FloatingValue = x.FloatingValue / y.FloatingValue;
            else
            {
                x.StringValue = "("+(x.CanEvaluate ? x.FloatingValue.ToString() 
                                                   : x.StringValue) +
                        " / " + (y.CanEvaluate ? y.FloatingValue.ToString() 
                                       : y.StringValue) + ")";
                x.CanEvaluate = false;
            }
        }
        else if (NextIs("%"))
        {

            var y = Factor();
            if (y.CanEvaluate && x.CanEvaluate)
            {
                x.Result = x.FloatingValue = x.FloatingValue % y.FloatingValue;
            }
            else
            {
                x.StringValue = "("+(x.CanEvaluate ? x.FloatingValue.ToString() 
                                                   : x.StringValue) +
                        " % " + (y.CanEvaluate ? y.FloatingValue.ToString() 
                                       : y.StringValue) + ")";
                x.CanEvaluate = false;
            }
        }
        else if (NextIs("^"))
        {

            var y = Factor();
            if (y.CanEvaluate && x.CanEvaluate)
            {
                x.Result = x.FloatingValue = Math.Pow(x.FloatingValue , y.FloatingValue);
            }
            else
            {
                x.StringValue = "( (" + (x.CanEvaluate ? x.FloatingValue.ToString() 
                                                       : x.StringValue) +
                        ")^ " + (y.CanEvaluate ? y.FloatingValue.ToString() 
                                       : y.StringValue) + ")";
                x.CanEvaluate = false;
            }
        }
        else if (NextIs("!"))
        {
            if (x.CanEvaluate)
            {
                x.Result = x.FloatingValue = Factorial(x.FloatingValue);
            }
            else
            {
                   x.CanEvaluate = false;
                   x.StringValue = "(" + (x.CanEvaluate ? x.FloatingValue.ToString() 
                                                     : x.StringValue) +")! ";
            }
        }
        else if (_listOfFunctions.Contains(x.StringValue))
        {
            x = FunctionEvaluator(x.StringValue);
        }
        else
        {
            break;
        }
    }
    return x;
}

The Factor() method calls the Next() method to get the next Symbol. If the Symbol can be evaluated, it is immediately returned else we check if the Symbol is a parenthesis. If the Symbol is a parenthesis, we go on to call the expression again. Note the PreviewNextifTrueMoveTrackerByOne() method. This is used to handle the case where a negative sign follows the opening parenthesis. If we can process Variables, we look up the variable names in the VariableMapper dictionary and get the corresponding value. Note: one of the things I would have loved to have implemented here is, the ability for Variables to point to Variables.

private Symbol Factor()
{
    var x = Next();
    _lastReadSymbol = null;
    if (x != null && x.CanEvaluate)
        return x;
    if (x != null)
    {
        if (x.StringValue == "(")
        {
            bool isunaryMinus = PreviewNextifTrueMoveTrackerByOne("-");
            x = Expression() as Symbol;

            if (x == null || !NextIs(")"))
            {
                  throw new 
                                   Exception("Expression needs to end with a parenthesis " 
                              + "[ " + _originalExpression.Expression 
                                 + " ] Error at position " 
                          + _currentPositionCnt.ToString());
            }
            if (isunaryMinus)
            {
                if (x.CanEvaluate)
                    x.FloatingValue *= -1;
                else
                    x.StringValue = "((-1) * " + x.StringValue+")";
            }
        }
        if(_processVariableExpression && x.SymbolType == SymbolType.variable)
        {
            if (VariableMapper.ContainsKey(x.StringValue))
            {
                x.FloatingValue = VariableMapper[x.StringValue];
                x.StringValue = string.Empty;
                x.CanEvaluate = true;
            }
        }
    }

    _lastReadSymbol = null;
    return x;
}

The FunctionEvaluator() method calls the Factor() method to get the expected value, variable, or expression that is the argument passed into the function. If the Symbol returned by the Factor() method can be evaluated, the appropriate Math function is called, else it is presumed that the argument composition includes Variables and a string combination of the supposed Math function and the argument is composed.

public Symbol FunctionEvaluator(string value)
{
    Symbol x;
    if (NextIs("("))
    {
        _lastReadSymbol = new Symbol
                  {
                       SymbolType = SymbolType.brackets,
                       StringValue = "("
                   };
        x = Factor();
    }
    else
    {
         throw new Exception(
                "Function must be imediately followed by parenthesis. Error : [" 
                + _originalExpression.Expression 
                + " ] At position " 
                + _currentPositionCnt.ToString());
    }
    if (x != null && x.CanEvaluate)
    {
        switch(value.ToLower())
        {
            case "sin":
                {
                   x.FloatingValue = Math.Sin(Math.PI/180 * x.FloatingValue);
                   break;
                }
            case "cos":
                {
                   x.FloatingValue = Math.Cos(Math.PI / 180 * x.FloatingValue);
                   break;
                }
            case "tan":
                {
                   x.FloatingValue = Math.Tan(Math.PI / 180 * x.FloatingValue);
                   break;
                }
            case "exp":
                {
                   x.FloatingValue = Math.Exp(x.FloatingValue);
                   break;
                }
            case "ln":
                {
                   x.FloatingValue = Math.Log(x.FloatingValue, Math.E);
                   break;
                }
            case "log":
                {
                   x.FloatingValue = Math.Log10(x.FloatingValue);
                   break;
                }
        }
    }
    else if (x != null && !x.CanEvaluate)
    {
        x.StringValue = "( " + value + "( " + x.StringValue + " ))";
    }
    return x;
}

The Next() method is what handles the parsing of the string characters to symbols. It deals with the separation of string letters to either Functions or Variables. Note: you will see that the restriction of Variables to only string characters is enforced by how we parse for Variables and Integers. The beginning If statement is put in place so as to avoid advancing to the next Symbol if the previous set Symbol has yet to be claimed.

private Symbol Next()
{
    if (_lastReadSymbol != null)
    {
        var sym = _lastReadSymbol;
        _lastReadSymbol = null;
        return sym;
    }
    if (_expression.Expression != null &&

        _expression.Expression.Length > _currentPositionCnt)
    {
        var length = _expression.Expression.Length;
        if (IsNumber())
        {
            var val = _expression
                    .Expression[_currentPositionCnt]
                    .ToString();

            _currentPositionCnt++;
            while (_currentPositionCnt < length && IsNumber())
            {
                val += _expression.Expression[_currentPositionCnt++];
            }
            _lastReadSymbol = 
            new Symbol 
                { 

                    SymbolType = SymbolType.digit, 
                    FloatingValue = double.Parse(val), 
                    CanEvaluate = true
                };
            return _lastReadSymbol;
        }
        if (IsString())
        {
            var val = _expression
                    .Expression[_currentPositionCnt]

                    .ToString();
            _currentPositionCnt++;
            while (_currentPositionCnt < length && IsString())
            {
                val += _expression.Expression[_currentPositionCnt++];
            }
            if (_listOfFunctions.Contains(val))

            {
                _lastReadSymbol = 
                new Symbol 
                    { 
                        SymbolType = SymbolType.function, 
                        StringValue = val 
                    };
                return _lastReadSymbol;
            }
           
            //return variable
            _lastReadSymbol = 

                        new Symbol 
                        { 
                            SymbolType = SymbolType.variable, 
                            StringValue = val , 
                            CanEvaluate = false
                        };
            return _lastReadSymbol;
        }
        if ((_expression.Expression[_currentPositionCnt] == '.'

            && (_expression.Expression[_currentPositionCnt + 1] < '0'
             && _expression.Expression[_currentPositionCnt + 1] > '9')))
        {
             throw new 
            Exception("Expression can not have a decimal point"

                  +" without an accompanying digits."
                  +" Expression [ " 
                  + _originalExpression.Expression 
                  + " ] At position " 
                  + _currentPositionCnt.ToString());
        }

        if (_expression.Expression[_currentPositionCnt] == '(' ||
            _expression.Expression[_currentPositionCnt] == ')')
        {
            _lastReadSymbol = new Symbol
            {
                 SymbolType = SymbolType.brackets,

                 StringValue = _expression
                         .Expression[_currentPositionCnt++]
                         .ToString()
            };
            return _lastReadSymbol;
        }
        if (_listOfBinaryOperators.Contains(_expression
                            .Expression[_currentPositionCnt]

                            .ToString()))
        {
            _lastReadSymbol = 
            new Symbol
            {
                SymbolType = SymbolType.operation,
                StringValue = _expression
                           .Expression[_currentPositionCnt++]
                          .ToString()

            };
            return _lastReadSymbol;
        }
        if (_listOfUnaryOperators.Contains(_expression
                            .Expression[_currentPositionCnt]
                            .ToString()))
        {
            _lastReadSymbol = 
            new Symbol
               {

                SymbolType = SymbolType.operation,
                StringValue = _expression
                         .Expression[_currentPositionCnt++]
                                    .ToString()
               };
            return _lastReadSymbol;
        }
        if (_expression.Expression[_currentPositionCnt] == '#')

        {
            _lastReadSymbol = 
            new Symbol
                {
                 SymbolType = SymbolType.endTag,
                 StringValue = _expression
                               .Expression[_currentPositionCnt++]
                           .ToString()
                };
            return _lastReadSymbol;

        }
    }
    return null;
}

The Factorial() method was written to allow me to add factorial as a function within the expression evaluator.

private double Factorial(double  number)
{
    var val = (long)number;
    double retVal=1;
    for (long i =val; i>0; i--)
    {
        retVal *= i;
    }

    return retVal;
}

The NextIs() method picks the next Symbol by calling Next() and if its value (FloatingValue for Symbols that can be evaluated and StringValue for Symbols that can not be evaluated) matches the argument passed in, _lastReadSymbol is set to null, to allow advancement of picking a new Symbol via the Next() method.

private bool NextIs(string val)
{
    var t = Next();
    if (t != null)
    {
        if (t.CanEvaluate)
            if (t.FloatingValue.ToString() == val)
            {
                _lastReadSymbol = null;

                return true;
            }
            else
            {
                _lastReadSymbol = t;
            }
        else
            if (t.StringValue == val)
            {
                _lastReadSymbol = null;
                return true;
            }
            else
                _lastReadSymbol = t;

    }
    return false;
}

The preview next and advance tracker by one method was created to handle the case where a negative sign follows the start of a parenthesis "(-" (it is used in the Factor() function).

private bool PreviewNextifTrueMoveTrackerByOne(string val)
{
    if (_expression.Expression != null 
            && _expression.Expression.Length > _currentPositionCnt)

    {
        if (_expression.Expression[_currentPositionCnt].ToString() ==val)
        {
            _currentPositionCnt++;
            return true;
        }
    }
    return false;
}

The IsNumber() method is used within the Next() method to check that the current character is a number and not a letter. The check done works for floating point numbers, by checking that if a dot occurs, the next character would have to be a number.

private bool IsNumber()

{
    return ((_expression.Expression[_currentPositionCnt] >= '0'
             && _expression.Expression[_currentPositionCnt] <= '9') ||
            (_expression.Expression[_currentPositionCnt] == '.' &&

             (_expression.Expression[_currentPositionCnt + 1] >= '0'
              && _expression.Expression[_currentPositionCnt + 1] <= '9')));
}

The IsString() method is used within the Next() method to check that the current character is a letter and not a number.

private bool IsString()
{
    return ((_expression.Expression[_currentPositionCnt] >= 'A'

             && _expression.Expression[_currentPositionCnt] <= 'Z') ||
            (_expression.Expression[_currentPositionCnt] >= 'a'
             && _expression.Expression[_currentPositionCnt] <= 'z'));

}

Points of Interest

The example attached is a mathematical expression calculator, it utilizes a variant of the code above to implement a functioning calculator. It allows the user to write mathematical expressions that can contain variables, and factors these expressions in term of variables. It also allows for mapping of variables to values so as to enable a complete evaluation of the expression. It allows for the storing of expressions and the recalling of expressions (this functionality goes in a round robin form). The Define button allows you to define variable mapping.

318667/ExpressionCalculator.png

318667/ExpressionCalculator2.png

Some of the fun things that can be done to enhance the example code given is:

Implement a differential calculator just like the one in the following CodeProject work: http://www.codeproject.com/Articles/13731/Symbolic-Differentiation
Change the implementation to use postfix.
Implement graphing functionality (probably my favorite because it means you will have to handle the variable within variables case).

History

No revisions were made yet.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)