Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

A Very Simple Parser

0.00/5 (No votes)
12 Apr 2004 1  
An article describing how to parse simple mathematical expressions (containing only +, -, *, /, and numbers) and evaluate their values.

Introduction

One day, when I was 15 years old, I went to a shop near my house and I saw a scientific calculator. When I used the calculator, there was something that took my attention. The calculator was not like the other simple calculators I used to use (at that time). There was a bar in the top of the display of the calculator that you type any expression in (e.g. 5*sin(pi/2)+5) and the calculator evaluates the result. I was amazed how the calculator understood the text, i.e. parsed the expression, and evaluated the result. Now, after being 18, I decided to make a series of programs that do the same thing. This is my first program in this series.

This article explains a simple program that parses simple expressions containing only +, -, *, /, and numbers (e.g. 5+6*3.3) and evaluates the result.

Let's Do It

I dislike too much talk, so I think the best way to illustrate my idea is to begin with an example. Consider the following expression:

1*2*3-4*5/6.6+2

The first thing we should put in mind is that multiplication and division have precedence over addition and subtraction. We should find some way to implement this. The best method is to partition the expression into terms. Thus the result will be to evaluate the following terms:

1*2*3
4*5/6.6
2

After evaluating each term alone, we add them putting in mind the sign before each expression. For example, the expression 4*5/6.6 has a minus sign, so we should subtract the value of the term from the final result instead of adding it. For this example, the following sequence will occur:

  1. The value of fResult (which we will use for the final result of the expression) will be set to zero.
  2. As we are using right to left parsing, terms to the right will be evaluated before terms to the left.
  3. The term 2 will be evaluated and added to fResult. Thus fResult will have the value of 2.
  4. The term 4*5/6.6 (which has the value of 3.030303) will be evaluated and subtracted (because of the minus sign) from the final result. Thus fResult will be containing -1.030303.
  5. The term 1*2*3 (which has the value of 6) will be evaluated and added to the final result. Thus fResult will be containing 4.969697.

Let's examine the following segment of code:

bool EvaluateExpression(const char * strExpression, double & fResult)
{
    // Evaluates the value of each term.

    int nLength = strlen(strExpression);
    int nEnd = nLength - 1;
    double fTerm;
    fResult = 0.0;
    // Searches the expression from right to left for a '+'

    // or '-', i.e. partition the expression

    //  into terms.

    for (int i = nLength - 1; i >= 0; i--)
        if (strExpression[i] == '+')
        {
            if (!EvaluateTerm(strExpression + i + 1, nEnd - i, fTerm))
                // Invalid term!!!

                return false;
            fResult += fTerm;
            nEnd = i-1;
        }
        else
            if (strExpression[i] == '-')
            {
                if (!EvaluateTerm(strExpression + i + 1, nEnd - i, fTerm))
                    // Invalid term!!!

                    return false;
                fResult -= fTerm;
                nEnd = i-1;
            }

    // The first term is often not preceded by a sign!!!

    if (strExpression[0] != '+' && strExpression[0] != '-')
    {
        if (!EvaluateTerm(strExpression, nEnd + 1, fTerm))
            // Invalid term!!!

            return false;
        fResult += fTerm;
        nEnd = i-1;
    }

    return true;
}

As we see, the function searches the string of the expression from right to left for any '+' or '-'. A term lies either between two signs ('+' or '-') or between one sign and one end of the expression. After finding a term, the function EvaluateExpression calls another function, which is EvaluateTerm to evaluate the value of the found term. In a minute, we will be seeing how the function EvaluateTerm works.

After the function has evaluated the value of the term, it examines its sign and adds the value of the term to the final result.

Now, let's take a look at the function EvaluateTerm. This function almost has the same idea as the function EvaluateExpression. Just like how we did with expression, the better method to find the value of a term is to partition it into numbers and evaluate the term, putting in mind the sign before each number. For example:

4*5/6.6

We should partition this term into:

4
5
6.6

and then evaluate the results considering the sign of each number. For example, for the term 4*5/6.6, the following sequence will occur:

  1. The value of fResult (which we will use for the final result of the term) will be set to one (instead of zero).
  2. As we are using right to left parsing, numbers to the right will be evaluated before terms to the left.
  3. fResult will be divided by 6.6 to result in 0.151515.
  4. fResult will be multiplied by 5 to result in 0.757576.
  5. fResult will be multiplied by 4 to result in 3.030303 which is the final result of the term.

Let's examine the code of the function:

bool EvaluateTerm(const char * strTerm, int nTermLength, double & fResult)
{
    // Evaluates the value of each term.

    int nEnd = nTermLength - 1;
    fResult = 1.0;
    double fNumber;
    // Searches the term from right to left for

    // a '*' or '/', i.e. partition the terms

    //  into numbers.

    for (int i = nEnd; i >= 0; i--)
        if (strTerm[i] == '*')
        {
            if (!StrToFloat(strTerm + i + 1, nEnd - i, fNumber))
                // Invalid number!!!

                return false;
            fResult *= fNumber;
            nEnd = i-1;
        }
        else
            if (strTerm[i] == '/')
            {
                if (!StrToFloat(strTerm + i + 1, nEnd - i, fNumber))
                    // Invalid number!!!

                    return false;
                fResult /= fNumber;
                nEnd = i-1;
            }

    // The first term should not be preceded by a sign!!!

    if (strTerm[0] != '*' && strTerm[0] != '/')
    {
        if (!StrToFloat(strTerm, nEnd + 1, fNumber))
            // Invalid number!!!

            return false;
        fResult *= fNumber;
    }

    return true;
}

As we see, the function traces the term from right to left looking for either '*' or '/'. As is the case for a term, a number lies either between two signs (now they are '*' and '/' instead of '+' and '-') or between one sign and one end of the term.

Note: The StrToFloat function converts a string to the value it represents.

Final Notes

Before I finish my article, I want to mention the following two points:

  1. The user should verify that the expression doesn't contain any invalid characters. This is done in the sample using the function VerifyExpression. This function searches the expression string for any character other than '+', '-', '*', '/', '.', and '0'-'9'. If it finds such character(s), it will notify that the expression is invalid. It is left to the reader to see the function in the code because it's not necessary to lengthen the article too much with easy stuff.
  2. In this sample, I didn't use exception handling because it is not the subject of the article. But in a real application, it is better to use exception handling instead of just returning 'false' which will not tell us more than "your expression couldn't be evaluated"!

Questions

  1. I was about to enter 5*6 in the program, but I decided, as a test to the VerifyExpression function which doesn't accept white spaces as valid characters, to enter the expression 5 *6 and 5* 6. I expected that the program will display "Invalid expression!", but for my surprise, the program didn't and, instead, it displayed 5 for the first test and 0 for the second test. Why?
  2. The following expressions are invalid expressions but the program doesn't violate them and it may produce unexpected results:
    5**6
    5//6
    
    1+*2
    5*+6

    Run the program and test them. Can you find out the reason behind the answer? Also, can you solve these problems?

  3. In my program, I used right to left parsing. Why do you think I did? Try to make a similar program that uses left to right parsing. Which is easier?

Remark: If you take a look at the VerifyExpression function, you will notice that the function just searches for invalid characters. For example, the expression 5**6 will pass the test of VerifyExpression function. How can you discover such syntax errors?

Well, if you just solve the problems mentioned in question 2, you will find that such syntax errors will automatically be discovered, how come that?

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here