This article shows a different and more “natural” approach to the problem of calculating the value of an expression.
Introduction
I have always been bothered by the counter-intuitiveness of AST or RPN algorithm used to evaluate expressions. In this article, I will show a different and more “natural” approach to the problem of calculating the value of an expression.
The Idea
Let's say that I need to calculate this expression:
1+3*(5-(8-4)+(2+3)^2)
If I use pen and pencil to do the calculation, I would start reducing the expression by calculating the numerical result of the operations with the highest priorities and iteratively simplify the expression until I am left with the final numerical value:
Step 1 | 1+3*(5-4+5^2) |
Step 2 | 1+3*(5-4+25) |
Step 3 | 1+3*26 |
Step 4 | 1+78 |
Step 5 | 79 |
The expression evaluator that I coded follows exactly this idea, where the priority of an operation is calculated by combining the priority of the operator and its nesting level.
Operator Priority
Each arithmetical operator is assigned a base priority level.
Operator | Priority |
+ | 1 |
- | 1 |
* | 2 |
/ | 2 |
^ | 3 |
If the operator is nested inside parentheses, the level of priority is equal to the base priority plus the nesting level multiplied by 10
. The priority levels for the operators in the expression we just saw will look like this:
By executing the operation starting from the highest level and replacing the operators and operand with the result, we can reduce the expression to a final value:
Implementation
The implementation follows few steps:
- The input string containing the expression will be formatted.
- An array containing all the tokens of the expression will be built.
- A second array containing the levels of all the operators will be built.
Formatting
The formatting process will transform the input string in a string that follow certain rules. For example, the empty spaces will be eliminated, an expression like 2(3+1)
will be transformed in 2*(3+1)
, etc. Also, it will throw an exception in case the input string starts with a operator different than plus or minus and other situations.
Tokenization
The tokenization process takes the formatted expression and produces:
- an array containing the tokens (as shown above).
- an array containing the priority for every operator.
- an array containing the list of priority. This Array will be sorted and we will start the process of reduction from the highest value of priority.
Example
Let's suppose we have need to evaluate the expression string:
-5+2*(1+((7/8^2)-12^(-2))+9/4)
After being formatted, the string will be transformed into:
0-5+2*(1+((7/8^2)-12^(0-2))+9/4)
The tokenization process will create these three arrays:
_tokens
_tokenPriority
_priorityValues
Again, let's recall how the calculation of the priority of an operator is done:
We start with the base priority of each operator:
+ -> 1
- -> 1
* -> 2
/ -> 2
^ -> 3
and to that, we add the level of parentheses the operator is in multiplied by 10
.
Calculation
The array _priorityValues
is sorted and starting from the highest value (in our example 33
), all the operators in the _tokens
array are calculated and then they are replaced with the result of that calculation. So the arrays _tokens
and _tokensPriority
get transformed from:
to:
and so on until we arrive at the final result.
Conclusion and Points of Interest
Clearly, the algorithm can be expanded and optimized in many different ways, but that is out of my scope for today. Attached to this article, there is a zipped Visual Studio project with all the code of the parser in a class and a simple Form
.
History
- 21st December, 2022: Initial version