Introduction
Please refer to the last project I posted, Parsing Algebraic Expressions Using the Interpreter Pattern, as needed. That project was more geared around using the interpreter pattern for something interesting and was a nice exercise in which the usage of other patterns would be likely to arise. However, from a parsing perspective, it was largely ad-hoc. I had interest in the area but not the time to pursue. In the intervening years, my wife and I raised our kids up to a manageable age and I actually found some time to go through the first half of the Red Dragon book.
This project accomplishes the same thing as the last project but the implementation is significantly different. In addition, there were some weaknesses/bugs in the initial implementation. The most notable was the left associativity of the exponent operator. The shift-reduce parser here uses a precedence table that encodes whether an operator is left associative or right associative. The exponent operator should have been right associative.
The most notable difference you will recognize immediately is that this new project accomplishes everything in less than 400 lines of code in the actual compiler, albeit a very minimal compiler. This is a testament to the expressiveness of JavaScript. I was motivated to do this in JavaScript since it is a very practical language to work with, being the universal language of the browser, and HTML 5 capabilities let me demonstrate the compiler in a cool little graphing calculator project with a minimum of fuss.
I want to keep the explanation of this program to a minimum since I cannot hope to do as good a job as the Red Dragon book. The explanation proceeds from a brief description of the interpreter pattern and ties that in to what the output of the program will be. This is followed by the shift-reduce algorithm used to produce this output. Finally, a brief explanation of some of the JavaScript code that may be novel is given.
The Interpreter Pattern
The Interpreter pattern is used to model a grammar using a set of objects that represent possible expressions within the grammar. For instance, for the algebraic expression 4 + 4, an Add expression object can be used to represent the expression. Included in the Add expression object would be a reference to the two operands and the capacity to evaluate the expression and return a value based on the two operands.
Without getting too tied down by words, you can see how the possible variety of algebraic expressions can be represented using expression objects. Additionally, expressions themselves can be used as operands to other expressions based on a defined order of operations. Given this, algebraic expressions in general can be represented as binary trees of expression objects and can be evaluated via a traversal.
When it comes to representing variable expressions, things get a bit trickier since the evaluation of a variable expression changes based on the current value of the variable. The Interpreter pattern provides a clean solution to this problem by defining a context class that provides the current value of variables that may appear within the expression being evaluated. This context object is passed down from the root of the tree through successive evaluations of operands whether or not it is used with a given level. In this way, the context can be provided where needed at any point in the tree.
For this project, the root of the tree mentioned above is the ultimate output of the parse. Once the root expression object is obtained, it is evaluated successively to produce a graph of an algebraic expression.
Shift-Reduce Parsing
If you have ever heard of LR parsing, know that LR parsing is a type of Shift-Reduce parsing. Shift-Reduce is a general term that applies to any parsing method that uses a stack, has a shift operation, and some kind of reduction. Implementing full LR parsing is a big task. I set out to do that originally until I learned about operator-precedence parsing. The grammar I was hoping to write a nice parser for qualified as an operator-precedence grammar and hence lent itself to the simpler parsing technique rather than LR. I believe operator-precedence parsing is the standard shift-reduce parsing method used in hand-held calculators.
For a grammar to qualify as an operator-precedence grammar, it must not have two successive non-terminals on the right hand side of a production nor have any empty productions. Roughly, the grammar used in this project is as follows:
E -> E + E | E - E | E * E | E / E | E ^ E | (E) | ~E | FUNC E | NUMBER | VARIABLE
The above grammar is ambiguous. There are various means of eliminating ambiguity in a grammar so as to prepare it for the parsing algorithm to be used, including introducing levels to force precedence relations. Operator-precedence parsing doesn’t eliminate the ambiguity in the grammar itself but instead uses a precedence table that guides successive steps of the parse. It not only is useful for establishing precedence but it also encodes associativity rules for operators of equal precedence and signals errors.
One oddity to mention is the presence of the ‘~’ unary operator. This is to distinguish it from the binary subtraction operation without having to code for special cases in the parser. I found that the book solution to this problem is to have the scanner convert the unary minus ‘-’ symbol to a ‘~’ before providing output to the parser. This eliminates special casing in the parser but results in a nasty looking if
statement in the scanner.
Typically, the shift-reduce parser output would be another expression written in postfix notation which would lend itself to easy evaluation via another stack. In this instance, I went with the interpreter pattern as mentioned above. So the reduction step is a matter of binding terms to expression instances as operands. Ultimately a tree is built up this way and the output of the parser is the root of the tree.
The above explanation is limited. I will happily answer any questions you may have regarding the theory behind the parse and why it works but to explain it fully in all details would be quite a task. So I can do my best to fill in what holes there may be.
JavaScript
Most of the JavaScript used here is pretty standard except for the use of one idiom referred to as the Module pattern in Douglas Crockford’s book, JavaScript: The Good Parts. His general angle is that JavaScript is a great prototypal language and it should be used that way rather than trying to force it into being an object oriented language. The Module pattern would qualify as using JavaScript the way JavaScript was intended. In this pattern, a function is defined in which local variables are defined. In this code, I used the “_” convention to indicate that they are in some sense private. Next, the function defines an object and assigns properties and methods to the object that have access to the locals just defined. It then returns the object. Because of the lexical scoping used in JavaScript, when the functions on the return object are called they will remember the lexical environment in which they were defined and be able to access those locals. This is often called a closure.
var scanner = function () {
var _expressionTypes = [];
var result = {};
result.add_expressionType = function (expressionType) {
_expressionTypes.push(expressionType);
}
...
return result;
}
To understand closures in a wider context, look at static (lexical scoping) vs. dynamic scoping. Dynamic scoping is typically based on the current stack of activation records determined by the order of function calls during run-time whereas static scoping is based on the lexical structure of the program. The function defined above, add_expressionType
, within the scanner
function definition has access to the _expressionTypes
variable since it is in static
scope. Even after the call to the scanner
function completes, this relationship is maintained. It can be said that the relationship is maintained by an entity referred to as a closure, which is a means to implement lexical scoping. Understanding closures in this wider context makes it a bit easier to understand and file away in your knowledge base. Lots of articles on closures seem to dance around the fundamentals and describe them as if they were some kind of magical creature. They’ve been around a long time and Java and C# usage of closures or closure-like constructs are nothing new.
The Code
The organization of this project is quite simple. There is a default.html file and two folders that needs to appear in the same directory as default.html. One folder contains scripts and the other contains styles. The script file containing the parser is compiler.js. Open the default.html file in any HTML 5 compliant browser and it should work out of the box. Keep in mind that IE 9 is not HTML 5 compliant. I’m sure I can make it work in IE 9 with a well-placed if
statement but at this point I’ve decided to code according to the standard and forget IE compatibility. It does work with all other modern browsers with HTML 5 compliance. My attitude is that if you want the web to work then you shouldn’t be using IE.
My intent in doing this project was to enjoy myself and also enjoy sharing. With that in mind, I don't enjoy writing code to handle errors. So please ensure that all 3 form fields are filled out with something that makes sense before clicking the Plot button.
History
- 26th September, 2011: Initial version