Introduction
Table based parser generation offers the possibility of both fast and flexible parser construction. This article describes an implementation of a particular method of constructing a parse table for an LR (left to right bottom up) parser called an LALR parser or Lookahead-LR parser.
Background
An LR parser consists of a parse table and a stack. A parser of this kind recognizes a string of input tokens by examining the input string from left to right and performing one of the following operations:
- Shift the token onto the stack
- Reduce several of the tokens on the stack by replacing the tokens on the right hand side of a grammatical production, which must exist on the stack with the token on the left hand side of the production
- 'Accept' the input string
- Produce an error
The LALR algorithm produces a parser table that decides on the possible reductions from a given state using a concept of look-ahead. The algorithm examines the productions that lead into and out of each state via both transitions and grammatical productions, and determines, for each production represented in the state, which tokens could come after that production.
Using the code
The code I've published creates a parse table, and formats the parse table to the console output.
using System;
using CodeProject.Syntax.LALR;
namespace TestProject
{
class MainClass
{
public static void Main (string[] args)
{
Grammar grammar = new Grammar();
grammar.Tokens = new string[]{"S'", "e", "+", "-", "*", "/", "i", "(", ")"};
grammar.PrecedenceGroups = new PrecedenceGroup[]
{
new PrecedenceGroup
{
Derivation = Derivation.None,
Productions = new Production[]
{
new Production{
Left = 0,
Right = new int[]{1}
},
new Production{
Left = 1,
Right = new int[]{6}
},
new Production{
Left = 1,
Right = new int[]{7, 1, 8}
}
}
},
new PrecedenceGroup
{
Derivation = Derivation.LeftMost,
Productions = new Production[]
{
new Production{
Left = 1,
Right = new int[]{1, 4, 1}
},
new Production{
Left = 1,
Right = new int[]{1, 5, 1}
},
}
},
new PrecedenceGroup
{
Derivation = Derivation.LeftMost,
Productions = new Production[]
{
new Production{
Left = 1,
Right = new int[]{1, 2, 1}
},
new Production{
Left = 1,
Right = new int[]{1, 3, 1}
}
}
}
};
Parser parser = new Parser(grammar);
Debug.DumpParseTable(parser);
}
}
}
And produces the following output:
+---+---+---+---+---+---+---+---+---+---+---+
|###| $ |S' | e | + | - | * | / | i | ( | ) |
|---+---+---+---+---+---+---+---+---+---+---+
| 0| | |S 1| | | | |S 2|S 3| |
+---+---+---+---+---+---+---+---+---+---+---+
| 1|R 0| | |S 4|S 5|S 6|S 7| | | |
+---+---+---+---+---+---+---+---+---+---+---+
| 2|R 1| | |R 1|R 1|R 1|R 1| | |R 1|
+---+---+---+---+---+---+---+---+---+---+---+
| 3| | |S 8| | | | |S 2|S 3| |
+---+---+---+---+---+---+---+---+---+---+---+
| 4| | |S 9| | | | |S 2|S 3| |
+---+---+---+---+---+---+---+---+---+---+---+
| 5| | |S10| | | | |S 2|S 3| |
+---+---+---+---+---+---+---+---+---+---+---+
| 6| | |S11| | | | |S 2|S 3| |
+---+---+---+---+---+---+---+---+---+---+---+
| 7| | |S12| | | | |S 2|S 3| |
+---+---+---+---+---+---+---+---+---+---+---+
| 8| | | |S 4|S 5|S 6|S 7| | |S13|
+---+---+---+---+---+---+---+---+---+---+---+
| 9|R 5| | |R 5|R 5|S 6|S 7| | |R 5|
+---+---+---+---+---+---+---+---+---+---+---+
| 10|R 6| | |R 6|R 6|S 6|S 7| | |R 6|
+---+---+---+---+---+---+---+---+---+---+---+
| 11|R 3| | |R 3|R 3|R 3|R 3| | |R 3|
+---+---+---+---+---+---+---+---+---+---+---+
| 12|R 4| | |R 4|R 4|R 4|R 4| | |R 4|
+---+---+---+---+---+---+---+---+---+---+---+
| 13|R 2| | |R 2|R 2|R 2|R 2| | |R 2|
+---+---+---+---+---+---+---+---+---+---+---+
Parser class
The Parser
class encapsulates the LALR table construction algorithm, but also exposes several methods and properties useful for grammar analysis and debugging.
Parser - Constructor
Analyses the grammar producing the parse table, using the below methods, and other supporting methods.
GenerateLR0Items
Generates the LR(0) States of the parser by starting with the State S' -> .X
where X
is the start symbol of the grammar. The production S' -> X
must be explicitly passed into the constructor above.
LR(0) items consists of the set of items {S' -> .X}
and all of the states reachable by substituting the very next symbol after the '.' (X
in this case) with the left hand side of any production that has X on the right hand side. This operation is called the closure of an LR(0) item. In the grammar above, State 0 consists of the following set of items.
S' -> . e
e -> . i
e -> . ( e )
e -> . e * e
e -> . e / e
e -> . e + e
e -> . e - e
We then find the states Goto(X)
for any X
which is a token of the grammar and find the successor states of state 0. This is done by including each item with the 'X' to the right of the '.' and putting them into the new state, then calculating the closure of that new state. On the token e
in the grammar presented, the new state, state '1', has the following LR(0) items:
S' -> e .
e -> e . * e
e -> e . / e
e -> e . + e
e -> e . - e
The method repeats this process until there are no new states.
ComputeFirstSets
This method computes the set of terminals that are possible to occur after any token in the grammar. The first set of the token e
is {i
, (
}. This construction makes it possible to compute the LALR states later.
CalculateLookAheads
Calculates the look-ahead items for each LR0 item from GenerateLR0Items
above. A looka-head at the end of a production in the state tells the parser that it is safe to perform a reduction. For example, state 2 above on token )
is able to reduce by rule e -> i
thus replacing an i
on the stack with the non-terminal e
. The parser knows it can do this because the LALR state 2 contains the LALR Item e -> i. , )
for production e -> i
and look-ahead token )
.
GenerateParseTable
This method constructs the actions of the parse table. It does this by combining the Goto actions from each state and the reduction actions possible depending on the look-aheads generated in the previous method. If at any state there is a conflict between either a goto or a reduction, the algorithm attempts to resolve this by choosing a rule from a higher precedence group. If there is no clear winner, then the algorithm checks whether the rule should produce left-most or right-most derivations. A left most derivation will favour a reduction, whereas a right-most derivation will favour a goto. If there is still no clear winner, the algorithm will announce either a Reduce-Reduce or a Shift-Reduce conflict error.
Debug class
The Debug
class in the sample code contains several methods that might be useful for debugging a grammar or parser.
DumpParseTable
Writes a formatted parse table to the console.
DumpLR0State
Writes an LR(0) state to the console. For example, the following snippet will write State 0 above to the console.
Debug.DumpLR0State(parser, parser.LR0States[0]);
DumpLALRState
Writes an LALR state to the console, including look-aheads. The following snippet will write the LALR items in State 0 of the generated parser to the console.
Debug.DumpLR1State(parser, parser.LALRStates[0]);
References
| The project code implements the LALR algorithm described in the dragon book "Compilers: Principles, Techniques, and Tools" (1986) by Aho, Sethi, and Ullman.
|
Feature backlog
I intend to implement the following features in later updates.
- Generate a C# type that will parse an input grammar
- Parse an input grammar from a file
- ComponentModel/Reflection invocation of types/methods that perform the reduction rules of the grammar
- Generate parsers in different languages
History
- 2011-09-07: Initial implementation generating parse table.
- 2015-06-03: Added link to resource