Introduction
Most major parser generators operate off of some variant or another of the Context-Free Grammar theory of parsing, which is a rigorous mathematical model designed to describe Chomsky Type-2 languages. This library provides basic support for querying symbols and computing first/follows/predicts for a grammar for k=1.
Basically, most parser generators need the same basic services: A way to keep track of rules and symbols, and a way to compute FIRSTS/FOLLOWS/PREDICT sets. That's what this little library does.
It's highly specialized, but if you're building a parser generator, you could do worse than starting with this library.
Conceptualizing this Mess
First of all, a note about collections in my APIs. I don't use "getter" properties for collections. I use "Fill
" methods which are more flexible in that they can return a new collection filled with items or fill an existing collection. You'll see various FillXXXX()
methods in my code, and this is what they are. Calling them like...
var syms = cfg.FillSymbols();
...is the equivalent of using a getter property.
CfgDocument
is the main CFG object that basically represents a grammar, while CfgRule
represents a single rule. All symbols are represented as simple strings. The CfgDocument.Rules
property holds a collection of rules that make up the grammar. Once these are filled, you can perform various computations on them.
After that, coding it is pretty straightforward, and I've included an extensive demo project which includes a runtime parser powered by this lil beast.
Coding this Mess
I prefer to let the demo code do the talking.
static void Main(string[] args)
{
if(0==args.Length)
{
Console.Error.WriteLine("Must specify input CFG");
return;
}
var cfg = CfgDocument.ReadFrom(args[0]);
cfg.RebuildCache();
Console.WriteLine("See: http://hackingoff.com/compilers/ll-1-parser-generator");
Console.WriteLine();
Console.WriteLine("CFG has {0} rules composed of {1} non-terminals and
{2} terminals for a total of {3} symbols" ,cfg.Rules.Count,
cfg.FillNonTerminals().Count, cfg.FillTerminals().Count,
cfg.FillSymbols().Count);
Console.WriteLine();
Console.Write("Terminals:");
foreach(var t in cfg.FillTerminals())
{
Console.Write(" ");
Console.Write(t);
}
Console.WriteLine();
Console.WriteLine();
var predict = cfg.FillPredict();
var follows = cfg.FillFollows();
foreach(var nt in cfg.FillNonTerminals())
{
Console.WriteLine(nt+" has the following rules:");
foreach(var ntr in cfg.FillNonTerminalRules(nt))
{
Console.Write("\t");
Console.WriteLine(ntr);
}
Console.WriteLine();
Console.WriteLine( nt + " has the following PREDICT:");
foreach (var t in predict[nt])
{
Console.Write("\t");
Console.WriteLine((t.Symbol??"<empty>")+" - "+t.Rule);
}
Console.WriteLine();
Console.WriteLine(nt + " has the following FOLLOWS:");
foreach (var t in follows[nt])
{
Console.Write("\t");
Console.WriteLine(t);
}
Console.WriteLine();
}
Console.WriteLine("Building simple parse table");
var parseTable = new Dictionary<string, Dictionary<string, CfgRule>>();
foreach (var nt in cfg.FillNonTerminals())
{
var d = new Dictionary<string, CfgRule>();
parseTable.Add(nt, d);
foreach(var p in predict[nt])
{
if(null!=p.Symbol)
{
CfgRule or;
if(d.TryGetValue(p.Symbol,out or))
{
Console.Error.WriteLine("First-first conflict between " +
p.Rule + " and " + or);
} else
d.Add(p.Symbol, p.Rule);
} else
{
foreach(var f in follows[nt])
{
CfgRule or;
if (d.TryGetValue(f, out or))
{
Console.Error.WriteLine("First-follows conflict between " +
p.Rule + " and " + or);
}
else
d.Add(f, p.Rule);
}
}
}
}
#region Build a Lexer for our parser - out of scope of the CFG project but necessary
Console.WriteLine("Building simple lexer");
var fas = new FA[]
{
FA.Literal("+","add"),
FA.Literal("*","mul"),
FA.Literal("(","lparen"),
FA.Literal(")","rparen"),
FA.Repeat(FA.Set("0123456789"), "int")
};
var lexer = new FA();
for(var i = 0;i<fas.Length;i++)
lexer.EpsilonTransitions.Add(fas[i]);
Console.WriteLine();
#endregion
var text = "(1+3)*2";
Console.WriteLine("Creating tokenizer");
var tokenizer = new Tokenizer(lexer, text);
Console.WriteLine("Creating parser");
var parser = new Parser(parseTable, tokenizer, "Expr");
Console.WriteLine();
Console.WriteLine("Parsing " + text);
Console.WriteLine(parser.ParseSubtree());
}
And there's your runthrough. Obviously if you're building a parser-generator, you'll be using the PREDICT/FOLLOWS sets to build that parse table in code, or even rendering recursive descent parsers with it like Parsley does.
Other Resources
History
- 31st January, 2020 - Initial submission