This article describes how to use the Parakeet parsing library in C#. Using a recursive-descent parsing library can make parsing complex patterns in text much easier than writing code from hand, or having to learn special languages invented by parsing tools. Parakeet allows grammars to be defined directly in C# using operator overloading to map rule combinators to PEG operations.
Parsing Text in C# Using Parakeet
Parakeet is an open-source C# recursive descent parsing library that uses operator overloading to make it easy to declare grammars in a readable format. It is an attempt to combine the advantages of parser generator tools and hand-rolled parsers.
Parakeet was designed particularly for the parsing of non-trivial programming languages, and supports error recovery, tokenization, and parse tree generation. The Plato language implementation is being built using Parakeet.
To get you started, the following is an example of a grammar for parsing CSV files:
public class CsvGrammar : BaseCommonGrammar
{
public static readonly CsvGrammar Instance = new CsvGrammar();
public override Rule StartRule => File;
public Rule StringChar => Named(AnyChar.Except('"') | "\"\"");
public Rule String => Node(DoubleQuotedString(StringChar));
public Rule Text => Node(AnyChar.Except(",\n\r\"").OneOrMore());
public Rule Field => Node(Text | String);
public Rule Row => Node(Field.ZeroOrMore() + Optional('\r') + '\n');
public Rule File => Node(Row.ZeroOrMore());
}
Parsing Rules
Parsers are defined as set of parsing rules defined in terms of each other. Rules are combined together using "combinators". In some contexts, a single rule might also be known as a "parser". A rule is an instance of a class derived from Rule
which provides a member function:
ParserState Match(ParserState state)
If the rule matches the input at the position represented by the ParserState
, a new ParserState
instance will be returned, otherwise, the function returns null
.
Rule Combinator
A rule combinator is a function that creates a rule from other rules, similar to the core operations of a PEG grammar. For example: Choice
, Sequence
, ZeroOrMore
, OneOrMore
, NotAt
, etc.
In Parakeet, there are several ways to create rules from other rules:
- Creating instances of the combinator classes (i.e., using
new
) - Calling one of the extension methods on
Rule
(e.g., Optional()
, NotAt()
, etc.) - Using operator overloading
+
=> Sequence
|
=> Choice
!
=> NotAt
For more information on rule combinators, see the Wikipedia Article on Parsing Expression Grammars.
ParserState - The Input and Output of Rule Matching
A ParserState
is an immutable class that contains:
- A pointer to the input - a
string
combined with look-up tables for quickly determining line and columns from indexes) - An offset from the beginning of the input
string
- A pointer to a linked list of parse nodes
- A pointer to a linked list of error
The fields of a ParserState
instance are:
public class ParserState
{
public readonly ParserInput Input;
public readonly int Position;
public readonly ParserNode Node;
public readonly ParserError LastError;
...
}
Defining Grammars
A grammar is a collection of parsing rules that together describe how to parse input string
s. A Parakeet grammar is a class derived from the Grammar
class which provides an overidden definition of the starting rule, and an optional whitespace rule.
abstract Rule StartRule {get;}
virtual Rule WS {get;}
Most rules are defined as computed properties, but can also be functions or fields. It is up to the programmer.
Rules that are directly associated with properties in the grammar are usually either Node
rules or Named
rules. Named
rules simply return the result of matching a child rule and are associated with the property name, and help with grammar debugging and diagnostics.
Node Rules
A Node
rule is like a NamedRule
in that it also has a name but if matched successfully will return a ParserState
that has at least two new ParserNode
instances added to the linked list of nodes.
One node points the beginning of the match, and the other node points to the end.
A ParserNode
looks like this:
public class ParserNode
{
public readonly ParserRange Range;
public readonly string Name;
public readonly ParserNode Previous;
...
}
A linked list of ParserNode
instances can be converted into a parse tree using the ParseTreeNode ParserNode.ToParseTree()
method.
Generating Parsing Errors with OnFail Rules
There are several ways that a parsing rule might not match successfully:
- Returning
null
- this represents the normal failure to match - Not reaching the end of the input:
ParserState.AtEnd == false
- Accumulating one or more
ParserError
instances.
A special rule called OnFail
is used to generate ParserError
instances. The OnFail
rule is expected to appear as a child of a SequenceRule
.
The OnFail
rule indicates that if the preceding child rules succeed, then any failure in the following rule will generate a ParserError
. An optional recovery rule can be provided as a parameter, allowing the parser to advance to the next location, such as just past the end of statement, and attempt to continue.
The following is a snippet from the Plato grammar which demonstrates how the error handling and recovery occurs.
public Rule AdvanceOnFail =>
OnFail(Token.RepeatUntilPast(EOS | "}"));
public Rule IfStatement =>
Node(Keyword("if") + AdvanceOnFail + ParenthesizedExpression
+ Statement + ElseClause.Optional());
The IfStatement
rule indicates that if the keyword if
is matched, then it must be followed by an expression enclosed in parenthesis, then a valid statement, and then optionally an else
clause.
If a failure occurs after the keyword if
, then we know there was a parsing error. The parser will consume tokens until it passes the end of statement (EOS) marker (;
) or a closing brace (}
). The IfStatement
rule will return a valid ParserState
, but it will have a new ParserError
prepended to the linked list of errors.
Final Words
The best way to learn about Parakeet is to read the tests and samples grammars, and to give it a try. The source code and several example grammars are hosted on Github at https://github.com/ara3d/parakeet.
History
- 19th March, 2024: Initial version