Introduction
BNFUP is a class library that implements an object compiler. You can link it to your own projects so that users can type source code in a language you have designed and create utility objects by compiling it. Think, for example, of building arithmetic expressions or of entering complex data through scripts.
The library also provides editing services for building your own tools to design and edit languages. It comes with an editor, in the BNFUPEditor project, that lets you start using the editing features without writing a single line of code.
There are also three sample implementations that show how to link your own projects with the compiler services.
Background
To use the compiler, you first have to design a class library and a language, defined with BNF rules, with which the objects can be created. With the BNFUPEditor tool you can design the language and test it using the built-in compiler.
As this article focuses on the source code of the library, you can read about using the editor on my blog: using the BNFUPEditor tool. A Spanish version of the same article is also available.
In the Samples subdirectory of the solution you can find sample languages corresponding to the three examples implemented in the Expressions, ExpressionDrawer and CompilableDataTable projects.
The project is written in C# using Visual Studio 2015.
Using the code
Let's start by defining the syntax you have to use to write your own language, in BNF format:
<<rulelist>> ::= <rule> [<rulelist>]
<rule> ::= <rulename> '::=' <defrule> ';'
<rulename> ::= <ruleid>
| '<<' <identifier> '>>'
<defrule> ::= <defrulech> ['|' <defrule>]
<defrulech> ::= <item> [<defrulech>]
<ruleid> ::= '<' <identifier> '>'
<identifier> ::= {a-zA-Z} [<rid>]
<rid> ::= {a-zA-Z0-9} [<rid>]
<item> ::= <token>
| <ltoken>
| <charset>
| <ruleid>
| '[' <item> ']'
<token> ::= ''' {.} [<rtoken>] '''
| ''' '\'' [<rtoken>] '''
<rtoken> ::= {.} [<rtoken>]
| '\'' [<rtoken>]
<ltoken> ::= <token> [',' <ltoken>]
<charset> ::= '{' <charlist> '}'
<charlist> ::= '\}' [<charlist>]
| <char> [<charlist>]
As you can see, every rule must end with a semicolon. You can mark part of a rule as optional using square brackets, and build a rule out of several alternatives using the | character to separate them.
You can define tokens using single quotation marks, and character sets using braces and the syntax of regular-expression character sets.
Rule names are enclosed between the characters < and >. The main rule, where parsing of the code starts, is marked with the doubled characters << and >>.
A rule name is separated from its definition by the token ::=.
The elements you can use in the definition of the rules are the following:
- Token: a word that must appear in the source code at the corresponding position.
- List of tokens: a set of tokens, exactly one of which must appear in the source code at the corresponding position.
- Character sets: define a set of characters that can appear at the corresponding position of the source code.
- Rule names: you can use any of the rules defined in the language, including the rule you are defining, which allows recursive definitions.
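As a small illustration of these elements, a language for expressions such as 1+2-3 could be written as follows. This is only a sketch: the rule names are made up for the example, and it assumes that a character set such as {0-9} matches a single character, as the identifier rule in the syntax above suggests.

```
<<expr>> ::= <number> [<tail>];
<tail> ::= <op> <expr>;
<op> ::= '+','-';
<number> ::= {0-9} [<number>];
```

Here <op> is a list of tokens, <number> uses a character set and recursion to accept one or more digits, and the optional <tail> lets an expression continue after the first number.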
You can write your language definition in a text file and then open it with the rule editor, which converts the text to the application's internal format and lets you save it in the final binary format. With the editor you can also define which rules or rule elements generate objects.
When an item that generates an object is reached, the object is created in an uninitialized state. When the element has been completely parsed, the object is completed by passing it the corresponding portion of the source code. Think, for example, of a number, which is parsed character by character: when the number text is complete, the compiler passes it to the number object to set its value.
If a new object is created within the definition of a rule that has already created another object, it is considered a subordinate object, and it is passed to the main object once completed.
In order to link your own objects with the compiler engine, you have to implement some simple interfaces in your code. These interfaces are defined in the BNFUP.Interfaces namespace.
The interface that all your compilable objects must implement is ICompilableObject, defined as follows:
public interface ICompilableObject
{
    string Text { get; set; }
    bool AddItem(ICompilableObject item);
    ICompilableObject Simplify();
    void Test(Form fmain);
}
Through the Text property, the compiler gives the object the portion of source code that generated it.
Through the AddItem method, the compiler gives the object a subordinate object, created within the rule before the main object was completed. The method must return true if the subordinate object is accepted, or false otherwise, which results in a compilation error.
With the Simplify method you can return a simplified version of the object, or the object itself if no simplification is needed. This method is called after the object is completed.
Finally, the Test method can be implemented to allow testing the object in a compiler tool you have written or in the built-in compiler of the BNFUPEditor tool.
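As an illustration, a pair of objects for a language of sums could implement the interface along these lines. This is a minimal sketch, not code taken from the library samples: IValue, NumberObject and SumObject are hypothetical names, and it assumes a project that references BNFUP.dll and Windows Forms.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Windows.Forms;
using BNFUP.Interfaces;

// Hypothetical helper interface so both objects can expose a numeric value.
public interface IValue
{
    double Value { get; }
}

// Hypothetical leaf object: receives the text of a number such as "42".
public class NumberObject : ICompilableObject, IValue
{
    private string _text;

    public double Value { get; private set; }

    public string Text
    {
        get { return _text; }
        set
        {
            _text = value;
            // The complete number text arrives in a single assignment.
            double parsed;
            double.TryParse(value, NumberStyles.Float, CultureInfo.InvariantCulture, out parsed);
            Value = parsed;
        }
    }

    // A number has no subordinate objects, so anything offered is rejected.
    public bool AddItem(ICompilableObject item) { return false; }

    public ICompilableObject Simplify() { return this; }

    public void Test(Form fmain)
    {
        MessageBox.Show(fmain, Value.ToString(CultureInfo.InvariantCulture));
    }
}

// Hypothetical composite object: collects the terms of a sum.
public class SumObject : ICompilableObject, IValue
{
    private readonly List<ICompilableObject> _terms = new List<ICompilableObject>();

    public string Text { get; set; }

    public double Value
    {
        get
        {
            double total = 0;
            foreach (ICompilableObject term in _terms) total += ((IValue)term).Value;
            return total;
        }
    }

    public bool AddItem(ICompilableObject item)
    {
        // Only objects that carry a value are accepted; returning false
        // makes the compiler report a compilation error.
        if (!(item is IValue)) return false;
        _terms.Add(item);
        return true;
    }

    public ICompilableObject Simplify()
    {
        // A sum with a single term is equivalent to that term.
        return _terms.Count == 1 ? _terms[0] : this;
    }

    public void Test(Form fmain)
    {
        MessageBox.Show(fmain, Value.ToString(CultureInfo.InvariantCulture));
    }
}
```

Returning the single term from Simplify is one possible use of that method: it keeps the final object tree free of wrappers that add nothing.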
Your class library must have a class that implements the ICompilableObjectFactory interface, which is used to create the objects and provide them to the compiler. This class must have a parameterless constructor:
public interface ICompilableObjectFactory
{
    ICompilableObject Object { get; }
    void Init();
    ICompilableObject CreateObject(string ctype, IRuleItem item);
}
The property Object is used to store the final object created by the compilation process.
Use the method Init to perform the initialization needed before compilation.
The CreateObject method is called every time the compiler must create an object. The ctype parameter is a unique identifier for the type of object to be created, and the item parameter is the item that triggered the object creation.
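A factory could look roughly like the following sketch, assuming a project that references BNFUP.dll and Windows Forms. TextNode and NodeFactory are hypothetical names, and "number" and "sum" stand for whatever type identifiers you assign in the editor. The interface listing does not show who assigns the Object property, so the sketch simply remembers the first object created after Init; treat that as a guess rather than as the library's contract.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;
using BNFUP.Interfaces;

// Hypothetical general-purpose object: keeps its kind, text and children.
public class TextNode : ICompilableObject
{
    private readonly List<ICompilableObject> _children = new List<ICompilableObject>();

    public TextNode(string kind) { Kind = kind; }

    public string Kind { get; private set; }
    public string Text { get; set; }
    public IList<ICompilableObject> Children { get { return _children; } }

    public bool AddItem(ICompilableObject item)
    {
        _children.Add(item);
        return true;
    }

    public ICompilableObject Simplify() { return this; }

    public void Test(Form fmain)
    {
        MessageBox.Show(fmain, Kind + ": " + Text);
    }
}

public class NodeFactory : ICompilableObjectFactory
{
    // No constructor is declared, so the required parameterless one is implicit.

    public ICompilableObject Object { get; private set; }

    public void Init()
    {
        // Forget the result of any previous compilation.
        Object = null;
    }

    public ICompilableObject CreateObject(string ctype, IRuleItem item)
    {
        ICompilableObject created;
        switch (ctype)
        {
            case "number":
            case "sum":
                created = new TextNode(ctype);
                break;
            default:
                // Design choice of this sketch: an unknown identifier is
                // treated as a configuration mistake.
                throw new ArgumentException("Unknown object type: " + ctype, "ctype");
        }

        // Assumption: the first object created after Init is the outermost
        // one, so it is kept as the compilation result.
        if (Object == null) Object = created;
        return created;
    }
}
```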
You can use the compiler with the RuleTable class defined in the BNFUP.Rules namespace. The RuleTable.FromFile static method will create an instance of this class using a .bnf file generated by the BNFUPEditor tool.
To link the compiler with your class library, initialize the Factory property of the RuleTable instance with the ICompilableObjectFactory object.
Finally, use the Build method of the RuleTable instance to compile the source code. The source is passed as a generic TextReader, so you can use either a text file or text typed by the user in a control.
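Put together, the three steps could be wrapped in a helper like the one below. It is a sketch under assumptions: CompilerDriver is a hypothetical name, the project is assumed to reference BNFUP.dll, and RuleTable.FromFile is assumed to take the path of the .bnf file as a string.

```csharp
using System;
using System.IO;
using BNFUP.Interfaces;
using BNFUP.Rules;

public static class CompilerDriver
{
    // Compiles 'source' with the language stored in 'bnfPath' and returns
    // the object produced by the compiler.
    public static ICompilableObject Compile(string bnfPath, string source,
                                            ICompilableObjectFactory factory)
    {
        if (factory == null) throw new ArgumentNullException("factory");

        // Assumption: FromFile receives the path of the .bnf file.
        RuleTable rules = RuleTable.FromFile(bnfPath);

        // Link the compiler with the object creator of your class library.
        rules.Factory = factory;

        // Any TextReader works; a StringReader wraps text held in memory.
        using (TextReader reader = new StringReader(source))
        {
            return rules.Build(reader);
        }
    }
}
```

For source stored in a file, File.OpenText returns a StreamReader that can be passed to Build in the same way.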
In this link you can see examples of integration between BNFUP.dll and your own class library. A Spanish version is also available.
Compiler implementation
The compiler is implemented by several classes located in the BNFUP.Rules namespace. The types of items that make up the language infrastructure, and the classes that implement them, are the following:
- Tokens and character sets: the atomic elements. They are implemented in the classes Token and CharacterSet, which derive from a common abstract base class named ItemBase.
- List of tokens: a list that can only contain elements of the Token class, implemented in the class TokenList. It defines a set of tokens, one of which must appear at the corresponding position of the source code.
- Item list: a generic list that can contain elements of any type. Its main purpose is to group elements so that you can make all of them optional by marking the list as optional, instead of marking the elements themselves, which would make them optional everywhere they appear in the language definition. It is implemented in the class RuleItemList.
- Rules: a special kind of item list. There are two kinds of rules: a simple rule, similar to the item list and implemented in the RuleChain class, and a rule made up of several alternatives, which can only contain objects of type RuleChain and is implemented in the class AlternativeRules. Both classes derive from the abstract base class RuleBase.
All of them implement the IRuleItem interface, in the BNFUP.Interfaces namespace, defined as follows:
public interface IRuleItem
{
    bool Completed { get; set; }
    bool Optional { get; set; }
    string Description { get; }
    IEnumerable<IRuleItem> Childs { get; }
    string GetText();
    void UpdateRule(string name, RuleBase rule, string current);
    void ChangeRule(RuleBase oldrule, RuleBase newrule, string current);
    void DeleteRule(RuleBase rule, string current);
    bool Matches(Parser parser);
    void Build(Parser parser);
}
Completed: an internal property used to mark the item as linked to the compiler's event system; setting it to false unlinks the item.
Optional: used to mark the item as optional. Optional items may or may not appear at the corresponding position of the source code.
Description: contains a description of the item.
Childs: enumerates the child items, if any.
GetText: method used to obtain the source code portion that generates the item.
UpdateRule: an internal method used while the rule objects are being built from a text file. If a rule is not yet defined but is part of the rule currently being built, a special class, TempRule, is used in its place. When processing of the text file ends, the RuleTable object uses this method to replace those placeholders with the real rule.
ChangeRule: replaces one rule with another in the current item.
DeleteRule: removes the use of a rule from the current item, but doesn't delete the rule from the rule table.
Matches: tests whether the item matches at the current source code position. The class Parser controls the source code position and provides testing for tokens and character sets.
Build: builds the current item and makes the Parser advance through the source code, or raises an exception if no match is found.
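The read-only members are enough to inspect a language from your own code. The sketch below, which assumes a project referencing BNFUP.dll and uses the hypothetical name RuleTreeDump, walks the Childs enumeration and prints each Description with indentation. Because rules can refer to themselves, each item is expanded only once.

```csharp
using System.Collections.Generic;
using System.Text;
using BNFUP.Interfaces;

public static class RuleTreeDump
{
    // Returns an indented listing of 'root' and the items below it.
    public static string Dump(IRuleItem root)
    {
        StringBuilder sb = new StringBuilder();
        if (root != null) Append(root, 0, new HashSet<IRuleItem>(), sb);
        return sb.ToString();
    }

    private static void Append(IRuleItem item, int depth,
                               HashSet<IRuleItem> visited, StringBuilder sb)
    {
        sb.Append(' ', depth * 2);
        if (item.Optional) sb.Append("(optional) ");
        sb.AppendLine(item.Description);

        // Recursive rules would loop forever, so the children of an item
        // that was already expanded are not visited again.
        if (!visited.Add(item) || item.Childs == null) return;

        foreach (IRuleItem child in item.Childs)
        {
            if (child != null) Append(child, depth + 1, visited, sb);
        }
    }
}
```

Since rules implement IRuleItem, the Root property of a RuleTable can be passed directly to Dump.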
As all item types can create objects, there is also an interface for this task, ICompilableObjectGenerator, in the BNFUP.Interfaces namespace:
public delegate void ObjectCreatedHandler(ICompilableObjectGenerator item);
public interface ICompilableObjectGenerator
{
    event ObjectCreatedHandler OnCreateObject;
    event ObjectCreatedHandler OnCompleteObject;
    event ObjectCreatedHandler OnDiscardObject;
    string CompilableType { get; set; }
    ICompilableObjectFactory Factory { get; set; }
    ICompilableObject Object { get; }
    void UpdateObject(ICompilableObject obj);
}
There are three compilation events that inform the RuleTable when an object is created (when the item starts its build process), completed (when the item finishes parsing the code) or discarded (if the item is optional and doesn't match the source code).
CompilableType: the unique identifier of the object that this item generates. Items with no value in this property don't generate any object.
Factory: is the ICompilableObjectFactory that generates the objects.
Object: contains the current compiled object generated by the item.
UpdateObject: used to update the item object version.
To provide editing services, there is the IEditableItem interface, defined in the BNFUP.Interfaces namespace.
public enum EditActions
{
    Delete, Cut, Copy, Paste, Enlist, Extract, Free, Up, Down, New, Simplify, Alternatives
}
public interface IEditableItem
{
    bool CanCopy { get; }
    int ItemIndex { get; set; }
    IEnumerable<ToolStripItem> EditOptions(IEditableItem parent, IEditableItem gparent, IRuleItemClipboard clipboard);
    bool PerformAction(ToolStripItem action, IEditableItem parent, IEditableItem gparent, RuleTable rules, IRuleItemClipboard clipboard);
    void AddChild(IRuleItem item);
    void RemoveChild(int item);
    void ReplaceChild(int item1, IRuleItem item2);
    void InsertAfter(int item1, IRuleItem item2);
    void MoveUp(int item);
    void MoveDown(int item);
    bool ValidateType(Type t, bool generic);
}
CanCopy: used internally to decide whether to show the options to cut, copy and paste the object.
ItemIndex: the index of the item within its parent object.
EditOptions: enumerates all the options allowed to edit the current item, as ToolStripItem objects with the corresponding EditActions enum value in their Tag property.
PerformAction: used to execute the selected edit action.
ValidateType: used to accept or reject a type of object as a valid child. The generic parameter indicates whether the type is for an existing object or an uninitialized version of the generic class.
The other methods manage the list of child items or the position of the item in its parent list.
Other auxiliary interfaces
In the BNFUP.Interfaces namespace there are three more auxiliary interfaces:
public interface IItemDrawing
{
    Color ForeColor { get; set; }
    SizeF Measure(Graphics gr, Font fextra, Font fmain);
    void Draw(Graphics gr, float x, float y, Font fextra, Font fmain);
}

public interface IRuleDrawing
{
    SizeF MeasureRule(Graphics gr, Font fextra, Font fmain);
    void DrawRule(Graphics gr, float x, float y, Font fextra, Font fmain);
}

public interface IRuleItemClipboard
{
    bool ContainsData { get; }
    void SetData(IRuleItem data);
    IRuleItem GetData();
}
IItemDrawing and IRuleDrawing are used to draw the rules in the custom ListBox of the BNFUPEditor tool.
IRuleItemClipboard is a custom implementation to provide clipboard services.
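If you build your own editing tool, the simplest clipboard is one that keeps a single item in memory. The class below is a minimal sketch with a hypothetical name, assuming a project that references BNFUP.dll; it is not the implementation used by BNFUPEditor, and whether the real clipboard stores a copy or the item itself is not something this sketch tries to reproduce.

```csharp
using BNFUP.Interfaces;

// Hypothetical in-memory clipboard holding at most one rule item.
public class MemoryRuleClipboard : IRuleItemClipboard
{
    private IRuleItem _data;

    // True while there is an item available to paste.
    public bool ContainsData
    {
        get { return _data != null; }
    }

    // Stores the item that was cut or copied, replacing the previous one.
    public void SetData(IRuleItem data)
    {
        _data = data;
    }

    // Returns the stored item, or null when the clipboard is empty.
    public IRuleItem GetData()
    {
        return _data;
    }
}
```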
Editing services in the RuleTable class
With this constructor you can build a RuleTable object from a TextReader containing the BNF rules of the language definition.
public RuleTable(TextReader rdr, ICompilableObjectFactory factory)
There is an event that you can subscribe to in order to receive compilation messages:
public delegate void CompilerEventHandler(string ev);
public event CompilerEventHandler OnCompilationFeedback = null;
There are the following public properties:
public bool CaseSensitive { get; set; }
public bool DebugMode { get; set; }
public ICompilableObjectFactory Factory { get; set; }
public RuleBase Root { get; }
public IEnumerable<Type> ItemTypes { get; }
public IEnumerable<IRuleItem> SimpleItems { get; }
public IEnumerable<RuleBase> Rules { get; }
CaseSensitive: can be used to make the language case-sensitive or case-insensitive.
DebugMode: controls the message logging of the compiler. Set this property to false before saving the RuleTable object, to avoid serialization errors caused by the feedback event.
Factory: used to link the compiler with the object creator in your class library.
Root: contains the root rule of the language.
ItemTypes: enumerates all the types in the library that implement the IRuleItem interface.
SimpleItems: enumerates all the atomic items defined in the language, i.e. the Token and CharacterSet types.
Rules: enumerates all the rules in the language.
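The feedback event and the DebugMode property can be combined to collect the compiler messages, for example to show them in a log window. The helper below is a sketch with a hypothetical name, assuming a project that references BNFUP.dll and assuming that setting DebugMode to true is what turns the messages on.

```csharp
using System;
using System.Collections.Generic;
using BNFUP.Rules;

public static class FeedbackLogger
{
    // Subscribes to the compiler messages of 'rules' and returns the list
    // that will receive them during later calls to Build.
    public static List<string> Attach(RuleTable rules)
    {
        if (rules == null) throw new ArgumentNullException("rules");

        List<string> messages = new List<string>();

        // Assumption: true enables message logging.
        rules.DebugMode = true;

        // The handler receives each message as a plain string.
        rules.OnCompilationFeedback += delegate(string ev) { messages.Add(ev); };

        return messages;
    }
}
```

Remember to set DebugMode back to false before saving the RuleTable, as noted in the property description.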
The public methods that provide editing features are the following:
public void AddRule(RuleBase rule)
public void RefreshRule(RuleBase newrule, RuleBase oldrule)
public void AddItem(IRuleItem item)
public void RemoveRule(RuleBase rule)
public void RemoveItem(ItemBase item)
public RuleBase GetRule(string name)
public IRuleItem GetItem(string name)
public void LinkItem(IRuleItem item)
public IRuleItem RestoreItem(IRuleItem item)
public ICompilableObject Build(TextReader rdr)
With the AddRule method you can add a new rule to the language definition.
RefreshRule is intended for use when you paste an item and want to convert it to the canonical version stored in the list of rules or the list of atomic objects.
AddItem can be used to add a new token or character set to the language.
RemoveRule removes a rule from the RuleTable and all the instances of it in the remaining rules.
RemoveItem is the version of the previous function for removing tokens or character sets.
GetRule can be used to get a rule by its name.
GetItem is the version of this function for tokens and character sets.
LinkItem is a method that links the event handlers in an item object to the compiler event handling system.
RestoreItem is used when you paste an item from the clipboard. This item is replaced with the canonical version stored in the item lists of the RuleTable class.
Build is the method that you have to call to start the compilation of the source code.
And that's all. I hope someone finds this software useful for their own project.
Thanks for reading!