Introduction
This article focuses on parsing custom XML-based languages using a user-defined schema. (Sorry for possible language mistakes, I am not a native speaker).
If you want to parse an XML document with a specific structure, you don't need to use the XmlDocument or XPathNavigator class to load values with XPath expressions (and then check each value for a null reference, validate it, and so on); all you have to do is define a parser schema. You can specify the elements and attributes an element must have, the maximum number of optional nodes that should be parsed (or the minimum number of optional nodes that must be present), and you can validate a node's value at the time of schema evaluation. The most advanced feature of parser schemas is the ability to transform parsed mark-up into an object during schema evaluation. You provide a transformer (an instance of a class implementing a specific interface) and when the evaluation is done, you get the results.
Here comes Wayku
I decided to develop my own XML-based language for a specific purpose. To parse the mark-up, I used the XPathNavigator class, so I had to load every value with an XPath query before using it. And here comes the problematic part: I had to check for missing nodes and then validate the ones that were present. The resulting code was quite ugly, mostly because of all the validation logic.
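To make the problem concrete, here is a minimal sketch of that manual style. It uses only the standard System.Xml.XPath classes; the document shape (a bookshelf with books and titles) and the names ManualParsing and ReadTitles are illustrative assumptions, not code from the original project.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class ManualParsing
{
    // Reads every book title by hand. Each lookup needs its own
    // "is the node there?" and "is the value usable?" check.
    public static List<string> ReadTitles(string xml)
    {
        XPathDocument document = new XPathDocument(new StringReader(xml));
        XPathNavigator navigator = document.CreateNavigator();

        XPathNavigator books = navigator.SelectSingleNode("/bookshelf/books");
        if (books == null)
            throw new FormatException("Missing element 'books'.");

        List<string> titles = new List<string>();
        foreach (XPathNavigator book in books.Select("book"))
        {
            XPathNavigator title = book.SelectSingleNode("title");
            if (title == null)
                throw new FormatException("A book has no 'title' element.");
            if (title.Value.Trim().Length == 0)
                throw new FormatException("A book has an empty title.");
            titles.Add(title.Value);
        }
        return titles;
    }

    public static void Main()
    {
        string xml =
            "<bookshelf><books><book><title>Neutron star</title></book></books></bookshelf>";
        foreach (string title in ReadTitles(xml))
            Console.WriteLine(title);
    }
}
```

Even for a single value per book, the checks outnumber the lines that actually read data, and every additional element or attribute repeats the pattern.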
So I brainstormed about how to simplify document parsing and node validation. Voilà, Wayku was born.
I am sure you are wondering what "Wayku" means. It is a nation (house) from the Dune universe. Waykus look after passengers on space ships while staying free on them. In the same way, parser schemas will serve you if you provide them with a schema. You pay (provide), they serve.
Possible usage
You may use parser schemas for:
- parsing of custom XML-based languages.
- validation of the XML document structure.
- XML document transformation.
- parsing of custom XML-based configuration files.
Parser schemas principles
General information
A parser schema consists of a set of rules. A rule specifies the node name, the namespace and the type of XML mark-up it can match (parse).
Some rules hold information about the nodes that must or may be present as their children. These rules are called parental rules. Parental rules match elements. All parental rules are, unsurprisingly, derived from the ParentalRule class.
The other kind are non-parental rules. These rules do not hold information about child nodes, so they are suitable for parsing attributes, processing instructions and text nodes. All non-parental rules are derived from the NonParentalRule class.
The base of all rules, parental or non-parental, is the Rule class.
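The relationship between the three classes can be pictured with a much-reduced model. This is only a sketch under stated assumptions: the member names, the constructors and the use of dictionaries for children are guesses made for illustration, and the real library classes certainly contain more (and may be abstract where these are concrete).

```csharp
using System;
using System.Collections.Generic;
using System.Xml.XPath;

namespace RuleHierarchySketch
{
    // Common base: what every rule knows about the node it matches.
    public abstract class Rule
    {
        public readonly string Name;
        public readonly string NamespaceUri;
        public readonly XPathNodeType NodeType;

        protected Rule(string name, string namespaceUri, XPathNodeType nodeType)
        {
            Name = name;
            NamespaceUri = namespaceUri;
            NodeType = nodeType;
        }
    }

    // Parental rules match elements and describe their children.
    public class ParentalRule : Rule
    {
        public readonly Dictionary<string, Rule> RequiredChildren =
            new Dictionary<string, Rule>();
        public readonly Dictionary<string, Rule> OptionalChildren =
            new Dictionary<string, Rule>();

        public ParentalRule(string name)
            : base(name, "", XPathNodeType.Element) { }
    }

    // Non-parental rules carry no child information.
    public class NonParentalRule : Rule
    {
        public NonParentalRule(string name, XPathNodeType nodeType)
            : base(name, "", nodeType) { }
    }

    public static class Demo
    {
        // Builds a "book" rule that requires a title and an author.
        public static ParentalRule BuildBookRule()
        {
            ParentalRule book = new ParentalRule("book");
            book.RequiredChildren.Add("title",
                new NonParentalRule("title", XPathNodeType.Element));
            book.RequiredChildren.Add("author",
                new NonParentalRule("author", XPathNodeType.Element));
            return book;
        }

        public static void Main()
        {
            ParentalRule book = BuildBookRule();
            Console.WriteLine("Rule '" + book.Name + "' has "
                + book.RequiredChildren.Count + " required children.");
        }
    }
}
```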
Usage
To construct a parser schema, you define a rule that represents (and parses) the parent node of your XML document. This is usually the root node, but you do not have to parse the document from its root. The parent rule must be an instance of a class derived from the ParentalRule class.
Then you assign child rules to the parent rule. Child rules can have their own child rules, and those can have theirs, ad infinitum. (Actually, the limit is the depth of the call stack, because child rules are evaluated recursively. Don't worry, the stack is deep enough to handle even bizarrely structured documents.)
After you construct the schema, you evaluate it. If your document does not conform to the schema, an exception is thrown. This happens, for example, when the document does not contain a required node, contains fewer optional elements than the schema asks for, or a node value is not valid (it's your handler that judges validity).
When the schema evaluation is finished (it only takes a moment, trust me), you can retrieve the required information.
Advanced principles
Say you want to validate the node value and if you find it invalid, stop the evaluation – done; say you want to transform evaluated data into an object when the evaluation of the node is finished – done; say you want to parse exotic kinds (for example comments) of XML structures but there are no rules matching them – done, just define your own rule.
Node value validation
If you want to validate a node value at evaluation time, use a rule implementing the IValidatingRule interface and give it an instance of a class implementing the IValueValidator interface. That's all.
The IValueValidator interface declares a single method with the signature bool IsValidValue(string). I considered implementing the validator as a delegate, but I chose an interface because I expect to add other validation functionality in the future.
When the evaluation engine calls the ValidateValue method, the node value is handed over for you to validate. If your method returns true, everybody is happy and the evaluation continues. If you decide to punish that nasty element with an invalid value and return false, the evaluation aborts.
The rules that support value validation are those that parse attributes, text content and text elements. All of them are derived from the ValidatingNonParentalRule class (which implements the IValidatingRule interface). As the name suggests, these rules cannot demand any child nodes.
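A custom validator could look roughly like the sketch below. To keep it self-contained, the interface is redeclared here with the signature quoted above; in real code you would reference the library's own IValueValidator instead. YearValidator and its accepted range are hypothetical and chosen purely for illustration.

```csharp
using System;

// Stand-in mirroring the signature quoted in the article;
// real code would use the interface shipped with the library.
public interface IValueValidator
{
    bool IsValidValue(string value);
}

// Hypothetical validator: accepts text that parses as an integer
// between 1 and 9999 (for example, a publication year).
public class YearValidator : IValueValidator
{
    public bool IsValidValue(string value)
    {
        int year;
        return int.TryParse(value, out year) && year >= 1 && year <= 9999;
    }
}

public static class ValidatorDemo
{
    public static void Main()
    {
        IValueValidator validator = new YearValidator();
        Console.WriteLine(validator.IsValidValue("1965")); // prints "True"
        Console.WriteLine(validator.IsValidValue("soon")); // prints "False"
    }
}
```

Returning false is all the validator has to do to reject a value; the decision to abort the evaluation stays with the engine.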
Node transformation
If you want to make your life easier, you can use so-called transforming rules. Transforming rules are a special sort of rule: when they are evaluated, the evaluation engine calls the methods of the INodeTransformer implementation assigned to them. You provide the transformation logic and the engine does the rest. Note that the transformation starts only after all child rules of the transforming rule have been evaluated, so you can access them and their values from your transformation code.
Transforming rules are generic classes, so after evaluation you get strongly typed results.
Instead of retrieving and interpreting the values of the child rules of ordinary rules, you get a single, ready-to-use object from the transforming rule.
Imagine that your XML document describes a bookshelf. You can construct a transforming rule describing the structure of the mark-up that represents a book and then evaluate it. If everything goes right, you get an instance of the Book class (which you designed beforehand). Easy to use and powerful, isn't it?
Custom rules
As I wrote earlier, you can define your own rules. If you are not satisfied with built-in rules, you can define your own parental, non-parental, validating or transforming rules.
Just derive your rule from one of the abstract classes that parser schemas come with. For parental and non-parental rules, as you know, you derive your class from the ParentalRule class and the NonParentalRule class, respectively.
The only thing you must specify when defining a custom rule is the type of node the rule matches. For example, if you would like to match comments, you specify the comment type. You can find a list of all the available node types in the System.Xml.XPath.XPathNodeType enumeration.
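To see what matching on a node type means in practice, the following sketch filters the children of a root element by XPathNodeType using only the standard XPathNavigator API. It does not define a Parser Schemas rule; NodeTypeDemo and ChildValuesOfType are hypothetical helper names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class NodeTypeDemo
{
    // Collects the values of all direct children of the root element
    // whose node type equals the requested one.
    public static List<string> ChildValuesOfType(string xml, XPathNodeType wanted)
    {
        XPathNavigator navigator =
            new XPathDocument(new StringReader(xml)).CreateNavigator();
        List<string> values = new List<string>();

        // Move from the document root to the root element, then to its children.
        if (!navigator.MoveToChild(XPathNodeType.Element) || !navigator.MoveToFirstChild())
            return values;

        do
        {
            if (navigator.NodeType == wanted)
                values.Add(navigator.Value);
        } while (navigator.MoveToNext());

        return values;
    }

    public static void Main()
    {
        string xml = "<bookshelf><!--reviewed--><owner>Bob</owner></bookshelf>";
        foreach (string comment in ChildValuesOfType(xml, XPathNodeType.Comment))
            Console.WriteLine("Comment: " + comment);
    }
}
```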
After evaluation
When a rule is evaluated, you get an instance of either the EvaluatedParentalRule or the EvaluatedNonParentalRule class, depending on whether the rule is derived from the ParentalRule or the NonParentalRule class, respectively.
Under the hood
Parser schemas use the XPathNavigator class to navigate through the document. When the evaluation engine iterates through a node's children, it calls the bool IsMatch(XPathNavigator) method to determine whether the child can be parsed (matched) by the rule. If so, the engine calls the T Evaluate(XPathNavigator) method of the EvaluableRule<T> class (where T : EvaluatedRule), which is derived from the Rule class.
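The match-then-evaluate loop described above can be sketched as follows. This is a simplified illustration, not the library's engine: SketchRule is a hypothetical stand-in that matches by node type and local name only, and its Evaluate method merely returns the node value instead of building an evaluated rule.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

// Hypothetical, much-reduced rule.
public class SketchRule
{
    private readonly string name;
    private readonly XPathNodeType nodeType;

    public SketchRule(string name, XPathNodeType nodeType)
    {
        this.name = name;
        this.nodeType = nodeType;
    }

    public string Name { get { return name; } }

    public bool IsMatch(XPathNavigator navigator)
    {
        return navigator.NodeType == nodeType && navigator.LocalName == name;
    }

    // "Evaluation" here just captures the node value.
    public string Evaluate(XPathNavigator navigator)
    {
        return navigator.Value;
    }
}

public static class EngineSketch
{
    // Walks the children of the node the navigator points to and
    // evaluates each child with the first rule that matches it.
    public static Dictionary<string, string> EvaluateChildren(
        XPathNavigator parent, IList<SketchRule> rules)
    {
        Dictionary<string, string> results = new Dictionary<string, string>();
        XPathNavigator child = parent.Clone(); // leave the caller's position intact
        if (!child.MoveToFirstChild())
            return results;

        do
        {
            foreach (SketchRule rule in rules)
            {
                if (rule.IsMatch(child))
                {
                    results[rule.Name] = rule.Evaluate(child);
                    break;
                }
            }
        } while (child.MoveToNext());

        return results;
    }

    public static void Main()
    {
        XPathNavigator navigator = new XPathDocument(new StringReader(
            "<book><title>Neutron star</title><author>Larry Niven</author></book>"))
            .CreateNavigator();
        navigator.MoveToChild(XPathNodeType.Element); // position on <book>

        List<SketchRule> rules = new List<SketchRule>();
        rules.Add(new SketchRule("title", XPathNodeType.Element));
        rules.Add(new SketchRule("author", XPathNodeType.Element));

        foreach (KeyValuePair<string, string> pair in EvaluateChildren(navigator, rules))
            Console.WriteLine(pair.Key + " = " + pair.Value);
    }
}
```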
Using the code
Bookshelf
Imagine you are developing a bookshelf catalogue. To store the book metadata, you save it as an XML document with a defined structure:
<?xml version="1.0"?>
<bookshelf>
    <information>
        <owner>Bob</owner>
    </information>
    <books>
        <book>
            <title>God emperor of Dune</title>
            <author>Frank Herbert</author>
            <rating>FiveStars</rating>
        </book>
        <book>
            <title>Heretics of Dune</title>
            <author>Frank Herbert</author>
            <rating>FiveStars</rating>
        </book>
        <book>
            <title>Neutron star</title>
            <author>Larry Niven</author>
            <rating>FiveStars</rating>
        </book>
    </books>
</bookshelf>
We also have the Book class representing the book metadata.
using System;

namespace TomasDeml.Samples.Bookshelf
{
    // Possible values of the <rating> element.
    enum Rating
    {
        OneStar,
        TwoStars,
        ThreeStars,
        FourStars,
        FiveStars
    }

    // Metadata of a single book.
    class Book
    {
        public string Title;
        public string Author;
        public Rating Rating;

        public Book() { }

        public Book(string title, string author, Rating rating)
        {
            this.Title = title;
            this.Author = author;
            this.Rating = rating;
        }

        public override string ToString()
        {
            return String.Format(
                "Book '{0}' by '{1}' rated {2}.",
                this.Title, this.Author, this.Rating);
        }
    }
}
Now imagine you want to load and validate the book metadata. Just create a schema describing the document structure, evaluate it and get the results.
using System;
using System.Collections.Generic;
using System.Xml.XPath;
// Also import the Parser Schemas namespace (not named in this article).

namespace TomasDeml.Samples.Bookshelf
{
    class Program
    {
        private static void LoadBooksUsingOrdinarySchema(
            XPathNavigator navigator, List<Book> booksList)
        {
            // <bookshelf> must contain <information> with an <owner>.
            ElementRule bookshelf = new ElementRule("bookshelf");
            ElementRule information = new ElementRule("information");
            information.RequiredChildren.Add("owner",
                new TextElementRule("owner"));
            bookshelf.RequiredChildren.Add("information",
                information);

            // <bookshelf> must also contain <books>.
            ElementRule books = new ElementRule("books");
            bookshelf.RequiredChildren.Add("books", books);

            // Each <book> requires a title, an author and a valid rating.
            ElementRule book = new ElementRule("book");
            book.RequiredChildren.Add("title",
                new TextElementRule("title"));
            book.RequiredChildren.Add("author",
                new TextElementRule("author"));
            book.RequiredChildren.Add("rating",
                new TextElementRule("rating",
                    EnumValueValidator.Create(typeof(Rating))));
            books.OptionalChildren.Add("book", book);

            // Evaluate the schema against the document.
            ParserSchema schema = new ParserSchema(bookshelf);
            EvaluatedParserSchema results =
                schema.Evaluate(navigator,
                    ParserSchema.ParentNodeMatchOption.NavigatorPointsToParentNode);

            // Read the evaluated values and build the Book objects.
            EvaluatedParentalRule eBooks =
                results.ParentNode.LookupParentalChild("books",
                    EvaluatedParentalRule.LookupLocation.RequiredChildren);
            foreach (EvaluatedParentalRule eBook in
                eBooks.LookupChildren("book",
                    EvaluatedParentalRule.LookupLocation.OptionalChildren))
            {
                booksList.Add(new Book(eBook.RequiredChildren["title"].Value,
                    eBook.RequiredChildren["author"].Value,
                    (Rating)Enum.Parse(typeof(Rating),
                        eBook.RequiredChildren["rating"].Value)));
            }
        }
    }
}
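The listing receives its XPathNavigator from the caller. One way to create it with the standard .NET classes is sketched below. It assumes that NavigatorPointsToParentNode means the navigator must already be positioned on the bookshelf element; that reading is inferred from the option's name and is not confirmed by the article. NavigatorSetup and CreateBookshelfNavigator are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class NavigatorSetup
{
    // Loads the XML text and returns a navigator positioned on <bookshelf>.
    public static XPathNavigator CreateBookshelfNavigator(string xml)
    {
        XPathDocument document = new XPathDocument(new StringReader(xml));
        XPathNavigator navigator = document.CreateNavigator(); // starts at the document root

        // Move from the document root to the <bookshelf> element (no namespace).
        if (!navigator.MoveToChild("bookshelf", ""))
            throw new FormatException("Root element 'bookshelf' not found.");
        return navigator;
    }

    public static void Main()
    {
        XPathNavigator navigator = CreateBookshelfNavigator(
            "<?xml version=\"1.0\"?><bookshelf><books /></bookshelf>");
        Console.WriteLine(navigator.LocalName); // prints "bookshelf"
    }
}
```

To load from a file instead, pass the file path to the XPathDocument constructor rather than a StringReader.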
You can also use transforming rules and create Book objects at evaluation time. To do this, you have to implement the INodeTransformer interface.
namespace TomasDeml.Samples.Bookshelf
{
    class BookTransformer : INodeTransformer<Book>
    {
        public Book Transform(EvaluatedRule evaluatedNodeRule)
        {
            EvaluatedParentalRule rule =
                (EvaluatedParentalRule)evaluatedNodeRule;
            Book book = new Book();
            book.Title = rule.RequiredChildren["title"].Value;
            book.Author = rule.RequiredChildren["author"].Value;
            book.Rating =
                ((EvaluatedElementTransformationRule<Rating>)
                    rule.RequiredChildren["rating"]).TransformationResult;
            return book;
        }
    }
}
Then you can construct and evaluate the schema with transforming rules. Compared with the previous listing, the book and rating rules and the final loop are different.
using System.Collections.Generic;
using System.Xml.XPath;
// Also import the Parser Schemas namespace (not named in this article).

namespace TomasDeml.Samples.Bookshelf
{
    class Program
    {
        private static void LoadBooksUsingTransformingRules(
            XPathNavigator navigator, List<Book> booksList)
        {
            ElementRule bookshelf = new ElementRule("bookshelf");
            ElementRule information = new ElementRule("information");
            information.RequiredChildren.Add("owner",
                new TextElementRule("owner"));
            bookshelf.RequiredChildren.Add("information", information);
            ElementRule books = new ElementRule("books");
            bookshelf.RequiredChildren.Add("books", books);

            // Rule transforming element 'book' into the Book object
            ElementTransformationRule<Book> book =
                new ElementTransformationRule<Book>("book",
                    new BookTransformer());
            book.RequiredChildren.Add("title",
                new TextElementRule("title"));
            book.RequiredChildren.Add("author",
                new TextElementRule("author"));
            book.RequiredChildren.Add("rating",
                new ElementTransformationRule<Rating>("rating",
                    NodeValueOptions.GetValue | NodeValueOptions.RequireValue,
                    new EnumValueTransformer<Rating>()));
            books.OptionalChildren.Add("book", book);

            ParserSchema schema = new ParserSchema(bookshelf);
            EvaluatedParserSchema results = schema.Evaluate(navigator,
                ParserSchema.ParentNodeMatchOption.NavigatorPointsToParentNode);

            // Each evaluated 'book' rule already holds a finished Book object.
            EvaluatedParentalRule eBooks =
                results.ParentNode.LookupParentalChild("books",
                    EvaluatedParentalRule.LookupLocation.RequiredChildren);
            foreach (EvaluatedElementTransformationRule<Book> eBook
                in eBooks.LookupChildren("book",
                    EvaluatedParentalRule.LookupLocation.OptionalChildren))
            {
                booksList.Add(eBook.TransformationResult);
            }
        }
    }
}
There are also rules parsing processing instructions, attributes and text content.
Building the code
To build the code, you must have at least Microsoft Visual Studio 2005/Microsoft Visual C# Express 2005 BETA 2 installed.
To build the code without Visual Studio, you can use the msbuild tool supplied in the Microsoft .NET Framework 2.0 package. Just open cmd.exe and type:
msbuild /p:Configuration=Release ParserSchemas.sln
Licence
Parser schemas are distributed under the LGPL license.
History
- 7/30/05 – first alpha.
- 9/1/05 – v1.0.0.0 released.
- 9/1/05 - v1.0.0.1 released (minor bug fix in the evaluation engine).
- 9/3/05 - v1.0.0.2 released (minor perf improvement).
- 10/18/05 - v1.1.0.0 released (added the rule linking support and some events).