Introduction
Your first thought about this article may be: "Oh, another tool for parsing XML, like MSXML." In fact, this article builds on MSXML. What I present is not a general XML parser, but a generator that creates a parser for one specific kind of XML file. My purpose is not to teach a grammar parsing technique, but to share some ideas about automatic code generation, using an XML parser generator as the example. An XML parser may be of no use in your own programming area, but that does not matter: if you come away with a fresh idea by the end of the article, it will still help you in your future programming life.
Background
Faced with plenty of XML files, I have to write plenty of code to retrieve information from them. Even with the MSXML SDK and XPath, the work is hard. It is boring and error-prone to write code like the following:
// One hand-written XPath lookup for every value we need
IXMLDOMNodePtr psNode =
    m_pXmlDoc->selectSingleNode(_T("/rss/channel/description"));
psNode = m_pXmlDoc->selectSingleNode(_T("/rss/channel/language"));
...
The sample XML snippet is given below:
<rss version="0.91">
  <channel>
    <description>XML.com features a rich mix of information
      and services for the XML community.</description>
    <language>en-us</language>
    <item>
      <title>Normalizing XML, Part 2</title>
    </item>
    <item>
      <title>The .NET Schema Object Model</title>
    </item>
  </channel>
</rss>
The problem is that for every XML node value I need, I have to write out its XPath by hand. Each kind of XML file needs its own parser, which is very simple but quite boring to implement.
The solution - Auto Code Generation
As a programmer, I am both passionate and lazy. I am too lazy to write a single line of the boring code above, but passionate enough to figure out a way to generate it automatically. Here is my solution:
- Write an algorithm to generate the XPath expressions from XML files.
- Write a template parser.
- Fill the XPath expressions into the template parser.
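The last step can be illustrated with plain string substitution. The following is only a minimal sketch: the `$XPATH$` placeholder, the one-line template, and the function names `replaceAll` and `generateLookups` are assumptions made for illustration, not the generator's actual template format.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Replace every occurrence of `key` in `text` with `value`.
std::string replaceAll(std::string text, const std::string& key,
                       const std::string& value) {
    if (key.empty()) return text;
    std::size_t pos = 0;
    while ((pos = text.find(key, pos)) != std::string::npos) {
        text.replace(pos, key.size(), value);
        pos += value.size();  // continue after the inserted value
    }
    return text;
}

// Emit one MSXML lookup statement per XPath.
// The line template and its "$XPATH$" placeholder are hypothetical.
std::string generateLookups(const std::vector<std::string>& xpaths) {
    const std::string lineTemplate =
        "psNode = m_pXmlDoc->selectSingleNode(_T(\"$XPATH$\"));\n";
    std::string out;
    for (const auto& xp : xpaths) {
        out += replaceAll(lineTemplate, "$XPATH$", xp);
    }
    return out;
}
```

Given the two single-node paths from the RSS sample, `generateLookups` would produce the same kind of lookup statements that were written by hand in the Background section.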
Since the XML file's schema is often not at hand, it is difficult to tell whether an XML node (e.g., /rss/channel/item) belongs to a structure, that is, a repeating group of values. The current solution is: if the node occurs more than once, it is treated as a structure; otherwise, it is treated as a single node.
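A rough sketch of this occurrence rule might look as follows. It assumes the XPath of every element has already been collected into a list while walking the document; the names `NodeKind` and `classifyPaths` are hypothetical and do not come from the generator's source.

```cpp
#include <map>
#include <string>
#include <vector>

enum class NodeKind { Single, Structure };

// Classify each distinct path by how often it occurs in the document:
// more than once means Structure, exactly once means Single.
std::map<std::string, NodeKind> classifyPaths(
        const std::vector<std::string>& paths) {
    std::map<std::string, int> counts;
    for (const auto& p : paths) {
        ++counts[p];
    }
    std::map<std::string, NodeKind> kinds;
    for (const auto& kv : counts) {
        kinds[kv.first] =
            kv.second > 1 ? NodeKind::Structure : NodeKind::Single;
    }
    return kinds;
}
```

For the RSS sample, /rss/channel/item occurs twice and is classified as a structure, while /rss/channel/language occurs once and stays a single node. Note that this simple version counts full paths across the whole document, so the children of a repeated node (such as /rss/channel/item/title) are also counted more than once.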
Since my language is C++, each generated parser puts the values of structure nodes into an STL vector, while the value of a single node is retrieved through a generated enum type. The programming language of the parser is not important; you can modify the generator to produce parsers in VB, Java, C#, or whatever you like :-)
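To make this layout concrete, here is a guess at the shape a generated parser for the RSS sample could take. Every name in it (`RssField`, `kRssXPaths`, `RssItem`, `RssParser`) is hypothetical, and the MSXML calls that would actually load the values are left out so that the sketch stays portable.

```cpp
#include <string>
#include <vector>

// One enum value per single node (hypothetical generated names).
enum RssField {
    RSS_CHANNEL_DESCRIPTION,
    RSS_CHANNEL_LANGUAGE,
    RSS_FIELD_COUNT
};

// XPath for each enum value, filled in by the generator.
static const char* const kRssXPaths[RSS_FIELD_COUNT] = {
    "/rss/channel/description",
    "/rss/channel/language"
};

// One struct per structure node, with one member per child value.
struct RssItem {
    std::string title;
};

class RssParser {
public:
    // XPath that a real parser would pass to selectSingleNode.
    const char* xpathOf(RssField f) const { return kRssXPaths[f]; }

    // Single nodes are read and written through the enum.
    std::string get(RssField f) const { return m_values[f]; }
    void set(RssField f, const std::string& v) { m_values[f] = v; }

    // Structure nodes are collected in an STL vector.
    std::vector<RssItem> items;

private:
    std::string m_values[RSS_FIELD_COUNT];
};
```

With this shape, client code would ask for `parser.get(RSS_CHANNEL_LANGUAGE)` or iterate over `parser.items` instead of writing XPath strings by hand.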
Points of Interest
As I said at the beginning, what I present is not only code but, more importantly, an idea (Auto Code Generation) that saves your energy and makes programming life easier and more exciting. If you have had a similar experience (not limited to XML), feel free to contact me; it will be my pleasure to offer advice. As the saying goes: "You have an apple, I have an apple; if we exchange, then we still have one each. You have an idea, I have an idea; if we exchange, then we will have two each!"
History
The algorithm that decides whether a node is a structure was revised to handle cases such as "/items/item": even if such a node appears only once in an XML file, it looks more like a structure than a single node.
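The exact revised rule is not spelled out here. One plausible reading, shown purely as an assumption, is that a node is also treated as a structure when its parent's name is the plural of its own name. The helper names below are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Last path segment, e.g. "item" for "/items/item".
std::string lastSegment(const std::string& path) {
    const std::size_t pos = path.rfind('/');
    return pos == std::string::npos ? path : path.substr(pos + 1);
}

// Path of the parent node, e.g. "/items" for "/items/item".
// Returns an empty string for a top-level node.
std::string parentPath(const std::string& path) {
    const std::size_t pos = path.rfind('/');
    if (pos == std::string::npos || pos == 0) return std::string();
    return path.substr(0, pos);
}

// Assumed rule: repeated nodes are structures, and so is a node whose
// parent is named like its plural ("items" containing "item").
bool looksLikeStructure(const std::string& path, int occurrences) {
    if (occurrences > 1) return true;
    const std::string child = lastSegment(path);
    const std::string parent = lastSegment(parentPath(path));
    return !child.empty() && parent == child + "s";
}
```

Under this assumed rule, "/items/item" counts as a structure even with a single occurrence, while "/rss/channel/language" remains a single node.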