Introduction
Oftentimes we need to extract all the heterogeneous data from an XML file and fill in the corresponding structures in one go, for example when reading configuration data at application startup or importing data from an external source. We're not interested in modifying the data or in random access to its elements. All we need is to parse the XML file once and initialize the corresponding data structures.
This article outlines an approach that recursively applies a simple parsing engine to import data from an XML file.
Using the Code
Let's consider the following XML file containing Persons and Cars:
<?xml version="1.0" encoding="UTF-8"?>
<Data>
<Person>
<Name>Joe</Name>
<Age age="18"/>
</Person>
<Person>
<Name>Ray</Name>
<Age age="4"/>
</Person>
<Car make="Honda">
<Year>2015</Year>
<Color code="1">Blue</Color>
</Car>
<Car make="Nissan">
<Year>2014</Year>
<Color code="2">Grey</Color>
</Car>
</Data>
The corresponding data structures might be defined as follows:
#include <string>

struct Person
{
    std::string name;
    std::string age;
};

struct Car
{
    std::string make;
    std::string year;
    std::string code;
    std::string color;
};
For the sake of simplicity, all fields in the structures are defined as strings. Use your favourite method to convert strings to numbers or other types if needed.
The task is to parse the XML file and initialize the Person and Car structures.
The XmlParser engine does just that. As it walks down the XML tree, it invokes the registered parser corresponding to the current XML element.
The parser function for an XML element is defined as a Boost.Function, which gives great flexibility in choosing the type of callable entity:
typedef boost::function<bool ()> XmlParserFunc;
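To illustrate that flexibility, the sketch below stores four different kinds of callables in the same function type. It uses std::function from C++11 instead of boost::function so that it compiles without Boost; the two templates behave alike for this purpose. The names IgnoreElement, ElementCounter, NameSink and MakeParsers are hypothetical.

```cpp
#include <functional>
#include <string>
#include <vector>

// Same shape as the article's XmlParserFunc, written with std::function
// (C++11) so that the sketch compiles without Boost.
typedef std::function<bool ()> XmlParserFunc;

// 1. A plain free function.
bool IgnoreElement() { return true; }

// 2. A function object that carries its own state.
struct ElementCounter
{
    int* count;
    bool operator()() const { ++*count; return true; }
};

// 3. A member function bound to an object, as PersonXmlParser does with boost::bind.
struct NameSink
{
    std::string name;
    bool ParseName() { name = "Joe"; return true; }  // placeholder for reading XML text
};

// Collects one parser of each kind; all of them fit the same XmlParserFunc type.
std::vector<XmlParserFunc> MakeParsers(int& count, NameSink& sink)
{
    std::vector<XmlParserFunc> parsers;
    parsers.push_back(IgnoreElement);
    parsers.push_back(ElementCounter{&count});
    parsers.push_back(std::bind(&NameSink::ParseName, &sink));
    parsers.push_back([&count]() { return count > 0; });  // 4. a lambda
    return parsers;
}
```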
A map associates each XML element name with its parser:
typedef std::map<std::string, XmlParserFunc> XmlParserMap;
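A minimal sketch of how an engine might consult such a map, assuming that elements without a registered parser are simply skipped (the article does not say how XmlParser treats them). Dispatch is a hypothetical helper, and std::function again stands in for boost::function.

```cpp
#include <functional>
#include <map>
#include <string>

// std::function stands in for boost::function so the sketch needs no Boost.
typedef std::function<bool ()> XmlParserFunc;
typedef std::map<std::string, XmlParserFunc> XmlParserMap;

// Hypothetical helper: runs the parser registered for `element`.
// Elements without a parser are skipped and reported as success (assumption).
bool Dispatch(const XmlParserMap& parsers, const std::string& element)
{
    // find() rather than operator[]: operator[] is non-const and would insert
    // an empty XmlParserFunc, which throws std::bad_function_call when called.
    const XmlParserMap::const_iterator it = parsers.find(element);
    if (it == parsers.end())
        return true;
    return it->second();
}
```

Taking the map by const reference also guarantees that dispatching never alters the registrations.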
The XML parser engine is implemented by the XmlParser class, which accepts an XmlParserMap in its constructor:
class XmlParser
{
public:
    XmlParser(xmlTextReaderPtr& reader, const XmlParserMap& m);
    ~XmlParser();

    bool Parse();

private:
    bool MoveToNextElement(int entry_depth);

private:
    xmlTextReaderPtr& m_xmlReader;
    // Only a reference is stored: the map must outlive the XmlParser object.
    const XmlParserMap& m_parseMap;
};
The XmlParser::Parse() method traverses the XML tree starting from the reader's current position and invokes the parsers registered for the elements it encounters.
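The article does not list the body of XmlParser::Parse(). As a rough sketch of the kind of depth-bounded walk that the entry_depth parameter of MoveToNextElement suggests, the code below replaces the libxml2 reader with a hypothetical in-memory list of start-element events; FakeReader, StartElement and ParseSubtree are illustrative names, not the article's. With libxml2, the depth and the element name would come from xmlTextReaderDepth() and xmlTextReaderConstName().

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an XML reader: the start-element events of a
// document in order, each with its nesting depth (root = 0), plus a cursor.
struct StartElement { std::string name; int depth; };
struct FakeReader
{
    std::vector<StartElement> events;
    std::size_t pos;  // index of the current element; must be valid on entry
};

typedef std::function<bool ()> XmlParserFunc;  // std:: stand-in for boost::function
typedef std::map<std::string, XmlParserFunc> XmlParserMap;

// Depth-bounded walk: visits every element nested below the one under the
// cursor and dispatches those that have a registered parser. A parser may
// consume a whole subtree itself (by calling ParseSubtree recursively); if it
// leaves the cursor where it was, the walk moves on to the next element.
// Stops at the first element that is not a descendant (or at the end),
// leaving the cursor there for the caller.
bool ParseSubtree(FakeReader& r, const XmlParserMap& parsers)
{
    const int entry_depth = r.events[r.pos].depth;
    ++r.pos;  // step into the subtree
    while (r.pos < r.events.size() && r.events[r.pos].depth > entry_depth)
    {
        const std::size_t here = r.pos;
        const XmlParserMap::const_iterator it = parsers.find(r.events[here].name);
        if (it != parsers.end() && !it->second())
            return false;  // a parser reported failure
        if (r.pos == here)  // parser did not advance the cursor
            ++r.pos;
    }
    return true;
}
```

Letting each registered parser decide whether to advance the cursor is what makes the recursion work: a leaf handler reads a value and returns, while a handler for a complex element runs a nested walk over its own subtree.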
There are multiple ways to link the invoked parser method to the destination data object. One of them is to aggregate the target data object in the parser class. For instance, the parser class for a Person:
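class PersonXmlParser
{
public:
    PersonXmlParser(xmlTextReaderPtr& reader, Person& out_entry);
    bool Parse();

private:
    // Not copyable: m_parserMap holds callbacks bound to this object's address.
    PersonXmlParser(const PersonXmlParser&);
    PersonXmlParser& operator=(const PersonXmlParser&);

    void InitParserMap();
    bool ParseName();
    bool ParseAge();

private:
    xmlTextReaderPtr& m_xmlReader;
    xmlparser::XmlParserMap m_parserMap;
    Person& m_data;
};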
Its implementation is as follows:
PersonXmlParser::PersonXmlParser(xmlTextReaderPtr& reader, Person& out_entry) :
    m_xmlReader(reader),
    m_data(out_entry)
{
    InitParserMap();
}

bool PersonXmlParser::Parse()
{
    xmlparser::XmlParser parser(m_xmlReader, m_parserMap);
    return parser.Parse();
}

// Registers one handler per child element; each handler is bound to this object.
void PersonXmlParser::InitParserMap()
{
    m_parserMap["Name"] = boost::bind(&PersonXmlParser::ParseName, this);
    m_parserMap["Age"] = boost::bind(&PersonXmlParser::ParseAge, this);
}

// <Name>Joe</Name>: the value is the element's text content.
bool PersonXmlParser::ParseName()
{
    return ReadStringValue(m_xmlReader, m_data.name);
}

// <Age age="18"/>: the value is stored in the "age" attribute.
bool PersonXmlParser::ParseAge()
{
    return GetAttribute(m_xmlReader, reinterpret_cast<const xmlChar*>("age"), m_data.age);
}
The following snippet parses a Person XML element, assuming that the XML reader cursor is positioned at the beginning of the Person element:
Person person;
PersonXmlParser parser(xmlReader, person);
if (!parser.Parse())
{
    // Handle the parse error, e.g. report it and stop importing.
}
If a sub-element is itself a complex node, its parsing function can delegate to another parser object of the appropriate type, and so on recursively.
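The recursion can also be sketched without libxml2. The code below is an illustrative, std-only analogue and not the article's implementation: a hypothetical in-memory FakeReader replaces xmlTextReaderPtr, PersonParser mirrors PersonXmlParser (lambdas capturing this replace boost::bind), and a hypothetical root-level DataParser delegates every Person element to a fresh PersonParser and collects the results. Car elements would be handled the same way and are left out for brevity.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory reader: start-element events in document order, with
// depth (root = 0) and a value (text content, or the attribute value for
// elements such as <Age age="18"/>), plus a cursor shared by all parsers.
struct StartElement { std::string name; int depth; std::string value; };
struct FakeReader { std::vector<StartElement> events; std::size_t pos; };

typedef std::function<bool ()> XmlParserFunc;
typedef std::map<std::string, XmlParserFunc> XmlParserMap;

// Walks the subtree of the element under the cursor, dispatching registered
// parsers, and stops at the first element that is not a descendant.
bool ParseSubtree(FakeReader& r, const XmlParserMap& parsers)
{
    const int entry_depth = r.events[r.pos].depth;
    ++r.pos;
    while (r.pos < r.events.size() && r.events[r.pos].depth > entry_depth)
    {
        const std::size_t here = r.pos;
        const XmlParserMap::const_iterator it = parsers.find(r.events[here].name);
        if (it != parsers.end() && !it->second())
            return false;
        if (r.pos == here)  // leaf parser (or none): move on by one element
            ++r.pos;
    }
    return true;
}

struct Person { std::string name; std::string age; };

// Mirrors PersonXmlParser: aggregates the target object and registers handlers.
class PersonParser
{
public:
    PersonParser(FakeReader& reader, Person& out) : m_reader(reader), m_data(out)
    {
        m_parsers["Name"] = [this]() -> bool { m_data.name = Current().value; return true; };
        m_parsers["Age"] = [this]() -> bool { m_data.age = Current().value; return true; };
    }
    PersonParser(const PersonParser&) = delete;  // handlers capture `this`
    PersonParser& operator=(const PersonParser&) = delete;
    bool Parse() { return ParseSubtree(m_reader, m_parsers); }

private:
    const StartElement& Current() const { return m_reader.events[m_reader.pos]; }

    FakeReader& m_reader;
    Person& m_data;
    XmlParserMap m_parsers;
};

// Hypothetical root-level parser: the recursive step hands each <Person>
// to a fresh PersonParser working on the same reader.
class DataParser
{
public:
    DataParser(FakeReader& reader, std::vector<Person>& out) : m_reader(reader), m_people(out)
    {
        m_parsers["Person"] = [this]() -> bool {
            Person person;
            PersonParser nested(m_reader, person);
            if (!nested.Parse())
                return false;
            m_people.push_back(person);
            return true;
        };
    }
    DataParser(const DataParser&) = delete;
    DataParser& operator=(const DataParser&) = delete;
    bool Parse() { return ParseSubtree(m_reader, m_parsers); }

private:
    FakeReader& m_reader;
    std::vector<Person>& m_people;
    XmlParserMap m_parsers;
};
```

Because both parsers share one reader by reference, the nested parser leaves the cursor just past the Person subtree, and the outer walk continues from there.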
Points of Interest
Parsing an XML tree that contains heterogeneous elements using the libxml2 Reader API.
The presented approach can also be used with a different XML reader; in that case, the XmlParser engine class has to be adjusted accordingly.
History