
Parsing XML Tree Containing Heterogeneous Data

27 Apr 2015, CPOL, 2 min read
Parsing an XML tree containing heterogeneous data using the libxml2 Reader API

Introduction

Often, we need to extract heterogeneous data from an XML file and fill in the corresponding structures in a single pass: for example, when reading configuration data at application startup or when importing data from an external source. We are not interested in modifying the data or accessing its elements randomly. All we need is to parse the XML file once and initialize the corresponding data structures.

This article outlines an approach that recursively applies a simple parsing engine to import data from an XML file.

Using the Code

Let's consider the following XML file containing Persons and Cars:

XML
<?xml version="1.0" encoding="UTF-8"?>
<Data>
    <Person>
        <Name>Joe</Name>
        <Age age="18"/>
    </Person>
    
    <Person>
        <Name>Ray</Name>
        <Age age="4"/>
    </Person>

    <Car make="Honda">
        <Year>2015</Year>
        <Color code="1">Blue</Color>
    </Car>
    
    <Car make="Nissan">
        <Year>2014</Year>
        <Color code="2">Grey</Color>
    </Car>
    
</Data>

The corresponding data structures might be defined as follows:

C++
#include <string>

struct Person
{
    std::string name;
    std::string age;  // convert the string to an integer with your favorite converter
};

struct Car
{
    std::string make;
    std::string year;
    std::string code;
    std::string color;
};

For the sake of simplicity, all structure fields are defined as strings. Use your favorite method to convert them to numbers or other types if needed.
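For example, the age field could be converted with std::stoi. The following is a minimal sketch; the helper name ToInt is hypothetical and not part of the article's code. It reports failure through its return value instead of throwing, and it rejects text with trailing characters such as "18abc".

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a decimal string such as Person::age to int.
// Returns false instead of throwing when the text is not a complete integer.
bool ToInt(const std::string& text, int& out)
{
    try {
        std::size_t consumed = 0;
        const int value = std::stoi(text, &consumed);  // base 10 by default
        if (consumed != text.size())
            return false;                              // trailing characters, e.g. "18abc"
        out = value;
        return true;
    } catch (const std::invalid_argument&) {           // no leading digits at all
        return false;
    } catch (const std::out_of_range&) {               // value does not fit in int
        return false;
    }
}
```

On failure the output argument is left unchanged, so a caller can pre-fill it with a default value.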

The task is to parse the XML file and initialize the Person and Car structures.

The XmlParser engine does just that. As it walks down the XML tree, it invokes the parser registered for the current XML element.

The parser function for an XML element is defined as a Boost.Function, which gives great flexibility in choosing the type of callable entity:

C++
#include <boost/function.hpp>

// Parser function.
// Parses the current XML element.
// Returns true on success, false otherwise.
typedef boost::function<bool ()> XmlParserFunc;
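To illustrate that flexibility, the sketch below stores three different kinds of callables in the same parser type. It assumes C++11 and uses std::function, the standard-library counterpart of boost::function, so it compiles without Boost; ParseNothing, NameParser and MakeParsers are hypothetical names used only for illustration.

```cpp
#include <functional>
#include <string>

// Standard-library stand-in for the boost::function typedef.
typedef std::function<bool ()> XmlParserFunc;

// 1. A free function.
bool ParseNothing() { return true; }

// 2. A function object, i.e. a class with an operator() signature.
struct NameParser
{
    std::string* target;
    bool operator()() const { *target = "Joe"; return true; }  // pretend the reader returned "Joe"
};

// Assigns one callable of each kind; all of them fit XmlParserFunc.
void MakeParsers(std::string& name, XmlParserFunc& a, XmlParserFunc& b, XmlParserFunc& c)
{
    a = &ParseNothing;
    b = NameParser{&name};
    c = [&name]() { return !name.empty(); };  // 3. A lambda that checks the result.
}
```

Callers of an XmlParserFunc only see the bool () signature, so the engine does not need to know which kind of callable it is invoking.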

A map associates each XML element name with its parser:

C++
#include <map>
#include <string>

// Maps an XML element name to its parser function.
typedef std::map<std::string, XmlParserFunc> XmlParserMap;
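A lookup in this map is all the engine needs to dispatch an element. The helper below is a hypothetical sketch of that step (Dispatch is not part of the article's code, and std::function stands in for boost::function). Skipping elements that have no registered parser is a design choice assumed here; a stricter engine could report them as errors.

```cpp
#include <functional>
#include <map>
#include <string>

typedef std::function<bool ()> XmlParserFunc;               // std stand-in for boost::function
typedef std::map<std::string, XmlParserFunc> XmlParserMap;

// Hypothetical dispatch step: call the parser registered for 'element'.
// Elements without a registered parser are skipped and treated as success.
bool Dispatch(const XmlParserMap& parsers, const std::string& element)
{
    XmlParserMap::const_iterator it = parsers.find(element);
    if (it == parsers.end())
        return true;          // unknown element: ignore it
    return it->second();      // run the registered parser and report its result
}
```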

The XML parsing engine is implemented by the XmlParser class, which accepts an XmlParserMap in its constructor.

C++
#include <libxml/xmlreader.h>

// XML parsing engine
class XmlParser
{
public:
    XmlParser(xmlTextReaderPtr& reader, const XmlParserMap& m);
    ~XmlParser();

    // Parse the XML tree
    bool Parse();

private:
    // Move to the next element at the same tree depth
    bool MoveToNextElement(int entry_depth);

private:
    xmlTextReaderPtr&   m_xmlReader;  // libxml2 XML reader
    const XmlParserMap& m_parseMap;   // provided parsing map
};

The XmlParser::Parse() method traverses the XML tree starting from the reader's current position and invokes the parsers registered for the elements it encounters.
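The body of Parse() is not listed here. One plausible shape of its loop is sketched below against a hypothetical in-memory reader rather than libxml2, so that it runs on its own: remember the depth of the element under the cursor, read forward, dispatch each direct child element to its parser, and stop at the end tag of the entry element. With the real reader, the corresponding calls would be xmlTextReaderRead, xmlTextReaderDepth, xmlTextReaderNodeType and xmlTextReaderConstName. libxml2 reports no end tag for an empty element such as <Age age="18"/>, so running the engine on such an element would additionally need an xmlTextReaderIsEmptyElement check, which this sketch omits. FakeReader, Node and ParseElement are illustrative names, and std::function stands in for boost::function.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

typedef std::function<bool ()> XmlParserFunc;               // std stand-in for boost::function
typedef std::map<std::string, XmlParserFunc> XmlParserMap;

// Hypothetical stand-in for xmlTextReaderPtr: a pre-flattened node stream.
enum NodeType { kElement, kEndElement };
struct Node { NodeType type; int depth; std::string name; };

struct FakeReader
{
    std::vector<Node> nodes;
    std::size_t pos = 0;                                     // cursor: index of the current node

    bool Read() { return ++pos < nodes.size(); }             // advance one node, false at the end
    const Node& Current() const { return nodes[pos]; }
};

// Sketch of the engine loop: the cursor is on the start tag of the entry
// element, and each direct child element is handed to its registered parser.
bool ParseElement(FakeReader& reader, const XmlParserMap& parsers)
{
    const int entry_depth = reader.Current().depth;
    while (reader.Read())
    {
        const Node& node = reader.Current();
        if (node.depth <= entry_depth)
            return true;                                     // end tag of the entry element
        if (node.type != kElement || node.depth != entry_depth + 1)
            continue;                                        // skip end tags and grandchildren
        XmlParserMap::const_iterator it = parsers.find(node.name);
        if (it != parsers.end() && !it->second())
            return false;                                    // a child parser failed
    }
    return true;                                             // end of the document
}
```

A child parser may either leave the cursor where it is (the loop then skips the child's subtree) or run a nested engine of its own, which stops on the child's end tag.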

There are several ways to link an invoked parser method to its destination data object. One of them is to aggregate the target data object in the parser class. For instance, here is the parser class for a Person:

C++
// PersonXmlParser.h

// Parser for a 'Person' XML entry
class PersonXmlParser
{
public:
    // Parameters:
    // reader    - XML reader
    // out_entry - target data entry
    PersonXmlParser(xmlTextReaderPtr& reader, Person& out_entry);

    // Parser for the 'Person' element.
    // Could use an operator() signature instead.
    bool Parse(); // XmlParserFunc

private:
    void InitParserMap();

    // 'Person' sub-element parsers.
    // When called, the XML reader cursor is positioned at the start of the element.
    bool ParseName(); // parser for the 'Name' element
    bool ParseAge();  // parser for the 'Age' element

private:
    xmlTextReaderPtr&       m_xmlReader; // libxml2 reader
    xmlparser::XmlParserMap m_parserMap; // maps an element name to its parser function

    Person&                 m_data;      // output data
};

Its implementation is as follows:

C++
// PersonXmlParser.cpp
#include "PersonXmlParser.h"
#include <boost/bind/bind.hpp>

PersonXmlParser::PersonXmlParser(xmlTextReaderPtr& reader, Person& out_entry) :
    m_xmlReader(reader),
    m_data(out_entry)
{
    InitParserMap();
}

// Parser for a 'Person' element
bool PersonXmlParser::Parse()
{
    // Use the XML parsing engine and provide it with the parsing map
    xmlparser::XmlParser parser(m_xmlReader, m_parserMap);

    return parser.Parse(); // invoke the XML parsing engine
}

// Initialize the parser map
void PersonXmlParser::InitParserMap()
{
    // Add bindings to the parsing map
    m_parserMap["Name"] = boost::bind(&PersonXmlParser::ParseName, this);
    m_parserMap["Age"]  = boost::bind(&PersonXmlParser::ParseAge, this);
}

// Parser for a 'Name' element
bool PersonXmlParser::ParseName()
{
    // ReadStringValue is a helper built on the libxml2 Reader API
    return ReadStringValue(m_xmlReader, m_data.name);
}

// Parser for an 'Age' element
bool PersonXmlParser::ParseAge()
{
    // GetAttribute is a helper built on the libxml2 Reader API
    return GetAttribute(m_xmlReader, reinterpret_cast<const xmlChar*>("age"), m_data.age);
}
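With a C++11 compiler, the boost::bind calls in InitParserMap() can be replaced by lambdas that capture this. The reduced sketch below assumes a hypothetical MiniPersonParser that receives the value the reader would return instead of an xmlTextReaderPtr, so only the map wiring is shown; std::function stands in for boost::function.

```cpp
#include <functional>
#include <map>
#include <string>

struct Person
{
    std::string name;
    std::string age;
};

typedef std::function<bool ()> XmlParserFunc;               // std stand-in for boost::function
typedef std::map<std::string, XmlParserFunc> XmlParserMap;

// Hypothetical, reduced PersonXmlParser: it is given the value the reader
// would return, so that only the parser-map wiring is demonstrated.
class MiniPersonParser
{
public:
    MiniPersonParser(const std::string& current_value, Person& out_entry)
        : m_value(current_value), m_data(out_entry)
    {
        // Lambdas capturing 'this' replace boost::bind(&Class::Method, this).
        m_parserMap["Name"] = [this]() { return ParseName(); };
        m_parserMap["Age"]  = [this]() { return ParseAge(); };
    }

    const XmlParserMap& Map() const { return m_parserMap; }

private:
    bool ParseName() { m_data.name = m_value; return true; }
    bool ParseAge()  { m_data.age = m_value;  return !m_value.empty(); }

    const std::string& m_value;       // stands in for the reader's current value
    Person&            m_data;        // output data
    XmlParserMap       m_parserMap;   // element name -> parser
};
```

As with the boost::bind version, the map holds the this pointer, so a parser object should not be copied or moved after construction: the copy's map would still call into the original object.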

The following snippet parses a Person XML element, assuming that the XML Reader cursor is positioned at the beginning of the Person element:

C++
Person person;
PersonXmlParser parser(xmlReader, person);

parser.Parse(); // will parse the current 'Person' element

If a sub-element of the XML element is itself a complex node, another dedicated parser object can be used inside its parsing function, and so on recursively.
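The recursion can be sketched in a self-contained way. In the example below, a Data-level parser creates a new Person for every <Person> child and runs a nested parser on it. The in-memory reader (FakeReader), the Node layout and the function names are hypothetical stand-ins for the libxml2 reader and the article's parser classes, and Person is reduced to its name field. For brevity, an element's text is stored on its start node rather than in a separate text node, as libxml2 reports it.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

typedef std::function<bool ()> XmlParserFunc;               // std stand-in for boost::function
typedef std::map<std::string, XmlParserFunc> XmlParserMap;

struct Person { std::string name; };                         // reduced Person

// Hypothetical in-memory reader: start/end tags with optional text content.
struct Node { bool is_end; int depth; std::string name; std::string text; };
struct FakeReader
{
    std::vector<Node> nodes;
    std::size_t pos = 0;
    bool Read() { return ++pos < nodes.size(); }
    const Node& Current() const { return nodes[pos]; }
};

// Engine loop: dispatch the direct children of the element under the cursor.
bool ParseElement(FakeReader& r, const XmlParserMap& parsers)
{
    const int entry_depth = r.Current().depth;
    while (r.Read())
    {
        const Node& n = r.Current();
        if (n.depth <= entry_depth) return true;             // left the entry element
        if (n.is_end || n.depth != entry_depth + 1) continue;
        XmlParserMap::const_iterator it = parsers.find(n.name);
        if (it != parsers.end() && !it->second()) return false;
    }
    return true;
}

// Level 2: fills one Person from the children of a <Person> element.
bool ParsePerson(FakeReader& r, Person& out)
{
    XmlParserMap m;
    m["Name"] = [&r, &out]() -> bool { out.name = r.Current().text; return true; };
    return ParseElement(r, m);
}

// Level 1: every <Person> child of <Data> gets a new Person and a nested parser.
bool ParseData(FakeReader& r, std::vector<Person>& people)
{
    XmlParserMap m;
    m["Person"] = [&r, &people]() -> bool {
        people.push_back(Person());
        return ParsePerson(r, people.back());
    };
    return ParseElement(r, m);
}
```

Each level owns only the map for its own children, which keeps the Person and Car parsers independent of the document root.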

Points of Interest

The approach parses an XML tree containing heterogeneous elements using the libxml2 Reader API.

The presented approach can also be used with a different XML reader. In that case, the XML parsing engine class XmlParser has to be adjusted accordingly.

History

  • Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)