
XML: Include a Flexible Parser in Your C++ Applications

26 Jun 2014 | CPOL | 16 min read
Free, portable, compiler-independent XML library in C++

 

Includes:

  • XML class source files (XML.CPP, XML.H, MIME.H)
  • Documentation (XML.CHM)
  • Testing project XMLTEST (cpp/sln/vcproj)
  • The following compiled applications: TXML.EXE for Win32, XMLPPC.EXE for Windows Mobile 5+ devices.
  • XDB.CPP, demonstrating the usage of XML library in creating a configuration dialog box.
  • XML.JAVA, experimental implementation of the library in Java. 

Introduction

Are you tired of the many non-portable XML solutions around? Try my library. It is designed to work on any OS and with any compiler. No MFC, no COM, no global variables: plain, pure C++!

Features

  • Portable: basic functionality works in any environment. I've tested it with Microsoft Visual C++ 2005, Borland, GCC, CodeWarrior and Pocket PC 2003+, under Windows, Linux, Windows Mobile and Symbian
  • Class-based manipulation or INI-style wrappers
  • UTF-8 and (Win32) UTF-16 read/write support
  • Import from file, memory, or (Win32) URL
  • (Win32) Read/save encrypted XML files; custom encryption
  • Export to file, memory or (Win32) registry key
  • Database queries (Win32)
  • Allows XML memory compression
  • Tests for integrity
  • Allows for "on-the-fly" variable creation and return value
  • Element/variable/comment/content creation, removal, sort, moving, comparison
  • Allows copy/paste of an XML element to and from Windows Clipboard as text
  • Ability to import database using ADO (Win32)
  • JSON Converter
  • Supports the &vars;
  • Save/load binary data in variables (requires mime.h)
  • Supports CDATA sections
  • Supports element unloading and reloading
  • XML element updating
  • Supports temporal variables
  • Supports binary input/output
  • HTML help
  • Java Implementation
  • Supports per-element encryption/decryption (under Windows), with both symmetric keys and asymmetric keys (certificates).
  • Supports per-element signing/verification (under Windows) with certificates.
  • x64 compatibility
  • Experimental JSON Support. 
  • 2 Versions: With STL / Without STL.

License

Free for any kind of freeware, shareware, commercial, or whateverware project, as long as you:

  • Credit me in your application's documentation and/or the About box
  • Register at my forum (http://www.turboirc.com/forum) so you receive updates and news about the library
  • Drop me a note via the forum with the name of your application (if it will be released) so I can link to you.

No STL Version

STL and exception handling are not available in every C++ implementation; for example, the Symbian OS SDK has no STL. Therefore, I've decided to use my own Z<> class (defined in xml.cpp) to manipulate buffers.
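Z<> itself lives in xml.cpp and its interface isn't described here. Purely as a hypothetical sketch of the idea (an owning, zero-initialized buffer that needs neither the STL nor exceptions), a class along these lines would do. The name ZBuf and its members are assumptions, not the library's API, and the realloc/memset approach only suits plain types such as char or wchar_t.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical sketch: an owning, zero-initialized buffer for plain types.
// No STL, no exceptions: allocation failure is reported by Resize() returning false.
template <typename T>
class ZBuf
{
public:
    explicit ZBuf(std::size_t count = 1) : p(0), n(0) { Resize(count); }
    ~ZBuf() { std::free(p); }

    // Grow or shrink the buffer; newly added elements are zeroed.
    bool Resize(std::size_t nn)
    {
        if (nn == 0) { std::free(p); p = 0; n = 0; return true; }
        T* np = static_cast<T*>(std::realloc(p, nn * sizeof(T)));
        if (!np) return false;                  // old buffer is left untouched
        if (nn > n) std::memset(np + n, 0, (nn - n) * sizeof(T));
        p = np;
        n = nn;
        return true;
    }

    std::size_t size() const { return n; }
    operator T*() { return p; }                 // use it like a plain array

    ZBuf(const ZBuf&) = delete;                 // the buffer is owned: no copies
    ZBuf& operator=(const ZBuf&) = delete;

private:
    T* p;
    std::size_t n;
};
```

Because the buffer is zeroed on growth, a char buffer is always a valid, terminated C string as long as you leave the last element alone.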

No STL Version with STL extensions 

If you use the non-STL version but still have the STL available, you can define XML_USE_STL_EXTENSIONS before including xml.h. This adds a GetValueS() member to XMLVariable and XMLContent that returns their value as a std::string, so you don't have to query the size first.
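The difference is easiest to see side by side. In the classic interface you pass 0 to learn how many bytes are needed, allocate, then call again; GetValueS() hands back the string directly. The self-contained sketch below mimics both conventions with a hypothetical ToyVariable class; its return-value convention (required size including the terminating zero) is an assumption for illustration, so check XML.CHM for the real one.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for XMLVariable, for illustration only.
class ToyVariable
{
public:
    explicit ToyVariable(const char* v) : value(v) {}

    // Classic style: pass 0 to get the number of bytes required
    // (assumed here to include the terminating zero); otherwise copy the value.
    std::size_t GetValue(char* out) const
    {
        if (out) std::memcpy(out, value.c_str(), value.size() + 1);
        return value.size() + 1;
    }

    // STL-extension style: no size query needed.
    std::string GetValueS() const { return value; }

private:
    std::string value;
};

// Reading the value the classic way takes two calls and a buffer.
std::string ReadClassic(const ToyVariable& v)
{
    std::vector<char> buf(v.GetValue(0)); // first call: size query
    v.GetValue(&buf[0]);                  // second call: copy
    return std::string(&buf[0]);
}
```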

STL Version

For new code, you can define XML_USE_STL before including xml.h to use the STL. This results in faster and safer code, but note that it is not 100% compatible with the non-STL version, so you will have to make some changes when porting code from the non-STL version to the STL one.
The STL version uses std::vector, std::string and standard algorithms.

Manipulating XML Using the Classes

There are 7 classes:

  • XML manages the XML file: opens, saves, exports, etc.
  • XMLHeader manages the XML header (XML headers can include comments)
  • XMLComment manages an XML comment
  • XMLContent manages an XML content
  • XMLElement manages an XML element
  • XMLVariable manages an XML variable
  • XMLCData manages an XML CDATA (character data) section

I won't describe all the member functions of these classes because they are already described in the help file found in the zip. Here, I will demonstrate their basic usage. Please note that the default (non-STL) library doesn't use exception handling or the STL, because these two are not always available in C++ implementations (e.g. Symbian).

The code below loads an XML file, then checks for integrity and compresses memory. test.xml is the sample file I used. Note that the element/variable names are case sensitive.

C++
#include <cassert>
#include <cstdio>
#include "xml.h"

// Load the document in one of three ways (use only one of them):
XML* a = new XML("test.xml");                              // from a file
// XML* a = new XML("<blah f=\"1\"></blah>",1);             // from memory (ASCIIZ string)
// XML* a = new XML("http://www.some.com/files.xml",2);    // from a URL (Win32)

assert(a->IntegrityTest() && a->ParseStatus() == 0);
a->CompressMemory();

// Get the first child element's name and the value of its variable "v".
// Pass 0 instead of y to get the number of bytes required for the string.
char y[100] = {0};
a->GetRootElement()->GetChildren()[0]->GetElementName(y);
// now y == "Cfg"
a->GetRootElement()->GetChildren()[0]->FindVariableZ("v")->GetValue(y);
// now y == "Bowlingy"

// Create the variable "tz" in the 'normal' mode:
XMLElement* e = a->GetRootElement()->GetChildren()[0];
XMLVariable* v = new XMLVariable("tz","some value");
e->AddVariable(v);
// do not delete v; it is now owned by e

// Or create it 'on the fly': FindVariableZ(x,true) creates the
// variable if it doesn't exist. Comments and contents work the same way.
e->FindVariableZ("tz",true)->SetValue("some value");

// Let's save/export:
if (a->IntegrityTest())
{
    a->Save();               // save back to the original file
    a->Save("new.xml");      // save to another file
    FILE* fp = fopen("export.xml","wb");
    if (fp)
    {
        a->Export(fp,1,0,0); // export to a FILE*; you can also export to memory,
        fclose(fp);          // or export one element with XMLElement :: Export
    }
}
// Note that the destructor doesn't save the file by default,
// unless you call XML :: SaveOnClose().
delete a;

Manipulating XML Files using INI-Style Wrappers

Instead of WritePrivateProfileString, you now have some INI-style functions:

  • XMLSetString
  • XMLGetString
  • XMLSetInt
  • XMLGetInt
  • XMLSetBinaryData
  • XMLGetBinaryData
  • XMLSetFloat
  • XMLGetFloat

These functions can work standalone or with an already opened XML object. When a valid XML object is passed to them, they modify it. When a file name is passed to them, they create an XML object, load the file, read/write it and then save it back, much like the Win32 INI file functions.

Because these functions accept their elements as a string with \ (for example, Cfg\\Amplify), you can only manipulate XML files that have unique element names. Otherwise, corruption will occur.
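To picture how a path such as Cfg\\Amplify is read, here is a small standalone sketch that splits it into the element names that would be walked level by level. SplitElementPath is a hypothetical helper for illustration, not a library function; the point is that each segment is only a name, so the path alone cannot tell two same-named siblings apart.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split "A\\B\\C" (that is, A\B\C at run time)
// into {"A", "B", "C"}. Empty segments are skipped.
std::vector<std::string> SplitElementPath(const std::string& path)
{
    std::vector<std::string> parts;
    std::string current;
    for (std::string::size_type i = 0; i < path.size(); ++i)
    {
        if (path[i] == '\\')
        {
            if (!current.empty()) parts.push_back(current);
            current.clear();
        }
        else
            current += path[i];
    }
    if (!current.empty()) parts.push_back(current);
    return parts;
}
```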

Now let's try getting/setting some data to our XML file:

C++
char y[1000] = {0};
XMLGetString("Cfg","v","",y,1000,"test.xml");
    // y now contains "Bowlingy"
XMLGetString("Cfg\\Amplify","V","not_found",y,1000,"test.xml");
    // y now contains "not_found": variable and element names are case sensitive!
XMLSetString("A\\B\\C","v","hahaha","test.xml");
    // elements A, B and C are created!
fread(y,1,100,some_file);    // some_file: an open FILE*
XMLSetBinaryData("A\\B\\C","v",y,100,"test.xml");
    // binary data can be saved/read, encoded with Base64 (mime.h)
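XMLSetBinaryData stores the bytes as Base64 text using mime.h. Since mime.h's own interface isn't described here, the following illustrates only the encoding step: a minimal standard Base64 encoder (RFC 4648 alphabet with '=' padding). Base64Encode is a hypothetical name, not the library's function.

```cpp
#include <cstddef>
#include <string>

// Minimal Base64 encoder (RFC 4648 alphabet, '=' padding), for illustration.
std::string Base64Encode(const unsigned char* data, std::size_t len)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((len + 2) / 3 * 4);
    for (std::size_t i = 0; i < len; i += 3)
    {
        // Pack up to 3 bytes into a 24-bit group.
        unsigned long n = static_cast<unsigned long>(data[i]) << 16;
        if (i + 1 < len) n |= static_cast<unsigned long>(data[i + 1]) << 8;
        if (i + 2 < len) n |= data[i + 2];
        // Emit 4 characters, padding with '=' where input bytes are missing.
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += (i + 1 < len) ? table[(n >> 6) & 63] : '=';
        out += (i + 2 < len) ? table[n & 63] : '=';
    }
    return out;
}
```

Every 3 input bytes become 4 text characters, so binary variables grow by about a third when stored in the XML file.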

Note that all these string functions require UTF-8 strings. Under Windows, you can also call XMLSetString with wchar_t values, which the library automatically converts to UTF-8 using WideCharToMultiByte.

Unloading and Reloading

Use XMLElement :: UnloadElement and XMLElement :: ReloadElement to temporarily save an element to a memory file and bring it back (check the help file for details). This lets you manipulate a huge XML file without wasting RAM.

XML :: PartialLoad() is planned for a future version.

Item Borrowing

Use XMLElement :: BorrowItem to add an XMLElement* mirror of another XMLElement*. This is an advanced feature that must be used with caution, or you will crash your application with a big nice stack overflow. Read the help file for details.

Temporal Variables

A temporal element is an XMLElement (or an XMLVariable) that simply has the 'temporal' flag on. You can set or query this flag by calling SetTemporal and GetTemporal member functions. By default, elements and variables are not temporal.

You can also create a temporal element or variable by a constructor flag. Also, XMLElement :: FindElementZ() and XMLElement :: FindVariableZ, which can create an element/variable on the fly, can also mark it as temporal.

When you call XMLElement :: RemoveTemporalElements(bool Deep), all temporal child elements are removed. If Deep is true, all temporal elements among its descendants are also removed. If a temporal element is removed, all of its children are of course removed with it, even if they are not marked as temporal. Therefore, marking, say, your root element as temporal will result in the destruction of the entire XML file when you call RemoveTemporalElements().

When you call XMLElement :: RemoveTemporalVariables(bool Deep), all temporal variables of this element are removed. If Deep is set to true, all temporal variables of all its children are also removed.

When you call XML :: RemoveTemporalElements(), it calls XMLElement::RemoveTemporalElements(true) and XMLElement::RemoveTemporalVariables(true) for the root element.

Note that unless you remove the temporal elements manually using the above functions, they are not removed - they are considered normal elements/variables and they are saved or exported normally.
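The removal rules can be modelled on a toy tree. The sketch below is not the library's code: Node, temporal and RemoveTemporal are hypothetical names, and it only mirrors the behaviour described above (removing a temporal element drops its whole subtree, and Deep decides whether the search continues into the children that remain).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy element: a name, a 'temporal' flag and owned children.
struct Node
{
    std::string name;
    bool temporal;
    std::vector<Node> children;
    Node(const std::string& n, bool t = false) : name(n), temporal(t) {}
};

// Remove the temporal children of 'e'. Removing a child drops its whole
// subtree, marked or not. If deep is true, recurse into the children that remain.
void RemoveTemporal(Node& e, bool deep)
{
    std::vector<Node> kept;
    for (std::size_t i = 0; i < e.children.size(); ++i)
    {
        if (!e.children[i].temporal)
            kept.push_back(e.children[i]);
    }
    e.children.swap(kept);
    if (deep)
    {
        for (std::size_t i = 0; i < e.children.size(); ++i)
            RemoveTemporal(e.children[i], true);
    }
}
```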

Using Unicode Strings with the Library

The library works with UTF-8, which means that the strings passed and returned are char*s. To pass a Unicode (UTF-16) string to the library, you must convert it manually with WideCharToMultiByte(CP_UTF8, ...). I have created a very simple wrapper class that does that:

C++
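#include <windows.h>

class W
{
    char* we;   // UTF-8 copy, owned by this object
public:
    W(const wchar_t* x)
    {
        // Query the required size (incl. terminating 0): one wchar_t may need 3 UTF-8 bytes
        int wy = WideCharToMultiByte(CP_UTF8,0,x,-1,0,0,0,0);
        we = new char[wy > 0 ? wy : 1]; we[0] = 0;
        WideCharToMultiByte(CP_UTF8,0,x,-1,we,wy,0,0);
    }
    ~W() { delete[] we; }
    operator char* () { return we; }
};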

So if you have a Unicode string x and you want to use it in my library, you would use W(x) instead, which converts your Unicode string to a UTF-8 string, uses it via the operator and the destructor frees it.
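WideCharToMultiByte is Win32-only. To show what the conversion actually does, and why sizing the output buffer from the character count alone is risky, here is a portable sketch that encodes UTF-16 code units as UTF-8. Utf16ToUtf8 is a hypothetical helper, not part of the library; replacing unpaired surrogates with U+FFFD is one common choice, assumed here.

```cpp
#include <cstddef>
#include <string>

// Portable sketch: convert UTF-16 code units to UTF-8.
// Unpaired surrogates are replaced with U+FFFD.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
        {
            // High + low surrogate pair -> one supplementary code point
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        else if (cp >= 0xD800 && cp <= 0xDFFF)
            cp = 0xFFFD;

        if (cp < 0x80)                          // 1 byte (ASCII)
            out += static_cast<char>(cp);
        else if (cp < 0x800)                    // 2 bytes
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)                  // 3 bytes, e.g. CJK
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else                                    // 4 bytes, from a surrogate pair
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

A single UTF-16 unit can become up to three UTF-8 bytes, which is why asking the converter for the required size first is the safe way to allocate.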

64-bit Compatibility

The library is 64-bit compatible; you will be able to compile and use it under any 64-bit compiler without problems.

Element Updating

You will at times need to update an XMLElement with the elements and variables of another element.

C++
int UpdateElement(XMLElement* NewEl,bool UpdateVariableValues = false);

This will:

  • Check each variable of NewEl. If it also exists in your current element, its value is updated only if UpdateVariableValues == true. If it does not exist in your element, it is copied.
  • Check each child element of NewEl. If it does not exist in your current element, it is copied. If it exists, the function calls UpdateElement on the matching child against the child of NewEl, recursively updating all grandchildren and variables.
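The two rules above amount to a recursive merge. Here is a toy model of them; UNode and Update are hypothetical stand-ins, children are matched by name (an assumption about how "exists" is decided), and the int return value of the real UpdateElement is not modelled.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy element: a name, named variables and ordered children.
struct UNode
{
    std::string name;
    std::map<std::string, std::string> vars;
    std::vector<UNode> children;
    explicit UNode(const std::string& n = "") : name(n) {}
};

// Merge 'src' into 'dst' following the two rules described above.
void Update(UNode& dst, const UNode& src, bool updateVariableValues)
{
    // Variables: copy missing ones; overwrite existing ones only if asked.
    for (const auto& kv : src.vars)
    {
        auto found = dst.vars.find(kv.first);
        if (found == dst.vars.end())
            dst.vars.insert(kv);
        else if (updateVariableValues)
            found->second = kv.second;
    }
    // Elements: copy missing ones; recurse into the ones that already exist.
    for (std::size_t i = 0; i < src.children.size(); ++i)
    {
        const UNode& s = src.children[i];
        UNode* match = 0;
        for (std::size_t j = 0; j < dst.children.size() && !match; ++j)
            if (dst.children[j].name == s.name) match = &dst.children[j];
        if (match)
            Update(*match, s, updateVariableValues);
        else
            dst.children.push_back(s);
    }
}
```

Calling Update twice with the same source is harmless: the second pass finds every child already present and only recurses.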

Database Import with ADO

Use XML :: ImportDB() to import a database. You have to mess with two structures:

C++
struct IMPORTDBTABLEDATA
{
    char name[256];
    char itemname[100];
    int nVariables;
    char** Variables;
    char** ReplaceVariables;
};
struct IMPORTDBPARAMS
{
    char* dbname;
    char* provstr;
    int nTables;
    IMPORTDBTABLEDATA* Tables;
};

You fill IMPORTDBPARAMS with the database name (optional), the provider string (check the ADO documentation for details), the number of tables you wish to import, and a pointer to an array of IMPORTDBTABLEDATA structures, one per table. Each IMPORTDBTABLEDATA contains the table name, the item (XMLElement) name to store, the number of variables to take (columns from the table), and two string arrays: the column names to read and the variable names to store them under. For an example, see xmltest.cpp.
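As a sketch of how the two structures fit together, here is one table being described. Every value in it (the shop.mdb file, the Jet OLE DB provider string, the Customers table and its columns) is hypothetical example data, and the declarations are repeated from above only so the snippet compiles on its own; in a real program you would include xml.h and then pass the filled IMPORTDBPARAMS to XML :: ImportDB() as xmltest.cpp does.

```cpp
#include <cstring>

// Declarations repeated from above so this snippet stands alone.
struct IMPORTDBTABLEDATA { char name[256]; char itemname[100];
    int nVariables; char** Variables; char** ReplaceVariables; };
struct IMPORTDBPARAMS { char* dbname; char* provstr;
    int nTables; IMPORTDBTABLEDATA* Tables; };

// Describe one table to import. All names below are hypothetical example data;
// the strings are static so they outlive this function.
void FillImportParams(IMPORTDBPARAMS& params, IMPORTDBTABLEDATA& table)
{
    static char colName[] = "Name", colCity[] = "City";   // table columns to read
    static char varName[] = "name", varCity[] = "city";   // variable names to store
    static char* columns[] = { colName, colCity };
    static char* renamed[] = { varName, varCity };
    static char dbname[] = "shop.mdb";
    static char provstr[] =
        "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=shop.mdb";

    std::memset(&table, 0, sizeof(table));
    std::strcpy(table.name, "Customers");     // table to import
    std::strcpy(table.itemname, "customer");  // item (XMLElement) name to store
    table.nVariables = 2;                     // number of columns to take
    table.Variables = columns;
    table.ReplaceVariables = renamed;

    params.dbname = dbname;                   // optional
    params.provstr = provstr;                 // ADO provider string
    params.nTables = 1;
    params.Tables = &table;
}
```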

Database Queries

You can use XMLElement :: XMLQuery(), which accepts an expression to test, a pointer to an array that receives the matching XMLElement*s (they are not duplicated; the function just copies their pointers into your array), and the depth to search (-1 to query all child elements). An example:

C++
XMLElement* e = ... ; // get this from somewhere
int nC = e->GetAllChildrenNum();
XMLElement** a = new XMLElement*[nC];
memset(a,0,sizeof(XMLElement*)*nC);
int nR = e->XMLQuery("some_var == \"*5*\"",a,-1);

for (int i = 0 ; i < nR ; i++)
{
    // use a[i]
}
delete[] a;

The above code returns pointers to all child elements that have a variable named 'some_var' whose value contains a '5' (wildcard patterns such as *5* are supported). For more, see XML.CHM.
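The exact pattern syntax XMLQuery accepts is documented in XML.CHM, not here. Assuming shell-style wildcards, where * matches any run of characters and ? any single character, a value test like "*5*" could be implemented as follows. WildcardMatch is a hypothetical helper, not the library's matcher.

```cpp
#include <cstddef>
#include <string>

// Hypothetical shell-style wildcard match: '*' = any run, '?' = any one char.
// Greedy scan that backtracks to the most recent '*' on a mismatch.
bool WildcardMatch(const std::string& pat, const std::string& txt)
{
    std::size_t p = 0, t = 0;
    std::size_t star = std::string::npos, mark = 0;
    while (t < txt.size())
    {
        if (p < pat.size() && pat[p] == '*')
        { star = p++; mark = t; }                  // remember where '*' started
        else if (p < pat.size() && (pat[p] == '?' || pat[p] == txt[t]))
        { ++p; ++t; }                              // single-character match
        else if (star != std::string::npos)
        { p = star + 1; t = ++mark; }              // let '*' absorb one more char
        else
            return false;
    }
    while (p < pat.size() && pat[p] == '*') ++p;   // trailing '*' match nothing
    return p == pat.size();
}
```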

STL Mode Changes to XMLElement

  • Note that there are also minimal changes to XMLHeader, XMLContent and XMLComment and XMLCData, but they are transparent to the application. The following list summarizes the changes you should have in mind if using the STL version.
  • The copy constructor and operator = now allow you to duplicate or copy an XMLElement directly:
    C++
     XMLElement(const XMLElement&);      XMLElement& operator =(const XMLElement&); 
  • The following functions use a reference:
    C++
    XMLElement& operator[](int);
    XMLElement& AddElement(const char*,int p = -1,bool Temp = false);
    XMLElement& AddElement(const XMLElement&,int p = -1);
    int Compare(XMLElement&);

    These functions take or return references instead of pointers. AddElement has also been extended with an insert position (-1 to append to the end), and InsertElement has been removed.

  • C++
    int RemoveElementAndKeep(unsigned int i,XMLElement* el);

    This removes the element at index i and stores it in the passed pointer.

  • Item unloading (UnloadElement/ReloadElement) is not supported.
  • Item borrowing is not supported.
  • Item sorting uses std::sort.
  • XMLElement* Duplicate(XMLElement* = 0) still returns a pointer, not a reference.
  • GetComments(), GetContents(), GetVariables(), GetCDatas() and GetChildren() return a std::vector<>. In addition, the functions AddComment(), AddCData(), AddVariable(), AddElement() and AddContent() return a reference to the item added.
  • GetAllChildren() still returns pointers.

Symmetric Element Encryption/Decryption

Starting from version 0x158 under Windows, the library provides symmetric (password-based) encryption/decryption functions for elements. XMLElement provides 4 new functions, two to encrypt/decrypt a child element in place, and two to self-duplicate into encrypted/decrypted forms. The entire element is encrypted, including all children elements, variables, comments and contents.

An encrypted XML element is a normal XML element with 1 content, which contains all the contents of the element in an encrypted form. You can manipulate an XML file in the same way, no matter if there are encrypted elements inside it or not.

  • C++
    XMLElement* XMLElement :: Encrypt(const char* pwd);
  • C++
    XMLElement* XMLElement :: Decrypt(const char* pwd); 

These two functions return a new XMLElement that holds the encrypted (or decrypted) form of the element (remember to delete it, or assign it to another XMLElement). AES-256 encryption and SHA-1 hashing are used. If encryption or decryption fails for any reason, they return 0.

  • C++
    bool EncryptElement(unsigned int i,char* pwd);
  • C++
    bool DecryptElement(unsigned int i,char* pwd); 

These encrypt or decrypt the specified child element in place, returning true on success and false on failure.

Each encrypted XML element can be parsed like a normal XMLElement that holds nothing but one content item. Please note that the password is not stored, so if you lose it, the XML data will be inaccessible.

The library uses the CryptoAPI to encrypt/decrypt data.

Element Signing/Verification

Version 0x15A provides 3 member functions for Signing/Verification:

  • C++
    bool XMLElement::SignElement(unsigned int i,PCCERT_CONTEXT pCert);
    
  • C++
    bool XMLElement::RemoveSignature(unsigned int i);
    
  • C++
    bool XMLElement::VerifyDigitalSignature(unsigned int i,PCCERT_CONTEXT* ppCert);
    

The SignElement signs the element with index i using the supplied Certificate. The element signature is added as a binary value with the name __signature__ in the element. If a signature already exists, the function fails. If the certificate is not valid, or there is no private key, or another error occurs, the function fails.

The RemoveSignature function removes the signature from the element (merely removing the variable with the name __signature__).

The VerifyDigitalSignature function verifies the element with index i and returns the certificate that matched the signature. The certificate is not necessarily trusted; you have to check the certificate chain yourself to verify its source.

If index i is -1, then these functions apply to their own element.

Asymmetric Encryption/Decryption

Version 0x15B provides 2 member functions for asymmetric encryption/decryption:

  • C++
    XMLElement* EncryptElement(unsigned int i,PCCERT_CONTEXT* pCert,int nCert);
  • C++
    XMLElement* DecryptElement(unsigned int i,PCCERT_CONTEXT* ppCert);

The EncryptElement encrypts the element with index i (or itself if i == -1) using the supplied Certificate list. The function returns an XMLElement* which contains the encrypted representation of the entire XMLElement, or 0 if an error occurs.

The DecryptElement function decrypts the element with index i (or itself if i == -1), using any certificate found in the "Personal" store. If ppCert is not null, it receives the certificate used to decrypt the element. The function returns an XMLElement* holding the original, decrypted element.

Note that, unlike symmetric encryption, which uses a password, asymmetric encryption requires the certificate's public key to encrypt and its private key to decrypt. Therefore, if you encrypt an element using, for example, one of the CA root certificates on your PC, the encryption will succeed, but without the private key you will never be able to decrypt it.

Binary Input/Output

The XML file can be pretty large, and it could take a while to load. The library is optimized for speed, but if maximum performance is necessary, you can try the ImportFromBinary and ExportToBinary functions. The XML, XMLElement, XMLVariable, XMLComment, XMLContent, XMLCData and XMLHeader classes all provide these two functions:

C++
XML* x = new XML("somefile.xml");
BDC b = x->ExportToBinary();
// BDC is just a data container: .size() returns the size of the data
// and p() returns a pointer to the data.
x->ImportFromBinary(b);
delete x;

Keep in mind that binary files written with one version of the library will become useless if a later version changes the binary input/output format. As long as the format stays the same, you can keep using them long-term.
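The warning above is the usual trade-off of a private binary format: it is fast because it skips text parsing, but only code that agrees on the layout can read it. A common mitigation, sketched here with hypothetical names (ToyElement, ExportToy, ImportToy) that have nothing to do with the library's actual BDC layout, is to write a version tag first and refuse data with a different tag, so a format change is detected instead of misread.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy element: a name and a value, for illustration only.
struct ToyElement { std::string name, value; };

static const std::uint32_t kFormatVersion = 1; // bump whenever the layout changes

static void PutU32(std::vector<unsigned char>& out, std::uint32_t v)
{
    for (int i = 0; i < 4; ++i)                 // little-endian, byte by byte
        out.push_back(static_cast<unsigned char>(v >> (8 * i)));
}

static void PutString(std::vector<unsigned char>& out, const std::string& s)
{
    PutU32(out, static_cast<std::uint32_t>(s.size())); // length prefix
    out.insert(out.end(), s.begin(), s.end());
}

std::vector<unsigned char> ExportToy(const ToyElement& e)
{
    std::vector<unsigned char> out;
    PutU32(out, kFormatVersion);
    PutString(out, e.name);
    PutString(out, e.value);
    return out;
}

static bool GetU32(const std::vector<unsigned char>& in, std::size_t& pos, std::uint32_t& v)
{
    if (in.size() - pos < 4) return false;      // truncated
    v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    pos += 4;
    return true;
}

static bool GetString(const std::vector<unsigned char>& in, std::size_t& pos, std::string& s)
{
    std::uint32_t n;
    if (!GetU32(in, pos, n) || in.size() - pos < n) return false;
    s.assign(in.begin() + pos, in.begin() + pos + n);
    pos += n;
    return true;
}

// Returns false for truncated data or an unknown format version.
bool ImportToy(const std::vector<unsigned char>& in, ToyElement& e)
{
    std::size_t pos = 0;
    std::uint32_t version;
    if (!GetU32(in, pos, version) || version != kFormatVersion) return false;
    return GetString(in, pos, e.name) && GetString(in, pos, e.value);
}
```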

JSON Parser

XML is nice but some data still appears in JSON. XML.CPP now includes an experimental JSON converter.  

Java Implementation

OK, C++ is perfect but I am now writing for Android which uses Java. XML.JAVA (included in the zip) provides a Java implementation of the library with very similar classes and methods. Of course, not all C++ methods are supported, but it is anyway a quick solution for your Java needs.

iOS Compatibility 

The library works fine with iOS and XCode. 

Other Features

These are some other current features:

  • Use XML :: ImportDB() to import a database (every ADO database supported) to an XML element (examples are in the help file)
  • Use XML :: Query() to query the database

These are some features I'd like to implement in the future:

  • Case-insensitive functions
  • Partial XML Load/Save
  • XML Compression

Please leave your questions and comments!

Sample Projects

  • XMLTest: A command-line demo that shows the main features of the library
  • TXML: The full database/XML file solution for Win32: Exploits all the power of the library. Features:
    • MDI XML loader
    • Load from file, URL, encrypted file (AES 256), import ADO database, load clipboard, TODO: Load Partial
    • Save to file, encrypted file, export text, export to registry key
    • Copy/Cut/Paste, Copy/Cut/Paste to Windows clipboard, Copy/Cut append, Rename, Delete
    • XML Check Integrity, XML Compress
    • View menu: Not yet implemented except 'Toggle View' F4 which toggles to:
      • Plain XML
      • Database-type XML (very useful for storing database-type items like tables), with a grid-style editor
      • Plain Text
    • Insert element/variable/comment/content
    • Execute query, opening a new XML file with query results
    • Auto updates from Internet
    • Supports Binary I/O.
    • Supports Element Encryption/Decryption.
    • CAUTION: Does NOT autosave the loaded XML file, and it closes them without checking if they are saved first. Press Ctrl+S manually to save files.
    • CAUTION: No undo. If you mess it up, I am sorry :)
  • XMLPPC: The lite XML editor for Windows Mobile 5+. Supports every feature of TXML, except opening multiple files and database queries.
  • Turbo GPS: The Android port uses the new Java implementation. Try it!

History

  • June 26, 2014 - Update 0x170 - Added some helpers in both C++/Java versions.
  • November 16, 2012 - Update 0x16B
    • Added Next() and Prev() to XMLElement to get siblings.
    • Added GetValueS() to XMLVariable and XMLContent to get their values as a std::string if XML_USE_STL_EXTENSIONS is enabled.
    • Complete iOS support.
    • Fixed bug in GetAllChildren() which returned children before their parent.
    • Added compatibility with XML second header. Now XMLHeader can contain both the main header and another one below it.
    • Fixed sprintf calls to use the standard %llu (long long) format instead of the MS-specific %I64 prefix.
  • August 8, 2012 - Update 0x169
    • Enabled XML Header <!data >. 
    • Added JSON to main xml.cpp. 
    • Small bugfixes. 
  • January 24, 2012 - Update 0x165
    • STL Index bug fix
    • STL leak on load bug fix
  • October 13, 2011 - Update 0x164
    • Various bugfixes
    • Added json experimental converter
  • December 30, 2010 - Update 0x162
    • Added Binary Input/Output
    • Fixed some GCC compatibilities
    • Fixed some STL issues
    • Added Java Implementation
  • March 09, 2010 - Update 0x15B
    • Fixed STL bug
    • Added asymmetric encryption/decryption based on CryptoAPI Certificates (Win32)
  • March 07, 2010 - Update 0x15A
    • Fixed bug in binary variables
    • Added element signing/verification based on CryptoAPI Certificates (Win32)
  • November 30, 2009 - Update 0x158
    • Element encryption, Content binary items and STL enhancements
  • August 9, 2009 - Update 0x156
    • Cumulative updates and fixes
  • January 15, 2009 - Update 0x150
    • Converted __int64 to long long
    • Fixed small parsing bugs
    • Added STL mode
  • October 20, 2008 - Update 0x143
    • Added optional namespace XMLPP
    • Converted constants of "Savemode" "Loadmode" and "Targetmode" to readable enums
    • Added extra protection to avoid crashes in case of malformed XML.
    • Replaced many 'int's with 'size_t', allowing further expansion to x64 and eliminating C4267
    • Added "const" to more functions
  • July 23, 2008 - Update 0x141
    • Added ability for comments to have < and >
    • Fixed some crashes due to malformed XML files
    • Added const to some functions
    • Added SetValueInt64 and GetValueInt64
    • Fixed SetValueInt to use %i instead of %u
    • Other minor fixes
  • April 27, 2008 - Update 0x140
    • Added XML :: SetUnicode()
    • Fixed bug in element updating
    • Changed Item param to be 64-bits
    • Fixed small bug in 64-bit Z<>
    • Fixed bug in XMLElement :: MoveElement
    • Added Temporal Elements
  • November 22, 2007 - Update 0x139
    • Some bug fixes
    • Item updating with XMLElement :: UpdateElement
  • October 31, 2007 - Update 0x136
    • Added item borrowing and mirroring
  • October 22, 2007 - Update 0x135
    • XML :: XML, XML :: Load() and XML :: Save() accept Unicode filenames
  • October 13, 2007 - Update 0x132
    • XML is now x64 compatible
    • Added XDB.CPP to demonstrate usage of XMLDialog()
  • September 4, 2007 - Update 0x12F
    • Implementation of Element Unloading/Reloading
  • August 20, 2007 - Update 0x12E
    • Faster saving to memory
  • August 19, 2007 - Update 0x12D
    • XMLQuery bug fixed
    • Badly formed XML bug fixed
    • Added text-versions of XMLElement :: AddElement, AddVariable, AddComment, AddContent , AddCData
  • August 7, 2007 - Update 0x12B
    • CData bug fixed
    • Sequence bug fixed
  • June 22, 2007 - Update 0x129
    • CData bug fixed
    • UTF-16 writing support
  • June 21, 2007 - Update 0x128
    • XMLCData added
    • Some minor bugs for gcc/utf-16 fixed
    • CHM help file format
  • June 08, 2007
    • TXML.EXE for Win32 added, exploiting all the features of the library (Currently binary only, no source)
  • June 01, 2007 - Update 0x125
    • Pocket PC errors fixed, XML::XMLQuery() updated
    • AddBlankVariable bug fixed
    • XMLPPC.EXE Pocket PC sample
  • May 18, 2007 - Update 0x124
    • UTF-16 file reading
    • Linux compilation fixed
    • XML:: ImportDB() fix
    • Other issues
  • May 15, 2007 - Update 0x123
    • Include formatting options in XML.CPP
  • May 9, 2007
    • Original version posted

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)