Mini XML - A Powerful XML Parser

24 Jun 2024 | CPOL | 3 min read
Single file XML parser with good error support, and associated XPath file.

minixml

A single file but powerful XML parser, and associated XPath engine

Building

The XML parser is a single C source file, and the XPath engine is another. Simply take the source files and drop them into your own project.

The code should be completely portable and build anywhere with a C compiler.

Basic usage

XML files have a tree structure.

This is the structure of the nodes.

typedef struct xmlattribute
{
  char *name;                /* attribute name */
  char *value;               /* attribute value (without quotes) */
  struct xmlattribute *next; /* next pointer in linked list */
} XMLATTRIBUTE;

typedef struct xmlnode
{
  char *tag;                 /* tag to identify data type */
  XMLATTRIBUTE *attributes;  /* attributes */
  char *data;                /* data as ascii */
  int position;              /* position of the node within parent's data string */
  int lineno;                /* line number of node in document */
  struct xmlnode *next;      /* sibling node */
  struct xmlnode *child;     /* first child node */
} XMLNODE;

typedef struct
{
  XMLNODE *root;             /* the root node */
} XMLDOC;

So to walk the tree, use the following template code.

void walktree_r(XMLNODE *node, int depth)
{
    int i;
    
    while (node)
    {
        for (i = 0; i < depth; i++)
            printf("\t");
        printf("Tag %s line %d\n", xml_gettag(node), xml_getlineno(node));
        
        if (node->child)
            walktree_r(node->child, depth + 1);
        node = node->next;
    }
}

Very simple and easy.
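The same recursion pattern serves for any whole-tree computation. As a minimal sketch, assuming the struct layout shown above, the helpers count_nodes_r() and max_depth_r() below are hypothetical (they are not part of minixml). They count the nodes and measure the depth of a small hand-built tree which stands in for a parsed document.

```c
#include <stddef.h>

/* Node layout copied from the article. */
typedef struct xmlattribute
{
  char *name;
  char *value;
  struct xmlattribute *next;
} XMLATTRIBUTE;

typedef struct xmlnode
{
  char *tag;
  XMLATTRIBUTE *attributes;
  char *data;
  int position;
  int lineno;
  struct xmlnode *next;
  struct xmlnode *child;
} XMLNODE;

/* Hypothetical helper: count a node, its following siblings,
   and all of their descendants. Returns 0 for a NULL node. */
int count_nodes_r(const XMLNODE *node)
{
    int count = 0;

    while (node)
    {
        count += 1 + count_nodes_r(node->child);
        node = node->next;
    }
    return count;
}

/* Hypothetical helper: number of levels in the tree,
   0 for a NULL node, 1 for a node with no children. */
int max_depth_r(const XMLNODE *node)
{
    int deepest = 0;

    while (node)
    {
        int depth = 1 + max_depth_r(node->child);
        if (depth > deepest)
            deepest = depth;
        node = node->next;
    }
    return deepest;
}

/* Hand-built stand-in for <list><item/><item/></list>. */
XMLNODE item2 = { "item", NULL, "", 0, 3, NULL, NULL };
XMLNODE item1 = { "item", NULL, "", 0, 2, &item2, NULL };
XMLNODE sample_root = { "list", NULL, "", 0, 1, NULL, &item1 };
```

Both helpers loop over siblings and recurse into children, exactly as walktree_r() does, so they can be called on the root node of any loaded document.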

Loading XML files

The loaders are the only non-trivial functions in the file. They are extremely powerful and will load XML files in the main encodings: UTF-8, UTF-16 big endian, and UTF-16 little endian. They don't quite support all of XML, but they will load most documents.
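The article does not say how the loaders tell these encodings apart. One common approach, sketched here as an assumption rather than as minixml's actual method, is to inspect the first bytes of the file for a byte order mark, and failing that to look at where the zero byte falls around the opening '<'. The ENCODING enum and guess_encoding() are hypothetical names.

```c
#include <stddef.h>

typedef enum
{
    ENC_UTF8,
    ENC_UTF16_BE,
    ENC_UTF16_LE
} ENCODING;

/* Hypothetical sketch: guess the encoding from the first bytes of a document.
   Falls back to UTF-8 when nothing suggests UTF-16. */
ENCODING guess_encoding(const unsigned char *buf, size_t len)
{
    if (len >= 2)
    {
        /* Byte order mark U+FEFF as stored in each UTF-16 variant. */
        if (buf[0] == 0xFE && buf[1] == 0xFF)
            return ENC_UTF16_BE;
        if (buf[0] == 0xFF && buf[1] == 0xFE)
            return ENC_UTF16_LE;

        /* No BOM: a document normally opens with '<', so a zero byte
           beside it shows which half of the 16-bit unit comes first. */
        if (buf[0] == 0x00 && buf[1] == '<')
            return ENC_UTF16_BE;
        if (buf[0] == '<' && buf[1] == 0x00)
            return ENC_UTF16_LE;
    }
    /* This also covers the UTF-8 BOM (EF BB BF) and plain ASCII. */
    return ENC_UTF8;
}
```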

There are three loaders:

XMLDOC *loadxmldoc(const char *fname, char *errormessage, int Nerr);
XMLDOC *floadxmldoc(FILE *fp, char *errormessage, int Nerr);
XMLDOC *xmldocfromstring(const char *str, char *errormessage, int Nerr);

They return an XML document on success and NULL on failure. xmldocfromstring() has to be passed a string encoded in UTF-8, which usually means plain ASCII. errormessage is a buffer of size Nerr which receives diagnostics if things go wrong, which is often very important for the user.
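The errormessage and Nerr convention is easy to reuse in your own code. As a rough sketch only, the hypothetical function check_tags_closed() below (not part of minixml, and far simpler than a real parser) follows the same calling pattern: it tracks the line number as it scans and writes a bounded diagnostic into the caller's buffer on failure.

```c
#include <stdio.h>

/* Hypothetical checker using the loaders' error-buffer convention.
   Returns 0 if every '<' is followed by a '>', -1 otherwise.
   On failure, writes at most Nerr bytes (including the terminator)
   to errormessage. */
int check_tags_closed(const char *str, char *errormessage, int Nerr)
{
    int lineno = 1;
    int openline = 0;   /* line of the currently open '<', 0 if none */

    for (; *str; str++)
    {
        if (*str == '\n')
            lineno++;
        else if (*str == '<' && openline == 0)
            openline = lineno;
        else if (*str == '>')
            openline = 0;
    }

    if (openline != 0)
    {
        /* snprintf never writes past the buffer the caller gave us. */
        if (errormessage && Nerr > 0)
            snprintf(errormessage, (size_t) Nerr,
                     "Unclosed '<' on line %d", openline);
        return -1;
    }
    return 0;
}
```

Passing the buffer size alongside the buffer lets the callee truncate long messages safely instead of overrunning the caller's array.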

Here's an example program.

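#include <stdio.h>
#include <stdlib.h>
#include "xmlparser2.h"

/* walktree_r() is the tree walker shown above */
void walktree_r(XMLNODE *node, int depth);

int main(int argc, char **argv)
{
    XMLDOC *doc;
    char error[1024];

    if (argc != 2)
        return EXIT_FAILURE;
    /* on failure, error receives the parser's diagnostic */
    doc = loadxmldoc(argv[1], error, (int) sizeof(error));
    if (!doc)
    {
        fprintf(stderr, "%s\n", error);
        return EXIT_FAILURE;
    }
    walktree_r(xml_getroot(doc), 0);
    killxmldoc(doc);

    return 0;
}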

Other functions

Access functions

const char *xml_gettag(XMLNODE *node);
const char *xml_getdata(XMLNODE *node);
const char *xml_getattribute(XMLNODE *node, const char *attr);
int xml_Nchildren(XMLNODE *node);
int xml_Nchildrenwithtag(XMLNODE *node, const char *tag);
XMLNODE *xml_getchild(XMLNODE *node, const char *tag, int index);
XMLNODE **xml_getdescendants(XMLNODE *node, const char *tag, int *N);

xml_gettag(), xml_getdata(), and xml_getattribute() return const pointers to the data members of the node. xml_Nchildren() gives the number of direct children, and xml_Nchildrenwithtag() gives the number of direct children with a given tag. xml_getchild() returns the child with that tag at the given index. It is a slow but easy way of iterating over children with a given tag. xml_getdescendants() is a fishing expedition. It is essentially the XPath query ("//tag"), but implemented far more efficiently. It picks out all descendants with the given tag.
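To see why index-based access is slow, consider how such functions can be written on top of the sibling list. The sketch below is an assumption about the general technique, not minixml's source. count_children_with_tag() and nth_child_with_tag() are hypothetical stand-ins, and the structs are copied from the layout shown earlier.

```c
#include <stddef.h>
#include <string.h>

/* Node layout copied from the article. */
typedef struct xmlattribute
{
  char *name;
  char *value;
  struct xmlattribute *next;
} XMLATTRIBUTE;

typedef struct xmlnode
{
  char *tag;
  XMLATTRIBUTE *attributes;
  char *data;
  int position;
  int lineno;
  struct xmlnode *next;
  struct xmlnode *child;
} XMLNODE;

/* Hypothetical: number of direct children of node whose tag matches. */
int count_children_with_tag(const XMLNODE *node, const char *tag)
{
    const XMLNODE *child;
    int count = 0;

    for (child = node->child; child; child = child->next)
        if (strcmp(child->tag, tag) == 0)
            count++;
    return count;
}

/* Hypothetical: the index-th (0-based) direct child with the given tag,
   or NULL if there are not that many. Every call rescans the sibling
   list from the first child. */
const XMLNODE *nth_child_with_tag(const XMLNODE *node, const char *tag, int index)
{
    const XMLNODE *child;

    for (child = node->child; child; child = child->next)
    {
        if (strcmp(child->tag, tag) == 0)
        {
            if (index == 0)
                return child;
            index--;
        }
    }
    return NULL;
}

/* Hand-built stand-in for <row><cell>a</cell><note>n</note><cell>b</cell></row>. */
XMLNODE cell_b = { "cell", NULL, "b", 0, 1, NULL, NULL };
XMLNODE note = { "note", NULL, "n", 0, 1, &cell_b, NULL };
XMLNODE cell_a = { "cell", NULL, "a", 0, 1, &note, NULL };
XMLNODE row = { "row", NULL, "", 0, 1, NULL, &cell_a };
```

Because each lookup starts again at the first child, looping an index from 0 up to the count costs time proportional to the square of the number of children. For long child lists, walking node->child and node->next directly, as in walktree_r(), visits each child once.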

Error reporting functions

The strength of the minixml parser is its error reporting support.

int xml_getlineno(XMLNODE *node);
XMLATTRIBUTE *xml_unknownattributes(XMLNODE *node, ...);

xml_getlineno() is a vital little function when reporting any error in a large XML file to the user. The user must know the line at which the bad element occurred, so minixml keeps track of this. And whilst you will naturally detect unknown tags whilst walking the tree, detecting unknown attributes is a little trickier. So minixml provides a handy little function for you.

XMLATTRIBUTE *badattr;
XMLATTRIBUTE *attr;
char *end = NULL;   /* typed null pointer to terminate the argument list */

badattr = xml_unknownattributes(node, "mytag", "faith", "hope", "charity", end);
if (badattr)
{
   printf("Node <mytag> line %d only attributes allowed are faith, hope and charity\n",
                xml_getlineno(node));
   for (attr = badattr; attr != NULL; attr = attr->next)
   {
      printf("Bad attribute name: %s value: %s\n", attr->name, attr->value);
   }
}

You can free the bad attributes recursively. They are deep copies. Note a quirk of C: you must not pass a raw 0 or even a bare NULL to a variadic function which expects a character pointer, as it might be passed as a 32-bit integer whilst pointers are 64 bits. Pass a char * variable set to NULL, as with end in the code above, or an explicit cast such as (char *) NULL.
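The sentinel convention can be seen in a small self-contained sketch. name_in_list() is a hypothetical function, not part of minixml. It reads char * arguments with va_arg until it meets a null pointer, which is why the terminator must really be passed as a char *.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if name equals one of the strings in the
   variadic list, 0 otherwise. The list must end with a (char *) NULL. */
int name_in_list(const char *name, ...)
{
    va_list args;
    char *candidate;
    int found = 0;

    va_start(args, name);
    /* Each argument is fetched as a char *, so the terminator must have
       that type too. A bare 0 would be passed as an int. */
    while ((candidate = va_arg(args, char *)) != NULL)
    {
        if (strcmp(name, candidate) == 0)
        {
            found = 1;
            break;
        }
    }
    va_end(args);

    return found;
}
```

A call then looks like name_in_list(attrname, "faith", "hope", "charity", (char *) NULL). The compiler cannot check the types of variadic arguments, so the cast (or a typed variable) is the caller's responsibility.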

Test Code

There is a nice suite of test programs which use the parser. Whilst they are mainly written for demonstration purposes, some of them are also hoped to be useful.

Demonstration programs

  • simpletest - a simple test program to walk the tree

  • printxml - load the XML with the parser, then print it back out

  • striptags - strip all tags from XML

  • upperlower - simple markup language with two tags

    These are intended as simple programs to test the parser, and show how to use it.

Format converters

  • xmltojson - XML to JSON converter

  • xmltocsv - XML to CSV converter

    These are two file format converters. They are simple, but intended to be usable for real.

The XML FileSystem project

  • directorytoxml - package a directory as an XML file

  • directory - extract files from a FileSystem XML file packaged by the previous program

  • listdirectory - list files in a FileSystem XML file

    This is a small but very real project. The idea is to package up a directory or folder in a single XML file, so that it can then be embedded in a program as a string, and used as an internal filesystem. You could also use it as a cheap and cheerful alternative to the Unix program tar.

    Check out progress at the Baby X resource compiler and the FileSystem format page.

Copyright

All the code is authored by Malcolm McLean

It is available as a public service for any use.

XML Parser docs.

This article was originally posted at https://github.com/MalcolmMcLean/minixml

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
United Kingdom
I started programming when we were taught Basic on the Commodore Pet at school, and was immediately hooked. But my parents were not generous with money, and it was a while before I saved up enough money to buy a second-hand ZX81. Then a friend gave me "Machine Code on your ZX81" by Toni Baker (not "Tony", a lady), and that book changed my life, because it enabled me to master something that most adults couldn't do. And I realised the power of good textbooks and self study. I have written two books on programming in consequence.

Then I went to Oxford to study English Literature, and programming came to an end, except for a brief course on Snobol, and statistical analysis of grammar words (words like "and" and "he"). But the expected job with the Civil Service did not materialise, I needed to earn a living somehow, and so it was to games programming that I turned. But I was never entirely happy as a game programmer. Whilst I enjoy programming games, I'm not so fond of playing them, except for Dungeons and Dragons style games. And for a games programmer, that's a big handicap.

I've got other interests aside from programming, and after I had collected a big pile of cash from games programming, I decided to spend it on doing a second degree, at Leeds University, in biology. That then led naturally to a PhD in computational biochemistry, working on the protein folding problem, and that turned me into a really good programmer.

However there's only one faculty position for every 10 PhDs, and I was one of the unlucky nine, and so I found a job doing graphics programming. Which I kept until unfortunately ill health forced me to give up work. And I am now a full time hobby programmer.

And my main project is Baby X and its attendant subsystems.

