(untagged)

Accessing XML Using Insert/Update/Delete/Query Statements

Marc Clifton

0.00/5 (No votes)

12 Apr 2005

A psuedo-database for XML.

Download source files - 58.6 Kb

Introduction

I was recently making a change to the Advanced Unit Test WinForm runner application, where I wanted to save some state information for each node in the tree of tests. I thought that XML would be a good format for this information, since it is hierarchical, like a tree. As I was thinking about this, I decided that working with XML Document isn't that much fun, and what would be more interesting and probably easier to use in other applications is something that worked more the way database interaction works, with insert, update, delete, and query capabilities. But really, it seems that something like this should have already been done.

Paul Wilson's XmlDataProvider

In my searches, I found an XmlDataProvider by Paul Wilson of WilsonORMapper fame. You can read about his work on his blog and download his project here. It's definitely worth looking at, and my only reason for not using it is that the underlying XML implementation utilizes a DataSet, and thus you get the DataSet XML formatting.

Why I Don't Like The DataSet

The DataSet gives you all sorts of great features, error detection, etc., but its XML format is ugly. If I create a couple rows in a table with columns C1, C2, C3, this is what you get:

<?xml version="1.0" standalone="yes"?>
<NewDataSet>
<Table1>
<C1>1</C1>
<C2>2</C2>
<C3>3</C3>
</Table1>
<Table1>
<C1>4</C1>
<C2>5</C2>
<C3>6</C3>
</Table1>
</NewDataSet>

where, what I really want is something more like this:

<?xml version="1.0" standalone="yes"?>
<Database>
<Table1>
<Row C1="1" C2="2" C3="3"/>
<Row C1="4" C2="5" C3="6"/>
</Table1>
</Database>

More to the point, my requirements are to be able to manipulate XML without being constrained by a specific format, such as is imposed by the DataSet serializer. If I use the XmlDataProvider, I'm constrained to working with XML that has been serialized by the DataSet. So I embarked on the idea of writing a thin wrapper to the XmlDocument class that provides my desired functionality without imposing formatting constraints.

Take Heed

There are some cases where working with general XML can cause problems. For example, XML can be written so that there is no way to uniquely identify individual rows. In the above XML example, consider how you might do an update to change the value of "4" to "5" in the second row. If there isn't any information to uniquely identify that row, then my solution will end up updating the C1 property in all rows.

Some Interesting Advantages

There are some interesting things you can do when XML is acting as a database. XML is hierarchical, whereas a database is relational. If it's appropriate, you can use XML to maintain several databases. Now think what you can do with the hierarchy. You can create different versions of the same database.

<?xml version="1.0" standalone="yes"?>
<Database>
  <Version1>
    <Table1>
      <Row C1="1" C2="2" C3="3"/>
      <Row C1="4" C2="5" C3="6"/>
    </Table1>
  </Version1>
</Database>

You can also create different versions of the same table within a database.

<?xml version="1.0" standalone="yes"?>
<Database>
  <Table1>
    <Version1>
      <Row C1="1" C2="2" C3="3"/>
      <Row C1="4" C2="5" C3="6"/>
    </Version1>
  </Table1>
</Database>

This is interesting! Or you can maintain different sets of data based on the user.

<?xml version="1.0" standalone="yes"?>
<Database>
  <Table1>
    <MarcData>
      <Row C1="1" C2="2" C3="3"/>
      <Row C1="4" C2="5" C3="6"/>
    </MarcData>
    <KarenData>
      <Row C1="11" C2="22" C3="33"/>
      <Row C1="44" C2="55" C3="66"/>
    </KarenData>
  </Table1>
</Database>

There are many times when I need functionality like this for a small set of data.

Very Limited Uses

Cool features aside, keep in mind that using XML as a database is very limited. Indexing, sorting, grouping, table joins, stored procedures, and all the other useful features and performance motivated functionality that one typically counts on doesn't exist. That doesn't mean it can't be done, but it is certainly not something I think is useful. Use the appropriate tools for the job. In some cases, the implementation I'm proposing here may be the appropriate tool for simple configuration state management, maintaining different configuration sets, or working with a very limited amount of data.

Example Usage

I use this tool to save the Checked and Expanded states of the tree view in the unit test runner. To save these values, I recurse through the TreeNode:

XmlDatabase config=new XmlDatabase("Config");
string path=StringHelpers.LeftOf(tvUnitTests.Nodes[0].Text, ',');
config.Insert(path);
foreach(TreeNode node in tvUnitTests.Nodes[0].Nodes)
{
  WriteConfig(config, node, path);
}

...

private void WriteConfig(XmlDatabase config, TreeNode node, string path)
{
  string newPath=path+"/"+node.Text;
  config.Insert(newPath,
  new XmlDatabase.FieldValuePair[]
  {
    new XmlDatabase.FieldValuePair("Checked", node.Checked.ToString()),
    new XmlDatabase.FieldValuePair("Expanded", node.IsExpanded.ToString())
  });
  foreach(TreeNode child in node.Nodes)
  {
    WriteConfig(config, child, newPath);
  }
}

Similarly, to read the last Checked and Expanded states, I recurse through the configuration again, once the tree has been loaded. Note that I test for missing records in the XML database, as the tree might have changed:

private void LoadAssemblyConfig()
{
  XmlDatabase config=new XmlDatabase("Config");
  string path=StringHelpers.LeftOf(tvUnitTests.Nodes[0].Text, ',');
  tvUnitTests.CollapseAll();
  tvUnitTests.Nodes[0].Expand();
  foreach(TreeNode node in tvUnitTests.Nodes[0].Nodes)
  {
    ReadConfig(config, node, path);
  }
}

private void ReadConfig(XmlDatabase config, TreeNode node, string path)
{
  string newPath=path+"/"+node.Text;

  string isChecked=config.QueryScalar(newPath, "Checked");
  if (isChecked != null)
  {
    node.Checked=Convert.ToBoolean(isChecked);
  }

  string isExpanded=config.QueryScalar(newPath, "Expanded");
  if (isExpanded != null)
  {
    if (Convert.ToBoolean(isExpanded))
    {
      node.Expand();
    }
  }

  foreach(TreeNode child in node.Nodes)
  {
    ReadConfig(config, child, newPath);
  }
}

Using the test fixture itself as an example output, this is what the tree configuration file looks like:

<?xml version="1.0" encoding="utf-8"?>
<Config>
  <Marc.XmlConfigurationManagement>
    <XmlConfigUnitTests Checked="True">
      <UnitTests Checked="True" Expanded="True">
        <Setup Checked="True" Expanded="False"/>
        <InsertRecord Checked="True" Expanded="False"/>
        <InsertRecordWithValue Checked="True" Expanded="False"/>
        <InsertRecordWithMultipleValues Checked="True" Expanded="False"/>
        <UpdateField Checked="True" Expanded="False"/>
        <DeleteField Checked="True" Expanded="False"/>
        <InsertMultipleUniqueRecords Checked="True" Expanded="False"/>
        <QueryUniqueRecords Checked="True" Expanded="False"/>
        <DeleteUniqueRecord Checked="True" Expanded="False"/>
        <VerifyMissingRecordNullReturn Checked="True" Expanded="False"/>
        <UpdateMultipleRows Checked="True" Expanded="False"/>
        <DeleteMultipleRows Checked="True" Expanded="False"/>
        <MultiRowMultiFieldUpdate Checked="True" Expanded="False"/>
        <MultirowQuery Checked="True" Expanded="False"/>
        <MultiRowQueryOfSingleField Checked="True" Expanded="False"/>
        <MultiRowQueryOfMultipleFields Checked="True" Expanded="False"/>
        <Save Checked="True" Expanded="False"/>
      </UnitTests>
    </XmlConfigUnitTests>
  </Marc.XmlConfigurationManagement>
</Config>

Implementation

Insert Operations

Inserting a path is a recursive process. If any child node along the specified path does not exist, it will be automatically created. For example, if you call the basic Insert method,

public XmlNode Insert(string path)
{
  if (path==null)
  {
    throw(new ArgumentNullException("path cannot be null."));
  }

  string path2=rootName+"/"+path;
  string[] segments=path2.Split('/');
  XmlNode lastNode=InsertNode(xdoc, segments, 0);
  return lastNode;
}

it calls a protected method that recurses through the path, creating XML nodes as necessary:

protected XmlNode InsertNode(XmlNode node, string[] segments, int idx)
{
  XmlNode newNode=null;

  if (idx==segments.Length)
  {
    // All done.

    return node;
  }

  // Traverse the existing hierarchy but ensure that we create a 

  // new record at the last leaf.

  if (idx+1 < segments.Length)
  {
    foreach(XmlNode child in node.ChildNodes)
    {
      if (child.Name==segments[idx])
      {
        newNode=InsertNode(child, segments, idx+1);
        return newNode;
      }
    }
  }
  newNode=xdoc.CreateElement(segments[idx]);
  node.AppendChild(newNode);
  XmlNode nextNode=InsertNode(newNode, segments, idx+1);
  return nextNode;
}

Update Operations

Update operations are done on all nodes qualified by the XPath path:

public int Update(string path, string field, string val)
{
  if (path==null)
  {
    throw(new ArgumentNullException("path cannot be null."));
  }
  if (field==null)
  {
    throw(new ArgumentNullException("field cannot be null."));
  }
  if (val==null)
  {
    throw(new ArgumentNullException("val cannot be null."));
  }

  XmlNodeList nodeList=xdoc.SelectNodes(rootName+"/"+path);
  foreach(XmlNode node in nodeList)
  {
    node.Attributes[field].Value=val;
  }
  return nodeList.Count;
}

If you specify a "where" clause (as an XPath expression), it will automatically be added as a qualifier to the path:

public int Update(string path, string where, string field, string val)
{
  ...
  return Update(path+"["+where+"]", field, val);
}

You will have to code into the XPath statement yourself any qualifiers that occur at nodes other than the last child.

Delete Operations

Delete operations are similar to updates--you can delete all the nodes qualified by the XPath path, or you can include a "where" clause to filter the qualified nodes:

public int Delete(string path)
{
  if (path==null)
  {
    throw(new ArgumentNullException("path cannot be null."));
  }

  XmlNodeList nodeList=xdoc.SelectNodes(rootName+"/"+path);
  foreach(XmlNode node in nodeList)
  {
    node.ParentNode.RemoveChild(node);
  }
  return nodeList.Count;
}

Query Operations

Query operations fall into two categories: scalar and vector. A scalar query returns a single string representing the value of a particular attribute for a specific record.

Scalar Queries

The typical implementation looks like this:

public string QueryScalar(string path, string field)
{
  if (path==null)
  {
    throw(new ArgumentNullException("path cannot be null."));
  }
  if (field==null)
  {
    throw(new ArgumentNullException("field cannot be null."));
  }

  string ret=null;
  XmlNode node=xdoc.SelectSingleNode(rootName+"/"+path);
  if (node != null)
  {
    XmlAttribute xa=node.Attributes[field];
    if (xa != null)
    {
      ret=xa.Value;
    }
  }
  return ret;
}

The query can be further qualified using a "where" clause or by simply embedding the qualifying information into the XPath string itself.

Vector Queries

A vector query will return a DataTable instance. The columns of the DataTable are initialized to the attribute names. If you look at the code, you'll see that it uses the first returned record to create the column collection. Therefore, if other rows have different attributes, this method will fail.

public DataTable Query(string path)
{
  if (path==null)
  {
  throw(new ArgumentNullException("path cannot be null."));
  }

  DataTable dt=new DataTable();
  XmlNodeList nodeList=xdoc.SelectNodes(rootName+"/"+path);
  if (nodeList.Count != 0)
  {
    CreateColumns(dt, nodeList[0]);
  }
  foreach(XmlNode node in nodeList)
  {
    DataRow dr=dt.NewRow();
    foreach(XmlAttribute attr in node.Attributes)
    {
      dr[attr.Name]=attr.Value;
    }
    dt.Rows.Add(dr);
  }
  return dt;
}

Other overloads allow you to specify the specific columns that you want populated in the DataTable.

Array Query

There's a special query that returns the values of a specific field (as in, XML attribute) for all the qualified rows. Instead of returning a DataTable, the return value is simply an array of strings:

public string[] QueryField(string path, string field)
{
  if (path==null)
  {
    throw(new ArgumentNullException("path cannot be null."));
  }
  if (field==null)
  {
    throw(new ArgumentNullException("field cannot be null."));
  }

  XmlNodeList nodeList=xdoc.SelectNodes(rootName+"/"+path);
  string[] s=null;
  if (nodeList.Count != 0)
  {
    s=new string[nodeList.Count];
    int i=0;
    foreach(XmlNode node in nodeList)
    {
      s[i++]=node.Attributes[field].Value;
    }
  }
  return s;
}

I use this as a convenience instead of having to work with a DataTable for a multi-row query of a single field.

Unit Tests

There are a variety of unit tests written using my AUT test runner. The tests run in sequence, each one testing further functionality from data established by the previous test. I find it much easier to perform certain kinds of testing in this manner, as I don't have to set up the database for every single test--the previous tests do the setup for me.

Documentation

The code is well documented and the download includes an NDoc reference.

Conclusion

The idea behind this is to make it more convenient to use XML documents by creating a hybrid between what we're used to in the database world and XPath statements, without having to deal with the inner workings of the XmlDocument class. I also don't like the DataSet XML serialization, as I wanted something that worked with hopefully many kinds of XML formats, rather than something forced upon me. I also wanted to keep it simple--using XML for a database is absurd, but for certain configuration information, having a tool that treats an XML file similar to the way you'd work with a database is convenient.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here