A beginner's guide to XPath

27 May 2007, CPOL, 5 min read
This article demonstrates how a beginner can start to get to grips with XPath using C#.

Introduction

This article aims to explain what XPath expressions are and why they can be extremely useful to a C# programmer.

Background

When I first started .NET programming, I was immediately exposed to the use of XML. It was everywhere, and I had hardly any exposure to it previously. After understanding why XML documents were used so widely, I took the decision to incorporate XML files into my next application. So off I went, added the System.Xml namespace to my project, and created an XmlDocument object. I called the LoadXml() method and passed in a valid XML string. "Great!" I thought, but soon discovered I had no idea how I would get the data I wanted out of the XmlDocument object.

Using XPath in C#

Throughout this guide, I will refer to the following XML file:

XML
<?xml version="1.0" encoding="utf-8" ?>
<books>
  <book>
    <title>A beginners guide to XPath</title>
    <author>Gary Francis</author>
    <description>A book that explains XPath for beginners</description>
    <data type="Price">12.00</data>
    <data type="ISBN">1234567890</data>
  </book>
  <book>
    <title>Advanced C# Programming</title>
    <author>A. Uther</author>
    <description>Advanced applied C# techniques.</description>
    <data type="Price">47.00</data>
  </book>
  <book>
    <title>Understanding C# for beginners</title>
    <author>Any body</author>
    <description>How to get started with C# and .NET</description>
    <data type="Price">12.00</data>
    <data type="Comment">This was a great book... It helped Loads.</data>
    <data type="Comment">Excellent material if you new to C#.</data>
  </book>
</books>

The above XML file contains information about books. As you can imagine, this file could be a lot more complex, but for the sake of simplicity, we will leave it like this.

Before we can do anything useful with this data, we need to create an XmlDocument object and load the data into it. From within Visual Studio, create a new C# Windows Forms application. Drop a button on the form, and double-click the button to bring up its event handler. The following code, placed in that handler, loads the XML data from a file:

C#
// Requires "using System.Xml;" at the top of the code file.
XmlDocument document = new XmlDocument();
XmlNodeList nodeList = null;

// Try to load the XML data into the XmlDocument object and show an
// error message if this fails
try
{
    document.Load("Data.xml");
}
catch (Exception ex)
{
    MessageBox.Show("Error loading 'Data.xml'. Exception: " + ex.Message);
    return; // Nothing to query if the file could not be loaded
}

In order for the above code to compile, you will need a using directive for the System.Xml namespace at the top of your code file. This example also assumes that you have a file called "Data.xml" in the output directory of your project. The easiest way to do this is to add a new XML file to your project, insert the above XML data into the file, and name the file Data.xml. Finally, you should set the "Copy to Output Directory" property of the XML file to "Copy always".

Now that we have successfully loaded the data, we need to select some of it. To do this, we will use an XPath expression. The data we are going to try and retrieve initially is an XmlNodeList of all book elements within the XML file. The XPath expression to achieve this is:

/books/book

Don't worry too much that you don't know what this means as all will be revealed shortly. So we add the following code to the method:

C#
// Try and retrieve all book nodes
nodeList = document.SelectNodes("/books/book");

This will populate our XmlNodeList with all of the book elements. Within each of these elements, we know there is going to be a <title> element, so we can use another XPath query, this time evaluated from the book node itself, to access it. The following code will achieve this for each of the book elements we just retrieved:

C#
foreach (XmlNode book in nodeList)
{
    // Show a message with the book title
    MessageBox.Show(book.SelectSingleNode("title").InnerText);
}

That was simple, right? So, right now, I bet you can imagine all sorts of ways you could use XPath in your applications, and you would be right. There is still the slight problem of the XPath syntax.
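
Before moving on to the syntax, here is a minimal console sketch that puts the snippets above together. It is not the code from the downloadable project: the class name BookTitles and the method GetTitles are made up for illustration, and a cut-down copy of the book data is loaded from a string with LoadXml() so that no Data.xml file is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper class, named for this sketch only.
public static class BookTitles
{
    // Cut-down copy of the article's book data (titles only).
    public const string SampleXml = @"<books>
  <book><title>A beginners guide to XPath</title></book>
  <book><title>Advanced C# Programming</title></book>
  <book><title>Understanding C# for beginners</title></book>
</books>";

    // Returns the text of the <title> child of every /books/book element.
    public static List<string> GetTitles(string xml)
    {
        var document = new XmlDocument();
        document.LoadXml(xml); // throws XmlException if the XML is malformed

        var titles = new List<string>();
        foreach (XmlNode book in document.SelectNodes("/books/book"))
        {
            // "title" is evaluated relative to the current book node.
            XmlNode title = book.SelectSingleNode("title");
            if (title != null) // a book without a title is skipped
            {
                titles.Add(title.InnerText);
            }
        }
        return titles;
    }

    public static void Main()
    {
        foreach (string title in GetTitles(SampleXml))
        {
            Console.WriteLine(title);
        }
    }
}
```

Checking the result of SelectSingleNode() for null is a small addition to the original snippet; it avoids a NullReferenceException if a book element ever lacks a title.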

Building XPath Expressions

As this is just a beginner's tutorial, I will go over only the basic XPath expressions and what they mean. As time goes on, you might want to try more complex XPath queries such as reading XML data in reverse (going from a child node to a parent node). This is outside the scope of this document, but there are plenty of other references on the internet that should be able to help.

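The first important thing you should remember when using XPath is the context you are in when you evaluate the expression. An expression that starts with / is absolute: it always begins at the root of the document, whichever node you call it on. An expression that does not start with / is relative: it is evaluated from the current node. For example, inside the foreach loop above, the current node is a book element, so the relative expression title finds that book's title. If we had used the relative expression books/book there instead, we would in fact have been looking for a books node within the book node we were already in. For this to have worked, our XML document would have had to look something like: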

XML
<books>
  <book>
    <books>
      <book></book>
    </books>
  </book>
</books>

I hope the above example shows why context matters. It will come up as soon as you try to put XPath to use in your own applications.
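
To make the point about context concrete, the following sketch evaluates three expressions with the first book element as the current node. The class name XPathContext and the helper CountFromFirstBook are hypothetical, and the XML is a cut-down copy of the book data with only two books.

```csharp
using System;
using System.Xml;

// Hypothetical helper class, named for this sketch only.
public static class XPathContext
{
    // Cut-down copy of the article's book data (two books, titles only).
    public const string SampleXml = @"<books>
  <book><title>A beginners guide to XPath</title></book>
  <book><title>Advanced C# Programming</title></book>
</books>";

    // Evaluates an expression with the first <book> element as the
    // context node and returns how many nodes it matched.
    public static int CountFromFirstBook(string xpath)
    {
        var document = new XmlDocument();
        document.LoadXml(SampleXml);
        XmlNode firstBook = document.SelectSingleNode("/books/book");
        return firstBook.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // Relative: <title> children of the current book.
        Console.WriteLine(CountFromFirstBook("title"));
        // Relative: a <books> element inside the book, which does not exist here.
        Console.WriteLine(CountFromFirstBook("books/book"));
        // Absolute: starts at the document root, so the current node is ignored.
        Console.WriteLine(CountFromFirstBook("/books/book"));
    }
}
```

With this data the three calls should report 1, 0 and 2 matches respectively: only the absolute expression sees both books, because it does not depend on where it is evaluated.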

So the first thing we might need to know how to do using XPath is select some nodes. Here are some examples of how you can do this:

| Expression | Description | Example |
|---|---|---|
| nodename | Selects all child elements of the current node that have the given name | books selects the "books" element when evaluated from the document itself |
| / | At the start of an expression, selects from the root of the document; between two names, steps from an element to its matching children | /books/book selects all of the book elements directly contained within the books element |
| // | Selects matching nodes at any depth below the current point, or anywhere in the document when it starts the expression | books//book selects all of the book elements beneath the books element, however deeply they are nested; //book selects every book element in the document, wherever it appears |
| @ | Selects attributes, or tests them inside a [ ] predicate | books/book/data[@type='Price'] selects all data elements whose type attribute is 'Price' |
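
The sketch below runs a handful of these expressions against the book data and counts what each one selects. The class name XPathSelectors and the Count helper are made up for this example, and the XML is a copy of the article's data with the author and description elements left out and single quotes used around attribute values.

```csharp
using System;
using System.Xml;

// Hypothetical helper class, named for this sketch only.
public static class XPathSelectors
{
    // The titles and data elements from the article's book file.
    public const string SampleXml = @"<books>
  <book>
    <title>A beginners guide to XPath</title>
    <data type='Price'>12.00</data>
    <data type='ISBN'>1234567890</data>
  </book>
  <book>
    <title>Advanced C# Programming</title>
    <data type='Price'>47.00</data>
  </book>
  <book>
    <title>Understanding C# for beginners</title>
    <data type='Price'>12.00</data>
    <data type='Comment'>This was a great book... It helped Loads.</data>
    <data type='Comment'>Excellent material if you new to C#.</data>
  </book>
</books>";

    // Counts the nodes an expression selects when evaluated from the document itself.
    public static int Count(string xpath)
    {
        var document = new XmlDocument();
        document.LoadXml(SampleXml);
        return document.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        string[] expressions =
        {
            "books",                            // child element of the document
            "/books/book",                      // book children of the root books element
            "books//book",                      // book elements at any depth under books
            "//book",                           // book elements anywhere in the document
            "/books/book/data[@type='Price']",  // data elements with type='Price'
            "//data[@type='Comment']",          // data elements with type='Comment'
            "//data/@type"                      // the type attributes themselves
        };

        foreach (string xpath in expressions)
        {
            Console.WriteLine(xpath + " -> " + Count(xpath));
        }
    }
}
```

For this data the counts should come out as 1, 3, 3, 3, 3, 2 and 6. Note the difference between the last two kinds of expression: a predicate such as [@type='Comment'] filters the data elements, while a trailing /@type selects the attribute nodes.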

It is also important to note that you can use indexes when you want to select a particular node. For example, /books/book[1] will return the first "book" element within the "books" element. Notice that indexes are not zero based: they start at 1.

Note: This is the behaviour defined by the W3C specification. Some browsers implement it incorrectly and treat the indexes as zero based. This should only be a concern if you are copying code that was originally written for those browsers. I believe IE5 and IE6 fall foul of this, but to be truthful I have never investigated, so this may or may not be the case.
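
The one-based indexing can be checked with a short sketch against the XmlDocument class. The class name XPathIndexes and the helper TitleAt are hypothetical, the XML is a cut-down copy of the book data, and the "(no match)" text is simply this sketch's way of reporting that nothing was selected.

```csharp
using System;
using System.Xml;

// Hypothetical helper class, named for this sketch only.
public static class XPathIndexes
{
    // Cut-down copy of the article's book data (titles only).
    public const string SampleXml = @"<books>
  <book><title>A beginners guide to XPath</title></book>
  <book><title>Advanced C# Programming</title></book>
  <book><title>Understanding C# for beginners</title></book>
</books>";

    // Returns the text of the first node the expression selects,
    // or "(no match)" if it selects nothing.
    public static string TitleAt(string xpath)
    {
        var document = new XmlDocument();
        document.LoadXml(SampleXml);
        XmlNode node = document.SelectSingleNode(xpath);
        return node?.InnerText ?? "(no match)";
    }

    public static void Main()
    {
        Console.WriteLine(TitleAt("/books/book[1]/title"));      // first book
        Console.WriteLine(TitleAt("/books/book[last()]/title")); // last book
        Console.WriteLine(TitleAt("/books/book[0]/title"));      // no book has position 0
    }
}
```

In this sketch, index 1 gives the first title, last() gives the third, and index 0 selects nothing at all rather than raising an error, which is an easy mistake to miss when porting zero-based code.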

Recommended

I would seriously recommend that you download the source files for this project. It will allow you to play around with different XML data scenarios and XPath expressions easily and will help you to learn. I always say that it is better to learn from your mistakes than it is to not bother trying.

Conclusion

Well, that is about it for my first contribution to CodeProject. If you find this useful, or you would like me to write a follow-up article that goes into more detail on some of the more complex XPath expressions that might be needed, drop me an e-mail. My address can be found on my CodeProject profile page.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)