XPath - Elements and Attributes

sirgilligan

1 Jul 2004

An article on using XPath to select elements of an XML document. The XML uses various configurations of elements and attributes to represent identical data.

Introduction

This beginner's tutorial shows four different ways to represent the same data in XML and how to select that data using XPath. The data represented is the page size of a census recording, which depends on the country and the year. Each census has two page sizes, a large one and a small one, and the two may be identical.

XML

<?xml version="1.0" encoding="utf-8" ?> 
<STUFF>
  <TYPE1>
    <CENSUS COUNTRY="USA" YEAR="1930">
      <PAGE SIZE="SMALL">17x11</PAGE>
      <PAGE SIZE="LARGE">27x19</PAGE>
    </CENSUS>
    
    <CENSUS COUNTRY="USA" YEAR="1880">
      <PAGE SIZE="SMALL">17x11</PAGE>
      <PAGE SIZE="LARGE">19x25</PAGE>
    </CENSUS>
    
    <CENSUS COUNTRY="UK" YEAR="1871">
      <PAGE SIZE="SMALL">9.5x15</PAGE>
      <PAGE SIZE="LARGE">9.5x15</PAGE>
    </CENSUS>
    
    <CENSUS COUNTRY="UK" YEAR="1891">
      <PAGE SIZE="SMALL">11x16</PAGE>
      <PAGE SIZE="LARGE">11x16</PAGE>
    </CENSUS>
  </TYPE1>

  <!-- **************************************************** -->

  <TYPE2>
    <CENSUS>
      <COUNTRY>USA</COUNTRY>
      <YEAR>1930</YEAR>
      <PAGE>
        <SIZE>
          <SMALL>17x11</SMALL>
          <LARGE>27x19</LARGE>
        </SIZE>
      </PAGE>
    </CENSUS>
    
    <CENSUS>
      <COUNTRY>USA</COUNTRY>
      <YEAR>1880</YEAR>
      <PAGE>
        <SIZE>
          <SMALL>17x11</SMALL>
          <LARGE>19x25</LARGE>
        </SIZE>
      </PAGE>
    </CENSUS>
    
    <CENSUS>
      <COUNTRY>UK</COUNTRY>
      <YEAR>1871</YEAR>
      <PAGE>
        <SIZE>
          <SMALL>9.5x15</SMALL>
          <LARGE>9.5x15</LARGE>
        </SIZE>
      </PAGE>
    </CENSUS>
    
    <CENSUS>
      <COUNTRY>UK</COUNTRY>
      <YEAR>1891</YEAR>
      <PAGE>
        <SIZE>
          <SMALL>11x16</SMALL>
          <LARGE>11x16</LARGE>
        </SIZE>
      </PAGE>
    </CENSUS>
  </TYPE2>

  <!-- **************************************************** -->

  <TYPE3>
    <CENSUS>
      <USA YEAR="1930">
        <PAGE SIZE="SMALL">17x11</PAGE>
        <PAGE SIZE="LARGE">27x19</PAGE>
      </USA> 
      
      <USA YEAR="1880">
        <PAGE SIZE="SMALL">17x11</PAGE>
        <PAGE SIZE="LARGE">19x25</PAGE>
      </USA> 
      
      <UK YEAR="1871">
        <PAGE SIZE="SMALL">9.5x15</PAGE>
        <PAGE SIZE="LARGE">9.5x15</PAGE>
      </UK> 
      
      <UK YEAR="1891">
        <PAGE SIZE="SMALL">11x16</PAGE>
        <PAGE SIZE="LARGE">11x16</PAGE>
      </UK> 
    </CENSUS>
  </TYPE3>
  
  <!-- **************************************************** -->

  <TYPE4>
    <CENSUS>
      <COUNTRY>
        USA
        <YEAR>
          1930
          <PAGE>
            <SIZE TYPE="SMALL">17x11</SIZE>
            <SIZE TYPE="LARGE">27x19</SIZE>
          </PAGE>
        </YEAR>
        <YEAR>
          1880
          <PAGE>
            <SIZE TYPE="SMALL">17x11</SIZE>
            <SIZE TYPE="LARGE">19x25</SIZE>
          </PAGE>
        </YEAR>
      </COUNTRY>
      <COUNTRY>
        UK
        <YEAR>
          1871
          <PAGE>
            <SIZE TYPE="SMALL">9.5x15</SIZE>
            <SIZE TYPE="LARGE">9.5x15</SIZE>
          </PAGE>
        </YEAR>
        <YEAR>
          1891
          <PAGE>
            <SIZE TYPE="SMALL">11x16</SIZE>
            <SIZE TYPE="LARGE">11x16</SIZE>
          </PAGE>
        </YEAR>
      </COUNTRY>

    </CENSUS>
  </TYPE4>
</STUFF>

Background

Deciding when to use an element or an attribute to represent XML data is confusing for us beginners. Even more confusing is how to select the data when it is represented in different forms.

Using the Code

Create a new C# console application called ConsoleXMLTest and replace the body of Class1.cs with the following code. Create a file called data.xml and paste the above XML into it. The code opens data.xml by relative path, so place the file in the directory the application runs from. I set a build event under the project properties to copy data.xml from the project directory to the output directory automatically, as follows:

copy "$(ProjectDir)data.xml" "$(TargetDir)"

using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Collections;

namespace ConsoleXMLTest
{
  /// <summary>
  /// Summary description for Class1.
  /// </summary>
  class Class1
  {
    /// <summary>
    /// The main entry point for the application.
    /// </summary>
    [STAThread]
    static void Main(string[] args)
    {
      string fileName = "data.xml";

      // Open the file once and rewind it before each test. The using block
      // closes the stream even if one of the tests throws.
      using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
      {
        TestOne(new XmlTextReader(fs));

        fs.Seek(0, SeekOrigin.Begin);
        TestTwo(new XmlTextReader(fs));

        fs.Seek(0, SeekOrigin.Begin);
        TestThree(new XmlTextReader(fs));

        fs.Seek(0, SeekOrigin.Begin);
        TestFour(new XmlTextReader(fs));
      }
    }

    static void TestOne(XmlTextReader reader)
    {
      System.Console.WriteLine("TestOne");
      XPathDocument xdoc = new XPathDocument(reader);
      XPathNavigator nav = xdoc.CreateNavigator();
      XPathNodeIterator nodeItor = nav.Select(
       "STUFF/TYPE1/CENSUS[@COUNTRY='USA' and @YEAR='1930']/PAGE");
      // Only traverse when the query matched at least one node.
      if(nodeItor.MoveNext())
      {
        TraverseSiblings(nodeItor);
      }
      System.Console.WriteLine();
    }

    static void TestTwo(XmlTextReader reader)
    {
      System.Console.WriteLine("TestTwo");
      XPathDocument xdoc = new XPathDocument(reader);
      XPathNavigator nav = xdoc.CreateNavigator();

      XPathNodeIterator nodeItor = nav.Select(
       "STUFF/TYPE2/CENSUS[COUNTRY='USA' and YEAR='1930']/PAGE/SIZE");
      // Only traverse when the query matched at least one node.
      if(nodeItor.MoveNext())
      {
        TraverseChildren(nodeItor);
      }
      System.Console.WriteLine();
    }
    
    static void TestThree(XmlTextReader reader)
    {
      System.Console.WriteLine("TestThree");
      XPathDocument xdoc = new XPathDocument(reader);
      XPathNavigator nav = xdoc.CreateNavigator();

      XPathNodeIterator nodeItor = nav.Select(
         "STUFF/TYPE3/CENSUS/USA[@YEAR='1930']/PAGE");
      // Only traverse when the query matched at least one node.
      if(nodeItor.MoveNext())
      {
        TraverseSiblings(nodeItor);
      }
      System.Console.WriteLine();
    }

    static void TestFour(XmlTextReader reader)
    {
      System.Console.WriteLine("TestFour");
      XPathDocument xdoc = new XPathDocument(reader);
      XPathNavigator nav = xdoc.CreateNavigator();

      XPathNodeIterator nodeItor = nav.Select(
        "STUFF/TYPE4/CENSUS/COUNTRY[normalize-space(text())='USA']"+
        "/YEAR[normalize-space(text())='1930']/PAGE/SIZE");
      // Only traverse when the query matched at least one node.
      if(nodeItor.MoveNext())
      {
        TraverseSiblings(nodeItor);
      }
      System.Console.WriteLine();
    }

    // Prints the iterator's current node and every sibling that follows it.
    static void TraverseSiblings(XPathNodeIterator nodeItor)
    {
      // Walk a copy of the navigator so the iterator's own position is untouched.
      XPathNavigator igor = nodeItor.Current.Clone();
      do
      {
        PrintNode(igor);
      }while(igor.MoveToNext());
    }

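    // Prints every child of the iterator's current node.
    static void TraverseChildren(XPathNodeIterator nodeItor)
    {
      // Walk a copy of the navigator so the iterator's own position is untouched.
      XPathNavigator igor = nodeItor.Current.Clone();
      if(!igor.MoveToFirstChild())
      {
        return; // no children, nothing to print
      }
      do
      {
        PrintNode(igor);
      }while(igor.MoveToNext());
    }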

    static void Traverse(XPathNodeIterator nodeItor)
    {
      Stack nodeStack = new Stack();
      nodeStack.Push(nodeItor.Clone());

      while(nodeStack.Count > 0)
      {
        XPathNodeIterator igor = (XPathNodeIterator)nodeStack.Pop();
        //PrintNode(igor.Current);

        if(igor.Current.HasChildren == false)
        {
          PrintNode(igor.Current);
        }
        else
        {
          //PrintNode(igor.Current);

          //push each child
          XPathNodeIterator egor = igor.Clone();  
              //don't want to move the current in igor.
          egor.Current.MoveToFirstChild();
          
          //we want the items on the stack in reverse order, 
          //so push them on a temp stack
          //and pop them back off of the temp stack and 
          //push them on the real stack
          Stack reverseStack = new Stack();
          reverseStack.Push(egor.Clone());
          while(egor.Current.MoveToNext() == true)
          {
            reverseStack.Push(egor.Clone());
          }
          while(reverseStack.Count > 0)
          {
            nodeStack.Push(reverseStack.Pop());
          }
        }    
      }
    }

    static void PrintNode(XPathNavigator nav)
    {
      System.Console.WriteLine(nav.Name + ":" + nav.Value + 
          " Type : " + nav.NodeType.ToString());
    }
  }
}

Points of Interest

Learning how to select nodes using XPath is not very difficult. Since I like to learn by example, I made this code to reinforce the things I learned from studying MSDN and various web sites.

To select a node that has a particular attribute:

XPathNodeIterator nodeItor = nav.Select(
  "STUFF/TYPE1/CENSUS[@COUNTRY='USA' and @YEAR='1930']/PAGE");

The above query selects all PAGE nodes whose CENSUS parent has a COUNTRY attribute of USA and a YEAR attribute of 1930.
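As a small self-contained variation, the sketch below runs the same attribute predicate against a trimmed copy of the TYPE1 data held in a string, and walks the result set with the iterator's own MoveNext instead of stepping through siblings. The class name AttributePredicateDemo, the method SelectUsa1930Pages, and the inlined XML are assumptions made for illustration; they are not part of the article's project.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

class AttributePredicateDemo
{
  // Trimmed copy of the TYPE1 data, inlined so the sketch needs no data.xml.
  const string Xml =
    "<STUFF><TYPE1>" +
    "<CENSUS COUNTRY='USA' YEAR='1930'>" +
    "<PAGE SIZE='SMALL'>17x11</PAGE><PAGE SIZE='LARGE'>27x19</PAGE>" +
    "</CENSUS>" +
    "<CENSUS COUNTRY='UK' YEAR='1871'>" +
    "<PAGE SIZE='SMALL'>9.5x15</PAGE><PAGE SIZE='LARGE'>9.5x15</PAGE>" +
    "</CENSUS>" +
    "</TYPE1></STUFF>";

  // Returns one "SIZE: dimensions" line per PAGE node the query matches.
  public static List<string> SelectUsa1930Pages()
  {
    XPathDocument xdoc = new XPathDocument(new StringReader(Xml));
    XPathNavigator nav = xdoc.CreateNavigator();

    XPathNodeIterator pages = nav.Select(
      "STUFF/TYPE1/CENSUS[@COUNTRY='USA' and @YEAR='1930']/PAGE");

    List<string> lines = new List<string>();
    while (pages.MoveNext())
    {
      // Attributes are read by local name; the empty string means "no namespace".
      string size = pages.Current.GetAttribute("SIZE", "");
      lines.Add(size + ": " + pages.Current.Value);
    }
    return lines;
  }

  static void Main()
  {
    foreach (string line in SelectUsa1930Pages())
    {
      Console.WriteLine(line);
    }
  }
}
```

Iterating with MoveNext visits exactly the nodes the query selected, which matters if the matching PAGE nodes ever stop being adjacent siblings.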

To select a node that has a particular value:

XPathNodeIterator nodeItor = nav.Select(
  "STUFF/TYPE2/CENSUS[COUNTRY='USA' and YEAR='1930']/PAGE/SIZE");

The above query selects the SIZE children of the PAGE nodes whose CENSUS parent has COUNTRY and YEAR child elements with the values USA and 1930 respectively.
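The same child-element predicate can also pick out a single value. The sketch below extends the path by one step to the LARGE element and reads it with SelectSingleNode. The class name ElementPredicateDemo, the helper LargePageSize, and the inlined XML are hypothetical names introduced for this example, and the country and year are assumed to be trusted literals, since they are concatenated straight into the query.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

class ElementPredicateDemo
{
  // Trimmed copy of the TYPE2 data, inlined so the sketch needs no data.xml.
  const string Xml =
    "<STUFF><TYPE2>" +
    "<CENSUS><COUNTRY>USA</COUNTRY><YEAR>1930</YEAR>" +
    "<PAGE><SIZE><SMALL>17x11</SMALL><LARGE>27x19</LARGE></SIZE></PAGE></CENSUS>" +
    "<CENSUS><COUNTRY>USA</COUNTRY><YEAR>1880</YEAR>" +
    "<PAGE><SIZE><SMALL>17x11</SMALL><LARGE>19x25</LARGE></SIZE></PAGE></CENSUS>" +
    "</TYPE2></STUFF>";

  // Returns the large page size for a country and year, or null if no CENSUS matches.
  // The arguments are assumed to be trusted: they are pasted into the query as-is.
  public static string LargePageSize(string country, string year)
  {
    XPathNavigator nav = new XPathDocument(new StringReader(Xml)).CreateNavigator();

    // COUNTRY='...' compares the text of the COUNTRY child element.
    XPathNavigator large = nav.SelectSingleNode(
      "STUFF/TYPE2/CENSUS[COUNTRY='" + country + "' and YEAR='" + year + "']" +
      "/PAGE/SIZE/LARGE");

    return large == null ? null : large.Value;
  }

  static void Main()
  {
    Console.WriteLine(LargePageSize("USA", "1930"));
    Console.WriteLine(LargePageSize("USA", "1880"));
  }
}
```

SelectSingleNode returns null when nothing matches, so the caller can tell "no such census" apart from an empty value.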

TestFour is of particular interest because its XML elements have both a text value and child elements, which XML calls mixed content. While studying XML, I didn't come across any examples of this, and at first I didn't think it could be done. Here is the XML data for TestFour.

XML

<TYPE4>
  <CENSUS>
    <COUNTRY>
      USA
      <YEAR>
        1930
        <PAGE>
          <SIZE TYPE="SMALL">17x11</SIZE>
          <SIZE TYPE="LARGE">27x19</SIZE>
        </PAGE>
      </YEAR>
      <YEAR>
        1880
        <PAGE>
          <SIZE TYPE="SMALL">17x11</SIZE>
          <SIZE TYPE="LARGE">19x25</SIZE>
        </PAGE>
      </YEAR>
    </COUNTRY>
    <COUNTRY>
      UK
      <YEAR>
        1871
        <PAGE>
          <SIZE TYPE="SMALL">9.5x15</SIZE>
          <SIZE TYPE="LARGE">9.5x15</SIZE>
        </PAGE>
      </YEAR>
      <YEAR>
        1891
        <PAGE>
          <SIZE TYPE="SMALL">11x16</SIZE>
          <SIZE TYPE="LARGE">11x16</SIZE>
        </PAGE>
      </YEAR>
    </COUNTRY>

  </CENSUS>
</TYPE4>

When I selected the YEAR node and displayed its value, I got all of the whitespace around the value as well, so a plain comparison against 1930 did not match. I learned to use this query:

XPathNodeIterator nodeItor = nav.Select(
 "STUFF/TYPE4/CENSUS/COUNTRY[normalize-space(text())='USA']"+
 "/YEAR[normalize-space(text())='1930']/PAGE/SIZE");

The query finds the COUNTRY node whose own text equals USA once normalize-space has stripped the surrounding whitespace. It applies the same test to the YEAR, and then selects the SIZE nodes under the matching YEAR's PAGE.
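To see why the predicate uses text() rather than the element itself, it helps to evaluate both forms on a YEAR node. In XPath 1.0 the string value of an element includes the text of all its descendants, so normalize-space(.) on YEAR also picks up the page sizes, while text() takes only YEAR's own first text node. The sketch below is a minimal illustration; the class name NormalizeSpaceDemo, the helper EvaluateOnYear, and the trimmed XML are assumptions for this example.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

class NormalizeSpaceDemo
{
  // One COUNTRY with one YEAR, keeping the line breaks and indentation
  // around the text values as in the TYPE4 data.
  const string Xml =
    "<COUNTRY>\n  USA\n  <YEAR>\n    1930\n    " +
    "<PAGE><SIZE TYPE='SMALL'>17x11</SIZE><SIZE TYPE='LARGE'>27x19</SIZE></PAGE>" +
    "\n  </YEAR>\n</COUNTRY>";

  // Evaluates a string-valued XPath expression with the YEAR element as context.
  public static string EvaluateOnYear(string expression)
  {
    XPathNavigator nav = new XPathDocument(new StringReader(Xml)).CreateNavigator();
    XPathNavigator year = nav.SelectSingleNode("COUNTRY/YEAR");
    return (string)year.Evaluate(expression);
  }

  static void Main()
  {
    // Own text only, whitespace trimmed: this is what the predicate compares.
    Console.WriteLine("[" + EvaluateOnYear("normalize-space(text())") + "]");

    // Whole element: the year followed by the text of the SIZE descendants.
    Console.WriteLine("[" + EvaluateOnYear("normalize-space(.)") + "]");

    // Own text without normalize-space: still padded with line breaks and spaces.
    Console.WriteLine("[" + EvaluateOnYear("string(text())") + "]");
  }
}
```

The first line printed is [1930]; the other two show the values that a plain comparison would have had to match.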

The code also includes a Traverse method, which Main does not call. I have written several recursive routines to traverse an XML tree during my studies; for Traverse, I decided to use a non-recursive solution using a Stack and a while loop.
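For comparison, a recursive counterpart might look like the sketch below. It is not one of the routines mentioned above, only an illustration of the recursive shape: any node without children is treated as a leaf and reported in document order. The class name RecursiveTraverseDemo, the methods CollectLeaves and LeavesOfSample, and the sample XML are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

class RecursiveTraverseDemo
{
  // Appends a line for every node under 'node' that has no children.
  public static void CollectLeaves(XPathNavigator node, List<string> leaves)
  {
    // Move a copy, so the caller's navigator stays where it is.
    XPathNavigator child = node.Clone();
    if (!child.MoveToFirstChild())
    {
      leaves.Add(node.Name + ":" + node.Value + " Type : " + node.NodeType);
      return;
    }
    do
    {
      CollectLeaves(child, leaves);
    } while (child.MoveToNext());
  }

  // Runs the traversal over a small SIZE fragment in the TYPE2 layout.
  public static List<string> LeavesOfSample()
  {
    string xml = "<SIZE><SMALL>17x11</SMALL><LARGE>27x19</LARGE></SIZE>";
    XPathNavigator root = new XPathDocument(new StringReader(xml)).CreateNavigator();

    List<string> leaves = new List<string>();
    CollectLeaves(root, leaves);
    return leaves;
  }

  static void Main()
  {
    foreach (string leaf in LeavesOfSample())
    {
      Console.WriteLine(leaf);
    }
  }
}
```

For this fragment the leaves are the two text nodes, which have an empty Name, so each printed line starts with a colon. The recursion depth equals the depth of the XML tree, which is the usual reason to prefer an explicit Stack for very deep documents.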

Notice that I have a habit of naming iterator variables igor. It came from seeing so many named itor and I couldn't help but think of Igor from Young Frankenstein. So you will see some Igors and some Egors in the code.

The output of the code is:

TestOne
PAGE:17x11 Type : Element
PAGE:27x19 Type : Element

TestTwo
SMALL:17x11 Type : Element
LARGE:27x19 Type : Element

TestThree
PAGE:17x11 Type : Element
PAGE:27x19 Type : Element

TestFour
SIZE:17x11 Type : Element
SIZE:27x19 Type : Element

History

Version 1.0

License

This article has no explicit license attached to it, but may contain usage terms in the article text or the download files themselves. If in doubt, please contact the author via the discussion board below.

Written By

sirgilligan

Software Developer (Senior)

United States

Master Degree in C.S. .NET, Unix, Macintosh (OS X, 9, 8...), PC server side, and MFC. 17 years experience. Graphics, Distributed processing, Object Oriented Methods and Models.
Java, C#, C++. Webservices. XML. Real name is Geoffrey Slinker.
