LINQ to XML

14 Mar 2008 · CPOL · 8 min read
An exploration of LINQ and XML in .NET Framework 3.5.

Introduction

Working with XML using Microsoft's .NET Framework version 2.0 and below is a cumbersome task. The API available follows the W3C DOM model, and is document-centric. Everything begins with the document; you can't create elements without having a document; even a fragment of XML is a document.

In the latest release of the .NET Framework, however, this has changed. XML is now element-centric. With features like object initialization and anonymous types, it's very easy to create XML. Add to this the features of LINQ, and we now have a very easy to use and powerful tool for XML.

In this article, I will explore some of the features available in .NET Framework release 3.5 related to XML and LINQ. This is, of course, not an extensive discussion of either subject, merely a familiarization and stepping stone for more learning and exploration.

The LINQ part

When discussing LINQ to XML, or LINQ to whatever, the first thing that needs to be discussed is, of course, LINQ.

LINQ

Language-Integrated Query, or LINQ, is an extension to the .NET Framework in version 3.5 that makes queries and set operations first class citizens of .NET languages such as C#. It has been further defined as, "A set of general purpose standard query operators that allow traversal, filter, and projection operations to be expressed in a direct yet declarative way in any .NET language."

Getting started

This is an example of a very basic LINQ query:

C#
string[] names = new string[] { "John", "Paul", "George", "Ringo" };
var name = names.Select(s => s);

Two things noticeable here are the var keyword and the strange-looking operator =>.

Var keyword

var is a new keyword introduced with C# 3.0, the language version that ships with .NET 3.5. Although it looks similar to var in JavaScript or the Variant type in VB, it isn't the same. Those represent a variant, one that can hold just about anything. In C#, however, var is a placeholder: the actual data type is fixed at compile time, and is inferred from the expression used to initialize the variable.

In the above example, name is typed as IEnumerable<string>; at run time, the object behind it is an internal iterator class from System.Linq.Enumerable.

C#
var name = "Hello, World"; 

In this example though, name is resolved to a string.

This inference is useful when the exact type returned from a query is awkward to spell out or, as with anonymous types, has no name at all. Because the variable is still strongly typed, it is not necessary to cast it to another type before using it, which is very convenient.
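As a minimal sketch of how this inference behaves, the following program prints the compile-time types the compiler chose. The VarDemo class and its StaticTypeOf helper are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Linq;

public static class VarDemo
{
    // Hypothetical helper: reports the compile-time type inferred for its argument.
    public static Type StaticTypeOf<T>(T value)
    {
        return typeof(T);
    }

    public static void Main()
    {
        string[] names = { "John", "Paul", "George", "Ringo" };

        var greeting = "Hello, World";     // inferred as string
        var query = names.Select(s => s);  // inferred as IEnumerable<string>

        Console.WriteLine(StaticTypeOf(greeting)); // System.String
        Console.WriteLine(StaticTypeOf(query));    // System.Collections.Generic.IEnumerable`1[System.String]

        // var is not a variant: the next line would not compile,
        // because greeting stays a string for its whole lifetime.
        // greeting = 42;
    }
}
```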

Lambda expressions

Lambda expressions were first introduced in 1936 by mathematician Alonzo Church as a shorthand for expressing algorithms. In C# 3.0, they are a convenient way for developers to define functions that can be passed as arguments, and are an evolution of the anonymous methods introduced in C# 2.0.

The => operator is used to separate input variables on the left and the body of the expression on the right.

C#
string[] names = new string[] { "John", "Paul", "George", "Ringo" };
var name = names.Select(s => s.StartsWith("P"));

In this example, each string in the names array is represented by the variable s. It's not necessary to declare a data type because it is inferred from the type of the collection, names in this case.

These two statements would be somewhat analogous:

C#
var name = names.Select(s => s);
foreach(string s in names) { }

The body of the expression, s.StartsWith("P"), just uses the string method to return a boolean value. Select is an extension method (more on that shortly) that takes a Func delegate as its parameter.

Func and Action

Func and Action are families of generic delegate types. Func is new in .NET 3.5; Action<T> has existed since .NET 2.0, and 3.5 adds further overloads.

C#
Func<TSource, TResult>

This is used to represent a delegate that returns a value, TResult.

C#
Action<T>

On the other hand, this is used to represent a delegate that does not return a value.

The example we have been using can be rewritten as below:

C#
Func<string, bool> pOnly = delegate(string s) { return s.StartsWith("P"); }; 
string[] names = new string[] { "John", "Paul", 
                                "George", "Ringo" }; 
var name = names.Select(pOnly);
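The same delegates can also be assigned from lambda expressions instead of anonymous methods. The sketch below assumes nothing beyond the names array used above; DelegateDemo, StartsWithP, and print are illustrative names.

```csharp
using System;
using System.Linq;

public static class DelegateDemo
{
    // Func<string, bool>: takes a string and returns a bool.
    public static readonly Func<string, bool> StartsWithP = s => s.StartsWith("P");

    public static void Main()
    {
        string[] names = { "John", "Paul", "George", "Ringo" };

        // Action<string>: takes a string and returns nothing.
        Action<string> print = s => Console.WriteLine(s);

        // Select applies the Func to every element and yields one bool per name.
        foreach (bool matched in names.Select(StartsWithP))
        {
            print(matched.ToString()); // False, True, False, False (one per line)
        }
    }
}
```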

Sequences

Running the demo code from this article, you will notice that none of the examples above returns a single value. Rather, the StartsWith examples return a collection of boolean values, one per element of the input collection, indicating whether that element matched the specified expression. This collection is referred to as a sequence in LINQ.

Image 1

If we wanted the single value that matched the expression, we would use the Single extension method.

C#
string name = names.Single(pOnly); 

Notice here that the name variable is typed as a string. Although we could still use var, we know that the return value is, or should be, a string.
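To make the difference concrete, here is a small sketch that contrasts Select (one result per input element), Where (only the matching elements), and Single (exactly one match, otherwise an InvalidOperationException). SequenceDemo and SingleThrows are hypothetical names used only for this example.

```csharp
using System;
using System.Linq;

public static class SequenceDemo
{
    public static readonly string[] Names = { "John", "Paul", "George", "Ringo" };

    // Hypothetical helper: true if Single throws because the predicate
    // matched zero elements or more than one element.
    public static bool SingleThrows(Func<string, bool> predicate)
    {
        try
        {
            Names.Single(predicate);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static void Main()
    {
        var flags = Names.Select(s => s.StartsWith("P"));   // projection
        var matches = Names.Where(s => s.StartsWith("P"));  // filter
        string only = Names.Single(s => s.StartsWith("P")); // the one match

        Console.WriteLine(string.Join(", ", flags));   // False, True, False, False
        Console.WriteLine(string.Join(", ", matches)); // Paul
        Console.WriteLine(only);                       // Paul

        // "John", "George" and "Ringo" all contain an "o", so Single throws.
        Console.WriteLine(SingleThrows(s => s.Contains("o"))); // True
    }
}
```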

Extension Methods

Extension Methods are a feature of C# 3.0 that allows developers to add functionality to existing classes without modifying the code for the original class. This is useful when you want to provide additional functionality and don't have access to the code base, such as when using third-party libraries.

Extension Methods are static methods on static classes. The first parameter of each method is declared with the type being extended, and is marked with the this modifier. Notice that this is being used as a modifier, not as a reference to the current object.

C#
public static class StringExtensions
{
   public static int ToInt(this string number)
   {
      return Int32.Parse(number);
   }
   public static string DoubleToDollars(this double number)
   {
      return string.Format("{0:c}", number);
   }
   public static string IntToDollars(this int number)
   {
      return string.Format("{0:c}", number);
   }
}

When this class is compiled, the compiler applies System.Runtime.CompilerServices.ExtensionAttribute to it, and when the class is in scope, Intellisense can read this information and determine which methods apply based on the data type.

Image 2

Image 3

As we can see here, in the first example, Intellisense knows that the ToInt method applies to strings, and only DoubleToDollars applies to doubles.
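Calling an extension method looks like calling an instance method, but it compiles to an ordinary static call. The following sketch repeats only the ToInt method from the class above so that it can run on its own; ExtensionDemo is an illustrative name.

```csharp
using System;

public static class StringExtensions
{
    // Same shape as the ToInt method shown earlier.
    public static int ToInt(this string number)
    {
        return Int32.Parse(number);
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        // Called as if it were an instance method on string...
        int a = "42".ToInt();

        // ...which is equivalent to calling the static method directly.
        int b = StringExtensions.ToInt("42");

        Console.WriteLine(a == b); // True
    }
}
```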

Query expression and methods

There are two ways to write LINQ queries: query expressions and dot-notation (method syntax). The former resembles a SQL query, except that the select clause comes last.

C#
string[] camps = new string[]{"CodeCamp2007","CodeCamp2008","CodeCamp2009"};
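// Query expression: yields an IEnumerable<string> holding the matching camps.
var currentCamps = from camp in camps
   where camp.EndsWith(DateTime.Now.Year.ToString())
   select camp;
// Method syntax: Single returns the one matching string,
// and throws if there is not exactly one match.
string currentCamp = camps.Single(c => c.EndsWith(DateTime.Now.Year.ToString()));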

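These two statements select the same element, because a query expression is translated into method calls at compile time. The difference is that the query expression yields a sequence containing the match, while Single returns the match itself. There are several ways to produce results with methods. Each of the below will produce the same results.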

C#
string currentCamp2 = camps.Where(c => c.EndsWith(DateTime.Now.Year.ToString())).Single();
string currentCamp3 = camps.Single(c => c.EndsWith(DateTime.Now.Year.ToString()));
string currentCamp4 = camps.Select(c => c).Where(
       c => c.EndsWith(DateTime.Now.Year.ToString())).Single(); 

The XML part

Now that we have an understanding of LINQ, it's time to move on to the XML part.

For this article, we will be using this XML file:

XML
<?xml version="1.0" encoding="utf-8" ?>
<employees>
   <employee id="1" salaried="no">
      <name>Gustavo Achong</name>
      <hire_date>7/31/1996</hire_date>
   </employee>
   <employee id="3" salaried="yes">
      <name>Kim Abercrombie</name>
      <hire_date>12/12/1997</hire_date>
   </employee>
   <employee id="8" salaried="no">
      <name>Carla Adams</name>
      <hire_date>2/6/1998</hire_date>
   </employee>
   <employee id="9" salaried="yes">
      <name>Jay Adams</name>
      <hire_date>2/6/1998</hire_date>
   </employee>
</employees> 

The old way

In the previous versions of the .NET Framework, XML was document-centric; in other words, to create any structure, you first had to start with an XmlDocument.

C#
public class OldWay
{
   private static XmlDocument m_doc = new XmlDocument();
   
   public static void CreateEmployees()
   {
      XmlElement root = m_doc.CreateElement("employees");
      root.AppendChild(AddEmployee(1, "Gustavo Achong", 
                       DateTime.Parse("7/31/1996"), false));
      root.AppendChild(AddEmployee(3, "Kim Abercrombie", 
                       DateTime.Parse("12/12/1997"), true));
      root.AppendChild(AddEmployee(8, "Carla Adams", 
                       DateTime.Parse("2/6/1998"), false));
      root.AppendChild(AddEmployee(9, "Jay Adams", 
                       DateTime.Parse("2/6/1998"), true));
      m_doc.AppendChild(root);
      Console.WriteLine(m_doc.OuterXml);
   }

   private static XmlElement AddEmployee(int ID, string name, 
                  DateTime hireDate, bool isSalaried)
   {
      XmlElement employee = m_doc.CreateElement("employee");
      XmlElement nameElement = m_doc.CreateElement("name");
      nameElement.InnerText = name;
      XmlElement hireDateElement = m_doc.CreateElement("hire_date");
      hireDateElement.InnerText = hireDate.ToShortDateString();
      employee.SetAttribute("id", ID.ToString());
      employee.SetAttribute("salaried", isSalaried.ToString());
      employee.AppendChild(nameElement);
      employee.AppendChild(hireDateElement);
      return employee;
   }
}

Smart developers would create helper methods to ease the pain, but it was still a verbose, cumbersome process. An XmlElement can't be created on its own; it must be created from an XmlDocument.

C#
XmlElement employee = m_doc.CreateElement("employee"); 

Trying to instantiate one directly generates a compiler error, because the XmlElement constructor is not public:

C#
XmlElement employee = new XmlElement(); 

Looking at the above example, it is also difficult to get an idea of the structure of the resulting document.

The new way

Using the classes from the System.Xml.Linq namespace and the features available in .NET 3.5, constructing an XML document is very easy and very readable.

C#
public static void CreateEmployees()
{
   XDocument doc = new XDocument(
      new XDeclaration("1.0", "utf-8", "yes"),
      new XComment("A sample xml file"),
      new XElement("employees",
         new XElement("employee",
            new XAttribute("id", 1),
            new XAttribute("salaried", "false"),
               new XElement("name", "Gustavo Achong"),
               new XElement("hire_date", "7/31/1996")),
         new XElement("employee",
            new XAttribute("id", 3),
            new XAttribute("salaried", "true"),
               new XElement("name", "Kim Abercrombie"),
               new XElement("hire_date", "12/12/1997")),
         new XElement("employee",
            new XAttribute("id", 8),
            new XAttribute("salaried", "false"),
               new XElement("name", "Carla Adams"),
               new XElement("hire_date", "2/6/1998")),
         new XElement("employee",
            new XAttribute("id", 9),
            new XAttribute("salaried", "true"),
               new XElement("name", "Jay Adams"),
               new XElement("hire_date", "2/6/1998"))
       )
   );
}

Constructing a document in this way is possible because of the functional construction feature in LINQ to XML. Functional construction is simply a means of creating an entire document tree in a single statement.

C#
public XElement(XName name, params Object[] content)

As we can see from one of the constructors for XElement, it takes an array of objects (declared with params, so the items can be passed as separate arguments). In the example above, the employees element is constructed from four XElements, one for each employee, each of which is in turn constructed from XAttributes and other XElements.
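Because the content arguments are plain objects, a LINQ query can be passed as content and every element it yields is added as a child. The sketch below builds a shortened employees tree from an in-memory array; the staff array, its property names, and ConstructionDemo are assumptions made for this illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ConstructionDemo
{
    public static XElement BuildEmployees()
    {
        // Hypothetical source data, using anonymous types.
        var staff = new[]
        {
            new { Id = 1, Name = "Gustavo Achong", Salaried = false },
            new { Id = 3, Name = "Kim Abercrombie", Salaried = true }
        };

        // The query is passed as content; each XElement it produces
        // becomes a child of <employees>.
        return new XElement("employees",
            from s in staff
            select new XElement("employee",
                new XAttribute("id", s.Id),
                new XAttribute("salaried", s.Salaried),
                new XElement("name", s.Name)));
    }

    public static void Main()
    {
        Console.WriteLine(BuildEmployees());
    }
}
```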

In the above example, we could have replaced XDocument with XElement if we removed the XDeclaration and XComment objects. This is because the XDocument constructor used here takes an XDeclaration instance, rather than the XName that the XElement constructor takes.

C#
public XDocument(XDeclaration declaration, params Object[] content) 

Another thing to note when running the demo is how both documents are printed to the console window.

Image 4

As we can see, the old method just streams the contents of the document to the console. The new method does that also; however, its output is nicely indented with no extra effort.

Namespace support

Namespaces are, of course, supported through the XNamespace class.

C#
XNamespace ns = "http://mycompany.com";
XElement doc = new XElement(
   new XElement(ns + "employees",
   new XElement("employee", 

One thing to note is that child elements do not inherit the namespace of their parent: every element name must be qualified with the namespace explicitly. In the case above, the employee element is created without one, so an empty xmlns attribute is added to it:

XML
<employees xmlns="http://mycompany.com">
   <employee id="1" salaried="false" xmlns="">
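A short sketch of the difference, assuming the same namespace URI as above: when the child name is also prefixed with the namespace, the child stays in the parent's namespace and no empty xmlns attribute is needed. NamespaceDemo, Qualified, and Mixed are illustrative names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NamespaceDemo
{
    public static readonly XNamespace Ns = "http://mycompany.com";

    // Both parent and child are qualified with the namespace.
    public static XElement Qualified()
    {
        return new XElement(Ns + "employees",
            new XElement(Ns + "employee", new XAttribute("id", 1)));
    }

    // Only the parent is qualified; the child is in no namespace.
    public static XElement Mixed()
    {
        return new XElement(Ns + "employees",
            new XElement("employee", new XAttribute("id", 1)));
    }

    public static void Main()
    {
        Console.WriteLine(Qualified());
        Console.WriteLine(Mixed()); // the child carries xmlns=""

        // The namespace is part of the element name itself.
        Console.WriteLine(Qualified().Elements().First().Name.Namespace == Ns); // True
    }
}
```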

Explicit conversion

One of the many nice things with the new XML support is support for explicit conversion of values.

Previously, all XML values were treated as strings and had to be converted as necessary.

C#
// Must be a string, or converted to a string
//idElement.InnerText = 42;
idElement.InnerText = "42";
// XmlElement.Value is always null for element nodes, so read InnerText back
int id = Convert.ToInt32(idElement.InnerText); 

With the new API, this is much more intuitive:

C#
XElement element1 = new XElement("number", 42);
// It doesn't matter if the value is a string or int
XElement element2 = new XElement("number", "42");

int num1 = (int)element1;
int num2 = (int)element2; 
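The same casts work on attributes, and the nullable variants are handy for optional content: casting a missing element to int? yields null instead of throwing. The following is a sketch under those assumptions; the age element is hypothetical and deliberately absent.

```csharp
using System;
using System.Xml.Linq;

public static class ConversionDemo
{
    public static readonly XElement Employee = new XElement("employee",
        new XAttribute("id", 8),
        new XElement("name", "Carla Adams"));

    public static void Main()
    {
        int id = (int)Employee.Attribute("id");
        string name = (string)Employee.Element("name");

        // There is no <age> element, so Element returns null;
        // the nullable cast turns that into a null int? rather than an exception.
        int? age = (int?)Employee.Element("age");

        Console.WriteLine(id);           // 8
        Console.WriteLine(name);         // Carla Adams
        Console.WriteLine(age.HasValue); // False
    }
}
```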

Traversing an XML tree

Traversing an XML tree is still very easy.

C#
foreach(var node in doc.Nodes()) 

We can iterate the node collection of the document, or of the root element. Note here, however, that Nodes() returns only the direct child nodes of the object it is called on. To traverse the entire tree, including all descendants, use DescendantNodes() instead.

C#
foreach(var node in doc.Nodes().OfType<XComment>()) 

This method can be used to traverse specific node types, comments in this case. Or we can get to specific child nodes this way.

C#
foreach(var node in doc.Elements("employees").Elements("employee").Elements("name")) 

This is an improvement over nested iterations or obtaining an XmlNodeList with an XPath query.
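The traversal calls above can be tried on a small document. This sketch uses a shortened version of the employees document; TraversalDemo and BuildDoc are illustrative names. It shows that Nodes() sees only the direct children, while DescendantNodes() and Descendants() reach deeper.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TraversalDemo
{
    public static XDocument BuildDoc()
    {
        return new XDocument(
            new XComment("A sample xml file"),
            new XElement("employees",
                new XElement("employee", new XElement("name", "Gustavo Achong")),
                new XElement("employee", new XElement("name", "Kim Abercrombie"))));
    }

    public static void Main()
    {
        XDocument doc = BuildDoc();

        // Direct children of the document: the comment and the root element.
        Console.WriteLine(doc.Nodes().Count()); // 2

        // DescendantNodes walks the whole tree, so it finds more nodes.
        Console.WriteLine(doc.DescendantNodes().Count() > doc.Nodes().Count()); // True

        // Only the comment nodes among the direct children.
        Console.WriteLine(doc.Nodes().OfType<XComment>().Count()); // 1

        // All name elements, at any depth.
        foreach (XElement name in doc.Descendants("name"))
        {
            Console.WriteLine(name.Value);
        }
    }
}
```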

XPath

XPath-style axis navigation has been built into the API through methods such as:

  • Descendants
  • Ancestors
  • DescendantsAndSelf
  • AncestorsAndSelf
  • ElementsBeforeSelf
  • ElementsAfterSelf

This is not an exhaustive list, so check the documentation for all the others available. True XPath expressions are also supported, through extension methods in the System.Xml.XPath namespace such as XPathSelectElements.
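For comparison, here is a sketch that selects the names of salaried employees twice: once with the Descendants axis method plus a LINQ filter, and once with an XPath expression through XPathSelectElements from System.Xml.XPath. The XML is a shortened copy of the sample file; XPathDemo and its method names are illustrative.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathDemo
{
    public static readonly XElement Root = XElement.Parse(
        @"<employees>
            <employee id=""1"" salaried=""no""><name>Gustavo Achong</name></employee>
            <employee id=""3"" salaried=""yes""><name>Kim Abercrombie</name></employee>
          </employees>");

    // Axis method plus a LINQ filter.
    public static string[] SalariedViaAxis()
    {
        return Root.Descendants("employee")
                   .Where(e => (string)e.Attribute("salaried") == "yes")
                   .Select(e => (string)e.Element("name"))
                   .ToArray();
    }

    // The same selection as an XPath expression, evaluated relative to Root.
    public static string[] SalariedViaXPath()
    {
        return Root.XPathSelectElements("employee[@salaried='yes']/name")
                   .Select(e => e.Value)
                   .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", SalariedViaAxis()));  // Kim Abercrombie
        Console.WriteLine(string.Join(", ", SalariedViaXPath())); // Kim Abercrombie
    }
}
```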

Transforming XML

Transforming an XML document or element is still possible using the methods I'm sure we are all familiar with.

C#
//Load the stylesheet.
//(XslCompiledTransform replaces XslTransform, which is obsolete since .NET 2.0.)
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load(stylesheet);

//Load the file to transform.
XPathDocument doc = new XPathDocument(filename);

//Create an XmlTextWriter which outputs to the console.
XmlTextWriter writer = new XmlTextWriter(Console.Out);

//Transform the file and send the output to the console.
xslt.Transform(doc, null, writer);
writer.Close(); 

However, with the new API, we can make use of functional construction and LINQ queries to transform a document:

C#
XElement element = new XElement("salaried_employees",
   from e in doc.Descendants("employee")
   where e.Attribute("salaried").Value == "true"
   select new XElement("employee",
      new XElement(e.Element("name"))));
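A self-contained sketch of this kind of transformation follows. It wraps the query in a method that takes the source tree as a parameter and casts the attribute to string, so that an employee without a salaried attribute is skipped rather than causing a NullReferenceException. TransformDemo, SalariedOnly, and the two-employee source tree are assumptions made for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TransformDemo
{
    // Builds a new tree that contains only the names of salaried employees.
    public static XElement SalariedOnly(XElement employees)
    {
        return new XElement("salaried_employees",
            from e in employees.Descendants("employee")
            where (string)e.Attribute("salaried") == "true"
            select new XElement("employee",
                new XElement(e.Element("name")))); // copies the <name> element
    }

    public static void Main()
    {
        XElement source = new XElement("employees",
            new XElement("employee",
                new XAttribute("id", 1),
                new XAttribute("salaried", "false"),
                new XElement("name", "Gustavo Achong")),
            new XElement("employee",
                new XAttribute("id", 3),
                new XAttribute("salaried", "true"),
                new XElement("name", "Kim Abercrombie")));

        Console.WriteLine(SalariedOnly(source));
    }
}
```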

Conclusion

XML is a fantastic construct that has been deeply ingrained into just about everything. Having the ability to easily construct, query, transform, and manipulate XML documents is an invaluable service that will improve both the speed at which applications can be built and the quality of those applications.

This article is not an exhaustive investigation of LINQ to XML; there have been many other articles, snippets, and blogs written on the subject. It is mainly a taste of what is possible using .NET 3.5.

History

  • Initial release: 3/14/08.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)