Extending LINQ to XML

Anton Minko

8 Sep 2013

Set of extension methods to LINQ to XML

Introduction

LINQ to XML is a great and neat API. However, having worked with it for quite a while, I have several concerns about it. Let’s consider them and then look at what we can do to make this API even better.

Using the code

As example XML, we’ll parse this excerpt from an Amazon search response:

<Item>
  <ASIN>059035342X</ASIN>
  <SmallImage>
    <URL>http://ecx.images-amazon.com/images/I/51MU5VilKpL._SL75_.jpg</URL>
    <Height Units="pixels">75</Height>
    <Width Units="pixels">51</Width>
  </SmallImage>
  <ItemAttributes>
    <Author>J.K. Rowling</Author>
    <EAN>9780590353427</EAN>
    <ISBN>0439708184</ISBN>
    <ListPrice>
      <Amount>1099</Amount>
      <CurrencyCode>USD</CurrencyCode>
      <FormattedPrice>$10.99</FormattedPrice>
    </ListPrice>
    <PublicationDate>1999-10-01</PublicationDate>
    <Title>Harry Potter and the Sorcerer's Stone (Book 1)</Title>
  </ItemAttributes>
</Item>

Parsing XML

Given an XElement variable item that holds the XML above, I would like to read its values like this:

var title = item.Element("ItemAttributes").Element("Title").Value;
var imageUrl = new Uri(item.Element("SmallImage").Element("URL").Value);
var imageHeight = (int) item.Element("SmallImage").Element("Height");
var publicationDate = (DateTime) item.Element("ItemAttributes").Element("PublicationDate");

The problem is that almost all nodes in this response are optional. I can assume that some of them, such as Author and Title, will always be present for any book, but most of the others really are optional: some books have no images, others are out of stock and have no price, and still others are available only for pre-order and have no publication date yet. When an element in the chain is missing, Element returns null and the next call in the chain throws a NullReferenceException that says nothing about which element was missing. Another problem is that conversion to value types might not go smoothly, and you’ll get a FormatException with little information about the context of the error.
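To make both failure modes concrete, here is a small sketch (the class and method names are made up for illustration) that runs the naive chain against trimmed-down versions of the sample: one <Item> without <SmallImage>, and one whose <Height> is not a number.

```csharp
using System;
using System.Xml.Linq;

// Illustrative helper class: reproduces the failures of the naive parsing code.
public static class NaiveParsingFailures
{
    // Returns the name of the exception type thrown by the naive chain
    // when <SmallImage> is absent from <Item>.
    public static string MissingElementFailure()
    {
        var item = XElement.Parse("<Item><ASIN>059035342X</ASIN></Item>");
        try
        {
            // Element("SmallImage") returns null, so .Element("URL") dereferences null.
            string url = item.Element("SmallImage").Element("URL").Value;
            return "no exception: " + url;
        }
        catch (Exception ex)
        {
            return ex.GetType().Name;
        }
    }

    // Returns the message of the FormatException thrown by the explicit (int)
    // conversion when <Height> contains text that is not a number.
    public static string BadValueFailure()
    {
        var item = XElement.Parse("<Item><SmallImage><Height>tall</Height></SmallImage></Item>");
        try
        {
            int height = (int)item.Element("SmallImage").Element("Height");
            return "no exception: " + height;
        }
        catch (FormatException ex)
        {
            // The message describes the bad string, not which element held it.
            return ex.Message;
        }
    }
}
```

The first call yields NullReferenceException; the second yields a FormatException whose message does not mention Height at all.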

So real-life defensive code looks like this:

// Declared outside the checks so the result is usable afterwards (null if missing or invalid).
Uri imageUrl = null;
var smallImageElement = item.Element("SmallImage");
if (smallImageElement != null)
{
    var urlElement = smallImageElement.Element("URL");
    if (urlElement != null && Uri.IsWellFormedUriString(urlElement.Value, UriKind.Absolute))
    {
        imageUrl = new Uri(urlElement.Value);
    }
}

It’s not nice at all. We can hide it in a helper method or property; most likely the XML will be wrapped in a DTO type that is responsible for parsing and exposes a property for every value. Either way, I would prefer not to write a lot of such ugly code.

I would like to write code that:

Can be chained easily and safely.
Marks elements and attributes as mandatory or optional: if a mandatory element is missing, it throws a meaningful exception, but missing optional nodes never cause exceptions.
Gets a value of a specified type: if an element is optional, it tries to get the value and falls back to a default when the value is missing.

Below is the same code rewritten with several extension methods:

var title = item.MandatoryElement("ItemAttributes").MandatoryElement("Title").Value;

var imageUrl = item.ElementOrEmpty("SmallImage").ElementOrEmpty("URL").Value(value => 
                  Uri.IsWellFormedUriString(value, UriKind.Absolute) ? new Uri(value) : null);

var imageHeight = item.ElementOrEmpty("SmallImage").ElementOrEmpty("Height").Value<int>(0);

var publicationDate = item.MandatoryElement("ItemAttributes").ElementOrEmpty("PublicationDate").Value<DateTime>();

This code doesn’t require any null checks or catching of NullReferenceException.

However, if a mandatory element is missing, the MandatoryElement method throws an XmlException with the message “The element 'Item' doesn't contain mandatory child element 'ItemAttributes'.” That is exactly what we want: to detect that an element we expect to be there is missing. The ElementOrEmpty method, on the other hand, never throws. If the element is missing (i.e. the Element method returns null), it creates and returns an empty element with the specified name, so chaining can continue (the Null Object pattern).

The Value methods may also throw an XmlException such as “The element 'Amount' has value 'Five' which cannot be converted to the value of type 'Int32'.”, which carries much more context (the original FormatException is included as the inner exception). Error handling is designed so that the exceptions thrown by these helpers are all of type XmlException and include the context needed to locate and reproduce the problem.

Let’s look at how some of these extension methods are implemented.

public static XElement MandatoryElement(this XElement element, XName name)
{
    XElement childElement = element.Element(name);
    if (childElement == null)
    {
        throw new XmlException(string.Format("The element '{0}' doesn't contain mandatory child element '{1}'.", element.Name, name));
    }
    return childElement;
}

MandatoryElement checks the result of the Element method and throws an exception if it returns null. Nothing fancy; it just moves null handling out of your parsing logic.
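A self-contained sketch that copies MandatoryElement into an illustrative MandatoryElementDemo class and shows both outcomes: the chain returns the title when the elements exist, and otherwise surfaces the exception message quoted earlier.

```csharp
using System.Xml;
using System.Xml.Linq;

public static class MandatoryElementDemo
{
    // Same logic as the MandatoryElement extension method above.
    public static XElement MandatoryElement(this XElement element, XName name)
    {
        XElement childElement = element.Element(name);
        if (childElement == null)
        {
            throw new XmlException(string.Format(
                "The element '{0}' doesn't contain mandatory child element '{1}'.",
                element.Name, name));
        }
        return childElement;
    }

    // Illustrative helper: returns the title, or the exception message
    // when a mandatory element is missing.
    public static string ReadTitle(string xml)
    {
        XElement item = XElement.Parse(xml);
        try
        {
            return item.MandatoryElement("ItemAttributes").MandatoryElement("Title").Value;
        }
        catch (XmlException ex)
        {
            return ex.Message;
        }
    }
}
```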

The ElementOrEmpty method:

public static XElement ElementOrEmpty(this XElement element, XName name)
{
    if (element != null)
    {
        XElement childElement = element.Element(name);
        return childElement ?? new XElement(name);
    }
    return new XElement(name);
}
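The Null Object behavior is easy to check. The sketch below (class and helper names are illustrative) copies ElementOrEmpty and shows that a missing chain yields an empty string, and that the placeholder element is not attached to the source document, so reading through missing nodes never modifies the tree.

```csharp
using System.Xml.Linq;

public static class ElementOrEmptyDemo
{
    // Same logic as ElementOrEmpty above: never returns null.
    public static XElement ElementOrEmpty(this XElement element, XName name)
    {
        if (element != null)
        {
            return element.Element(name) ?? new XElement(name);
        }
        return new XElement(name);
    }

    // Reads <SmallImage><URL> from an <Item>; returns "" when any level is missing.
    public static string SmallImageUrl(XElement item)
    {
        return item.ElementOrEmpty("SmallImage").ElementOrEmpty("URL").Value;
    }

    // True when <SmallImage> is missing and the returned placeholder has no parent,
    // i.e. the source document was left unchanged.
    public static bool PlaceholderIsDetached(XElement item)
    {
        XElement placeholder = item.ElementOrEmpty("SmallImage");
        return placeholder.Parent == null && item.Element("SmallImage") == null;
    }
}
```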

The Value<T> overload with a custom conversion function:

public static T Value<T>(this XElement element, Func<string, T> convert)
{
    if (element == null)
    {
        return default(T);
    }
    return convert(element.Value);
}

In this case, handling incorrect values is the responsibility of the convert function, as the Uri example above demonstrates.
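For example, the sample’s <Amount> appears to hold the price in cents (1099 next to a formatted price of $10.99), so a convert function could turn it into a decimal and map unparsable text to null instead of throwing. This is a sketch; ListPrice and the decimal? result are illustrative choices, not part of the attached project.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

public static class ConvertOverloadDemo
{
    // Same logic as the Value<T>(Func<string, T>) overload above.
    public static T Value<T>(this XElement element, Func<string, T> convert)
    {
        if (element == null)
        {
            return default(T);
        }
        return convert(element.Value);
    }

    // Converts <Amount> (assumed to be an integer number of cents) to a decimal price.
    // The lambda handles bad input itself by returning null instead of throwing.
    public static decimal? ListPrice(XElement listPrice)
    {
        return listPrice.Element("Amount").Value(v =>
        {
            int cents;
            return int.TryParse(v, NumberStyles.Integer, CultureInfo.InvariantCulture, out cents)
                ? cents / 100m
                : (decimal?)null;
        });
    }
}
```

A missing <Amount> also yields null, because Element returns null and the overload then returns default(decimal?).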

The last method is the Value<T> overload that parses many predefined value types as well as enum values:

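public static T Value<T>(this XElement element, T defaultValue = default(T)) where T : struct, IConvertible
{
    if (element == null || string.IsNullOrEmpty(element.Value))
    {
        return defaultValue;
    }
    string value = element.Value;
    try
    {
        Type typeOfT = typeof(T);
        if (typeOfT.IsEnum)
        {
            // Handles single names and comma-separated flag combinations, e.g. "Red, Green".
            return (T)Enum.Parse(typeOfT, value, ignoreCase: true);
        }
        // Parse with the invariant culture (System.Globalization) so that XML values
        // such as "10.99" or "1999-10-01" don't depend on the current thread's culture.
        return (T)Convert.ChangeType(value, typeOfT, CultureInfo.InvariantCulture);
    }
    catch (Exception ex)
    {
        // Wrap the original exception (FormatException, OverflowException, ...) with element context.
        throw new XmlException(string.Format("The element '{0}' has value '{1}' which cannot be converted to the value of type '{2}'.", element.Name, value, typeof(T).Name), ex);
    }
}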

This method can parse int, uint, byte, sbyte, short, ushort, char, long, ulong, float, double, decimal, bool and DateTime values. Moreover, it can handle custom enumerations, including flag enumerations. Suppose we have a Colors enum:

[Flags]
private enum Colors
{
    None = 0,
    Red = 1,
    Green = 2,
    Blue = 4
}

With that, you can use the Value<T> method to deserialize enum values:

var enumElement = new XElement("color", Colors.Red | Colors.Green);   // Value = "Red, Green"
Colors colors = enumElement.Value<Colors>();
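The sketch below copies this overload, parsing with CultureInfo.InvariantCulture since XML values use culture-neutral formats such as 10.99 and 1999-10-01, together with the Colors enum (made public here so it can live at the top level), to show the flags round trip, the default value, and the error path. Note that the message uses typeof(T).Name, so it reports the CLR name Int32 rather than the C# keyword int.

```csharp
using System;
using System.Globalization;
using System.Xml;
using System.Xml.Linq;

[Flags]
public enum Colors
{
    None = 0,
    Red = 1,
    Green = 2,
    Blue = 4
}

public static class StructValueDemo
{
    // Same logic as the Value<T> overload above, parsing with the invariant culture.
    public static T Value<T>(this XElement element, T defaultValue = default(T)) where T : struct, IConvertible
    {
        if (element == null || string.IsNullOrEmpty(element.Value))
        {
            return defaultValue;
        }
        string value = element.Value;
        try
        {
            Type typeOfT = typeof(T);
            if (typeOfT.IsEnum)
            {
                // Enum.Parse accepts comma-separated flag names such as "Red, Green".
                return (T)Enum.Parse(typeOfT, value, ignoreCase: true);
            }
            return (T)Convert.ChangeType(value, typeOfT, CultureInfo.InvariantCulture);
        }
        catch (Exception ex)
        {
            throw new XmlException(string.Format(
                "The element '{0}' has value '{1}' which cannot be converted to the value of type '{2}'.",
                element.Name, value, typeof(T).Name), ex);
        }
    }
}
```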

Generating XML

Now let’s look at generating XML:

var itemXml = new XElement("Item",
                new XElement("ASIN", item.Asin),
                new XElement("SmallImage",
                   new XElement("URL", item.SmallImage.Url),
                   new XElement("Height",
                      new XAttribute("Units", item.SmallImage.Units),
                      item.SmallImage.Height),
                   new XElement("Width",
                      new XAttribute("Units", item.SmallImage.Units),
                      item.SmallImage.Width)),
                new XElement("ItemAttributes",
                   new XElement("Author", item.ItemAttributes.Author),
                   new XElement("Binding", item.ItemAttributes.Binding)));

This code looks fine if you always need to output all elements and attributes. Problems start when you have hierarchical data with optional structures (like item.SmallImage in this example, which might be null, in which case item.SmallImage.Url throws a NullReferenceException) and/or you don’t want to output empty nodes (a null item.ItemAttributes.Binding still produces an empty <Binding /> element). To handle these issues, we will use a couple of static methods.
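To see both problems in runnable form, the generation example needs an object model, which the article doesn’t show. The Item, SmallImage and ItemAttributes classes below are hypothetical DTOs with just the properties the example uses, and NaiveGeneration is an illustrative wrapper around the unconditional construction.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical DTO shapes; not part of the original article.
public class SmallImage
{
    public string Url { get; set; }
    public string Units { get; set; }
    public int Height { get; set; }
    public int Width { get; set; }
}

public class ItemAttributes
{
    public string Author { get; set; }
    public string Binding { get; set; }
}

public class Item
{
    public string Asin { get; set; }
    public SmallImage SmallImage { get; set; }
    public ItemAttributes ItemAttributes { get; set; }
}

public static class NaiveGeneration
{
    // The unconditional construction from the example above, wrapped in a method.
    public static XElement ToXml(Item item)
    {
        return new XElement("Item",
            new XElement("ASIN", item.Asin),
            new XElement("SmallImage",
                new XElement("URL", item.SmallImage.Url),
                new XElement("Height", new XAttribute("Units", item.SmallImage.Units), item.SmallImage.Height),
                new XElement("Width", new XAttribute("Units", item.SmallImage.Units), item.SmallImage.Width)),
            new XElement("ItemAttributes",
                new XElement("Author", item.ItemAttributes.Author),
                // A null Binding is skipped as content, leaving an empty <Binding /> node.
                new XElement("Binding", item.ItemAttributes.Binding)));
    }
}
```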

itemXml = new XElement("Item",
            new XElement("ASIN", item.Asin),
            XmlExtensions.NewOptionalXElements("SmallImage", () => item.SmallImage != null,
               new XElement("URL", item.SmallImage.Url),
               new XElement("Height",
                  new XAttribute("Units", item.SmallImage.Units),
                  item.SmallImage.Height),
               new XElement("Width",
                  new XAttribute("Units", item.SmallImage.Units),
                  item.SmallImage.Width)),
            new XElement("ItemAttributes",
               new XElement("Author", item.ItemAttributes.Author),
               XmlExtensions.NewOptionalXElement("Binding", item.ItemAttributes.Binding)));

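NewOptionalXElement tests the value for null, and if it is null, it doesn’t output the node at all. This works because LINQ to XML silently ignores null content passed to the XElement constructor: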

public static XElement NewOptionalXElement(XName name, object value)
{
    if (value != null)
    {
        return new XElement(name, value);
    }
    return null;
}
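A small check of that behavior; BuildAttributes is an illustrative helper, not part of the attached project.

```csharp
using System.Xml.Linq;

public static class OptionalElementDemo
{
    // Same logic as NewOptionalXElement above.
    public static XElement NewOptionalXElement(XName name, object value)
    {
        if (value != null)
        {
            return new XElement(name, value);
        }
        return null;
    }

    // Builds <ItemAttributes>; a null binding produces no <Binding> node at all,
    // because the XElement constructor skips null content.
    public static XElement BuildAttributes(string author, string binding)
    {
        return new XElement("ItemAttributes",
            new XElement("Author", author),
            NewOptionalXElement("Binding", binding));
    }
}
```

An empty string is not null, so NewOptionalXElement("Binding", "") still emits an empty <Binding></Binding> element; if empty strings should be dropped as well, the check would need string.IsNullOrEmpty.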

NewOptionalXElements accepts a function that decides whether the element and its content should be added to the output:

public static XElement NewOptionalXElements(XName name, Func<bool> condition, params object[] value)
{
    if (condition())
    {
        return new XElement(name, value);
    } 
    return null;
}

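It lets you attach a condition to a whole group of nodes. Keep in mind, though, that C# evaluates all arguments before the call, so in the example above item.SmallImage.Url and the other child expressions are still evaluated when item.SmallImage is null, and they throw a NullReferenceException. To make that case safe, the child content has to be built lazily (for example, behind a lambda) or guarded with the conditional operator.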
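One way to get lazy evaluation is to take the content as a delegate too. NewOptionalXElementsLazy below is a hypothetical variant, not part of the attached project: the content delegate runs only after the condition returns true, so the child expressions can safely dereference the optional object. The Image class and BuildItem helper are likewise illustrative.

```csharp
using System;
using System.Xml.Linq;

public static class LazyOptionalElement
{
    // Hypothetical variant of NewOptionalXElements: the content is produced by a
    // delegate, so it is evaluated only when the condition holds.
    public static XElement NewOptionalXElementsLazy(XName name, Func<bool> condition, Func<object[]> content)
    {
        if (condition())
        {
            return new XElement(name, content());
        }
        return null;
    }

    // Hypothetical image type used only for this example.
    public class Image
    {
        public string Url { get; set; }
        public int Height { get; set; }
    }

    public static XElement BuildItem(string asin, Image image)
    {
        return new XElement("Item",
            new XElement("ASIN", asin),
            NewOptionalXElementsLazy("SmallImage", () => image != null, () => new object[]
            {
                // Safe: this lambda runs only after the null check above succeeded.
                new XElement("URL", image.Url),
                new XElement("Height", image.Height)
            }));
    }
}
```

The plain conditional operator (image != null ? new XElement("SmallImage", ...) : null) achieves the same without a helper; the delegate version only keeps the NewOptionalXElements call shape.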

Other goodies

The attached project has several more extension methods which you may find useful.

XAttribute MandatoryAttribute(this XElement element, XName name);

XAttribute AttributeOrEmpty(this XElement element, XName name, string defaultValue = null);

T Value<T>(this XAttribute attribute, T defaultValue = default(T));

T Value<T>(this XAttribute attribute, Func<string, T> convert);

void AssertName(XElement element, XName expectedName);

XAttribute NewOptionalXAttribute(XName name, object value, object defaultValue = null);
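The article only lists these signatures. As a rough sketch of what MandatoryAttribute, AttributeOrEmpty and AssertName might look like, written by analogy with MandatoryElement and ElementOrEmpty (the bodies and exception messages here are assumptions and may differ from the attached project):

```csharp
using System.Xml;
using System.Xml.Linq;

// Assumed implementations; the attached project's versions may differ.
public static class AttributeExtensionsSketch
{
    // Throws an XmlException naming the element when the attribute is missing.
    public static XAttribute MandatoryAttribute(this XElement element, XName name)
    {
        XAttribute attribute = element.Attribute(name);
        if (attribute == null)
        {
            throw new XmlException(string.Format(
                "The element '{0}' doesn't contain mandatory attribute '{1}'.", element.Name, name));
        }
        return attribute;
    }

    // Returns the existing attribute, or a detached attribute holding defaultValue.
    // Assumption: a null default becomes an empty string, because XAttribute
    // does not accept a null value.
    public static XAttribute AttributeOrEmpty(this XElement element, XName name, string defaultValue = null)
    {
        XAttribute attribute = element != null ? element.Attribute(name) : null;
        return attribute ?? new XAttribute(name, defaultValue ?? string.Empty);
    }

    // Throws when the element does not have the expected name.
    public static void AssertName(XElement element, XName expectedName)
    {
        if (element.Name != expectedName)
        {
            throw new XmlException(string.Format(
                "Expected element '{0}' but found '{1}'.", expectedName, element.Name));
        }
    }
}
```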

As you can see, with a little adjusting, the LINQ to XML API becomes even better. Feel free to use these extensions in your projects.

Happy coding!

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.