
Simple High Performance XML Serialization using RestDirector™

2 Aug 2008, CPOL, 14 min read
RestDirector provides performance and readability to XML.

Overview

XML is a simple and human-readable way to communicate and store data. As a file it is frequently possible to examine and even modify the data without having other knowledge of the application that uses it. This is extremely difficult with a binary file. There is some inefficiency when using XML compared to binary, but much of that is implementation, not XML itself.

RestDirector’s XML serialization technique puts a little more burden on the developer to code in exchange for greater control over the serialization, elimination of startup delays, and overall performance increases.

XML Background

SGML is designed to be “all things to all people”, but in doing so it is too complex and therefore not widely used. HTML is a simple subset of SGML that is easily understood and widely used. It is HTML’s simplicity that helped catapult it to become the markup language of the web.

XML is also derived from SGML and is designed to be a simple markup language that is easy for both humans and computers to read. Like HTML, XML has become widely used.

Over time both HTML and XML have seen their simple beginnings turn complex. However, the bulk of applications probably still use only the basic features of both languages. In the case of XML, the term POX, or Plain Old XML, has gained ground as the name for the simple way to implement XML.

Simplifying XML

Much of XML’s complexity comes from namespaces. Namespaces are a great tool for interoperability, allowing application A’s data to be intermixed with application B’s data. Many applications benefit from this design, but most probably have no need for it because they control all of the data. Namespaces allow any child of an arbitrary node to be defined fully independently of its parent.

Namespaces are not necessary when your application defines all nodes or nodes defined outside your application are fully contained in a node within your application, as in when your application is simply a container for the data of the outside application.

Schemas can also be of great use, especially when working with data provided by a third party or a very formal environment. However, when you have complete control of the data, much of what schemas provide is redundant. Any necessary validation can happen during serialization by your code, rather than code outside of your application.

Advantages of RestDirector’s Implementation of XML

Microsoft’s XML serialization in the .NET Framework is simple and inventive. However, when the data becomes large or complex, performance suffers. Because so much is done for you, doing anything specific requires contortions in your code. An earlier version of software based on RestDirector, which used Microsoft’s XML serialization, had an 18-second startup time on a 2007-vintage desktop. Once RestDirector implemented its own serialization, the startup time dropped to a quarter of a second. The difference was easily noticed.

Data is also required to be public properties and some items, like dictionaries, can’t be serialized directly. RestDirector serializes anything, including private values, dictionaries, data from other classes, and calculated values.
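
As a rough illustration of the dictionary limitation, the sketch below asks the framework's System.Xml.Serialization.XmlSerializer to handle a generic Dictionary. This is not RestDirector code, and the class and method names are made up for the example. It assumes that XmlSerializer rejects types implementing IDictionary when the serializer is constructed, which is the documented behavior as far as we know.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Serialization;

// Illustrative only: DictionaryLimitSketch is a hypothetical name.
public static class DictionaryLimitSketch
{
    // Returns true when XmlSerializer refuses to build a serializer for the type.
    public static bool IsRejected(Type type)
    {
        try
        {
            new XmlSerializer(type);
            return false;
        }
        catch (Exception ex) when (ex is NotSupportedException || ex is InvalidOperationException)
        {
            // Assumption: unsupported types surface as one of these two exceptions.
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsRejected(typeof(Dictionary<string, int>)));
        Console.WriteLine(IsRejected(typeof(List<int>)));
    }
}
```

A plain List is accepted while the Dictionary is not, which is why code that relies on XmlSerializer usually converts dictionaries into lists of key/value items first.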

RestDirector also allows your application to directly control the order of rendering so that the attributes and elements appear as your code desires, not what the auto-generated code generates.

Since RestDirector leaves virtually everything in the hands of your application, the door is open to unique and imaginative implementations. For example, parsed data can be put into an existing instance. Conditional rendering also becomes simple.

You can also debug the serialization as it is your code and not autogenerated.

Disadvantages of RestDirector’s Implementation of XML

It is not all gain with RestDirector’s XML serialization as some of the things that are done for you are now in your hands to code. Techniques have been explored to automate this code while providing mechanics for custom code. If this project is commissioned, it could also provide automatic schema generation for all but custom coded nodes.

RestDirector’s Serialization

Serialization of a given class is done by the class implementing DirectorWare.File.IXml which contains four methods:

    void XmlParseAttribute(DirectorWare.File.XmlFile file);
    void XmlParseElement(DirectorWare.File.XmlFile file);
    void XmlRenderAttributes(DirectorWare.File.XmlFile file);
    void XmlRenderElements(DirectorWare.File.XmlFile file);

Clearly two are for parsing and two are for rendering and each has a method for attributes and elements.

Serialization is not done because a property is public or decorated. Rather these simple methods are called in the class to parse and render. Supporting methods in RestDirector make serializing most nodes only one line of code.

Rendering

A given instance is rendered to XML by calling “XmlRenderAttributes” and “XmlRenderElements.” The methods are simply called in sequence and do not restrict the class from outputting attributes or elements. However, by breaking the rendering into two methods there are numerous advantages.

· It distinguishes in code where the attributes are being rendered and where the elements are being rendered.

· Attributes must be rendered first and it helps reinforce this.

· Classes can be inherited allowing new attributes and elements to be added without concern about ordering attributes and elements.

Simple Rendering Example

Assume that your class is required to produce the following XML snippet:

<Element1 Attribute1="Value1" Attribute2="123">
  <Element2>Element2’s data</Element2>
  <Element3>Element3’s data</Element3>
</Element1>

To render the attributes, XmlRenderAttributes is called. In the example below the two attributes are rendered by calling the RenderAttribute method. Typically you will use a constant for the attribute name, but since you are calling a method, you have the option of generating the attribute name programmatically.

The second argument is the value of the attribute, which can come from any available source. It does not have to be public, nor does it have to be in this particular class. The value is rendered as a string, so you can pass a string to get total control over what the attribute contains. The rendering methods are overloaded to handle values such as integer, decimal, DateTime, Version, and a variety of others directly. The result is a method implementation that typically looks no more complicated than this example.

public void XmlRenderAttributes(DirectorWare.File.XmlFile file)
{
    file.RenderAttribute("Attribute1", this.value1);
    file.RenderAttribute("Attribute2", this.value2);
}

To render the elements, XmlRenderElements is called. In the example below the two elements are rendered by calling the RenderElement method. This is virtually identical to rendering attributes, with the same flexibility.

When the value is a string you are simply rendering an element with text. Just like rendering attributes, the method is overloaded. Of particular note is that you can pass an object that implements the DirectorWare.File.IXml interface. By doing so, you now have the mechanics to nest child nodes as deep as required.

public void XmlRenderElements(DirectorWare.File.XmlFile file)
{
    file.RenderElement("Element2", this.element2);
    file.RenderElement("Element3", this.element3);
}
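
To make the mechanics concrete, here is a minimal, self-contained sketch of how such a render interface could sit on top of System.Xml.XmlWriter. It is not RestDirector's implementation: IXmlSketch and XmlFileSketch are hypothetical stand-ins for DirectorWare.File.IXml and XmlFile, covering only the rendering half, and the field values are invented.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical stand-in for the rendering half of DirectorWare.File.IXml.
public interface IXmlSketch
{
    void XmlRenderAttributes(XmlFileSketch file);
    void XmlRenderElements(XmlFileSketch file);
}

// Hypothetical stand-in for XmlFile, implemented as a thin wrapper over XmlWriter.
public sealed class XmlFileSketch
{
    private readonly XmlWriter writer;

    public XmlFileSketch(XmlWriter writer) { this.writer = writer; }

    public void RenderAttribute(string name, string value)
    {
        this.writer.WriteAttributeString(name, value);
    }

    // Overload for a typed value, converted with XML's invariant formatting.
    public void RenderAttribute(string name, long value)
    {
        this.writer.WriteAttributeString(name, XmlConvert.ToString(value));
    }

    public void RenderElement(string name, string text)
    {
        this.writer.WriteElementString(name, text);
    }

    // Nested object: attributes are always rendered before child elements.
    public void RenderElement(string name, IXmlSketch child)
    {
        this.writer.WriteStartElement(name);
        child.XmlRenderAttributes(this);
        child.XmlRenderElements(this);
        this.writer.WriteEndElement();
    }
}

public sealed class Element1 : IXmlSketch
{
    private readonly string value1 = "Value1";
    private readonly long value2 = 123;
    private readonly string element2 = "Element2 data";
    private readonly string element3 = "Element3 data";

    public void XmlRenderAttributes(XmlFileSketch file)
    {
        file.RenderAttribute("Attribute1", this.value1);
        file.RenderAttribute("Attribute2", this.value2);
    }

    public void XmlRenderElements(XmlFileSketch file)
    {
        file.RenderElement("Element2", this.element2);
        file.RenderElement("Element3", this.element3);
    }
}

public static class Program
{
    public static string Render(string rootName, IXmlSketch root)
    {
        var text = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(text, settings))
        {
            new XmlFileSketch(writer).RenderElement(rootName, root);
        }
        return text.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Render("Element1", new Element1()));
    }
}
```

Because the private fields are read inside the class's own methods, nothing has to be made public for the sake of serialization.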

Rendering Elements for Readability

Often it is preferable to group items as attributes within an element rather than as separate elements. This can greatly aid in human readability as well as reducing the XML size. An example of this comes from the diagnostics in RestDirector. The Environment class provides a detailed list of the application’s environment and could be lengthy. Information about the host operating systems could be rendered as elements such as:

<Environment>
  <OSPlatform>Win32NT</OSPlatform>
  <OSVersion>6.0.6001.65536</OSVersion>
</Environment>

To shrink the XML and make it more readable we render this:

<Environment>
  <OS Platform="Win32NT" Version="6.0.6001.65536"/>
</Environment>

The Environment class has over a dozen nodes, but just two are shown in this example. Clearly doing this on a large scale can radically shrink the XML and improve human readability.

To render this within one class we do the following in the Environment class:

public void XmlRenderElements(DirectorWare.File.XmlFile file)
{
    file.RenderElement("OS", this.xmlRenderOS);
}

private void xmlRenderOS(DirectorWare.File.XmlFile file)
{
    file.RenderAttribute("Platform", this.osPlatform);
    file.RenderAttribute("Version", this.osVersion);
}

This gives us nested rendering by passing a method based on the XmlFileMethod delegate. The delegate allows us to call methods with the XmlFile argument. We can now add child elements to our class’s element without having to instantiate additional classes. When we need to pass an object to the method we can use the XmlFileMethodValue delegate.
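
The delegate mechanism can be sketched as follows. XmlFileMethodSketch and XmlFileSketch are hypothetical stand-ins written over System.Xml.XmlWriter, so the real XmlFileMethod delegate and XmlFile class may well differ in detail. The point is only that an element can be opened, filled by a callback, and closed without creating another class.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical stand-in for the XmlFileMethod delegate.
public delegate void XmlFileMethodSketch(XmlFileSketch file);

// Hypothetical stand-in for XmlFile, wrapping an XmlWriter.
public sealed class XmlFileSketch
{
    private readonly XmlWriter writer;

    public XmlFileSketch(XmlWriter writer) { this.writer = writer; }

    public void RenderAttribute(string name, string value)
    {
        this.writer.WriteAttributeString(name, value);
    }

    // Opens the element, lets the callback fill it, then closes it.
    public void RenderElement(string name, XmlFileMethodSketch render)
    {
        this.writer.WriteStartElement(name);
        render(this);
        this.writer.WriteEndElement();
    }
}

public sealed class EnvironmentSketch
{
    private readonly string osPlatform = "Win32NT";
    private readonly string osVersion = "6.0.6001.65536";

    public void XmlRenderElements(XmlFileSketch file)
    {
        // The method group converts to the delegate type automatically.
        file.RenderElement("OS", this.XmlRenderOS);
    }

    private void XmlRenderOS(XmlFileSketch file)
    {
        file.RenderAttribute("Platform", this.osPlatform);
        file.RenderAttribute("Version", this.osVersion);
    }

    public string Render()
    {
        var text = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(text, settings))
        {
            new XmlFileSketch(writer).RenderElement("Environment", this.XmlRenderElements);
        }
        return text.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(new EnvironmentSketch().Render());
    }
}
```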

Rendering Lists

Rendering a list can be as simple or complex as desired. This example comes from RestDirector’s logging of an exception.

public void XmlRenderElements(DirectorWare.File.XmlFile file)
{
    foreach (ExceptionStackTrace stackTrace in this.stackTraceList)
        file.RenderElement("Stack", stackTrace);
}

In this case, a series of elements named “Stack” are produced as required. Being required to code the rendering can open the door to other processing during the rendering, such as filtering and ordering.

Rendering Dictionaries

Since we are iterating through our list ourselves during rendering, iterating through a dictionary is now possible. This can be done in a multitude of ways and we will show one example. This again comes from RestDirector’s diagnostics, where the list of assemblies in the application is rendered.

In this case the dictionary is hash-based and is therefore ordered for internal seek performance, not human readability. So, with the slight penalty of having to do a sort before rendering, we will take our dictionary and turn it into a SortedList to render. Fortunately the code is simple.

public void XmlRenderElements(DirectorWare.File.XmlFile file)
{
    SortedList<string, Assembly> list
        = new SortedList<string, Assembly>(this.dictionary);

    foreach (Assembly assembly in list.Values)
        file.RenderElement("Assembly", assembly);
}

We can keep it this simple because the key for our dictionary is also stored in the value object. If that were not the case, you could add the key as an attribute in the element you are rendering by using the XmlFileMethod delegate shown earlier.
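
For the case where the key is not part of the value, the idea of writing the key as an attribute can be sketched with System.Xml.XmlWriter directly. This deliberately avoids the RestDirector API, and the element and attribute names ("Entries", "Entry", "Key") are assumptions chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Xml;

public static class DictionaryRenderSketch
{
    public static string Render(IDictionary<string, string> dictionary)
    {
        // Copy into a SortedList so the output is ordered by key, not by hash.
        var sorted = new SortedList<string, string>(dictionary, StringComparer.Ordinal);

        var text = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(text, settings))
        {
            writer.WriteStartElement("Entries");
            foreach (KeyValuePair<string, string> pair in sorted)
            {
                writer.WriteStartElement("Entry");
                writer.WriteAttributeString("Key", pair.Key); // key travels as an attribute
                writer.WriteString(pair.Value);               // value becomes the element text
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
        }
        return text.ToString();
    }

    public static void Main()
    {
        var data = new Dictionary<string, string> { { "b", "2" }, { "a", "1" } };
        Console.WriteLine(Render(data));
    }
}
```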

Parsing

Parsing uses a similar but complementary approach. Each attribute is parsed by calling the class’s XmlParseAttribute method and each element by XmlParseElement. Both methods work in a similar way just with different data. In both cases the XmlFile contains a field called Name that contains the current node being parsed. So returning to our first XML example:

<Element1 Attribute1="Value1" Attribute2="123">
  <Element2>Element2’s data</Element2>
  <Element3>Element3’s data</Element3>
</Element1>

Parsing of attributes occurs first and in order of appearance. Typically you would examine the “file.Name” value and act accordingly. In most cases it is easiest to read as a list of comparisons, each returning when a match occurs, rather than as a chain of “else if” statements. This allows for easy reordering or commenting out. The attribute value can be taken in its raw string form or converted, for example to a long, as shown below.

public void XmlParseAttribute(DirectorWare.File.XmlFile file)
{
    if (file.Name == "Attribute1")
    { this.attribute1 = file.ParseAttributeString; return; }

    if (file.Name == "Attribute2")
    { this.attribute2 = file.ParseAttributeLong; return; }
}

Parsing of elements is identical in concept but also addresses nested elements. In the case of “Element2” below, we are expecting a simple text element so we can take the element’s text and return. More typically the element is processed by parsing a new object. We show “Element3” as an object that is instantiated and then parsed.

public void XmlParseElement(DirectorWare.File.XmlFile file)
{
    if (file.Name == "Element2")
    { this.element2 = file.ParseElementString; return; }

    if (file.Name == "Element3")
    { file.Parse(this.element3 = new Element3()); return; }
}
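
The dispatch-by-name pattern can be tried out with a small stand-in built on System.Xml.XmlDocument. IXmlParseSketch and XmlFileSketch below are hypothetical and only mimic the calls used in this article. For brevity the sketch parses straight into the root object instead of dispatching on the root name, and it treats every element as either text or a nested object.

```csharp
using System;
using System.Xml;

// Hypothetical stand-in for the parsing half of DirectorWare.File.IXml.
public interface IXmlParseSketch
{
    void XmlParseAttribute(XmlFileSketch file);
    void XmlParseElement(XmlFileSketch file);
}

// Hypothetical stand-in for XmlFile, walking an XmlDocument tree.
public sealed class XmlFileSketch
{
    private XmlNode current; // attribute or element currently offered to the target

    public string Name { get { return this.current.LocalName; } }
    public string ParseAttributeString { get { return this.current.Value; } }
    public long ParseAttributeLong { get { return XmlConvert.ToInt64(this.current.Value); } }
    public string ParseElementString { get { return this.current.InnerText; } }

    // Offers each attribute, then each child element, of the current element to the target.
    public void Parse(IXmlParseSketch target)
    {
        XmlNode element = this.current;
        foreach (XmlAttribute attribute in element.Attributes)
        {
            this.current = attribute;
            target.XmlParseAttribute(this);
        }
        foreach (XmlNode child in element.ChildNodes)
        {
            if (child.NodeType != XmlNodeType.Element) continue;
            this.current = child;
            target.XmlParseElement(this);
        }
        this.current = element; // restore so an enclosing Parse call can continue
    }

    public static void ParseFromString(string xml, IXmlParseSketch target)
    {
        var document = new XmlDocument();
        document.LoadXml(xml);
        var file = new XmlFileSketch();
        file.current = document.DocumentElement;
        file.Parse(target);
    }
}

public sealed class Element1 : IXmlParseSketch
{
    public string Attribute1;
    public long Attribute2;
    public string Element2;

    public void XmlParseAttribute(XmlFileSketch file)
    {
        if (file.Name == "Attribute1") { this.Attribute1 = file.ParseAttributeString; return; }
        if (file.Name == "Attribute2") { this.Attribute2 = file.ParseAttributeLong; return; }
    }

    public void XmlParseElement(XmlFileSketch file)
    {
        if (file.Name == "Element2") { this.Element2 = file.ParseElementString; return; }
        // Unrecognized elements fall through and are ignored.
    }
}

public static class Program
{
    public static void Main()
    {
        var target = new Element1();
        XmlFileSketch.ParseFromString(
            "<Element1 Attribute1=\"Value1\" Attribute2=\"123\"><Element2>text</Element2><Other/></Element1>",
            target);
        Console.WriteLine(target.Attribute1 + " " + target.Attribute2 + " " + target.Element2);
    }
}
```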

Parsing Nested Elements within a Class

Previously we rendered the following from a single class:

<Environment>
  <OS Platform="Win32NT" Version="6.0.6001.65536"/>
</Environment>

To parse the Environment class we can do the following:

public void XmlParseElement(DirectorWare.File.XmlFile file)
{
    if (file.Name == "OS")
    { file.Parse(this.xmlParseAttributeOS, null); return; }
}

private void xmlParseAttributeOS(DirectorWare.File.XmlFile file)
{
    if (file.Name == "Platform")
    { this.osPlatform = file.ParseAttributeString; return; }

    if (file.Name == "Version")
    { this.osVersion = file.ParseAttributeString; return; }
}

What we have is similar to the nested rendering and again uses a method based on the XmlFileMethod delegate. We find an element named “OS” and then call the “Parse” method, which will parse this element. We pass it two arguments: a method to parse the attributes, and a null since we are not going to parse any elements. The xmlParseAttributeOS method will then handle the attributes we expect and put them in the current instance.

Parsing Lists

We can parse the list from our stack trace example as follows:

public void XmlParseElement(DirectorWare.File.XmlFile file)
{
    if (file.Name == "Stack")
    {
        ExceptionStackTrace stackTrace = new ExceptionStackTrace();
        file.Parse(stackTrace);
        this.stackTraceList.Add(stackTrace);
        return;
    }
}

As a “Stack” element arrives, we instantiate a new ExceptionStackTrace class and let it parse the element. We then take that populated instance and add it to the list in our class.

Parsing Dictionaries

As with rendering, since we have full control of the code we are able to do other processing on the data as it arrives. We can now handle dictionaries as suits our needs. In our rendering example, we took our hash-based dictionary and made a SortedList to render. Since the order in which the data arrives does not matter to the dictionary, parsing is a simple matter of placing each value into the dictionary.

The “Assembly” class contains the key, so we can parse the Assembly and use the instance itself to retrieve the key. We use the assignment statement rather than the “Add” method of the dictionary to be tolerant of identical keys.

public void XmlParseElement(DirectorWare.File.XmlFile file)
{
    if (file.Name == "Assembly")
    {
        Assembly assembly = new Assembly();
        file.Parse(assembly);
        this.dictionary[assembly.Name] = assembly;
        return;
    }
}
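
The difference between the indexer assignment and “Add” is standard Dictionary behavior and can be checked in isolation. The small sketch below is independent of RestDirector, and its class name and sample key are invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateKeySketch
{
    // Add refuses a key that is already present.
    public static bool AddThrowsOnDuplicate()
    {
        var dictionary = new Dictionary<string, string>();
        dictionary.Add("System.Xml", "first");
        try
        {
            dictionary.Add("System.Xml", "second");
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    // The indexer silently replaces the earlier value, so the last one parsed wins.
    public static string AssignKeepsLast()
    {
        var dictionary = new Dictionary<string, string>();
        dictionary["System.Xml"] = "first";
        dictionary["System.Xml"] = "second";
        return dictionary["System.Xml"];
    }

    public static void Main()
    {
        Console.WriteLine(AddThrowsOnDuplicate());
        Console.WriteLine(AssignKeepsLast());
    }
}
```

With the indexer, a file that accidentally contains the same key twice still parses; whether keeping the last entry is the right policy depends on the application.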

Handling the Root Node

Now that we can handle the internal nodes, we must process the root node. Methods exist to serialize to and from a stream or file.

The root node is rendered to a file with the “RenderToFile” method or to a stream with the “RenderToStream” method.

file.RenderToFile("c:/File.xml", this);

We are passing either the path, or the stream, and the IXml object to render. We are using the same technique and methods as before but we have one requirement when rendering the root in that there can only be one root element. Since we are using method-calls instead of decorations, we can dynamically decide on everything including the root name.

public void XmlRenderElements(DirectorWare.File.XmlFile file)
{
    file.RenderElement("Root", this.xmlRenderRoot);
}

Parsing is just as simple with the “ParseFromFile” and “ParseFromStream” methods. However, we now have additional flexibility in that we use the same methodology to parse the root name. This allows us to handle the file dynamically depending on the root name, such as:

public void XmlParseElement(DirectorWare.File.XmlFile file)
{
    if (file.Name == "RootA")
    { file.Parse(null, this.xmlParseRootA); return; }

    if (file.Name == "RootB")
    { file.Parse(null, this.xmlParseRootB); return; }
}

Future Enhancements

There are several enhancements that are being considered for RestDirector’s XML serialization. Some things are simple helper methods to make the developer’s life a little easier.

For example, when version 2 of an application adds some new nodes, reading the old nodes should cause no issues. When version 1 reads a file written by version 2, the new nodes are ignored. However, if version 1 then saves an updated file, the version 2 nodes are lost. The solution is to save unrecognized nodes as strings and render them back as such. This can be done in your code, but it would be nice if you could simply ask RestDirector to do that for you.
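
One way to do this in your own code is sketched below using only System.Xml, since RestDirector offers no such helper yet. The class, the “Settings” root, and the single known “Name” node are all hypothetical. Unrecognized child elements are kept as raw XML strings and written back untouched when the file is saved.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Xml;

public sealed class SettingsSketch
{
    public string Name;                                         // the only node this "version 1" understands
    private readonly List<string> unknown = new List<string>(); // raw XML of everything else

    public void Parse(string xml)
    {
        var document = new XmlDocument();
        document.LoadXml(xml);
        foreach (XmlNode child in document.DocumentElement.ChildNodes)
        {
            if (child.NodeType != XmlNodeType.Element) continue;
            if (child.LocalName == "Name") { this.Name = child.InnerText; continue; }
            this.unknown.Add(child.OuterXml); // keep what we do not recognize
        }
    }

    public string Render()
    {
        var text = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(text, settings))
        {
            writer.WriteStartElement("Settings");
            writer.WriteElementString("Name", this.Name);
            foreach (string raw in this.unknown)
                writer.WriteRaw(raw); // written back verbatim, after the known nodes
            writer.WriteEndElement();
        }
        return text.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        var settings = new SettingsSketch();
        settings.Parse("<Settings><Name>old</Name><Theme Color=\"blue\"/></Settings>");
        settings.Name = "new";
        Console.WriteLine(settings.Render());
    }
}
```

This simple version does not preserve unknown attributes on the root or the original position of unknown nodes relative to known ones; a fuller helper would need to track both.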

To keep things simple, the first iteration of RestDirector’s XML serialization uses “System.Xml.XmlDocument” during parsing but not rendering. This adds an unnecessary inefficiency that should be removed. For this part it was correctness first and performance next.

RestDirector gives you a substantial performance boost and flexibility, but for simple rendering it would still be nice to have at least some auto-generated code. This could also open the door to at least semiautomatic schema generation. Options are being reviewed for this.

Conclusion

We have shown that XML serialization using RestDirector can be simple or complex as required and can produce a more readable result with much higher performance and no startup time for serialization assembly generation. The result can be improvement in application performance and user satisfaction.

RestDirector’s serialization is not an “all things to all people” solution, but will handle the bulk of applications where simplicity and performance are vital.

Full information on RestDirector™ and sample applications are available at http://RestDirector.com.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)