In document-centric environments where data is persisted as XML documents, business logic often has to be applied at various stages while these documents are processed. One way to do this is to use the System.Xml.XmlDocument class to 'front' the XML document and provide a means to modify the data. However, XmlDocument essentially models the XML structure of the document, not the structure of the data it carries. Another way is to provide a data-access layer with classes that model the actual data structure and content, and that provide serialization support to read and write the data to XML streams.
The advantages of the second approach are significant to the code developer and maintainer:
- The classes can be re-used in the context of other data composition classes.
- The developer works directly with classes and data types that are mostly domain-based, which are semantically easier to understand and more concise to code against; the resulting code is usually much easier to read and maintain.
- The code is usually easier to update as the data definitions change.
- Visual Studio automatically provides IntelliSense prompting for these data types, so coding is faster and less error-prone.
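The difference between the two approaches can be sketched as follows. This is only an illustration: the Order class and the sample XML are hypothetical and are not part of any schema discussed in this article.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical data class standing in for a schema-derived type.
[XmlRoot("Order")]
public class Order
{
    [XmlElement("Customer")]
    public string Customer;

    [XmlElement("Quantity")]
    public int Quantity;
}

public static class AccessComparison
{
    public const string SampleXml =
        "<Order><Customer>Contoso</Customer><Quantity>3</Quantity></Order>";

    // Approach 1: navigate the XML structure. Element names are strings
    // and the conversion to int is the caller's responsibility.
    public static int QuantityViaXmlDocument(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode("/Order/Quantity");
        return XmlConvert.ToInt32(node.InnerText);
    }

    // Approach 2: deserialize into a data class and read a typed member.
    public static int QuantityViaDataClass(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Order));
        using (StringReader reader = new StringReader(xml))
        {
            Order order = (Order)serializer.Deserialize(reader);
            return order.Quantity;
        }
    }
}
```

Both methods read the same value, but only the second gives the compiler (and IntelliSense) something to check.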
The mapping of W3C XML schemas (XSDs) to code classes and class attributes, although complex, is consistent, so automated code generation is entirely possible and is provided as a standard set of functionality in Visual Studio. Bundled with Visual Studio is a utility called XSD.EXE which, amongst other functionality, can generate language code from XSDs. The utility is a console application that can generate both C# and VB.NET code.
This functionality is provided for users to generate their own code classes, but the functionality behind this utility is also used in Visual Studio to generate serialization map classes for ASP.NET web-services.
If the classes generated by XSD.EXE are viewed as data-access classes, a look at the code shows that they are awkward to use and that encapsulation is not really provided, as all attributes are defined as public fields. In fact, the code, although structurally simple, is quite ugly, and does not conform to many of the practices applied in the production of robust application code.
The purpose of this article is to demonstrate how the XSD tool output code can be modified to provide better encapsulation, robustness, and ease-of-use.
There are a few approaches to 'upgrading' the output of the XSD tool:
- By using the System.Reflection namespace functionality to obtain the definitions of the classes, and applying a code template generator to generate a set of new classes that either 'wrap' the generated classes, or replace them completely. The advantage of this is that the template can provide specific and complex application code additions (that can be provided by users at runtime!) to generate the new code classes. The disadvantage is that a template must be produced for every target code language. There are a few examples of these on the Internet, some doing the job programmatically and using C# and VB language templates, others using XSL style-sheets to perform a similar job.
- The other, more versatile method is to create a CodeDOM of the XSD using the same functionality that the XSD tool uses, and then to modify the CodeDOM to provide the desired output. The advantage of this route is that code can, in theory, be generated for any supported .NET language, directly from the CodeDOM. The main disadvantage is that providing language code constructs at the CodeDOM level is very tedious, complex, and error-prone (CodeDOM programming reminds me of assembly-programming - you don't get a lot of bang for your buck!). Currently, there is a distinct lack of real ICodeParser implementations in .NET (including those found in the Rotor and Mono projects) that can generate more than just interface code. This means that currently you can only add code realistically via the CodeDOM route. (ICodeParser implementers will read source-code and produce CodeDOM program graphs. This provides a means of writing code in one .NET language, easily converting it into a CodeDOM, which can then be used to generate code in any of the other .NET languages.)
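To illustrate the language independence of the CodeDOM route, here is a minimal sketch that builds one small CodeDOM graph and emits it as both C# and VB. The Demo namespace and Person class are invented for the example. On current .NET versions, the System.CodeDom types ship in a separate NuGet package of the same name; on the .NET Framework they are built in.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;
using Microsoft.VisualBasic;

public static class CodeDomDemo
{
    // Builds a graph equivalent to: namespace Demo { class Person { string _name; } }
    public static CodeNamespace BuildGraph()
    {
        CodeNamespace ns = new CodeNamespace("Demo");

        CodeTypeDeclaration person = new CodeTypeDeclaration("Person");
        person.IsClass = true;

        CodeMemberField field = new CodeMemberField(typeof(string), "_name");
        field.Attributes = MemberAttributes.Private;
        person.Members.Add(field);

        ns.Types.Add(person);
        return ns;
    }

    // Emits source text for the graph using whichever provider is passed in.
    public static string Generate(CodeDomProvider provider, CodeNamespace ns)
    {
        using (StringWriter writer = new StringWriter())
        {
            provider.GenerateCodeFromNamespace(ns, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }

    public static string ToCSharp(CodeNamespace ns)
    {
        using (CodeDomProvider provider = new CSharpCodeProvider())
        {
            return Generate(provider, ns);
        }
    }

    public static string ToVisualBasic(CodeNamespace ns)
    {
        using (CodeDomProvider provider = new VBCodeProvider())
        {
            return Generate(provider, ns);
        }
    }
}
```

Even this trivial class takes a dozen lines of graph construction, which gives a feel for why adding real logic at the CodeDOM level is tedious.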
In this article, I describe how the CodeXS tool essentially follows the second approach. The essence of the CodeXS code generation functionality is described in this MSDN article:
A tool which provides some of the enhanced functionality available in CodeXS can also be downloaded here:
CodeXS adds the following features to the code generated by XSD.EXE:
CodeXS Is Built Using an Extensible Architecture
The basis for generating code is to produce a CodeDOM from a schema (XSD), and this can be done as follows:
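// Requires: using System.CodeDom; using System.IO;
//           using System.Xml.Schema; using System.Xml.Serialization;
public static CodeNamespace Process(string xsdFile, string targetNamespace)
{
    // Load and compile the schema.
    XmlSchema xsd;
    using (FileStream fs = new FileStream(xsdFile, FileMode.Open))
    {
        xsd = XmlSchema.Read(fs, null);
        xsd.Compile(null);
    }

    // The importer maps schema types to .NET types.
    XmlSchemas schemas = new XmlSchemas();
    schemas.Add(xsd);
    XmlSchemaImporter importer = new XmlSchemaImporter(schemas);

    // The exporter writes the mapped types into the CodeDOM namespace.
    CodeNamespace ns = new CodeNamespace(targetNamespace);
    XmlCodeExporter exporter = new XmlCodeExporter(ns);

    // Export a class for every global element in the schema.
    foreach (XmlSchemaElement element in xsd.Elements.Values)
    {
        XmlTypeMapping mapping =
            importer.ImportTypeMapping(element.QualifiedName);
        exporter.ExportTypeMapping(mapping);
    }

    return ns;
}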
The CodeXS tool provides the basis for reading XML schemas and generating the CodeDOM for the schema classes. The tool then provides a facility to hook up third-party code modifier assemblies that implement a tool-defined ICodeModifier interface. The CodeDOM is passed to these code modifiers in a structured fashion, and it is these implementers that actually provide the code extensions to the CodeDOM generated by the XSD tool. Once the code modifiers have completed, the tool splits up the CodeDOM into separate CodeDOMs for each schema file (in the case of included schemas), and then generates the code output for each file.
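The plug-in idea can be sketched as below. Note that the actual ICodeModifier signature is not shown in this article, so the single Execute method, the PrivateFieldModifier class, and the ModifierPipeline helper are all assumptions made purely for illustration.

```csharp
using System.CodeDom;
using System.Collections.Generic;

// Assumed shape of the plug-in contract; the real interface may differ.
public interface ICodeModifier
{
    void Execute(CodeNamespace codeNamespace);
}

// Example modifier: turns every public field of every class into a private one.
public class PrivateFieldModifier : ICodeModifier
{
    public void Execute(CodeNamespace codeNamespace)
    {
        foreach (CodeTypeDeclaration type in codeNamespace.Types)
        {
            // Enum members are also CodeMemberField objects; leave them alone.
            if (type.IsEnum)
            {
                continue;
            }

            foreach (CodeTypeMember member in type.Members)
            {
                CodeMemberField field = member as CodeMemberField;
                if (field == null)
                {
                    continue;
                }

                bool isPublic =
                    (field.Attributes & MemberAttributes.AccessMask) == MemberAttributes.Public;
                if (isPublic)
                {
                    // Replace only the access bits, keeping any other flags.
                    field.Attributes =
                        (field.Attributes & ~MemberAttributes.AccessMask) | MemberAttributes.Private;
                }
            }
        }
    }
}

public static class ModifierPipeline
{
    // Hands the same CodeDOM graph to each modifier in turn.
    public static void Run(CodeNamespace codeNamespace, IEnumerable<ICodeModifier> modifiers)
    {
        foreach (ICodeModifier modifier in modifiers)
        {
            modifier.Execute(codeNamespace);
        }
    }
}
```

Because each modifier works on the shared graph, the order in which modifiers run matters: a later modifier sees the changes made by earlier ones.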
The CodeXS generator exists as a class library. It was decided to invest effort in providing the functionality as an online tool. The CodeXS generator uses an assembly containing a standard set of code modifiers (appropriately called 'StandardCodeModifier.dll') and is fronted by an ASP.NET web-service named CodeXS. The CodeXS online tool is an ASP.NET client that is serviced by the CodeXS web-service. The online tool can be found here:
Corrected Use of Elements With Reserved-Keyword Type Names
Some schema definitions define elements or types whose names are reserved keywords or type names. Global Justice XML (GJXML) defines a schema complex type to wrap textual data called 'string'.
It is possible to use 'string' as a class name in your code as follows, for C#:
public class @string
{
    // ...
}
and for VB:
Public Class [string]
    ' ...
End Class
The XSD tool code generation does provide this class definition, but produces code that does not compile correctly in some cases. CodeXS corrects this error and also extends the functionality to provide for collections of objects of this class. CodeXS supports the redefinition of most reserved types defined by .NET, in this way.
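The CodeDOM providers themselves know how to escape such names. The following sketch is not taken from CodeXS; it only shows the standard CodeDomProvider.CreateEscapedIdentifier call, which adds the escape syntax for names that are keywords in the target language and leaves other names untouched.

```csharp
using System.CodeDom.Compiler;
using Microsoft.CSharp;
using Microsoft.VisualBasic;

public static class KeywordEscaping
{
    // Returns the name in a form that is legal as a C# identifier.
    public static string EscapeForCSharp(string name)
    {
        using (CodeDomProvider provider = new CSharpCodeProvider())
        {
            return provider.CreateEscapedIdentifier(name);
        }
    }

    // Returns the name in a form that is legal as a VB identifier.
    public static string EscapeForVisualBasic(string name)
    {
        using (CodeDomProvider provider = new VBCodeProvider())
        {
            return provider.CreateEscapedIdentifier(name);
        }
    }
}
```

For the GJXML type named 'string', these helpers produce the @string and [string] forms shown above.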
Support for Multi-Included Schemas Using Relative schemaLocation Directory Specifiers
CodeXS provides for the implicit inclusion of all included schemas referenced through the schemaLocation attribute in the target schema root element, and recursively, also those included in the included schemas. The only constraint is that the target schema path is regarded as the root location (either URL or directory path) for all the included schemas, and that their schemaLocations are specified as relative paths. This is true in the majority of published schemas.
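Resolving a relative schemaLocation against the location of the including schema can be done with the two-argument System.Uri constructor. The sketch below uses made-up example.org URLs; the SchemaLocationResolver name is hypothetical.

```csharp
using System;

public static class SchemaLocationResolver
{
    // Combines the location of the including schema with the relative
    // schemaLocation value found on an include or import element.
    public static Uri Resolve(Uri includingSchema, string schemaLocation)
    {
        return new Uri(includingSchema, schemaLocation);
    }
}
```

For example, resolving "common/types.xsd" against http://example.org/schemas/main.xsd gives http://example.org/schemas/common/types.xsd, and "../shared/base.xsd" gives http://example.org/shared/base.xsd. Passing the resolved Uri down as the new base is what makes recursive includes work.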
Support for Single or Multi-Namespace, Multi-File Schemas
CodeXS will correctly manage the recursive inclusion of all schemas which have a target namespace different from that of the target schema. CodeXS will also generate a single, correct XmlSchema object for multiple schema files sharing the same target namespace, by iteratively building up the schema from the source schema files. This is managed in the following method:
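private XmlSchemas IncludeSchemas(XmlSchema Parent, Uri SourceUri,
    XmlSchemas Schemas, Hashtable AddedSchemas)
{
    foreach (XmlSchemaExternal externalSchema in Parent.Includes)
    {
        try
        {
            // Resolve the (relative) schemaLocation against the parent's location.
            Uri schemaUri = new Uri(SourceUri, externalSchema.SchemaLocation);
            string uriPath = this.GetUriPath(schemaUri);
            XmlSchema schema = this.ReadSchema(uriPath);

            // Process each schema file only once.
            if (AddedSchemas[uriPath] == null)
            {
                if (Schemas[schema.TargetNamespace] != null)
                {
                    // A schema for this namespace already exists:
                    // merge the items of this file into it.
                    XmlSchema compSchema = Schemas[schema.TargetNamespace];
                    foreach (XmlSchemaObject schemaObj in schema.Items)
                    {
                        try
                        {
                            compSchema.Items.Add(schemaObj);
                        }
                        catch { } // items that cannot be added are skipped
                    }
                }
                else
                {
                    // First file seen for this namespace.
                    Schemas.Add(schema);
                }
                AddedSchemas[uriPath] = schema;

                // Recurse into the schemas included by this one.
                this.IncludeSchemas(schema, schemaUri, Schemas, AddedSchemas);
            }
        }
        catch { } // schemas that cannot be resolved or read are skipped
    }
    return Schemas;
}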
Multi-File Code Generation Where Each Schema File Has a Corresponding Code File
Initially, CodeXS supported the XSD tool default and generated one code file for the entire schema set.
The generated file size is over 3 MB in both the C# and VB cases. Editing these files in Visual Studio is difficult enough in the C# editor; in the VB editor it was just about impossible, as the editor became so slow that interaction was non-existent. It was therefore decided to divide the file according to the location of the element and type definitions in the schema files, and to produce code files that correspond to the schema files. In most cases, this solves the file size problem. An added benefit is that the definitions are nicely partitioned, especially if the schema designer partitioned the data definitions carefully. This is evident in the Amber Alert schema, in which the whole US NCIC database definition (and other US national data definitions) is separated from the rest of the GJXML definitions, making the structure of complex schemas easier to understand.
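The splitting step might look something like the sketch below. This is not the CodeXS implementation: it assumes that a lookup from generated type name to originating schema file (typeToFile) has already been built while reading the schemas, and the NamespaceSplitter name is invented.

```csharp
using System.CodeDom;
using System.Collections.Generic;

public static class NamespaceSplitter
{
    // Distributes the types of one CodeNamespace over one CodeNamespace per
    // schema file. Types missing from the lookup go to fallbackFile.
    public static Dictionary<string, CodeNamespace> SplitBySchemaFile(
        CodeNamespace source,
        IDictionary<string, string> typeToFile,
        string fallbackFile)
    {
        Dictionary<string, CodeNamespace> result =
            new Dictionary<string, CodeNamespace>();

        foreach (CodeTypeDeclaration type in source.Types)
        {
            string file;
            if (!typeToFile.TryGetValue(type.Name, out file))
            {
                file = fallbackFile;
            }

            CodeNamespace target;
            if (!result.TryGetValue(file, out target))
            {
                // Each output file repeats the namespace name and its imports.
                target = new CodeNamespace(source.Name);
                foreach (CodeNamespaceImport import in source.Imports)
                {
                    target.Imports.Add(import);
                }
                result[file] = target;
            }

            target.Types.Add(type);
        }

        return result;
    }
}
```

Each resulting CodeNamespace can then be handed to a code provider and written to its own file, named after the schema file it came from.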
Fields Are Made Private and Set/get Properties Defined for Each Field
This is probably the first thing an enhanced XSD tool should provide. Obviously, data protection for the actual data fields is important, and this is achieved by declaring the fields private and providing get/set properties. This also means that business rules can be coded in one place (the property that wraps the field) and applied whenever the field value is modified.
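As a small illustration of why this matters, the property setter below carries a validation rule that a public field could not enforce. The PersonType class and its rule are hypothetical and are not output of the tool.

```csharp
using System;

public class PersonType
{
    // The field is private, so all writes go through the property.
    private int _age;

    public int Age
    {
        get
        {
            return this._age;
        }
        set
        {
            // Business rule coded in one place.
            if (value < 0)
            {
                throw new ArgumentOutOfRangeException("value", "Age cannot be negative.");
            }
            this._age = value;
        }
    }
}
```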
Un-Bounded Schema Element Sets Are Held in Typed Collections
This is the second thing that a decent XSD extension tool should provide: a means to manage multiple element objects of the same type in a real typed collection, as opposed to the un-initialized typed array generated by the XSD tool. CodeXS generates typed collections sub-classed from System.Collections.CollectionBase which support the standard operations of add, insert, and remove, as well as complete enumeration using constructs like foreach(..), and full array indexing functionality.
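A typed collection of this kind might look like the following hand-written sketch. It is an approximation of the pattern, not the actual generated code, and the IncidentType stand-in class is simplified.

```csharp
using System;
using System.Collections;

// Simplified stand-in for a schema-derived element class.
public class IncidentType
{
    public string Description;
}

public class IncidentTypeCollection : CollectionBase
{
    // Array-style indexing with the element type instead of object.
    public IncidentType this[int index]
    {
        get { return (IncidentType)this.List[index]; }
        set { this.List[index] = value; }
    }

    public int Add(IncidentType item)
    {
        return this.List.Add(item);
    }

    public void Insert(int index, IncidentType item)
    {
        this.List.Insert(index, item);
    }

    public void Remove(IncidentType item)
    {
        this.List.Remove(item);
    }

    // Guards against wrong types sneaking in through the untyped IList view.
    protected override void OnValidate(object value)
    {
        base.OnValidate(value);
        if (!(value is IncidentType))
        {
            throw new ArgumentException("Only IncidentType items are allowed.", "value");
        }
    }
}
```

Since CollectionBase implements IEnumerable, a loop such as foreach (IncidentType incident in incidents) works without further code.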
Default Construction of Elements if They Are Referenced but Not Yet Created
CodeXS generates automatic (default) construction of elements and attributes when they are referenced via the parent object's 'get' property and the object does not yet exist. This is done wherever possible; where the child element is a schema choice element, the application code has to construct it explicitly. A snippet from a generated code file illustrates this:
public IncidentType Incident
{
    get
    {
        if ((this._incident == null))
        {
            this._incident = new IncidentType();
        }
        return this._incident;
    }
    set
    {
        this._incident = value;
    }
}
This makes the schema code class easier to use as the application code mostly does not have to construct objects as they are required.
Correct Handling of Defaulted Schema Attribute Values
The XSD tool adds the DefaultValueAttribute to a schema attribute which has a defined default value. In the generated code, the assigned value is always correct, but when the attribute is also required, the standard serialization omits the attribute from the XML whenever its value equals the default, treating it as an optional attribute. This often means that serialization fails. CodeXS addresses this by removing the DefaultValueAttribute from the field (or property) definition for the attribute.
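The effect of DefaultValueAttribute on XmlSerializer output can be reproduced with two small classes. They are hypothetical and differ only in whether the attribute is present.

```csharp
using System.ComponentModel;
using System.IO;
using System.Xml.Serialization;

// The 'status' member carries a DefaultValueAttribute, as XSD.EXE would emit.
public class WithDefault
{
    [XmlAttribute("status")]
    [DefaultValue("open")]
    public string Status = "open";
}

// The same member with the DefaultValueAttribute removed.
public class WithoutDefault
{
    [XmlAttribute("status")]
    public string Status = "open";
}

public static class DefaultValueDemo
{
    public static string Serialize(object value)
    {
        XmlSerializer serializer = new XmlSerializer(value.GetType());
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }
}
```

Serializing a WithDefault instance leaves the status attribute out because its value equals the declared default, while a WithoutDefault instance always writes it. If the schema marks the attribute as required, the first document will not validate.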
Correct Handling of Both Qualified and Unqualified Schema Element and Attribute Forms
The XSD tool does not handle the qualified and unqualified forms correctly in all cases, for either schema elements or schema attributes. This can lead to serialization errors, particularly for very complex schemas where both elements and attributes are often qualified. CodeXS corrects this problem and appears to work correctly in most cases.
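In generated code, the element or attribute form is expressed through the Form property of the serialization attributes. The hypothetical Report class below mixes a qualified element, an unqualified element, and an unqualified attribute; it is an illustration of the mechanism rather than CodeXS output.

```csharp
using System.IO;
using System.Xml.Schema;
using System.Xml.Serialization;

[XmlType(Namespace = "urn:example:report")]
[XmlRoot("Report", Namespace = "urn:example:report")]
public class Report
{
    // Qualified: the element belongs to the target namespace.
    [XmlElement("Title", Form = XmlSchemaForm.Qualified)]
    public string Title;

    // Unqualified: a local element that is in no namespace.
    [XmlElement("Note", Form = XmlSchemaForm.Unqualified)]
    public string Note;

    // Attributes are unqualified unless the schema says otherwise.
    [XmlAttribute("id", Form = XmlSchemaForm.Unqualified)]
    public string Id;
}

public static class FormDemo
{
    public static string Serialize(Report report)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Report));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, report);
            return writer.ToString();
        }
    }

    public static Report Deserialize(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Report));
        using (StringReader reader = new StringReader(xml))
        {
            return (Report)serializer.Deserialize(reader);
        }
    }
}
```

If the Form value in the code disagrees with the schema, the serializer looks for (or writes) the member in the wrong namespace, and the data is silently dropped or the document fails validation.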
Correct Generation of the schemaLocation or noNamespaceSchemaLocation Attribute in the Root Element
CodeXS attempts to generate these root element attributes correctly in the resultant XML output from the generated code. The code that performs this is in the Serializer class (Serializer.cs or .vb) and generates the correct URL or local disk path. This means that if you are validating XML files generated by CodeXS code classes in an XML editor such as XMLSpy, you do not have to point it to the correct location of the schema file. W3C regards these attributes as hints only: a document is not required to provide the real location of the schema file. The consequence is that, without a correct hint, you spend a lot of time finding the schemas to validate against, which is a real frustration while developing.
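One common way to get XmlSerializer to write an xsi:schemaLocation attribute is to map a string member to that attribute in the XML Schema instance namespace. The sketch below shows that general technique only; it is not the CodeXS Serializer code, and the class name, namespace URI, and schema file name are invented.

```csharp
using System.IO;
using System.Xml.Schema;
using System.Xml.Serialization;

[XmlRoot("Order", Namespace = "urn:example:order")]
public class OrderDocument
{
    // The value is a list of "namespace location" pairs separated by spaces.
    [XmlAttribute("schemaLocation", Namespace = XmlSchema.InstanceNamespace)]
    public string SchemaLocation = "urn:example:order order.xsd";

    [XmlElement("Item")]
    public string Item;
}

public static class SchemaLocationDemo
{
    public static string Serialize(OrderDocument document)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(OrderDocument));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, document);
            return writer.ToString();
        }
    }
}
```

For a schema without a target namespace, the same approach applies with the attribute name noNamespaceSchemaLocation and a value that holds only the schema location.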
Some Language Fix-Up Constructs to Avoid Compile Errors
Other than the 'inconsiderate' use of .NET reserved type names by some published schemas, other reserved language keywords also cause problems. This seems more prevalent for VB code generation, due in part to the fact that the language definitions are not case-sensitive. There is no real way to overcome this other than scanning and changing the language definition names as required, comparing against a keyword dictionary for the language. This is a very ambitious task and one that CodeXS does not attempt. Instead, these are coded on an ad hoc basis using specific ICodeModifier implementer plug-ins.
Standard Serialization Support to Easily Serialize to and From XML Strings
CodeXS provides a separate code file (Serializer.cs or .vb) that provides for common serialization support, using the System.Xml.Serialization.XmlSerializer class. It provides basic serialization support to and from XML format strings. The intent is that the Serializer class can easily be modified to provide much more functionality, or even non-XML serialization, if desired.
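A string-based helper built on XmlSerializer can be as small as the sketch below. It is written with generics for brevity and is not the generated Serializer class; the XmlStringSerializer and Incident names are assumptions made for the example.

```csharp
using System.IO;
using System.Xml.Serialization;

public static class XmlStringSerializer
{
    // Serializes an object graph to an XML string.
    public static string ToXml<T>(T value)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }

    // Rebuilds an object graph from an XML string.
    public static T FromXml<T>(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        using (StringReader reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}

// Hypothetical data class used to exercise the helper.
public class Incident
{
    public string Description;
    public int Severity;
}
```

A round trip then reads as Incident copy = XmlStringSerializer.FromXml<Incident>(XmlStringSerializer.ToXml(original)). Constructing an XmlSerializer is relatively expensive, so a production version would probably cache one instance per type.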
Extensive VS/NDoc/MSDN Compliant Documentation Comments
CodeXS automatically generates VS/NDoc compliant documentation for every class and method that is generated. The intent is that these comments may be modified manually after code generation. Another possibility, currently being developed, is to add the schema annotations/comments as documentation, although many published schemas do not make use of these. Currently, CodeXS does not provide additional documentation for enumerated types.
Fully Compliant with ASP.NET Web-services
XSD tool generated code classes can be used as parameter types and return value types to ASP.NET web-service web-methods. The code generated by CodeXS provides exactly the same functionality.
- 11th September 2004: Version 0.50 beta: First release
License
This article has no explicit license attached to it, but may contain usage terms in the article text or the download files themselves. If in doubt, please contact the author via the discussion board below.