Parsing and Interpreting XSD Using LINQ

6 May 2012, CPOL
How to use a Linq2Xsd generated object to directly manipulate XmlSchema

Background

For all its verbosity, slow parsing and ambiguous data layout, XML is still one of just a few openly enforceable data transport mechanisms we have in our coder's tool chest. What I'm talking about is XSD, or XML Schema. Schema interpreting has traditionally been a black art. I wrote an article on using the MS Schema Object Model (SOM). After I finished that article, I immediately realized the SOM has many foibles: it's an exceedingly weakly typed and awkward data model. The pattern "get, is type, cast, use" repeats over and over and introduces misinterpretation bugs.

This article is a more modern, far more robust rework of that first attempt, using a strongly typed, more general approach.

Schema-Schema

XmlSchema.xsd defines the format for schemas themselves: the schema schema. LinqToXsd is a powerful generator that turns an XSD into LINQ-friendly code. XmlSchema.xsd is a little special, as its core definitions are handled by including files written in the much older definition language, DTD (XMLSchema.dtd & datatypes.dtd).

C:\projects>linqtoxsd.exe xmlschema.xsd
[Microsoft (R) .NET Framework, Version v4.0.30319]
Generated xmlschema.cs... 

See XmlSchema.cs in source zip

Parsing

Parsing is not quite as trivial as "just loading": you need to assemble the source XSD you are reading for yourself. That means recursively loading the XSDs it imports and includes and building their object maps.

C#
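/// <summary>
///  Load a set of schemas, following their imports and includes through the
///  supplied resolver, and merge everything into a single schema object
/// </summary>
/// <param name="files">Paths of the schema files to load</param>
/// <param name="resolver">Maps (including file, schemaLocation) to a loadable path, or null to skip</param>
/// <returns>One schema holding the merged definitions</returns>
public static schema Load(IEnumerable<string> files, IncludedFileResolver resolver)
{
    schema mtr = new schema();
    foreach (var fil in files)
    {
        var sch = schema.Load(fil);
        Merge(mtr, sch);

        // Combine import & include
        var incs = sch.import.Select(q => q.schemaLocation).ToList();
        incs.AddRange(sch.include.Select(s => s.schemaLocation));

        // Resolve once; locations the resolver cannot find come back null and are skipped
        var resolved = incs.Select(inc => resolver(fil, inc))
            .Where(q => null != q)
            .ToList();

        // Recurse with the same resolver (note: no guard against circular includes)
        if (resolved.Count > 0)
            Merge(mtr, Load(resolved, resolver));
    }

    return mtr;
}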
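
The article does not show IncludedFileResolver itself. As a minimal sketch, assuming the delegate takes the including file and the schemaLocation and returns a path or null (a shape inferred only from the call resolver(fil, inc) above; the definition in the source zip may differ), a resolver that looks next to the parent file could be written as follows. The names Resolvers, RelativeToParent and ResolverDemo are hypothetical.

```csharp
using System;
using System.IO;

// Assumed shape, inferred from the call resolver(fil, inc) in Load.
public delegate string IncludedFileResolver(string parentFile, string schemaLocation);

public static class Resolvers
{
    // Hypothetical resolver: look for the referenced file relative to its parent.
    public static string RelativeToParent(string parentFile, string schemaLocation)
    {
        // An import may carry only a namespace and no location at all.
        if (string.IsNullOrEmpty(schemaLocation))
            return null;

        // Remote locations (http://...) are not fetched in this sketch.
        if (schemaLocation.Contains("://"))
            return null;

        string baseDir = Path.GetDirectoryName(Path.GetFullPath(parentFile));
        string candidate = Path.GetFullPath(Path.Combine(baseDir, schemaLocation));

        // Returning null makes Load skip the reference.
        return File.Exists(candidate) ? candidate : null;
    }
}

public static class ResolverDemo
{
    public static void Main()
    {
        // Build a throwaway folder holding a parent schema and one include.
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        string parent = Path.Combine(dir, "main.xsd");
        File.WriteAllText(parent, "<placeholder/>");
        File.WriteAllText(Path.Combine(dir, "types.xsd"), "<placeholder/>");

        IncludedFileResolver resolver = Resolvers.RelativeToParent;
        Console.WriteLine(resolver(parent, "types.xsd") ?? "(not found)");
        Console.WriteLine(resolver(parent, "missing.xsd") ?? "(not found)");

        Directory.Delete(dir, true);
    }
}
```

Because Load filters out null results, a resolver like this quietly drops references it cannot find. Whether that is acceptable, or whether a missing include should be an error, depends on the schemas you are reading.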

Storing the Objects

As we walk through the schema, it's important that we merge object references as we go.

/// <summary>
///  Combines multiple schemas into one large schema object
/// </summary>
/// <param name="mstr">Master schema that accumulates the definitions</param>
/// <param name="src">Schema whose definitions are merged in</param>
private static void Merge(schema mstr, schema src)
{
    mstr.element = MergeList<element>(mstr.element, src.element);
    mstr.attribute = MergeList<attribute>(mstr.attribute, src.attribute);
    mstr.complexType = MergeList<complexType>(mstr.complexType, src.complexType);
    mstr.simpleType = MergeList<simpleType>(mstr.simpleType, src.simpleType);
    mstr.include = MergeList<include>(mstr.include, src.include);
    mstr.import = MergeList<import>(mstr.import, src.import);
}

/// <summary>
///  Generic list merge. If dest is null, src is returned as-is (a simple
///  reference copy); otherwise the items of src are appended, skipping
///  any that dest already contains.
/// </summary>
/// <typeparam name="T">Item type</typeparam>
/// <param name="dest">List to merge into (may be null)</param>
/// <param name="src">List to merge from (may be null)</param>
/// <returns>The merged list</returns>
private static IList<T> MergeList<T>(IList<T> dest, IList<T> src)
{
    if (null == dest)
        return src;

    if (null != src && src.Count > 0)
        foreach (var s in src.Where(q => !dest.Contains(q)).ToList())
            dest.Add(s);

    return dest;
}
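
To see what MergeList does in isolation, here is a small sketch that runs the same logic over plain string lists instead of generated schema types. The method is copied (and made public) only so the example compiles on its own; MergeDemo is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MergeDemo
{
    // Same logic as MergeList above, copied so this sketch is self-contained.
    public static IList<T> MergeList<T>(IList<T> dest, IList<T> src)
    {
        if (dest == null)
            return src; // reference copy: the caller now shares src

        if (src != null)
            foreach (var item in src.Where(q => !dest.Contains(q)).ToList())
                dest.Add(item);

        return dest;
    }

    public static void Main()
    {
        IList<string> master = null;

        // First merge: master is null, so the source list itself is returned.
        master = MergeList(master, new List<string> { "Cub", "Group" });

        // Second merge: "Group" is already present and is skipped.
        master = MergeList(master, new List<string> { "Group", "Race" });

        Console.WriteLine(string.Join(", ", master)); // prints "Cub, Group, Race"
    }
}
```

One consequence of the null branch is that the master ends up sharing the first source's list, so later appends also change that source. That is harmless when each loaded schema is discarded after merging, as in Load.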

And through the magic of LinqToXsd, that's it for parsing.

Interpretation: But What Does It Mean?

What XML Schema means is non-trivial. Rather than taking a stab at describing it, I'll reference better sources:

Doing Something Useful

We've loaded all the objects, mapped them and found useful purposes for them. Now let's re-interpret them into something different, yet still meaningful. In this case, I created a "new" language I call "SKA" (comes from SKemA). It's just a more natural C-like re-interpretation of XSD for demonstration.

Some of the basic rules in English:

  • A symbol on its own line is the root element.
  • "Type [name] {" is a complex type definition. The same goes for Choice, Enum, etc.
  • Simple types are expanded out into their constituent base element.
  • Elements are typed by default to a complex type with the same name + "Info".
Ska output for a sample schema:
//
// Ska(c) 2011, Bruce Meacham - An intuitive Xml Schema language
// DO NOT EDIT - This is file was generated on 5/4/2012 6:51:18 AM by ska.exe
//


Type CubInfo {
	!Id 
	!First 
	!Last 
	@Place 
	}

Type GroupInfo {
	Cub [0-n]
	@Name 
	}

Type RacersInfo {
	Group [0-n]
	}

Type ResultInfo {
	!CubId 
	!Time 
	}

Type RaceInfo {
	Result [0-n]
	}

Type RacesInfo {
	Race [0-n]
	}

Type DerbyInfo {
	Racers 
	Races 
	}

Derby

The language gains readability and succinctness at the expense of self-describing XML formatting and the robustness of the formal XmlSchema standard. Try loading some more complex examples. The IRS 1040 MeF Schema was my hard-core test schema, and it is quite interesting to see in Ska.

If you run the example program provided, it will produce this output.
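
Two of the rules listed earlier (the "Type [name] {" blocks and the root element on its own line) can be approximated with plain LINQ to XML. The sketch below is not the author's ska.exe and does not use the LinqToXsd-generated classes. SkaSketch, Render and Cardinality are hypothetical names, the [min-max] rendering is an assumption modelled on the "[0-n]" seen in the sample output, and attributes are skipped entirely because the article does not define the "!" and "@" markers.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class SkaSketch
{
    static readonly XNamespace Xs = "http://www.w3.org/2001/XMLSchema";

    // Render named complex types and top-level elements in a Ska-like form.
    public static string Render(XDocument xsd)
    {
        var sb = new StringBuilder();
        XElement root = xsd.Root;

        foreach (var type in root.Elements(Xs + "complexType"))
        {
            sb.AppendLine("Type " + (string)type.Attribute("name") + " {");

            // Simplification: every nested element is listed flat under its type.
            foreach (var el in type.Descendants(Xs + "element"))
            {
                string name = (string)el.Attribute("name") ?? (string)el.Attribute("ref");
                sb.AppendLine("\t" + name + Cardinality(el));
            }

            sb.AppendLine("\t}");
            sb.AppendLine();
        }

        // A symbol on its own line is a root element.
        foreach (var el in root.Elements(Xs + "element"))
            sb.AppendLine((string)el.Attribute("name"));

        return sb.ToString();
    }

    // XSD defaults minOccurs and maxOccurs to 1; print a range only when they differ.
    static string Cardinality(XElement el)
    {
        string min = (string)el.Attribute("minOccurs") ?? "1";
        string max = (string)el.Attribute("maxOccurs") ?? "1";
        if (min == "1" && max == "1")
            return "";
        return " [" + min + "-" + (max == "unbounded" ? "n" : max) + "]";
    }

    public static void Main()
    {
        var xsd = XDocument.Parse(@"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='Races' type='RacesInfo'/>
  <xs:complexType name='RacesInfo'>
    <xs:sequence>
      <xs:element name='Race' minOccurs='0' maxOccurs='unbounded'/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>");

        Console.Write(Render(xsd));
    }
}
```

For the inline schema this prints a "Type RacesInfo {" block containing "Race [0-n]", followed by the root symbol "Races". A strongly typed walk over the generated schema object would replace the string-based attribute lookups used here.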

Summary

Like my last article on schemas, this is really just a starting point for many XSD-related capabilities: code generators, translators or validators.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)