Background
For all its verbosity, slow parsing and ambiguous data layout, XML is still one of just a few openly enforceable data transport mechanisms we have in our coder's tool chest. What I'm talking about is XSD, or XML Schema. Schema interpreting has traditionally been a black art. I wrote an article on using the MS Schema Object Model (SOM). After I finished that article, I immediately realized the SOM has many foibles: it's an exceedingly weakly typed and awkward data model. The pattern "get, is type, cast, use" repeats over and over and introduces misinterpretation bugs.
This article is a more modern, far more robust rework of that first attempt, using a strongly typed, more general approach.
Schema-Schema
XmlSchema.xsd defines the format for schema itself: the schema schema. LinqToXsd is a powerful generator of LINQ-friendly code from XSD. XmlSchema.xsd is a little special, as its core definitions are handled by including files written in the much older definition language DTD (XMLSchema.dtd and datatypes.dtd).
C:\projects>linqtoxsd.exe xmlschema.xsd
[Microsoft (R) .NET Framework, Version v4.0.30319]
Generated xmlschema.cs...
See XmlSchema.cs in source zip
Parsing
Parsing is not quite as trivial as "just loading": you need to assemble the source XSD you are reading for yourself. That means recursively loading the XSDs it imports or includes and building their object maps.
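public static schema Load(IEnumerable<string> files, IncludedFileResolver resolver)
{
    // Master schema that accumulates every definition found.
    schema mtr = new schema();
    foreach (var fil in files)
    {
        var sch = schema.Load(fil);
        Merge(mtr, sch);
        // Collect the locations named by both import and include entries.
        var incs = sch.import.Select(q => q.schemaLocation).ToList();
        incs.AddRange(sch.include.Select(s => s.schemaLocation));
        // The resolver maps each location to a loadable path, or null to skip it.
        var resolved = incs.Select(inc => resolver(fil, inc))
                           .Where(q => null != q)
                           .ToList();
        // Recurse into the referenced schemas, passing the resolver along.
        if (resolved.Count > 0)
            Merge(mtr, Load(resolved, resolver));
    }
    return mtr;
}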
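The listing above takes an IncludedFileResolver but does not show one. Here is a minimal sketch of what such a resolver could look like. The delegate shape is my assumption, inferred from the call resolver(fil, inc); SchemaPaths, ResolveRelative and OncePerFile are hypothetical names that are not part of the article's source.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Assumed delegate shape, inferred from the call resolver(fil, inc) in Load;
// the declaration in the article's source may differ.
public delegate string IncludedFileResolver(string includingFile, string schemaLocation);

public static class SchemaPaths
{
    // Hypothetical resolver: treats schemaLocation as relative to the folder
    // of the including file. Returns null when there is no location or the
    // file does not exist, so that Load's null filter drops it.
    public static string ResolveRelative(string includingFile, string schemaLocation)
    {
        if (string.IsNullOrEmpty(schemaLocation))
            return null;
        string baseDir = Path.GetDirectoryName(Path.GetFullPath(includingFile)) ?? ".";
        string candidate = Path.GetFullPath(Path.Combine(baseDir, schemaLocation));
        return File.Exists(candidate) ? candidate : null;
    }

    // Hypothetical wrapper: hands out each resolved path only once, so two
    // schemas that include each other cannot make Load recurse forever.
    public static IncludedFileResolver OncePerFile(IncludedFileResolver inner)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        return (file, location) =>
        {
            string path = inner(file, location);
            return (path != null && seen.Add(path)) ? path : null;
        };
    }
}
```

Under these assumptions, passing SchemaPaths.OncePerFile(SchemaPaths.ResolveRelative) as the resolver argument would load each referenced file at most once. The case-insensitive comparison suits Windows paths; on a case-sensitive file system you would likely want StringComparer.Ordinal instead.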
Storing the Objects
As we walk through the schema, it's important that we merge object references as we go.
private static void Merge(schema mstr, schema src)
{
    // Fold each top-level collection of src into the master schema.
    mstr.element = MergeList<element>(mstr.element, src.element);
    mstr.attribute = MergeList<attribute>(mstr.attribute, src.attribute);
    mstr.complexType = MergeList<complexType>(mstr.complexType, src.complexType);
    mstr.simpleType = MergeList<simpleType>(mstr.simpleType, src.simpleType);
    mstr.include = MergeList<include>(mstr.include, src.include);
    mstr.import = MergeList<import>(mstr.import, src.import);
}
private static IList<T> MergeList<T>(IList<T> dest, IList<T> src)
{
    // Nothing merged yet: adopt the source list as-is.
    if (null == dest)
        return src;
    // Append only the items that dest does not already contain.
    if (null != src && src.Count > 0)
        src.Where(q => !dest.Contains(q)).ToList().ForEach(s => dest.Add(s));
    return dest;
}
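MergeList relies on Contains, which in turn relies on how the generated types implement equality. I have not checked that, so treat the following as an assumption: if they compare by reference, the same definition loaded from two files would be added twice. A sketch of a variant that de-duplicates on a caller-supplied key instead (MergeSketch and MergeByKey are hypothetical names, not part of the article's source):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MergeSketch
{
    // Hypothetical alternative to MergeList: an item from src is appended
    // only if no item with the same key is already present in dest.
    public static IList<T> MergeByKey<T, TKey>(
        IList<T> dest, IList<T> src, Func<T, TKey> key)
    {
        if (dest == null) return src;
        if (src == null) return dest;

        // Keys already present in the destination list.
        var seen = new HashSet<TKey>(dest.Select(key));
        foreach (var item in src)
        {
            // HashSet.Add returns false when the key was seen before.
            if (seen.Add(key(item)))
                dest.Add(item);
        }
        return dest;
    }
}
```

For schema objects, the key would presumably be the definition's name, for example MergeByKey(mstr.complexType, src.complexType, t => t.name), assuming the generated type exposes a name property.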
And through the magic of LinqToXsd, that's it for parsing.
Interpretation; but what does it mean?!?
What XML Schema means is non-trivial. Rather than taking a stab at describing it, I'll reference better sources:
Doing Something Useful
We've loaded all the objects, mapped them and found useful purposes for them; now let's re-interpret them into something different, yet still meaningful. In this case, I created a "new" language I call "SKA" (from SKemA). It's just a more natural, C-like re-interpretation of XSD for demonstration.
Some of the basic rules in English:
- A symbol on its own line is the root element.
- "Type [name] {" is a complex type definition. The same goes for Choice, Enum, etc...
- Simple types are expanded out into their constituent base element.
- Elements are default typed to a complex type with the same name + "Info"
Type CubInfo {
!Id
!First
!Last
@Place
}
Type GroupInfo {
Cub [0-n]
@Name
}
Type RacersInfo {
Group [0-n]
}
Type ResultInfo {
!CubId
!Time
}
Type RaceInfo {
Result [0-n]
}
Type RacesInfo {
Race [0-n]
}
Type DerbyInfo {
Racers
Races
}
Derby
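To make the rules concrete, here is a rough sketch of an emitter for a small subset of Ska. It is not the article's implementation: it reads the XSD with plain LINQ to XML rather than the LinqToXsd-generated classes, and SkaSketch, Emit, NameOf and Occurs are hypothetical names. The "@" prefix for attributes and the "[0-n]" occurrence suffix are my reading of the sample above; the "!" prefix is left out because its meaning is not spelled out here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SkaSketch
{
    static readonly XNamespace Xs = "http://www.w3.org/2001/XMLSchema";

    // Emits one Ska-like line per construct found in the XSD text.
    public static List<string> Emit(string xsdText)
    {
        XElement root = XDocument.Parse(xsdText).Root;
        var lines = new List<string>();

        // Each named top-level complex type becomes a "Type Name { ... }" block.
        foreach (var type in root.Elements(Xs + "complexType"))
        {
            lines.Add("Type " + (string)type.Attribute("name") + " {");
            // Simplification: nested anonymous types are flattened into the block.
            foreach (var child in type.Descendants(Xs + "element"))
                lines.Add(NameOf(child) + Occurs(child));
            foreach (var attr in type.Descendants(Xs + "attribute"))
                lines.Add("@" + NameOf(attr));
            lines.Add("}");
        }

        // A top-level element is written as a bare symbol on its own line.
        foreach (var element in root.Elements(Xs + "element"))
            lines.Add(NameOf(element));
        return lines;
    }

    // Uses the name attribute, falling back to ref for element references.
    static string NameOf(XElement e)
    {
        return (string)e.Attribute("name") ?? (string)e.Attribute("ref");
    }

    // Renders minOccurs/maxOccurs as " [min-max]"; the default 1..1 is omitted.
    static string Occurs(XElement e)
    {
        string min = (string)e.Attribute("minOccurs") ?? "1";
        string max = (string)e.Attribute("maxOccurs") ?? "1";
        if (min == "1" && max == "1") return "";
        return " [" + min + "-" + (max == "unbounded" ? "n" : max) + "]";
    }
}
```

Fed a schema that declares a GroupInfo complex type with an optional, unbounded Cub element and a Name attribute, plus a top-level Derby element, this sketch yields the lines "Type GroupInfo {", "Cub [0-n]", "@Name", "}" and "Derby".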
The language gains readability and succinctness at the expense of self-defined XML formatting and the robustness of the formal XML Schema standard. Try loading some more complex examples. The IRS 1040 MeF Schema was my hard-core test schema, and it is quite interesting to see in Ska.
Running the example program provided produces the Ska output shown above.
Summary
Like my last article on schemas, this is really just a starting point for many XSD-related capabilities: code generators, translators or validators.