Introduction
This is an implementation of a subset of the XML classes provided in .NET. This may be useful since it does run faster than the .NET classes, although it has less features and has some bugs. If you know exactly what data you need from a document, this is a fast approach to pull it out. It is also useful if you don't have access to these .NET libraries for some reason such as if you are using silverlight or embedded devices, with limited libraries, XBox, windows phone 7 etc etc.
If you've never used the .NET classes XmlDocument
and XPathNavigator
, you will need to know the select syntax. If you know, then feel free to skip this section. If not, first read this page.
Essentially, Select
just lets you navigate through the trees of XML nodes. However, there are also attributes in XML such as ex.
<shape><triangle sid="1/></shape> .
The sid
is an attribute and to select this, the path would be:
ex1)
Select("shape/triangle")
Evaluate( "@sid" )
returns 1
ex2) (no attribute)
<car><wheel>spinning</wheel></car> :
Select = "car"
evaluate "wheel" // no @ needed
returns "spinning
"
OR the fastest way, but with no scope protection:
Just simply evaluate("@sid
") with no select at all will search out the first occurrence @sid
and MoveNext()
does what you'd think. This example is not the same as .NET and obviously increases performance massively but you can only do it when you know the document.
Scope
When I say Scope detection, I really mean limiting the evaluation to the node scope. The .NET classes have protection and will return String.Empty in cases where the scope is wrong. This class disables it so you can optimize your search without specifying any scope. In reality, it doesn't really disable it, inherent in the parsing algorithm is a key difference. This algorithm uses searches by simple indexof(). It does not compile and cache anything except two integers begin and end index. This approach uses almost no memory and is in every case faster but in cases with ambiguous scope it CAN return the wrong data. Therefore, you need to know the scope is unique in your selects.
Usage
Loading up some values from a document is similar to .NET:
public static void GetSingleValueFXmlDoc()
{
string xml = "<car value=\"1\"><wheel sid=\"1\" bolts=\"4\">
ddd</wheel><wheel sid=\"2\" bolts=\"4\">dddd</wheel></car>";
DateTime dt = DateTime.Now;
FXMLDocument doc = new FXMLDocument();
doc.LoadXml(xml);
doc.Select( "car/wheel" );
string sid = doc.GetValueString("@sid");
double df = doc.GetValueDouble("@bolts");
string diff = DateTime.Now.Subtract(dt).ToString();
Console.WriteLine("single values FXML Time\t\t={0}", diff);
}
It's basically the same as .NET but no navigator. Here, we load the document, select the path in syntax the same as xpath navigator and call get value. There is also an evaluate method but getvalue
just returns the type you want with a simple wrapper.
Reading all attributes in a selected node
for (int i = 0; i < count; ++i)
{
FXMLDocument doc = new FXMLDocument();
doc.LoadXml(xml);
doc.Select("car/wheel");
foreach (FXMLDocument x in doc)
{
string sid = x.GetValueString("@sid");
double df = x.GetValueDouble("@bolts");
}
}
How It Works
for (int i = 0; i < nodes.Length; i++)
225 {
226 string nodeStart = String.Empty;
227 string nodeEnd = String.Empty;
228
229 if ((nodes.Length > i + 1) && nodes[i + 1].IndexOf("@") == 0)
230 {
231 nodeStart = string.Format("{0}=\"", nodes[i + 1].Replace("@", ""));
232 nodeEnd = "\"";
233 isAttribute = true;
234 }
235 else if ((nodes.Length == 1) && nodes[i].IndexOf("@") == 0)
236 {
237 nodeStart = string.Format("{0}=\"", nodes[i].Replace("@", ""));
238 nodeEnd = "\"";
239 isAttribute = true;
240 }
241 else
242 {
243
244 nodeStart = string.Format("{0}{1}", "<", nodes[i]);
245 nodeEnd = string.Format("{0}{1}{2}", "</", nodes[i], ">");
246 isAttribute = false;
247 }
248
249 GetBetweenExcludeTokens(xmlDocument, nodeStart, nodeEnd, ref begin, out end);
250
251 if (isAttribute)
252 {
253 return;
254 }
255 int indexNextNode = (xmlDocument.IndexOf(">", begin)) + 1;
256 int indexofEquals = xmlDocument.IndexOf("=", begin);
257 if (indexofEquals != -1 && (indexofEquals < indexNextNode))
258 {
259 if (begin < indexNextNode)
260 {
261 begin += indexNextNode - begin;
262 }
263 else
264 {
265 begin += begin - indexNextNode;
266 }
267 }
268 else
269 {
270 if (xmlDocument[begin] == ('>'))
271 {
272 ++begin;
273 }
274 }
275 }
276 }
This is the main loop of the entire process. Basically, it just loops through all the nodes of the select
statement and gets the text between them and that's it. Of course, there is special handling for attributes which really makes what would have been an elegant system a real nightmare to handle. I can bet you that there were some arguments in the office at Microsoft the day some smart guy wanted attributes. In any case, GetBetweenExcludeTokens
is doing the meat of the entire operation, everything else is just syntax support.
Points of Interest
This runs only around 30% faster than the .NET classes which for most people are not going to be worth it since it is not fully compatible. However in example GetSingleValueFXmlDoc
, there is a secret to speed things up. Because we know the document contains only wheels->bolts ( bolts is unique), why bother selecting car/wheel, with this you can just call doc.Evaluate( @bolts )
without selecting anything which will boost performance by at least 100% over .NET classes. This was actually all it was intended for but, as I went I hit a lot of gotchas and this minimizes them slightly when you turn ValidateSyntax
on.
My first version was just a static
method which was probably better and I wish I saved it, however you can't MoveNext
very nicely in a static
implementation (although it's nice not to have to new it). You could modify getSubNodes
to be static
quite easily.