SVG Polygons

Samuel Cragg

24 Jan 2012

What could be taking so long when parsing an SVG file?

I was recently parsing a 43.3 MB SVG file that was filled with polygons, and it was taking a long time (around 3.6 seconds). I figured I was I/O bound and, for kicks, decided to see how long it took to simply count the XML nodes in the file using System.Xml.XmlReader. That completed in only half a second, about seven times faster. I wasn’t I/O bound but CPU bound. What could be taking so long?

It turned out that parsing the points was what took so long, in particular converting text to numbers. I was using float.TryParse with InvariantCulture, so I decided to try writing my own parser based on the SVG specification. It was well worth the effort: the time to parse the file dropped to 1.7 seconds, a little under half the time of the original method.

The point of this post isn’t to complain about the performance of TryParse, which handles a much wider variety of inputs than my specialized parser, but to share what I found while reading the specification. Take a look at the following polygons; they all draw the same triangle:

<polygon points="0, 0  -10, 0  -5, -10" />
<polygon points="0,0 -10,0 -5,-10" />
<polygon points="0,0,-10,0,-5,-10" />
<polygon points="0 0 -10 0 -5 -10" />
<polygon points="0 0-10 0-5-10" />
<polygon points="0-0-10-0-5-10" />

I was quite surprised that you can run the negative numbers together like that, but it works. The format is particularly well suited to the C runtime function strtod, which parses as much of the input as it can and reports both the parsed value and how much of the string was consumed. Unfortunately, there isn’t a similar function in .NET; you can only parse the whole string.
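To make the strtod behaviour concrete, here is a minimal sketch of how a points string could be split with it. The helper name parse_points_strtod is made up for illustration, and the sketch assumes the default "C" locale (strtod is locale-dependent). It is also more permissive than the SVG grammar: strtod accepts things like hexadecimal floats, "inf" and "nan", and this loop silently skips any character it can't parse.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not from the article): splits an SVG points string
// into numbers by repeatedly calling strtod. Assumes the "C" locale.
std::vector<double> parse_points_strtod(const std::string& text)
{
    std::vector<double> values;
    const char* cursor = text.c_str();

    while (*cursor != '\0')
    {
        char* end = nullptr;
        // strtod skips leading whitespace, parses the longest valid prefix
        // and sets 'end' to the first character it did not consume.
        double value = std::strtod(cursor, &end);

        if (end == cursor)
        {
            // Nothing was consumed, so this is a separator (such as a comma)
            // or trailing whitespace. Step over one character and try again.
            // Simplification: unexpected characters are skipped the same way.
            ++cursor;
            continue;
        }

        values.push_back(value);
        cursor = end; // carry on from where strtod stopped
    }

    return values;
}
```

Given "0-0-10-0-5-10", each call stops at the next minus sign, which then becomes the sign of the following number, so no explicit separator is needed.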

Here’s a quick implementation in C++ (it isn’t very idiomatic C++, as it doesn’t use iterators, but that makes it easier to convert to C#).
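The listing itself is missing here, so what follows is only a sketch of how such a parser might look, not the author's code. The names try_parse_number and parse_points are hypothetical. It follows the general shape of the SVG number grammar (optional sign, digits, optional fraction, optional exponent) and uses plain indices so it would map easily onto C# strings. It is lenient about separators, and it doesn't guarantee correctly rounded results the way a library parser would.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

static bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Tries to read one number starting at text[index]. On success, stores the
// result in 'value', moves 'index' past the number and returns true.
// On failure, 'index' and 'value' are left unchanged.
bool try_parse_number(const std::string& text, std::size_t& index, double& value)
{
    const std::size_t length = text.size();
    std::size_t i = index;

    bool negative = false;
    if (i < length && (text[i] == '+' || text[i] == '-'))
    {
        negative = (text[i] == '-');
        ++i;
    }

    double mantissa = 0.0;
    int digits = 0; // digits seen before and after the decimal point
    int scale = 0;  // power of ten to apply to the mantissa

    while (i < length && is_digit(text[i]))
    {
        mantissa = mantissa * 10.0 + (text[i] - '0');
        ++i; ++digits;
    }
    if (i < length && text[i] == '.')
    {
        ++i;
        while (i < length && is_digit(text[i]))
        {
            mantissa = mantissa * 10.0 + (text[i] - '0');
            ++i; ++digits; --scale;
        }
    }
    if (digits == 0)
        return false; // a sign or '.' on its own is not a number

    // Optional exponent: only consumed if at least one digit follows it.
    if (i < length && (text[i] == 'e' || text[i] == 'E'))
    {
        std::size_t j = i + 1;
        bool exponent_negative = false;
        if (j < length && (text[j] == '+' || text[j] == '-'))
        {
            exponent_negative = (text[j] == '-');
            ++j;
        }
        if (j < length && is_digit(text[j]))
        {
            int exponent = 0;
            while (j < length && is_digit(text[j]))
            {
                if (exponent < 10000) // guard against int overflow
                    exponent = exponent * 10 + (text[j] - '0');
                ++j;
            }
            scale += exponent_negative ? -exponent : exponent;
            i = j;
        }
    }

    // Divide for negative scales: 25 / 10 is exact, 25 * 0.1 need not be.
    value = (scale >= 0) ? mantissa * std::pow(10.0, scale)
                         : mantissa / std::pow(10.0, -scale);
    if (negative)
        value = -value;
    index = i;
    return true;
}

// Reads every number in a points string, skipping whitespace and commas.
// Stops at the first character that is neither a separator nor a number.
std::vector<double> parse_points(const std::string& text)
{
    std::vector<double> values;
    std::size_t index = 0;
    while (index < text.size())
    {
        const char c = text[index];
        if (c == ' ' || c == ',' || c == '\t' || c == '\r' || c == '\n')
        {
            ++index;
            continue;
        }
        double value = 0.0;
        if (!try_parse_number(text, index, value))
            break;
        values.push_back(value);
    }
    return values;
}
```

Because try_parse_number reports where it stopped through index, the caller never has to cut the string into substrings first, which is exactly what a whole-string parser forces you to do.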
