Introduction
One of my sites allows people to post their short stories, and I'd always been bothered by the fact that when the titles were sorted alphabetically, I would have a long list under T, where all the titles beginning with "The" would congregate.
Traditional libraries use a sort order called the grammatic order, where titles are categorized based on the first significant word. In theory, this word could be any one in the title, but in practice nowadays, it'll usually be the first word that isn't an article (i.e. A/An or The). So instead of this plain alphabetical list:
- A Tale of Two Cities
- The Bostonians
- The Importance of Being Earnest
- War and Peace
The titles should be sorted as follows:
- The Bostonians
- The Importance of Being Earnest
- A Tale of Two Cities
- War and Peace
(In fact, to reduce confusion, the titles could be listed as, for example, "Tale of Two Cities, A", but I'll just concentrate on sorting in this particular article.)
When I was looking into ways to achieve this on my web site, I soon discovered that the XSL <xsl:sort> element was the solution to my problem, as its select attribute accepts any XPath expression and can therefore be used to create a completely custom sort order for a set of XML data. Similar solutions to this one can be found elsewhere on the web, but they don't often include explanations of how the methods work. The rest of this article explains a simple way to build up a select expression to do this type of title sorting.
Customising xsl:sort
In the examples below, I will assume that we are working with an XML source that has the following structure (a complete XML file and the corresponding stylesheet are included in the source zip file):
<Stories>
  <Story>
    <Title>War and Peace</Title>
  </Story>
  <Story>
    <Title>The Bostonians</Title>
  </Story>
  ...
</Stories>
In order to sort these elements by the complete title, you would simply use code like this:
<xsl:for-each select="Story">
  <xsl:sort select="Title"/>
  <xsl:value-of select="Title"/>
</xsl:for-each>
When an xsl:sort element is present, the XSLT processor determines the sort order by evaluating its select expression for each element to be sorted. In this case, as it iterates through the Story elements, the processor takes the string value of Title as the sort key and works out where that particular element should go in the sorted list. The result will be a straightforward alphabetical list similar to the first list in the Introduction.
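For readers who want to try this, here is a minimal sketch of a complete stylesheet built around that fragment. The text output method, the template matching /Stories, and the line break after each title are my own assumptions for illustration; the stylesheet in the source zip file may be organised differently.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Assumption: plain-text output, to keep the result easy to read -->
  <xsl:output method="text"/>

  <!-- Match the root element so that "Story" resolves relative to it -->
  <xsl:template match="/Stories">
    <xsl:for-each select="Story">
      <!-- Sort key: the string value of the Title child -->
      <xsl:sort select="Title"/>
      <xsl:value-of select="Title"/>
      <!-- Emit a line break after each title -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With a command-line processor such as xsltproc, this could be run as xsltproc sort.xsl stories.xml, where both file names are placeholders.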
However, we actually want some of the titles to be sorted according to the second word, not the first, and in those cases, we need the processor to evaluate the Title string starting after the first space. The substring-after function is ideal for this purpose:
<xsl:sort select="substring-after(Title, ' ')"/>
The result of substring-after(Title, ' ') will be everything after the first space in the Title element. As it iterates through the elements, the XSLT processor will be sorting this set:
- and Peace
- Bostonians
- Importance of Being Earnest
- Tale of Two Cities
Unfortunately, while this will indeed put The Bostonians under "B", it will also put War and Peace under "A". What's more, titles which do not contain any spaces end up grouped at the beginning of the list in document order, because substring-after returns an empty string in that case, and titles with an identical empty sort key cannot be ordered relative to one another.
We need to be more specific about which titles need to be sorted by their second word. Fortunately, the substring function can help with this. The result of the following function will be everything after the first space, but only if the Title element starts with "The " (we need to include the space after "The" so that titles like "Thesaurus" aren't included as well):
substring(substring-after(Title, ' '), 0 div starts-with(Title, 'The '))
The second parameter of the substring function normally takes a number determining where the substring should start, but has the added benefit that if the number is NaN, the function returns an empty string. Here it is combined with a starts-with function, which returns a boolean true if the string starts with "The " and false if not. true and false evaluate to 1 and 0 when converted to numbers.
In this case, dividing 0 by the boolean value returned by the starts-with function toggles the value between 0 (0 div 1) and NaN (0 div 0, "not a number"). If the value is 0, substring returns its whole first argument (character positions are counted from 1, so a start of 0 simply begins before the first character). That argument is the title minus its leading word, so the title will be sorted by the second word as described above. If the value is NaN, however, the function returns an empty string, so on its own this expression gives every other title the same empty sort key, and those titles end up grouped at the top of the list in the order they appear in the XML document.
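To see the toggle in action, a small diagnostic stylesheet can print the computed start position and the resulting key for each title. This is only a sketch for experimentation: the output layout, the square brackets around the key, and the template matching /Stories are assumptions of mine rather than part of the original stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/Stories">
    <xsl:for-each select="Story">
      <xsl:value-of select="Title"/>
      <xsl:text> | start = </xsl:text>
      <!-- The boolean is converted to 1 or 0, giving 0 div 1 = 0 or 0 div 0 = NaN -->
      <xsl:value-of select="0 div starts-with(Title, 'The ')"/>
      <xsl:text> | key = [</xsl:text>
      <!-- A start of 0 keeps the whole string; a start of NaN yields an empty string -->
      <xsl:value-of select="substring(substring-after(Title, ' '),
                                      0 div starts-with(Title, 'The '))"/>
      <xsl:text>]&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For "The Bostonians" this prints a start of 0 and the key [Bostonians]; for "War and Peace" it prints a start of NaN and the empty key [].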
At this point, we can sort the titles beginning with "The " correctly, so we now need to sort the other titles as well. This is done using a similar substring function:
substring(Title, 0 div not(starts-with(Title, 'The ')))
In this case, the first parameter is simply Title, since we do want these titles to be sorted by the whole value of the element. The second parameter relies on the same evaluation as above, with not() inverting the test: it is 0 when the title does not start with "The " (so the whole title is returned) and NaN when it does (so an empty string is returned).
We now have functions to sort ordinary titles alphabetically and titles starting with The by their second word. The next step is to put them together. Since the two substring functions are mutually exclusive (for any given title, one of them always returns an empty string), we can use the concat function to stick them together:
<xsl:sort select="concat(
    substring(substring-after(Title, ' '),
              0 div starts-with(Title, 'The ')),
    substring(Title,
              0 div not(starts-with(Title, 'The '))))"/>
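Dropped into a complete stylesheet, the combined key might look like the sketch below. As before, the text output method and the template matching /Stories are assumptions made for illustration; only the xsl:sort element itself comes from the article.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/Stories">
    <xsl:for-each select="Story">
      <!-- For each title, one substring() call is always empty, so
           concat() yields either the title without "The " or the full title -->
      <xsl:sort select="concat(
          substring(substring-after(Title, ' '),
                    0 div starts-with(Title, 'The ')),
          substring(Title,
                    0 div not(starts-with(Title, 'The '))))"/>
      <!-- The sort key only affects ordering; the full title is still displayed -->
      <xsl:value-of select="Title"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the two titles shown in the sample structure, the keys are "War and Peace" and "Bostonians", so The Bostonians is listed before War and Peace.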
The result is a list in which the titles are sorted correctly. This functionality can be extended further by simply adding extra criteria to the substring functions as in the following example:
<xsl:sort select="concat(
    substring(substring-after(Title, ' '),
              0 div boolean(starts-with(Title, 'A ')
                            or starts-with(Title, 'An ')
                            or starts-with(Title, 'The '))),
    substring(Title,
              0 div not(starts-with(Title, 'A ')
                        or starts-with(Title, 'An ')
                        or starts-with(Title, 'The '))))"/>
Points of interest
While it can be cumbersome and rather verbose at times, the XSL language makes up for it by being incredibly powerful. While designing my website, I was able to achieve things which would have been extremely difficult with just straight ASP and ADO. I hope this example has been a useful introduction to custom sorting.
History
- January, 2006 - First version.