Sorting Book Titles in XSLT

28 Jan 2006
An example of using XSL to achieve a custom sort order - in this case, sorting titles by the first word that isn't an article.

Introduction

One of my sites allows people to post their short stories, and I'd always been bothered by the fact that when the titles were sorted alphabetically, I would have a long list under T, where all the titles beginning with "The" would congregate.

Traditional libraries use a sort order called the grammatic order, where titles are categorized based on the first significant word. In theory, this word could be any one in the title, but in practice nowadays, it'll usually be the first word that isn't an article (i.e. A/An or The). So instead of this plain alphabetical list:

  • A Tale of Two Cities
  • The Bostonians
  • The Importance of Being Earnest
  • War and Peace

The titles should be sorted as follows:

  • The Bostonians
  • The Importance of Being Earnest
  • A Tale of Two Cities
  • War and Peace

(In fact, to reduce confusion, the titles could be listed as, for example, "Tale of Two Cities, A", but I'll just concentrate on sorting in this particular article.)

When I was looking into methods to achieve this on my web site, I soon discovered that the XSL <xsl:sort> element was the solution to my problem, as it can be used to create a completely custom sort order for a set of XML data. Similar solutions to this one can be found elsewhere on the web, but they don't often include explanations of how the methods work. The rest of this article explains a simple way to build up a select expression to do this type of title sorting.

Customising xsl:sort

In the examples below, I will assume that we are working with an XML source that has the following structure (a complete XML file and the corresponding stylesheet are included in the source zip file):

<Stories>
  <Story>
    <Title>War and Peace</Title>
  </Story>
  <Story>
    <Title>The Bostonians</Title>
  </Story>
...
</Stories>

In order to sort these elements by the complete title, you would simply use code like this:

<xsl:for-each select="Story">
  <xsl:sort select="Title"/>
  <xsl:value-of select="Title"/>
</xsl:for-each>
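To try the loop above on its own, it needs a stylesheet around it. The following is a minimal sketch, not the stylesheet from the source zip file: the text output method, the template matching /Stories and the line break after each title are assumptions made for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: plain text output, one title per line -->
  <xsl:output method="text"/>

  <!-- Match the root Stories element so that "Story" is resolved relative to it -->
  <xsl:template match="/Stories">
    <xsl:for-each select="Story">
      <!-- xsl:sort has to come first inside xsl:for-each -->
      <xsl:sort select="Title"/>
      <xsl:value-of select="Title"/>
      <!-- Line break after each title -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Note that xsl:sort is an empty element that configures the loop; it does not wrap the instructions that follow it.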

When an xsl:sort element is present, the XSLT processor determines the sort order by evaluating its select expression once for each node to be sorted. In this case, the processor evaluates Title for every Story element and uses that value to work out where the element should go in the sorted list. The result will be a straightforward alphabetical list similar to the first list in the Introduction.

However, we actually want some of the titles to be sorted according to the second word, not the first, and in those cases, we need the processor to evaluate the part of the Title string that follows the first space. The substring-after function is ideal for this purpose:

<xsl:sort select="substring-after(Title, ' ')"/>

The result of substring-after(Title, ' ') will be everything after the first space in the Title element. As it iterates through the elements, the XSLT processor will be sorting this set:

  • and Peace
  • Bostonians
  • Importance of Being Earnest
  • Tale of Two Cities

Unfortunately, while this will indeed put The Bostonians under "B", it will also put War and Peace under "A". What's more, titles which do not contain any spaces all end up at the beginning of the list, because substring-after returns an empty string in that case. Those titles are still part of the sort, but they all share the same empty key, so they stay in document order relative to each other.

We need to be more specific about which titles need to be sorted by their second word. Fortunately, the substring function can help with this. The result of the following function will be everything after the first space, but only if the Title element starts with "The " (we need to include the space after "The" so that titles like "Thesaurus" aren't included as well):

substring(substring-after(Title, ' '), 0 div starts-with(Title, 'The '))

The second parameter of the substring function normally takes a number determining where the substring should start, but it has a useful side effect: if the number is NaN, the function returns an empty string. Here it is combined with the starts-with function, which returns true if the string starts with "The " and false if not. When converted to numbers, true and false become 1 and 0.

In this case, dividing 0 by the boolean value returned by the starts-with function toggles the value between 0 (0 div 1) and NaN (0 div 0 - not a number). If the value is 0, the function returns the title minus its leading word, so it will be sorted by the second word as described above. If the value is NaN, however, the function returns an empty string. Titles with an empty key all compare as equal, so they are grouped together at the beginning of the list in the same order as in the XML document.
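One way to check this behaviour is to print the key instead of sorting by it. The stylesheet below is a small diagnostic sketch under the same assumptions as the source structure shown earlier (a Stories root with Story/Title children); the text output and the bracket formatting are choices made for the illustration only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/Stories">
    <xsl:for-each select="Story">
      <!-- The title as it appears in the source document -->
      <xsl:value-of select="Title"/>
      <xsl:text>: [</xsl:text>
      <!-- The partial key: non-empty only when the title starts with "The " -->
      <xsl:value-of select="substring(substring-after(Title, ' '),
                                      0 div starts-with(Title, 'The '))"/>
      <xsl:text>]&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

For the two titles in the sample structure, this prints "War and Peace: []" (0 div 0 is NaN, so the key is empty) and "The Bostonians: [Bostonians]" (0 div 1 is 0, so the key is everything after the first space).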

At this point, we can sort the titles beginning with "The " correctly, so we now need to sort the other titles as well. This is done using a similar substring function:

substring(Title, 0 div not(starts-with(Title, 'The ')))

In this case, the first parameter is simply Title, since we do want these titles to be sorted by the whole value of the element. The second parameter relies on the same evaluation as above, but with the test inverted by not(): it produces 0 when the title doesn't start with "The " and NaN when it does.

We now have one expression that produces a sort key for ordinary titles (the whole title) and another for titles starting with "The " (everything after the first word). The next step is to put them together. For any given title, one of the two substring calls returns an empty string, so we can use the concat function to join them without the results interfering:

<xsl:sort select="concat(substring(substring-after(Title, ' '), 
       0 div starts-with(Title, 'The ')),
       substring(Title, 0 div not(starts-with(Title, 'The '))))"/>

The result is a list in which the titles are sorted correctly. This functionality can be extended further by simply adding extra criteria to the substring functions as in the following example:

<xsl:sort select="concat(substring(substring-after(Title, ' '), 
  0 div boolean(starts-with(Title, 'A ') or starts-with(Title, 'An ') 
  or starts-with(Title, 'The '))),
  substring(Title, 0 div not(starts-with(Title, 'A ') 
  or starts-with(Title, 'An ') or starts-with(Title, 'The '))))"/>
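To see the extended key in context, here is a sketch of a complete stylesheet built around it. As before, the text output method and the /Stories template are assumptions for the illustration rather than the contents of the downloadable stylesheet. The redundant boolean() call is replaced by plain parentheses, which should behave the same way because div converts its operands to numbers.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/Stories">
    <xsl:for-each select="Story">
      <!-- Key = title without its leading article, or the whole title
           when it does not start with "A ", "An " or "The " -->
      <xsl:sort select="concat(
          substring(substring-after(Title, ' '),
                    0 div (starts-with(Title, 'A ')
                           or starts-with(Title, 'An ')
                           or starts-with(Title, 'The '))),
          substring(Title,
                    0 div not(starts-with(Title, 'A ')
                              or starts-with(Title, 'An ')
                              or starts-with(Title, 'The '))))"/>
      <!-- Output still shows the full, unmodified title -->
      <xsl:value-of select="Title"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Given the four titles from the Introduction, the keys are "Bostonians", "Importance of Being Earnest", "Tale of Two Cities" and "War and Peace", so the titles come out in the second order listed there. Two caveats: starts-with is case-sensitive, so a title such as "the bostonians" would be sorted by its full text, and the 0 div trick depends on XPath 1.0 converting booleans to numbers, so it is best kept to stylesheets declared as version="1.0".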

Points of interest

While it can be cumbersome and rather verbose at times, the XSL language makes up for it by being incredibly powerful. While designing my website, I was able to achieve things which would have been extremely difficult with just straight ASP and ADO. I hope this example has been a useful introduction to custom sorting.

History

  • January, 2006 - First version.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.