Introduction
In this short article, I will explain how to sort strings in the natural numeric order. The .NET System.IO.Directory
class returns file and directory names ordered in alphabetic order (Win32 API functions for dealing with files order the strings alphabetically), whereas Windows Explorer shows them in natural numeric order. The table below shows the difference between these two orderings:
Alphabetic sort |
Natural numeric sort |
DOS (CMD prompt) style |
Windows Explorer Style |
1.txt 10.txt 3.txt a10b1.txt a1b1.txt a2b1.txt a2b11.txt a2b2.txt b1.txt b10.txt b2.txt |
1.txt 3.txt 10.txt a1b1.txt a2b1.txt a2b2.txt a2b11.txt a10b1.txt b1.txt b2.txt b10.txt |
In the alphabetic order '3.txt' comes before '10.txt' whereas in the natural numeric order '10.txt' comes after '3.txt', which is what we would expect. Windows Explorer uses the natural numeric order for files.
Thanks to Richard Deeming (see comments) I now know that there is a similar function in shlwapi.dll, called StrCmpLogicalW
used by Windows Explorer that works only in XP. My current .NET implementation emulates StrCmpLogicalW
in pure C# code so it can be used with the .NET version in any version of Windows where .NET runs. My implementation is, however, not fully compatible with the one in Windows Explorer. There are some very slight differences which I will explain after the examples.
The class StringLogicalComparer
in my C# code emulates StrCmpLogicalW
, and NumericComparer
is a class implementing the System.Collections.IComparer
interface to be used to sort collections.
Using the code
The natural numeric order comparer for strings is defined in a class named NumericComparer : IComparer
and can be found in the source code of this article. I will give several examples here of how the NumericComparer
class can be used in code.
Example 1 - Ordering an Array of Strings
This example shows how to use the NumericComparator
class to order strings. I will suppose, the strings are in a string[]
array as shown next:
string[] files = System.IO.Directory.GetFiles();
NumericComparer ns = new NumericComparer();
Array.Sort(files, ns);
string[] dirs = System.IO.Directory.GetDirectories();
ns = new NumericComparer();
Array.Sort(dirs, ns);
While we can reuse the same NumericComparer
object instance more than once, it is more efficient to create a new object when the set of strings to order changes (I will explain the implementation of NumericComparer
later, after the examples).
Example 2 - Ordering Items in a ListView Control
There are several ways to order elements in a ListView
control. I will show only how to order the elements responding to a column head click. To define a generic custom way to order rows of a ListView
control, when we click the ListView
headers, we need to respond to the ColumnClick
event and set the ListViewItemSorter
property of the ListView
control to a class that implements IComparer
. The custom ListComparer
comparer will usually take other arguments in the constructor, e.g. the column header being clicked that will serve as the index for sorting the ListView
elements, as demonstrated by the code snippet below:
private void lstFiles_ColumnClick(object sender,
System.Windows.Forms.ColumnClickEventArgs e)
{
...
ListComparer lc = new ListComparer(e.Column, ...);
lstFiles.ListViewItemSorter = lc;
...
}
Inside the custom ListComparer
, we will have code to order the ListView
elements depending on the clicked column header. If we suppose the first column (index 0) contains the strings that need to be ordered in the natural numeric order, then we can use the StringLogicalComparer
class that comes with NumericComparer
directly inside ListComparer
as follows (the code is simplified and has no error checking):
internal class ListComparer : IComparer
{
...
public int Compare(object x, object y)
{
ListViewItem lx = (ListViewItem)x;
ListViewItem ly = (ListViewItem)y;
switch(column)
{
case 0:
int c = StringLogicalComparer.Compare(lx.SubItems[0].Text,
ly.SubItems[0].Text);
...
return c;
...
}
}
}
The strings (file names) in the first column of the ListView
control will now be in the natural numeric order.
Example 3 - Ordering Dictionary Entries
Sometimes we need to associate a data object with a key string and we may need to order the string keys and then access the data objects ordered by the keys.
If the keys are unique, we can use a Hashtable
to keep them and the associated data objects. .NET offers the possibility to order the keys of a Hashtable
and then retrieve the data objects with it:
Hashtable hash = new Hashtable();
...
hash.Add(key, data);
...
NumericComparer nc = new NumericComparer();
SortedList list = new SortedList(hash, nc);
foreach(DictionaryEntry de in list)
{
...
}
A more interesting situation arises when the keys are not unique, that is when we can have different data objects that map to the same key. In this case, we have two choices.
- We can keep the data objects as
ArrayList
s associated with the string keys in a Hashtable
. The order of data objects inside the ArrayList
s does not matter because they have the same key.
- We can build a simple data structure to keep the key and the value data objects or we can use a
System.Collections.DictionaryEntry
structure. We can now store our data as DictionaryEntry
elements of an ArrayList
: ArrayList list = new ArrayList();
...
list.Add(new DictionaryEntry(fileName, data));
To order the elements in this case, we need also to create a custom (generic) comparer:
public class DictionaryEntryComparer : IComparer
{
private IComparer nc = null;
public DictionaryEntryComparer(IComparer nc)
{
if(nc == null) throw new Exception("null IComparer");
this.nc = nc;
}
public int Compare(object x, object y)
{
if((x is DictionaryEntry) && (y is DictionaryEntry))
{
return nc.Compare(((DictionaryEntry)x).Key,
((DictionaryEntry)y).Key);
}
return -1;
}
}
We can now order the items of list
according to the keys, in the natural numeric order, using:
list.Sort(new DictionaryEntryComparer(new
NumericComparer()));
Of course, we can use any other IComparer
with the DictionaryEntryComparer
class we created.
Points of Interest
The complete code can be found in files StringLogicalComparer.cs and NumericComparer.cs in the source code files of this article.
A small difference with Windows Explorer
There is currently a small difference with Windows Explorer. My code will order files that start with special characters based on the code table order. Windows Explorer uses another order. For example:
Windows Explorer: |
(1.txt, [1.txt, _1.txt, =1.txt |
My code: |
(1.txt, =1.txt, [1.txt, _1.txt |
I cannot think of any reason why one order can be better than the other. So I see no reason why my code should emulate this specific order. My code uses the current profile to find the order of chars in the code page. Note that, the difference exists only in the first character. If such a special character is inside the file name, Windows Explorer gives the same order as my code.
A practical implication of this special behavior for the first character is that both StrCmpLogicalW
and my code work better with file names not with full paths. Use code similar to Example 3 to order file names and keep the directory information.
Implementation
If you are a developer and want just to use the code, then there is no need to read beyond this point. If you are a student in an introductory computer science course, then the following could be interesting.
There are several ways to order strings in the numeric natural order. The problem is when a list of N items is sorted using quick sort then the Compare
function will be called more than N times which means that it would be nice to optimize the implementation if any.
The first version of my code used another implementation (see below). I thought the code was nicely optimized and the results of computation were cached. However, the comments of Richard Deeming below, made me wonder if I had it right. In the beginning I thought that the problem was with RegEx, and the Hashtable
may be because of deadline :), and even Richard did not do better :) to guess it right (see comments). To understand the problem with my first solution, I will list several possible implementations very shortly:
- A simple technique is to use padding with a special character �/�. This character has several nice properties. It is not used in file paths in Windows and its ASCII code is smaller than the one for digits. It can be used to pad the numeric parts of two strings so that they have the same length. Example: a10.txt and a1.txt will become a10.txt and a/1.txt. Then the alphabetical order can be used and a/1.txt will be smaller than a10.txt. Finally the '/' padding needs to be removed. This method works, but has some serious limitations. The '/' can be used only for file paths in Windows. If the strings contain �/�, this method will not work. This method requires also too many passes over the string and cannot be implemented with fixed char arrays. The method, however, treats numbers somehow uniformly and is different from the rest so it is interesting per se.
- The two strings to be compared are split into lists with alphabetical and numeric parts, the parts are then compared one by one. One optimization of this technique would be to remember the split in parts (cache it in a
Hashtable
). Numeric parts can be converted to numbers and compared. The number conversions can also be cached. My first implementation used this technique.
The implementation is, however, slower than StrCmpLogicalW
despite the caching (it would be even slower without dynamic programming). As I read the Richard Deeming comments about the speed of code compared to StrCmpLogicalW
, I did not understand the problem at first. The technique is, however, na�ve for two reasons. First, it does eager evaluation. The split of the strings is complete and so is the numeric conversion. When two strings are compared the comparison will be often interrupted before all parts are needed. So the eager evaluation consumes a lot of time. The second problem is that numeric parts are explicitly converted to numbers (long
). This not only consumes time, it also is an error-prone method because numeric parts that are longer than a long
number will throw an exception. (So if you used the previous code it is time to replace it with the new implementation.)
- One solution to the problems above is that numeric parts should be compared as special strings not as numbers. Second, using lazy evaluation would also remove the cost of over splitting. The lazy evaluation code for splitting can, however, be complicated.
- The full splitting is, however, rarely needed so we can be optimistic and avoid caching. This is similar to using
StrCmpLogicalW
. The current implementation of StringLogicalComparer
only parses the two strings at the same time and stops parsing at the moment the result of the comparison in known. The technique is also very fast (I hope Richard will test the code again) because it works using fixed-size char arrays. The only look-ahead is to find the end index of the current numerical parts in both strings.
History
- 03 August 2005
- New version. Several bugs corrected.
- 02 August 2005
- New version. Replaced the old one.
- 15 July 2005
- Modified
NumericComparer
to allow cache size initialization.
- Modified the article to show a Hashlist example and use
DictionaryEntry
.
- Several minor article corrections.