Introduction
While searching for how to speed up string sorting in .NET collections, I came across this article: Fast String Sort in C# and F#
I downloaded the StringSort class from the article, tested the speed, and found it sorts 50,000 strings in about 40 to 50% less time than the Generic IEnumerable
object sorting methods. This is when the compare method is used with the Ordinal option. Otherwise results are much worse for standard QuickSort string sorting.
So, to build on the previous project by
Stefan Savev 2, I have derived a new
sfList
class from
List<string>
.
Using the code
For your
List<string>
objects, you simply use sfList instead. The only thing that is different is the method it uses to sort strings.
using System;
using System.Collections.Generic;
using Sorting.CSharpStringSort;
namespace SortTests.Sorting
{
public class sfList : List<string>
{
public sfList() : base() { }
public sfList(int size) : base(size) { }
public sfList(IEnumerable<String> aArray) : base(aArray) { }
public new void Sort()
{
string[] a = this.ToArray();
this.Clear();
StringSort.Sort(ref a);
this.AddRange(a);
}
}
}
From my initial testing, I could see that it is faster to copy the
List
to a
string[]
, clear the list, sort the array using
StringSort.Sort
and refill the list with the result. This is what is done by replacing the
List.Sort
method with my call to
StringSort.Sort
. A very small amount of time is used to do the two extra copies compared to the sort time.
The StringSort
class Sort method has been changed to do an in-place Sort
on string[]
.
There is also a overloaded version of Sort that takes an unsorted List<string> parameter and returns a copy of a of a sorted list.
public static void Sort(string[] inout)
{
InPlaceSort(inout, 0, 0, inout.Length);
}
public static List<string> Sort(List<string> sList)
{
string[] aCopy = sList.ToArray();
InPlaceSort(aCopy, 0, 0, aCopy.Length);
return new List<string>(aCopy);
}
Other experimental overloads and unused private methods have been removed from my original posted version.
The first time SortTests.exe is ran, it generates a randomized list of values for sorting by using multiple reads of dic.txt. Random numbers are also appended to the original string values.
More Generic Collections
I have included a KeyedList<TKey, TValue>
Class that is derived from List<T>
.
I implemented the fast string sorting algorithm for this class' Sort method with good results.
The combined time of filling the KeyedList with values and then sorting it is much faster than adding the same values to a SortedList<string, string> collection, for example.
Here is the sort method that uses a Sort method overload to sort an array of KeyValuePair on string keys:
public new void Sort()
{
if (typeof(TKey) == typeof(String))
{
KeyValuePair<TKey, TValue>[] kvpArray = this.ToArray();
this.Clear();
this.Sort(kvpArray); this.AddRange(kvpArray);
if (!this.SortAscd)
{
this.Reverse();
}
}
else
{
this.Sort(CompareKey);
}
_IsSorted = true;
}
public void Sort(KeyValuePair<TKey, TValue>[] sList)
{
InPlaceSort(sList, 0, 0, sList.Length);
}
void InPlaceSort(KeyValuePair<TKey, TValue>[] input, int depth, int st, int ed)
{
}
For more on KeyedList, see my code sample here:
Derive a KeyedList<TKey, Tvalue> generic classStopwatch Tests
I improved that Stopwatch class in .NET. by deriving a new class from it call TimeCounter. Total Ticks are converted to milliseconds and rounded to the nearest 10 ms. This gives more consistent results. The practical resolution of Stopwatch is no more than 10 ms so I don't want to show a bunch of extra digits that don't add any real information. What interests me in these test is actually the relative differences among the different sort tests.
Sort test with 397540 elements
Milliseconds Relative Differences
#Array.Sort<string>(string[], CompareFun) 980 1.0000
#List<string>.Sort(CompareFun) 860 0.8776
*StringSort.Sort(List<string>) 550 0.5612
sfList.Sort() 540 0.5510
StringSort.Sort(string[]) 530 0.5408
Sedgewick.Sort(string[]) 450 0.4592
# Without CompareFun with Ordinal option this method is slower by about 3x
* Returns a new copy of List<string>
So from these elapsed ms tests, you can see that both Array.Sort()
and List.Sort()
(original Microsoft sort methods) are around 45% slower than StringSort. With my improved timing method, I found that Sedgewick is somewhat faster than StringSort (over 50% faster than standard Array.Sort
). Sorting strings using Microsoft quicksort implementation seems to have a large overhead. Replacisng List.Sort()
in a derived class or using the StringSort.Sort(List<string>)
overload yields a measurably quicker solution.
I also did tests on a 64-bit Windows 7 system to see if the relative difference results were different. These test results are in the download under SortTest.xsl.
History
First published July 4, 2012
Added Stopwatch tests results for Windows 7 64-bit system - July 7, 2012
Improved Stopwatch timing method and added relative differences calculation. Implemented a version of StringSort in derived KeyedList class - July 17, 2012