Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

Fast List<String> Sort in C#

0.00/5 (No votes)
23 Aug 2012 2  
How to get faster sorting in List(T) string collections

Introduction

While searching for how to speed up string sorting in .NET collections, I came across this article: Fast String Sort in C# and F# 

I downloaded the StringSort class from the article, tested the speed, and found it sorts 50,000 strings in about 40 to 50% less time than the Generic IEnumerable object sorting methods.  This is when the compare method is used with the Ordinal option. Otherwise results are much worse for standard QuickSort string sorting.

So, to build on the previous project by Stefan Savev 2, I have derived a new sfList class from List<string>.

Using the code 

For your List<string> objects, you simply use sfList instead. The only thing that is different is the method it uses to sort strings.
using System;
using System.Collections.Generic;
using Sorting.CSharpStringSort;

namespace SortTests.Sorting
{
  public class sfList : List<string>
  {
    public sfList() : base() { }

    public sfList(int size) : base(size) { }

    public sfList(IEnumerable<String> aArray) : base(aArray) { }

    public new void Sort()
    {
      //StringSort.Sort(this);
      string[] a = this.ToArray();
      this.Clear();
      //sort array and refill contents of this (faster than directly sorting List)
      StringSort.Sort(ref a);
      this.AddRange(a);
    }
  }
}

From my initial testing, I could see that it is faster to copy the List to a string[], clear the list, sort the array using StringSort.Sort and refill the list with the result. This is what is done by replacing the List.Sort method with my call to StringSort.Sort. A very small amount of time is used to do the two extra copies compared to the sort time.

The StringSort class Sort method has been changed to do an in-place Sort on string[].

There is also a overloaded version of Sort that takes an unsorted List<string> parameter and returns a copy of a of a sorted list.

public static void Sort(string[] inout)
{
    InPlaceSort(inout, 0, 0, inout.Length);
}
public static List<string> Sort(List<string> sList)
{
  string[] aCopy = sList.ToArray();
  InPlaceSort(aCopy, 0, 0, aCopy.Length);
  return new List<string>(aCopy);
}

Other experimental overloads and unused private methods have been removed from my original posted version.

The first time SortTests.exe is ran, it generates a randomized list of values for sorting by using multiple reads of dic.txt. Random numbers are also appended to the original string values.

More Generic Collections

I have included a KeyedList<TKey, TValue> Class that is derived from List<T>.

I implemented the fast string sorting algorithm for this class' Sort method with good results.

The combined time of filling the KeyedList with values and then sorting it is much faster than adding the same values to a SortedList<string, string> collection, for example.

Here is the sort method that uses a Sort method overload to sort an array of KeyValuePair on string keys: 

    /// <summary>
    /// Use for default sorting on int or string keys
    /// </summary>
    public new void Sort()
    {
      if (typeof(TKey) == typeof(String))
      {
        KeyValuePair<TKey, TValue>[] kvpArray = this.ToArray();
        this.Clear();
        this.Sort(kvpArray); //sort String key type
        this.AddRange(kvpArray);
        if (!this.SortAscd)
        {
          this.Reverse();
        }
      }
      else
      {
        this.Sort(CompareKey);
      }
      //this.Sort(CompareKey);
      _IsSorted = true;
    }
public void Sort(KeyValuePair<TKey, TValue>[] sList)
{
  InPlaceSort(sList, 0, 0, sList.Length);
}

void InPlaceSort(KeyValuePair<TKey, TValue>[] input, int depth, int st, int ed)
{
   //this is all the same as StringSort except that input[indx].Key.ToString() is the sort key
}


For more on KeyedList, see my code sample here: Derive a KeyedList<TKey, Tvalue> generic class

Stopwatch Tests 

I improved that Stopwatch class in .NET. by deriving a new class from it call TimeCounter. Total Ticks are converted to milliseconds and rounded to the nearest 10 ms. This gives more consistent results. The practical resolution of Stopwatch is no more than 10 ms so I don't want to show a bunch of extra digits that don't add any real information. What interests me in these test is actually the relative differences among the different sort tests.

Sort test with 397540 elements		
                                          Milliseconds	  Relative Differences
#Array.Sort<string>(string[], CompareFun)  980            1.0000
#List<string>.Sort(CompareFun)	           860	          0.8776
*StringSort.Sort(List<string>)	           550	          0.5612
sfList.Sort()	                           540	          0.5510
StringSort.Sort(string[])	           530	          0.5408
Sedgewick.Sort(string[])	           450	          0.4592

# Without CompareFun with Ordinal option this method is slower by about 3x		
* Returns a new copy of List<string>		


 

So from these elapsed ms tests, you can see that both Array.Sort() and List.Sort() (original Microsoft sort methods) are around 45% slower than StringSort. With my improved timing method, I found that Sedgewick is somewhat faster than StringSort (over 50% faster than standard Array.Sort). Sorting strings using Microsoft quicksort implementation seems to have a large overhead. Replacisng List.Sort() in a derived class or using the StringSort.Sort(List<string>) overload yields a measurably quicker solution.

I also did tests on a 64-bit Windows 7 system to see if the relative difference results were different. These test results are in the download under SortTest.xsl.

History

First published July 4, 2012

Added Stopwatch tests results for Windows 7 64-bit system - July 7, 2012

Improved Stopwatch timing method and added relative differences calculation. Implemented a version of StringSort in derived KeyedList class - July 17, 2012

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here