Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / desktop / WPF

Dynamite: High Performace Dynamic Sorting Using Expressions

4.89/5 (19 votes)
25 Sep 2008CPOL9 min read 1   1K  
Easy-to-use and high performance dynamic sorting of most type of sequences with SQL-like syntax, developed using System.Linq.Expression classes.

Introduction

The last few weeks, I have been exploring the System.Linq.Expression namespace (new in .NET 3.5) to see how it can be used to build code on-the-fly. The result is a small utility library that can be used to perform dynamic sorting of various common collection structures such as arrays, lists, DataTables, and enumerations. Dynamic sorting, in this context, means that the order can be specified at run-time using a string expression with comma-separated field or property names (same syntax as the SQL ORDER BY). Thanks to extension methods, this can be achieved very easily, as shown in the following code snippet:

C#
Person[] persons;
// ...
persons.Sort("Name.Length, BirthDate DESC");

IEnumerable<Person> sortedPersons = 
  persons.OrderBy("Sex ASCENDING , Name DESCENDING ");

Dynamic sorting can be very useful in situations where ordering is specified by the user or in a configuration file. The article "Generic List Sort Function" originally inspired me to look at sorting, and to find a quicker way to do it. Several others have presented other solutions (see the References section). The following list summarizes the features of the solution presented here:

  1. Easy to use: New overloads of existing extension methods, Sort and OrderBy methods.
  2. High performance, thanks to dynamically compiled comparison expressions.
  3. Support for arrays, List<T>, IEnumerable<T>, IQueryable<T>, and DataSet.s
  4. Supports all public fields and properties, including Nullables.
  5. Supports both reference (class) typed and value typed (structs) properties/fields.
  6. Support for nested property expressions, such as Mother.Name.Length, with proper handling of null values.
  7. Case-insensitive string comparison.
  8. Open design: Can build multi-field Comparison<T> delegates, IComparer<T> and non-generic IComparer to be used in other comparison and sorting scenarios.

Background

Expressions

The classes in the System.Linq.Expression namespace can be used to dynamically build functions (lambda expressions) from primitives arranged in an expression tree that can be interpreted and translated by a consumer (e.g., as in LINQ to SQL), or be compiled to executable function delegates that can be executed with high performance. If we know at compile time which delegate we want, we can let the compiler build the expression tree like this:

C#
Expression<Comparison<String>> comparisonExpr = 
   (String s1,String s2) => String.Compare(s1,s2);

This is equivalent to creating an expression explicitly:

C#
// Create ParameterExpression representing the parameters of the lambda 
// function to create
ParameterExpression s1 = Expression.Parameter(typeof(String), "s1");
ParameterExpression s2 = Expression.Parameter(typeof(String), "s2");

// Build the body of lambda expression.
Expression body = Expression.Call(typeof(String), "Compare", new Type[] {},
                                  s1, s2);

// Finally, create the expression that represents the lambda function.
LambdaExpression comparisonExpr = 
      Expression.Lambda(typeof(Comparison<String>), body, s1, s2);

Although much more verbose, the latter code snippet allows us to substitute components in the expression tree as desired. In the Dynamite library presented here, this is used to introduce calls to the properties specified in the sort expression. Subsequently, the Expression is compiled to a Comparison<String> delegate that can be used in various existing sorting methods:

C#
// Compiles the expression to an executable delegate
Comparison<String> comparison = 
   (Comparison<String>)comparisonExpr.Compile();

// Invokes the created delegate to compare two strings
int res = comparison("Hello", "World"); // res = -1

Using the code

To enable dynamic sorting in your solution, you only have to add a reference to the Dynamite project (or the Dynamite.dll built from it) and import the Dynamite.Extensions namespace. This will immediately give you access to new overloaded versions of the Sort methods of any array or List<T> instance. All these extension methods take a sortExpression argument of type String. The syntax of sortExpression is same as the SQL ORDER BY clause, i.e., a comma separated list of property/field names. Descending order can be specified by adding the DESC or DESCENDING keyword after a field name.

C#
using Dynamite.Extensions;
...
List<Person> persons = new List<Person>();
...
persons.Sort("BirthDate DESC, Name");

As shown in the very first example, Sort also works on arrays. Worth to note is that these Sort methods perform unstable sorting, which means that the order among items with equal sort keys are not preserved if sorting is repeated on the same sequence. In contrast, OrderBy shown below performs stable sorts, where existing order between items with the same sorting keys is preserved.

LINQ style sorting

The Dynamite.Extensions.ComparerExtensions class also has extension method overloads for the LINQ OrderBy function that can be applied to any sequence that implements the IEnumerable<T> interface, including arrays and generic List<T>s:

C#
// Print names with shortest names first.
foreach( String name in persons.OrderBy("Name.Length, Name").
                                Select(p => p.Name) )
{
    Console.WriteLine(name);
}

The example above demonstrates the use of a nested field expression. Name.Length refers to the property Length of the String typed property Name in the class Person. By the way the comparison method is dynamically built, null values will also be handled. Thus, the above code will not throw a NullPointerException even if Name is null for any Person. Instead, these persons will be placed first, because null values are, by definition, regarded to be less than any other value. To do the above using a hard-coded lambda function, we have to write:

C#
// Print names with shortest names first.
foreach( String name in persons.OrderBy(p => p.Name == null ? new int?() : 
            new int?(p.Name.Length)).
           .ThenBy(p => p.Name, StringComparer.CurrentCultureIgnoreCase)
           .Select(p => p.Name)
{
    Console.WriteLine(name);
}

Above, you can see that using the dynamic sorting OrderBy often results in shorter and more readable code. However, remember that you lose compile-time error checking when specifying field names as string literals. Due to the time to parse and compile the comparer, performance will also be worse than hard-coding the ordering (see Performance below for more details).

Dynamic sorting of Queryables

Dynamite not only supports dynamic sorting of data sources implementing the IQueryable<T> interface used in, for instance, LINQ to SQL, but also for other custom data providers. The usage in this case is very similar to the case with IEnumerables, but the library will translate the field references to a LambdaExpression instead, which is then interpreted by the data provider. This means that we can write:

C#
// Creates a new data context based
// on the common example database Northwind.
Northwnd nw = new Northwnd(@"northwnd.mdf");

foreach (Customer customer in 
         nw.Customers.OrderBy("CompanyName.Length, CompanyName DESC") )
{
    Console.WriteLine(customer.CompanyName);
}

Since nw.Customers implements the IQueryable<Customer> interface, this will be translated to the following database request, when executed:

SQL
SELECT ... FROM Customer ORDER BY LEN(CompanyName), CompanyName DESC

Although, using OrderBy on IQueryable<T>s is very similar to using ordinary IEnumerable<T>s, there are some minor differences that may affect the result. Firstly, to enable translation of the sort expression with nested property expressions such as Company.Length to SQL, no checks for null values are added to the dynamically created key selector. Instead, we rely on the data provider (or the DBMS) to do that for us, if required. Secondly, as the standard SQL data provider does not seem to support the custom IComparable<T> argument of OrderBy, case-insensitive string comparisons are not explicitly specified. The exact string sorting behavior will thereby be determined by the data provider (or the DBMS).

Dynamic sorting of DataSets

To support dynamic sorting of typed and untyped DataSets, there is a third overloaded version of the OrderBy extension method. This makes it possible to specify a sort expression containing column names:

C#
using Dynamic.Data.Extensions;
...
PersonDataSet ds;
...
// Enumerates persons in the Persons table sorted by name and birthdate.
foreach(var row in ds.Persons.AsEnumerable().OrderBy("NAME, BIRTHDATE DESC") )
{
    Console.WriteLine(row.Field<String>("Name"));
}

For DataTables, it is not possible to use nested specifiers such as Name.Length.

Points of interest

Using dynamically created comparers in other situations

Although the examples above use the extension methods defined in the ComparerExtensions class, most work is actually done by the ComparerBuilder<T> class. This generic class defines several public methods to dynamically create Comparison<T> delegates, IComparable<T> and non-generic IComparable implementations, based on a sort expressions with multiple nested properties or single property names.

Improving sorting of collection views in Windows Presentation Foundation (WPF)

As an example of usage without using the extension methods, I will show an alternative way to perform dynamic sorting of collection views in WPF. Dynamic sorting can already be performed using the SortDescriptions property, like this:

C#
List<Person> person;
...
ICollectionView personView = CollectionViewSource.GetDefaultView(persons);
personView.SortDescriptions.Add(new SortDescription("Name",
                             ListSortDirection.Ascending));

Using the Dynamite library, we can do the same thing using the following code:

C#
ListCollectionView personView = 
   (ListCollectionView)CollectionViewSource.GetDefaultView(persons);
             
personView.CustomSort = (System.Collections.IComparer)
   ComparerBuilder<Person>.CreateTypeComparer("Name");

In the latter case, we set the CustomSort property of the ListCollectionView class to a dynamically created type comparer which implements the IComparer interface. Using this option makes it really easy to sort on multiple columns (without messing with several SortDescription instances), and also enables sorting on nested expressions such as Name.Length. Last but not least, using the ComparerBuilder can increase performance compared to using SortDescriptions. In one case, I reduced the time of sorting a list with 100,000 items from 70 seconds to 0.7 seconds, i.e., almost a hundredfold improvement!

Increase performance of enums sorting by avoiding boxing

Something else that I realized while testing sorting performance was that sorting of enums can be increased significantly by simply treating it as an integer (or its underlying numeric type, in general) by casting, as in the following example:

C#
// Sort persons by enum typed property AttractedBy
sortedPersons = new List<Person>(persons.OrderBy(p => p.AttractedBy));

// The following gives the same result but performs better  
sortedPersons = new List<Person>(persons.OrderBy(p => (int)p.AttractedBy));

The reason is likely that enums only implement the non-generic IComparable interface which requires boxing to be used. By casting to an int, which implements the generic IComparable<T> interface, boxing can be avoided. This fact is used in the Dynamite library, but can be used in other cases of sorting, or comparison of enums by value as well.

Performance

The Dynamite library has been developed very much with performance in mind:

  1. Sort expressions are compiled to comparison functions that, once compiled, are as fast as compile-time generated code. This makes each comparison required in the sort algorithm much faster than dynamic sorting strategies based on Reflection and calling Invoke for each comparison to be made, as in ref [1] and [2] (about 40 times slower).
  2. Compiled comparison functions are cached to reduce the cost of generating and compiling the Expression tree on multiple calls with the same sorting key. Otherwise, the time to compile the expression would exceed the time gained compared to pure Reflection for sorting of short lists (5-50 items).
  3. Generic classes and methods are used as much as possible to reduce the need for boxing and unboxing of value types.

Expression versus Emit

Before .NET 3.5, the only way to create code on the fly was to use the classes in the System.Reflection.Emit namespace, which requires low-level knowledge about the low-level MSIL instructions. Ref [4] uses this method to create much optimized sort comparison routines dynamically. In the Dynamite library, I included my own version of this approach in the method ComparerBuilder<T>.CreatePropertyComparisonThroughEmit. Here is a snippet of the code from this method, that creates a comparison for string typed properties:

C#
...
DynamicMethod dynMethod = new DynamicMethod("Compare" + propName, typeof(int),
                                            new Type[] { t, t }, t);
ILGenerator ilgen = dynMethod.GetILGenerator();

if (propertyType == typeof(String))
{
    // Call String.Compare(s1,s2,StringComparison.CurrentCultureIgnoreCase)
    ilgen.Emit(OpCodes.Ldarg_0);
    ilgen.EmitCall(OpCodes.Callvirt, propGetMethod, null);
    ilgen.Emit(OpCodes.Ldarg_1);
    ilgen.EmitCall(OpCodes.Callvirt, propGetMethod, null);
    ilgen.Emit(OpCodes.Ldc_I4_1); // 1 = CurrentCultureIgnoreCase
    ilgen.EmitCall(OpCodes.Call, stringCompareMethod, null);
    if (ascending == false)
    {
        ilgen.Emit(OpCodes.Neg);
    }
    ilgen.Emit(OpCodes.Ret);
}
...
return (Comparison<T>)dynMethod.CreateDelegate(typeof(Comparison<T>));

When comparing performance of these two approaches, I conclude that the actual time to dynamically build and compile a single dynamic comparison method is much less when using Emit, by almost a factor of 10, compared to using Expressions. Using Expressions, on the other hand, is much easier and safer. In addition, thanks to caching, the extended compile time will normally not be an issue. Furthermore, when comparing total sorting times of large lists, I normally did not see any significant improvements by using Emit. If absolute top performance is required, however, Emit might in some cases be used to create highly optimized multi-field comparison routines that is even faster than C# compiler-generated code (see the DynamicComparer in ref [4], but remember that DynamicComparer does not support nested properties).

What you can do with Expressions is also much more limited than what you can do with Emit. As Expressions only support lambda expressions, you cannot use local variables and assignments. In the Dynamite library, I circumvented these limitations by using invocations of subfunctions with arguments to store intermediate results.

References

  1. Generic List Sort Function - Dynamic sorting using extension methods and Reflection.
  2. Generic Multi-Field/Property Sorting for Lists of Business Objects - Another dynamic sorting using "accelerated" Reflection.
  3. Sorting Collections by Multiple Properties - Another dynamic sorting using MergeSort and Reflection.
  4. Dynamic Reflection Library with DynamicComparer that can create extremely fast-running comparers using Emit.
  5. LINQ Dynamic Query Library - Powerful example library from Microsoft that can create advanced dynamic LINQ sort and filtering queries using complex C# or Visual Basic syntax.

History

  • Sep 25, 2008 - Original article published.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)