Introduction
The goal was to improve sorting performance by spawning a limited number of threads so as to use as much of the available processing power on multi-core and/or multi-processor machines as possible. The result is an improvement of 40% on a dual-core machine and more than 100% on a four-processor machine.
Background
As part of a team involved in importing RDBMSs into an Associative Knowledge base, I recently inherited some code that hashed through a large amount of relational data, aggregating it before pushing it onto the server.
This article, though, really has nothing to do with that; I was just looking for an excuse to generate heat. In fact, I was quite embarrassed that the code was consuming only 25% of the available processing power on the quad-core server, so I decided to do something about it.
After two months of cursing, I now have a new-found respect for threaded architecture. There are pitfalls galore, even if you read all the documentation: nuances such as code that runs on one machine and not on another, a lock that just doesn't seem to lock, and locks that make your code run like molasses.
The number one lesson I've learned is not to spawn threads unless there is a clear benefit to doing so. Even cheap threads are expensive when you have thousands of them around.
The number two lesson is to minimize contention or to avoid locking if you can.
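To make the second lesson concrete, here is a tiny, hypothetical sketch (not code from the article) that avoids locking altogether by giving each thread its own slot to write into, combining the partial results only after every thread has finished:

using System;
using System.Threading;

class ContentionDemo
{
    static void Main()
    {
        const int threadCount = 4;
        long[] partials = new long[threadCount];   // one slot per thread: no lock required
        Thread[] threads = new Thread[threadCount];

        for (int t = 0; t < threadCount; t++)
        {
            int id = t;                            // capture the index for the closure
            threads[t] = new Thread(() =>
            {
                for (int i = 0; i < 1000000; i++)
                    partials[id]++;                // each thread touches only its own slot
            });
            threads[t].Start();
        }
        foreach (Thread th in threads)
            th.Join();

        long total = 0;
        foreach (long p in partials)
            total += p;                            // combine once, on a single thread
        Console.WriteLine(total);                  // 4,000,000 -- no lock, no contention
    }
}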
Okay, so applying these lessons to sorting gives the following:
- Split your big unsorted list by the number of available processors.
- Spawn one thread per split that sorts them using your favourite sorting routine.
- Merge pairs of result lists using Mergesort until you have no more results to merge.
(You will find this logic applied in the ParallelSorter method of Jigger.Parallel<T>; a simplified sketch of the idea follows below.)
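For illustration only, here is a self-contained sketch of that split / sort-per-thread / pairwise-merge pipeline using plain threads. It is an assumption-laden approximation, not the Jigger.Parallel<T> implementation from the download:

using System;
using System.Collections.Generic;
using System.Threading;

public static class ParallelSortSketch
{
    public static List<T> Sort<T>(List<T> input, IComparer<T> comparer)
    {
        // 1. Split the unsorted list into one chunk per available processor.
        int parts = Environment.ProcessorCount;
        int chunkSize = (input.Count + parts - 1) / parts;
        List<List<T>> chunks = new List<List<T>>();
        for (int i = 0; i < input.Count; i += chunkSize)
            chunks.Add(input.GetRange(i, Math.Min(chunkSize, input.Count - i)));

        // 2. Spawn one thread per chunk; each thread sorts only its own chunk,
        //    so no locking is needed while sorting.
        List<Thread> threads = new List<Thread>();
        foreach (List<T> chunk in chunks)
        {
            List<T> local = chunk;                      // capture for the closure
            Thread t = new Thread(() => local.Sort(comparer));
            threads.Add(t);
            t.Start();
        }
        foreach (Thread t in threads)
            t.Join();

        // 3. Merge pairs of sorted chunks until a single sorted list remains.
        while (chunks.Count > 1)
        {
            List<List<T>> next = new List<List<T>>();
            for (int i = 0; i + 1 < chunks.Count; i += 2)
                next.Add(Merge(chunks[i], chunks[i + 1], comparer));
            if (chunks.Count % 2 == 1)
                next.Add(chunks[chunks.Count - 1]);     // odd chunk carries over as-is
            chunks = next;
        }
        return chunks.Count == 0 ? new List<T>() : chunks[0];
    }

    private static List<T> Merge<T>(List<T> a, List<T> b, IComparer<T> comparer)
    {
        List<T> result = new List<T>(a.Count + b.Count);
        int i = 0, j = 0;
        while (i < a.Count && j < b.Count)
            result.Add(comparer.Compare(a[i], b[j]) <= 0 ? a[i++] : b[j++]);
        while (i < a.Count) result.Add(a[i++]);
        while (j < b.Count) result.Add(b[j++]);
        return result;
    }
}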
Perfectionist Disclaimer
I do NOT claim that the source code provided has much merit as far as programming style goes. The sorting logic was extracted from a working class and placed in a test and benchmarking solution for the sole purpose of proving that it was worth doing. Memory consumption was taken into consideration but not to the detriment of performance.
Suggestions
Having said that, if you have anything (con/de)structive to say, I'm all ears.
Source and Contributing
The zip contains the source, unit-test, and benchmarking files. I hope you have Visual Studio 2008; otherwise, you will have to do some copy-pasting.
You will also find a set of results from my tests on my dual-core notebook and on a dual-core, dual-processor server. I would be interested to hear from those of you who have other configurations. To add to the list, just install NTime.
Using the Code
You will find many examples in the Parallel.Sort.Test.ParallelTester class. I have extracted what's pertinent and put it below:
// Build a list of key/value pairs (with duplicates) to sort.
List<KeyValuePair<int, string>> stringDupesList =
    new List<KeyValuePair<int, string>>();
ManyStringAdder(ref stringDupesList, 20);

// Create the parallel sorter, supplying a comparer for the element type.
Parallel<KeyValuePair<int, string>> parallelDo =
    new Parallel<KeyValuePair<int, string>>(new KeyValueComparer<int, string>());

// Each call returns a new sorted list; use whichever variant suits your data.
stringDupesList = parallelDo.MergeSort(stringDupesList);
stringDupesList = parallelDo.MSQuickSort(stringDupesList);
stringDupesList = parallelDo.MergeInsertionSort(stringDupesList);
Points of Interest
Building a solution that's worthy of being published as an article requires quite an effort. I had already benchmarked my sort with a console application, but when you need to show your results to others, you have to be more formal and certain. For unit testing, I had already settled on Visual Studio 2008, but for micro-benchmarking I had to do some research.
Micro-benchmarking
After a few days of rummaging around the interweb, wondering whether I should translate my own JavaScript profiler, and checking to see if profiling tools could benchmark to the degree I require, I ended up with three third-party candidates.
- NPerf is a neglected but cool tool that provides graphs. Maybe it is now part of Gallio?
- NTime provides an NUnit-style interface, is feature-rich, and was easy to implement
- Micro-benchmarking by Jon Skeet is C# source that you compile in alongside your own code
I went with NTime because I had little time to figure out the other candidates. Jon Skeet's tool provides a subset of the functionality in NTime, and NPerf was temperamental, to say the least. I still think I should translate my JavaScript profiler because it outputs to XML, ready to graph, but that's for another day.
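If you would rather not take on a dependency at all, a bare-bones harness built on System.Diagnostics.Stopwatch will give you rough numbers. This is a generic, hypothetical sketch, not how NTime works internally:

using System;
using System.Diagnostics;

static class MicroBench
{
    // Times 'action' over 'iterations' runs and prints the average per iteration.
    public static void Run(string name, Action action, int iterations)
    {
        action();                                  // warm-up so the JIT cost is not measured
        GC.Collect();
        GC.WaitForPendingFinalizers();

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        sw.Stop();

        Console.WriteLine("{0}: {1} ms total, {2:F3} ms/iteration",
            name, sw.ElapsedMilliseconds,
            (double)sw.ElapsedMilliseconds / iterations);
    }
}

Something like MicroBench.Run("MergeSort", () => parallelDo.MergeSort(stringDupesList), 10); would then print a rough per-call figure.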
Testing
The magic here is in deciding what to test. I knew that my sort was running in production, but I had never tested some obvious combinations of data types, values, and thread counts because my application never went near them. But if you want other people to run your software, you have to think about more uses for your code and test them. In the end, I spent half a day on the sort class and more than a week writing the tests and benchmarks, in the hope that I'll be able to re-use some of it in the future.
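As an illustration of the kind of edge-case check I mean, here is a hedged MSTest sketch. Parallel<T>, KeyValueComparer, and MergeSort come from the article's code as used in the snippet above; the test data and the assumption that KeyValueComparer can also drive List<T>.Sort as the sequential baseline are mine:

using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class ParallelSortEdgeCaseTests
{
    [TestMethod]
    public void MergeSort_MatchesSequentialSort_WithDuplicates()
    {
        KeyValueComparer<int, string> comparer = new KeyValueComparer<int, string>();
        List<KeyValuePair<int, string>> input = new List<KeyValuePair<int, string>>
        {
            new KeyValuePair<int, string>(3, "c"),
            new KeyValuePair<int, string>(1, "a"),
            new KeyValuePair<int, string>(3, "c"),   // duplicate key and value
            new KeyValuePair<int, string>(2, "b"),
        };

        // Sequential baseline: the framework's own sort with the same comparer.
        List<KeyValuePair<int, string>> expected =
            new List<KeyValuePair<int, string>>(input);
        expected.Sort(comparer);

        // The parallel sort must produce the same ordering.
        Parallel<KeyValuePair<int, string>> parallelDo =
            new Parallel<KeyValuePair<int, string>>(comparer);
        List<KeyValuePair<int, string>> actual =
            parallelDo.MergeSort(new List<KeyValuePair<int, string>>(input));

        CollectionAssert.AreEqual(expected, actual);
    }
}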
Thanks
In no particular order to:
- NTime (by AdamSlosarski) for saving my time
- Nick Taylor for being a sounding board
- Kevin Fleming for inspiring the sound practices that I sometimes use
- My wife for going with the flow
- Ron Everett for not saying no when I said I wanted to write this article
- Everyone else that I know
History
- 2009-03-05: First release of Jigger.Parallel<T>.Sort