
Searching, Sorting, and Multitasking Comparisons

31 May 2004
Visually displays various searching and sorting techniques using different multitasking techniques.

Sample screenshot

Introduction

Searching and sorting collections of data are at the heart of almost all programming projects. Fortunately, the .NET Framework includes a large number of tools that can help you both reduce your development time and write programs that are more stable and perform better. This program demonstrates many different searching and sorting techniques. The code for each technique can easily be borrowed to help speed development of your own applications.

This WinForms program visually compares the performance of nearly a dozen different approaches to searching and sorting data. The program also allows you to compare running these data manipulations using different multitasking techniques and settings. The multitasking settings let you see the degree to which the foreground program remains responsive to user interaction while performing the background data processing.

Background

Almost all programs must gather and manipulate sets of data. There are many very familiar ways to store, search, sort, and iterate through these collections of data. I have found that choosing the most appropriate technique for a particular task greatly simplifies coding while also improving the program's performance. For instance, I recently rewrote a complex production scheduling and optimizing routine that had originally been written in C++ (for speed, don't you know). The original program stored its data in a tangled collection of objects and linked lists which had to be frequently searched and sorted. The program was both slow and so mind-numbingly complex that even minor changes frequently broke it in quite unexpected ways. As a result, none of my client company's developers wanted to touch the darn thing for fear it would explode. After studying the program and the problem, it became clear that a set of Queue and Hashtable collections would be a much more elegant fit for the core scheduling and optimizing problem. The program is now much smaller, far less complex, much more stable, and runs about twenty times faster even though it is no longer coded in C++.

This sample program will allow you to compare the performance and underlying code for many different data manipulation techniques running with varying amounts of data. I have found that this is a very handy workbench when choosing the best tool to use for a coding problem. It is also a good source of code snippets to help speed development.

The program presents a list of search and sorting techniques in a ListView. I have used Alan Anderson's outstanding Glacial List (see: article) in place of the standard ListView. This very nice free control supports many enhanced features which allowed me to provide the user with a richer and more interactive experience. The control makes it very easy for me to embed progress bars to display the current processing status. I have also replaced the standard progress bar with Alan Zhao's nifty Color Progress Bar (see: article) for many of the same reasons.

The user may start a timed search or sort by clicking on the corresponding cell of the ListView. The cell will display a progress bar as the processing proceeds, and then will display the number of seconds needed to complete the operation. Clicking the cells one by one will allow you to compare the amount of time needed for each operation. If one of the multithreading modes is set, then you may click on multiple cells for concurrent processing. Naturally, this will greatly increase the individual searching and sorting times, as they will reflect the total load on the CPU from all the chosen tasks rather than that of just the single selected task. The advancing progress bars do make rather nice eye candy. More importantly, they can help you gauge the responsiveness of the foreground window as the processing load grows.

You may also click on any of the descriptions in the Method column to see the data for that object. For instance, clicking on the "ADO Detached Recordset" cell yields the following popup screen:

Sample screenshot

As you can see, our automatically generated sample data is structured as follows:

Public Structure TestDataStructure
    Dim lUnique As Integer
    Dim sUnique As String
    Dim lNonUnique As Integer
    Dim sNonUnique As String
End Structure

The lUnique field is a 32-bit Integer that is a unique key for the data. The sUnique field is a unique String, while lNonUnique and sNonUnique are a non-unique Integer and String, respectively. While coding for this compound structure is a bit more complicated than would be the case for the more commonly used Key and Item, it is also far more flexible. The data structure could easily be expanded to have any number of fields of any data type. The TestDataStructure is essentially a table of data.
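As a rough sketch of how such a structure might be populated, the loop below fills an array of TestDataStructure with generated sample data. The specific field values and the array size are illustrative assumptions, not the program's actual generation logic:

    ' Hedged sketch: filling an array of TestDataStructure with generated
    ' sample data. The values below are illustrative only.
    Dim data(999) As TestDataStructure
    For i As Integer = 0 To data.Length - 1
        data(i).lUnique = i                        ' unique integer key
        data(i).sUnique = "Key" & i.ToString()     ' unique string key
        data(i).lNonUnique = i Mod 10              ' repeats every 10 rows
        data(i).sNonUnique = "Group" & (i Mod 10)  ' repeating string value
    Next

Because TestDataStructure is a value type, each array element holds its own copy of the fields; no constructor calls are needed.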

You will notice that our searching and sorting methods fall into three categories:

  1. those that rely upon ADO and ADO.NET services;
  2. those that use arrays; and
  3. those that rely upon the .NET Framework System.Collections classes.

The .NET Framework includes a remarkably broad and flexible set of tools in its System.Collections namespace that also tend to be rock solid and of very high performance. These classes and interfaces are so wide ranging that a decent argument could be made that a developer should use them almost exclusively. The other techniques do have some advantages for certain problems. For instance, arrays tend to be the fastest to load, scan, and in some cases to use for retrieving a data item.

The advantages of the ADO.NET facilities may be less obvious, but in my mind are even more compelling for certain types of problems. While ADO.NET is often an order of magnitude slower than some of the System.Collections classes, it is still a very high performance tool. Many developers would be surprised at just how fast ADO.NET is for local data manipulations. As long as we do not need to access data on a server or in some form of file, ADO.NET is a screamer. For instance, on my middle-of-the-road 2 GHz test box, it is able to perform over 100,000 keyed lookups per second and sort over 150,000 records per second. For comparison, the best performing HashTable collection was able to perform about 600,000 keyed lookups per second. Most programs manipulate much smaller working data sets.

ADO.NET is able to crunch the average sized working data table in an imperceptibly small amount of time. In the case of a WinForms application, the processing is performed exclusively on the user's workstation, so the extra processing time cannot slow any other users. In such cases, other factors should often be more important than raw performance gains, particularly when the performance gain would never be noticed by a user. One such consideration is the time it takes the programmer to code and maintain the solution. ADO.NET can speed development by reducing and simplifying the code needed. For instance, retrieving one or more sorted records can be as simple as:

Dim rows As DataRow() = m_dt.Select("sNonUnique = 'Fifth'", "lUnique")

Much more code is needed for this example when working with collections or arrays. The extra code will require more time to write and debug. Having the data already in an ADO.NET structure also simplifies other tasks such as storing the data into a database server or outputting it to an XML file. It is also very easy to create multiple copies or DataViews which can be manipulated independently. It is also fairly easy to layer in data integrity and table relationship features that are alien to the System.Collections world.
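To put that one-liner in context, here is a hedged sketch of building a local ADO.NET DataTable that mirrors TestDataStructure and querying it with Select(). The column names follow the structure's fields; the sample rows are illustrative assumptions:

    ' Hedged sketch: a local DataTable mirroring TestDataStructure.
    Dim dt As New DataTable("TestData")
    dt.Columns.Add("lUnique", GetType(Integer))
    dt.Columns.Add("sUnique", GetType(String))
    dt.Columns.Add("lNonUnique", GetType(Integer))
    dt.Columns.Add("sNonUnique", GetType(String))

    ' Illustrative rows: two records share the non-unique value "Fifth".
    dt.Rows.Add(5, "Key5", 5, "Fifth")
    dt.Rows.Add(15, "Key15", 5, "Fifth")

    ' Filter on the non-unique value and sort the matches by the unique key.
    Dim rows As DataRow() = dt.Select("sNonUnique = 'Fifth'", "lUnique")

Note that Select() takes a filter expression and a sort expression as plain strings, so both the search predicate and the ordering are declared rather than coded by hand.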

You may wonder why I included support for the old ADO Recordsets. I have used detached ADO RecordSets in many Visual Basic 6 projects for local data manipulations. I wanted to see how ADO.NET stacked up against the trusty old ADO. The answer is -- quite well. Detached ADO RecordSets required the developer to manually create indexes in order to achieve decent performance. ADO.NET seems to dynamically create its own indexes. Loading, scanning, sorting, and most searches are much faster with ADO.NET.

One surprise was how slow the VisualBasic.Collection class is for iterating through the data. While keyed searches are fast, simply scanning the collection seemingly takes forever. Sorts are also very slow. I assume that this class exists mostly for backward compatibility with VB6 programs. I recommend avoiding this class whenever possible.

Another surprise was the behavior of the SortedList collection. It performed reasonably well when loading small numbers of records, but the performance degraded exponentially when loading larger numbers. Loading 100,000 records took over 80 seconds, while the ArrayList collection loaded the same data in 0.01 second. The SortedList should only be used for fairly small sets of data.

It was not surprising to see that the HashTable is the fastest way to perform keyed searches. I was surprised that scans are much slower than is the case with other System.Collections classes.
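The asymmetry is easy to see in code. This hedged sketch contrasts a keyed Hashtable lookup (a single hash probe) with a scan, which must enumerate every DictionaryEntry; the key and value shown are illustrative:

    ' Hedged sketch: keyed lookup vs. full scan on a Hashtable.
    Dim ht As New Hashtable()
    ht.Add(42, "Key42")   ' key -> value

    ' Keyed lookup: one hash probe, very fast.
    Dim hit As String = CStr(ht(42))

    ' Scan: enumerate every entry to match on the value -- much slower.
    For Each entry As DictionaryEntry In ht
        If CStr(entry.Value) = "Key42" Then
            Exit For   ' found it the slow way
        End If
    Next

The enumeration also carries per-entry boxing and DictionaryEntry overhead, which helps explain why Hashtable scans trail the other System.Collections classes.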

The following table summarizes some key features of our data manipulation techniques:

Method                            | Duplicates   | Resizing    | Sorting
ADO Detached RecordSet            | User defined | (automatic) | Good
ADO Detached RecordSet (Indexed)  | User defined | (automatic) | Good
ADO.NET DataTable                 | User defined | (automatic) | Fast
ADO.NET DataView                  | User defined | (automatic) | Fast
Array(X) of Structures            | Allowed      | No          | Good
Array(X,3) of Objects             | Allowed      | No          | Slow
Collection                        | Not Allowed  | (automatic) | Slow
ArrayList Collection              | Allowed      | (automatic) | Good
SortedList Collection             | Not Allowed  | (automatic) | (automatic)
HashTable Collection              | Not Allowed  | (automatic) | (automatic)
Queue Collection                  | Allowed      | (automatic) | n/a (FIFO only)

Method                            | Searching | Insert                      | Remove
ADO Detached RecordSet            | Slow      | Flexible                    | Flexible
ADO Detached RecordSet (Indexed)  | Good      | Flexible                    | Flexible
ADO.NET DataTable                 | Good      | Flexible                    | Flexible
ADO.NET DataView                  | Very Good | Flexible                    | Flexible
Array(X) of Structures            | Slow      | No                          | No
Array(X,3) of Objects             | Slow      | No                          | No
Collection                        | Very Fast | Flexible                    | Flexible
ArrayList Collection              | Fast      | Flexible                    | Flexible
SortedList Collection             | Fast      | n/a (sorting always forced) | Flexible
HashTable Collection              | Fastest   | n/a (sorting always forced) | Flexible
Queue Collection                  | Slow      | At the end only             | From top only

Using the code

The primary data class is contained in the TestData class. This class is responsible for storing, populating, and comparing the data. The class implements IComparer routines to support the search and sort routines.

Public Class lUnique_IComparer
    Implements IComparer

    Public Function Compare(ByVal x As Object, ByVal y As Object) As Integer _
        Implements IComparer.Compare
        Return (DirectCast(x, TestData.TestDataStructure).lUnique) - _
               (DirectCast(y, TestData.TestDataStructure).lUnique)
    End Function
End Class

Public Class lNonUnique_IComparer
    Implements IComparer

    Public Function Compare(ByVal x As Object, ByVal y As Object) As Integer _
        Implements IComparer.Compare
        Return (DirectCast(x, TestData.TestDataStructure).lNonUnique) - _
               (DirectCast(y, TestData.TestDataStructure).lNonUnique)
    End Function
End Class

Public Class sUnique_IComparer
    Implements IComparer

    Public Function Compare(ByVal x As Object, ByVal y As Object) As Integer _
        Implements IComparer.Compare
        Return StrComp((DirectCast(x, TestData.TestDataStructure).sUnique), _
                       (DirectCast(y, TestData.TestDataStructure).sUnique), _
                       CompareMethod.Binary)
    End Function
End Class
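These comparers plug directly into the Framework's array sorting and searching routines. The sketch below shows the presumable usage pattern; the array contents are assumed for illustration:

    ' Hedged sketch: using the comparers with Array.Sort and
    ' Array.BinarySearch, as the array-based test classes presumably do.
    Dim data(99) As TestData.TestDataStructure
    ' ... populate data ...

    ' Sort by the unique integer key.
    Array.Sort(data, New lUnique_IComparer())

    ' Once sorted with the same comparer, BinarySearch can locate a record.
    Dim target As TestData.TestDataStructure
    target.lUnique = 42
    Dim index As Integer = Array.BinarySearch(data, target, New lUnique_IComparer())

BinarySearch must be given the same comparer used for the sort; with a mismatched ordering its result is undefined.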

Each data management type is handled by its own class. These classes are responsible for populating, scanning, searching, sorting, and displaying their data. Each of these are aware of the multitasking mode and can support the ability to cancel a looping task, update the screen with progress information, and where necessary, release control to the foreground window via a DoEvents() call. The class exposes the following core functions:

Public Function Create() As String 
Public Function DatasetScan() As String 
Public Function Find_1_Key_Field() As String 
Public Function Find_All_NonKey_Values() As String 
Public Function Find_n_Fields() As String 
Public Function Sort() As String

The Main form contains the dispatching code. Most of this is straightforward. It does contain delegates to support the multithreading operations which may confuse those who are seeing this for the first time. The code can be easily extended to support the searching and sorting method of your choice by adding just a few lines to the Main form and copying and customizing one of the existing searching/sorting classes.
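For readers new to the delegate pattern, here is a hedged sketch of the kind of plumbing the Main form likely uses: a background thread runs the timed operation and marshals progress updates back to the UI thread via Control.Invoke. All names here are illustrative, not the program's actual members:

    ' Hedged sketch of worker-thread progress reporting in a Form.
    Public Delegate Sub ProgressDelegate(ByVal percent As Integer)

    Private Sub StartSortOnThread()
        Dim t As New Threading.Thread(AddressOf SortWorker)
        t.IsBackground = True
        t.Start()
    End Sub

    Private Sub SortWorker()
        ' ... perform the timed sort, periodically reporting progress ...
        Me.Invoke(New ProgressDelegate(AddressOf ShowProgress), New Object() {50})
    End Sub

    Private Sub ShowProgress(ByVal percent As Integer)
        ' Runs on the UI thread; safe to touch controls here.
        ProgressBar1.Value = percent
    End Sub

Because WinForms controls may only be touched from the thread that created them, the Invoke call is what keeps the progress bars updating safely while multiple cells crunch in parallel.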

Credits

My thanks to Alan Anderson and Alan Zhao for their handy controls. I am particularly indebted to Rob Macdonald and his excellent book Serious ADO: Universal Data Access with Visual Basic, from which some of the ideas for this program were drawn.

History

I will be adding more details to the article soon. For now, enjoy.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
