Introduction
The very first week in my new group, I was asked if I could improve the performance of a loop in a method that was being called from the business logic façade of an application. The purpose of the loop was to synthesize data back and forth between custom objects fetched from the data layer and similar but incompatible custom objects in the business layer. I know what you're thinking… pretty standard OOP stuff from around 2001, but you probably also know that this stuff requires good code familiarity due to the symbiotic relationship between all of the moving parts. To make things even more difficult, some of the properties in those objects were actually returning collections of other incompatible objects that needed to be synthesized back and forth as well.
Using the Code
I decided that the best course of action would probably be to focus on the loop itself. I knew that the loop iterated through a collection of objects and converted them to a collection of other objects, and I knew that the collection could be potentially large, so I figured that I could get some immediate performance improvement by simply populating the objects in the loop asynchronously versus in-line.
The first thing I needed was a baseline performance count: how long, in milliseconds, the loop alone took to complete without the fetch operation, which could otherwise skew the statistics by as much as 50%. In the old days of .NET 1.1, I had to import the QueryPerformanceCounter function from the Kernel32 library, run start, stop, and clear operations, and calculate the totals myself: http://msdn2.microsoft.com/en-us/library/ms979201.aspx.
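For anyone who never had the pleasure, the old approach looked roughly like this (a hand-rolled sketch, not the exact class from the MSDN article):
using System.Runtime.InteropServices;

// Rough sketch of a hand-rolled high-resolution timer built on the Win32 counter.
public class QueryPerfCounter
{
    [DllImport("Kernel32.dll")]
    private static extern bool QueryPerformanceCounter(out long count);

    [DllImport("Kernel32.dll")]
    private static extern bool QueryPerformanceFrequency(out long frequency);

    private long start;
    private long stop;
    private long frequency;

    public QueryPerfCounter()
    {
        QueryPerformanceFrequency(out frequency); // ticks per second
    }

    public void Start() { QueryPerformanceCounter(out start); }
    public void Stop()  { QueryPerformanceCounter(out stop); }

    public double ElapsedMilliseconds
    {
        get { return ((stop - start) * 1000.0) / frequency; }
    }
}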
Today, however, we can do the same thing with the Stopwatch class in the System.Diagnostics namespace using two method calls: Start and Stop. Don't forget to write the output from the stopwatch using a formatted string, and to call ToString() on the ElapsedMilliseconds property to avoid a boxing operation.
Let's look at a mock-up of the code:
public sealed class SolutionEntityMapper
{
    public static FooDataCollection MapObject(FooEntityCollection fooEntityCollection)
    {
        FooDataCollection fooDataCollection = new FooDataCollection();

#if DEBUG
        System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
        sw.Start();
#endif
        // Map each entity item to a new data item, one at a time.
        foreach (FooEntityItem fooEntityItem in fooEntityCollection)
        {
            FooDataItem fooDataItem = new FooDataItem();
            fooDataItem.SomeProperty = fooEntityItem.SomeProperty;
            fooDataCollection.Add(fooDataItem);
        }
#if DEBUG
        sw.Stop();
        System.Diagnostics.Debug.WriteLine(String.Format(
            "Operation Executed in {0} milliseconds.", sw.ElapsedMilliseconds.ToString()));
#endif
        return fooDataCollection;
    }
}
Looks pretty straightforward. Start the watch, run the loop, make the new object, fill in the properties (remember, some properties return collections, so they have to call other helper functions to populate the collection), and stop the watch. To prepare the collection, I created a unit test that would pre-populate the collection with 1000 Entity objects. The loop ran in just under 6000 milliseconds; for conversion's sake, we'll call it 6 seconds. That's barely tolerable for a small company, but for a large company serving 30,000+ users, it just doesn't scale well at all.
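For reference, the unit test was nothing fancy; a minimal sketch (assuming MSTest attributes, that the entity collection exposes an Add method, and that SomeProperty is a string) looks something like this:
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class SolutionEntityMapperTests
{
    [TestMethod]
    public void MapObject_Converts1000Entities()
    {
        // Arrange: pre-populate the entity collection with 1000 items.
        FooEntityCollection entities = new FooEntityCollection();
        for (int i = 0; i < 1000; i++)
        {
            FooEntityItem item = new FooEntityItem();
            item.SomeProperty = "Value " + i.ToString(); // assumes a string property
            entities.Add(item);
        }

        // Act: run the mapping (the Stopwatch output shows up in the Debug window).
        FooDataCollection results = SolutionEntityMapper.MapObject(entities);

        // Assert: every entity item was mapped to a data item.
        Assert.AreEqual(entities.Count, results.Count);
    }
}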
OK, so let's try it asynchronously. The first thing I wanted to accomplish was to have the loop set a data item in motion, move on to the next, set it in motion, and then wait for everything to finish and return the data. The code to synthesize the data was pretty long, and I tend to favor chunky methods that isolate behavior, so I moved the code to a method named ProcessEntity and focused on the asynchronous operations in the MapObject method. Next, I had to create a delegate for the ProcessEntity method so I could call it asynchronously, and lock the shared method:
public sealed class SolutionEntityMapper
{
    private static readonly object myLock = new object();

    delegate FooDataItem ProcessEntityDelegate(FooEntityItem fooEntityItem);

    private static FooDataItem processEntity(FooEntityItem fooEntityItem)
    {
        lock (myLock)
        {
            FooDataItem fooDataItem = new FooDataItem();
            fooDataItem.SomeProperty = fooEntityItem.SomeProperty;
            return fooDataItem;
        }
    }
}
Now that I've got the methods separated, let's think about how to call the loop asynchronously. We know that if we call BeginInvoke and then call EndInvoke during our loop, we may as well just run the loop inline, because EndInvoke blocks until the thread has completed. The only way to run a loop asynchronously and achieve the type of performance we need is to use a callback method. This presents its own problems, because we also need the callback method to return a filled-in fooDataItem, add the item to a collection, and return the collection back to the calling method.
We could use out parameters, but then how do we get the output into a new collection when the calls return asynchronously? We could declare a member collection and populate that, but keeping track of member collections filled in by asynchronous callback methods is not something I want to leave for future generations of the code. Here is where the beauty of C# anonymous methods comes in. If we create an anonymous delegate for the callback method, we can access the data item in one single method. One consideration when using an anonymous method for the callback is to ensure that we don't return control to the calling method before the collection is filled up. For this, I created a while loop that ensures the collection returned is the same size as the collection processed. I check the Count property on each collection because each time a member is added to the collection, its Count property is updated and the member is stored for later use. This isn't the best choice, but we'll fix it later in the post.
while (fooDataCollection.Count != fooEntityCollection.Count)
{
}
Now let's write the code:
public sealed class SolutionEntityMapper
{
    private static readonly object myLock = new object();

    delegate FooDataItem ProcessEntityDelegate(FooEntityItem fooEntityItem);
    delegate void callBackDelegate(IAsyncResult ar);

    public static FooDataCollection MapObject(FooEntityCollection fooEntityCollection)
    {
        ProcessEntityDelegate processEntityDelegate =
            new ProcessEntityDelegate(SolutionEntityMapper.processEntity);
        FooDataCollection fooDataCollection = new FooDataCollection();

#if DEBUG
        System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
        sw.Start();
#endif
        // Anonymous callback: harvest the result of each asynchronous call
        // and add it to the shared collection under the lock.
        callBackDelegate del = delegate(IAsyncResult ar)
        {
            ProcessEntityDelegate processSolution = ar.AsyncState as ProcessEntityDelegate;
            FooDataItem fooDataItem = processSolution.EndInvoke(ar);
            lock (myLock)
            {
                fooDataCollection.Add(fooDataItem);
            }
        };

        // Kick off one asynchronous call per entity item.
        foreach (FooEntityItem fooEntityItem in fooEntityCollection)
        {
            IAsyncResult result = processEntityDelegate.BeginInvoke(
                fooEntityItem, new AsyncCallback(del), processEntityDelegate);
        }

        // Spin until every callback has added its item (we'll fix this later).
        while (fooDataCollection.Count != fooEntityCollection.Count)
        { }

#if DEBUG
        // Stop the timer.
        sw.Stop();
        System.Diagnostics.Debug.WriteLine(String.Format(
            "Operation Executed in {0} milliseconds.", sw.ElapsedMilliseconds.ToString()));
#endif
        return fooDataCollection;
    }

    private static FooDataItem processEntity(FooEntityItem fooEntityItem)
    {
        lock (myLock)
        {
            FooDataItem fooDataItem = new FooDataItem();
            fooDataItem.SomeProperty = fooEntityItem.SomeProperty;
            return fooDataItem;
        }
    }
}
So what's the output time for processing 1000 records synchronously vs. asynchronously?
- Original loop: 6000 milliseconds or 6 seconds.
- Asynchronous loop: 194 milliseconds or 0.194 seconds.
Not a bad improvement for a little refactoring, but can we do better? Waiting for the asynchronous operations to finish with a while loop is not something we want to do if we can avoid it.
Asynchronous delegate calls were really designed for running single background operations such as I/O, while the Thread Pool was designed for processing multiple background tasks, and it takes care of the thread-management overhead for us. The thread pool will hold a settable maximum number of worker threads up to a predefined limit (the default is 25) and keep them in a suspended state so they are ready for the next operation. For now, we'll use the default; there are more notes on tuning at the end of this posting. For more information, please refer to http://msdn2.microsoft.com/en-us/library/0ka9477y.aspx.
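If you're curious what those numbers actually are on a given machine, a quick check (a throwaway console sketch) looks like this:
using System;
using System.Threading;

class ThreadPoolInfo
{
    static void Main()
    {
        int workerThreads, completionPortThreads;

        // Maximum number of threads the pool will create on demand.
        ThreadPool.GetMaxThreads(out workerThreads, out completionPortThreads);
        Console.WriteLine("Max worker threads: {0}", workerThreads);

        // How many of those are currently free.
        ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads);
        Console.WriteLine("Available worker threads: {0}", workerThreads);
    }
}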
We will need to ensure that control is not returned to the calling method until the thread pool is finished, and we have several options available to us. My choice is to use an AutoResetEvent in combination with a WaitHandle. More information on the topic can be found here: http://msdn2.microsoft.com/en-us/library/system.threading.waithandle.aspx. Ideally, we would replace the while loop with an array of AutoResetEvents, add one AutoResetEvent per thread, and then call WaitHandle.WaitAll, passing in the AutoResetEvent array. To make things more in line with the example above, I will replace the while loop with a single AutoResetEvent and signal the WaitHandle using a syntax similar to the original while loop.
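For reference, a minimal sketch of that "one event per work item" variant might look like the following; workItems and DoWork are hypothetical placeholders, and keep in mind that WaitHandle.WaitAll accepts at most 64 handles, so this shape only suits small batches:
using System.Threading;

public static class WaitAllSketch
{
    public static void RunAll(string[] workItems)
    {
        // One AutoResetEvent per queued work item.
        AutoResetEvent[] done = new AutoResetEvent[workItems.Length];
        for (int i = 0; i < workItems.Length; i++)
        {
            done[i] = new AutoResetEvent(false);
            int index = i; // capture a copy of the loop variable for the anonymous delegate
            ThreadPool.QueueUserWorkItem(delegate(object state)
            {
                DoWork(workItems[index]);   // hypothetical work method
                done[index].Set();          // signal that this item is finished
            });
        }
        WaitHandle.WaitAll(done); // block until every item has signaled
    }

    private static void DoWork(string item) { /* ... */ }
}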
Let's replace the asynchronous callback delegate with a thread pool wait callback delegate and see what happens.
using System.Threading; // for ThreadPool, WaitCallback, AutoResetEvent, and WaitHandle

public sealed class SolutionEntityMapper
{
    private static readonly object myLock = new object();

    delegate FooDataItem ProcessEntityDelegate(FooEntityItem fooEntityItem);
We turn the anonymous method into a thread pool wait callback method, except this time, we pull the FooDataItem out of the thread context. Create an array of auto reset events, and add a single AutoResetEvent to the array, initializing it to false.
    public static FooDataCollection MapObject(FooEntityCollection fooEntityCollection)
    {
        ProcessEntityDelegate processEntityDelegate =
            new ProcessEntityDelegate(SolutionEntityMapper.processEntity);
        FooDataCollection fooDataCollection = new FooDataCollection();

        // A single event, signaled once the last item has been added.
        AutoResetEvent[] threadCompleted = new AutoResetEvent[] { new AutoResetEvent(false) };

#if DEBUG
        System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
        sw.Start();
#endif
        // Thread pool callback: the mapped FooDataItem arrives as the thread context.
        WaitCallback del = delegate(Object threadContext)
        {
            FooDataItem fooDataItem = threadContext as FooDataItem;
            lock (myLock)
            {
                fooDataCollection.Add(fooDataItem);
                if (fooDataCollection.Count == fooEntityCollection.Count)
                    threadCompleted[0].Set();
            }
        };
Now, we can move the Thread Pool code into the loop:
        // Queue one work item per entity; processEntityDelegate(fooEntityItem) is
        // invoked here and the resulting FooDataItem is handed to the pool as state.
        foreach (FooEntityItem fooEntityItem in fooEntityCollection)
        {
            ThreadPool.QueueUserWorkItem(new WaitCallback(del),
                processEntityDelegate(fooEntityItem));
        }

        // Block until the callback signals that the collection is complete.
        WaitHandle.WaitAny(threadCompleted);

#if DEBUG
        // Stop the timer.
        sw.Stop();
        System.Diagnostics.Debug.WriteLine(String.Format(
            "Operation Executed in {0} milliseconds.", sw.ElapsedMilliseconds.ToString()));
#endif
        return fooDataCollection;
    }

    // processEntity is unchanged from the previous listing.
}
So, what's the output time for processing 1000 records synchronously vs. asynchronously vs. Thread Pooling?
- Original loop: 6000 milliseconds or 6 seconds.
- Asynchronous loop: 194 milliseconds or 0.194 seconds.
- Thread Pool: 101 milliseconds or 0.101 seconds.
Points of Interest
Tuning: The number of threads in the pool will control how many tasks you can complete in tandem. A thread pool with a limit of 25 will allow 25 worker threads per processor core. A dual processor or dual-core machine with a thread pool set to 25 max threads will allow 50 thread pool threads. If your application receives 100 simultaneous requests on a dual processor system with a thread pool limit of 25, 50 will be immediately processed and the other 50 will wait in the queue. As the initial 50 are completed, the others will move up in the queue.
Also, remember that the number of users in your system or the number of simultaneous connections is not the same as the number of simultaneous requests.
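If you do decide to adjust the limits in code rather than rely on the defaults, the thread pool exposes setters for both the maximum and minimum thread counts; the values below are arbitrary, just to show the calls:
using System;
using System.Threading;

class ThreadPoolTuning
{
    static void Main()
    {
        // Raise the ceiling on worker and I/O completion threads.
        // SetMaxThreads returns false if the request is rejected
        // (for example, if the value is lower than the current minimum).
        bool maxOk = ThreadPool.SetMaxThreads(50, 50);

        // Keep a handful of threads warm so bursts of work start immediately.
        bool minOk = ThreadPool.SetMinThreads(4, 4);

        Console.WriteLine("Max adjusted: {0}, Min adjusted: {1}", maxOk, minOk);
    }
}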