Introduction
My next installment of the Application Automation Layer requires a thread pool to manage worker threads. On investigating .NET's ThreadPool class, I discovered it is not quite what I had in mind. To quote from Manisha Mehta's article "Multithreading Part 4: The ThreadPool, Timer classes and Asynchronous Programming Discussed":
As you start creating your own multi-threaded applications, you would realize that for a large part of your time, your threads are sitting idle waiting for something to happen...
This is true only for a subset of threads, for example when an I/O completion event occurs and the thread is released. In many cases I require the ability to create threads that perform some work in the background while still allowing the user to interact with the application. Specifically, I'd like the ability to explore what I'll call "competitive threads", that is, adjusting thread priorities based on different factors. Think of it as "quality of service" for threads.
The article also states:
But remember that at any particular point of time there is only one thread pool per process and there is only one working thread per thread pool object...[A thread pool] has a default limit of 25 threads per available processor...
These statements seem contradictory, implying that only one thread can be executing per process yet that 25 threads are available, which is very confusing. After investigating the code behind the ThreadPool, I have found that the statement is "sort of" true, and the investigation led to some other discoveries as well.
The rest of this article discusses my observations. For purposes of understanding some of the numbers discussed, keep in mind that I ran these tests on a 1.6GHz P4 single-processor system. Also note that all of these tests use a timer event set up to time out after one second. The timer event sets a flag which each thread monitors; when the thread sees the flag is set, it terminates itself. The code for the timer event is:
static void OnTimerEvent(object src, ElapsedEventArgs e)
{
    done=true;
}
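The timer setup itself isn't shown in the article, so here is a minimal sketch of how it is presumably wired up. The class name `TimerSetup` is mine, and marking `done` as `volatile` is my addition: it ensures the counting loops reliably see the write made from the timer's callback thread.

```csharp
using System;
using System.Timers;

static class TimerSetup
{
    // volatile: OnTimerEvent runs on a callback thread, while the
    // counting loops poll this flag from other threads
    public static volatile bool done;
    public static readonly Timer timer = new Timer(1000) { AutoReset = false };

    public static void Init()
    {
        timer.Elapsed += OnTimerEvent;
    }

    static void OnTimerEvent(object src, ElapsedEventArgs e)
    {
        done = true;
    }

    static void Main()
    {
        Init();
        done = false;
        timer.Start();
        while (!done) { }   // spin, as the article's tests do
        Console.WriteLine("timer fired");
    }
}
```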
How High Can I Count?
How high can I count in one second? Roughly 10,277,633. Using a System.Timers.Timer object set to trigger after one second, the code happily counts until the timer expires.
static decimal SingleThreadTest()
{
    done=false;
    decimal counter=0;
    timer.Start();
    while (!done)
    {
        ++counter;
    }
    return counter;
}
Why Use Thread Pooling?
Here's the reason to use thread pooling. This time, I'm going to see how high I can count while creating and destroying a thread for every increment of the counter. Here's the code:
static void CreateAndDestroyTest()
{
    done=false;
    timer.Start();
    while (!done)
    {
        Thread counterThread=new Thread(new ThreadStart(Count1Thread));
        counterThread.Start();
        // spin until the thread exits
        while (counterThread.IsAlive) {};
    }
}

static void Count1Thread()
{
    ++count2;
}
And the answer is:
11. Yes, ELEVEN. In one second, my machine was able to create and destroy only eleven threads. Obviously, if I have an application that needs to process lots of asynchronous, non-I/O-completion events, thread creation and destruction is a very expensive way to go. Hence the need for a thread pool.
First, A Benchmark
Before testing the performance of a thread pool, a benchmark is useful. I created one by simply instantiating ten counting threads. Each thread increments its own counter and, at the end of one second, exits. Here's the code:
static void InitThreadPoolCounters()
{
    threadDone=0;
    for (int i=0; i<10; i++)
    {
        threadPoolCounters[i]=0;
    }
}

static void InitThreads()
{
    for (int i=0; i<10; i++)
    {
        threads[i]=new Thread(new ThreadStart(Count2Thread));
        threads[i].Name=i.ToString();
    }
}

static void StartThreads()
{
    done=false;
    timer.Start();
    for (int i=0; i<10; i++)
    {
        threads[i].Start();
    }
}

static void Count2Thread()
{
    int n=Convert.ToInt32(Thread.CurrentThread.Name);
    while (!done)
    {
        ++threadPoolCounters[n];
    }
    Interlocked.Increment(ref threadDone);
}
...and the code that actually puts it all together:
...
InitThreadPoolCounters();
InitThreads();
StartThreads();
while (threadDone != 10) {};
...
The resulting count for each thread is:
T0 = 957393
T1 = 1003875
T2 = 934912
T3 = 1004638
T4 = 988772
T5 = 962442
T6 = 979893
T7 = 777888
T8 = 923105
T9 = 982427
Total = 9515345
Within a reasonable margin of error, the ten separate threads total to the same value as the single application thread counted earlier. This, therefore, is our benchmark for the performance of a thread pool.
Using The ThreadPool
Now, let's see what happens when I use .NET's ThreadPool object:
static void QueueThreadPoolThreads()
{
    done=false;
    timer.Start();
    for (int i=0; i<10; i++)
    {
        ThreadPool.QueueUserWorkItem(new WaitCallback(Count3Thread), i);
    }
}

static void Count3Thread(object state)
{
    int n=(int)state;
    while (!done)
    {
        ++threadPoolCounters[n];
    }
    Interlocked.Increment(ref threadDone);
}
The test, which is supposed to run for only one second, takes something like 30 seconds to complete! And when it does complete, the counts are ridiculously high, indicating that the timer event never fired. To understand this, we have to dive into the sscli\clr\src\vm\win32threadpool.cpp code. Let's look first at ThreadPool.QueueUserWorkItem().
This function puts the worker thread delegate into a queue and tests whether a new thread should be created. If so, it calls the CreateWorkerThread function and exits. Conversely, if a thread is not to be created at this point, a different thread, the "CreateThreadGate" thread, is created if it doesn't already exist. The purpose of the gate thread is to periodically check whether the worker thread can be created at a later time.
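Paraphrased, the enqueue logic reads roughly like this (pseudocode summarizing my reading of win32threadpool.cpp, not the actual source):

```
QueueUserWorkItem(callback, state):
    append (callback, state) to the work request queue
    signal the work-request event so an idle worker can pick it up
    if ShouldGrowWorkerThread():
        CreateWorkerThread()
    else if the gate thread is not running:
        create the gate thread    // it will retry thread creation later
```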
The ShouldGrowWorkerThread function tests three parameters to determine whether or not a new thread should be created.
Note that the very first thing this function tests is whether the number of running threads is less than the number of available CPUs. Obviously, this function will return false when there are one or more running threads on a single-CPU system. When this is the case (as per the flowchart), the gate thread is utilized to create the thread at a later time. I'll get to that shortly.
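In pseudocode, the growth test amounts to something like the following (my paraphrase; the exact conditions checked are in the SSCLI source):

```
ShouldGrowWorkerThread():
    return (number of running worker threads < number of CPUs)
       and (total worker threads < the pool's thread limit)
       and (there is work waiting in the queue)
```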
The CreateWorkerThread function is pretty much just a stub that instantiates the actual worker thread.
The worker thread sits in a wait loop awaiting a WorkRequestNotification event, timing out after 40 seconds if the event isn't signalled. Assuming the event is signalled, execution continues by removing the delegate from the queue (which places the event in the unsignalled state), testing whether a valid delegate was actually obtained, and then invoking the delegate. When the delegate returns, the worker thread immediately checks to see if there are additional requests queued. If there are, it processes those requests, and if not, it returns to the wait state.
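The worker loop just described can be summarized as (pseudocode, not the actual SSCLI source):

```
WorkerThreadStart():
    loop:
        wait on WorkRequestNotification with a 40-second timeout
        if the wait timed out:
            exit the thread
        request = dequeue a work request   // unsignals the event when the queue empties
        if request is valid:
            invoke request.delegate(request.state)
        while more requests are queued:
            dequeue and invoke them
        // queue is empty: return to the wait state
```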
From this code alone, a paramount rule of ThreadPool-managed threads is revealed: Do Your Work Quickly. Any time spent by your worker thread will result in delays when processing other items in the queue.
Now let's inspect the gate thread by looking at the GateThreadStart flowchart:
The first thing of note is that this function sleeps for half a second. After this delay, it tests whether there are any thread requests in the queue. If not, it goes back to sleep. If so, it calls a function that determines whether a thread should be created, delaying creation based on the time the last thread was created and the number of currently running threads. By inspecting this table:
it is interesting to note that it can take 5.4 seconds for the 25th thread to be instantiated. Furthermore, because of the Sleep(500) call, these times end up quantized to 500ms intervals when threads are created in rapid succession. For example, if two thread requests arrive in quick succession, the second thread will take 1000ms to be created: the requisite 550ms will not have elapsed at the first 500ms wakeup, so the function returns to the sleep state until the next one.
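The gate thread's loop, paraphrased (pseudocode, not the actual SSCLI source):

```
GateThreadStart():
    loop:
        Sleep(500)
        if the work request queue is empty:
            continue
        if enough time has passed since the last thread was created
           (a hold-off that grows with the number of running threads):
            CreateWorkerThread()
```

This is where the 500ms quantization comes from: a request that misses one wakeup is not examined again until the next one.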
This part of the ThreadPool's implementation again emphasizes the need to get in and out of your worker thread as quickly as possible, to avoid the bottleneck that occurs when several concurrent threads are running.
Timers And Waitable Objects
Timer callbacks execute on the ThreadPool. Therefore, if you want fairly reliable timers, both your timer callbacks and your worker threads need to be short. Similarly, any waitable object managed by the ThreadPool, such as an I/O completion event, is also affected by the behavior of the other threads in your application.
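This is easy to verify directly: a System.Threading.Timer callback (and, unless a SynchronizingObject is assigned, a System.Timers.Timer Elapsed handler, which is built on top of it) is dispatched on a pool worker thread. A small sketch:

```csharp
using System;
using System.Threading;

class TimerThreadDemo
{
    // Returns true if a System.Threading.Timer callback executes
    // on a ThreadPool worker thread.
    public static bool TimerCallbackUsesPool()
    {
        bool onPool = false;
        using (var fired = new ManualResetEvent(false))
        {
            using (var t = new Timer(_ =>
            {
                onPool = Thread.CurrentThread.IsThreadPoolThread;
                fired.Set();
            }, null, 50, Timeout.Infinite))
            {
                fired.WaitOne();
            }
        }
        return onPool;
    }

    static void Main()
    {
        Console.WriteLine(TimerCallbackUsesPool()
            ? "Timer callback ran on a ThreadPool thread"
            : "Timer callback ran on a non-pool thread");
    }
}
```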
What Are The Alternatives?
As I mentioned at the beginning of this article, not all threads are created for the same purpose. While the ThreadPool is useful for managing threads that are usually in a wait state and that take only a short amount of time to do their work, .NET's ThreadPool class is a very poor choice for managing threads in situations that do not meet these criteria.
Fortunately, Stephen Toub at Microsoft has written a ManagedThreadPool class that is designed for this second type of thread requirement. Using it is identical to using .NET's ThreadPool:
static void QueueManagedThreadPoolThreads()
{
    done=false;
    timer.Start();
    for (int i=0; i<10; i++)
    {
        Toub.Threading.ManagedThreadPool.QueueUserWorkItem(
            new WaitCallback(Count3Thread), i);
    }
}
And as the following test numbers illustrate:
T0 = 970806
T1 = 996123
T2 = 914349
T3 = 990998
T4 = 977450
T5 = 957585
T6 = 951259
T7 = 770934
T8 = 982279
T9 = 1135806
Total = 9647589
it performs very well. The advantage of Toub's ManagedThreadPool class is that all threads are created up front and assigned work as needed. There are no complex hold-offs of thread creation, making this thread pool suitable for threads of both types.
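The core idea (create a fixed set of dedicated threads up front and have them block on a shared work queue) can be sketched in a few dozen lines. This is my own minimal illustration of the pattern, not Toub's actual implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// A minimal fixed-size thread pool: all workers are created in the
// constructor, so there is no thread-creation hold-off at queue time.
class SimplePool
{
    readonly Queue<KeyValuePair<WaitCallback, object>> work =
        new Queue<KeyValuePair<WaitCallback, object>>();
    readonly object sync = new object();

    public SimplePool(int workers)
    {
        for (int i = 0; i < workers; i++)
        {
            var t = new Thread(WorkerLoop);
            t.IsBackground = true;   // don't keep the process alive
            t.Start();
        }
    }

    public void QueueUserWorkItem(WaitCallback callback, object state)
    {
        lock (sync)
        {
            work.Enqueue(new KeyValuePair<WaitCallback, object>(callback, state));
            Monitor.Pulse(sync);     // wake one idle worker
        }
    }

    void WorkerLoop()
    {
        while (true)
        {
            KeyValuePair<WaitCallback, object> item;
            lock (sync)
            {
                while (work.Count == 0)
                    Monitor.Wait(sync);
                item = work.Dequeue();
            }
            item.Key(item.Value);    // invoke the delegate outside the lock
        }
    }

    static void Main()
    {
        var pool = new SimplePool(4);
        using (var allDone = new CountdownEvent(10))
        {
            for (int i = 0; i < 10; i++)
            {
                int n = i;
                pool.QueueUserWorkItem(_ => { Console.WriteLine("item " + n); allDone.Signal(); }, null);
            }
            allDone.Wait();
        }
    }
}
```

A real pool additionally needs shutdown handling and exception isolation in the worker loop; Toub's class handles these details.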
Conclusion
I hope that this article has shed some light on the complexity of thread pools. The attached code includes Stephen Toub's code, unmodified. I leave it to the reader to further inspect his code, which is quite excellent.
And special thanks to CPian leppie for finding Toub's code for me!