Introduction
Input and output are inherently slow compared with other processing. This slowness is due to several factors:
- Delays caused by track and sector seek time on random access devices, such as disks and CD-ROMs.
- Delays caused by a slow data transfer rate between a physical device and system memory.
- Delays in network data transfer using file servers, storage area networks, and so on.
Tools like Process Explorer and the Windows performance counters can measure data transfer either as a rate (bytes per second) or as a raw quantity, giving you a criterion for evaluating I/O. In Windows, when a thread performs synchronous I/O, it waits until the entire I/O operation is complete. Compared with the speed of the CPU, a hard drive is inherently slow, so the waiting thread is wasted time. When the scheduler must execute another thread from a different process, the result is a context switch. This is a heavyweight operation: the processor's page tables must be switched, entries in the translation look-aside buffer are invalidated, and the processor's caches lose much of their value, because they now hold code and data for the thread that was just switched away from. These caches exist largely so that the processor can execute code, including operating system code, without fetching it from main memory; after a context switch they must be refilled before the new thread runs at full speed.

Even so, developers are finding it necessary to make their applications multithreaded. Multithreaded applications can take advantage of multi-core microprocessors and scale better. For instance, when you print a Microsoft Word document, one thread runs the spooler while you continue editing the document; computation overlaps with I/O. An asynchronous compute-bound operation executes on other threads, whereas an asynchronous I/O-bound operation is carried out by a Microsoft device driver on your behalf, and no thread is required while the hardware does the work. This article explains how a thread can continue without waiting for an operation to complete; that is, how threads can perform asynchronous I/O.
Strictly speaking, threads are overhead. Creating a thread is not cheap: a thread kernel object must be allocated and initialized, each thread gets 1 MB of address space reserved (and committed on demand) for its user-mode stack, and another 12 KB to 20 KB (sources differ; MSDN says 12 KB) is allocated for its kernel-mode stack. Then, just after creating a thread, Windows calls a function in every DLL in the process, notifying each DLL that a new thread has been created. Destroying a thread costs as well: every DLL in the process receives notification that the thread is about to die, and the kernel object and the stacks must be freed. Recall that when Windows makes a processor stop executing one thread's code and start executing another thread's code, we call this a context switch. A context switch requires entering kernel mode: the CPU's registers are saved into the currently executing thread's kernel object, the system acquires a spin lock, determines which thread to schedule next, and releases the spin lock. For an accurate accounting of CPU consumption, use Process Explorer and enable the CPU Cycles, CSwitch Delta, and Context Switches columns via the View menu. This shows which threads are consuming CPU so that, where appropriate, they can be right-clicked and suspended. Any operating system's strength is a function of how well the application software cooperates and interfaces with the system software. An accurate accounting of which threads are consuming too much CPU matters because too many threads waste memory resources, and the context switching among them degrades overall system performance.
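To make this overhead concrete, here is a minimal sketch, not taken from any reference, that times creating, starting, and destroying dedicated threads. The loop count and the empty thread body are arbitrary choices for illustration:

using System;
using System.Diagnostics;
using System.Threading;

class ThreadCost {
    static void Main() {
        const int count = 1000;  // arbitrary number of threads for the measurement
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(delegate() { });  // empty work item
            t.Start();
            t.Join();  // each iteration pays the full create/start/destroy cost
        }
        sw.Stop();
        Console.WriteLine("Created and destroyed {0} threads in {1} ms",
            count, sw.ElapsedMilliseconds);
    }
}

Comparing the elapsed time against the thread pool examples later in this article shows why recycling threads pays off.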
The CLR’s Thread Pool
To improve this situation, the CLR contains code to manage its own thread pool: a set of threads that are available for an application's use. There is one thread pool per process, and it is shared by all AppDomains in the process. When the CLR initializes, the thread pool contains no threads. Internally, the CLR maintains a queue of operation requests; a queue is simply a first-in, first-out waiting list. When an application wants to perform an asynchronous operation, it calls some method that appends an entry to the thread pool's queue. The thread pool's code extracts entries from this queue and dispatches each entry to a thread pool thread. Starting with .NET 2.0, the default maximum is 25 worker threads per processor and 1,000 I/O completion threads. A well-designed application should not need anywhere near 25 threads per processor, and 1,000 I/O completion threads is effectively unlimited. If there are no idle threads when the CLR's internal code dispatches an entry, a new thread is created. There is a performance hit for creating this thread, but it is offset later: when the thread pool thread completes its task, the thread is not destroyed; instead, it returns to the pool, where it sits idle waiting to respond to another request. Since the thread is not destroyed, the repeated creation and destruction overhead disappears. CPU manufacturers today ship two technologies, hyper-threading and multi-core, and both allow a single chip to appear as two or more CPUs to Windows and to applications. The CLR's thread pool management code strikes a balance between having too few threads and having too many: if the application has many queued tasks and CPUs are available, the pool creates more threads; if the application's workload decreases, idle thread pool threads kill themselves.
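You can inspect these limits yourself with ThreadPool.GetMaxThreads and ThreadPool.GetAvailableThreads. The numbers printed by this small sketch vary with the CLR version and the number of processors:

using System;
using System.Threading;

class PoolLimits {
    static void Main() {
        int workers, io;
        ThreadPool.GetMaxThreads(out workers, out io);
        Console.WriteLine("Max worker threads: {0}, max I/O threads: {1}", workers, io);
        ThreadPool.GetAvailableThreads(out workers, out io);
        Console.WriteLine("Idle worker threads: {0}, idle I/O threads: {1}", workers, io);
    }
}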
Internally, the CLR categorizes its threads as either worker threads or I/O threads. Worker threads are used when an application asks the thread pool to perform an asynchronous compute-bound operation (which can include initiating an I/O-bound operation). I/O threads are used to notify your code when an asynchronous I/O-bound operation has completed; that is, when you use the APM to make I/O requests such as accessing a file, a network server, a database, a web service, or another hardware device. In short, the thread pool cuts overhead by sharing and recycling threads, allowing multithreading to be applied at a more granular level without a performance hit. The easiest way into the thread pool is to call ThreadPool.QueueUserWorkItem instead of instantiating and starting a Thread object:
using System;
using System.Threading;

class Test {
    static void Main() {
        ThreadPool.QueueUserWorkItem(Go);
        ThreadPool.QueueUserWorkItem(Go, 123);
        Console.ReadLine();
    }

    static void Go(object data) {
        Console.WriteLine("From the thread pool!" + data);
    }
}
Output
From the thread pool!
From the thread pool!123
Using the Thread Pool to Perform an Asynchronous Compute-Bound Operation
A compute-bound operation requires computation. Calculating cells in an Excel spreadsheet and grammar-checking a Word document are examples of compute-bound operations. Ideally, such operations should not perform any synchronous I/O, because every synchronous I/O operation suspends the calling thread while the underlying hardware (disk drive, network interface card, and so on) performs the work, and a suspended thread is a thread that is not running but is still using system resources. To queue an asynchronous compute-bound operation to the thread pool, you typically call one of these methods of the ThreadPool class:
- static Boolean QueueUserWorkItem(WaitCallback callBack);
- static Boolean QueueUserWorkItem(WaitCallback callBack, Object state);
- static Boolean UnsafeQueueUserWorkItem(WaitCallback callBack, Object state);
These methods queue a “work item” to the thread pool’s queue, and then all of these methods return immediately. A work item is a method identified by the callBack parameter that will be called by the thread pool. Consider the following code example:
using System;
using System.Threading;

public static class Program {
    public static void Main() {
        Console.WriteLine("Main thread: queuing an asynchronous operation");
        ThreadPool.QueueUserWorkItem(ComputeBoundOp, 5);
        Console.WriteLine("Main thread: Doing other work here...");
        Thread.Sleep(10000);  // simulating other work (10 seconds)
        Console.ReadLine();
    }

    private static void ComputeBoundOp(Object state) {
        Console.WriteLine("In ComputeBoundOp: state={0}", state);
        Thread.Sleep(1000);   // simulating other work (1 second)
    }
}
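One detail about the example above is worth calling out, and the sketch below (my own, not part of the original sample) demonstrates it: thread pool threads are background threads, and a process dies as soon as all of its foreground threads finish, so without the Thread.Sleep(10000) and Console.ReadLine() calls, Main could return before ComputeBoundOp ever runs.

using System;
using System.Threading;

class BackgroundThreads {
    static void Main() {
        ThreadPool.QueueUserWorkItem(delegate(object state) {
            Console.WriteLine("IsBackground={0}",
                Thread.CurrentThread.IsBackground);  // prints True
        });
        Thread.Sleep(1000);  // remove this and the output may never appear
    }
}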
Understanding Asynchronous Programming
Asynchronous programming basically allows portions of code to execute on separate threads. This is referred to as the Asynchronous Programming Model (APM). To start an asynchronous operation, you call some BeginXxx method. All of these methods queue the desired operation and return an IAsyncResult object identifying the pending operation. To get the result of the operation, you call the corresponding EndXxx method, passing it the IAsyncResult object. For example, the FileStream class of the System.IO namespace has a Read method that reads data from a stream; to support the APM, it also offers BeginRead and EndRead methods. This pattern of paired BeginXxx and EndXxx methods lets you execute methods asynchronously. Consider the code below:
using System;
using System.IO;

class Test {
    static void Main() {
        byte[] buffer = new byte[100];
        string filename = string.Concat(Environment.SystemDirectory, "\\kernel32.dll");
        FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read,
            FileShare.Read, 1024, FileOptions.Asynchronous);
        IAsyncResult result = fs.BeginRead(buffer, 0, buffer.Length, null, null);
        int numBytes = fs.EndRead(result);
        fs.Close();
        Console.WriteLine("Read {0} Bytes", numBytes);
        Console.WriteLine(BitConverter.ToString(buffer));
    }
}
Output
Read 100 Bytes
4D-5A-90-00-03-00-00-00-04-00-00-00-FF-FF-00-00-B8-00-00-00-00-00-00-00-40-00-00
-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-0
0-00-00-00-00-00-00-F0-00-00-00-0E-1F-BA-0E-00-B4-09-CD-21-B8-01-4C-CD-21-54-68-
69-73-20-70-72-6F-67-72-61-6D-20-63-61-6E-6E-6F-74-20-62-65
Rendezvous Models: The Wait Until Done Technique
The APM's value comes from the code you can execute between the calls to BeginRead and EndRead, because that code runs while bytes are being read from the file stream. The APM is used inefficiently when you call BeginXxx and then immediately call the EndXxx method, because the calling thread just goes to sleep waiting for the operation to complete.
using System;
using System.IO;

public static class Program {
    public static void Main() {
        FileStream fs = new FileStream(@"C:\windows\system32\autoexec.NT",
            FileMode.Open, FileAccess.Read, FileShare.Read, 1024,
            FileOptions.Asynchronous);
        Byte[] data = new Byte[100];
        // Wait-Until-Done: EndRead is called right after BeginRead,
        // so the calling thread blocks until the read completes
        IAsyncResult ar = fs.BeginRead(data, 0, data.Length, null, null);
        Int32 bytesRead = fs.EndRead(ar);
        fs.Close();
        Console.WriteLine("Number of bytes read={0}", bytesRead);
        Console.WriteLine(BitConverter.ToString(data, 0, bytesRead));
    }

    private static void ReadMultipleFiles(params String[] pathnames) {
        AsyncStreamRead[] asrs = new AsyncStreamRead[pathnames.Length];
        for (Int32 n = 0; n < pathnames.Length; n++) {
            Stream stream = new FileStream(pathnames[n], FileMode.Open,
                FileAccess.Read, FileShare.Read, 1024,
                FileOptions.Asynchronous);
            // Start an asynchronous read against each stream
            asrs[n] = new AsyncStreamRead(stream, 100);
        }
        // All reads are now in flight; rendezvous with each result
        for (Int32 n = 0; n < asrs.Length; n++) {
            Byte[] bytesRead = asrs[n].EndRead();
            Console.WriteLine("Number of bytes read={0}", bytesRead.Length);
            Console.WriteLine(BitConverter.ToString(bytesRead));
        }
    }

    private sealed class AsyncStreamRead {
        private Stream m_stream;
        private IAsyncResult m_ar;
        private Byte[] m_data;

        public AsyncStreamRead(Stream stream, Int32 numBytes) {
            m_stream = stream;
            m_data = new Byte[numBytes];
            m_ar = stream.BeginRead(m_data, 0, numBytes, null, null);
        }

        public Byte[] EndRead() {
            Int32 numBytesRead = m_stream.EndRead(m_ar);
            m_stream.Close();
            Array.Resize(ref m_data, numBytesRead);
            return m_data;
        }
    }
}
Output
Number of bytes read=100
40-65-63-68-6F-20-6F-66-66-0D-0A-0D-0A-52-45-4D-20-41-55-54-4F-45-58-45-43-2E-42
-41-54-20-69-73-20-6E-6F-74-20-75-73-65-64-20-74-6F-20-69-6E-69-74-69-61-6C-69-7
A-65-20-74-68-65-20-4D-53-2D-44-4F-53-20-65-6E-76-69-72-6F-6E-6D-65-6E-74-2E-0D-
0A-52-45-4D-20-41-55-54-4F-45-58-45-43-2E-4E-54-20-69-73-20
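Incidentally, the ReadMultipleFiles helper in the listing above is never invoked by Main. To exercise it, call it with a few paths; the arguments below are merely illustrative:

ReadMultipleFiles(
    string.Concat(Environment.SystemDirectory, "\\kernel32.dll"),
    string.Concat(Environment.SystemDirectory, "\\user32.dll"));

The design point is that every BeginRead is issued in the first loop before any EndRead is called in the second, so all of the reads proceed concurrently rather than one after another.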
The APM’s Polling Technique and Callback Technique
The polling technique is similar to the Wait-Until-Done technique, except that the code polls the IAsyncResult to see whether the operation has completed. The following code provides an example:
using System;
using System.IO;
using System.Threading;

class Test {
    static void Main() {
        byte[] buffer = new byte[100];
        string filename = string.Concat(Environment.SystemDirectory, "\\kernel32.dll");
        FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read,
            FileShare.Read, 1024, FileOptions.Asynchronous);
        IAsyncResult result = fs.BeginRead(buffer, 0, buffer.Length, null, null);
        // Poll until the asynchronous read reports completion
        while (!result.IsCompleted) {
            Thread.Sleep(100);
        }
        int numBytes = fs.EndRead(result);
        fs.Close();
        Console.WriteLine("Read {0} Bytes", numBytes);
        Console.WriteLine(BitConverter.ToString(buffer));
    }
}
By checking the IsCompleted property on the IAsyncResult object returned by BeginRead, we can continue to do other work as necessary until the operation is complete. The callback model instead requires that we specify a method to call back and include in the callback any state that we need. Consider the following code:
static byte[] buffer = new byte[100];

static void TestCallbackAPM() {
    string filename = string.Concat(Environment.SystemDirectory, "\\kernel32.dll");
    FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read,
        FileShare.Read, 1024, FileOptions.Asynchronous);
    // Pass the FileStream as the state object so the callback can finish the read
    IAsyncResult result = fs.BeginRead(buffer, 0, buffer.Length,
        new AsyncCallback(CompleteRead), fs);
}

static void CompleteRead(IAsyncResult result) {
    // Runs on a thread pool I/O thread when the read has completed
    FileStream fs = (FileStream)result.AsyncState;
    int numBytes = fs.EndRead(result);
    fs.Close();
    Console.WriteLine("Read {0} Bytes", numBytes);
}
In this model, we create a new AsyncCallback delegate, specifying a method to call (on another thread) when the operation is complete. We also specify some object that we might need as the state of the call. For this example, I pass in the stream object because I have to call EndRead and close the stream inside the callback.
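One caveat with the callback technique: TestCallbackAPM returns immediately, and CompleteRead later runs on a thread pool I/O thread, which is a background thread. If Main exits first, the process can terminate before the callback ever fires. A minimal sketch of a caller that avoids this (my own addition, not part of the original sample):

static void Main() {
    TestCallbackAPM();
    Console.WriteLine("Main thread: read in flight; press Enter to exit");
    Console.ReadLine();   // keeps the process alive until the callback has run
}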
References and Suggested Reading
- CLR via C#, by Jeffrey Richter
History
- 28th March, 2009: Initial post