Hi Folks, I have a very long list of topics to be covered as part of my article series. This time I chose Threading in .NET, since I popped this topic from my list using FIFO. Threading is an evergreen topic and you might have seen many articles on it, but I want to make this one really interesting for my buddies. So let's start our journey. Not all the topics will be covered in this single article; it will be released in chunks, starting from the basics and going up to the .NET 4.5 framework threading features.
Introduction
I have come across many people who know what threading is, but when it comes to implementation they are confused about which threading feature to use to accomplish their task. The judgment always fails when you do not know what the .NET Framework provides, along with its pros and cons. I was also a victim of this confusion when I started using threads in my development. So the intention of this article is to replace the question mark with a full stop. First, let us learn the basics of threading.
Background
Before we dive in, we need to know what a process and a thread are, since the two go hand in hand. A process is an isolation boundary between applications that secures an application and its data; it is also a physical separation of memory and resources. A thread is the entity that performs all the actions inside a process, which means the thread is what gives life to a process. As soon as you click an executable, Windows creates a process for the application and a main thread inside that process to perform all the required initialization and further execution. So we can conclude that every process has at least one thread. The threading concept was introduced to keep Windows responsive, meaning the user can see many applications running in parallel.
Behind the scenes:
We know that each application runs in its own process, and each process has at least one thread (the main thread). Windows switches between applications, and between threads inside a process, to give the user the feeling of parallel execution. This is called context switching. So threads are awesome for keeping Windows responsive at any point in time. But on the negative side, context switching has its own overhead, which can lead to performance degradation. Threads have memory and time overhead associated with them; every thread has one of each of the following.
- Thread Kernel Object: The OS allocates this data structure for each thread created; it contains a set of CPU registers and the thread's context. It consumes about 700 bytes on x86 and 1,240 bytes on x64 machines.
- Thread Environment Block (TEB): This block of memory is allocated and initialized in user mode. It consumes 4 KB on x86 and 8 KB on x64. It stores the thread's local storage data as well as data structures used by GDI and OpenGL graphics.
- User Mode Stack: The user mode stack is used for local variables and arguments passed to methods. It also contains the address of the next statement to execute when the thread returns from a method. By default, Windows allocates 1 MB per thread.
- Kernel Mode Stack: This is used when application code passes arguments to a kernel mode function in the OS. It exists mainly for security reasons: Windows copies any data passed from the user mode stack to the kernel mode stack, then validates and operates on that copy. It consumes 12 KB on x86 and 24 KB on x64 machines.
Looking at all these overheads, we can infer that every thread created carries a significant cost in time and memory. So please create threads only when they are really required. Most of us think that creating threads is an advanced way of programming things, which is not at all true.
What are CLR threads and Windows Threads?
As of today, a CLR thread maps directly to a Windows thread, so we can say CLR threads use Windows threads internally. You should also note that developers should avoid using Windows threads directly: many forums state that Microsoft is enhancing CLR threads by making them lightweight, optimizing their resource usage, and so on. If we bypass the CLR, we lose those performance benefits and might end up with many other problems as well. So it is always advisable to use CLR threads instead of Windows threads. This introduction and heads-up on threads is more than enough to begin with, so let's dive in and start with our age-old thread grandmother.
Thread (System.Threading)
So here we will start with the easiest way of creating a thread: using the Thread class available in the System.Threading namespace. You create a thread by creating an instance of the Thread class and passing the name of a method to its constructor. Of course, it is not mandatory to pass the method name in the constructor; you can also assign the method to the thread object explicitly.
static void Main(string[] args)
{
Thread threadObj = new Thread(Compute);
threadObj.Start();
Console.ReadLine();
}
private static void Compute(object obj)
{
for (int i = 0; i < 1000; i++)
{
Console.WriteLine("Thread executing " + i.ToString());
}
}
This is the typical way of creating a thread when you want to execute a compute-bound operation. However, it is highly recommended to avoid this technique and use the CLR thread pool instead, which we will discuss shortly. Create explicit threads only if one of these conditions must be met:
- If your thread needs to run in priority other than normal.
- If you need the thread to behave as a foreground thread.
- If you want to start a thread and possibly abort it prematurely by calling the Thread's Abort method.
Note: Creating a Thread object is a very lightweight operation. We said a CLR thread maps directly to a Windows thread, but that operating system thread is created only when you invoke the thread's "Start" method, not when you merely construct the Thread object. Since we came across foreground threads, it is better to get a heads-up on what foreground and background threads are before we move ahead.
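The deferred creation of the OS thread can be observed directly through the Thread object's ThreadState property. The sketch below (a minimal console example, not from the original article) also sets a non-default priority, one of the reasons listed above for using an explicit thread:

```csharp
using System;
using System.Threading;

class ThreadStateDemo
{
    static void Main()
    {
        Thread worker = new Thread(() => Console.WriteLine("Running"));

        // No OS thread exists yet; the managed object reports Unstarted.
        Console.WriteLine(worker.ThreadState);        // prints "Unstarted"

        worker.Priority = ThreadPriority.AboveNormal; // one reason for an explicit thread

        worker.Start();                               // the OS thread is created here
        worker.Join();                                // wait for it to finish

        Console.WriteLine(worker.ThreadState);        // prints "Stopped"
    }
}
```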
What is Foreground Thread and Background Thread
Foreground thread: A foreground thread does not allow the application's process to terminate until its job is done. Therefore, use foreground threads to execute the tasks that you really want to complete.
Background thread: These threads are similar to foreground threads but lose their life as soon as you terminate the application; that is, the thread dies as soon as you shut the application down.
Note: When you find that your application is closed but its process is still running, the culprit is a foreground thread. The solution: never make a thread a foreground thread unless you know exactly what you are doing. By default, a thread you create using the Thread class is a foreground thread. To make it a background thread, just set the IsBackground property of the Thread object to true.
You can easily notice the difference when you execute the following snippet. Comment out the line IsBackground = true to see how a foreground thread behaves: as soon as you close the application, launch Task Manager and go to the Processes tab. You will find the application's process still running, because the thread is still iterating through the long for loop. When you restore the line and perform the same steps, you will find the process terminates as soon as the application closes.
static void Main(string[] args)
{
Thread threadObj = new Thread(Compute);
threadObj.IsBackground = true;
threadObj.Start();
Console.ReadLine();
}
private static void Compute(object obj)
{
for (int i = 0; i < 100000; i++)
{
Console.WriteLine("Thread executing " + i.ToString());
}
}
Thread Pool (CLR):
We know that creating and destroying threads is a costly operation in terms of performance, and many threads consume more memory resources and lead to more operating-system context switching. As a solution to this problem, the CLR manages its own thread pool: a set of threads readily available for the application to use. Each CLR has its own thread pool, and all the AppDomains within that CLR share it. A process may have more than one CLR loaded into it, in which case each CLR has a separate thread pool.
When your application wants to perform some action, you add the operation to the thread pool's queue; each operation is dispatched from the queue and a thread pool thread is allocated to perform it. If there are no threads in the pool, a new thread is created to perform the action. Creating and destroying a thread is costly and cannot be entirely avoided, so the main idea behind the thread pool is to create a thread only when needed and then keep it in the pool. The next time a request arrives, the CLR uses the same thread to perform the action, which removes the overhead of creating the thread, and the CLR also avoids the overhead of destroying it. If no requests arrive for a while, a pool thread goes idle and eventually even kills itself, releasing all its memory resources. The thread pool categorizes its threads as:
- Worker threads: These are used when the application asks the thread pool to perform an asynchronous operation, for example work queued with ThreadPool.QueueUserWorkItem.
- I/O threads: These are used to notify your code when asynchronous I/O operations (file system, network, database, and so on) have completed.
Below code demonstrates the usage of Thread Pool to perform some asynchronous operation.
static void Main(string[] args)
{
Console.WriteLine(" Queueing an Operation");
ThreadPool.QueueUserWorkItem(Compute);
Console.ReadLine();
}
private static void Compute(object obj)
{
for (int i = 0; i < 1000; i++)
{
Console.WriteLine("Thread Pool thread executing " + i.ToString());
}
}
Task Parallel Library
Starting with .NET 4.0, the Task Parallel Library (TPL) is considered by Microsoft to be the preferred way to write parallel code. This library makes it much simpler to create work to be performed in parallel than previous threading models did, which makes it much easier to take advantage of the multi-core machines that have become so commonplace in the computing environments in which we work and live. The TPL comes with several items that make parallelization easier:
Tasks
Now we know the advantage of using the CLR thread pool over a plain Thread object, and you have seen how easy it is to use a thread pool thread. But thread pool threads have their own limitations: there is no way for you to know when the operation has completed, and no way to get a return value when it does. This is where our rescuer, Mr. Task, comes into the picture. Creating a task and delegating the action to perform is as simple as queueing a thread pool work item. Where we used ThreadPool.QueueUserWorkItem(operation), here we just create a Task object, passing the method name or an Action delegate to its constructor. The code snippet below shows how to create a task and execute the operation.
static void Main(string[] args)
{
Task<int> taskObj = new Task<int>(o => Compute((int)o),1000);
taskObj.Start();
int res = taskObj.Result;
Console.WriteLine("Task completed and the result is "+ res.ToString());
Console.ReadLine();
}
private static int Compute(object obj)
{
for (int i = 0; i < (int)obj; i++)
{
Console.WriteLine("Thread executing " + i.ToString());
}
return 1000;
}
Task Continuation: We developers should always ensure that threads are used in a smart way. Calling Wait or reading the Result property when the task has not yet finished blocks the calling thread and might lead to the creation of a new thread, which increases resource usage. A much better way is to start another task when the previous task has completed, using the result of the previous task as input for the new one via the ContinueWith method available on Task. ContinueWith returns a new task that executes as soon as the previous task completes. A Task holds a collection of continuations, so you can call ContinueWith several times on the same Task; when the Task completes, all the ContinueWith tasks are queued to the thread pool. You can also specify TaskContinuationOptions to indicate when you want the new task to execute:
- OnlyOnCanceled: This executes only when first task gets cancelled
- OnlyOnFaulted: Executes only when first task throws unhandled exception
- OnlyOnRanToCompletion: Executes only if the first task completes successfully, without being cancelled or throwing an unhandled exception.
static void Main(string[] args)
{
Task<int> taskObj = new Task<int>(o => Compute((int)o), 1000);
taskObj.Start();
Task nextTask = taskObj.ContinueWith(task => Console.WriteLine("Task completed and the result is " + taskObj.Result.ToString()),TaskContinuationOptions.OnlyOnRanToCompletion);
Console.ReadLine();
}
private static int Compute(object obj)
{
for (int i = 0; i < (int)obj; i++)
{
Console.WriteLine("Thread executing " + i.ToString());
}
return 1000;
}
Task Schedulers: The task infrastructure is very powerful and flexible, and task schedulers play a big role in it. The scheduler is the one responsible for executing all the scheduled tasks. There are two task scheduler types:
- Thread Pool Task Scheduler :This scheduler schedules tasks to the thread pool Worker threads.
- Synchronization Context Task Scheduler: This scheduler is mainly used in Windows Forms, WPF, and Silverlight. It schedules all tasks onto the application's GUI thread so that task code can safely update the UI components, and it does not use the thread pool at all. You get hold of it using TaskScheduler.FromCurrentSynchronizationContext(); pass it when creating the task, and the task will execute on the GUI thread.
Note: When you need to perform compute-bound operations, create a normal task, which by default uses a thread pool thread to perform the action, so the UI thread is not blocked. Make sure you do not update UI elements from that task, which would throw an InvalidOperationException. Once the operation is done, continue with another task that takes the current synchronization context, and update the GUI components there.
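The compute-then-update pattern above can be sketched in a plain console program. In a real WinForms/WPF application the UI framework installs its own synchronization context on the GUI thread; here, as an assumption just to make the sample runnable, we install the default SynchronizationContext so FromCurrentSynchronizationContext has something to capture:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class SchedulerDemo
{
    static void Main()
    {
        // WinForms/WPF would normally provide the context; this default one is a
        // stand-in so the call below does not throw in a console app.
        SynchronizationContext.SetSynchronizationContext(new SynchronizationContext());
        TaskScheduler uiScheduler = TaskScheduler.FromCurrentSynchronizationContext();

        // Compute-bound work runs on a thread pool thread and must not touch the UI.
        Task<int> compute = Task<int>.Factory.StartNew(() => 42);

        // The continuation is posted to the captured context; in a GUI app this is
        // where it would be safe to update controls with the result.
        compute.ContinueWith(t => Console.WriteLine("Result: " + t.Result), uiScheduler)
               .Wait();
    }
}
```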
Task Factories: Task factories are used to create a bunch of Task objects that share the same state. To keep you from having to pass the same parameters to each Task's constructor over and over again, you can create a task factory that encapsulates the common state. You can create task factories with or without return types.
static void Main(string[] args)
{
Task mainTask = new Task(()=> {
var tokenSource = new CancellationTokenSource();
var taskFactory = new TaskFactory<int>(tokenSource.Token, TaskCreationOptions.AttachedToParent, TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default);
var childTasks = new Task[]
{
taskFactory.StartNew(()=> Compute(100)),
taskFactory.StartNew(()=> Compute(1000)),
taskFactory.StartNew(()=> Compute(10000))
};
});
mainTask.Start();
Console.ReadLine();
}
private static int Compute(object obj)
{
for (int i = 0; i < (int)obj; i++)
{
Console.WriteLine("Thread executing " + i.ToString());
}
return 1000;
}
Parallel:
This is a static helper class that makes it easy to perform parallel loops and method invocations without creating tasks or threads directly. To simplify programming, this class encapsulates the common scenarios while using Task objects internally. Of course there is some overhead in using this class, but when used well for the right use cases, it provides a flexible way of performing operations concurrently. All of Parallel's methods have the calling thread participate in processing the work, which is good in terms of resource usage.
Note: Not all work benefits from parallelism. If each work item to be performed takes a relatively insignificant amount of time to execute, parallelism can actually degrade performance. When in doubt, stay serial and only parallelize when a need is identified. That said, if you do find a need to parallelize work, the TPL is a great set of tools for the job and VS2010 has a great concurrency profiler to help identify hot spots.
static void Main(string[] args)
{
List<int> countList = new List<int>() {3,4,8,10,6};
Parallel.For(0, 10, i=> Compute(i));
Parallel.ForEach(countList, i=>Compute(i));
Parallel.Invoke(
() => Compute(10),
() => Compute(10),
() => Compute(10));
Console.WriteLine("Parallelism completed");
Console.ReadLine();
}
private static int Compute(object obj)
{
for (int i = 0; i < (int)obj; i++)
{
Console.WriteLine("Thread executing " + i.ToString());
}
return 1000;
}
Performance Monitor is the best tool available to measure the performance of an application; it gives you many options to monitor each processor core in your system. Since we are dealing with parallelism, we need to measure efficiency based on how fully each processor core is utilized. The graph shows the two processors (red and green lines) in my system, captured while the code above was executed using task parallelism. You can see how both processors are utilized to the maximum extent.
This image was captured when the same operation was run using normal threads. You can see how insignificantly the processor cores are used: only one core is used to any great extent.
Thread concepts you should be familiar with
Understanding Thread Pool:
The thread pool is a set of threads readily available to the application. The CLR allows developers to set the maximum number of threads the pool can create. We know that each thread consumes at least 1 MB of memory. Say a 32-bit OS gives a process 2 GB of usable address space; after the Win32 DLLs, CLR DLLs, native heap, and managed heap get loaded, we hardly get 1.5 GB of address space, so a maximum of around 1,300 threads can be created. Above this limit, you get an OutOfMemoryException. Considering all these conditions, the default maximum number of threads in the thread pool is set to 1,000.
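The pool's limits can be queried (and, within reason, adjusted) at runtime. A minimal sketch, with the caveat that the actual numbers vary by CLR version and machine, so the code prints them rather than assuming specific values:

```csharp
using System;
using System.Threading;

class PoolLimits
{
    static void Main()
    {
        int workerMax, ioMax, workerMin, ioMin;

        // The pool reports worker and I/O completion threads separately.
        ThreadPool.GetMaxThreads(out workerMax, out ioMax);
        ThreadPool.GetMinThreads(out workerMin, out ioMin);

        Console.WriteLine("Worker threads: min {0}, max {1}", workerMin, workerMax);
        Console.WriteLine("I/O threads:    min {0}, max {1}", ioMin, ioMax);

        // Raising the minimum keeps that many threads ready without the pool's
        // usual throttled ramp-up; here we only re-apply the current values.
        ThreadPool.SetMinThreads(workerMin, ioMin);
    }
}
```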
Worker threads: The thread pool comprises worker threads and I/O threads. The ThreadPool.QueueUserWorkItem method and the Timer class always queue work items to the global queue. Worker threads pull items from this queue using a FIFO algorithm and process them. Since all worker threads operate on this global queue, they contend on a thread synchronization lock to ensure that two or more threads don't take the same work item from the queue.
When a non-worker thread schedules a Task, the task is added to the global queue. But each worker thread also has its own local queue, and when a worker thread schedules a task, the task is added to the calling thread's local queue. When a worker thread is ready to process an item, it always checks its local queue first; since only that worker thread is allowed to access its local queue, no synchronization lock is required, and these tasks are executed LIFO. If a worker thread's local queue is empty, it tries to steal a task from another worker thread's local queue, which does require a synchronization lock but happens very rarely. If all the local queues are empty, the thread extracts work from the global queue. If the global queue is also empty, the thread goes idle; if it sleeps for a long time, it wakes up and destroys itself, allowing the system to reclaim the resources.
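The queue placement itself is not directly observable from code, but the pattern it rewards is: tasks scheduled from inside another task (i.e., from a worker thread) land on that worker's local queue. A small sketch of that shape:

```csharp
using System;
using System.Threading.Tasks;

class LocalQueueDemo
{
    static void Main()
    {
        // Started from Main (a non-worker thread): goes to the global queue.
        Task<int> outer = Task<int>.Factory.StartNew(() =>
        {
            // Started from a worker thread: these go to its local queue and are
            // dequeued LIFO; idle workers may steal them FIFO from the other end.
            Task<int>[] inner =
            {
                Task<int>.Factory.StartNew(() => 1),
                Task<int>.Factory.StartNew(() => 2),
                Task<int>.Factory.StartNew(() => 3)
            };
            Task.WaitAll(inner);
            return inner[0].Result + inner[1].Result + inner[2].Result;
        });

        Console.WriteLine("Sum: " + outer.Result);
    }
}
```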
Cooperative Cancellation
.NET Framework provides a standard and efficient pattern for cancelling operations. This pattern is cooperative, meaning the operation you wish to cancel has to explicitly support being cancelled. The code snippet below shows how cancellation can be performed using CancellationTokenSource. You can even register a callback to be notified when cancellation takes place.
static void Main(string[] args)
{
CancellationTokenSource cancelObj = new CancellationTokenSource();
cancelObj.Token.Register(() => CancelNotification());
ThreadPool.QueueUserWorkItem(o => Compute(100000, cancelObj.Token));
Console.WriteLine("Press enter to cancel the Operation");
Console.ReadLine();
cancelObj.Cancel();
Console.ReadLine();
}
private static int Compute(object obj, CancellationToken token)
{
for (int i = 0; i < (int)obj; i++)
{
if (token.IsCancellationRequested)
{
break;
}
Console.WriteLine("Thread executing " + i.ToString());
}
return 1000;
}
private static void CancelNotification()
{
Console.WriteLine("The user operation is canceled");
}
AggregateException: AggregateException is just a container for one or more exceptions that may be thrown when using PLINQ or the TPL. Since such exceptions may be thrown on different threads and may also occur concurrently, the system automatically catches and rethrows them inside an AggregateException wrapper to ensure they all get reported in one place. The exceptions themselves are exposed via the InnerExceptions property.
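A short sketch of the wrapper in action: a task throws on a pool thread, and the exception surfaces as an AggregateException when the caller waits on the task:

```csharp
using System;
using System.Threading.Tasks;

class AggregateDemo
{
    static void Main()
    {
        Task failing = Task.Factory.StartNew(() =>
        {
            throw new InvalidOperationException("Something went wrong");
        });

        try
        {
            failing.Wait(); // the wrapped exception is rethrown here
        }
        catch (AggregateException ae)
        {
            // InnerExceptions holds every exception gathered from the task(s).
            foreach (Exception inner in ae.InnerExceptions)
                Console.WriteLine(inner.GetType().Name + ": " + inner.Message);
        }
    }
}
```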
Note: I/O Asynchronous operations(Begin, End) will be covered in another article
Points of Interest
As usual, it was really interesting to write this article, since you get a chance to explore the topic in more depth. There are many things to cover under this topic, and one article is not sufficient to address all the features. I am confident that any developer who reads this article will be able to use threads more efficiently.