Introduction
As a critical design element, multithreading has received a lot of attention and has been continually expanded in each version of .NET. As the libraries grow, knowing how to best leverage each tool becomes increasingly challenging. My goal is to create an article series explaining some real-world threading considerations and which objects and strategies I have found best to handle them.
Like many programming tools, as a tool expands it often promotes ease of development at the cost of the flexibility and customization hidden behind the abstraction. For the most part it's an optimal trade-off: it typically stops people from shooting themselves in the foot and promotes best practices under the hood. It does mean, though, that you still have to be aware of the older tools if there's something you can't do with the new ones.
So there are reasons to reach back to the .NET 1.1 multithreading tools and everything in between. Regardless of which tools you use, you have to understand the details and what's going on behind the scenes to use them efficiently.
Part 1 is therefore going to focus on foundational threading issues and workings. I'm also going to introduce some basic constructs to get you started with multithreading in your applications. Enjoy!
Using the Code
The code is just a solution of some very simple Console Applications that include the snippets discussed.
Thread Slicing
A CPU can only run one thread per core at a time, yet you can run dozens of applications at once even on a single-core PC. The trick is that (for the most part) they're not technically running at the same time. The CPU allocates time and schedules threads to run, which is known as thread slicing (a.k.a. time slicing or thread scheduling), to give the look and feel of running concurrently. This is also the mechanism that allows multithreading within an application.
Immediately, slicing brings up some major concerns of multithreading:
- There are a limited number of threads that can actually run concurrently.
- Creating threads increases thread management costs.
- Threads are scheduled, so they may or may not run when you expect.
Altogether this should clearly indicate that multithreading needs to be carefully thought through. It also illustrates that if you don't think it through, you can wind up with worse performance (e.g. the cost of thread management is greater than the benefit of the created threads). So let's address these briefly and talk about ways we can maximize the efficiency of multithreading under thread slicing.
NOTE: I'm going for more of the chalkboard physics approach to simplify the explanations. Inevitably there are other threads running on your computer taking up resources your application isn't controlling.
There are two basic types of threading work:
- Crunching a bunch of data.
- Waiting for data.
Just in case it's not clear, by "crunching" data we're talking about a lot of calculations and operations that will maximize CPU usage for the duration of their execution. In this particular case the optimal number of concurrent threads is straightforward: the number of cores on the CPU. As long as each thread is responsible for computing a similar amount of data, the time to complete the total operation should be roughly 1 / n of the single-threaded time, where n is the number of cores.
So let's think about it a minute... If each core is at 100% I can't execute anything else. If each thread drives a core to 100%, then adding more threads yields a benefit closer to 1 / n - C * t, where n is the number of cores, C is the cost of managing a thread, and t is the total number of concurrent threads; every extra thread just adds management cost. This is why you want to limit concurrent execution for this type of work to the number of cores on the CPU.
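As a rough sketch of that rule (using the Parallel class and ParallelOptions covered later in this article), CPU-bound work can be capped at the core count; DoHeavyCalculation here is a hypothetical stand-in for your own number crunching:
using System;
using System.Threading.Tasks;

class CpuBoundExample
{
    static void Main()
    {
        var options = new ParallelOptions
        {
            // Cap concurrency at the number of cores so extra threads
            // don't just add management overhead.
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        Parallel.For(0, 100, options, i =>
        {
            DoHeavyCalculation(i); // keeps a core busy for the duration
        });
    }

    // Hypothetical stand-in for work that maximizes CPU usage.
    static double DoHeavyCalculation(int seed)
    {
        double result = seed;
        for (int j = 1; j < 1000000; j++)
        {
            result = Math.Sqrt(result + j);
        }
        return result;
    }
}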
Threading that is waiting for data gets more complex because the optimizations depend on what you're waiting on. Let's say, for example, you're waiting on data from a service call going over a network. Since you can have multiple concurrent network connections, it can be very valuable to initiate several calls at once; however, network communications will have some sort of bottleneck, so don't go over whatever that threshold is.
Say, for example, you have threads making database calls and the database connections are pooled with a maximum of 100 pooled connections... Don't go over 100 concurrent threads... If the database calls are blocking, you will probably want to severely limit the concurrent requests to avoid excessive blocking that hogs the pooled connections or causes timeouts. Again, it's all about situational limitations and thresholds.
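As a hedged sketch of that throttling (again leaning on the Parallel class covered later in this article), you might cap concurrent database work well below the pool size; GetCustomer and its latency are hypothetical stand-ins for your real data access:
using System;
using System.Linq;
using System.Threading.Tasks;

class DatabaseThrottlingExample
{
    static void Main()
    {
        int[] customerIds = Enumerable.Range(1, 500).ToArray();

        var options = new ParallelOptions
        {
            // Stay well under the 100 pooled connections so a burst of
            // threads can't exhaust the pool or trigger timeouts.
            MaxDegreeOfParallelism = 10
        };

        Parallel.ForEach(customerIds, options, id =>
        {
            string customer = GetCustomer(id); // borrows a pooled connection
            Console.WriteLine("Loaded " + customer);
        });
    }

    // Hypothetical stand-in for a blocking database call.
    static string GetCustomer(int id)
    {
        System.Threading.Thread.Sleep(50); // simulate database latency
        return "Customer " + id;
    }
}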
The last piece of basic thread slicing considerations to discuss is execution order. Without thread control mechanisms thread execution order is essentially random. Here's a small code snippet to demonstrate this using the Task Parallel Library (TPL):
Parallel.For(0, 10, (i) =>
{
    Console.WriteLine("My i value is " + i);
});
Console.WriteLine("Press any key to continue...");
Console.ReadKey();
Again the execution order will be random, but here's an example of the output it produced:
My i value is 9
My i value is 6
My i value is 8
My i value is 7
My i value is 5
My i value is 3
My i value is 2
My i value is 4
My i value is 0
My i value is 1
Press any key to continue...
If we're going to be creating threads where execution order is important, proper controls must be put in place. There are a whole lot of objects and strategies we can use to control threads, but I'll introduce them in future articles.
When the outcome of code depends on thread execution order, you have a race condition. Although it's not a given, the vast majority of race conditions will cause bugs. Bugs caused by race conditions can be some of the hardest to isolate and identify, so extreme care must be taken upfront when creating threads with execution order dependencies.
I know it's a really basic introduction to thread slicing, but hopefully it does highlight the need for thoughtful thread creation and limits. Real-world environments are going to demand a lot of situational adjustments to get better optimizations and proper thread controls.
Critical Sections
There are plenty of ways thread slicing can get complex, but for the majority of people implementing basic multithreading capabilities it's critical sections that are going to introduce the most heartache. Critical sections are sections of code that should not be accessed by multiple threads simultaneously. The key phrase here being "should not" because there's nothing that inherently stops code from accessing critical sections and nothing that even identifies critical sections!
The basic mechanisms of a critical section are blocking and signaling which are both relatively simple. Blocking is stopping a thread from entering a section of code and signaling is allowing it to, or notifying it can, enter a section of code. In other words stop and go...
What's not so simple is that the safer your critical section is (more code blocked, more blocking in general) the worse your code performs. In fact, because blocking reduces execution to one thread, it can take something multithreaded and effectively turn it into a single-threaded operation. That makes reducing the cost of controlling critical sections a very important aspect of multithreading.
NOTE: Although there are numerous objects to help control access to critical sections, I'm only going to introduce the most basic construct, the lock, for the sake of explaining critical section basics.
The lock keyword is normally utilized in a SyncRoot pattern such as below:
private readonly object _syncRoot = new object();

public int Count { get; set; }

public void ThreadingMethod()
{
    int localCount = 0;
    lock (_syncRoot)
    {
        localCount = ++Count;
        Console.WriteLine("Count is now " + Count);
    }
    Console.WriteLine("Completing ThreadingMethod " + localCount + " execution.");
}
STOP!!! Okay, read on, but don't gloss over this... There are intricacies to this simple example that need to be thoroughly covered.
Identify the critical section, not just non-thread-safe operations.
An operation that can be safely executed by multiple threads concurrently is called a thread-safe operation. A thread-safe operation must either be an atomic operation or have thread control mechanisms in place. Any non-thread-safe operation performed on shared resources by multiple threads concurrently must be included in a critical section or you risk threading bugs.
In the first critical section example there's actually only one non-thread-safe operation which is:
localCount = ++Count;
But it's one line of code; how is this not thread-safe? Any single operation in .NET is compiled into MSIL, which can produce a multitude of intermediate operations. Any single operation in MSIL is then converted into native code, which can again produce a multitude of operations. All of these are interruptible by thread slicing.
What that can result in, in this example, is the increment operation getting interrupted mid-execution, leading to Count being overwritten with lower values. Threaded incrementing is the classic example to display this behavior. To demonstrate it, just run the code below:
static int Count = 0;

static void Main(string[] args)
{
    Parallel.For(0, 10000, (i) =>
    {
        ++Count; // non-thread-safe increment on a shared field
    });

    Console.WriteLine("Count is " + Count);
    Console.WriteLine("Press any key to continue...");
    Console.ReadKey();
}
You'll probably get something between 9,000 and 10,000. Here's what I got on one run:
Count is 9243
Press any key to continue...
Okay, great, this is non-thread-safe, but if this is the only non-thread-safe operation, why do I have the first Console.WriteLine in the critical section? Because logically the value output to the Console should be the incremented value the current thread is working with. Without it included, other threads can increment the value again before the Console.WriteLine is called.
Below is an example that properly controls access to Count during the increment, but instead of using the localCount value it accesses Count again in the second Console.WriteLine.
public void ThreadingMethod()
{
    lock (_syncRoot)
    {
        ++Count;
        Console.WriteLine("Count is now " + Count);
    }

    System.Threading.Thread.Sleep(TimeSpan.FromMilliseconds(10));
    Console.WriteLine("Completing ThreadingMethod " + Count + " execution.");
}
In this particular example Count will be incremented as expected (i.e. if I call this 100 times, Count will always be 100); however, you're going to get duplicated and missing values in the second Console.WriteLine, such as:
Completing ThreadingMethod 94 execution.
Completing ThreadingMethod 99 execution.
Completing ThreadingMethod 94 execution.
Completing ThreadingMethod 95 execution.
Completing ThreadingMethod 100 execution.
Completing ThreadingMethod 100 execution.
Completing ThreadingMethod 100 execution.
Completing ThreadingMethod 100 execution.
Completing ThreadingMethod 100 execution.
Lock as little as possible.
In this simplistic example the "work" is writing to the Console, which is hardly a costly operation; however, in real-world situations the work is likely to be very expensive, which is why we wanted to thread it in the first place. If all of my work is put inside the lock, then threading is pointless because it effectively reduces it to single-threaded execution.
Below is an example demonstrating locking more than what is needed:
public void ThreadingMethod()
{
    lock (_syncRoot)
    {
        ++Count;
        Console.WriteLine("Count is now " + Count);
        Console.WriteLine("Completing ThreadingMethod " + Count + " execution.");
    }
}
If you run this example you'll always get a sequential, ascending output from 1 to Count. Basically, that indicates that other threads can't do work at the same time. Even if it wasn't all of my work and there was some concurrent work available, any needless work within the lock reduces the amount of work I can run concurrently, decreasing performance.
One trick to reduce what needs to be locked is to dump shared resources into local variables. In the first example, that's what localCount is for. After my shared resource Count is modified in a thread-safe manner, I can use localCount anywhere I want without the risk of threading issues.
A word of caution here though: this example works in part because int is a value type. If you want to copy shared reference variables to local variables in threaded tasks, you may need to clone the object; otherwise the variable could see unexpected changes from other threads.
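Here's a minimal sketch of that idea: taking a local snapshot of a shared List<int> inside the lock so later reads aren't affected by other threads mutating the shared list (the _sharedValues field and ProcessValues method are hypothetical):
private readonly object _syncRoot = new object();
private readonly List<int> _sharedValues = new List<int>();

public void ProcessValues()
{
    List<int> localSnapshot;
    lock (_syncRoot)
    {
        // new List<int>(...) copies the elements, so later changes to
        // _sharedValues by other threads don't affect the snapshot.
        // For reference element types you may need a deeper clone.
        localSnapshot = new List<int>(_sharedValues);
    }

    // Work with the snapshot outside the lock without threading risk.
    foreach (int value in localSnapshot)
    {
        Console.WriteLine("Processing " + value);
    }
}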
The Parallel Class
The Parallel class is part of the TPL and can be found in the System.Threading.Tasks namespace.
I briefly introduced the TPL in the Thread Slicing section, but I'm going to cover the Parallel class in more detail as the opening thread creation strategy. There are several other very useful ways to create threads and Parallel isn't a cure-all, but it's easy to understand, easy to use, and feature rich.
Very useful scenarios:
- Loading paged data concurrently with Parallel.For.
- Loading/initializing objects in a collection with Parallel.ForEach.
Here's my most common usage:
Parallel.ForEach<TSource>(IEnumerable<TSource> source, ParallelOptions parallelOptions, Action<TSource, ParallelLoopState> body)
This call is going to execute the Action body for each item in source on Task objects managed by the TPL. The ParallelLoopState allows you to cease execution, which is beneficial in handling special conditions and exceptions. ParallelOptions allows some control over Task execution.
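Here's a small usage sketch of that overload; the file names and the work inside the body are hypothetical:
var fileNames = new List<string> { "a.txt", "b.txt", "c.txt" };
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

Parallel.ForEach(fileNames, options, (fileName, loopState) =>
{
    // Hypothetical per-item work.
    Console.WriteLine("Processing " + fileName);

    // loopState.Break()/Stop() can cease execution; more on that below.
});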
Now, a very important thing to understand about a Task is that it is managed, so it's not necessarily a 1:1 relationship with threads. It's possible, for example, that all of the Task objects created will run synchronously. I'll discuss the Task class in more detail in future articles, but for the good and the bad of it, just know that the TPL is taking care of those details for you.
ParallelOptions only has three properties, but usually I'm only using MaxDegreeOfParallelism. You can use it in conjunction with the Environment.ProcessorCount property to limit your concurrent execution to the number of cores.
new ParallelOptions() { MaxDegreeOfParallelism = Environment.ProcessorCount }
Again, your MaxDegreeOfParallelism should be tailored to the specific limitations and thresholds of the code it's going to execute, as discussed in the Thread Slicing section. Whatever you set it to, being able to set it as a parameter and have that controlled out-of-the-box is incredibly convenient.
The ParallelLoopState has two methods to cease execution: Break() and Stop(). The difference between the two is that Stop() will try to cease execution of all iterations, whereas Break() will only cease execution of future iterations.
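Here's a minimal sketch of the difference; the condition at i == 500 is a hypothetical reason to end the loop early:
Parallel.For(0, 1000, (i, loopState) =>
{
    if (loopState.IsStopped)
    {
        return; // another iteration already requested a stop
    }

    if (i == 500)
    {
        // Stop() asks the loop to cease all iterations as soon as possible.
        // Break() would instead only prevent iterations after this one from starting.
        loopState.Stop();
        return;
    }

    Console.WriteLine("Processed " + i);
});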
NOTE: In my experience, stopping or breaking a parallel Task seldom has an immediate response. Think of it like asking a raging lunatic to stop a rant - usually not going to happen until it's over anyway. If you really need a thread to stop immediately, it requires more complex thread controls.
When you're creating threads, including with Parallel, it's very important to handle exceptions. Unhandled exceptions on background threads can do anything from crashing your application to just silently dying. Both of those are really bad.
As a general rule, anything that is multithreaded should be wrapped in a try/catch block; however, I'm not suggesting you swallow all exceptions. Use proper exception handling practices: handle them where you can, throw them where you should, etc.
If you do need to re-throw exceptions inside the Action body, the loop is going to throw an AggregateException containing them. You should, of course, handle that AggregateException properly too.
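Here's a minimal sketch of that handling; items and ProcessItem are hypothetical stand-ins for your collection and work:
try
{
    Parallel.ForEach(items, item =>
    {
        try
        {
            ProcessItem(item); // hypothetical work that may throw
        }
        catch (InvalidOperationException ex)
        {
            // Handle what you can where you can...
            Console.WriteLine("Recovered from: " + ex.Message);
        }
        // ...anything else bubbles up and gets wrapped by the loop.
    });
}
catch (AggregateException ae)
{
    // The loop wraps unhandled exceptions from all iterations.
    foreach (var inner in ae.InnerExceptions)
    {
        Console.WriteLine("Iteration failed: " + inner.Message);
    }
}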
Next Up TBD
This article just scratches the surface, as there are easily 100+ objects to discuss and many different strategies for using them. Consequently, I'm a bit undecided about what the next article is going to cover. Leave me some feedback if there's anything you're particularly interested in hearing about next.
History
- 2015-09-17 Initial article.