Revision
I have spent quite a bit of time reflecting on the feedback in the comments. This has led me to remove the ThreadSafeList and ThreadSafeDictionary from the library. I strongly encourage everyone to use the ConcurrentBag, ConcurrentDictionary and ConcurrentQueue collections for your thread-safe needs.
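As a rough illustration of those built-in collections, the sketch below fills a ConcurrentBag<int>, a ConcurrentDictionary<int, int> and a ConcurrentQueue<int> from a parallel loop without any explicit lock. The class and method names (ConcurrentCollectionsSketch, Fill) are made up for this example and are not part of SimpleThreading.

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class ConcurrentCollectionsSketch
{
    // Hypothetical helper: adds the numbers 1..count to each collection in
    // parallel and returns how many items each collection ended up holding.
    public static (int Bag, int Dictionary, int Queue) Fill(int count)
    {
        var bag = new ConcurrentBag<int>();
        var dictionary = new ConcurrentDictionary<int, int>();
        var queue = new ConcurrentQueue<int>();

        Parallel.ForEach(Enumerable.Range(1, count), i =>
        {
            bag.Add(i);                   // unordered, thread-safe add
            dictionary.TryAdd(i, i * i);  // returns false if the key already exists
            queue.Enqueue(i);             // thread-safe enqueue
        });

        return (bag.Count, dictionary.Count, queue.Count);
    }
}
```

Each individual call is thread-safe on its own, so all three counts should equal the number of items added.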
Introduction
If you have never done parallel programming before, you are probably intimidated by it. I was, until we reached threading topics in my algorithms class in college. Today, threading is often a necessity for performance. This article discusses threading in general, then presents a small library of essentials to get you started.
Background
Having had a lot of threading experience in college, I was rather shocked by how many professional developers had never seen the concept or did not know what the lock statement is for. I've had to mentor several developers in the appropriate use of threads, collections and critical sections.
In Windows, each CPU core executes one thread at a time, and that thread is a stand-alone context the core operates under. When another thread managed by Windows needs CPU time, the thread scheduler performs a context switch: it takes a complete snapshot of the CPU registers for the running thread and saves it. The incoming thread's saved registers are then restored, and its code picks up where it left off. This cycle happens continuously and frequently.
The obvious question is what to do when two or more threads need to use the same memory or I/O. Suppose thread A and thread B both execute X = X + 1. Thread A expects X to end up as its original value plus 1, but it may be the original value plus 2 if thread B happened to take over before X was reassigned in thread A. If A needs to make a decision based on X + 1, then A will make an incorrect decision, essentially duplicating the decision in thread B. The reverse can also happen: if both threads read X before either writes it back, one of the two increments is lost.
This is a race condition, often summarized as "last in wins", and it is the most common problem in threaded code. To avoid it, we use a lock, which prevents thread B from accessing X while thread A is accessing X.
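using System.Collections.Generic;
using System.Threading.Tasks;

public class Example1
{
    // Shared lock object guarding every access to Item.
    private static readonly object _padLock = new object();
    private int Item { get; set; } = 0;

    // Returns the next value. (x++ would return the old value of x.)
    public int GetResult(int x) { return x + 1; }

    public Example1()
    {
        var list = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

        // The body runs once per element, potentially on several threads.
        Parallel.ForEach(list, item =>
        {
            // Only one thread at a time may read and update Item.
            lock (_padLock)
            {
                Item = GetResult(Item);
            }
        });
    }
}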
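To make the difference visible, here is a minimal sketch that runs the same increment three ways: unsynchronized, under a lock, and with Interlocked.Increment (an atomic alternative that the example above does not use). The class and method names are hypothetical and exist only for this illustration.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class CounterSketch
{
    private static readonly object PadLock = new object();

    // No synchronization: concurrent read-modify-write cycles can overwrite
    // each other, so the result may be lower than iterations.
    public static int Unsynchronized(int iterations)
    {
        int counter = 0;
        Parallel.For(0, iterations, _ => { counter = counter + 1; });
        return counter;
    }

    // lock: only one thread at a time executes the increment.
    public static int WithLock(int iterations)
    {
        int counter = 0;
        Parallel.For(0, iterations, _ =>
        {
            lock (PadLock) { counter = counter + 1; }
        });
        return counter;
    }

    // Interlocked.Increment: an atomic increment, no lock statement needed.
    public static int WithInterlocked(int iterations)
    {
        int counter = 0;
        Parallel.For(0, iterations, _ => { Interlocked.Increment(ref counter); });
        return counter;
    }
}
```

The locked and interlocked versions always return exactly the number of iterations. The unsynchronized version can return less, and how much less varies from run to run.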
Locks are used not only to prevent individual values from being miscalculated, but also to prevent collections such as List<T> and Dictionary<K, V> from having their contents scrambled.
SimpleThreading eliminates the need to remember to manually lock your critical sections by making collections thread-safe.
Note that thread-safe collections only protect against items being updated simultaneously. If there is logic surrounding a critical section of code that concerns the collection, you must still place a lock around the whole critical section!
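A sketch of that situation, using a made-up stock-keeping scenario: the dictionary itself is thread-safe, but "check the stock, then decrement it" is two separate calls, so the pair still needs a lock. All names here (StockSketch, SellAll, the "widget" key) are hypothetical.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class StockSketch
{
    private static readonly object PadLock = new object();

    // Hypothetical scenario: many buyers each try to take one unit from a
    // shared stock. Returns how many units were handed out in total.
    public static int SellAll(int initialStock, int buyers)
    {
        var stock = new ConcurrentDictionary<string, int>();
        stock["widget"] = initialStock;
        int sold = 0;

        Parallel.For(0, buyers, _ =>
        {
            // The read and the write are individually thread-safe calls.
            // The lock turns the check and the update into a single unit.
            lock (PadLock)
            {
                if (stock["widget"] > 0)
                {
                    stock["widget"] = stock["widget"] - 1;
                    sold++;
                }
            }
        });

        return sold;
    }
}
```

Without the lock, two threads could both see one unit left and both sell it. For updates that touch a single key, ConcurrentDictionary.AddOrUpdate may be enough on its own; the lock is for logic that spans several steps.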
Using the Code
SimpleThreading presents three classes:
- ThreadSafeList<T>
- ThreadSafeDictionary<K, V>
- ThreadBlock
Using ThreadSafeList and ThreadSafeDictionary is as simple as using their single-threaded counterparts in the same use cases. The thread-safe collections simply prevent two or more threads from updating a value in the collection (or the collection itself) simultaneously.
ThreadBlock follows the same basic pattern as Microsoft's ActionBlock, but also allows the lambda passed to each task to return a value and gives you a mechanism to retrieve those values after the block is completed. ThreadBlock also allows you to specify a warmup Action to be called on each item.
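var block = new ThreadBlock<string, int>(s =>
{
    // Parse the input; anything that is not a valid integer becomes 0.
    var result = 0;
    return int.TryParse(s, out result) ? result : 0;
});

// Queue the inputs, then lock the list so it cannot change during execution.
block.AddRange(new List<string> { "1", "2", "3", "four", "5", "six", "7", "8", "nine", "10" });
block.LockList();

// Run with at most 5 parallel tasks; the last delegate runs once all have returned.
block.Execute(5, null, tasks =>
{
    Assert.AreEqual(10, block.Results.Count);
    Parallel.ForEach(block.Results, pair =>
    {
        Debug.WriteLine($"{pair.Key} - {pair.Value}");
        $"{pair.Key} - {pair.Value}".ToDebug();
    });
});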
Here's what's happening:
- The ThreadBlock is instantiated with a delegate that takes a single input (you could use a Tuple here to pass in more) and returns a single value (again, you could use a Tuple to pass back more). The anonymous delegate simply attempts to parse the string s and returns it as an int if it's valid, or 0 otherwise.
- The ThreadBlock is populated with the values that will be sent to the anonymous delegate via the s parameter.
- The ThreadBlock is locked. If you fail to do this, you will get an exception when you attempt to Execute the ThreadBlock. Locking the ThreadBlock prevents further changes to the collection of values.
- The ThreadBlock is executed, given a MaxDepthOfParallelism (5 in this case), a null second argument (presumably the optional warmup Action, unused here), and the anonymous delegate used to process the results when all threads have returned.
- The anonymous delegate writes out the values returned by the individual threads executed in the ThreadBlock.
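For comparison, the same flow can be approximated with base class library types only. This is a sketch of the general pattern, not ThreadBlock's actual implementation: ParseSketch and ParseAll are hypothetical names, ParallelOptions.MaxDegreeOfParallelism stands in for MaxDepthOfParallelism, and a ConcurrentDictionary collects the input/result pairs.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParseSketch
{
    // Hypothetical stand-in for the ThreadBlock example: parses every input on
    // at most maxParallelism threads and collects input/result pairs.
    public static ConcurrentDictionary<string, int> ParseAll(
        IEnumerable<string> inputs, int maxParallelism)
    {
        var results = new ConcurrentDictionary<string, int>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };

        Parallel.ForEach(inputs, options, s =>
        {
            // Same rule as the delegate above: invalid numbers become 0.
            results[s] = int.TryParse(s, out var value) ? value : 0;
        });

        // Parallel.ForEach returns only after every item has been processed,
        // so the results are complete at this point.
        return results;
    }
}
```

Calling ParseSketch.ParseAll with the ten strings from the example and a limit of 5 yields ten entries, with "four", "six" and "nine" mapped to 0.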
The MaxDepthOfParallelism is very important. Set it to a number of threads the computer can comfortably handle without becoming CPU bound. If your workload has lots of I/O, you can make this large; if it's all processing, target one or two threads per processing core to prevent excessive context switching.
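One way to turn that advice into a starting value is sketched below. The helper names and the I/O multiplier of 8 are assumptions made for illustration, not recommendations from the library; measure your own workload before settling on a number.

```csharp
using System;

public static class ParallelismSketch
{
    // Hypothetical heuristic following the advice above: two threads per core
    // for CPU-bound work, a larger (arbitrary) multiple for I/O-bound work.
    public static int Suggest(int coreCount, bool ioBound, int ioMultiplier = 8)
    {
        if (coreCount < 1) throw new ArgumentOutOfRangeException(nameof(coreCount));
        return ioBound ? coreCount * ioMultiplier : coreCount * 2;
    }

    // Environment.ProcessorCount reports the number of logical processors.
    public static int SuggestForThisMachine(bool ioBound)
    {
        return Suggest(Environment.ProcessorCount, ioBound);
    }
}
```

The returned value could then be passed as the first argument to Execute.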
Points of Interest
This library is intended for programmers of any level. Even as a seasoned vet at parallel programming, I had to put a LOT of thought into creating this deceptively simple library. The results speak for themselves.
History
- Version 1.0.0.0 - Initial version
- Version 1.0.1.0 - Updated to include the AddOrUpdate and GetOrAdd methods.
- Version 2.0.0.0 - Removed ThreadSafeList and ThreadSafeDictionary. Converted ThreadBlock to use ConcurrentDictionary and ConcurrentBag. Added a warmup Action for each item.
Where to Get It
SimpleThreading is available on NuGet via the Visual Studio package manager (Install-Package GPS.SimpleThreading) or by downloading the package from GitHub.