Introduction
This is a quick tour of the new language support for multi-threaded programming coming in .NET 4.5. These examples were programmed using the Visual Studio 11
developer preview released back in September.
Multi-Threaded Evolution
Like Java, C# has had threading support since the beginning. In the early days, this meant having a set of synchronization classes in the CLR, along with the lock
statement.
While this was better than having to call a system library, the difficulty of writing threaded code was still too high for most developers and projects.
But multi-threaded programming is increasingly important because computers are evolving toward more parallel cores rather than faster clock speeds.
As a result, Microsoft has continually enhanced threading support in .NET in almost every release. .NET 4 introduced the Task Parallel Library (TPL),
a major conceptual step forward. .NET 4.5 builds on this by integrating tasks directly into the language.
Tasks not Threads
This latest approach to threading is to not think about or deal with threads at all! Instead, you program with tasks, where a task is simply a value (data)
that will be available at a later time (technically, a “future”). How does this help?
The idea is to make parallel code look and act, as much as possible, like sequential code. Consider this function, which calls two other long running functions
before displaying the result:
..
Function();
..
public void Function()
{
    string s1 = GetExpensiveString();
    string s2 = GetAnotherExpensiveString();
    Console.WriteLine(s1 + s2);
}
The problem with this code is that GetAnotherExpensiveString can’t begin until GetExpensiveString completes. Also, the calling code (and the rest of the program) is blocked until both long-running functions return.
For the purposes of this simple example, the 'expensive' methods just mark time:
private static string GetExpensiveString()
{
    // Simulate five seconds of work.
    for (int i = 0; i < 5; i++)
        Thread.Sleep(1000);
    return DateTime.Now.ToLongTimeString();
}
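The second time-consuming method isn’t shown here, but presumably it looks much the same; something along these lines:
private static string GetAnotherExpensiveString()
{
    // Presumed counterpart (not shown in the original); it also just marks time.
    for (int i = 0; i < 5; i++)
        Thread.Sleep(1000);
    return " / " + DateTime.Now.ToLongTimeString();
}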
In .NET 4.5, this code becomes:
..
FunctionAsync();
..
public async Task FunctionAsync()
{
    string s1 = await GetExpensiveStringAsync();
    // control yields; s1 is assigned sometime later
    string s2 = await GetAnotherExpensiveStringAsync();
    // more time passes
    Console.WriteLine(s1 + s2);
}
private static Task<string> GetExpensiveStringAsync()
{
    return Task<string>.Factory.StartNew( () => GetExpensiveString() );
}
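GetAnotherExpensiveStringAsync would presumably wrap its synchronous counterpart in exactly the same way. (.NET 4.5 also adds a Task.Run convenience method that should do the same job as the factory call, if you prefer it.)
private static Task<string> GetAnotherExpensiveStringAsync()
{
    // Presumed wrapper for the second method, mirroring the one above.
    return Task<string>.Factory.StartNew( () => GetAnotherExpensiveString() );
}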
We’ve introduced two keywords, and one naming convention, to indicate the places where control can switch to another thread and then revert back.
Let’s look at the time-consuming methods first. They have been changed to return Task<string> instead of string, and by convention, their names are now suffixed with ‘Async’. So instead of returning the string we want, they return a Task object, a kind of proxy or promise that says “I’ll give you the string later.” Since my long-running functions don't access any shared variables, there are no locking issues, so the asynchronous versions just call the synchronous functions inside a task.
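To make that concrete, a Task<string> is something you can hold on to and inspect before the string actually exists; a small sketch (not part of the example itself):
// A task is a handle to a value that isn't necessarily ready yet.
Task<string> pending = GetExpensiveStringAsync();
Console.WriteLine(pending.IsCompleted);   // almost certainly False at this point
string value = pending.Result;            // Result blocks until the value is available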
But I’m still assigning the return values to string variables. This works because of the await keyword in front of each call. This essentially says “whenever this task completes, come back here, assign its value, and continue on.”
The other change is to the function signature: we’ve decorated the function with the async keyword, changed the return type to Task, and appended the “Async” suffix, again by convention. These changes tell the compiler and other programmers that this function contains asynchronous control flow.
Asynchronous Control Flow
Our function looks pretty similar (that’s the idea) but behaves quite differently. When you called Function, you sat for a while and it didn’t return until the line was written to the console. In FunctionAsync, the call to GetExpensiveStringAsync starts a task on another thread and then immediately returns, and control flow on this thread continues.
Sometime later, the task completes, and the CLR resumes execution back in FunctionAsync. In this case, the process repeats: the CLR starts another background task, and still later returns to finally execute the console write.
Note that unlike with Function, the caller of FunctionAsync continues executing after the quick return. You can now do other work in parallel with FunctionAsync, and sync up with it if needed. Here is the revised calling code:
Task task = FunctionAsync();
task.Wait();
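For instance, the caller might do something useful before synchronizing; DoSomeOtherWork below is just a placeholder:
// Kick off the asynchronous work; FunctionAsync returns almost immediately.
Task task = FunctionAsync();

// Placeholder for whatever else this thread needs to do in the meantime.
DoSomeOtherWork();

// Synchronize: block until FunctionAsync has written its line to the console.
task.Wait();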
So what’s going on to make all of this work?
Under the covers, C# has turned our simple function into a state machine, and created hidden classes that essentially save the call stack on the heap.
Each await expression marks a place where the function can switch threads or “pause”, and then resume later on.
This is all conceptually simple but quite complex in the details. This is especially true when resuming complex flow of control (think nested ifs and loops) and propagating exceptions out. There are similarities to how the yield statement works in iterators, and the feature also leverages the capture of local variables in closures, first introduced with lambda expressions.
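To see the analogy, here is a trivial iterator; each yield return is a point where the method is suspended and later resumed with its locals intact, which is roughly the trick the compiler reuses for await:
// Each yield return suspends this method; the next MoveNext call resumes it
// with its local state (the loop counter) preserved, much like resuming after an await.
static IEnumerable<int> CountSlowly()
{
    for (int i = 0; i < 3; i++)
        yield return i;
}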
Even More Asynchronous
We’ve achieved some parallel execution here, but we can do better. We’ve moved GetExpensiveString to the background so that other processing can go on in parallel. But as currently structured, GetAnotherExpensiveString can’t start executing until after GetExpensiveString finishes.
This final version fixes that problem:
public async Task FunctionAsync()
{
    Task<string> t1 = GetExpensiveStringAsync();
    Task<string> t2 = GetAnotherExpensiveStringAsync();
    await Task.WhenAll(t1, t2);
    string s1 = t1.Result;
    string s2 = t2.Result;
    Console.WriteLine(s1 + s2);
}
In order to get both time-consuming functions to run in parallel, we have to deal more directly with the task objects. Here, we start both tasks and then await Task.WhenAll, which completes only when all of the supplied tasks have finished. The .NET 4.5 Task API adds several new methods like this that make it very convenient to compose and synchronize tasks.
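For example, Task.WhenAny is the counterpart that completes as soon as the first of the supplied tasks finishes. A sketch, using the same two tasks inside an async method:
// React to whichever expensive string is ready first.
Task<string> t1 = GetExpensiveStringAsync();
Task<string> t2 = GetAnotherExpensiveStringAsync();

Task<string> first = await Task.WhenAny(t1, t2);
Console.WriteLine("First one back: " + first.Result);

await Task.WhenAll(t1, t2);   // then wait for the slower one before continuing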
Summing Up
Don’t be fooled: there is no magic here. Asynchronous programming is still difficult and complex. This simple example program ends up running on four different threads.
Its behavior is also much less repeatable, as it’s subject to timing variations from run to run, especially on multi-core processors.
Sequential code execution is a fundamental expectation burned into the brain of every programmer. Asynchronous programming will always be difficult
precisely because it violates this basic assumption. It forces you to think differently about your programs. It took me about a day of messing around (struggling)
with these new features before they started to click into place. In that sense, it is probably similar to learning LINQ or lambda expressions. As always, Microsoft’s tool,
sample, and documentation support is there to help get you up to speed.
It’s worth the effort because the benefits are so significant: in this example, each of the expensive functions takes five seconds to execute, and the final parallel
program takes slightly longer than five seconds total, as compared to ten for the original version. Besides the speed up, the real win in a production program
is the ability to keep the UI live and responsive.
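A quick way to verify those timing numbers yourself (a sketch, not part of the original example) is to wrap each version in a Stopwatch:
var sw = Stopwatch.StartNew();        // requires System.Diagnostics
Function();                           // sequential version: roughly ten seconds
Console.WriteLine(sw.Elapsed);

sw.Restart();
FunctionAsync().Wait();               // final parallel version: a bit over five seconds
Console.WriteLine(sw.Elapsed);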
The new .NET asynchronous programming model is a big step forward. As with other areas, it allows you to think and program at a higher level while it manages the details.
There’s a lot of sophisticated machinery down in the compiler and run-time to pull this off, and that’s as it should be.
History
- October 27, 2011 - Revised to incorporate feedback.
- October 26, 2011 - Original article.