Introduction
Perhaps you have some calculation or other operation that you want to run asynchronously in your C# program, and you
want to use a Task<sup>1</sup>. The expected way to do this is to just provide
the task factory with the code you want to run, frequently in the form of a lambda:
<a id="SRC1"> Task t = Task.Factory.StartNew( () => MyLongAsynchronousCalculation() );</a>
That works well if the lambda is pretty much self-contained, or uses resources from the object that's creating
the asynchronous activity.
But perhaps the asynchronous activity is somewhat complicated and has ongoing state; in other words, it is the kind
of thing you'd like to encapsulate in a class. There are two basic object-oriented ways to approach this: use subclassing,
or use composition.
Each technique has its pros and cons in general, but in the specific case of using a Task, composition is generally preferred.
No less an authority than Jon Skeet<sup>2</sup> says
<a id="SRC2"> I wouldn't personally extend Task<T>, I'd compose it instead.</a><a href="#FN3"><sup>3</sup></a><a id="SRC3"></a>
But…why? One deficiency of nearly all current object-oriented languages is that they don't have first-class support for
delegation; that makes subclassing the preferred approach over composition for many design problems where behavior from
another class is to be used. Also, the .NET Framework developers know proper object-oriented design<sup>4</sup>, and they
could have made the Task class sealed to prevent subclassing, as they did for many other classes in the Framework, but they didn't.
As soon as you try subclassing Task you discover the major problem: Task takes the code it is to run as an
Action delegate parameter to its constructor, and it doesn't provide any setter for that delegate that you can use after
the Task is constructed, even if you're not starting the Task right away. In the constructor you can't make that delegate refer
to the Task subclass instance you're creating.
<a id="SRC4"> class D : Task {
public D() : base(??) { }
public void Run() { … }
}
…
D d = new D();</a>
What do you put where the question marks are? You can't simply refer to D's method Run: it is not
a static member, so you need an object reference, and you don't have one. You can't use the this keyword there: it
isn't allowed by the language. There's no good choice, so you're stuck?
Or are you? That's what this article describes: a way to properly derive from Task so that it runs a method in an
instance of your subclass. The technique is borrowed from lazy functional languages<sup>5</sup>
and is called "Tying The Knot."<sup>6, 7</sup>
Using a value before it is computed: Tying the Knot
In a functional language all values are immutable. If the value has multiple fields they can't be modified.
So how do you construct a data structure that refers to itself? This comes up in (for example) cyclic lists,
or graph structures that aren't DAGs.
Consider, for example, representing rational numbers in the range [0..1) as a linked list of base 10 digits.
The rational \(\frac{1}{8} = 0.125\) is finite, so that's easy. But what about \(\frac{1}{7} = 0.\overline{142857}\),
which is a non-terminating repeating decimal fraction?<sup>9</sup>
In an imperative language it isn't hard since you can just clobber a field after it is created.
var OneEighth = new SinglyLinkedList<int> { 1, 2, 5 };
Console.WriteLine("1/8 = " + string.Join(",", OneEighth.Take(30).Select(e => e.ToString())));
Console: 1/8 = 1,2,5
var OneSeventh = new SinglyLinkedList<int> { 1, 4, 2, 8, 5, 7 };
OneSeventh.Next.Next.Next.Next.Next.Next = OneSeventh;
Console.WriteLine("1/7 = " + string.Join(",", OneSeventh.Take(30).Select(e => e.ToString())));
Console: 1/7 = 1,4,2,8,5,7,1,4,2,8,5,7,1,4,2,8,5,7,1,4,2,8,5,7,1,4,2,8,5,7
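The snippet above relies on a SinglyLinkedList<T> type that is not part of the .NET Framework and is not shown in this article. As a rough sketch only, here is one way such a type could be written so that the collection initializer, the Next chain, and Take all behave as used above. Every member name here (Value, Next, Add, and the RationalDemo helper) is an assumption made for illustration, not the author's actual class.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical node-as-list type: each instance is a node, and the first node is "the list".
public class SinglyLinkedList<T> : IEnumerable<T>
{
    private bool hasValue;                       // false until the first Add on an empty list
    public T Value { get; private set; }
    public SinglyLinkedList<T> Next { get; set; } // settable, so a cycle can be created later

    public SinglyLinkedList() { }
    private SinglyLinkedList(T value) { Value = value; hasValue = true; }

    // Called by the collection initializer, once per element.
    // Only meant to be used before any cycle is created.
    public void Add(T item)
    {
        if (!hasValue) { Value = item; hasValue = true; return; }
        var tail = this;
        while (tail.Next != null) tail = tail.Next;
        tail.Next = new SinglyLinkedList<T>(item);
    }

    // Walks the Next chain; on a cyclic list this never ends, so callers bound it with Take.
    public IEnumerator<T> GetEnumerator()
    {
        for (var node = this; node != null && node.hasValue; node = node.Next)
            yield return node.Value;
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class RationalDemo
{
    // Builds the digits of 1/7 as a cyclic list and returns the first 'count' digits.
    public static string OneSeventhDigits(int count)
    {
        var oneSeventh = new SinglyLinkedList<int> { 1, 4, 2, 8, 5, 7 };
        oneSeventh.Next.Next.Next.Next.Next.Next = oneSeventh;   // clobber the last Next field
        return string.Join(",", oneSeventh.Take(count));
    }
}
```

The point to notice is that Next has a public setter: the cycle is only possible because a field can be overwritten after the node exists, which is exactly what an immutable list forbids.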
In a lazy functional language (like Haskell<sup>10</sup>) you must do it differently since lists
are immutable. But the language has a feature called "letrec", for "let recursive" (where "let" is the binding construct in
the language), that allows you to refer to a variable before it has been computed, as long as you don't actually use its value yet!
<a id="SRC10"> oneEighth = 1 : 2 : 5 : []
oneSeventh = let x = 1 : 4 : 2 : 8 : 5 : 7 : x
in x</a>
Here, the name x refers to the list under construction and is also used to construct the tail of the list. It works because
x refers to a memory location that isn't going to be referenced until some code uses the variable
oneSeventh and traverses past the 6th element of the list. (Note the difference between a value
and a variable: the variable is the location that can hold a value.)
That is tying the knot!
Tying the Knot in C#: closures and variable capture
Given that C# is not a lazy evaluation language<sup>12</sup>, how is tying the knot to be implemented?
We need both delayed evaluation and the ability to bind a value after a data structure has been built. Two related language
mechanisms will work together. First, to provide delayed evaluation, we'll introduce an extra level of
indirection<sup>13</sup> by using a delegate (a pointer-to-method); typically, in C#, this will
be written as a lambda expression. Second, to provide ex post facto binding we'll use the excellent C# implementation
of variable capture (C# almost has true closures<sup>14</sup>) to bind a value after
a data structure has been built.
The combination works like this:
- Create a variable to provide a binding location for some value, but don't provide the value.
- Create a lambda expression that closes over that variable, and returns the value in the variable. Since creating the lambda
doesn't dereference the variable to get the value, it is fine.
- Create the data structure, passing in the lambda to the creation routine, which will save it somewhere but not invoke it yet.
- Store the data structure into the closed-over variable.
- Evaluate the lambda, which dereferences the variable to get the data structure, and does something with it (like store it in some
field internal to itself).
To make this concrete, suppose we have a class T whose constructor takes an Action, called A,
which it stores away in a readonly field, so that there is no way to (re-)set A after the instance of T is
constructed. It also has a method T.M, which is called sometime later, after construction is finished, and invokes
the Action A.
(And further suppose that we can't change T to provide a setter for A or anything else to fix up this situation.)
Now suppose that we subclass T with a class D and we want the Action A to run a
method on itself, an instance of D. Normally our Action A would look something like this:
class D : T {
public D(Action a) : base(a) { }
…
Action f = () => this.Foo();
…
}
But that won't work because we don't have this yet; in fact we can't get a reference to our new instance
of D until after its base constructor, a member of T, runs and the D constructor returns.
The way to solve this is to use variable capture in creating the lambda expression:
…
D d = null;
Action g = () => d.Foo();
d = new D(g);
d.M();
…
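To see the whole pattern compile and run, here is a minimal self-contained sketch of the T and D classes described above. The Log field, the body of Foo, and the KnotDemo helper are assumptions added purely so that the effect can be observed; only the shape (a readonly Action in T, invoked later by M) comes from the description.

```csharp
using System;

// Stand-in for a class we cannot modify: the Action is fixed at construction time.
public class T
{
    private readonly Action a;
    public T(Action a) { this.a = a; }
    public void M() { a(); }          // invoked some time after construction
}

public class D : T
{
    public string Log = "";           // hypothetical: records that Foo ran
    public D(Action a) : base(a) { }
    public void Foo() { Log += "Foo ran"; }
}

public static class KnotDemo
{
    public static D Build()
    {
        D d = null;                   // a binding location with no value yet
        Action g = () => d.Foo();     // the lambda captures the variable d, not its current value
        d = new D(g);                 // construct, then store the instance into the captured variable
        return d;                     // calling d.M() now runs Foo on d itself
    }
}
```

Creating g while d is still null is harmless because nothing dereferences d until M is called. Calling g before the assignment to d would throw a NullReferenceException, which is the C# counterpart of using a lazily bound value too early.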
The real trick is going to be to get an API that can accept an Action or Func instead of the
object (an instance of some specific type) it is expecting. But fortunately, in our problem of deriving from
Task, that problem is solved because Task takes two parameters: an Action and
an arbitrary object, and we'll pass our delaying lambda in as the arbitrary object!
Back to the problem: Subclassing the Task class
Back to the Task that's left to us: how do we subclass Task?
As previously described, Task takes an Action which is the code to be run when the
Task is started.<sup>15</sup> We want it to run a method on our subclass.
There are only two problems left to be solved: where does the Action come from, and how does it
get communicated to the Task?
To answer the first question: the Action, as well as the derived instance, will be created in a factory method. The
constructor of our derived class will have protected access so a developer can't create one directly.
To answer the second question: it would be easy enough to have our derived class have a constructor that, in addition to its
other arguments for initializing itself, took the Action and immediately passed it into its base class:
<a id="SRC15"> protected D(Action a) : base(a) { … }</a>
And then this article would be over. But I prefer to not have to write this code more than once. I would like to provide a generic
abstract class to do all the work. It will derive from Task and all my various subclasses will derive from it.
As soon as I do this, however, I run into a problem: my factory method will be generic in my most-derived subclass. That's so that,
as a factory, it can return an instance of that subclass (instead of a superclass, its own type, that would need to be
cast to the actual subclass). But if it is generic in the type that it is creating (and returning) then it can only create instances
of types that have a zero-argument constructor (due to the way the generic constraint new() works). And with that
zero-argument constructor, how can the Action be passed in?
The answer is somewhat unsatisfactory: it will be passed in via a static field that the constructor can reference. And that's our
last problem to solve: ensure that the factory method is serialized so that it is safe to set the static field and then immediately
create a new object that refers to that field in its constructor. That way, if the factory method is called on two different threads
simultaneously, there are no race conditions that would lead to one of the new instances getting the Action for the
other instance.
Really, this is a complication that is only necessary if you want to have a reusable abstract base class to own all the code
that handles passing the Action in to the Task.
Anyway, the code for the abstract generic class DeriveFromTaskBase is in the zip archive
associated with this article, so I'll only comment on highlights here.
The public API of DeriveFromTaskBase
The public API of DeriveFromTaskBase consists of the factory method Create that creates an instance
and starts it, and an abstract method Run that must be overridden to provide the subclass-specific computation that
is the entire purpose for subclassing Task.
The Create factory method takes an optional Action<T>, called beforeStartInitializer, which
is run as the first step of the started task, just before Run is called. Its purpose is to provide a chance to initialize the instance and make up for the fact
that there is only a zero-argument constructor. The Action<T> you provide will be given the instance itself and
can set properties or run methods on that instance. (Remember that when you create the Action with a lambda expression
you can capture any values you need at that time.) If you also, or alternatively, have things you can do at construction time (that,
necessarily, can't rely on any outside inputs), you can (optionally) override the method Constructor and do that
initialization.
public abstract class DeriveFromTaskBase : Task
{
#region Public interface
public static T Create<T>(Action<T> beforeStartInitializer = null)
where T : DeriveFromTaskBase, new()
{
…
}
public virtual void Constructor() { }
public abstract void Run();
#endregion
The construction of the derived instance
The constructor is fairly simple: referring to a static field that holds the indirection Action, it simply
passes that Action to the base Task and then calls the (optional) Constructor method.
private static Action thisDeferred;
protected DeriveFromTaskBase() : base(thisDeferred)
{
Constructor();
}
The factory which "ties the knot" and creates the derived instance
Interestingly, there are two knots to be tied!
The factory method grabs a lock to ensure serialization. Then it provides a location for the
to-be-created instance and provides a location for the true Action. It ties the first knot by creating the
indirection-Action that invokes the true Action by capturing its location.
It then performs new T() to finally create the derived instance you're really looking
for. And it ties the second knot by creating the true Action that captures the location of the
new instance. After all that it starts the instance, and the Task calls its start Action,
which calls the true Action, which calls the instance's Run method, and at last the
Task is going!
private static readonly object createLock = new object();
private static Action thisDeferred;
public static T Create<T>(Action<T> beforeStartInitializer = null)
    where T : DeriveFromTaskBase, new()
{
    T t = null;
    lock (createLock)
    {
        Action thisDeferredInner = null;
        thisDeferred = () => thisDeferredInner();
        t = new T();
        thisDeferredInner = () =>
        {
            if (null != beforeStartInitializer)
                beforeStartInitializer(t);
            t.Run();
        };
    }
    t.Start();
    return t;
}
(By now you shouldn't need comments to understand the above code…but no worries: There are comments in the sources that
are in the zip.)
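For readers who want to try the pieces together without the zip, here is a compact sketch that assembles the fragments shown above into one compilable unit and adds a small subclass. The subclass SumTask and its UpTo and Sum members are hypothetical and exist only to demonstrate usage; the base class follows the fragments in this article, but it is a reconstruction and may differ from the archived source.

```csharp
using System;
using System.Threading.Tasks;

public abstract class DeriveFromTaskBase : Task
{
    private static readonly object createLock = new object();
    private static Action thisDeferred;   // handed to the constructor via this static field

    protected DeriveFromTaskBase() : base(thisDeferred) { Constructor(); }

    public virtual void Constructor() { }
    public abstract void Run();

    public static T Create<T>(Action<T> beforeStartInitializer = null)
        where T : DeriveFromTaskBase, new()
    {
        T t = null;
        lock (createLock)                  // serialize use of the static field
        {
            Action inner = null;
            thisDeferred = () => inner();  // first knot: captures the variable 'inner'
            t = new T();
            inner = () =>                  // second knot: captures the variable 't'
            {
                if (beforeStartInitializer != null) beforeStartInitializer(t);
                t.Run();
            };
        }
        t.Start();
        return t;
    }
}

// Hypothetical subclass: sums 1..UpTo and keeps the result as instance state.
public class SumTask : DeriveFromTaskBase
{
    public int UpTo { get; set; }
    public long Sum { get; private set; }

    public override void Run()
    {
        for (int i = 1; i <= UpTo; i++) Sum += i;
    }
}
```

A caller would write something like `var s = DeriveFromTaskBase.Create<SumTask>(x => x.UpTo = 100); s.Wait();` and then read s.Sum. One caveat with this sketch: the new() constraint requires the subclass to expose a public parameterless constructor, so nothing stops a caller from writing `new SumTask()` directly. That bypasses the factory and picks up whatever happens to be in the static field, so instances should only be created through Create.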
Article Summary
Is it worth it? Well, that depends. For the particular case in hand, utilizing a Task to run complicated code
that needs ongoing state, if you were starting from scratch then it would be easiest to write a class that, instead of deriving
from Task, simply owns a Task instance. That is, use composition.
But, YNK.<sup>16</sup> Now that this class is written for you (and explained!) you may find
yourself needing actual Tasks doing complicated operations for one reason or another, and keeping track of the
relationship between calculating instance and Task instance may be annoying.
Then, also, in the general case (moving away from Tasks), knowing how to tie the knot as a technique in
general, and the particular way you specifically do it in C#, with lambdas and variable capture, may prove useful to you (as
well as interesting). I hope so.
Oh, one more thing. Now that you understand the previous section you should be easily able to understand the diagram at the
top of this article. What's that you say? You can't? Well, it's not your fault. Fact is, it isn't a very good diagram; it's
just the best I could do. Please feel free to provide me with a better diagram (let me know in a comment) and I'll be glad
to put it up to replace the one I came up with and give you the credit for it.
Article Revision History
- 01-FEB-2014: Original article.
Footnotes
1 That is, a System.Threading.Tasks.Task.
2 Jon Skeet's C# In Depth is an excellent book, as is Jon Skeet's blog.
3 See Jon Skeet's answer on SO.
4 They wrote a book on it: Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries.
5 E.g., Haskell.
6 A full description of Tying the Knot. The canonical algorithm for tying the knot is repmin, which is a one pass algorithm for building a tree which is the same shape as the given tree except each leaf has the minimum leaf value from the original tree.
7 What's with these footnotes anyway? It's a CodeProject article on the web, not an academic paper!<sup>8</sup>
8 I dunno, I just thought it would be funny.
9 Here's an exercise for the reader: Given this representation of rationals in the
range [0..1), write the equality function. Ensure that it finds the following two representations of the same number,
\(\frac{1}{5} = 0.2 = 0.1\overline{9}\), to be equal.
10 Ibid.<sup>11</sup> 4.
11 Oh boy! I have always wanted to use Ibid.!
12 Lazy evaluation (wikipedia) means that an expression is not evaluated when it is bound to a variable, but only when it is used. Nearly all languages
use strict evaluation where expressions are fully evaluated when they are bound to a variable (or procedure argument).
In fact, this is why some logging libraries sometimes go out of their way to use non-language mechanisms, like preprocessor macros in C/C++,
to provide efficient processing of formatted messages and their arguments: They are trying to improve performance and reduce the overhead
of logging by avoiding the evaluation of message arguments unless the message's log/trace level is high enough that the log message will
actually be written to some sink.
13 The Fundamental Theorem of Software Engineering:
We can solve any problem by introducing an extra level of indirection.
14 There's some question about whether C# closures are "true" closures or not. There
was quite a discussion on Wikipedia, and
also at the programming language blog Lambda The Ultimate, with
respect to the Wikipedia article on closures. Some people will accept
nothing less than closures which "bind return" (that is, allow for call-with-current-continuation). Other people believe that the
closures in C# 3.0 and Javascript are as close to "true" closures as makes no difference.
15 There's a second way to get data into a Task: pass an Async State Object
into it at construction time. But this has the same problem as the start Action: there are no setters that
you can use to change the Async State Object after the constructor has run. You could instead provide a custom
class wrapper around the instance reference (and whatever other information you want to pass in). Filled with a null, you would pass it
in at construction time but keep a reference to it. Then you could fill its field with the reference to the new instance as soon as
you had it, then after that, start the instance. But…by the time you did that, you might as well have done it this way.
16 YNK = You Never Know.