Introduction
This article demonstrates the use of a parallel-processing library
by rending a visualization of the Mandelbrot set.
There are numerous methods by which task-parallelism can be implemented
in C#. Most often these form loose
programming patterns that have no strict implementation and are not immediately
reusable. In developing this library, I hope to have built a good reusable
generic system that is reasonably efficient.
I have also demonstrated it's use by building a Mandelbrot-Set Explorer that creates some pretty fractal renderings - quite quickly through the use of multi-threading, despite relatively little effort going into optimizing the algorithm.
Background
Looking back at all the multi-threaded code I have read or
written, there are two main patterns that seem to be followed (these may be
recognized patterns, but if they are I do not know their names)
The first is to write a synchronous code model (ie normal
program flow) that is executed within a managed thread that is separate from
the UI thread. Generally this method is used with very long running serial operations
so that they don’t make your application un-responsive.
The thread executing the code is typically a
manually-started thread specifically created for that operation, but it may also
be one of an array of pre-initialized threads that are consuming work from a
queue.
An example of this is a Tcp server. The blocking method:
AcceptSocket of the TcpListener class passes the newly accepted client to
another thread, so the thread listening for connections via the AcceptClient
method is immediately free to accept the next client:
A simple
multi-threaded socket server implementation (this assumes a TcpListener
(Listener) is initialized and a [void SocketConsumer(object)] method that processes the inbound socket’s
request)
new Thread(()=> {
while (true) {
var socket = Listener.AcceptSocket();
new Thread(SocketConsumer).Start(socket);
}
}).Start();
The code to handle the socket would be in the SocketConsumer
method. A new thread takes several cycles to start, so creating a new thread
on-the-fly like the above method, should only be used where the operation is
long-enough to not care that a handful of milliseconds is taken up with the
creation of the thread (having already created threads that are waiting on a
wait-handle or auto-reset-event for more work to do is one way around the issue,
this is used in the .NET ThreadPool class)
The other significant pattern is the ‘Asynchronous’ code model
using call-back delegates. This I find a lot more difficult to create and debug,
as the program flow is disconnected.
You start an operation and define a callback using a Begin…
method:
Eg:
IAsyncResult rs = Listener.BeginAcceptSocket(callbackCode,
null);
Then a separate method (or anonymous delegate) receives the
callback once the operation is complete:
void callbackCode(IAsyncResult target) {
Socket sock = Listener.EndAcceptSocket(target);
SocketConsumer(sock);
}
There is an IsCompleted property of the IASyncResult
interface that can tell you if the method is complete, this could be polled (at
cost) but the polling cycle a) wastes processor cycles and b) wastes the time
between the method completing, and the next polling interval.
The Begin…. And End…. Methods in the .NET framework submit
the work to the thread-pool, then execute a callback method delegate (void
(IAsyncResult)) once they are complete. An event can be raised by the code in
the callback method to signal that the method is complete.
There are a couple of shortcomings in these techniques. The
first is that when using the Begin and End methods, your code ends up somewhat
like spaghetti… you cannot easily follow program flow and tracking exceptions
is a pain.
The other is that there is no easy, consistent way to submit
a range of tasks to be executed simultaneously, and then suspend flow of your
program until they are all complete.
An example of this may be submitting a number of queries to
bring back data for input into another process. The client machine is doing
very little when a database is executing a query, so even on single processor
machines, the execution of queries gains a lot when run in parallel (assuming
the database server can handle the load) – however you may need all the data to
be back before you can start the next part of the process. Being able to wait
on a dozen operations to complete, then moving on with program flow is an often
required need.
Development
The key missing element to the existing Begin.. End
asynchronous programming model, is the ability to wait for a task, or a
collection of tasks to complete, quickly and easily without having to define
events or call-backs to mess up the code.
This started me down the path of developing something that
you could wrap around a method an make it ‘waitable’ – the “WaitableTask”
class.
The “WaitableTask” would have a “WaitComplete” method that
suspended program flow until it had completed its work load.
This would be achieved by setting a ManualResetEvent in the
callback method passed to the BeginInvoke method of a delegate.
I decided on a Manual reset-event (rather than
AutoResetEvent) as I wanted the event to stay signalled after it had been set.
This way, if the operation had already completed before you started waiting,
the WaitComplete method would run straight through without stopping.
The WaitComplete method would wait for the ManualResetEvent
to be set, and then return control. My first
design used an abstract class with no definition for the delegate to invoke –
and a number of abstract methods. A class could then be derived from this for
each delegate signature that was required.
I was never happy with this level of abstraction though.
After flapping around a bit, I realised I could use the base-class for all
delegates (the Delegate class) – the main problem though, was that this class
has no BeginInvoke method (as the parameters are not defined at this level of
the class-hierarchy)
However, the base Delegate class can be “invoked” by calling
“Invoke” on the MethodInfo returned from its Method property, against its
Target property, supplying an array of objects for the method parameters, and
returning an object.
This invocation works regardless of the signature of the
delegate (typically delegate signatures must match exactly in order to bind,
but in this case, the binding has already been achieved by the concrete
delegate that is being referenced as an abstract Delegate class)
Example below: the InvokeMethod(object[]) method invokes the _method Delegate, passing
in an array of objects as the parameters, and returning an object reference:
protected virtual object InvokeMethod(object[] parameters)
{
return _method.Method.Invoke(_method.Target, parameters);
}
This still does not have a BeginInvoke method. However the
signature of the InvokeMethod is concrete, so I can create a delegate to invoke
it:
public delegate object Invoke(object[] parameters);
Invoke _invokeDelegate = InvokeMethod;
_asyncToken = _invokeDelegate.BeginInvoke(_parameters, Callback, this);
All this easily came together to form the WaitableTask
class. This class defines no generic type parameters for either the arguments
or the return type, however a Create method in a Factory class is able to
define argument type parameters and a concrete delegate for the WaitableTask,
and a derived class is able to supply a return type parameter:
public static WaitableTask Create<TArg>(Proc<TArg> method, TArg argument)
{
return new WaitableTask(method, argument);
}
public static WaitableTask<TResult> Create<TArg1>(Func<TArg1, TResult> method, TArg1 argument)
{
return new WaitableTask<TResult>(method, argument);
}
I created overloads of the Create method for up to three
argument types, with or without a return type. Creating a “WaitableTask” is now
a matter for a single line of code.
In addition to the WaitComplete method, and an IsComplete
property, I defined a “Completed” event and an “Errored” event.
When a delegate is invoked using BeginInvoke, it is
submitted to the ThreadPool. Once complete, the thread-pool executes the
callback delegate to signal this, and the results of the operation become
available through the EndInvoke method of the original delegate. Any exceptions
are also thrown to the EndInvoke method.
The WaitableTask class catches any exceptions thrown by the
EndInvoke method of the _invoke delegate, stores them internally within the
task, and raises the “Errored” event, passing in the exception with its
EventArgs.
The Completed event is still raised when the operation is
complete, regardless of any exceptions thrown.
By using a similar pattern, a collection of WaitableTasks
can be grouped together and waited on as a whole.
The AddTask(WaitableTask<T>) method in the
WaitableTaskList<T> class adds the specified waitable-task into an
internal list, and handles it’s “Completed” event.
By checking if any of the other tasks are still running,
this event handler is able to determine if the last task just finished
processing, and can raise an event of its own.
The WaitableTaskList<T> class defines two events:
TaskComplete is raised for each task that is complete, and ListComplete is
fired only when all the tasks are complete.
By adding another set of Create<> methods to the
Factory classes, it is now a matter of a single line to define a wait-list that
runs the same method with multiple arguments simultaneously, and can be waited
on for completion.
public static WaitableTaskList Create<TArg1>(
Proc<TArg1> taskMethod, params Argument<TArg1>[] args)
{
var taskList = new WaitableTaskList();
foreach (var arg in args)
{
taskList.AddTask(Create<TArg1>(taskMethod, arg.Argument1));
}
return taskList;
}
Parallel Processing
Example
To demonstrate the use of the WaitableTask library, I needed
a parallel programming problem to solve.
My existing implementations that use this library are database
applications that use it to run queries in parallel; however I did not want to
have to ship an example database with the code in order to demonstrate it.
I turned to the list of “Embarrassingly Parallel Problems”
for inspiration… (An “Embarrassingly Parallel Problem” is one that requires
little to no effort to divide into parallel tasks - this is generally because
each element of the problem requires no interaction with the other elements)
I chose the rendering of the Mandelbrot set, as I thought it
would make a cool program, and there was a pseudo-code listing for calculating
the value of each pixel using an “Escape Time” algorithm posted on Wikipedia
that I could use easily enough.
Calculating the value for a pixel in the Mandelbrot set is
completely independent – it requires no results from other operations and is
easy to make “parallel”
I chose to render each “row” of the image using a separate
task/thread. That way I just needed to pass an integer describing the row being
rendered. Also, it meant that each task would be long enough to justify its
passage through the queue -> if I tried to define a task to render each
pixel, then even a small image (320x240) would require 76,800 tasks. The CPU
would spend more time and memory managing the list of tasks, queuing each task
and handling it’s completion than it would actually running the task and any
performance gains would be lost… in fact, it would probably take a lot longer
to render.
The next issue was how to actually turn the integer values
calculated by the escape-time algorithm into colour information and set that on
a bitmap image.
There are two issues here:
Each pixel in a computer image is made of red, green and
blue components that match the corresponding colour filters in the cells in the
human retina, but the output of the escape time algorithm that calculates a
value for each pixel is a single integer.
I needed a method of turning an integer value into colour
information. I first considered generating a formula/algorithm that would turn
an integer of defined range into red, green and blue values, but rejected this
as it would be inflexible, slow and painful to code.
Instead I opted to generate a ‘palette’ of colour
information by creating gradients between a set of nominated colours. Each
possible value of the output of the fractal algorithm would be mapped to a
different element in an array of colours. This offered flexibility as different
‘palettes’ could be used with the same algorithm, and it would be fast.
This is the method that calculates a colour value for a
single pixel using an escape-time fractal algorithm.
public static Color CalculateMandelbrotPoint(double x0, double y0, Palette p)
{
double x = 0;
double y = 0;
int iteration = 0;
while ((((x * x) + (y * y)) < (2 * 2)) && (iteration < p.Colours.Length))
{
double xtemp = (x * x) - (y * y) + x0;
y = 2 * x * y + y0;
x = xtemp;
iteration++;
}
return p.Colours[iteration-1];
}
Now I just needed a way to set this colour information on a
bitmap image.
The System.Drawing.Bitmap object offers a
“SetPixel(x,y,color)” method. However, this method is not thread safe and is
notoriously slow. Thread collisions on the SetPixel method result in strange
error messages and GDI+ faults in the host application.
This is where I remembered that there is a slightly scary,
but much faster method of setting pixel values on a bitmap.
Using the LockBits method of the Bitmap object, you can
obtain a reference to the area of un-managed memory that contains the actual bitmap
data, and marshal that into a managed byte array. You can then change the
values of the byte array, marshal the data back and release the bitmap.
_lockData = _bmp.LockBits(new Rectangle(0, 0, this._bmp.Width, this._bmp.Height), flags, this._bmp.PixelFormat);
_lockState = true;
_byteLen = Math.Abs(_lockData.Stride * _lockData.Height);
_imageBuffer = new byte[_byteLen];
IntPtr start = _lockData.Scan0;
System.Runtime.InteropServices.Marshal.Copy(start, _imageBuffer, 0, _byteLen);
Essentially, Bitmap.SetPixel does this, but for each pixel
you set, it locks the bitmap, marshals out the data, modifies the correct byte
values then returns the data and unlocks the image. You can see here that two
threads calling the same method would have a problem: if the bitmap is already
locked, it is not possible to lock it again, and marshalling back the data
could overwrite the values set by another call that got in a millisecond
earlier.
By essentially batching these operations, not only is the
manipulation of the image data much faster, it also becomes thread-safe.
The operation is thread-safe, as each time you set the
values for a pixel you are just setting a value in a common array using indexes
and values that are bound to the calling thread. Even if the same code is run
by two or more threads simultaneously, the index values are different
per-thread and there is no conflict.
The difficulty here is calculating which values to set: the
data returned is specific to the pixel format of the source image.
For example, if the pixel-format is RGB, 24 bits-per-pixel
then each pixel is represented by three bytes (one for red, one for green, one
for blue) – if the format is 32 bits per pixel ARGB, then each pixel has an
additional byte indicating it’s Alpha (transparency) value (255 = opaque,
0=transparent)
In addition to this, the returned array is one-dimensional,
where the image has two dimensions. Essentially the image is un-wound by row
into one long sequence of bytes. You need to know the pixel format and the
width of the image in order to know where to break the sequence into each row.
The above table shows the index, colour-type and value for a
3x3 grid of white pixels.
The width of the image is 3, the height of the image is 3,
the pixel format is 24bppRGB, and the “stride” of the image is 9.
The “stride” means the width of the image in bytes, not just
pixels. As each pixel is defined by three byte values, the stride for a 3 pixel
wide image is 3*3=9. Looking at the table, we can see that for the middle pixel
(1,1) the indexes are 12,13 and 14.
The formula to calculate the first index position for a
pixel (x,y) is:
Index = (stride * y) + (bytes-per-pixel * x)
For example, the first (blue) index of the middle pixel is: (9
* 1) + (3 * 1) = 12
Or for the pixel at the bottom-right corner (2,2) the index
is: (9*2) + (3 *2) = 18+6 = 24.
Below is the code for the Set Pixel method in the example:
it assumes that the pixel format is 24 bits-per-pixel RGB and that the array
_imageBuffer is filled with the data from the image. It gets the stride width
of the image from the BitmapData returned by the Bitmap.LockBits method.
protected void SetPixel24Bpp(int x, int y, Color color)
{
if (_lockState)
{
int bpp = 3;
int read_from = (x * bpp) + (_lockData.Stride * y);
this._imageBuffer[read_from + 0] = color.B;
this._imageBuffer[read_from + 1] = color.G;
this._imageBuffer[read_from + 2] = color.R;
}
}
Optimization:
A couple of extra processing cycles could be gained by not
recalculating the index value for each pixel. As we are rendering by row, and
the image data is arranged by row in the array (this is because the old CRT
monitors would display an image by horizontal scan-line) we could calculate the
index position at the start of the scan-line, then just add three to it for
each pixel.
Using the Code
To create a waitable-task list, first you need to create an
array or list of Argument objects. Each Argument will create a new task for the
list.
In the fractal rendering example, the method we want to call (CalculateMandelbrotRow) takes
one integer argument, so we need to create Argument<int> objects to add
to the list, and add one for each row in the image:
var args = new List<Argument<int>>();
for (int y = 0; y < _imageSize.Height; y++)
{
args.Add(new Argument<int>(y));
}
Then it is just a matter of passing the method and the
argument list to the static Create<> method in the TaskFactory class.
The argument parameter is a param-array, so we can pass in
individual Argument objects (as many as you want) or an Array of argument
objects. Here I am using the ToArray()
method of the List class to turn the list into an array.
var renderTask = TaskFactory.Create<int>(this.CalculateMandelbrotRow, args.ToArray());
The task-list then needs to be started:
renderTask.BeginAll();
And then we want to wait for it to complete. In this case we
want Application.DoEvents to be called every 100 milliseconds to keep the
calling UI thread “responsive” while the fractal is rendering:
renderTask.WaitComplete(Application.DoEvents);
Some Pretty Fractals Generated by the Example: