Parallel Programming & the Mandelbrot Set

Simon Bridge

4.25/5 (8 votes)

1 Jun 2012CPOL14 min read

47.4K

872

Provides a multi-threading library and demonstrates its use by rendering the Mandelbrot Set

Download WaitableTaskLibrary.zip - 222.2 KB

Introduction

This article demonstrates the use of a parallel-processing library
by rending a visualization of the Mandelbrot set.

There are numerous methods by which task-parallelism can be implemented
in C#. Most often these form loose
programming patterns that have no strict implementation and are not immediately
reusable. In developing this library, I hope to have built a good reusable
generic system that is reasonably efficient.

I have also demonstrated it's use by building a Mandelbrot-Set Explorer that creates some pretty fractal renderings - quite quickly through the use of multi-threading, despite relatively little effort going into optimizing the algorithm.

Background

Looking back at all the multi-threaded code I have read or
written, there are two main patterns that seem to be followed (these may be
recognized patterns, but if they are I do not know their names)

The first is to write a synchronous code model (ie normal
program flow) that is executed within a managed thread that is separate from
the UI thread. Generally this method is used with very long running serial operations
so that they don’t make your application un-responsive.

The thread executing the code is typically a
manually-started thread specifically created for that operation, but it may also
be one of an array of pre-initialized threads that are consuming work from a
queue.

An example of this is a Tcp server. The blocking method:
AcceptSocket of the TcpListener class passes the newly accepted client to
another thread, so the thread listening for connections via the AcceptClient
method is immediately free to accept the next client:

A simple
multi-threaded socket server implementation (this assumes a TcpListener
(Listener) is initialized and a [void SocketConsumer(object)] method that processes the inbound socket’s
request)

new Thread(()=> {
 while (true) {
  var socket = Listener.AcceptSocket();
                new Thread(SocketConsumer).Start(socket);
 }
}).Start();

The code to handle the socket would be in the SocketConsumer
method. A new thread takes several cycles to start, so creating a new thread
on-the-fly like the above method, should only be used where the operation is
long-enough to not care that a handful of milliseconds is taken up with the
creation of the thread (having already created threads that are waiting on a
wait-handle or auto-reset-event for more work to do is one way around the issue,
this is used in the .NET ThreadPool class)

The other significant pattern is the ‘Asynchronous’ code model
using call-back delegates. This I find a lot more difficult to create and debug,
as the program flow is disconnected.

You start an operation and define a callback using a Begin…
method:

Eg:

IAsyncResult rs = Listener.BeginAcceptSocket(callbackCode,
null);

Then a separate method (or anonymous delegate) receives the
callback once the operation is complete:

void callbackCode(IAsyncResult target) {
 
 Socket sock = Listener.EndAcceptSocket(target);
 SocketConsumer(sock);
 
}

There is an IsCompleted property of the IASyncResult
interface that can tell you if the method is complete, this could be polled (at
cost) but the polling cycle a) wastes processor cycles and b) wastes the time
between the method completing, and the next polling interval.

The Begin…. And End…. Methods in the .NET framework submit
the work to the thread-pool, then execute a callback method delegate (void
(IAsyncResult)) once they are complete. An event can be raised by the code in
the callback method to signal that the method is complete.

There are a couple of shortcomings in these techniques. The
first is that when using the Begin and End methods, your code ends up somewhat
like spaghetti… you cannot easily follow program flow and tracking exceptions
is a pain.

The other is that there is no easy, consistent way to submit
a range of tasks to be executed simultaneously, and then suspend flow of your
program until they are all complete.

An example of this may be submitting a number of queries to
bring back data for input into another process. The client machine is doing
very little when a database is executing a query, so even on single processor
machines, the execution of queries gains a lot when run in parallel (assuming
the database server can handle the load) – however you may need all the data to
be back before you can start the next part of the process. Being able to wait
on a dozen operations to complete, then moving on with program flow is an often
required need.

Development

The key missing element to the existing Begin.. End
asynchronous programming model, is the ability to wait for a task, or a
collection of tasks to complete, quickly and easily without having to define
events or call-backs to mess up the code.

This started me down the path of developing something that
you could wrap around a method an make it ‘waitable’ – the “WaitableTask”
class.

The “WaitableTask” would have a “WaitComplete” method that
suspended program flow until it had completed its work load.

This would be achieved by setting a ManualResetEvent in the
callback method passed to the BeginInvoke method of a delegate.

I decided on a Manual reset-event (rather than
AutoResetEvent) as I wanted the event to stay signalled after it had been set.
This way, if the operation had already completed before you started waiting,
the WaitComplete method would run straight through without stopping.

The WaitComplete method would wait for the ManualResetEvent
to be set, and then return control. My first
design used an abstract class with no definition for the delegate to invoke –
and a number of abstract methods. A class could then be derived from this for
each delegate signature that was required.

I was never happy with this level of abstraction though.
After flapping around a bit, I realised I could use the base-class for all
delegates (the Delegate class) – the main problem though, was that this class
has no BeginInvoke method (as the parameters are not defined at this level of
the class-hierarchy)

However, the base Delegate class can be “invoked” by calling
“Invoke” on the MethodInfo returned from its Method property, against its
Target property, supplying an array of objects for the method parameters, and
returning an object.

This invocation works regardless of the signature of the
delegate (typically delegate signatures must match exactly in order to bind,
but in this case, the binding has already been achieved by the concrete
delegate that is being referenced as an abstract Delegate class)

Example below: the InvokeMethod(object[]) method invokes the _method Delegate, passing
in an array of objects as the parameters, and returning an object reference:

protected virtual object InvokeMethod(object[] parameters)
{
     // invoke the _method delegate directly and return the result.
     return _method.Method.Invoke(_method.Target, parameters);
}

This still does not have a BeginInvoke method. However the signature of the InvokeMethod is concrete, so I can create a delegate to invoke it:

// concrete delegate matches the parameters of InvokeMethod
public delegate object Invoke(object[] parameters);

// assign the delegate:
Invoke _invokeDelegate = InvokeMethod;

// start invokation of the delegate and record the IAsyncResult.
_asyncToken = _invokeDelegate.BeginInvoke(_parameters, Callback, this);

All this easily came together to form the WaitableTask class. This class defines no generic type parameters for either the arguments or the return type, however a Create method in a Factory class is able to define argument type parameters and a concrete delegate for the WaitableTask, and a derived class is able to supply a return type parameter:

/// <summary>
/// create a simple waitable with one argument and no return type.
/// </summary>
/// <typeparam name="TArg"></typeparam>
/// <param name="method"></param>
/// <param name="argument"></param>
/// <returns></returns>
public static WaitableTask Create<TArg>(Proc<TArg> method, TArg argument)
{
 return new WaitableTask(method, argument);
}

/// <summary>
/// create a waitable with a return type of TResult and 1 argument of type TArg1
/// </summary>
/// <typeparam name="TResult">
/// the return type for the operation
/// </typeparam>
/// <typeparam name="TArg1">
/// the argument type for the operation.
/// </typeparam>
/// <param name="method">
/// a delegate pointing to the method to run.
/// </param>
/// <param name="argument">the argument for the function</param>
/// <returns>an WaitableTask ready to run the operation</returns>
public static WaitableTask<TResult> Create<TArg1>(Func<TArg1, TResult> method, TArg1 argument)
{
    // create and return the delegate:
    return new WaitableTask<TResult>(method, argument);
}

I created overloads of the Create method for up to three argument types, with or without a return type. Creating a “WaitableTask” is now a matter for a single line of code.

In addition to the WaitComplete method, and an IsComplete property, I defined a “Completed” event and an “Errored” event.

When a delegate is invoked using BeginInvoke, it is submitted to the ThreadPool. Once complete, the thread-pool executes the callback delegate to signal this, and the results of the operation become available through the EndInvoke method of the original delegate. Any exceptions are also thrown to the EndInvoke method.

The WaitableTask class catches any exceptions thrown by the EndInvoke method of the _invoke delegate, stores them internally within the task, and raises the “Errored” event, passing in the exception with its EventArgs.

The Completed event is still raised when the operation is complete, regardless of any exceptions thrown.

By using a similar pattern, a collection of WaitableTasks can be grouped together and waited on as a whole.

The AddTask(WaitableTask<T>) method in the WaitableTaskList<T> class adds the specified waitable-task into an internal list, and handles it’s “Completed” event.

By checking if any of the other tasks are still running, this event handler is able to determine if the last task just finished processing, and can raise an event of its own.

The WaitableTaskList<T> class defines two events: TaskComplete is raised for each task that is complete, and ListComplete is fired only when all the tasks are complete.

By adding another set of Create<> methods to the Factory classes, it is now a matter of a single line to define a wait-list that runs the same method with multiple arguments simultaneously, and can be waited on for completion.

public static WaitableTaskList Create<TArg1>(
Proc<TArg1> taskMethod, params Argument<TArg1>[] args)
{
    var taskList = new WaitableTaskList();
    foreach (var arg in args)
    {
        // add a task for each item in the args list.
        taskList.AddTask(Create<TArg1>(taskMethod, arg.Argument1));
    }
    // return the task-list:
    return taskList;
}

Parallel Processing Example

To demonstrate the use of the WaitableTask library, I needed a parallel programming problem to solve.

My existing implementations that use this library are database applications that use it to run queries in parallel; however I did not want to have to ship an example database with the code in order to demonstrate it.

I turned to the list of “Embarrassingly Parallel Problems” for inspiration… (An “Embarrassingly Parallel Problem” is one that requires little to no effort to divide into parallel tasks - this is generally because each element of the problem requires no interaction with the other elements)

I chose the rendering of the Mandelbrot set, as I thought it would make a cool program, and there was a pseudo-code listing for calculating the value of each pixel using an “Escape Time” algorithm posted on Wikipedia that I could use easily enough.

Calculating the value for a pixel in the Mandelbrot set is completely independent – it requires no results from other operations and is easy to make “parallel”

I chose to render each “row” of the image using a separate task/thread. That way I just needed to pass an integer describing the row being rendered. Also, it meant that each task would be long enough to justify its passage through the queue -> if I tried to define a task to render each pixel, then even a small image (320x240) would require 76,800 tasks. The CPU would spend more time and memory managing the list of tasks, queuing each task and handling it’s completion than it would actually running the task and any performance gains would be lost… in fact, it would probably take a lot longer to render.

The next issue was how to actually turn the integer values calculated by the escape-time algorithm into colour information and set that on a bitmap image.

There are two issues here:

Each pixel in a computer image is made of red, green and blue components that match the corresponding colour filters in the cells in the human retina, but the output of the escape time algorithm that calculates a value for each pixel is a single integer.

I needed a method of turning an integer value into colour information. I first considered generating a formula/algorithm that would turn an integer of defined range into red, green and blue values, but rejected this as it would be inflexible, slow and painful to code.

Instead I opted to generate a ‘palette’ of colour information by creating gradients between a set of nominated colours. Each possible value of the output of the fractal algorithm would be mapped to a different element in an array of colours. This offered flexibility as different ‘palettes’ could be used with the same algorithm, and it would be fast.

This is the method that calculates a colour value for a single pixel using an escape-time fractal algorithm.

/// <summary>
/// calculate the colour at the specified point in the mandelbrot-set
/// </summary>
/// <param name="x0">scaled X input (between -2.5 and 1)</param>
/// <param name="y0">scaled Y input (between -1 and 1)</param>
/// <param name="p">palette used to assign a colour to the iteration</param>
/// <returns>colour from the internal pallette</returns>
public static Color CalculateMandelbrotPoint(double x0, double y0, Palette p)
{
    double x = 0;
    double y = 0;
    int iteration = 0;
    while ((((x * x) + (y * y)) < (2 * 2)) && (iteration < p.Colours.Length))
    {
        double xtemp = (x * x) - (y * y) + x0;
        y = 2 * x * y + y0;
        x = xtemp;
        iteration++;
    }

    return p.Colours[iteration-1];
}

Now I just needed a way to set this colour information on a bitmap image.

The System.Drawing.Bitmap object offers a “SetPixel(x,y,color)” method. However, this method is not thread safe and is notoriously slow. Thread collisions on the SetPixel method result in strange error messages and GDI+ faults in the host application.

This is where I remembered that there is a slightly scary, but much faster method of setting pixel values on a bitmap.

Using the LockBits method of the Bitmap object, you can obtain a reference to the area of un-managed memory that contains the actual bitmap data, and marshal that into a managed byte array. You can then change the values of the byte array, marshal the data back and release the bitmap.

_lockData = _bmp.LockBits(new Rectangle(0, 0, this._bmp.Width, this._bmp.Height), flags, this._bmp.PixelFormat);
_lockState = true;
_byteLen = Math.Abs(_lockData.Stride * _lockData.Height);
_imageBuffer = new byte[_byteLen];
IntPtr start = _lockData.Scan0;

// marshal the data into the array.
System.Runtime.InteropServices.Marshal.Copy(start, _imageBuffer, 0, _byteLen);

Essentially, Bitmap.SetPixel does this, but for each pixel you set, it locks the bitmap, marshals out the data, modifies the correct byte values then returns the data and unlocks the image. You can see here that two threads calling the same method would have a problem: if the bitmap is already locked, it is not possible to lock it again, and marshalling back the data could overwrite the values set by another call that got in a millisecond earlier.

By essentially batching these operations, not only is the manipulation of the image data much faster, it also becomes thread-safe.

The operation is thread-safe, as each time you set the values for a pixel you are just setting a value in a common array using indexes and values that are bound to the calling thread. Even if the same code is run by two or more threads simultaneously, the index values are different per-thread and there is no conflict.

The difficulty here is calculating which values to set: the data returned is specific to the pixel format of the source image.

For example, if the pixel-format is RGB, 24 bits-per-pixel then each pixel is represented by three bytes (one for red, one for green, one for blue) – if the format is 32 bits per pixel ARGB, then each pixel has an additional byte indicating it’s Alpha (transparency) value (255 = opaque, 0=transparent)

In addition to this, the returned array is one-dimensional, where the image has two dimensions. Essentially the image is un-wound by row into one long sequence of bytes. You need to know the pixel format and the width of the image in order to know where to break the sequence into each row.

The above table shows the index, colour-type and value for a 3x3 grid of white pixels.

The width of the image is 3, the height of the image is 3, the pixel format is 24bppRGB, and the “stride” of the image is 9.

The “stride” means the width of the image in bytes, not just pixels. As each pixel is defined by three byte values, the stride for a 3 pixel wide image is 3*3=9. Looking at the table, we can see that for the middle pixel (1,1) the indexes are 12,13 and 14.

The formula to calculate the first index position for a pixel (x,y) is:

Index = (stride * y) + (bytes-per-pixel * x)

For example, the first (blue) index of the middle pixel is: (9 * 1) + (3 * 1) = 12

Or for the pixel at the bottom-right corner (2,2) the index is: (9*2) + (3 *2) = 18+6 = 24.

Below is the code for the Set Pixel method in the example: it assumes that the pixel format is 24 bits-per-pixel RGB and that the array _imageBuffer is filled with the data from the image. It gets the stride width of the image from the BitmapData returned by the Bitmap.LockBits method.

protected void SetPixel24Bpp(int x, int y, Color color)
{
    if (_lockState)
    {
        // bytes-per-pixel is three:
        int bpp = 3;

        // first byte address:
        int read_from = (x * bpp) + (_lockData.Stride * y);

        // set the three byte values:
        this._imageBuffer[read_from + 0] = color.B;
        this._imageBuffer[read_from + 1] = color.G;
        this._imageBuffer[read_from + 2] = color.R;
    }
}

Optimization:

A couple of extra processing cycles could be gained by not recalculating the index value for each pixel. As we are rendering by row, and the image data is arranged by row in the array (this is because the old CRT monitors would display an image by horizontal scan-line) we could calculate the index position at the start of the scan-line, then just add three to it for each pixel.

Using the Code

To create a waitable-task list, first you need to create an array or list of Argument objects. Each Argument will create a new task for the list.

In the fractal rendering example, the method we want to call (CalculateMandelbrotRow) takes one integer argument, so we need to create Argument<int> objects to add to the list, and add one for each row in the image:

// create the argument list for the TaskFactory Create method.
var args = new List<Argument<int>>();

// setup an argument for each row of the output image: each row will be a queued task for the thread-pool.
for (int y = 0; y < _imageSize.Height; y++)
{
args.Add(new Argument<int>(y));
}

Then it is just a matter of passing the method and the argument list to the static Create<> method in the TaskFactory class.

The argument parameter is a param-array, so we can pass in individual Argument objects (as many as you want) or an Array of argument objects. Here I am using the ToArray() method of the List class to turn the list into an array.

// create the task list: use the TaskFactory to shortcut and provide concrete types:
var renderTask = TaskFactory.Create<int>(this.CalculateMandelbrotRow, args.ToArray());

The task-list then needs to be started:

renderTask.BeginAll();

And then we want to wait for it to complete. In this case we want Application.DoEvents to be called every 100 milliseconds to keep the calling UI thread “responsive” while the fractal is rendering:

// wait until complete, call Application.DoEvents each 100 ms 
renderTask.WaitComplete(Application.DoEvents);

Some Pretty Fractals Generated by the Example:

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)