Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C#

Multicore Speedup Analysis using Jacobi Relaxation Running on Multiple Threads

3.18/5 (8 votes)
10 Jul 2007CPOL3 min read 1   221  
Analyse the speedup achieved by multicore processors by running a computation intensive task, using multithreading to create 1 separate thread per processor.
Screenshot - jacobi.jpg

Introduction

This project uses a computation intensive algorithm (Jacobi Relaxation) to calculate the time taken to complete 50000 iterations. The program creates 1 thread per processor and allocates a section of the calculation per thread. Thus on a single core machine there will be 1 thread, on dual core 2 threads...

After completing 50000 iterations, the program displays the time taken.
Theoretically, the time taken to complete the computation should become half when the number of CPUs doubles... Reaching this optimum result would definitely be the goal of increasing the number of cores.

Background

Jacobi Relaxation is a simple numerical algorithm used to calculate the distribution of a boundary value over an area. For example, if we had a metal sheet, and applied a voltage of 10V to its upper and lower edges, the Jacobi Relaxation Algorithm will calculate the final voltage distribution on the metal plate.

This project creates a separate thread for each processor it detects and distributes the computation to each thread. Thus for a metal sheet of 500 height and 500 width and 2 CPU cores, each thread will compute an area of 250 x 500. If we had 4 cores, each thread will compute an area of 125 x 500.

Thus with higher number of cores, the time taken to compute the algorithm should go down, thus giving a speedup.

Using the Code

The code modules of interest are:

  1. MainForm.cs
  2. ThreadClass.cs
  3. ProcessorUtilization Control (CustomControls folder)

MainForm

In Mainform.cs, on buttonStart_Click() you will find that we are creating 1 separate thread for each CPU core detected.

C#
//start the threads... one per processor

for (int counter = 0; counter < System.Environment.ProcessorCount; counter++)
{
    ThreadClass threadClass = new ThreadClass(this, counter, 
                                      System.Environment.ProcessorCount, 
                                      this.bitmap, pictureCanvas.Width, 
                                      pictureCanvas.Height, updateMainAt);
    System.Threading.Thread newThread;
    newThread = new System.Threading.Thread(new ThreadStart(threadClass.ThreadFunction));
    newThread.Start();
}

ThreadClass

In the file ThreadClass.cs, you will find the ThreadFunction which calls ApplyRelaxation() where the actual computation is carried out.
The most vital speedup factor was achieved by using unsafe code (using pointers). Without using unsafe code, it was noticed that the non threaded version of the code ran at the same speed as the threaded version. After doing some 'time' analysis for the computation (for which the Logger class -Logger.cs- was used) it was observed that interthread communication was eating a lot of time (The main array holding the data of the calculation is actually in MainForm.cs and thus each new background worker thread has to access it). But by using unsafe code, most probably the .NET interthread overheard is bypassed and thus a good speedup is achieved.

C#
private unsafe void ApplyRelaxation(int startY, int stopY)
{
    calls++;
    loopCounts = 0;
    //RED BLACK OPTIMIZATION OF JABOBI RELAXATION

    int start = 1; //RED PHASE
    
    // Use 'fixed' to prevent the garbage collector from moving the 
    // main.RelaxationArray    
    fixed (double* relaxationPointer = main.RelaxationArray)
    {
        for (int y = startY; y < stopY; y++)
        {
            for (int x = start; x < pictureCanvasWidth - 1; x += 2)
            {
                loopCounts++;
                //Use pointers to achieve speedup... 
                *((relaxationPointer + (y * pictureCanvasWidth)) + x) = 
                       (*((relaxationPointer + ((y - 1) * pictureCanvasWidth)) + x) 
                       + *((relaxationPointer + (y * pictureCanvasWidth)) + (x + 1)) 
                       + *((relaxationPointer + (y * pictureCanvasWidth)) + (x - 1)) 
                       + *((relaxationPointer + ((y + 1)*pictureCanvasWidth) + x)))/ 4;
            }
        }

        start = 2; //BLACK PHASE
        for (int y = startY; y < stopY; y++)
        {
            for (int x = start; x < pictureCanvasWidth - 1; x += 2)
            {
                loopCounts++;
                //Use pointers to achieve speedup... 
                *((relaxationPointer + (y * pictureCanvasWidth)) + x) = 
                     (*((relaxationPointer + ((y - 1) * pictureCanvasWidth)) + x) 
                     + *((relaxationPointer + (y * pictureCanvasWidth)) + (x + 1)) 
                     + *((relaxationPointer + (y * pictureCanvasWidth)) + (x - 1)) 
                     + *((relaxationPointer + ((y + 1) * pictureCanvasWidth) + x))) / 4;
            }
        }
    }
}

Note that in the above code fragment, we are using the Red Black method of calculating the Jacobi Relaxation (Red-Black optimization)

The most interesting part I found in writing this program is the speedup achieved by using unsafe code.

ProcessorUtilization Control (in CustomControls Folder)

This user control uses System.Diagnostics.PerformanceCounter to evaluate the CPU usage.

C#
private void timer_Tick(object sender, EventArgs e)
{
    performanceCounter1.InstanceName = instance;
    Utilization.Value = (int)(performanceCounter1.NextValue());
}

At each timer tick, the user control queries performanceCounter (performanceCounter1.NextValue) and updates the ProgressBar value (Utilization.Value). ('Utilization' is a Progress Bar embedded into the User Control.)

Points of Interest

The most interesting part (initially annoying) was to figure out why the multithreaded program ran at the same speed as the non-multithreaded program. I had to do several iterations of the program to achieve this final result (which had an acceptable gain).

One particular version of the multithreaded program actually ran slower than the non-multithreaded version. The culprit was found to be that 3 threads (2 worker threads and the main thread) were trying to access the main Picture Canvas using thread synchronization routines. The problem was solved by allowing only the mainForm to handle the drawing of the Picture Canvas, thus eliminating the synchronization delays.

I guess one thing to learn is that careful programming has to be done to achieve good speedup using Multithreading.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)