Introduction
This project uses a computation intensive algorithm (Jacobi Relaxation) to calculate the time taken to complete 50000 iterations. The program creates 1 thread per processor and allocates a section of the calculation per thread. Thus on a single core machine there will be 1 thread, on dual core 2 threads...
After completing 50000 iterations, the program displays the time taken.
Theoretically, the time taken to complete the computation should become half when the number of CPUs doubles... Reaching this optimum result would definitely be the goal of increasing the number of cores.
Background
Jacobi Relaxation is a simple numerical algorithm used to calculate the distribution of a boundary value over an area. For example, if we had a metal sheet, and applied a voltage of 10V to its upper and lower edges, the Jacobi Relaxation Algorithm will calculate the final voltage distribution on the metal plate.
This project creates a separate thread for each processor it detects and distributes the computation to each thread. Thus for a metal sheet of 500 height and 500 width and 2 CPU cores, each thread will compute an area of 250 x 500. If we had 4 cores, each thread will compute an area of 125 x 500.
Thus with higher number of cores, the time taken to compute the algorithm should go down, thus giving a speedup.
Using the Code
The code modules of interest are:
- MainForm.cs
- ThreadClass.cs
ProcessorUtilization
Control (CustomControls folder)
MainForm
In Mainform.cs, on buttonStart_Click()
you will find that we are creating 1 separate thread for each CPU core detected.
for (int counter = 0; counter < System.Environment.ProcessorCount; counter++)
{
ThreadClass threadClass = new ThreadClass(this, counter,
System.Environment.ProcessorCount,
this.bitmap, pictureCanvas.Width,
pictureCanvas.Height, updateMainAt);
System.Threading.Thread newThread;
newThread = new System.Threading.Thread(new ThreadStart(threadClass.ThreadFunction));
newThread.Start();
}
ThreadClass
In the file ThreadClass.cs, you will find the ThreadFunction
which calls ApplyRelaxation()
where the actual computation is carried out.
The most vital speedup factor was achieved by using unsafe code (using pointers). Without using unsafe code, it was noticed that the non threaded version of the code ran at the same speed as the threaded version. After doing some 'time' analysis for the computation (for which the Logger
class -Logger.cs- was used) it was observed that interthread communication was eating a lot of time (The main array holding the data of the calculation is actually in MainForm.cs and thus each new background worker thread has to access it). But by using unsafe code, most probably the .NET interthread overheard is bypassed and thus a good speedup is achieved.
private unsafe void ApplyRelaxation(int startY, int stopY)
{
calls++;
loopCounts = 0;
int start = 1;
fixed (double* relaxationPointer = main.RelaxationArray)
{
for (int y = startY; y < stopY; y++)
{
for (int x = start; x < pictureCanvasWidth - 1; x += 2)
{
loopCounts++;
*((relaxationPointer + (y * pictureCanvasWidth)) + x) =
(*((relaxationPointer + ((y - 1) * pictureCanvasWidth)) + x)
+ *((relaxationPointer + (y * pictureCanvasWidth)) + (x + 1))
+ *((relaxationPointer + (y * pictureCanvasWidth)) + (x - 1))
+ *((relaxationPointer + ((y + 1)*pictureCanvasWidth) + x)))/ 4;
}
}
start = 2;
for (int y = startY; y < stopY; y++)
{
for (int x = start; x < pictureCanvasWidth - 1; x += 2)
{
loopCounts++;
*((relaxationPointer + (y * pictureCanvasWidth)) + x) =
(*((relaxationPointer + ((y - 1) * pictureCanvasWidth)) + x)
+ *((relaxationPointer + (y * pictureCanvasWidth)) + (x + 1))
+ *((relaxationPointer + (y * pictureCanvasWidth)) + (x - 1))
+ *((relaxationPointer + ((y + 1) * pictureCanvasWidth) + x))) / 4;
}
}
}
}
Note that in the above code fragment, we are using the Red Black method of calculating the Jacobi Relaxation (Red-Black optimization)
The most interesting part I found in writing this program is the speedup achieved by using unsafe code.
ProcessorUtilization Control (in CustomControls Folder)
This user control uses System.Diagnostics.PerformanceCounter
to evaluate the CPU usage.
private void timer_Tick(object sender, EventArgs e)
{
performanceCounter1.InstanceName = instance;
Utilization.Value = (int)(performanceCounter1.NextValue());
}
At each timer tick, the user control queries performanceCounter
(performanceCounter1.NextValue
) and updates the ProgressBar
value (Utilization.Value
). ('Utilization' is a Progress Bar embedded into the User Control.)
Points of Interest
The most interesting part (initially annoying) was to figure out why the multithreaded program ran at the same speed as the non-multithreaded program. I had to do several iterations of the program to achieve this final result (which had an acceptable gain).
One particular version of the multithreaded program actually ran slower than the non-multithreaded version. The culprit was found to be that 3 threads (2 worker threads and the main thread) were trying to access the main Picture Canvas using thread synchronization routines. The problem was solved by allowing only the mainForm
to handle the drawing of the Picture Canvas, thus eliminating the synchronization delays.
I guess one thing to learn is that careful programming has to be done to achieve good speedup using Multithreading.