If your calculation takes several hours, take a serious look at CUDA and cuBLAS. For a typical number-crunching application such as an iterative solver, it ought to do wonders.
Here is some info on what you may expect to gain:
http://www.tomshardware.com/reviews/nvidia-cuda-gpgpu,2299.html
Obviously, moving your number crunching to another thread, as SAKryukov suggests, is a good thing to do, and so is splitting it, if possible, among multiple cores using multiple threads. This may require a significant amount of redesign or, if you're lucky, simply dividing the iterations of the outer loop between the threads.
You may find that ACE has the C++ classes required to facilitate this move in a very elegant manner.
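To illustrate the "lucky" case, here is a minimal sketch of dividing the iterations of an outer loop between threads. It uses plain std::thread from the C++ standard library rather than ACE, and it assumes the iterations are independent of each other. The names parallel_for and row_sums are hypothetical helpers made up for this example, and the row-sum workload is only a stand-in for your own computation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: calls body(i) for every i in [0, count), splitting the
// index range into contiguous chunks, at most one chunk per thread.
// Assumes the iterations do not depend on each other.
void parallel_for(std::size_t count, unsigned num_threads,
                  const std::function<void(std::size_t)>& body)
{
    assert(num_threads > 0);
    // Round up so that every index lands in some chunk.
    const std::size_t chunk = (count + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(count, t * chunk);
        const std::size_t end = std::min(count, begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        workers.emplace_back([begin, end, &body] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }
    // Wait for all chunks to finish before returning.
    for (auto& w : workers) w.join();
}

// Stand-in workload: sum each row of a row-major matrix.
// The outer loop over rows is what gets divided between the threads.
std::vector<double> row_sums(const std::vector<double>& m,
                             std::size_t rows, std::size_t cols,
                             unsigned num_threads)
{
    std::vector<double> sums(rows, 0.0);
    parallel_for(rows, num_threads, [&](std::size_t r) {
        double s = 0.0;
        for (std::size_t c = 0; c < cols; ++c) s += m[r * cols + c];
        sums[r] = s;  // each iteration writes its own element, so no lock
    });
    return sums;
}
```

A reasonable thread count is std::thread::hardware_concurrency(), but it may return 0 when the value cannot be determined, so fall back to a fixed number in that case. If the iterations share state, for example by accumulating into one variable, you will need per-thread partial results or synchronization, and that is where the redesign effort comes in.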
Best regards
Espen Harlinn