In your case the usual approach is to create one calculation thread per core and offload IO to separate threads. You can go further and use separate computers for staging (including storing) the incoming data, and similarly offload storing of results to separate computers, often by way of a message queuing solution.
Staging computer 1 ---\
Staging computer 2 ----\
                        +--> calculating computer enqueues results to ---> storage solution
Staging computer 3 ----/
Staging computer 4 ---/
Moving IO out of the way is often the only real solution when you're doing heavy calculations. Database IO is *much* more expensive than enqueuing results to a message queuing solution, especially if you're using something that handles load balancing in an intelligent way.
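To make the layout concrete, here is a minimal single-machine sketch in Python, not a definitive implementation. The calculation workers never touch storage. They only enqueue results, and one dedicated thread drains that queue and does the "storing". The names calculate, calculation_worker, storage_worker and run_pipeline are hypothetical, the in-memory list stands in for the real storage solution, and queue.Queue stands in for a real message queuing product. Note that in CPython, CPU-bound work would normally run in processes rather than threads; threads are used here only to keep the sketch short.

```python
import os
import queue
import threading


def calculate(item):
    # Stand-in for the heavy calculation (hypothetical).
    return item * item


def calculation_worker(inbox, outbox):
    # Pull staged input, compute, and enqueue the result. No storage IO here.
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more input for this worker
            break
        outbox.put(calculate(item))


def storage_worker(outbox, stored):
    # Dedicated IO thread: the only place that talks to the (simulated) store.
    while True:
        result = outbox.get()
        if result is None:  # sentinel: all calculation workers are done
            break
        stored.append(result)  # a real system would write to its storage here


def run_pipeline(items, workers=None):
    # Default to one calculation worker per core.
    workers = workers or os.cpu_count() or 1
    inbox, outbox, stored = queue.Queue(), queue.Queue(), []

    writer = threading.Thread(target=storage_worker, args=(outbox, stored))
    writer.start()

    calc_threads = [
        threading.Thread(target=calculation_worker, args=(inbox, outbox))
        for _ in range(workers)
    ]
    for t in calc_threads:
        t.start()

    # "Staging": feed the input, then one sentinel per calculation worker.
    for item in items:
        inbox.put(item)
    for _ in calc_threads:
        inbox.put(None)

    for t in calc_threads:
        t.join()
    outbox.put(None)  # tell the storage thread to finish
    writer.join()
    return stored


if __name__ == "__main__":
    # Results arrive in completion order, so sort them for display.
    print(sorted(run_pipeline(range(10))))
```

The point of the design is that a slow store only makes the result queue grow. It does not stall the calculation workers, which is the same effect you get across machines with a message queue in front of the storage solution.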
Regards
Espen Harlinn