The main problem in parallelizing an algorithm is identifying pieces of code that are completely independent of the rest of the code, in the sense that the order in which the pieces are processed does not matter. For Gaussian elimination in its original form, this is not easy: each elimination step works on the result of the previous one.
Yes, you can run the bodies of the innermost loops in parallel, but each of these consists of only a single statement, and creating a thread costs far more resources and processing time than executing that statement, so what would be the point?
Which raises the question: what is the goal that you want to achieve by parallelizing GE?
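To make the dependency structure concrete, here is a minimal Python sketch of forward elimination. It is only an illustration under my own assumptions: no pivoting, an augmented matrix stored as a list of lists, and the names `eliminate` and `reverse_rows` are made up for this example. The outer loop over pivots has to run in order, while the row updates inside one pivot step do not depend on each other, which the `reverse_rows` switch demonstrates by processing them backwards.

```python
def eliminate(matrix, reverse_rows=False):
    """Forward elimination without pivoting; returns a new matrix.

    Hypothetical helper for illustration only. Assumes every pivot
    a[k][k] is non-zero.
    """
    a = [row[:] for row in matrix]      # work on a copy
    n = len(a)
    for k in range(n - 1):              # pivot steps: must run in order
        rows = list(range(k + 1, n))
        if reverse_rows:
            rows.reverse()              # any order gives the same result
        for i in rows:                  # row updates: independent of each other
            factor = a[i][k] / a[k][k]
            for j in range(k, len(a[i])):
                a[i][j] -= factor * a[k][j]   # the single-statement inner body
    return a


system = [
    [2.0, 1.0, -1.0, 8.0],
    [-3.0, -1.0, 2.0, -11.0],
    [-2.0, 1.0, 2.0, -3.0],
]
forward = eliminate(system)
backward = eliminate(system, reverse_rows=True)
```

Both calls produce the same upper-triangular matrix, so the rows below a pivot are the natural unit of parallel work. The pivot loop itself stays sequential, and whether splitting the row updates across threads pays off depends on the matrix being large enough to outweigh the thread overhead.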
P.S.: I doubted that it's possible to parallelize GE in a meaningful way, but I knew that optimized libraries can do this kind of thing very efficiently, so I did a search and found this article:
Parallel Gaussian Elimination using MPI (pdf)