Introduction
I've been hearing about PLINQ (Parallel LINQ) since LINQ was first announced: the idea is to use the functional programming style introduced in .NET 3.5 to get better performance on multi-core machines. The idea sounded cool from the first day, and it is now coming true under a new name, Parallel FX or PFX.
Note: At the time this article was written, Microsoft had not released any version of PFX. The article only gives a preview collected from many sources inside and outside Microsoft. Once the CTP is released, the article will be updated, so keep the link in your bookmarks.
Background
LINQ (Language Integrated Query) is a new feature in C# 3.0, VB 9 and .NET 3.5 that makes queries a first-class citizen in the next versions of the .NET programming languages. The idea is to provide a better abstraction for the way programs handle data, so the compiler and runtime can help in many ways. One of these ways is optimizing for performance.
PLINQ is a way to make your code run on multi-core machines (which are already common and will only become more so) without explicitly defining threads, locks, etc.
The programming model is quite simple and reuses the LINQ model. The new assembly, System.Concurrency.dll, contains a new interface called IParallelEnumerable<T>. It also adds an extension method to all collections and arrays that implement the existing IEnumerable. The extension method is called AsParallel<T>, and it converts any collection to a parallel-enabled collection of type IParallelEnumerable<T>.
Using the Code
Take a look at the following code:
IEnumerable<int> data = new int[] {1, 2, 3, 4, 5, 6};
var q = data.Where(x => x > 4).OrderBy(x=>x).Select(x => x);
foreach (var i in q) ....
This is code you can already write in C# 3.0. To add PLINQ support, all you have to do is add the AsParallel method call before any query operation, so the previous code becomes:
IEnumerable<int> data = new int[] {1, 2, 3, 4, 5, 6};
var q = data.AsParallel().Where(x => x > 4).OrderBy(x=>x).Select(x => x);
foreach (var i in q) ....
Or you can write it in the LINQ query style:
IEnumerable<int> data = new int[] {1, 2, 3, 4, 5, 6};
var q = from i in data.AsParallel()
        where i > 4
        orderby i
        select i;
foreach(var i in q) ....
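To try the query end to end, here is a minimal, self-contained sketch. It assumes the API in the form that later shipped in .NET Framework 4 (the System.Linq namespace, where AsParallel returns ParallelQuery<T>) rather than the pre-release System.Concurrency.dll names described in this article. The names PlinqQueryDemo and FilterAndSort are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlinqQueryDemo
{
    // Hypothetical helper: the same filter-and-sort query, run through PLINQ.
    // Assumes the shipped .NET 4 API (System.Linq.ParallelEnumerable).
    public static List<int> FilterAndSort(IEnumerable<int> data)
    {
        var q = from i in data.AsParallel()
                where i > 4
                orderby i
                select i;
        return q.ToList(); // forces execution of the parallel query
    }

    public static void Main()
    {
        IEnumerable<int> data = new int[] { 1, 2, 3, 4, 5, 6 };
        foreach (var i in FilterAndSort(data))
            Console.WriteLine(i);
    }
}
```

For an input this small, the cost of partitioning the work will likely outweigh any gain; the parallel version is expected to pay off only on larger inputs or more expensive predicates.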
Once you've added the AsParallel call, PLINQ transparently executes OrderBy, Where, Select, GroupBy, etc. on all the available processors. You don't need to explicitly create threads, take locks or manage concurrent execution (unless you are building something big). This doesn't mean that you can't use the power of PLINQ on anything other than queries. The ParallelEnumerable class also adds some extra extension methods, such as ForAll. The ForAll method is useful when you want to apply some operation to every member of a collection; it performs that operation in parallel for all the members.
IEnumerable<int> data = new int[] {1, 2, 3, 4, 5, 6};
data.AsParallel().ForAll(i=>Console.WriteLine(i));
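Because ForAll runs the delegate on several threads at once, the order of its side effects is not guaranteed and anything the delegate writes to must be thread-safe. The sketch below illustrates this under the assumption of the API as it later shipped in .NET Framework 4, where ForAll is an extension method on ParallelQuery<T> and ConcurrentBag<T> lives in System.Collections.Concurrent. The names ForAllDemo and SquareAll are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class ForAllDemo
{
    // Hypothetical helper: squares every element in parallel.
    // Assumes the shipped .NET 4 API, not the pre-release names in this article.
    public static int[] SquareAll(IEnumerable<int> data)
    {
        var results = new ConcurrentBag<int>();    // thread-safe collection
        data.AsParallel().ForAll(i => results.Add(i * i));
        return results.OrderBy(x => x).ToArray();  // sort, since arrival order is arbitrary
    }

    public static void Main()
    {
        int[] data = { 1, 2, 3, 4, 5, 6 };
        Console.WriteLine(string.Join(", ", SquareAll(data)));
    }
}
```

A plain List<int> in place of the ConcurrentBag<int> would be a data race here, which is the main thing to watch for when moving a sequential loop body into ForAll.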
The Console.WriteLine sample prints all the members of the array. If you imagine calling a more complex function that does some heavy work on each array member, ForAll lets you finish the job faster by processing the data in parallel.
This is not all that the System.Concurrency library introduces. The new Parallel class is also a nice addition. It provides general-purpose parallel execution that is not tied to LINQ. Its most important member is the Parallel.For method, which, as the name suggests, executes a loop in parallel. Check the following code:
void ParMatrixMult(int size, double[,] m1, double[,] m2, double[,] result)
{
    Parallel.For( 0, size, delegate(int i) {
        for (int j = 0; j < size; j++) {
            result[i, j] = 0;
            for (int k = 0; k < size; k++) {
                result[i, j] += m1[i, k] * m2[k, j];
            }
        }
    });
}
I took this example from another source. It illustrates matrix multiplication using Parallel.For. As you can see, the Parallel.For method accepts the start index and the exclusive end index, then a delegate to execute for each index. Each iteration writes only to its own row of the result, so no locking is needed. There is also the Parallel.Aggregate function, which can be used to safely aggregate a value over a parallel loop. This is all that I could fit in one post; System.Concurrency contains more cool APIs.
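To show what safe aggregation over a parallel loop can look like, here is a minimal sketch. It does not use Parallel.Aggregate, whose signature I cannot confirm; as a stand-in it assumes the Parallel.For overload with thread-local state from System.Threading.Tasks, as it later shipped in .NET Framework 4. The names ParallelSumDemo and Sum are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSumDemo
{
    // Hypothetical helper: sums an array with a parallel loop.
    // Assumes the shipped .NET 4 Parallel.For overload with thread-local state.
    public static long Sum(int[] values)
    {
        long total = 0;
        Parallel.For<long>(0, values.Length,
            () => 0L,                                          // starting subtotal for each worker
            (i, loopState, subtotal) => subtotal + values[i],  // touches no shared state
            subtotal => Interlocked.Add(ref total, subtotal)); // merge each worker's subtotal atomically
        return total;
    }

    public static void Main()
    {
        int[] values = { 1, 2, 3, 4, 5, 6 };
        Console.WriteLine(Sum(values));
    }
}
```

Keeping a subtotal per worker and merging it once at the end avoids taking a lock or an interlocked operation on every iteration, which is the usual reason to prefer this shape over incrementing a shared variable inside the loop body.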
Points of Interest
PLINQ is not the same thing as PFX. PLINQ was the early vision of what LINQ could bring to software development. PFX is the bigger concept, of which PLINQ is just a subset; it comes with lots of general-purpose APIs that help with the different concurrency problems you might face.
Resources
Here are some resources for further reading. I originally wrote this article on my blog:
- Optimize Managed Code for Multi-Core Machines
- Running Queries on Multi-Core processors
- Channel9 Video: Programming in the Age of Concurrency (Anders Hejlsberg and Joe Duffy)
History
- 24th October, 2007: Initial post