Introduction
As you can probably tell from my previous articles, I have a thing for optimization that makes full use of hardware resources, primarily CPU cores. This article is no exception: it deals with the new Task Parallel Library (TPL). The example uses the Parallel Extensions CTP for .NET 3.5 and is geared towards an intermediate audience.
Background
The goal of this project was to use the TPL to speed up the calculations behind a simple report. The example uses a LINQ group by, then performs the calculations in parallel using the TPL. The code first runs a fairly typical sequential calculation and reports the time taken, then immediately runs the same calculations in parallel and reports that time as well.
Since this project is geared towards intermediate developers, I will not go into detail on some aspects of this code. The reader should have a decent understanding of LINQ, Predicates, and of course, TPL.
Using the code
The project utilizes an abstract class. I didn't want to rewrite timer and calculation code twice, and perhaps thrice if I were adventurous enough to attempt a third theory.
The method that needs to be overridden is StartCalculations; the rest of the code handles the timer for the comparison of the various methods if I choose to create more.
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace ParallelClass
{
    // Base class: times whatever calculation strategy a subclass supplies.
    public abstract class ReportCalculations
    {
        private long _elapsedTime;

        // Wall-clock time of the last Begin call, in milliseconds.
        public long Elapsed_ms
        {
            get { return _elapsedTime; }
        }

        // Runs StartCalculations under a Stopwatch and records the elapsed time.
        public void Begin(IEnumerable<IGrouping<int,
            IGrouping<int, CompanyInfo>>> companyGroups)
        {
            var sw = new Stopwatch();
            sw.Start();
            StartCalculations(companyGroups);
            sw.Stop();
            _elapsedTime = sw.ElapsedMilliseconds;
        }

        public virtual string Name
        {
            get { return "Generic Report Class"; }
        }

        // Declared abstract so that every strategy has to provide its own implementation.
        public abstract void StartCalculations(IEnumerable<IGrouping<int,
            IGrouping<int, CompanyInfo>>> companyGroups);
    }
}
The rest of the code is pretty straightforward. The Process class accepts an int that defines the number of transaction lines to be initialized in the database. The database in this example is just a collection of CompanyInfo objects.
You'll notice that the abstract class is passed an interesting IEnumerable object. This is the result of the nested LINQ query built in the GetGrouping method in the code below. GetGrouping groups the objects by CompanyId, then by TransactionCode, so it's easier to hand the individual calculations to the TPL.
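To make the shape of that nested grouping concrete, here is a minimal, self-contained sketch. It uses the same query pattern on four hand-written rows; the Txn class, the GroupingShapeDemo class, and the sample data are hypothetical stand-ins introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingShapeDemo
{
    // Hypothetical stand-in for the article's CompanyInfo class.
    public class Txn
    {
        public int CompanyId { get; set; }
        public int TransactionCode { get; set; }
        public decimal Amount { get; set; }
    }

    // Same pattern as GetGrouping: outer key = company, inner key = transaction code.
    public static IEnumerable<IGrouping<int, IGrouping<int, Txn>>> Group(IEnumerable<Txn> rows)
    {
        return from row in rows
               group row by row.CompanyId into companyGroup
               from codeGroup in
                   (from inner in companyGroup
                    group inner by inner.TransactionCode)
               group codeGroup by companyGroup.Key;
    }

    // Illustrative sample data only.
    public static List<Txn> SampleRows()
    {
        return new List<Txn>
        {
            new Txn { CompanyId = 1, TransactionCode = 100, Amount = 10m },
            new Txn { CompanyId = 1, TransactionCode = 100, Amount = 5m },
            new Txn { CompanyId = 1, TransactionCode = 101, Amount = 2m },
            new Txn { CompanyId = 2, TransactionCode = 100, Amount = 7m }
        };
    }

    public static void Main()
    {
        foreach (var company in Group(SampleRows()))
            foreach (var code in company)
                Console.WriteLine("company {0}, code {1}: rows={2}, total={3}",
                    company.Key, code.Key, code.Count(), code.Sum(t => t.Amount));
        // company 1, code 100: rows=2, total=15
        // company 1, code 101: rows=1, total=2
        // company 2, code 100: rows=1, total=7
    }
}
```

Each outer group carries a company key and contains one inner group per transaction code, which is why the calculation classes walk the data with two nested loops.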
The PopulateCompanyTransactions method randomly generates all the transactions that will be grouped and summed.
In this example, I have two classes derived from our abstract class ReportCalculations: MySeq and MyTPL. MySeq runs a typical sequential loop that calculates each transaction group on a single thread. MyTPL calculates the sums using all available CPUs when possible.
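using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
// On .NET 4 and later, Parallel lives in System.Threading.Tasks.
// Remove this line if your CTP build does not have that namespace.
using System.Threading.Tasks;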
namespace ParallelClass
{
    public class CompanyInfo
    {
        public int CompanyId { get; set; }
        public int TransactionCode { get; set; }
        public decimal Amount { get; set; }
    }

    public class Process
    {
        public Process(int recordsToProcess)
        {
            var rec = PopulateCompanyTransactions(recordsToProcess);
            var grouping = GetGrouping(rec);
            var calcClasses = new List<ReportCalculations> { new MySeq(), new MyTPL() };
            foreach (var calc in calcClasses)
            {
                calc.Begin(grouping);
                Console.WriteLine("{0} : {1}", calc.Name, calc.Elapsed_ms);
            }
            Console.ReadLine();
        }

        private static IEnumerable<IGrouping<int, IGrouping<int,
            CompanyInfo>>> GetGrouping(IEnumerable<CompanyInfo> companyInfos)
        {
            var query = from company in companyInfos
                        group company by company.CompanyId
                        into companyGroup
                        from transactionGroup in
                            (
                                from company in companyGroup
                                group company by company.TransactionCode
                            )
                        group transactionGroup by companyGroup.Key;
            return query;
        }

        private static List<CompanyInfo> PopulateCompanyTransactions(int totalRecords)
        {
            var rnd = new Random();
            var companyInfo = new List<CompanyInfo>();
            for (int count = 0; count < totalRecords; count++)
                companyInfo.Add(new CompanyInfo
                {
                    Amount = (decimal) (rnd.Next(-50, 1000)*rnd.NextDouble()),
                    CompanyId = rnd.Next(0, 100),
                    TransactionCode = rnd.Next(100, 120)
                });
            return companyInfo;
        }
    }

    public class MySeq : ReportCalculations
    {
        private readonly List<CompanyInfo> _totals = new List<CompanyInfo>();

        public override string Name { get { return "Sequential"; } }

        public override void StartCalculations(IEnumerable<IGrouping<int,
            IGrouping<int, CompanyInfo>>> companyGroups)
        {
            foreach (var firstGroup in companyGroups)
            {
                foreach (var secondGroup in firstGroup)
                {
                    decimal total = 0;
                    foreach (var details in secondGroup)
                        total += details.Amount;
                    _totals.Add(new CompanyInfo { Amount = total,
                        CompanyId = firstGroup.Key, TransactionCode = secondGroup.Key });
                }
            }
        }
    }
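    public class MyTPL : ReportCalculations
    {
        private readonly List<CompanyInfo> _totals = new List<CompanyInfo>();
        // Guards _totals, which is written to from several worker threads.
        private readonly object _totalsLock = new object();

        public override string Name { get { return "TPL"; } }

        public override void StartCalculations(IEnumerable<IGrouping<int,
            IGrouping<int, CompanyInfo>>> companyGroups)
        {
            // Companies are walked one at a time; the transaction groups of
            // each company are handed to the TPL to be summed in parallel.
            foreach (var firstGroup in companyGroups)
                Parallel.ForEach(firstGroup, group => Calculate(group, firstGroup.Key));
        }

        private void Calculate(IGrouping<int, CompanyInfo> grouping, int companyID)
        {
            // Sum this group on the current worker thread. Accumulating into
            // "total" from a nested Parallel.ForEach would be a data race,
            // because "total += ..." on a decimal is not atomic.
            decimal total = 0;
            foreach (var details in grouping)
                total += details.Amount;

            // List<T> is not thread-safe, so serialize the Add calls.
            lock (_totalsLock)
            {
                _totals.Add(new CompanyInfo { Amount = total,
                    CompanyId = companyID, TransactionCode = grouping.Key });
            }
        }
    }
}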
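Whenever a total is accumulated inside a parallel loop, each worker needs its own subtotal, or access to the shared variable has to be synchronized. The sketch below shows one way to do this with per-thread subtotals. It assumes the Parallel.ForEach overload with localInit and localFinally delegates as it exists in .NET 4 and later; the CTP for .NET 3.5 may expose a different signature. The ParallelSumDemo class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelSumDemo
{
    // Sums the amounts in parallel without racing on a shared variable.
    public static decimal ParallelSum(IEnumerable<decimal> amounts)
    {
        decimal total = 0m;
        object gate = new object();

        Parallel.ForEach<decimal, decimal>(
            amounts,
            () => 0m,                                           // each worker starts its own subtotal at zero
            (amount, loopState, subtotal) => subtotal + amount, // no shared state is touched here
            subtotal =>
            {
                // Runs once per worker when it finishes, so the lock is taken rarely.
                lock (gate) { total += subtotal; }
            });

        return total;
    }

    public static void Main()
    {
        var amounts = Enumerable.Range(1, 100000).Select(i => (decimal)i).ToList();
        Console.WriteLine(ParallelSum(amounts) == amounts.Sum()); // prints "True"
    }
}
```

The lock is only taken when a worker merges its subtotal, so contention stays low compared with locking around every single addition.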
Conclusion
Initially, I didn't expect a big difference between the two approaches. My hypothesis was that by the time the underlying thread handler had finished its analysis, the last thread would already have completed, and this does seem to be the case with small numbers of transactions.
My experience shows a huge performance boost when transaction groups contain enough lines that the thread is still alive by the time the handler checks. This boost, in this example, starts at about 1 million transactions. I was getting a 23%-25% boost in performance, consistently, with 10 million transactions, but a much slower result with anything below 1 million.
The takeaway from this project is that although parallel models are theoretically more efficient, there are cases that need more investigation and benchmarking before implementation.
This example should not be a definitive benchmarking guide for your own work, since each case of parallel design is going to be unique.
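One thing worth keeping in mind when reading timings like these: a LINQ to Objects query such as the one GetGrouping returns is deferred, so it is re-evaluated every time it is enumerated. The grouping work therefore falls inside each timed Begin call. The sketch below is not the article's code. It uses a hypothetical counter to show how many times the source elements are visited with and without materializing the query through ToList first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredQueryDemo
{
    // Returns how many times source elements were evaluated across two enumerations.
    public static int CountEvaluations(bool materialize)
    {
        int evaluations = 0;

        IEnumerable<IGrouping<int, int>> query = Enumerable.Range(0, 10)
            .Select(i => { evaluations++; return i; }) // counts every visit to a source element
            .GroupBy(i => i % 3);

        if (materialize)
            query = query.ToList(); // runs the grouping once, up front

        // Two enumerations, standing in for two timed calculation runs.
        foreach (var g in query) { }
        foreach (var g in query) { }

        return evaluations;
    }

    public static void Main()
    {
        Console.WriteLine("Deferred: {0}", CountEvaluations(false));     // Deferred: 20
        Console.WriteLine("Materialized: {0}", CountEvaluations(true));  // Materialized: 10
    }
}
```

Whether to materialize before timing depends on what you want to measure: the summing alone, or grouping and summing together.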
For fun, try creating a new calculation class, and outperform both the MySeq and MyTPL classes at any number of transactions.