Introduction
In this tip, we will traverse random data tree structure using background workers.
At first, we will implement traversal algorithm using Thread Pool and synchronization primitives to compare this approach with regular background worker.
An example project is available at GitHub here.
Background
There were a couple of times in my projects when I needed to walk over tree data structure that has a lot of child elements and do some operation for each child element in collection.
I've found that using ThreadPool
or BackgroundWorker
may have some performance benefits if you are dealing with big tree data structure.
Here, I'm using Composite Pattern to implement tree, see (ex project).
Using the Code
Let's implement a couple of classes that will actually do the work.
Thread Pool Walker:
class TreeRunner
{
protected Action<TreeComponent<string>> _action;
public TreeRunner(Action<TreeComponent<string>> action) => _action = action;
public void Walk(ConcurrentQueue<TreeComponent<string>> queue, int count = 4)
{
using (var countDown = new CountdownEvent(count))
{
var b = new Barrier(count);
for (int i = 0; i < count; i++)
ThreadPool.QueueUserWorkItem((object o) =>
{
b.SignalAndWait();
Console.WriteLine($"THread start {o}");
try
{
while (queue.Count > 0)
{
if (queue.TryDequeue(out TreeComponent<string> node))
{
_action.Invoke(node);
foreach (var child in node.GetChilds())
queue.Enqueue(child);
}
}
}
catch (Exception)
{
}
finally
{
Console.WriteLine($"THread END {o} final ");
b.SignalAndWait();
countDown.Signal();
}
}, i);
countDown.Wait();
}
}
}
Walker class that uses background workers.
class TreeWalkerUsingBackgroundWoker
{
protected Action<TreeComponent<string>> _action;
private DateTime start;
public TreeWalkerUsingBackgroundWoker
(Action<TreeComponent<string>> action) => _action = action;
private List<BackgroundWorker> _workers = new List<BackgroundWorker>();
private ConcurrentQueue<TreeComponent<string>> _q;
private int _count = 0;
private int _visited = 0;
public void Walk(ConcurrentQueue<TreeComponent<string>> queue, int count = 4)
{
_q = queue;
_count = count;
start = DateTime.Now;
for (int i = 0; i < count; i++)
{
var worker = new BackgroundWorker();
worker.WorkerSupportsCancellation = true;
worker.DoWork += new DoWorkEventHandler((object sender, DoWorkEventArgs args) => {
Console.WriteLine("Starting Worker ");
while (_q.Count > 0)
{
if (_q.TryDequeue(out TreeComponent<string> node))
{
_action.Invoke(node);
Interlocked.Increment(ref _visited);
foreach (var child in node.GetChilds())
_q.Enqueue(child);
}
}
Console.WriteLine("End Worker");
});
worker.RunWorkerAsync();
worker.RunWorkerCompleted += Worker_RunWorkerCompleted;
_workers.Add(worker);
}
}
private void Worker_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
if (_count == _workers.Count(w => w.IsBusy == false))
{
Console.WriteLine("End Task");
Console.WriteLine
(" " + (DateTime.Now - start).TotalMilliseconds + $" / Visited {_visited}");
}
}
}
Finally, let's run some tests for both options:
static void RunTest1()
{
Root = new TreeBranch<string>();
Root.AddChild(BuildTree(new TreeBranch<string>()));
Console.WriteLine("Starting Walking Tree component");
var start = DateTime.Now;
var walker = new TreeRunner
(new Action<TreeComponent<string>>((TreeComponent<string> cs) =>
{
new TreeVisitor<string>().Visit(cs);
Interlocked.Increment(ref Visited);
}
));
var c = new ConcurrentQueue<TreeComponent<string>>();
c.Enqueue(Root);
walker.Walk(c);
Console.WriteLine("First Walker Finished " +
(DateTime.Now - start).TotalMilliseconds + " / " + Visited);
}
static void RunTest2()
{
Console.WriteLine("Starting Walking Tree component");
var start = DateTime.Now;
var walker = new TreeWalkerUsingBackgroundWoker
(new Action<TreeComponent<string>>((TreeComponent<string> cs) =>
{
new TreeVisitor<string>().Visit(cs);
}
));
var c = new ConcurrentQueue<TreeComponent<string>>();
c.Enqueue(Root);
walker.Walk(c);
}
Results
On my i5 7700k with 640002 child elements, I got the following results:
Single thread 900ms ~ 1200ms
Using ThreadPool 1st run 400-500ms, 2nd run 390ms
Using Background Worker 1st run 300, 2nd 206ms
It's obvious that the second approach is a little bit faster because of synchronization primitives in the first one.