Lambda Syntax and Performance

Bill Wagner

5.00/5 (3 votes)

16 Dec 2015CPOL3 min read

6.7K

Lambda Syntax and Performance

Recently, I was asked a question about performance in event handlers. One of my regular readers had been told that using Lambda syntax created an event handler that would execute more slowly than using the ‘classic’ delegate syntax and defining a separate method for the event handler.

TL;DR Version

Yes, but you probably don’t care. And if you do, you might want to reconsider your design.

A Thorough Explanation

Dear readers, don’t accept these kinds of assertions just because someone says it’s true. Let’s write a bit of code, and let’s look at what’s generated.

Let’s start by creating a simple example that demonstrates both of the available options in action:

class EventSource : Progress<int>
{
    public async Task<int> PerformExpensiveCalculation()
    {
        var sum = 0;
        for(int i = 0; i < 100; i++)
        {
            await Task.Delay(100);
            sum += i;
            this.OnReport(sum);
        }
        return sum;
    }
}

class Program
{
    static void Main(string[] args)
    {
        var source = new EventSource();

        EventHandler<int> handler = (_, progress) => Console.WriteLine(progress);
        source.ProgressChanged += handler;
        Console.WriteLine(source.PerformExpensiveCalculation().Result);
        source.ProgressChanged -= handler;

        source.ProgressChanged += ProgressChangedMethod;
        Console.WriteLine(source.PerformExpensiveCalculation().Result);
        source.ProgressChanged -= ProgressChangedMethod;
    }
}

This program calls two different versions of the event source driver, one where the event source is connected and disconnected using Lambda syntax, one where a separate method is defined for the delegate. Let’s think about how we would run performance tests to tell the difference. Where might any performance changes exist? Will it be in processing the event, or connecting and disconnecting the event handler? or would it be in processing the event? The question is important in testing the performance of these two different versions. If it’s in processing the event, this version would be fine to instrument. However, if the performance differences are in connecting and disconnecting the event handlers, this application won’t be useful. We will be unlikely to measure the difference with connecting and disconnecting the handler only once.

Before making extensive measurements, let’s look at the generated IL. Here’s the IL from VersionOne (which uses the lambda syntax)

MSIL

IL_0007: ldsfld class [mscorlib]System.EventHandler`1<int32> 
	blogExample.Program/'<>c'::'<>9__0_0'
IL_000c: dup
IL_000d: brtrue.s IL_0026

IL_000f: pop
IL_0010: ldsfld class blogExample.Program/'<>c' 
	blogExample.Program/'<>c'::'<>9'
IL_0015: ldftn instance void blogExample.Program/'<>c'::'<Main>b__0_0'(object,  int32)
IL_001b: newobj instance void class [mscorlib]System.EventHandler`1<int32>::.ctor(object,  native int)
IL_0020: dup
IL_0021: stsfld class [mscorlib]System.EventHandler`1<int32> 
	blogExample.Program/'<>c'::'<>9__0_0'

IL_0026: stloc.1
IL_0027: ldloc.0
IL_0028: ldloc.1
IL_0029: callvirt instance void class [mscorlib]System.Progress`1<int32>
	::add_ProgressChanged(class [mscorlib]System.EventHandler`1<!0>)
IL_002e: nop
IL_002f: ldloc.0
IL_0030: callvirt instance class [mscorlib]System.Threading.Tasks.Task`1<
	int32> blogExample.EventSource::PerformExpensiveCalculation()
IL_0035: callvirt instance int32 class [mscorlib]System.Threading.Tasks.Task`1<int32>::get_Result()
IL_003a: call void [mscorlib]System.Console::WriteLine(int32)
IL_003f: nop
IL_0040: ldloc.0
IL_0041: ldloc.1
IL_0042: callvirt instance void class [mscorlib]System.Progress`1<int32>
	::remove_ProgressChanged(class [mscorlib]System.EventHandler`1<!0>)

There are 5 instructions that set the event handler (IL_0010 to IL_0029). There is one instruction to disconnect the handler (IL_0042).

Let’s compare that to the version where the method is declared as a function member of the class:

MSIL

IL_004a: ldftn void blogExample.Program::ProgressChangedMethod(object,  int32)
IL_0050: newobj instance void class [mscorlib]System.EventHandler`1<int32>::.ctor(object,  native int)
IL_0055: callvirt instance void class [mscorlib]System.Progress`1<int32>
	::add_ProgressChanged(class [mscorlib]System.EventHandler`1<!0>)
IL_005a: nop
IL_005b: ldloc.0
IL_005c: callvirt instance class [mscorlib]System.Threading.Tasks.Task`1<
	int32> blogExample.EventSource::PerformExpensiveCalculation()
IL_0061: callvirt instance int32 class [mscorlib]System.Threading.Tasks.Task`1<int32>::get_Result()
IL_0066: call void [mscorlib]System.Console::WriteLine(int32)
IL_006b: nop
IL_006c: ldloc.0
IL_006d: ldnull
IL_006e: ldftn void blogExample.Program::ProgressChangedMethod(object,  int32)
IL_0074: newobj instance void class [mscorlib]System.EventHandler`1<int32>::.ctor(object,  native int)
IL_0079: callvirt instance void class [mscorlib]System.Progress`1<int32>
	::remove_ProgressChanged(class [mscorlib]System.EventHandler`1<!0>)

Here, there are 3 lines of IL for attaching the event handler (IL_004A to IL_0055). And also 3 lines of IL for disconnecting (IL_006e to IL_0079). Well, does seem to indicate one extra line of IL to define the first version. Could that add up?

Well, not really. But, let’s measure to be sure.

I’ll modify the program to perform some measurements. I want to measure adding and removing the event handler, not the whole process. So, I commented out the calls to PerformExpensiveOperation(), and I’m simply adding and removing the event handler:

class Program
{
    static void Main(string[] args)
    {
        for (int repeats = 10; repeats <= 1000000; repeats *= 10)
        {
            VersionOne(repeats);
            VersionTwo(repeats);
        }
    }

    private static void VersionOne(int repeats)
    {
        var timer = new Stopwatch();
        timer.Start();
        var source = new EventSource();
        for (int i = 0; i < repeats; i++)
        {
            EventHandler<int> handler = (_, progress) => Console.WriteLine(progress;
            source.ProgressChanged += handler;
            // Console.WriteLine(source.PerformExpensiveCalculation().Result);
            source.ProgressChanged -= handler;
        }
        timer.Stop();
        
        Console.WriteLine($"Version one: {repeats} add/remove takes {timer.ElapsedMilliseconds}ms");
    }

    private static void VersionTwo(int repeats)
    {
        var timer = new Stopwatch();
        timer.Start();
        var source = new EventSource();
        for (int i = 0; i < repeats; i++)
        {
            source.ProgressChanged += ProgressChangedMethod;
            // Console.WriteLine(source.PerformExpensiveCalculation().Result);
            source.ProgressChanged -= ProgressChangedMethod;
        }
        timer.Stop();

        Console.WriteLine($"Version two: {repeats} add/remove takes {timer.ElapsedMilliseconds}ms");
    }

    private static void ProgressChangedMethod(object sender, int e)
    {
        Console.WriteLine(e);
    }
}

And, here are the results:

Version one: 10 add/remove takes 0ms
Version two: 10 add/remove takes 0ms
Version one: 100 add/remove takes 0ms
Version two: 100 add/remove takes 0ms
Version one: 1000 add/remove takes 0ms
Version two: 1000 add/remove takes 0ms
Version one: 10000 add/remove takes 1ms
Version two: 10000 add/remove takes 1ms
Version one: 100000 add/remove takes 9ms
Version two: 100000 add/remove takes 13ms
Version one: 1000000 add/remove takes 79ms
Version two: 1000000 add/remove takes 93ms

So, if you are adding and removing at least 1 million event handlers during the normal execution of your program, you can save 11ms.

However, I would argue that if you are adding and removing more than 1000000 event handlers in your program, I would suggest you re-examine your overall design.

A Common Mistake

Before I leave you, I am going to make one final change to the test program that has a huge impact on performance. Note the updated code for the lambda syntax (VersionOne):

class Program
{
    static void Main(string[] args)
    {
        for (int repeats = 10; repeats <= 1000000; repeats *= 10)
        {
            VersionOne(repeats);
            VersionTwo(repeats);
        }
    }

    private static void VersionOne(int repeats)
    {
        var timer = new Stopwatch();
        timer.Start();
        
        var source = new EventSource();
        for (int i = 0; i < repeats; i++)
        {
            source.ProgressChanged += (_, progress) => Console.WriteLine(progress);
            
            // Console.WriteLine(source.PerformExpensiveCalculation().Result);
            
            source.ProgressChanged -= (_, progress) => Console.WriteLine(progress);
        }

        timer.Stop();

        Console.WriteLine($"Version one: {repeats} add/remove takes {timer.ElapsedMilliseconds}ms");
    }

    private static void VersionTwo(int repeats)
    {
        var timer = new Stopwatch();
        timer.Start();

        var source = new EventSource();
        for (int i = 0; i < repeats; i++)
        {
            source.ProgressChanged += ProgressChangedMethod;

            // Console.WriteLine(source.PerformExpensiveCalculation().Result);
            
            source.ProgressChanged -= ProgressChangedMethod;

        }

        timer.Stop();

        Console.WriteLine($"Version two: {repeats} add/remove takes {timer.ElapsedMilliseconds}ms");
    }

    private static void ProgressChangedMethod(object sender, int e)
    {
        Console.WriteLine(e);
    }
}

Instead of declaring a local variable for the lambda expression, I write it inline in the add and remove statements for the event handler. Here are the performance results from this version:

Version one: 10 add/remove takes 1ms
Version two: 10 add/remove takes 0ms
Version one: 100 add/remove takes 0ms
Version two: 100 add/remove takes 0ms
Version one: 1000 add/remove takes 99ms
Version two: 1000 add/remove takes 0ms
Version one: 10000 add/remove takes 6572ms
Version two: 10000 add/remove takes 1ms
Version one: 100000 add/remove takes 803267ms
Version two: 100000 add/remove takes 13ms

You’ll notice that the lambda version is much, much, slower. Note that I stopped execution after 100,000 add/remove sequences because of the time taken. Why? Well, let’s look at the generated IL for the code inside the loop in VersionOne():

MSIL

.loop
{
    IL_0018: nop
    IL_0019: ldloc.1
    IL_001a: ldsfld class [mscorlib]System.EventHandler`1<int32> 
    	blogExample.Program/'<>c'::'<>9__1_0'
    IL_001f: dup
    IL_0020: brtrue.s IL_0039

    IL_0022: pop
    IL_0023: ldsfld class blogExample.Program/'<>c' 
    	blogExample.Program/'<>c'::'<>9'
    IL_0028: ldftn instance void blogExample.Program/'<>c'
    	::'<VersionOne>b__1_0'(object,  int32)
    IL_002e: newobj instance void class [mscorlib]System.EventHandler`1<int32>::.ctor(object,  native int)
    IL_0033: dup
    IL_0034: stsfld class [mscorlib]System.EventHandler`1<int32> 
    	blogExample.Program/'<>c'::'<>9__1_0'

    IL_0039: callvirt instance void class [mscorlib]System.Progress`1<int32>
    	::add_ProgressChanged(class [mscorlib]System.EventHandler`1<!0>)
    IL_003e: nop
    IL_003f: ldloc.1
    IL_0040: ldsfld class [mscorlib]System.EventHandler`1<int32> 
    	blogExample.Program/'<>c'::'<>9__1_1'
    IL_0045: dup
    IL_0046: brtrue.s IL_005f

    IL_0048: pop
    IL_0049: ldsfld class blogExample.Program/'<>c' 
    	blogExample.Program/'<>c'::'<>9'
    IL_004e: ldftn instance void blogExample.Program/'<>c'
    	::'<VersionOne>b__1_1'(object,  int32)
    IL_0054: newobj instance void class [mscorlib]System.EventHandler`1
    	<int32>::.ctor(object,  native int)
    IL_0059: dup
    IL_005a: stsfld class [mscorlib]System.EventHandler`1<int32> 
    	blogExample.Program/'<>c'::'<>9__1_1'

    IL_005f: callvirt instance void class [mscorlib]System.Progress`1<int32>
    	::remove_ProgressChanged(class [mscorlib]System.EventHandler`1<!0>)
    IL_0064: nop
    IL_0065: nop
    IL_0066: ldloc.2
    IL_0067: stloc.3
    IL_0068: ldloc.3
    IL_0069: ldc.i4.1
    IL_006a: add
    IL_006b: stloc.2

    IL_006c: ldloc.2
    IL_006d: ldarg.0
    IL_006e: clt
    IL_0070: stloc.s V_4
    IL_0072: ldloc.s V_4
    IL_0074: brtrue.s IL_0018
}

Focus on the four highlighted lines of code above. Notice that the event handler added is not the same as the event handler removed! Removing an event handler that is not attached does not generate any errors, but it also doesn’t do anything. So, what happens now is that over the course of the looping test, VersionOne adds more than 1,000,000 event handlers to the delegate. They are all the same handler, but they still consume memory and CPU resources.

I think this may be where the misconception arises that event handlers written as lambda expressions are slower than handlers written as regular functions. If you don’t add and remove handlers correctly using the lambda syntax, that error quickly shows up in performance metrics.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)