Introduction
In recent years, C# has grown from a language with exactly one feature to solve a problem to a language with many potential (language) solutions for a single problem. This is both good and bad: good because it gives us as developers freedom and power (without compromising backwards compatibility), and bad because of the cognitive load that comes with making decisions.
In this series, we want to explore what options exist and where these options differ. Surely, some may have advantages and disadvantages under certain conditions. We will explore these scenarios and come up with a guide to make our lives easier when renovating existing projects.
This is part II of the series. You can find Part I on CodeProject as well.
Background
In the past, I've written many articles particularly targeted at the C# language. I've written introduction series, advanced guides, articles on specific topics like async / await or upcoming features. In this article series, I want to combine all the previous themes in one coherent fashion.
I feel it's important to discuss where new language features shine and where the old - let's call them established - ones are still preferred. I may not always be right (especially since some of my points will surely be more subjective / a matter of taste). As usual, leaving a comment for discussion would be appreciated!
Let's start off with some historic context.
What are Methods?
Quite often, we see people being confused by the terminology - "method" vs "function". Indeed, every method is a function, but not every function is a method. A method is a special kind of function that has an implicit first argument, called the "context". The context is given as `this` and is associated with the instance of the class the method has been specified in. As such, a (real) method can only exist within a class and `static` methods should rather be called functions.
While people often think `this` is guaranteed to be non-`null`, the restriction is in fact artificial. Indeed, the runtime performs an implicit check on the first argument (because the call of a method is emitted as a `callvirt` instruction); in theory, however, this could easily be circumvented.
With the implicit `this` parameter, a special kind of syntax is derived. Instead of calling a function `f` like `f(a, b, ...)` we find `c.f(a, b, ...)`, where `c` is the instance of some class. All other cases may be thought of as fully qualified names, see:
    namespace A
    {
        public class Foo
        {
            public static void Hello()
            {
            }
        }
    }

    using A;

    public class Bar
    {
        public static void Test()
        {
            Foo.Hello();
        }
    }

    using static A.Foo;

    public class Bar
    {
        public static void Test()
        {
            Hello();
        }
    }
As we can see, since C# 6 classes can also be thought of as namespaces - at least for their `static` members. Thus, `static` methods are in fact always callable like functions without any prefix; the difference is reduced to fully qualified vs. simple name.
So far, we've referred to `this` quite often. What is `this` in C#?
The `this` keyword refers to the current instance of the class and is also used as a modifier of the first parameter of an extension method.
We will cover extension methods later. For now, let's recap standard methods vs functions (i.e., `static` "methods") with the following example:
    void Main()
    {
        var test = default(Test);
        test.FooInstance();
        Test.FooStatic();
    }

    public class Test
    {
        public void FooInstance()
        {
        }

        public static void FooStatic()
        {
        }
    }
This results in the following MSIL code:
IL_0000: nop
IL_0001: ldnull
IL_0002: stloc.0
IL_0003: ldloc.0
IL_0004: callvirt Test.FooInstance
IL_0009: nop
IL_000A: call Test.FooStatic
IL_000F: nop
IL_0010: ret
There are two key differences between a method and a function:
- A method requires loading the instance it belongs to. This will be the implicit first argument.
- A method is generally called using `callvirt` (regardless of whether we marked it `sealed` or made it explicitly `virtual`).
So why declare a method `sealed` or `virtual` at all? The reason is simple: Engineering! It's one way of telling fellow developers how the code should be used. As with things like `private` or `readonly`, the implication is (at least at first) only in the handling by the compiler. The runtime may use this information for further optimization later, but a direct impact is not guaranteed.
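The null check attached to an instance call can be observed without reading any MSIL. The following is a minimal sketch; the names `Greeter` and `NullReceiverDemo` are made up for illustration. The instance method never touches `this`, yet calling it on a `null` reference still fails, while the static function has no receiver to check.

```csharp
using System;

// Hypothetical types, only for illustration.
public class Greeter
{
    // Does not use `this` at all.
    public string InstanceHello() => "hello";

    // A function in the sense used above: no implicit first argument.
    public static string StaticHello() => "hello";
}

public static class NullReceiverDemo
{
    // Returns true if calling the instance method on null throws.
    public static bool InstanceCallThrowsOnNull()
    {
        Greeter greeter = null;
        try
        {
            var unused = greeter.InstanceHello();
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }
}
```

Calling `NullReceiverDemo.InstanceCallThrowsOnNull()` is expected to return `true`, whereas `Greeter.StaticHello()` can be called without any instance at all.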
Let's recap when functions should be preferred over methods.
Useful for:
- Small helpers
- Independent of specific (class) instances
- No cost / checks associated with virtual calls
Avoid for:
- If you look for inheritance
The Modern Way
Standard methods and functions have still progressed in C# even though their purpose has stayed the same (why should it change?). We got some useful syntactic sugar to write a little bit less (verbose) code. We also have new language capabilities that help in writing even more reusable functions in C#.
Let's start our journey with a look at extension methods.
Extension Methods
Extension methods are a fairly old, yet simple, mechanism for reusing a function across a broad range of types. The syntax for writing an extension method is fairly straightforward:
- We need a `static` class (cannot be instantiated, cannot be inherited, and does not allow instance members)
- We need a `static` method (i.e., function)
- The method needs at least one argument (called the extension target)
- The first argument must be decorated with the `this` keyword
The following is an example of an extension method.
    public static class Test
    {
        public static void Foo(this object obj)
        {
        }
    }
While methods have an implicit `this` argument (named `this`), an extension method has an explicit "`this`" argument, which can have a name of our choosing. Since `this` is already a keyword, we cannot use it as the name, unfortunately (or fortunately, as it would - at least at first glance - look like a standard method, which is definitely not the case).
The advantage of an extension method (over an ordinary function) only shows once we call it.
var test = default(object);
Test.Foo(test);
test.Foo();
While the first call uses the explicit syntax (which, granted, can be reduced to only `Foo(test)` by having `using static Test;` at the top), the second call makes use of the extension method.
As we can see from the generated MSIL, there is no difference at all!
IL_0000: nop
IL_0001: ldnull
IL_0002: stloc.0
IL_0003: ldloc.0
IL_0004: call Test.Foo
IL_0009: nop
IL_000A: ldloc.0
IL_000B: call Test.Foo
IL_0010: nop
We always load the first argument and then call the function. No magic in between! However, the extension method looks nicer, and has one additional benefit...
Consider having generic functions such as `Where`, `Select`, `OrderBy`, etc. These functions work on `IEnumerable<T>` instances.
If these functions were called to introduce a condition, select a specific property, and order the enumerable by some rule, we would write code such as:
var result = MyLinq.OrderBy(MyLinq.Select(MyLinq.Where(source, ...), ...), ...);
Hence, the resulting code needs to be read from the inside out (onion style) instead of in the natural left-to-right direction of the sentence above that describes it (called "chaining" or "piped" order). This is unfortunate, as it destroys the readability of the code and makes it hard to grasp what goes on... Comments to the rescue?
Not really. By using extension methods, we can exploit the fact that the first argument is given implicitly by the "calling instance". As a result, the code looks as follows:
var result = source.Where(...).Select(...).OrderBy(...);
The notation of explicitly writing out the `MyLinq` class is also gone (without introducing `using static MyLinq`). Wonderful!
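To make the comparison concrete, here is a small sketch of such a helper class. It is not the real `System.Linq` implementation; the method names `MyWhere` and `MySelect` are hypothetical and were chosen so they cannot clash with the framework's `Where` and `Select`. Both demo methods perform the same two calls, once nested and once chained.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the MyLinq class mentioned above.
public static class MyLinq
{
    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (var item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }

    public static IEnumerable<TResult> MySelect<T, TResult>(this IEnumerable<T> source, Func<T, TResult> selector)
    {
        foreach (var item in source)
        {
            yield return selector(item);
        }
    }
}

public static class ChainingDemo
{
    // Inside-out: the operation that runs last is written first.
    public static List<int> Nested(IEnumerable<int> source) =>
        new List<int>(MyLinq.MySelect(MyLinq.MyWhere(source, x => x % 2 == 0), x => x * 10));

    // Left-to-right: the same calls in the order they are applied.
    public static List<int> Chained(IEnumerable<int> source) =>
        new List<int>(source.MyWhere(x => x % 2 == 0).MySelect(x => x * 10));
}
```

For the input `1, 2, 3, 4`, both variants should produce `20, 40`; only the reading direction differs.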
Extension methods can also be used as generalized helpers. Consider the following interface:
    interface IFoo
    {
        Task FooAsync();
        Task FooAsync(CancellationToken cancellationToken);
    }
Here, we are already telling the implementation side to have 2 methods instead of just a single one. I guess almost all implementations of this interface will actually look as follows:
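    class StandardFoo : IFoo
    {
        public Task FooAsync()
        {
            return FooAsync(default(CancellationToken));
        }

        public Task FooAsync(CancellationToken cancellationToken)
        {
            // The actual work would go here; a completed task keeps the sample compilable.
            return Task.CompletedTask;
        }
    }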
This is bad. The implementer has to do more work than necessary. Instead, we can specify our interface and an associated helper method as follows:
    interface IFoo
    {
        Task FooAsync(CancellationToken cancellationToken);
    }

    static class IFooExtensions
    {
        public static Task FooAsync(this IFoo foo)
        {
            return foo.FooAsync(default(CancellationToken));
        }
    }
Great, now somebody who implements `IFoo` only needs to take care of a single method and gets our convenience method for free.
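As a usage sketch, the following self-contained variant shows an implementer writing just one method while callers can still use both forms. `CountingFoo` and `FooDemo` are hypothetical names introduced only for this illustration.

```csharp
using System.Threading;
using System.Threading.Tasks;

public interface IFoo
{
    Task FooAsync(CancellationToken cancellationToken);
}

public static class IFooExtensions
{
    // Convenience overload, written once for every implementation.
    public static Task FooAsync(this IFoo foo) => foo.FooAsync(CancellationToken.None);
}

// Hypothetical implementation: only the single interface method is written.
public class CountingFoo : IFoo
{
    public int Calls { get; private set; }

    public Task FooAsync(CancellationToken cancellationToken)
    {
        Calls++;
        return Task.CompletedTask;
    }
}

public static class FooDemo
{
    // Returns how often the implementation was reached.
    public static int Run()
    {
        var foo = new CountingFoo();
        IFoo viaInterface = foo;
        viaInterface.FooAsync().Wait();                       // resolved to the extension method
        viaInterface.FooAsync(CancellationToken.None).Wait(); // resolved to the interface method
        return foo.Calls;
    }
}
```

Both calls end up in the same implementation, so `FooDemo.Run()` should return `2`.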
Useful for:
- General interface methods
- Helper methods with at least 1 argument
Avoid for:
- Replacing ordinary class instance methods
Delegates
In the previous section, we touched on the importance of extension methods and their meaning for readable code (left-to-right instead of inside-out). The example used a LINQ-like set of functions to motivate extension methods. Actually, the LINQ (short for language integrated query) feature introduced (the need for) extension methods in the first place. However, we also left out an important part in the previous example...
LINQ only works if we have a sophisticated way to configure the behavior of its various functions (e.g., `Select`). However, even the most sophisticated object structure as argument(s) would not give LINQ the needed flexibility (and would really overcomplicate its usage). As a consequence, we need a special kind of object as argument - a function. In C#, functions are transported indirectly via so-called delegates.
A delegate is defined via the following syntax:
delegate void Foo(int a, int b);
A delegate is thus written exactly like a function signature, where the function name is replaced by the delegate's name and the `delegate` keyword has been used to introduce the signature.
Ultimately, a delegate is compiled into a class with a method `Invoke`. The signature of this method equals the signature we've just introduced.
Let's look at the MSIL from calling our delegate with a sample implementation (empty body) to reveal some more information:
IL_0000: nop
IL_0001: ldsfld <>c.<>9__0_0
IL_0006: dup
IL_0007: brtrue.s IL_0020
IL_0009: pop
IL_000A: ldsfld <>c.<>9
IL_000F: ldftn <>c.b__0_0
IL_0015: newobj Foo..ctor
IL_001A: dup
IL_001B: stsfld <>c.<>9__0_0
IL_0020: stloc.0
IL_0021: ldloc.0
IL_0022: callvirt Foo.Invoke
IL_0027: nop
IL_0028: ret
Foo.Invoke:
Foo.BeginInvoke:
Foo.EndInvoke:
Foo..ctor:
<>c.b__0_0:
IL_0000: nop
IL_0001: ret
<>c..cctor:
IL_0000: newobj <>c..ctor
IL_0005: stsfld <>c.<>9
IL_000A: ret
<>c..ctor:
IL_0000: ldarg.0
IL_0001: call System.Object..ctor
IL_0006: nop
IL_0007: ret
We see that the generated class actually contains quite some functionality (also `static` members). More importantly, there are two other methods - `BeginInvoke` and `EndInvoke`. Finally, creating a delegate is not free - it is actually an object creation of the generated class. Calling the delegate is actually the same as calling the `Invoke` method on the class. Thus, this is a virtual call and more expensive than calling, e.g., a function.
So far, we have only seen what a delegate is and how it is declared. Actually, most of the time, we will not need to declare a delegate ourselves. We can just use the built-in generically declared ones:
- `Action<T...>` for all delegates returning `void` (nothing)
- `Func<T..., TReturn>` for all delegates returning something: `TReturn`
There are also some generic constructs for things such as event delegates, predicates (like `Func`, but fixed to return `bool`), etc.
How do we instantiate a delegate? Let's consider the delegate above, `Foo`, which takes two integer arguments.
Foo foo = delegate (int a, int b) { };
foo(2, 3);
Alternatively, we may want to point it to an existing function:
    void Sample(int a, int b)
    {
    }

    Foo foo = new Foo(Sample);
    foo(2, 3);
The generated MSIL is actually not exactly the same, but this does not play a role at the moment. The last one may actually also be simplified to just read `Foo foo = Sample`, which handles the delegate instance creation implicitly.
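The built-in generic delegates can be instantiated in exactly the same ways as a hand-written one. The following sketch uses hypothetical names (`BuiltInDelegates`, `Add`, `Run`) and sticks to the two forms shown so far: an anonymous `delegate` block and a reference to an existing function.

```csharp
using System;

public static class BuiltInDelegates
{
    // An existing function that a delegate can point to.
    public static int Add(int a, int b)
    {
        return a + b;
    }

    public static string Run()
    {
        var log = "";

        // Action<string>: one argument, returns void.
        Action<string> append = delegate (string s) { log += s; };

        // Func<int, int, int>: the last type argument is the return type.
        Func<int, int, int> add = Add;

        // Predicate<int>: one argument, fixed to return bool.
        Predicate<int> isEven = delegate (int n) { return n % 2 == 0; };

        append(add(2, 3).ToString());
        append(isEven(4) ? " even" : " odd");
        return log;
    }
}
```

With these inputs, `BuiltInDelegates.Run()` should return `"5 even"`, and no custom delegate type had to be declared.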
Useful for:
- Explicitly stating a function signature
- Transporting functions between each other
Avoid for:
- Anonymous functions
- Reusable blocks
So far so good. What we clearly lack is a syntax for writing an anonymous function more nicely. Luckily, C# has us covered.
Lambda Expressions
As we've already seen, delegates can be pretty handy to transport functions by packaging them nicely into classes. However, right now writing some logic in place, i.e., packing an anonymous function into the delegate, looks quite cumbersome and ugly.
Luckily, with C# 3, not only was LINQ (together with extension methods) introduced, but also a fresh syntax for writing anonymous functions using the new "fat-arrow" (or lambda) operator `=>`.
If we change the previous example to make use of lambda expressions, it could look as follows:
Foo foo = (a, b) => { };
foo(2, 3);
The generated MSIL is exactly the same as for the (anonymous) delegate. Hence, this is really just syntactic sugar, but exactly the sweetness we demand!
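For delegates that return a value, the lambda syntax comes in two flavors: a statement body in braces and a bare expression. The sketch below places both next to the older anonymous `delegate` form; the delegate `BinaryOp` and the class `LambdaForms` are hypothetical names used only here.

```csharp
using System;

// Hypothetical delegate with a return value.
public delegate int BinaryOp(int a, int b);

public static class LambdaForms
{
    public static int[] Run()
    {
        // Anonymous method: parameter types and return statement are spelled out.
        BinaryOp viaDelegate = delegate (int a, int b) { return a + b; };

        // Lambda with a statement body: parameter types are inferred.
        BinaryOp viaStatement = (a, b) => { return a + b; };

        // Lambda with an expression body: the expression is the return value.
        BinaryOp viaExpression = (a, b) => a + b;

        return new[] { viaDelegate(2, 3), viaStatement(2, 3), viaExpression(2, 3) };
    }
}
```

All three entries of the returned array should be `5`; the three forms differ only in how much has to be typed.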
LINQ Expressions
There is one more thing that has been introduced together with LINQ (C# 3 was pretty great, right?): It's LINQ expressions! This is not about query syntax vs direct use of extension methods or similar, but rather about how ORMs have adopted the use of LINQ.
The problem is as follows: Before LINQ was introduced to C#, we were mostly stuck writing SQL queries directly in C#. While this certainly has some advantages (full access to all features that are offered in our database), the disadvantages are quite real:
- No help from a compiler
- Potential security issues
- No static typing for the result
With LINQ, this has been addressed by introducing LINQ expressions, a way of not compiling an anonymous function to MSIL, but rather transforming the generated AST into an object.
Even though this is a compiler feature, it all comes down to using the right type. Earlier, we've seen that generic delegates such as `Func` or `Action` allow us to avoid writing them again. If we pack such a delegate into the `Expression` type, we end up with an AST holder.
A quick example on this one (actually, the following one is not really compiling as we would need an expression on the right side, but the idea should be visible):
Expression<Foo> foo = (a, b) => { };
The generated MSIL is ugly to say the least (and given that it's a really short example, we can guess what real-life code may look like):
IL_0000: nop
IL_0001: ldtoken System.Int32
IL_0006: call System.Type.GetTypeFromHandle
IL_000B: ldstr "a"
IL_0010: call System.Linq.Expressions.Expression.Parameter
IL_0015: stloc.1
IL_0016: ldtoken System.Int32
IL_001B: call System.Type.GetTypeFromHandle
IL_0020: ldstr "b"
IL_0025: call System.Linq.Expressions.Expression.Parameter
IL_002A: stloc.2
IL_002B: ldnull
IL_002C: ldtoken Nothing
IL_0031: call System.Reflection.MethodBase.GetMethodFromHandle
IL_0036: castclass System.Reflection.MethodInfo
IL_003B: call System.Array.Empty<Expression>
IL_0040: call System.Linq.Expressions.Expression.Call
IL_0045: ldc.i4.2
IL_0046: newarr System.Linq.Expressions.ParameterExpression
IL_004B: dup
IL_004C: ldc.i4.0
IL_004D: ldloc.1
IL_004E: stelem.ref
IL_004F: dup
IL_0050: ldc.i4.1
IL_0051: ldloc.2
IL_0052: stelem.ref
IL_0053: call System.Linq.Expressions.Expression.Lambda<Foo>
IL_0058: stloc.0
IL_0059: ret
Essentially, the whole generated AST for this call is now made available in an object format - thus included as such in the MSIL.
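A variant that does compile helps to see what "AST as an object" means in practice. The sketch below uses `Func<int, int, int>` instead of `Foo` so that the lambda has an expression on its right side; the class name `ExpressionDemo` is hypothetical. The tree can be inspected node by node, and it can still be turned into a callable delegate via `Compile()`.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionDemo
{
    // Inspects the tree instead of running it.
    public static string Describe()
    {
        Expression<Func<int, int, int>> add = (a, b) => a + b;
        var body = (BinaryExpression)add.Body;
        return $"{body.NodeType}({body.Left}, {body.Right})";
    }

    // Turns the tree into a real delegate and invokes it.
    public static int Evaluate(int x, int y)
    {
        Expression<Func<int, int, int>> add = (a, b) => a + b;
        Func<int, int, int> compiled = add.Compile();
        return compiled(x, y);
    }
}
```

`ExpressionDemo.Describe()` should yield `"Add(a, b)"`, and `ExpressionDemo.Evaluate(2, 3)` should yield `5`. Note that `Compile()` does its work at runtime, which is one reason to avoid expressions for functions that are simply meant to be called.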
ORMs can inspect this information to create optimized queries that transport variables and special fields securely without any chance of hijacking in any form. As the delegate is still strongly typed, the result can be strongly typed (and asserted by the ORM). But can we use LINQ expressions even without writing an ORM?
LINQ expressions can come in handy in many situations. An example would be the way in which they have been used in ASP.NET MVC / Razor views. Here, we needed to select a property from a given model. As C#'s type system is rather limited, there is no way to help the developer by narrowing down the allowed strings to the actual property names. Instead, a LINQ expression "selecting" the property was used.
Expression<Func<TModel, TProperty>> selectedProperty = model => model.PropertyName;
Now, we still need some magic to evaluate this; however, in general, it is quite straightforward to get the property name or info from the given expression above:
    static PropertyInfo GetPropertyInfo<T, TProperty>(
        this T model, Expression<Func<T, TProperty>> propertyLambda)
    {
        var type = typeof(T);
        var member = propertyLambda.Body as MemberExpression ??
            throw new ArgumentException(
                $"Expression '{propertyLambda}' refers to a method, not a property.");
        var propInfo = member.Member as PropertyInfo ??
            throw new ArgumentException(
                $"Expression '{propertyLambda}' refers to a field, not a property.");

        if (type != propInfo.ReflectedType && !type.IsSubclassOf(propInfo.ReflectedType))
            throw new ArgumentException(
                $"Expression '{propertyLambda}' refers to a property that is not from type {type}.");

        return propInfo;
    }
Problem solved - still strongly typed and no magic strings used.
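A reduced, self-contained variant of the same idea shows how such a helper is used. `Person`, `PropertyNames`, and `NameOf` are hypothetical names for this sketch, and the type checks of the function above are left out for brevity.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical model type.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class PropertyNames
{
    // Extracts the name of the property selected by the expression.
    public static string NameOf<T, TProperty>(Expression<Func<T, TProperty>> selector)
    {
        var member = selector.Body as MemberExpression
            ?? throw new ArgumentException("Expected a member access such as x => x.Property.");
        var property = member.Member as PropertyInfo
            ?? throw new ArgumentException("Expected a property, not a field.");
        return property.Name;
    }
}
```

A call like `PropertyNames.NameOf((Person p) => p.Age)` should return `"Age"`. Renaming the property changes the result automatically, and a typo is caught by the compiler instead of at runtime.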
Useful for:
- ORM mapping
- Communication with external systems
- Circumvent type system limitations
Avoid for:
- Functions that are actually called
Method Expressions
Starting with C# 6 and continuing with C# 7, more functional elements have been introduced to the language. This also implies having more expressions (instead of just statements) and a more concise syntax. This "cleanup" did not stop at standard functions.
    public static int Foo(int a, int b)
    {
        return a + b;
    }
This is a very simple example that requires 4 lines of code (at least if we follow the common style guide). The compiled MSIL looks as follows:
IL_0000: nop
IL_0001: ldarg.0
IL_0002: ldarg.1
IL_0003: add
IL_0004: stloc.0
IL_0005: br.s IL_0007
IL_0007: ldloc.0
IL_0008: ret
With method expressions, we can reduce it to a single line in C# (without conflicting with any style guide):
public static int Foo(int a, int b) => a + b;
Also, the generated MSIL is a little bit different:
IL_0000: ldarg.0
IL_0001: ldarg.1
IL_0002: add
IL_0003: ret
We have seen a similar reduction already with property (or getter / setter) expressions in the previous article. The 4 lost instructions all deal with the scope that has been introduced in the standard syntax.
Useful for:
- Aliasing (wrapping) of other methods
- Very short bodies (real one-liners)
Avoid for:
- (none listed)
Local Functions
Finally! A local function is a function within a function. This sounds more trivial at first than it really is, but let's wait a second to see the real advantages.
A very simple example:
    void LongFunction()
    {
        void Cleanup()
        {
        }

        if (specialCondition)
        {
            Cleanup();
            return;
        }

        if (specialCondition)
        {
            Cleanup();
            return;
        }

        Cleanup();
    }
While this may look like "bad style" or a function implementation gone wrong, there may be many reasons for a function to look like this. Nevertheless, in the past, we had to fall back to some very special patterns to make that happen. We had to either:
- use `goto` with a special section at the end (in the former example, we would have called it cleanup), or
- use a `using` statement together with the cleanup code in the `Dispose` method of a class that implements `IDisposable`.
The latter may come with other problems (e.g., transporting all required values in).
So, this is already a big win: we can just define a block of reusable code inside another block of reusable code. But just like anonymous functions, such a local function is capable of capturing values from the outer scope.
Let's see capturing first with an anonymous function:
var s = "Hello, ";
var call = new Action<string>(m => (s + m).Dump());
call("world");
The resulting MSIL code looks as follows:
IL_0000: newobj <>c__DisplayClass0_0..ctor
IL_0005: stloc.0
IL_0006: nop
IL_0007: ldloc.0
IL_0008: ldstr "Hello, "
IL_000D: stfld <>c__DisplayClass0_0.s
IL_0012: ldloc.0
IL_0013: ldftn <>c__DisplayClass0_0.<Main>b__0
IL_0019: newobj System.Action<System.String>..ctor
IL_001E: stloc.1
IL_001F: ldloc.1
IL_0020: ldstr "world"
IL_0025: callvirt System.Action<System.String>.Invoke
IL_002A: nop
IL_002B: ret
<>c__DisplayClass0_0.<Main>b__0:
IL_0000: ldarg.0
IL_0001: ldfld <>c__DisplayClass0_0.s
IL_0006: ldarg.1
IL_0007: call System.String.Concat
IL_000C: call Dump
IL_0011: pop
IL_0012: ret
There are not so many interesting parts in the given code. Most parts we know already, such as that a delegate needs to be instantiated first. However, there is one line for the temporary (generated) class in there that is interesting.
In `IL_000D`, we assign the constant string `"Hello, "` to a field `s`. This is the capturing of the variable `s` from the outer scope!
Let's rewrite the code above to use a local function instead.
    var s = "Hello, ";

    void call(string m)
    {
        (s + m).Dump();
    }

    call("world");
Now the MSIL has changed to:
IL_0000: nop
IL_0001: ldloca.s 00
IL_0003: ldstr "Hello, "
IL_0008: stfld <>c__DisplayClass0_0.s
IL_000D: nop
IL_000E: ldstr "world"
IL_0013: ldloca.s 00
IL_0015: call <Main>g__call|0_0
IL_001A: nop
IL_001B: ret
<Main>g__call|0_0:
IL_0000: nop
IL_0001: ldarg.1
IL_0002: ldfld <>c__DisplayClass0_0.s
IL_0007: ldarg.0
IL_0008: call System.String.Concat
IL_000D: call Dump
IL_0012: pop
IL_0013: ret
The code is much shorter! If we look closely, we see that much of the saving comes from not having to deal with a delegate (i.e., no instance of it, no `callvirt`, ...).
But wait a second - what else? Before, we had some more calls involving `<>c__DisplayClass0_0`, like calling its constructor. All that is now gone, why? The reason is simple - the generated `<>c__DisplayClass0_0` is no longer a class, but rather a `struct`! And for a `struct`, we do not need any constructor call as the (real) default constructor exists.
The reason why we can have a `struct` instead of a class is that a local function remains local. There is no need to worry about it being destroyed at the end of the block. Yes, the local function could be captured itself; however, in this case, we have a different structure and we will not lose consistency.
Remark: With the `struct` we are not referring to the "holder" of the local function, but to the holder of captured variables. For lambda expressions this holder (of the captured variables) would be a class. The other difference between local functions and a lambda expression (which already is a delegate) is that a local function can be called with `call` (it's a standard function after all), while any delegate will be called with `callvirt` (like a method). Once you put the local function into a delegate, the latter applies and you will call the delegate via `callvirt` like any other delegate. Thus, there is no benefit in calling a local function in this case. It only gives you an edge when it is called directly.
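Both situations from the remark can be put side by side in a short sketch. The names `LocalFunctionDemo`, `SumOfEvens`, and `MakeCounter` are hypothetical. In the first method, the local function is only ever called directly; in the second, it is converted to a delegate and leaves the method, so it is invoked like any other delegate afterwards.

```csharp
using System;
using System.Collections.Generic;

public static class LocalFunctionDemo
{
    public static int SumOfEvens(IEnumerable<int> values)
    {
        var total = 0;

        // Captures and mutates `total` from the enclosing method.
        void AddIfEven(int value)
        {
            if (value % 2 == 0)
            {
                total += value;
            }
        }

        foreach (var value in values)
        {
            AddIfEven(value); // direct call, no delegate instance involved
        }

        return total;
    }

    public static Func<int, int> MakeCounter()
    {
        var count = 0;

        // Adds the step to the captured counter and returns the new value.
        int Next(int step) => count += step;

        // Converting the local function to a delegate lets it outlive this method.
        return Next;
    }
}
```

`LocalFunctionDemo.SumOfEvens(new[] { 1, 2, 3, 4 })` should return `6`. A counter obtained from `MakeCounter()` keeps its captured state between calls, so calling it with `1` and then with `2` should return `1` and then `3`.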
Outlook
In the next part of this series, we will take on `string`s and other data types.
As far as the future of functions is concerned, the next level may be to reduce some payload (e.g., for delegates) even further. Also, as delegates are classes, they cannot simply be cast to each other (e.g., a `Predicate<int>` cannot be cast to a `Func<int, bool>` even though they represent the same signature). Maybe such problems will be tackled along with a much richer type system.
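Until then, the usual workaround is to wrap one delegate in another. The following is a minimal sketch with the hypothetical names `DelegateConversion` and `ToFunc`; it relies on the fact that the `Invoke` method of the existing delegate can itself be used as the target of a new delegate.

```csharp
using System;

public static class DelegateConversion
{
    public static Func<int, bool> ToFunc(Predicate<int> predicate)
    {
        // A cast such as (Func<int, bool>)predicate does not compile.
        // Using the Invoke method as target creates a new delegate
        // that forwards every call to the original one.
        return predicate.Invoke;
    }
}
```

The price is one more object and one more level of indirection per call, which is exactly the kind of payload mentioned above.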
Conclusion
The evolution of C# has not stopped at functions. We went from simple methods to full functions with added extensibility, AST generation, a simple syntax for anonymous functions, and local reusable blocks. We've seen how C# has progressed from its initial version.
Points of Interest
I always showed the non-optimized MSIL code. Once MSIL code gets optimized (or is even running), it may look a little bit different. The differences observed here between the various kinds of functions may then vanish. Nevertheless, as we focused on developer flexibility and efficiency in this article (instead of application performance), all recommendations still hold.
If you spot something interesting in another mode (e.g., release mode, x86, ...), then write a comment. Any additional insight is always appreciated!
History
- v1.0.0 | Initial release | 31.03.2019
- v1.0.1 | Fixed typo | 01.04.2019
- v1.1.0 | Added table of contents | 14.04.2019
- v1.2.0 | Refined local function | 07.05.2019