Table of Contents
Introduction
In the recent years, C# has grown from a language with exactly one feature to solve a problem to a language with many potential (language) solutions for a single problem. This is both, good and bad. Good, because it gives us as developers freedom and power (without compromising backwards compatibility) and bad due to the cognitive load that is connected with making decisions.
In this series, we want to explore what options exist and where these options differ. Surely, some may have advantages and disadvantages under certain conditions. We will explore these scenarios and come up with a guide to make our life easier when renovating existing projects.
This is part IV of the series. You can find Part I, Part II, as well as Part III on CodeProject.
Background
In the past, I've written many articles particularly targeted at the C# language. I've written introduction series, advanced guides, articles on specific topics like async / await or upcoming features. In this article series, I want to combine all the previous themes in one coherent fashion.
I feel it's important to discuss where new language features shine and where the old - let's call them established - ones are still preferred. I may not always be right (especially, since some of my points will surely be more subjective / a matter of taste). As usual, leaving a comment for discussion would be appreciated!
Let's start off with some historic context.
Classic Type Systems
Historically, types have been introduced to tell developers how many bytes by an allocation will be reserved. Additionally, simple things such as additions could then be also figured out by the compiler. Earlier, in pure assembler, developers had to decide what operation fits to the given values. Now, the compiler was capable of knowing that not only 4 bytes have been reserved by the two values, but also that they should be treated like integers. An Int32
addition would apply.
Later on, the need to introduce custom types was communicated. The first structures have been born. While standard operations (from a machine point of view) may not make much sense, the allocation and structure (hence the name) was key. Not only could we have "names" (usually erased at compile-time, i.e., only known to the compiler for convenience of the developer), but all the parts have been properly specified by position and type.
With the introduction of object-oriented programming and its first interpretations, we have seen much more importance on the concept of types (mostly associated then with classes) and their relations to functions (then called methods). The relevance of type information inspection / access at runtime increased leading to capabilities like reflection. While classic native systems usually have very limited runtime capabilities (e.g., C++), managed systems appeared with vast possibilities (e.g., JVM or .NET).
Now one of the issues with this approach today is that many types are no longer originally coming from the underlying system - they come from deserialization of some data (e.g., incoming request to a web API). While the basic validation and deserialization could be coming from a type defined in the system, usually it comes just from a derivation of such a type (e.g., omitting certain properties, adding new ones, changing types of certain properties, ...). As it stands, duplication and limitations arise when dealing with such data. Hence, the need for dynamic programming languages, which offer more flexibility in that regard - at the cost of type safety for development.
Every problem has a solution and in the last 10 years, we've seen new love for the type systems and type theory appearing all over the place. Popular languages such as TypeScript bring the results of years of research and other (more exotic) programming languages to the mainstream. Hopefully, some of the more classic and historic programming languages are also able to learn from these advancements.
Dissecting C#'s Type System
This could also be called .NET's type system, however, while there is certainly some common base layer coming from .NET many constructs and possibilities just come from the language. In a different aspect, if we look how F# uses .NET's type system, we know that there is no natural limitation given by .NET - the system can bend and extended by a far margin.
C# likes to work with a static type system. And the word static here means something. Let's pretend we have the following type:
public class Person
{
public string FirstName { get; set; }
public string LastName { get; set; }
public DateTime Birthday { get; set; }
}
What if we want to enforce all properties to be optional? Well, actually in some sense, they are already as no one forces us to set them. But let's pretend nullable types have been introduced in this article already (they will be later) and what we are after is something like:
public class PartialPerson
{
public string? FirstName { get; set; }
public string? LastName { get; set; }
public DateTime? Birthday { get; set; }
}
Now we have an issue. Once the first class changes, we also need to make some change on the second class. What if we could instead write something like:
public type PartialPerson = Partial<Person>;
That's actually how TypeScript works. In TypeScript, Partial<T>
is just an alias for iterating over all properties and putting an optional (?
) on every property.
Alright, so C# does not like this. C# is more runtime oriented. Hence, we should use reflection for this.
At runtime, this could look as follows:
var PartialPersonType = Partial<Person>();
where the Partial
method could be implemented in a straight forward way.
public static Type Partial<T>()
{
var type = typeof(T);
var builder = GetTypeBuilder<T>();
foreach (var property in type.GetProperties())
{
CreateProperty(builder, property);
}
return builder.CreateType();
}
private static TypeBuilder GetTypeBuilder<T>()
{
var name = $"Partial<{typeof(T).Name}>";
var an = new AssemblyName(name);
var assemblyBuilder = AppDomain.CurrentDomain.DefineDynamicAssembly
(an, AssemblyBuilderAccess.Run);
var moduleBuilder = assemblyBuilder.DefineDynamicModule("MainModule");
return moduleBuilder.DefineType(name,
TypeAttributes.Public |
TypeAttributes.Class |
TypeAttributes.AutoClass |
TypeAttributes.AnsiClass |
TypeAttributes.BeforeFieldInit |
TypeAttributes.AutoLayout,
null);
}
private static void CreateProperty
(TypeBuilder tb, PropertyInfo property, string? ignore = null)
{
var propertyName = property.Name;
var propertyType = property.PropertyType;
var attributes = property.Attributes;
var customAttributes = property.CustomAttributes;
var addNullable = false;
if (propertyType.IsInterface || propertyType.IsClass)
{
addNullable = true;
}
else if (propertyType.IsValueType &&
(!propertyType.IsGenericType ||
propertyType.GetGenericTypeDefinition() != typeof(Nullable<>)))
{
propertyType = typeof(Nullable<>).MakeGenericType(propertyType);
}
var fieldBuilder = tb.DefineField
("_" + propertyName, propertyType, FieldAttributes.Private);
var propertyBuilder = tb.DefineProperty(propertyName, attributes, propertyType, null);
foreach (var customAttribute in customAttributes)
{
if (customAttribute.Constructor.ReflectedType.Name != "NullableAttribute")
{
AppendAttribute(customAttribute, propertyBuilder);
}
}
if (addNullable)
{
var customAttribute =
MethodBase.GetCurrentMethod().GetParameters().Last().CustomAttributes.Last();
AppendAttribute(customAttribute, propertyBuilder);
}
var getPropMthdBldr = tb.DefineMethod("get_" + propertyName,
MethodAttributes.Public | MethodAttributes.SpecialName |
MethodAttributes.HideBySig, propertyType, Type.EmptyTypes);
var getIl = getPropMthdBldr.GetILGenerator();
getIl.Emit(OpCodes.Ldarg_0);
getIl.Emit(OpCodes.Ldfld, fieldBuilder);
getIl.Emit(OpCodes.Ret);
var setPropMthdBldr = tb.DefineMethod("set_" + propertyName,
MethodAttributes.Public | MethodAttributes.SpecialName |
MethodAttributes.HideBySig,
null, new[] { propertyType });
var setIl = setPropMthdBldr.GetILGenerator();
var modifyProperty = setIl.DefineLabel();
var exitSet = setIl.DefineLabel();
setIl.MarkLabel(modifyProperty);
setIl.Emit(OpCodes.Ldarg_0);
setIl.Emit(OpCodes.Ldarg_1);
setIl.Emit(OpCodes.Stfld, fieldBuilder);
setIl.Emit(OpCodes.Nop);
setIl.MarkLabel(exitSet);
setIl.Emit(OpCodes.Ret);
propertyBuilder.SetGetMethod(getPropMthdBldr);
propertyBuilder.SetSetMethod(setPropMthdBldr);
}
private static void AppendAttribute
(CustomAttributeData customAttribute, PropertyBuilder propertyBuilder)
{
var args = customAttribute.ConstructorArguments.Select(m => m.Value).ToArray();
var cab = new CustomAttributeBuilder(customAttribute.Constructor, args);
propertyBuilder.SetCustomAttribute(cab);
}
While it's certainly possible to create types on the fly as shown, the fact remains that this is a runtime mechanism. As such, many of the potential use cases for type transformation are either a lot harder to accomplish, or impossible.
There are, however, community projects such as Fody to manipulate assemblies and / or IL code for adding such things already at compile-time. The major issue with these is the compiler assistance / tooling. It's often not so easy to see what's really going on.
The Modern Way
A type remains a type. But wait! There is a little bit more to it. We have a lot of capabilities that either come with C# directly, with .NET, or are given by the ecosystem. In this section, we'll try to explore all of them.
Actually, while many of the syntax used in C# directly goes to some IL code or code constructs, some (mostly newer, but also as we will see really old) parts of C# tend to work closely together with the type system on a natural level. They either use existing interfaces, types, or other elements - sometimes creating new types without us even realizing. We already saw for instance classes for delegates (e.g., a Func<T>
) or local functions being created. Let's see what else is available!
Generating Iterators
Since the first versions of C#, we are able to generate types on the fly. Using yield
, we have the power to start our own iterator. Such an iterator is represented by a type that implements the IEnumerable
interface. It turns out that the only thing to do here is to somehow create an IEnumerator
instance. All the logic (and state) is then contained in the IEnumerator
instance.
Let's first code our own implementation. What we want is an enumerable of the first three numbers (1, 2, 3).
class MyEnumerable : IEnumerable<int>
{
public IEnumerator<int> GetEnumerator() => new MyIterable();
IEnumerator IEnumerable.GetEnumerator() => this.GetEnumerator();
class MyIterable : IEnumerator<int>
{
public int Current => current;
object IEnumerator.Current => current;
private int current = 0;
public void Dispose() {}
public bool MoveNext() => ++current < 4;
public void Reset() => current = 0;
}
}
The C# language has plenty of nice features to deal with enumerators. Certainly, the most used is the foreach
loop construct:
var enumerable = new MyEnumerable();
foreach (var item in enumerable)
{
item.Dump();
}
Obviously, this one is just syntax sugar for the following code:
var enumerable = new MyEnumerable();
var iterable = enumerable.GetEnumerator();
while (iterable.MoveNext())
{
var item = iterable.Current;
item.Dump();
}
A quick comparison at the MSIL code confirms this quite easily. The implicit version using the foreach
loop looks as follows:
IL_0000: nop
IL_0001: newobj UserQuery+MyEnumerable..ctor
IL_0006: stloc.0
IL_0007: nop
IL_0008: ldloc.0
IL_0009: callvirt UserQuery+MyEnumerable.GetEnumerator
IL_000E: stloc.1
IL_000F: br.s IL_0021
IL_0011: ldloc.1
IL_0012: callvirt System.Collections.Generic.IEnumerator<System.Int32>.get_Current
IL_0017: stloc.2
IL_0018: nop
IL_0019: ldloc.2
IL_001A: call LINQPad.Extensions.Dump<Int32>
IL_001F: pop
IL_0020: nop
IL_0021: ldloc.1
IL_0022: callvirt System.Collections.IEnumerator.MoveNext
IL_0027: brtrue.s IL_0011
IL_0029: leave.s IL_0036
IL_002B: ldloc.1
IL_002C: brfalse.s IL_0035
IL_002E: ldloc.1
IL_002F: callvirt System.IDisposable.Dispose
IL_0034: nop
IL_0035: endfinally
IL_0036: ret
For comparison, the explicit version compiles to be:
IL_0000: nop
IL_0001: newobj UserQuery+MyEnumerable..ctor
IL_0006: stloc.0
IL_0007: ldloc.0
IL_0008: callvirt UserQuery+MyEnumerable.GetEnumerator
IL_000D: stloc.1
IL_000E: br.s IL_0020
IL_0010: nop
IL_0011: ldloc.1
IL_0012: callvirt System.Collections.Generic.IEnumerator<System.Int32>.get_Current
IL_0017: stloc.2
IL_0018: ldloc.2
IL_0019: call LINQPad.Extensions.Dump<Int32>
IL_001E: pop
IL_001F: nop
IL_0020: ldloc.1
IL_0021: callvirt System.Collections.IEnumerator.MoveNext
IL_0026: stloc.3
IL_0027: ldloc.3
IL_0028: brtrue.s IL_0010
IL_002A: ret
Since the IEnumerator
implements the IDisposable
interface, we should have also disposed the resource correctly. The foreach
syntax does that for us. Another reason for always using the generated code - it just makes our life easier by doing the right things without us having to remember.
Still, it's not the foreach
part that strikes us, but rather the generation of the class for the IEnumerable
/ IEnumerator
implementation.
Let's use the yield
keyword to do that.
static IEnumerable<int> GetNumbers()
{
var current = 0;
while (++current < 4)
{
yield return current;
}
}
The interesting thing is that this small piece of code already represents the full iterator as specified above. The C# compiler generates all the necessary types for us to make it work. The usage is also the same, except that instead of an explicit constructor call (new MyEnumerable()
) we just call the function (GetNumbers()
). Great!
Let's recap what's so great about iterators.
Useful for | Avoid for |
- Custom iterations
- Generalized enumerations
- State machines
|
- Iterating arrays (if known)
|
Discards
The C# compiler imposes quite some restrictions on the developer. Some of these restrictions are included to safeguard against obvious errors, while others are included to shield the user from running potentially useless code. One of these restrictions forbids to use certain expressions without assignment.
Consider the following code:
static void Main()
{
2 + 3;
}
Now that's a strange code. It would compute the result of 2 + 3 but it would not do anything with it. In a nutshell, either the compiler would optimize this statement away or we would just waste some CPU cycles.
Personally, I think it's a strange restriction. Yes, the code above would be useless, but since C# allows operator overloading, there could be scenarios where simple add expressions would actually have meaningful side effects.
A scenario where the (negative or annoying) implications of this design choice can be seen more practically is a simple nullability test.
static void Run(Action action)
{
action ?? throw new ArgumentNullException(nameof(action));
action();
}
This could will not work. Instead, the following does:
static void Run(Action action)
{
(action ?? throw new ArgumentNullException(nameof(action)))();
}
Call expressions are considered okay by design. Obviously, the side-effect tendency of method calls was regarded very high - especially with respect to the "improbable" rating of operators.
We would, however, still like to keep version 1 as its more readable. For this reason, i.e., to mitigate the consequences of this historic design choice, a special kind of construct was introduced: Discards.
The idea behind discards is simple: Introduce a special variable called _
that can always be assigned to. It can never be read - it is a write-only variable that will be optimized away by the compiler anyway.
Using this variable, we can come back to version 1:
static void Run(Action action)
{
_ = action ?? throw new ArgumentNullException(nameof(action));
action();
}
That _
is a special kind of construct can be seen on multiple occasions. Let's say we have multiple of these checks:
static void Run<T>(Action<T> action, T arg)
where T : class
{
_ = action ?? throw new ArgumentNullException(nameof(action));
_ = arg ?? throw new ArgumentNullException(nameof(arg));
action(arg);
}
Obviously, the type of action
and arg
will most likely be different. In any case, the assignment is accepted. The same can be done with unnecessary out
parameters:
if (int.TryParse(str, out _))
{
}
Another useful instance is to "fire and forget" tasks. Earlier, I usually introduced an extension method that looks as follows:
public static class TaskExtensions
{
public static void Forget(this Task task)
{
}
}
The advantage was that now I could quite easily inform the C# compiler that a used task has been unused on purpose:
public Task StartTask()
{
}
public void OnClick()
{
StartTask().Forget();
}
Using discards, we don't need extra extension methods to transport such information. Also, users already know what "will happen" to the task (hint: the answer is nothing).
public void OnClick()
{
_ = StartTask();
}
We will also use discards in the pattern matching section.
Useful for | Avoid for |
- Throwing away information
- Run any expression
- "Forgetting" tasks
|
- Storing information
- Side-effect free expressions
|
Handling Asynchronous Code
We already touched the topic of asynchronous code briefly in the previous section. Since .NET 4, we have the Task
type, which is quite handy to tame multiple streams of work. Together with the task parallel library (TPL) and async / await (C# 5 / .NET 4.5), we have a powerful toolbelt that only improved over the years.
But why does async
/ await
require a specific version of .NET? Isn't this just a language feature? Like always (e.g., interpolated strings, tuples) if we require a specific version of the base class library (BCL), we immediately know that some code is generated which uses the types from the BCL. In case of a method being decorated as async
, it will generate a new class implementing the IAsyncStateMachine
interface.
The IAsyncStateMachine
interface looks as follows:
public interface IAsyncStateMachine
{
void MoveNext();
void SetStateMachine(IAsyncStateMachine stateMachine);
}
Interesting enough, it has a MoveNext
method just like the IEnumerator
interface. In fact, we could use a specialized version of an IEnumerator
to write our own async
/ await
implementation. Coming from JavaScript, we know that generators (the JavaScript name for the enumerator / yield syntax sugar) have been (ab)used to introduce async
/ await
capabilities before the feature arrived in the language. Even today, polyfills still use this (or fall back even one level before that in case generators are not available).
Let's look at a simple example of a method using async
and await
:
async static Task Run(Func<Task> action)
{
await action();
}
This little snippet generated a class to look as follows:
In MSIL, the generated class is then used in the given Run
method:
IL_0000: newobj UserQuery+<Run>d__1..ctor
IL_0005: stloc.0
IL_0006: ldloc.0
IL_0007: ldarg.0
IL_0008: stfld UserQuery<Run>d__1.action
IL_000D: ldloc.0
IL_000E: call System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Create
IL_0013: stfld UserQuery+<Run>d__1.<>t__builder
IL_0018: ldloc.0
IL_0019: ldc.i4.m1
IL_001A: stfld UserQuery+<Run>d__1.<>1__state
IL_001F: ldloc.0
IL_0020: ldfld UserQuery+<Run>d__1.<>t__builder
IL_0025: stloc.1
IL_0026: ldloca.s 01
IL_0028: ldloca.s 00
IL_002A: call System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start<<Run>d__1>
IL_002F: ldloc.0
IL_0030: ldflda UserQuery+<Run>d__1.<>t__builder
IL_0035: call System.Runtime.CompilerServices.AsyncTaskMethodBuilder.get_Task
IL_003A: ret
In short, we instantiate the generated class, store the used arguments (captures) and set the state to be used within the async state machine. The static Create
method of the async task method builder is used to construct the associated builder state. Then we run the async task method builder to construct us a task for this. Finally, we return the Task
property from the builder.
Needless to say, a simpler version of the code above would have been:
static Task Run(Func<Task> action)
{
return action();
}
These two variants are not exactly equivalent. Beforehand, we returned a newly generated task, "wrapping" the original task. Now, we return the original one. Performance-wise, they are certainly not the same. In this version, we omit a full class generation. Also, the MSIL for running the method is super short in comparison:
IL_0000: nop
IL_0001: ldarg.0
IL_0002: callvirt System.Func<System.Threading.Tasks.Task>.Invoke
IL_0007: stloc.0
IL_0008: br.s IL_000A
IL_000A: ldloc.0
IL_000B: ret
Obviously, we would only use async
methods if we have multiple await
s or complicated structures (e.g., only await
in a certain if
block). In any other case, we should try to go with something lighter. Both, compile-time and runtime will thank us with faster execution.
This is even more true for wrapping standard items in a task. Consider we create a class that implements an interface demanding the following method:
public Task<Task> GetNameAsync()
{
}
If we already know the name, we could return it directly, but how to wrap it in a task? The simplest case is decorating the method with async
, but instead, we could also use Task.FromResult
:
public Task<Task> GetNameAsync() => Task.FromResult("constant name");
As a rule of thumb, always look for non-generated solutions first.
At this point, we could write about certain benefits, e.g., when to use ConfigureAwait(false)
and all the things we are allowed to do with async
/ await
these days (e.g., in try
-catch
blocks), but I feel that many articles (including my own) did that already. Instead, I want to touch the topic of iterator awaits.
Before C# 8, we had no good way of dealing with asynchronous streams. Awaiting the stream is equivalent to only reacting when the stream has finished. The alternative is to await until the first data comes from the stream. This, however, also does not solve the issue as data is not present. What we want is an implicit loop that may await
until certain chunks of data are available. The loop ends when the stream is finished.
All this sounds like a boost in the iterator. Again, the following is not the solution:
foreach (var item in await GetItems())
{
}
Potentially, we could wrap the stream resulting in:
foreach (var getNextItem in GetItems())
{
var item = await getNextItem();
}
But if there is no next item? We now place a callback. If there is none, we could either receive null
or throw an exception. Both scenarios have clear drawbacks. Hence, let's go with a custom data type.
foreach (var getNextItem in GetItems())
{
var state = await getNextItem();
if (state.Finished)
{
break;
}
var item = state.Current;
}
It's still all a bit messy, especially from a boilerplate point of view. Thus, we now have await foreach
. This one can be used in conjunction with the new IAsyncEnumerable
interface.
public async IAsyncEnumerable<int> GetNumbersAsync()
{
var current = 0;
while (++current < 4)
{
await Task.Delay(500);
yield return current;
}
}
I want to spare you now the details how this is generated (and what is exactly generated), but you can guess it's similar to the structures we inspected beforehand. In the end, it's just the state machine of async
/ await
combined with the iterator.
That all of this is similar can be seen directly by inspecting the async
enumerable - we see that this is quite like a "normal" enumerable. It just is now dependent on an async
enumerator called IAsyncEnumerator
(who would have guessed?):
public interface IAsyncEnumerator<out T> : IAsyncDisposable
{
T Current { get; }
ValueTask<bool> MoveNextAsync();
}
Great, so how's the syntax sugar for this looking like?
await foreach (var item in GetNumbersAsync())
{
}
Wonderful, so now also this gap is closed! The async
iterator can be super powerful especially for streams of events.
Useful for | Avoid for |
- Complex task logic
- Taming multiple work streams
- Handling asynchronous events / streams
|
|
Pattern Matching
In recent years, the direction of C# has certainly changed a bit. It picked up more and more functional concepts. One of the more interesting concepts is pattern matching. Pattern matching in C# comes in multiple ways, for instance, in an improved is
operator. Officially, they call it a "pattern expression".
Beforehand, we had to use all kinds of different operators to achieve something like a type transformation with a subsequent check. For classes, we could have used as
:
var element = node as IElement;
if (element != null)
{
}
But with the new power of the is
operator, such fragments / temporary variables are no longer necessary. We can just write code that reads well.
if (node is IElement element)
{
}
Perfect, right? Boring you say. Alright, so maybe the new switch
control structure is more for your taste. Personally, I would have liked to see a new match
construct, however, I can see why new reserved keywords have been avoided and I like the progressive approach.
switch (node)
{
case IElement element:
break;
}
Let's inspect the generated MSIL code for this construct:
IL_0000: nop
IL_0001: ldarg.1
IL_0002: stloc.3
IL_0003: ldloc.3
IL_0004: stloc.0
IL_0005: ldloc.0
IL_0006: brtrue.s IL_000A
IL_0008: br.s IL_0016
IL_000A: ldloc.0
IL_000B: isinst IElement
IL_0010: dup
IL_0011: stloc.1
IL_0012: brfalse.s IL_0016
IL_0014: br.s IL_0018
IL_0016: br.s IL_001E
IL_0018: ldloc.1
IL_0019: stloc.2
IL_001A: br.s IL_001C
IL_001C: br.s IL_001E
IL_001E: ret
Alright, no magic here. It's pretty much the same as if we write:
if (node is IElement)
{
var element = (IElement)node;
}
There are a few subtle differences though. Most notably, we have an explicit cast in the generated MSIL form. Using the previously mentioned pattern expression, we would be even closer to the MSIL code generated by using the new switch
construct. Thus, we can really say switch
is purely syntax sugar to avoid repetition.
Is it just syntax sugar over pattern expressions? Well, at least, it's nice sweet sugar. Especially, since it comes with extensions. In the context of a switch
branch, we can use the when
keyword to introduce more conditions.
The C# documentation lists a great example:
switch (shape)
{
case Square s when s.Side == 0:
case Circle c when c.Radius == 0:
return 0;
case Square s:
return s.Side * s.Side;
case Circle c:
return c.Radius * c.Radius * Math.PI;
}
Wonderful - this way, we avoid complicated constructs that would need to use goto
or local functions for avoiding repetitions.
The C# team even went one step further. Besides explicit types (which would check if a cast is possible), we can also implicitly use the current type. As usual, var
is the keyword that triggers type inference.
The official documentation mentions the following example:
switch (shapeDescription)
{
case "circle":
return new Circle(2);
case "square":
return new Square(4);
case "large-circle":
return new Circle(12);
case var o when (o?.Trim().Length ?? 0) == 0:
return null;
}
Hence, the specific white-space case uses the implicit type for triggering additional checks using when
. As an alternative, we could have written case string o when
. Nevertheless, var
should be preferred as it will also stand the test of time in case of refactoring. Furthermore, it will transport to the reader "hey I don't want to check the casting here, I just want to introduce more conditions". After all, transporting intentions to the reader is important.
Useful for | Avoid for |
- Avoiding casting repetitions
- Simplifying many branch splitting
- Bringing together scattered, but related blocks of logic
|
- Just replacing simple
if statements - Merging whole blocks of unrelated logic
|
Nullable Types
Finally, something about types! Potentially, the most important change in years (or ever) in C# development has come. Nullable types!
What? I mean, every class represents a heap allocated object that must be created first and otherwise points to a default address known as "null
pointer" or simply null
. The null
reference exception is potentially the most striking one and it speaks about the age of the language (or framework) that it's not covered by default in the type system. Luckily, there are some pretty smart people in the C# language team and they came up with a solution that is both, progressive and fitting.
Previously, we just received some type information from the methods we have been calling. An example would be the following code:
var element = document.QuerySelector("a");
With nullable types, every type T
is non-null
. This is now in alignment to value types, which require a wrapper be (fake) nullable (T?
or Nullable<T>
). For reference types, no such wrapper exists, however, the information is transported via the metadata instead.
Since this is a quite sensitive feature, it needs to be enabled first. The following lines must appear in the csproj file of the project where we want to introduce nullable types.
<LangVersion>8.0</LangVersion>
<Nullable>enable</Nullable>
Now instead of returning just the type, we can also decorate it using the question mark to signal a return that is potentially null
.
var element = document.QuerySelector("a");
The same holds true for method signatures. Let's consider the following signature:
public void Insert(IElement element);
Using the method with an IElement?
instance is not allowed. Instead, we have to introduce type guards.
public void InsertMaybe(IElement? element)
{
if (!(element is null))
{
Insert(element);
}
}
Long story short: Nullable makes our life easier by detecting where we require guarding and where not. We should treat nullability violations as errors.
Every method that works in the nullable context will be annotated accordingly with a NonNullTypes
attribute. In addition, references that are nullable are also explicitly marked as Nullable
. As a result, the C# compiler is capable of inferring the correct usage also from third-party libraries or BCL where no source code is given.
Besides the generated metadata, everything stays as usual. There are no MSIL implications. This is just an upgrade for making all applications more robust and better.
Useful for | Avoid for |
- Transporting intentions
- Warning / code robustness
- Avoiding too much guarding
|
- Trusting non annotated code
|
Important: This is a compile-time only mechanism for reference types (i.e., classes) and has nothing to do with Nullable<T>
, which is represents a value type (i.e., structs) that can be assigned null
.
Outlook
This has been the last part of the series. It's been a pleasure and a joy to compile this collection together.
With respect to types, the evolution of C# seems to not have finished yet. Languages such as F# (CLR based) or TypeScript (with JS transpilation) show what is possible and how. We can expect a lot more improvements in the ecosystem to arrive within the next couple of years.
Conclusion
The evolution of C# has not stopped at the used and generated types. We saw that C# gives us some more advanced techniques to gain flexibility without much help from outside tooling. Nevertheless, the help from external tooling gives us much more possibilities without making many sacrifices.
Personally, I hope that TypeScript's flexible type system can be used as a role model for bringing some advanced compile-time manipulation, creation, and evaluation of types to our tool belt.
Points of Interest
I always showed the non-optimized MSIL code. Once MSIL code gets optimized (or is even running), it may look a little bit different. Here, actually observed differences between the different methods may actually vanish. Nevertheless, as we focused on developer flexibility and efficiency in this article (instead of application performance), all recommendations still hold.
If you spot something interesting in another mode (e.g., release mode, x86, ...), then write a comment. Any additional insight is always appreciated!
History
- v1.0.0 | Initial release | 28.09.2019
- v1.1.0 | Added table of contents | 30.09.2019
- v1.1.1 | Added remark on partial | 01.10.2019
- v1.2.0 | Included
Nullable
type | 02.10.2019