Justification
When I finished the Constrained C# post I was talking about multi-threading and locks, and I really wanted to write a new post with ideas on how to reduce multi-threading issues by adding more knowledge about shared state to the language itself.
Unfortunately, it will not be this time. The other post generated many comments (mostly complaints) and, even if some of them seem to come from people that didn't make any effort to understand what I was saying, there are some false assumptions made by more than one reader that I can attribute to comparison to other languages, so I decided to write this post to try to clarify my intent.
Constrained C# is not C#
Stating the obvious, Constrained C# is not C#. The entire idea of the last post was to present an alternative evolution of C# and, considering that C# already has its own evolution, it is guaranteed that this is not going to happen in the C# language itself. A new language can be made based on C#, and I am calling this language Constrained C# (yet I am not promising that such a language will really be written). If it is created, it will have lots of breaking changes compared to how C# works today because its purpose is to avoid common errors that are simply allowed in C#, and preventing code that compiles today from compiling is a big breaking change. Simply creating new safety features in C# may help, but it is not enough, as bad and distracted developers will forget to use them from time to time (and really good developers can possibly live without them anyway).
So, I started talking about managed languages to show that limiting actions actually has advantages, like reducing bugs, making it possible to create partial trust applications because developers can't corrupt memory even if they want to and allowing things like "compacting the heap" that are actually possible but extremely hard in unmanaged languages, requiring extreme discipline to make it work.
Compilers can keep that discipline 100% of the time and that's why managed references exist. So, my idea is to apply even more constraints to the C# language to make it safer and one of the constraints is to disallow developers from simply dereferencing null variables. Yet, to avoid excessive validations done by developers and at run-time (as today any reference-type can be null), allow them to mark which variables allow null and which variables don't, and this was the topic that caused most confusion, so let's see some more details and examples.
Required and Optional
I received complaints like:
- If we always need to use an empty string ("") instead of null we will still need to check for that value or some code may end-up executing and generating wrong results instead of throwing immediately. It would be even worse and what's the value that we should use for other types? Should all of them have an "IsEmpty" property or something?
- How will the compiler know if the variable is null or not? It will need to execute the code instead of only compiling it;
- Why should a string become a value-type when it is clearly a reference-type?
And the answers are:
- You are not expected to use fake default values. If you don't have a real value to initialize your variable, make your variable nullable;
- The compiler doesn't need to know if a method call is going to return null or not. It only needs to know if it is possible to be null or not. A required string used as input parameter, for example, is already known to be not-null inside that method. If you want to assign it to another required string, you can. A method that returns an optional string in its signature, even if it is implemented to always return "Test", will be considered to be potentially null as only the signature matters, so you will need to first check if the result is null or not before accessing its inner contents;
- A required string (or any required reference-type) is not going to become a value-type. This is not C++, where we can have a pointer to an object (that accepts null or any invalid pointer in unmanaged C++) or the object directly (allocated on the stack or directly inside another object's memory, without a pointer). A required string is a normal reference string with the guarantee that it will not be assigned a null value.
And as I got some sample code to "prove" the arguments, I will use some sample code to prove my point.
Look at this C# code:
public interface IAnimal
{
    void Talk();
}

public sealed class Cat : IAnimal
{
    public void Talk()
    {
        Console.WriteLine("Meow");
    }
}

public sealed class Dog : IAnimal
{
    public void Talk()
    {
        Console.WriteLine("Woof woof");
    }
}

public sealed class Human : IAnimal
{
    public void Talk()
    {
        Console.WriteLine("What do you want me to say?");
    }
}

public static class AnimalFactory
{
    public static bool AllowHumans { get; set; }

    public static IAnimal Get(string animalName)
    {
        IAnimal result = null;

        switch (animalName.ToLowerInvariant())
        {
            case "cat": result = new Cat(); break;
            case "dog": result = new Dog(); break;
            case "human":
            {
                if (AllowHumans)
                    result = new Human();

                break;
            }
        }

        return result;
    }
}
Examples similar to this were used to ask/say things like:
- How will the compiler forbid null from being returned? The logic is relatively complex and the compiler can't know if null will be returned or not. The logic can always become more complex and fool the compiler;
- If the language doesn't have null, then we will be forced to return an "UnknownAnimal" to allow the code to compile, yet users aren't expected to use that object, so they will need to check for that particular object and it must be implemented to throw to make debugging easier;
- Any developer knows that he must check if results are null or not. If they don't do that they are stupid, there's no need for the language to do this.
And the answers:
Try declaring IAnimal result; without immediately setting it to null and try compiling this code in C#. Ignore Constrained C# for a moment. What will be the result?
An error saying "Use of unassigned local variable 'result'" on the return result line.
That's what will happen in Constrained C# if you declare the result as being a required IAnimal. You will not be able to initialize it with null because Constrained C# will forbid such an assignment to a required variable, but the logic that verifies that all paths assign a value to the variable will be kept.
- I never said the language will not have null. I said it must not have the NullReferenceException, which is completely different. It has optional variables and required variables. Optional variables do have null. Don't return a fake result (that would be a kind of magic value, and I explained why they are bad in the post Design and Implementation Mistakes). If the method may not give a result, only make it clear that the result is optional. In this sample, it would be better to be an optional IAnimal. That is, declare the result as IAnimal? (with the ? after the type name) and everything will be fine;
Imagine that instead of using the static AnimalFactory you are using an interface, like this:
public interface IAnimalFactory
{
    IAnimal TryGet(string animalName);
    IAnimal Get(string animalName);
}
It is pretty clear that developers need to check for null if they call TryGet. But will you check for a null result when calling Get?
- If you say "no" because the contract says that null can't be returned, then your code will be bugged the day someone hands you an IAnimalFactory instance that actually ignores this rule. Yes, even if null shouldn't be returned, there's a big chance that your code will be considered bugged for being unprotected against that situation. In fact, the contract (the interface) says it returns an IAnimal and everybody knows that, in C#, an IAnimal can be null. It is the comment, not the contract, that says it should not return null;
- If you say "yes" then you are writing non-optimized code. Why would you test for null when it is explicitly stated that null must never be returned? If you don't trust the implementations coming from other developers, then you simply made the Get() method useless; you only need a TryGet() method.
As you can see, this becomes a matter of choosing between performance and safety if you deal with code coming from external sources. Why not have both? In Constrained C#, you can.
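The "unassigned local variable" rule mentioned in the first answer can be tried in today's C#. What follows is a minimal sketch, not the Constrained C# design itself: StrictAnimalFactory is a hypothetical name, and throwing an ArgumentException for unknown names is only an assumption about how a Get with a required result could behave. Note that result is never initialized with null.

```csharp
using System;

public interface IAnimal
{
    void Talk();
}

public sealed class Cat : IAnimal
{
    public void Talk() { Console.WriteLine("Meow"); }
}

public sealed class Dog : IAnimal
{
    public void Talk() { Console.WriteLine("Woof woof"); }
}

// Hypothetical variant of AnimalFactory whose Get never returns null.
public static class StrictAnimalFactory
{
    public static IAnimal Get(string animalName)
    {
        IAnimal result; // deliberately not initialized, not even with null

        switch (animalName.ToLowerInvariant())
        {
            case "cat": result = new Cat(); break;
            case "dog": result = new Dog(); break;
            default:
                // Assumption: unknown names throw instead of producing null.
                throw new ArgumentException("Unknown animal: " + animalName, nameof(animalName));
        }

        // This compiles only because every path above either assigned result or threw.
        return result;
    }
}
```

Removing the default branch brings back the "Use of unassigned local variable 'result'" error, which is the same analysis a required IAnimal would rely on.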
Nomenclature - A Little out of Topic
Today there are many classes that have methods named Get without an equivalent TryGet, and some of them return null while others throw an exception if they can't give you a valid result. Before presenting the interface, that was the case of the AnimalFactory.Get method. Really confusing, don't you think?
Now, see the signatures in Constrained C#:
IAnimal? TryGet(string animalName);
IAnimal Get(string animalName);
By these signatures, the compiler will not allow you to call animalFactory.TryGet("Some Animal Name Here").Talk(); because the result is potentially null.
It will also not allow you to do:
IAnimal animal = animalFactory.Get("cat");
if (animal != null)
    animal.Talk();
It will tell you that animal can never be null (or will at least give you a warning, as happens when you compare non-nullable structs to null in the current C#).
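For readers who want to experiment, C# 8.0 and later offer nullable reference types, which come close to these signatures. The important difference is that violations are reported as compiler warnings by default, not errors, and nothing is enforced at run-time. The sketch below assumes a compiler that supports #nullable enable; SimpleAnimalFactory and AnimalDemo are hypothetical names used only for illustration.

```csharp
#nullable enable
using System;

public interface IAnimal
{
    void Talk();
}

public sealed class Cat : IAnimal
{
    public void Talk() { Console.WriteLine("Meow"); }
}

public interface IAnimalFactory
{
    IAnimal? TryGet(string animalName); // optional result: may be null
    IAnimal Get(string animalName);     // required result: never null
}

// Hypothetical implementation that only knows about cats.
public sealed class SimpleAnimalFactory : IAnimalFactory
{
    public IAnimal? TryGet(string animalName)
    {
        return animalName.ToLowerInvariant() == "cat" ? new Cat() : null;
    }

    public IAnimal Get(string animalName)
    {
        // Unknown names throw, so the required result type holds.
        return TryGet(animalName)
            ?? throw new ArgumentException("Unknown animal: " + animalName, nameof(animalName));
    }
}

public static class AnimalDemo
{
    // Returns true when the animal exists and talked, false otherwise.
    public static bool TalkIfKnown(IAnimalFactory factory, string animalName)
    {
        IAnimal? animal = factory.TryGet(animalName);

        // Calling animal.Talk() at this point would produce warning CS8602
        // (dereference of a possibly null reference).
        if (animal == null)
            return false;

        animal.Talk(); // flow analysis knows animal is not null here
        return true;
    }
}
```

Code compiled without nullable annotations can still pass null as animalName, so this is an approximation of the idea and not the guarantee that Constrained C# is meant to give.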
If it happens that the developer doesn't provide two methods and there's only a Get() method, you don't need to look at the documentation to know if it can return null or not. Look at the result type. If it is optional, null can be returned and you must check for that (and the compiler will enforce that). If it is required, then null will never be returned and it is very likely that unknown inputs will throw an exception. That's something that Constrained C# can't force you to do, but as a personal rule, if only one version exists, it must be the version that doesn't throw exceptions. Users of any API should be able to use it without receiving exceptions when the "list of available inputs" is unknown.
Talking about the interface again, in Constrained C# there's the added benefit that both TryGet and Get can't receive null as the animalName (it is a required string). Developers implementing the interface don't need to check for null inside the method bodies, making things easier for the developer and even a little faster at run-time as that null check can be avoided (a really small optimization looking at a single method, but maybe a big gain considering all the situations where it can be avoided in all the APIs that exist for a language, especially the low-level ones).
Required this - A Little out of Topic again
The C# language actually has the concept of required reference types, but in a very specific situation. Tell me, how many times did you do things like:
if (this == null)
    throw new ArgumentNullException("this");
It looks pretty silly, but it is a problem in C++. In C#, invoking even a non-virtual method on a null instance throws a NullReferenceException before actually entering the method's body. This is still a case of a NullReferenceException, but I believe it is better than the C++ alternative in most cases, simply because almost everybody forgets to verify if this is null or not.
Do you want a situation where it can be a problem? What about this:
void SomeType::SetParent(SomeContainerType *parent)
{
    if (parent != nullptr)
        parent->_children.add(this);

    if (_parent != nullptr)
        _parent->_children.remove(this);

    _parent = parent;
}
In this contrived example, the SomeType is a friend class of the SomeContainerType, so it can access the _children collection directly. This is OK in theory, as it adds itself, not any random value, to the new parent.
Even if it doesn't have the best performance, setting the parent to the same value will only add the object to the collection and then remove it again, without any real problems.
Yet, if we do this:
SomeType *object = nullptr;
object->SetParent(aRealParentHere);
The SetParent will execute and do bad things. The parent object will receive a new child in its collection (a nullptr) and an access violation will happen when checking if the old _parent of the nullptr instance was set. Well, it is already too late: the parent is corrupted now and using it will probably cause new access violations.
So, if you always check for nulls, I will ask again: How many times did you check if this is null? When using C# I never do such a verification, as that's simply impossible. If that's happening, I am probably using unsafe code or my computer is on fire or something. That is, "this" is a required object in C#, regardless of whether it is coming from a struct or a class.
Non-Null Empty Values
As the empty string was used by more than one person as an example of a default value (including accusations that I supposedly told everyone to use empty strings instead of null), I decided to make it clear: I consider it terrible that some types can be null, non-null but empty, or have real contents.
I understand that happening when we have mutable collections, as we usually start with an empty collection and then add some items, but I disagree on most other cases, which includes most structs that can answer that they are empty and strings.
The reason for structs that answer that they are empty is also comprehensible in many situations. Some of them are created as performance optimizations, and almost all good practices can be ignored when we look for the best performance. Many of them appeared before the concept of nullable, so they were responsible for telling when they were conceptually "null". And considering that today structs can always be default initialized (that is, their memory bytes set to zero, even if that wasn't supposed to be a valid value for the struct itself), many structs end up in that inconsistent state they called Empty.
This is not the case of the string type. If written differently, the string type would never allow a zero-length instance: a string variable would either be completely empty (null) or have at least one character.
And I will say it again: I am not telling you to initialize all your strings with a single space to have default values. When you don't have a real value to initialize your variable, declare it as nullable and let null do its magic work of saying "I am a variable that's not actually pointing to anything".
Default Values
With the misunderstanding that I was requesting the use of empty strings instead of null, there were a lot of complaints about how other types will be initialized, yet there was a common conclusion that it is OK for value-types (structs and primitives) as they can always be initialized to default.
Well... in the other post I said that I consider the default initialization of structs as a problem. That's actually one of the reasons to have null or empty when dealing with structs (the problem of the previous topic), as they can always have a "default" where their memory is simply filled with zeros.
This seems the kind of solution to a problem that happens in C++ that ended up creating another problem in C#. In C++ if we don't initialize variables to zero, null or the equivalent, they can contain garbage that was on the memory where they were allocated, as there's no process of cleaning the memory before use and the initialization of those variables isn't enforced. In .NET, the memory bytes are set to zero. In my opinion, it would be much smarter if all variables were requested to be initialized, as already happens on struct constructors (aside from the automatic default one) as this would avoid making the numeric 0 a magic value and will avoid other subtle bugs related to it.
Only to show an example, look at this mutable class that represents database records:
public sealed class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}
Imagine that in the database both Name and Age are NOT NULL and this code is real C#. You create a person and you try to save it immediately to the database, and the action fails. You need to set a Name.
You set a name, try again and if 0 is an invalid Age you will have an error because of that. If not, like a system where you register babies and newborns, you will be saving people with an Age of 0 simply because that information wasn't filled. You will not know which ones are real zero (newborns) and which ones are unfilled.
So, do the right thing, use a nullable type for the age (or you can ignore the idea that magic values are bad and initialize Age to -1... yet you will be initializing it with a value) and solve the problem by having a different value to represent not filled.
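A minimal sketch of that choice in real C#, where int? already exists for value-types. PersonDemo and DescribeAge are hypothetical names added only to show that "not filled" and "newborn" become distinguishable.

```csharp
using System;

public sealed class Person
{
    public string Name { get; set; }
    public int? Age { get; set; } // null means "not filled", 0 means a real newborn
}

public static class PersonDemo
{
    // Hypothetical helper that tells an unfilled age apart from a real zero.
    public static string DescribeAge(Person person)
    {
        if (!person.Age.HasValue)
            return "age not filled";

        return person.Age.Value == 0
            ? "newborn"
            : person.Age.Value + " years old";
    }
}
```

With a plain int, the first two cases would both be stored as 0 and could not be told apart later.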
And that's why I consider that all required variables must be initialized, including structs and primitives. If under the hood all memory is initialized to zero, initializing ints to zero may become nothing in machine instructions, but the Constrained C# will not simply use automatic defaults for required variables. For nullable variables, I think it is OK, after all you are already requesting a nullable variable for a reason... but I can even change my mind on that and force even nullable variables to be explicitly initialized with null. Again, this may be reduced to nothing if the memory is already filled with zeros, but the developers will need to be explicit.
I really don't think it will be that annoying, after all local variables and struct constructors are always forced to initialize all variables, even to null. Only the fields declared in classes aren't required to be initialized and have a default. We can change that.
As a side note on default invalid values, I hate that when I register a new phone number in my cellphone, the combobox that lets me say if it is a cellphone number, work number, home number etc. is already filled with Cellphone. If I don't know what kind of number it is or simply don't look at the other fields, filling only name and number, I end up registering a "cellphone". Later, when I look at my phonebook I never know if a number filled as a cellphone is a real cellphone (so I can send SMS messages) or if it is any other phone number for which I never filled the type. Is this simply an error in the form or is this an enum initialized to zero, where zero means Cellphone?
That sample Person class in Constrained C#
If you simply take that last class and try to compile it in Constrained C# (considering the compiler is made), you should receive 2 compile-time errors telling you that both Name and Age are required but were never initialized.
Considering you know 0 is a valid age (and I am considering that empty strings don't exist in Constrained C#), will you do:
-
    public sealed class Person
    {
        public string Name { get; set; } = "Some fake name here";
        public int Age { get; set; } = 0;
    }
-
    public sealed class Person
    {
        public string? Name { get; set; }
        public int Age { get; set; } = 0;
    }
Or this:
    public sealed class Person
    {
        public string? Name { get; set; }
        public int? Age { get; set; }
    }
I know what's my answer and I hope you agree with me. Only to avoid complaints of people asking me why I chose the first or second option, I actually chose the third option as the best for a mutable class. Immutable classes are a discussion for another day.
Conclusion
I know that it is impossible to prevent developers from writing bad code, yet when the compiler can help them avoid common mistakes, it is already a win. When those extra guarantees can be put in interfaces expected to be implemented by external code, it is a bigger win, as we don't need to keep asserting things at run-time because those are enforced by the compiler at compile-time.
I hope this time things are clear.