
Constrained C#

28 Nov 2014 · CPOL · 22 min read
What if C# evolved differently?

Introduction

When I started to use computers, people used to say that "computers do what we tell them to do, not necessarily what we want them to do". And that was about normal computer usage. When developing, things are far worse. It is pretty easy to destroy memory contents and crash applications.

Well, managed languages have shown us that restricting the operations developers can perform over the computer's memory doesn't necessarily reduce what we can achieve, but it definitely reduces all sorts of bugs. Yet managed languages (at least the ones I know) still allow developers to write code that's guaranteed to crash. New languages sometimes have a safety feature that others don't, avoiding lots of bugs caused by a simple lack of attention, and that feature is then used as an argument that we must change from one language to another (or even from one paradigm to another).

Actually, I believe that changing from one language to another is hard, for lots of reasons. Usually the languages with some extremely interesting features lack lots of others. They work as a proof of concept for the feature, but we can't use them for real applications. There's also the difficulty of changing people's minds and, finally, most developers and companies have their own code, developed over the years, and they don't want to lose that work by moving to a completely new language.

The "solution" to this is to include interesting features found in new languages in the other well-known languages, as long as they don't cause breaking changes. That last statement is the problem, as developers will have the option to use the safe feature or to continue using the old "unsafe" one, causing bugs with ease (this post is about C#, but the first language that comes to my mind regarding the old ways of doing things is C++).

Well, my idea here is to present a possible evolution of C#, which I am calling Constrained C#. It adds some extra safety features to the language, limiting the errors that can be made, at the cost of losing backwards compatibility. Yet it is not a completely new language, so C# developers don't need to relearn everything. They will simply need to understand that common mistakes will be caught at compile-time and, of course, they will need to learn some new concepts, but nothing that big.

Note

I will only present the idea of a C# alternative, without actually providing a new compiler. It is possible that in the future I will create a parser for Constrained C# that either compiles the results to .NET binaries or generates .cs files doing all the extra stuff needed, yet I will not promise anything.

I only hope I will have the time to really work on it.

NullReferenceException Should Not Exist

C#
string str = null;
Console.WriteLine(str.Length);

When I see this piece of code I know it is going to crash. It is not hard to figure out that we can't access the Length property of a null string. Yet the compiler allows us to do this and the result will always be the same: a NullReferenceException.

It is not important that no-one is going to write that particular block of code; I wrote it like that to keep the code small. What happens is that many times, when we receive a variable coming from somewhere else, we assume it is not null and use it directly, risking a NullReferenceException when someone actually gives us a null. For this problem we can have solutions in two different areas:

  1. Have a way to tell that some variables can never be null;
  2. Don't allow the code to access the null variable.

Required / Optional Variables

A long time ago I suggested adding a "required" modifier to input parameters, but I made a big mistake by saying that maybe we could use the ! symbol to make it similar to the ? used in nullable types. Many people immediately understood that I was requesting a wrapper type that avoided null and argued that adding a new value to an existing type is easy, but how can we avoid null on reference types? What value would be used as the default value? We can't create non-nullable reference types.

Well, I was talking about input parameters, not about a new type that supported fewer values. I even showed the concept working when I wrote a small C#-like compiler/interpreter, named POLAR. In it, any method parameter could be declared as required. That actually only added those if (something == null) throw new ArgumentNullException("something"); checks with less code and without the risk of writing the wrong parameter name in the string.
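As a sketch, this is the expansion the POLAR-style modifier would save us from writing by hand (the `required` modifier itself is the proposal, not real C#; `Greeter` and `Process` are illustrative names):

```csharp
using System;

class Greeter
{
    // Hypothetical Constrained C# syntax would be:
    //     public static string Process(required string name) { ... }
    // Today, the same guarantee needs this manual run-time check:
    public static string Process(string name)
    {
        // The boilerplate the "required" modifier would generate for us,
        // with nameof() avoiding the wrong-parameter-name-in-a-string risk:
        if (name == null)
            throw new ArgumentNullException(nameof(name));
        return "Hello, " + name;
    }
}
```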

That worked fine and I really believe it could be added to C# without any breaking changes. Yet, that's very limited and I don't like the fact that we would have different syntaxes to declare whether variables support null when using classes and structs.

So, my idea is that any variable declared without explicitly saying it supports null doesn't support it at all. The ? symbol will be used after the type to say that it is nullable, regardless of whether we are talking about structs or classes. And yes, it could be used when declaring any variable, regardless of whether it is inside a class, a struct, a local variable etc.

Returning to the problem "what's the default value for a required class type", the answer is: there isn't one. The compiler should not allow a required variable to be declared if it is not initialized immediately or inside the constructor, before any instance method call. Note that this kind of verification is already done when implementing the constructor of a struct: all member fields must be initialized, even if only to their default values.
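That existing struct verification can be seen in today's C# (pre-C# 11 definite-assignment rules); `Point` is an illustrative type:

```csharp
struct Point
{
    public int X;
    public int Y;

    public Point(int x)
    {
        X = x;
        // If this assignment were removed, the C# compiler would reject
        // the constructor: "Field 'Point.Y' must be fully assigned before
        // control is returned to the caller." The proposal is to apply a
        // similar check to required members of classes.
        Y = 0;
    }
}
```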

Optional to Required

It is important to note that having optional and required variables should be much more than simply inserting an if (something == null) throw new ArgumentNullException("something");. A null value must not be accepted at compile-time. The run-time check will only be necessary when dealing with other languages, as the Constrained C# compiler will never allow a null to be assigned to a required variable, even when receiving values from an optional variable. That is, some kind of "validation process" must be done.

If we look at how nullable variables work, we can think of optional variables as being of a "wrapper type" with properties like Value and HasValue. I am using that only as a comparison, as I actually don't like the fact that the Value property throws an exception when the real value is null. Property reads aren't supposed to throw exceptions. For example, look at this:

C#
int? value1 = ReadInt();
int? value2 = ReadInt();
Console.WriteLine(value1.Value + value2.Value);

The last line of code doesn't make it apparent that an exception could be thrown. In fact, it becomes similar to a NullReferenceException, but the situation is more bizarre, as value1 and value2 can be null and we can still use the HasValue property, but not the Value property.

Now, isn't the following code a little easier to understand?

C#
int? value1 = ReadInt();
int? value2 = ReadInt();
Console.WriteLine(value1.GetValueOrThrow() + value2.GetValueOrThrow());

The simple fact of seeing that OrThrow at the end of the method name makes it clear that the call can throw. Developers can't say they had no idea an exception could happen: they explicitly asked for it.
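GetValueOrThrow() doesn't exist in .NET, but a sketch of it as an extension method over Nullable&lt;T&gt; is simple (the method name follows the text; the exception type is my assumption):

```csharp
using System;

static class NullableExtensions
{
    // Like Nullable<T>.Value, but the name makes the possible exception
    // explicit at the call site.
    public static T GetValueOrThrow<T>(this T? value) where T : struct
    {
        if (!value.HasValue)
            throw new InvalidOperationException("Nullable object must have a value.");
        return value.Value;
    }
}
```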

Yet, I should say that such a solution only makes the problem more apparent without really solving it. This can be useful for some developers, as they will immediately do something to avoid exceptions, but useless to others, who will simply get used to calling GetValueOrThrow() blindly. And we will still be dealing with exceptions. So, why not go one step further and forbid such a situation?

For example, if we write this:

C#
string str;
Console.WriteLine(str);

The compiler will generate an error saying that str was never initialized.

So, it could do a similar verification and forbid this:

C#
string? str = MaybeGetAString();
MethodThatRequiresAString(str.Value);

Saying that we can't access str.Value because it can be null. In this case, I return to the idea of having a property, not a method, as the property will never generate exceptions. It is the compiler that should guarantee that we are not on a "null accepting path".

So, to solve this, the developer will probably need to write something like:

C#
string? str = MaybeGetAString();
if (str.HasValue)
  MethodThatRequiresAString(str.Value);

And the compiler would not complain because the value would be validated.

Actually, I don't know if inside the if (str.HasValue) we should be required to access the string through the str.Value property. Maybe, when safe, the variable could be treated like a required variable without any extra indirection. If that's the case, it would be necessary to avoid a HasValue property, to prevent ambiguity if a class actually has a property with that name, so we could have a hasvalue keyword:

C#
string? str = MaybeGetAString();
if (hasvalue(str))
  MethodThatRequiresAString(str);

Wouldn't it become tedious to test for non-null all the time when we know that some variables can't be null?

That's why variables are required by default. If the method wasn't MaybeGetAString() returning an optional string, but an AlwaysGetAString() returning a required string, we would be able to declare the str variable as required too, and the use of hasvalue() could even generate a compile-time error telling us that it can only be used with optional variables.

Surely this can be a problem when "importing" old code that never tells whether variables are optional or not (and so everything is considered optional) but, as I already said in the beginning, I know this entire idea is a breaking change, so lots of code will need to be rewritten (or we will need to deal with terrible "imports").

Property Paths with Optional Values

C#
string countryName = person.City.State.Country.Name;

In the previous code example, what happens if person, City, State or Country is null?

In today's C#, it will throw a NullReferenceException. In C# 6 there's the ?. operator (the null-conditional operator) that allows us to write this code with a safety guarantee:

C#
string countryName = person?.City?.State?.Country?.Name;

By using the ?. operator, null is returned if the expression on the left already returns null, without evaluating what is on the right. It does something pretty similar to this:

C#
string countryName = null;
if (person != null)
{
  var city = person.City;
  if (city != null)
  {
    var state = city.State;
    if (state != null)
    {
      var country = state.Country;
      if (country != null)
        countryName = country.Name;
    }
  }
}

I imagine you can already see how good it is to use a one-liner instead of all those lines simply to "navigate a property path". The only problem is that in C# you can still continue to write person.City.State.Country.Name and receive NullReferenceExceptions if any item on the path is null. Also, as a personal comment, I don't really like the ?. symbol. Maybe it's just me, but things like => and -> look like a single "symbol" even though they are composed of two other symbols. ?. looks as if someone simply used the wrong punctuation and ended up adding another punctuation mark instead of replacing one with the other.

So, I originally thought: what if in Constrained C# using the dot to access items was always equivalent to a ?.? It would make everything easier and simpler to read, right? Users would be able to write string? countryName = person.City.State.Country.Name and everything would work fine.

Well... I am only presenting this because I had that idea and I believe many other developers may think about it if I don't address it directly, but my final conclusion is that it is not a good idea. First, many developers need to use more than one language and, if they get used to avoiding null checks simply because one language allows them to do that, well, they may end up writing terrible code in the other language(s) they are using. Maybe that can be used as an argument to avoid those "old" languages and use a more modern one, but I still only see a bad practice coming from it.

Speaking of bad practices, that's the second problem I see. People may simply avoid checking for null, as null objects will become no-actions. Yet, in situations where two or more variables are involved, it is very common that the actions are either all done or all ignored, and that automatic behavior will allow the actions on "variable a" to be ignored (as it is null) while the actions on the other object are executed. This looks like the kind of hard-to-find bug where some results simply come out wrong and we don't know why. Code that crashes will at least have a call stack showing where something went wrong, but this "automatic" solution avoids that too.

So, my final conclusion is: Constrained C# should not allow possibly-null variables to be directly dereferenced using a simple dot, and doing so should cause a compile-time error. It will not automatically consider null accesses to return null, as that can encourage bad practices. Maybe the ?. operator can be kept if people really get used to it, but I personally prefer something like:

C#
string? countryName = propagate_null(person.City.State.Country.Name);

Where propagate_null is a special keyword that will treat all the dots inside its parentheses as the current ?. operator, so we avoid that ugly operator. I really don't care if there will be more keystrokes to achieve the same result; I really believe it will be easier to understand. I also accept suggestions for the actual keyword, as I am considering propagate_null or optional_path.

Better Dispose

Before Garbage Collection became popular, automatic memory management was usually achieved by reference counting. When we get a reference to an object, we increase the count; when we lose the reference (because the variable goes out of scope or because it is assigned a new value) we decrease the count and, if it becomes zero, we immediately delete the object. The problem with this technique is when objects have "crossed references", that is, object A references object B, which references object A. The count will always be at least one and both objects never die.

Garbage Collection solves this problem by mapping which objects have "roots", but it has its own problem: we don't know when objects are really going to die. Objects first lose all references and, some time later, they are collected. If we need objects to die immediately, we need an alternative. And that's why Dispose exists.

Well, actually Dispose is terrible, both to implement and to use. First, users can Dispose() an object before really finishing with it, effectively making new calls after the object was disposed, so the implementation must deal with that situation (or we get ugly crashes). Second, users may completely forget to dispose objects and, in some cases, they prefer to forget rather than risk deleting an object while other references are still alive.

So, why not combine Garbage Collection and Reference Counting?

To me, Dispose() should be like Finalize(): users should not be able to call it directly. Yet disposable objects should count on an automatic reference-counting mechanism, dying as soon as the reference count becomes zero, which happens immediately when a single variable holds the reference, yet the object could be given to different "owners" (be referenced by other variables) and the last one will be responsible for disposing of it.

In fact I believe we can even have two kinds of disposable objects: "die as soon as possible" and "die when requested, even if that leaves some unfinished business". For the second, we may want to purposely call Dispose() to "make the object release all memory and become invalid", yet we will not need to implement verifications in every method to generate the ObjectDisposedExceptions. This can be achieved by using helper objects over disposable objects, but it is also possible to make them first-class citizens, so the class itself chooses the dispose type, yet developers don't need to check everywhere whether objects are being disposed at the wrong moment.

So, which do you prefer (consider that in these examples only B is disposable)?

  • C#
    A a = new A();
    B b = new B();
    try
    {
      C c = new C();
    
      a.Something(b);
      b.Something(c);
      c.Something(a);
    }
    finally
    {
      b.Dispose();
    }
  • C#
    A a = new A();
    using(B b = new B())
    {
      C c = new C();
    
      a.Something(b);
      b.Something(c);
      c.Something(a);
    }
  • C#
    A a = new A();
    B b = new B();
    C c = new C();
    
    a.Something(b);
    b.Something(c);
    c.Something(a);
    // Knowing that B will be disposed as soon as possible simply because its 
    // implementation required immediate disposes, without any need for the 
    // users to care about it.

I certainly prefer the last case. Whoever writes a component decides whether it requires early disposal or not. The language/environment only enforces that by doing automatic counting, and the users can use all managed objects in the same manner. If done by the environment, types A or C can change in the future to be disposable and everything will keep working, as users will not need to change their code or even recompile code in other assemblies that's already using the now-disposable types.
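The counting mechanism described above can be sketched in today's C#. RefCounted&lt;T&gt; is a hypothetical helper, not part of .NET: each owner calls AddRef() when it stores the reference and Release() when done, and the wrapped object is disposed exactly once, as soon as the last owner releases it (in the proposal, the compiler/runtime would emit these calls automatically):

```csharp
using System;
using System.Threading;

sealed class RefCounted<T> where T : IDisposable
{
    private readonly T _value;
    private int _count = 1; // the creator is the first owner

    public RefCounted(T value) { _value = value; }

    public T Value => _value;

    // A new owner registers interest in the object.
    public RefCounted<T> AddRef()
    {
        Interlocked.Increment(ref _count);
        return this;
    }

    // An owner is done; the last Release disposes the object immediately.
    public void Release()
    {
        if (Interlocked.Decrement(ref _count) == 0)
            _value.Dispose();
    }
}
```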

Better Structs

Structs could be used as a work-around for required types if they didn't have one problem: they can always be "default" initialized.

Think about it: you create a Required&lt;T&gt; struct. You implement the constructor to check whether the input value is null (we can actually do this kind of struct in native C++). Yet, in C# anyone can write new Required&lt;SomeType&gt;() without giving any parameter.

In fact this problem goes one step further, as .NET IL actually lets us have a default constructor for a struct, so default(Required&lt;SomeType&gt;) and new Required&lt;SomeType&gt;() can each produce a different result (which is quite confusing, in my opinion).
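A sketch of the Required&lt;T&gt; struct described above shows exactly the hole: the validating constructor can be bypassed by default initialization, so the wrapper cannot actually guarantee non-null:

```csharp
using System;

struct Required<T> where T : class
{
    private readonly T _value;

    public Required(T value)
    {
        // The check runs here...
        if (value == null)
            throw new ArgumentNullException(nameof(value));
        _value = value;
    }

    public T Value => _value;
}

// Required<string> ok   = new Required<string>("text");  // validated
// Required<string> hole = default(Required<string>);     // _value is null,
//                                                        // no check ran
```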

Why not do like C++ and always invoke the default constructor if there is one, or forbid creating a new struct instance that isn't initialized with an alternative constructor when there isn't a default one?

With that simple change, any struct without a default constructor would be guaranteed to always be initialized.

Oh... wait. I already gave this kind of idea before, and one of the first answers was "structs are always blittable, so anyone can copy any memory block onto them".

And I can answer that: first, that's not true. A struct that has reference-type fields is not blittable (you can put a string inside a struct, for example). Second, it is not important that this is how it works now. It shouldn't be that way. Having the option to be like that is great. Always being like that is terrible.

So, structs should not be blittable by default. They should support custom default constructors (invoked even when using default()) as well as having no default constructor at all and, to make them complete, they should support "copy operators" instead of always getting a direct byte-copy of their contents. This would also help in creating "reference-counted" objects if we don't have the automatic "Dispose" system.

Going a little further - SynchronizationRequired instead of Synchronized

.NET came with a bad design idea that affects all objects: Any object can be used as a lock.

This is considered bad design because all objects become a little bigger to hold the locking information, yet most objects are never used as locks. In fact, it is considered a bad practice to lock on objects that have their own behavior, because they may be using locks over themselves. The other side of this problem is that it is also a bad practice to lock over this, as someone else may be using the object as a lock.

In the end, locks must be separate objects, created exclusively for that purpose, and all objects are bigger (consume more memory) than needed for a feature that should not be used. And, to make things more "complete", there's the [MethodImpl(MethodImplOptions.Synchronized)] attribute/flag, which is the exact equivalent of doing a lock(this) over the entire body of a method (so, it is an alternative way of applying the bad practice).

I don't know why that attribute/flag was created or whether it was expected to be implemented differently. Maybe the purpose was to make it clear, from the signature of a method or class, that it was thread-safe (even making that information available through reflection).

I must say that this idea would seem pretty good if it were implemented in a completely different manner. Instead of putting some information on an object and having it automatically apply a lock(this) over the entire body of its methods, the flag would mean "locking is required to use this object".

Think about it, instead of doing:

C#
threadSafeList.Add(1);
threadSafeList.Add(2);
threadSafeList.Add(3);

Which locks and unlocks the object three times, it is better to lock it once and add the three items, like this:

C#
lock(threadSafeList)
{
  threadSafeList.Add(1);
  threadSafeList.Add(2);
  threadSafeList.Add(3);
}

Yet, if we do this over an object that already has its own lock/unlock logic, we will only add an extra lock instead of avoiding the inner locks. And if the object doesn't have its own locking mechanism, you will be able to call Add() without any lock, causing problems when multiple threads use it.

So, why not get a compile-time error if you forget to lock the object? As a component writer you say that the object must be locked, but you don't do any locking, yet the component will never get corrupted, because the compiler will forbid any user from simply forgetting to lock your component (or, as an alternative approach, will do the lock automatically, but that's another topic to discuss, so let's keep it simple for now). The users will need to lock your component to be able to call any methods, yet they will be able to make many calls without releasing the lock if that's better for them.

That is, by writing your method signature like this:

C#
public lockrequired void Add(int value)

You allow this:

C#
lock(threadSafeList)
{
  threadSafeList.Add(1);
  threadSafeList.Add(2);
  threadSafeList.Add(3);
}

But you don't allow this:

C#
threadSafeList.Add(1);
threadSafeList.Add(2);
threadSafeList.Add(3);

This may not look that useful for a lone Add method, but think about the old "contains/add" situation:

C#
if (!threadSafeList.Contains(someValue))
{
  threadSafeList.Add(someValue);
}

If each method does its own locking, an item may be added between the Contains and the Add calls. But by forcing users to lock, the code will probably look like:

C#
lock(threadSafeList)
{
  if (!threadSafeList.Contains(someValue))
  {
    threadSafeList.Add(someValue);
  }
}

Extending this a little further, a method that wants to manipulate an object could request a locked object, like this:

C#
public void DoSomething(lockrequired List<int> listToModify)

So, when calling DoSomething() the user will need to hold a lock over listToModify. This means the method itself doesn't need to lock the object, yet users can't forget to lock it.

This kind of solution is great because we can have many methods that require locked objects calling each other without taking new locks. Today, as we never trust our users, public methods always take a new lock, which affects performance and in some cases increases the risk of deadlocks. By making it explicit that the objects must be locked, public methods can simply avoid taking new locks and can call other methods that also require the object to be locked without any risk, as users will be locking the object before making any call. This also makes it simpler to keep the object locked when two or more calls need to be "atomic".
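An approximation of the lockrequired idea can be built in today's C#: if the wrapped object is only reachable inside a locking method, callers cannot forget to lock, and several calls can share one lock acquisition. Locked&lt;T&gt; is a hypothetical helper, not a .NET type (the compile-time guarantee here comes from encapsulation rather than a new keyword):

```csharp
using System;

sealed class Locked<T>
{
    private readonly object _gate = new object();
    private readonly T _value;

    public Locked(T value) { _value = value; }

    // The only way to reach the wrapped object is under the lock,
    // so "forgetting to lock" is impossible by construction.
    public TResult WithLock<TResult>(Func<T, TResult> action)
    {
        lock (_gate) { return action(_value); }
    }

    public void WithLock(Action<T> action)
    {
        lock (_gate) { action(_value); }
    }
}
```

With it, the contains/add pair above becomes atomic naturally: `list.WithLock(l => { if (!l.Contains(x)) l.Add(x); });` holds a single lock across both calls.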

Complications

Well, this part of the locking idea is much more complicated than it seems. Actually we have many kinds of locks (exclusive locks, multiple-reader/single-writer locks, spin locks etc., and users can create their own locks) and it would be great to be able to request the appropriate locks, including user locks, and also to support alternative locks as long as they are compatible, while still guaranteeing that the same locking object is used by everybody.

Regarding alternative locks, an object that requires a reader-writer lock, for example, can naturally be used with an exclusive lock as long as all places use the same exclusive lock object. Also, two or more objects can share a lock, so we will probably need ways to tell which objects are tied to which locks. I still don't have a perfect idea of how to solve this. Maybe, only maybe, we could use something like the following when declaring variables of objects that require locks:

C#
lock-type data-type variable-name;

or

C#
lock-type { data-type fieldName, data-type fieldName } variable-name;

Where the lock-type must implement some kind of interface like ILock (or be a special kind of class... or anything that fits... that's another part of the problem).

In the first case, we can have something like:

C#
MonitorLock List<string> threadSafeStringList;

So we can lock threadSafeStringList and, as soon as it is locked, we can access the List&lt;string&gt; methods directly from the variable.

In the second case, we can have something like:

C#
MonitorLock { List<string> stringList, List<int> intList } lists;

We would need to lock over "lists", yet each list would be a sub-item of it. I actually don't really like this idea... it is only a basic example.

Also, there's the problem that today we never know whether an object is used by many threads or not. Regardless of whether a method wants a locked object, an object that's never given to another thread doesn't require a lock at all. Yet, today, objects that already do locks in their own code will always lock even when that's not needed, so this will not make the problem worse than it already is. The other side of never knowing whether objects are used by many threads is that we may actually fail to request locks when they are needed (and this is the real problem).

Well, I have some ideas in my mind on how to solve all this, but I believe it starts to deviate too much from the current C#. So, for now, just think about the idea of being able to say that a lock is required when declaring types or their members, or when receiving input parameters, and having the guarantee that users will take the locks, because the compiler is enforcing that. It is probably going to eliminate lots of common bugs, so, why not?

Conclusion So Far

The conclusion so far is that in Constrained C# we will not have NullReferenceExceptions at all, having the guarantee that a required parameter will always have a value and also the guarantee that we will not accidentally use a null-accepting variable without checking for null. Better yet, the performance hit of checking for not-null on required parameters only needs to exist at compile-time, not at run-time.

The implementation of disposable objects will be simplified, because developers of disposable objects will not need to check in every method whether the object was disposed, and users of disposable objects will not need to dispose of them at all. Losing a reference will be enough, even when it is a shared reference, as it is ref-counted.

Finally, objects expected to be used by multiple threads will be able to avoid taking their own locks in every method call and will still be safe by requiring locks in their declarations. The same principle that prevents developers from passing null to required arguments will prevent lock-requiring objects from being used without locks, while avoiding excessive intermediate locking/unlocking on successive calls.

So, what do you think? Would those features make a programming language better?

Future

In a future post I will try to explore how a language could be built to avoid most of the common multi-threading issues. It is not only a matter of requesting locks; it is a matter of preventing objects that aren't expected to be used by many threads from being exposed to them. If done correctly, this could even avoid the performance impact of checking whether components are being used by the right thread (as happens in WPF).

 

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)