Introduction
I started this post trying to talk about .NET best practices but what I had in my head weren't .NET best practices. It was more a kind of "how to avoid common misuse of APIs". That common misuse is generally related to APIs that either expose too many implementation details or that aren't clear at all, so I ended up writing this document, in which I talk about some common API problems, in some cases explaining how to solve them as the developers of the API and how to avoid problems as the users even if the API is not corrected and in some cases simply explaining some considerations to be made.
Using an Implementation Detail - Simple Example
Do you know the Enumerable.Empty<TResult>()
method?
In its signature, it says it returns an IEnumerable<TResult>
. It doesn't say if it is an empty list, an empty array, if it was implemented by simply doing an yield break
or anything else. Its signature also doesn't say if it returns the same instance all the time or if it returns different instances on different calls.
Yet, many developers saw that it returns a singleton array and, when they want a singleton array, they call this method and cast the result.
That's a clear violation of the "program to an interface, not to an implementation" idea. And I am not talking about the fact that IEnumerable<T>
is an interface. It could be any non-sealed class. The interface, in this case, is the signature of the method that doesn't say the specific class of the result. If you discover that specific class, don't count on it. A future implementation may change it and, as long as the signature is not changed, any code that breaks will be breaking because it was coded incorrectly.
So, I can give the first "rules":
-
As the user of some API
Never assume something that's not guaranteed by a method's signature or by its documentation. Doing that may create a code that works today and that may stop working in the future.
That is, if a method return type is a base class, don't assume the result is of a specific sub-class and cast to it, except if this is part of the contract. For examples of when the return type is a base type but it is safe to cast, see methods like Convert.ChangeType()
and Type.GetCustomAttributes()
. Their return types are respectively object
and object[]
, but when the ChangeType
returns, the result will be of the given input type and when the GetCustomAttributes
returns that will be at least an array of Attribute
(or of the given input type, if one was given).
-
As the developer of some API
If you declare a public
method to return an interface or abstract
class, don't return an instance of a public
class, as your users may see that while debugging and can cast the result to the real type. If needed, create an internal class that redirects to a public
class.
By doing this, you guarantee that no-one will be casting the result (at least not with safe code) and you will be free to either change the implementation of your internal class or to use a different class in the future.
Only to complete the comment on implementation details and how they can change in the future, I believe that the Empty()
method should return a singleton IEnumerable<T>
that returns a singleton IEnumerator<T>
. The empty array, at least for now, always returns a new object. Returning always the same object will be faster and use less memory, yet I can't say it will be noticeable as enumerations can be big and this one is already empty.
Data-types and Nomenclature
Imagine that you are using some libraries and you see an int Timeout
property (maybe with a slight name variation like SessionTimeout
or ConnectionTimeout
) and you want to set it to 2 minutes. What's the right value to use?
The anwer is:
- If this value is in minutes, you should use two
- If this value is in seconds, you should use 120
- If this value is in milliseconds, you should use 120000
In most situations, timeouts are in milliseconds, but the SessionTimeout
in ASP.NET is in minutes while the ConnectionTimeout
used in database connections is in seconds (and I already found bugged drivers that were in milliseconds).
If the names were TimeoutInMilliseconds
, SessionTimeoutInMinutes
and ConnectionTimeoutInSeconds
, the problem will be solved by the signature, without any need to look into the documentation (or try the values) to be sure.
Another alternative would be a different data-type, like a TimeSpan
. Users would be able to set the timeout to TimeSpan.FromMinutes(2)
. I actually don't say TimeSpan
is the best solution because you can do a new TimeSpan(2)
and it will not be milliseconds, seconds or minutes. It will be in Ticks
. I really believe this constructor shouldn't exist and a TimeSpan.FromTicks(2)
should be used. But, as that's not the case, I only used the TimeSpan
as an example of a better situation, not as the perfect one.
Magic Values
Independently if we solve the previous problem by using a different name or a different type like TimeSpan
, how do we set a timeout to infinite?
I will not discuss if an infinite timeout is a bad thing (that's not the point now). The fact is that many APIs that have a timeout actually consider it optional and allow a value to be used to be infinite... and this value sometimes is:
- 0;
- -1;
- Any negative value;
- 0 or any negative value.
For example, in lock situations, 0 usually means that either the lock is acquired immediately or the action fails, so the -1
is used as infinite. In some file operations, as they aren't expected to complete immediately, 0 means infinite and negative values are simply an error.
The TimeSpan
type will not help in this situation. It allows negative values but doesn't represent infinite values (the fact that -1
millisecond can be used as infinite doesn't change anything). So, if you need a magic value, you are probably needing a new data-type. Maybe NonZeroTimeout
is an ugly name, but a type with this name could use factory methods like FromSeconds
, FromMilliseconds
, etc., avoiding zero and negative values and also avoiding the confusion of a Ticks
constructor.
Users will know if they are giving a time in seconds or milliseconds by the method they will need to use. If Infinite
is supported, a static Infinite
instance could be provided (with an IsInfinite
property to allow testing for it). If the internal implementation is to store an integer, representing milliseconds and using 0
or -1
as infinite, no-one except the writer of this type must know (and this could even be changed in the future without problems).
So, a simple rule: Don't use magic values. If needed, create a new type that does the necessary validations (if applicable) and that has the necessary values as well known, non-magic, values.
Mutable Results
What do you think by seeing the following definition inside a ConnectionManager
class or interface?
List<TcpClient> ActiveConnections { get; }
As the return type is a List<>
, you will see that it has methods like Add
, Remove
, Clear
and lots of others.
So, can you use those methods?
And, if you do, what's expected to happen? Will clearing this list disconnect users? Or will this list be a simple copy of the active connections?
Many developers will immediately see that we should not return a list. But the explanations on the why may be different.
Some developers say that a List
, as well as an array, is an implementation detail and should never be exposed. That we should use some abstraction, like IList
, ICollection
, IEnumerable
, etc. What if in the future we decide to use a HashSet
, as it is faster to remove items at random positions?
Even if this is already a good argument, it doesn't solve the biggest issue. A mutable List
or IList
gives the impression that users can add and remove items to the list or clear it to actually connect new users or disconnect them. Well, with the List
that's impossible, as it never notifies you about the modifications. With an IList
that's possible, but you must implement it accordingly, simply returning an internal List
seen as IList
will not give that possibility to the users (and can be casted back to the List
type).
Some developers may implement such an interface by copying the internal collection to give the appropriate result without risking having a corrupted inner collection. But in this case, some users may not understand why clearing such list is not disconnecting users. Others may start to manipulate that list for their own purposes knowing that it is a copy and, if in the future you do implement the Add
, Remove
, Clear
and other methods to correctly connect/disconnect, you will be breaking those who are using that List
considering it is a copy.
So, the rules that we can get at this moment are:
- Don't return a copy of data without making it clear that it is a copy.
- Don't return a modifiable object that you can't control if it is not a copy.
- In cases like collections, it is very common that we only want to allow users to iterate it, so returning a read-only
IEnumerable
is the best option. But remember that rule about not returning public types cast as interfaces, as some users will cast it back to the modifiable type.
As a user, if you ever see a modifiable result like array, List
, HashSet
or any type that doesn't seem to be related to the type returning it and it is not stated that it is a copy, don't modify it. If needed, make a copy by yourself.
Mutable Inputs
If mutable results are bad, mutable inputs are even worse. I saw many developers that knew how to avoid unexpected modification to their inner collections by using IEnumerable<T>
as the return type and by wrapping the result with a ReadOnlyCollection<T>
or by yield returning all the values, so no-one will cast the collection back to the original type.
But when their objects could be initialized with some values, they did things like:
public SomeConstructor(List<SomeType> initialValues)
{
if (initialValues != null)
_values = initialValues;
else
_values = new List<SomeType>();
}
In this situation, the problem is that who is calling this constructor has access to the initialValues
list and can modify it at any moment. It is not important how correctly we protect that received list from external modification after this point as we can't control how many places in the application already have a reference to this list.
So, if we really want to write a constructor that receives a collection and don't want to use an immutable collection, the best thing to do is to make a copy. Something like this is enough:
public SomeConstructor(IEnumerable<SomeType> initialValues)
{
if (initialValues != null)
_values = initialValues.ToList();
else
_values = new List<SomeType>();
}
Mutable Inputs - Part Two
Many users commit that mistake on constructors so often that they actually avoid the problem by never reusing a list they gave to another object's constructor. In fact, many simply consider that it is not an error to assign the mutable input value directly, as copying a list will consume extra memory and CPU time.
Well, I always consider that we should avoid exposing our components to situations in which they can get corrupted by other's faults but, as I am not discussing a specific architecture, I can accept this, as long as it is well documented. It would be great if this could be enforced in the API by some specific modifier, avoiding the caller to use the object again, but I am not counting on it happening aside from some specific static-analysis tools.
So, independently if you consider it OK to assign a mutable object directly in your constructor. What about methods (not constructors) that receive mutable objects?
When I see a method that receives a list or an array (or any other mutable object, yet collections are the most common case), except if it explicitly documented, I consider that my object can be modified inside that method (in many cases, that's the purpose). What I don't expect is that my object can be stored and modified later. And that might happen.
Well, actually I see the following situations:
-
People want to read an object but, as the objects (classes) they know are mutable, they simply request that mutable type and don't plan to modify it.
If this is the case, maybe creating a read-only interface is the answer. By writing a method that requests a read-only interface you let it clear that you will only read the object. This solution is not perfect, though, because you can't make a closed-source object implement an interface and, even if an adapter can solve the problem, many users will hate having to create a read-only adapter. There's no perfect solution if you can't change the type that's not implementing the read-only interface.
- People want to modify an already existing object because it is more efficient than always creating a new one and they don't plan to store that object or modify it later. Aside from documenting that behavior, we can't have the guarantee that it will not happen (again, maybe method signatures needed a way to tell this). The worst thing is that if you are calling a method from a real interface or
abstract
class, you will never have the guarantee that another implementation will not do it. - For some reason, the developers plan to receive an object (possibly already initialized) and store it, maybe reading it again, maybe modifying it in the future. If this is the case, it is extremely important to document this behavior or to make a copy of the input.
As a user, if you don't see any guarantees, then:
- Don't reuse a mutable object that you passed as an input parameter to some method. If needed, create copies of such an object before giving it as a parameter.
As you see, dealing with a mutable object and protecting both sides means that we may end-up doing two copies of the mutable object when one will be enough. Well, there's no perfect solution here. If you don't know if your object will be mutated later or not, better to copy it before giving it as input. If you need to store an object and don't know if it will be modified or not, better make a copy.
I can say that in C++, this problem usually doesn't exist because you usually can't delete an object that you didn't create, so no-one tries to store an external object directly. As .NET does the deallocation for us, that becomes a more common habit, but it may have bad consequences.
Read-only Versus Immutable
Many times, I created immutable classes using "ReadOnly
" as their prefix. It isn't necessarily wrong, as immutable objects are also read-only, but it was missing that extra information about being immutable.
To explain the difference, most functions to get a computer or process state are read-only. They can't modify that state. That doesn't mean it will not change. In fact, most time-related information is completely volatile. Call it now and one millisecond later and you will have a different information. Yet, those can live in read-only objects.
This is different from an immutable object. An immutable object is created with a state and that state will be the same until that object is collected.
So, what are the issues I usually see with readonly and immutable objects?
- An immutable object isn't necessarily holding immutable data. That can be considered a bug if, for example, you have an immutable record in memory and one of its fields is a normal byte-array. If that array has a length bigger than zero, anyone can change its contents.
- Reference to mutable data can't be avoided in generic types (unfortunately, there's no constraint for immutable types) and the biggest examples are the immutable collections. An immutable collection can only guarantee that the current instance of the collection will not be mutated. If it has thousands of mutable items, those items are still perfectly mutable.
- Read-only objects are usually treated like immutable objects, when they are not. When a method receives a read-only object (or a read-only view of an object) it is really expected that such object will not change during the current method execution. But if that object is stored to be used by another method, well, the contents may be different when the object is read again in a different call.
Trying to minimize the problems, if possible:
- Only create immutable objects that hold reference to other immutable objects. Most developers don't see a reference to another object as an independent object (especially when dealing with collections) so when they see that something is immutable, they believe that anything that's referenced is also immutable;
- When dealing with generic types, try to use only immutable types as the generic argument(s);
- If you receive read-only objects as input parameters, be sure that you get all the information you need and then you "forget" that object. Storing it and reading it again later may give different results.
Threading
I already saw all kinds of affirmations regarding multi-threading, like "objects should not be multi-threaded", "objects should always be multi-threaded" and "objects should be thread-bound, not multi-threaded".
As I am not talking about a specific architecture, I will not say that objects must or must not be thread safe. What I believe is that it should be clear when they are thread safe and when they are not (and components that give a safety guarantee will not have that guarantee if they aren't thread-safe).
Unfortunately, interfaces can't require thread-safe objects. That's not considered to be part of a contract but actually I believe it is a very important part of it. In multi-threaded applications, you can freely use objects from multiple threads if they are thread-safe, you can use your own locking strategies if they are simply thread-unsafe but you can't use them from multiple threads, with or without your own locking strategies, if they are thread-bound, having to completely review how to use them.
This can look pretty simply when seen in isolation, but now I will go back to the previous topic. Read-only and immutable.
- Real immutable objects are always thread-safe. So there's no problem with them.
- Read-only objects that have no mutator methods but use lazy-loading (or similar techniques) are many times considered immutable, but they are not, as they change from uninitialized to initialized when they are first used (and this might happen for each individual property). This means that if you read all properties of such kind of object from a single thread and then give it to other threads, everything will work. If two or more threads are the first to use these read-only objects at the same time, all kinds of errors may happen (depending on the exact implementation);
- Objects received by using a read-only interface are usually known to be mutable, but that's expected to happen after the current method returns. They aren't expected to change in the middle of the current method execution, but this can happen when multiple threads are involved, even if the object itself is thread safe. The current method may read the same property twice, expecting the same result, and it changes. So these objects need to be locked before calling the method that uses them as read-only.
More
Well, I actually have many more ideas on my head but I am already spending so much time in this that I decided to stop it right here.
Another day I will probably write another post to continue these ideas.
I hope it was a good read so far.
CodeProject