In my previous post (Design and Implementation Mistakes - Mostly .NET), I talked about design/implementation problems, presenting situations in which the method signature was exposing too much, the actual implementation was allowing the use of internal details and, of course, including the part where the users do that kind of stuff, even if by accident.
I ended that post saying that I had more ideas in my mind and now I am continuing that post.
String - The Super Type
See this interface:
public interface ISerializer
{
void Serialize(string filename, object itemToSerialize);
object Deserialize(string filename);
}
The idea is that you have a common interface for serialization/deserialization, so you can actually use a binary serialization, XML serialization, JSON serializer, etc. But when I see something like this, I must ask: Why is this interface using filenames? Why not ask for any Stream
and allow this code to work with memory-streams, TCP/IP streams, etc?
To me, this kind of error happens simply because it is easier to use string
s when we want to use a method to read or write directly to a file and we immediately think about files when we think about saving data. So, we give a name and everything works. But the string
version of the method should only exist as a helper to avoid users from having to open/close a file. The real signature must be flexible and allow any stream
to be used. So, the interface should be:
public interface ISerializer
{
void Serialize(Stream stream, object itemToSerialize);
object Deserialize(Stream stream);
}
In fact, the interface can have the overloads that receive the filename or those can be added by extension methods. Each one has advantages and disadvantages, but I will not cover these in this post.
The conclusion is: Don't use "string
s" when you are thinking about files. Use Stream
s and allow your code to be much more versatile. In fact, don't use string
s as the only way to represent anything that's not real text. Encoded string
s may be necessary when writing things to a file but they aren't necessary in memory. Use appropriate types for what you want to do and, if needed, add the string
overloads.
Thread-safe and read-only
As explained in the previous post, read-only doesn't mean immutable. A read-only interface means that a method receiving an object seen by the interface type is not expected to change it, but such object may change from "the outside" (that is, as soon as the method that received the read-only object finishes executing).
This means that any method receiving a read-only input should not store it in internal fields to use it later, as such object might change. This also means that the object is expected to be "immutable" during the duration of the method that received it. So, implementing a read-only interface by a multi-threaded mutable object is probably a terrible idea, as even if the object will not crash from mutation at random moments, the method receiving the "read only" object may crash by reading the same property twice and receiving different values.
So, instead of something like:
class MutableClass:
IReadOnlyInterface
{
}
It is better to have something like:
class MutableClass
{
IReadOnlyInterface Lock();
void Unlock();
}
With this approach, the mutable class is always thread-safe but it is not directly exposed by a read-only interface. When it is expected that it will be given to another method as a read-only object, it should be locked. After the read-only object is given to one (or more) method(s) that should see the mutable object as read-only, the Unlock()
must be called.
Actually, this technique could be improved to return some disposable helper that contains the "read-only" view of the locked object. This could look like:
class MutableClass
{
MutableClassLock Lock();
}
And the MutableClassLock
must implement both IDisposable
and IReadOnlyInterface
or it must be disposable only and have an IReadOnlyInterface
property.
Note that by the fact that a new instance is allocated, this approach may become useless for small objects. In that situation, it is simply better to return an immutable copy of the object.
It is also a good practice to make the read-only object invalid when the Unlock()
(or Dispose()
) happens, or else it could be used as a way to access the mutable object without holding a lock.
From Immutable to Observable
Independently on my insistence that a read-only object is not immutable, that's a normal misunderstanding and I already saw many projects use immutable objects/interfaces everywhere without any description if they were limited to be read-only or really immutable (for example, type definitions composed of read-only properties without anything else).
That wasn't a problem as everybody was considering the objects as immutable and implementing them that way. The problem was the breaking change that was about to happen: Making some of the immutable types become mutable (and sometime even observable)!
Let's explain this situation a little further. Adding new methods to an existing class will not break already existing binary code, as it will not be referencing the new methods, and can only break the compilation of code that references it if there's some kind of ambiguity (like new method overloads that receive reference types when null
was used as argument). This creates the idea that "adding is not a breaking change".
Well, adding mutator methods to types that were immutable is a big concept breaking change. Many places may be simply storing the objects counting on the fact that they aren't going to change but... surprise! When that old code reads some important information again, it is different (or simply isn't there anymore).
The biggest problem here is that any existing unit-tests will "pass", as the unit-tests will not be using the new mutator methods. Any code receiving the objects for the first time will continue to work. The errors will only be seen when those objects are mutated and then required again by some code that was already holding them (and in some cases they will not crash, they will simply give wrong results). That's one of the hardest kinds of bug to solve, as the places crashing or working incorrectly actually aren't the places mutating the objects. And the reactions will be like: "This was working before (maybe for years) and no-one changed it! How is it crashing now?"
Multiple Steps
Aside from simply deciding to make an immutable object mutable, the most common way to make this mistake is in "multiple steps".
First, the class is created as immutable. Then a new property is added, and that new property is the only one mutable. Maybe the object is immediately made observable (implement INotifyPropertyChanged
), maybe that's added later as WPF (or any other framework) is not showing the property changes. Finally, in the future, someone (probably a different person than the one that created the class) sees the object and thinks "Hey, this object is observable... they forgot to put setters in these properties."
At this moment, everything is still working fine. The bad things will only happen when those new setters start to be used.
Interface Types - What's Their Purpose?
Interfaces can only declare method, property and event signatures without any implementation or fields (member variables, if you prefer).
Many people love it, explaining how they are useful for unit-testing, talking that they allow a more reusable design, Aspect Oriented Programming and lots of other things. Well, at this moment, I will focus on something much more simple, that's the base to make all of the other arguments true: Interfaces don't provide an implementation on purpose, allowing the implementation to be done later and also allowing different implementations to be made.
Allowing the implementation to be done later and allowing different implementations are really the key points to me and that's what I am going to discuss. Surely the rules presented in the previous post still apply, but now I want to talk about problems that are directly related to interface design and that will not happen when using classes.
Bound to an Implementation
See the following interfaces:
public interface ISession
{
object Get(string sessionVariableName);
void Set(string sessionVariableName, object value);
}
public interface ISessionManager
{
TimeSpan DefaultSessionTimeOut { get; set; }
Session CreateNewSession();
}
Don't focus on the fact that the ISession
and ISessionManager
are very small. I only wanted to give an idea of their purpose.
See the ISessionManager
interface. The CreateNewSession
method is not returning an ISession
, it is returning an actual Session
class. Such class, that I did not even present, is an ISession
, but that doesn't matter as any implementation of the ISessionManager
class can't create different kinds of session, they will only be able to create that Session
class.
If I renamed the Session
class to be InMemorySession
the error will be more apparent. Why is an ISessionManager
bound to the InMemorySession
implementation?
And the answers can be:
- A real typo. The person writing the interface was thinking in the session as an idea, but ended-up writing
Session
instead of ISession
and, as there was a class with that name, everything "worked fine" (at least it was compiling); - Bad "interface extraction". Maybe the design started with classes, so there was the
Session
and SessionManager
classes. Extracting an interface from all the signatures inside the Session
class worked really fine, creating the ISession
interface. But when extracting all the methods from the SessionManager
to generate the ISessionManager
, the reference to Session
wasn't replaced by a reference to ISession
; - Simple vice of using classes all the time, even when thinking about a new interface;
- Well, any other reason that I am not describing. The human mind can be very creative.
So, how can we avoid these problems?
The best solution I know is to write interfaces first, implementations later. So, an ISessionManager
referencing a Session
class will not compile because that Session
class doesn't exist, forcing the interface declaration to be corrected to return an ISession
.
As this can be tricky when refactoring old code that only used classes, I suggest to create new empty library to put the interfaces into and then make the projects being refactored reference the new DLL with the interfaces.
If you don't want separate DLLs, I can try to put a simple rule like: Interfaces should only reference primitive types, string
s (when the string
represents a real string
, not a file name), other interfaces and possibly simple types like enum
s, struct
s and small immutable classes. About the simple types, see the next topic.
Simple Types - Enforcing a Contract or Binding an Interface to an Implementation?
If we take the idea of "designing interfaces first, doing implementations later" to the extreme, the interfaces will live in separate DLLs that can only contain interface and delegate types. Nothing else, so they will not reference struct
s or class
es.
The reasons for doing this include allowing everything to be tested, every implementation to be replaced and, of course, the idea that any implementation done at this moment, even if extremely simple, can be bugged and then we will tie interfaces to a simple but bugged implementation.
Well... if we are really extremist, any interface declaration may be "bugged" too. In fact, bad interface design is very common, so I don't see why we can't use simple types (usually immutable classes and struct
s) to enforce some rules the interfaces themselves can't enforce.
The Code Contracts try to overcome some limitations by allowing us to define some constraints even on interfaces. Yet, code contracts aren't a guarantee as they can be disabled (and most developers never saw them).
So, when we want a guarantee, not a hint, and when we want to avoid magic values, creating really small types can be considered as enforcing a contract instead of binding an interface to an implementation. Unfortunately, the most common guarantee, the non-null
, can't be enforced by new types. I really hope a non-null
or "required" modifier is created for input and output parameter signatures.
Note: I am not saying to create non-null
reference types as this always brings to question "what to do if it is not initialized". I am talking about creating an input/output modifier that can be used even on interfaces, doing an implicit "throw new ArgumentNullException
(...right name here...)" when input parameters are null
, which could also be compile-time validated for constants and giving a similar guarantee on return
types, so we can be sure that different implementations of an interface aren't going to return an invalid null
. Never. This could even be JIT optimized, so the if
s for null
s could be in the call site, instead of inside the method, and could be completely avoided if there's a guarantee that the value is not null
(like a previous if
over the same local variable).
Real Guarantees and Simple Types
Imagine that I am creating a new UI framework and I decided to start with the interface design.
I keep the idea that all components are rectangular or "contained inside a rectangular area" (I don't want to discuss if rectangular controls shouldn't exist anymore, it is part of my sample) and one of the things users are expected to do is to ask a control for their rectangular bounds.
So, I can see a method like IRectangle GetBounds();
as part of the interface.
But I want the following guarantees of the IRectangle
interface:
- It must not change. Users can't change the bounds of the control by changing the resulting rectangle and the returned rectangle should not be modified if a new function is called over the control moving, resizing or even disposing it;
- It must have the properties
Left
, Top
, Width
, Height
, Right
and Bottom
(all of type int
). How those properties are stored is not important, but the rule that Left + Width equals Right
and Top + Height equals Bottom
should be kept; Width
and Height
can't be less than one.
How Can We Guarantee All Those Traits?
And the answer is that with interfaces, we can't. We can use work-arounds like receiving the bounds and validating the Width
and Height
everytime. I can even see some helper methods created in the implementation to validate the rectangle but the helper will not solve any mutation issues, except if it is making a copy. So, why not start with that "copy"?
If we create struct
s or class
es, we can give much more guarantees. In fact, even a struct
can't give all the guarantees when multi-threading is involved, as an "empty" rectangle may have its contents replaced by a valid rectangle while another thread reads it, reading a partially empty, partially filled instance. With a class
, you either see the old reference or the new one, without "partial" situations.
And it is not hard to create such class. Here it is:
public sealed class Rectangle
{
private Rectangle(int left, int top, int width, int height)
{
Left = left;
Top = top;
Width = width;
Height = height;
}
public static Rectangle CreateUsingWidthAndHeight(int left, int top, int width, int height)
{
if (width < 1)
throw new ArgumentOutOfRangeException("width");
if (height < 1)
throw new ArgumentOutOfRangeException("height");
return new Rectangle(left, top, width, height);
}
public static Rectangle CreateUsingRightAndBottom(int left, int top, int right, int bottom)
{
return CreateUsingWidthAndHeight(left, top, right-left, bottom-top);
}
public int Left { get; private set; }
public int Top { get; private set; }
public int Width { get; private set; }
public int Height { get; private set; }
public int Right
{
get
{
return Left + Width;
}
}
public int Bottom
{
get
{
return Top + Height;
}
}
}
To avoid confusion, I used the factory methods CreateUsingWidthAndHeight
and CreateUsingRightAndBottom
. Looking at the CreateUsingWidthAndHeight
, you can see that Width
and Height
can't be less than 1
. Looking at the Right
and Bottom
properties, you can see that they actually use Left + Width
and Top + Height
. The CreateUsingRightAndBottom
actually uses the CreateUsingWidthAndHeight
, so the Width
and Height
are validated.
All the properties are "get; private set;
" and only the constructor is setting them, this means the Rectangle
type is immutable.
As you can see, a real class is giving real guarantees. Of course, unsafe code can always kill those guarantees, but let's focus only on normal usage. It will work. The focus of this API is not to have different rectangle implementations, is to have a replaceable UI framework architecture. A Rectangle
can exist as a class without problems and, if performance is a problem, even if it is not pre-storing Right
and Bottom
, it is probably faster than using the virtual dispatch of interfaces.
The null Value
Now, there's one other problem. What about the null
value?
In .NET, we can't enforce non-null
reference-types. Even if a language presents that feature, it is probably enforcing this by adding ifs
in input/output as the .NET IL doesn't understand non-null
references. This means that an interface created in a language that supports non-null
reference types can still return null
if it is implemented by another compiler targetting the .NET and this same null
value can be seen when used by any language that doesn't enforce the non-null
. So, there's no magic here, null
is always a valid value.
Returning to the UI framework idea. Should an invisible control have bounds?
If we look at WPF, the invisible controls still have bounds but the collapsed controls don't have. I can consider the same idea in my framework and, as our Rectangle
type doesn't support Width
or Height
of zero, this means that we must use something else. Putting an IsEmpty
property on the Rectangle
itself is only increasing its responsibilities. It is better to represent the lack of a rectangle existence by not having one at all instead of having one that says "I'm empty".
null
already represents the lack of an object and, as I am used to it, I could accept that null
is a valid result. We can say that null
is a kind of magic value, but it is the most known magic value, and it is used to represent lack of an object, which is perfectly fine for an invisible collapsed control.
The Optional Solution
I actually can live with null
meaning the lack of result. The problem is that a method named GetBounds
returning a Rectangle
class doesn't appear to be able to return null
. One of the solutions is to change the method name to, for example TryGetBounds
. That "Try
" makes it clear that it may fail and, as there's no out
parameter, it can be assumed that the null
result means the failure/lack of value.
Some developers don't like that solution, preferring to have an Optional struct
. An input or output of type optional
makes it pretty clear that the value may not be provided. And in many cases, a null
can be used to initialize that optional type. It isn't really enforcing anything, it is simply there to make things clear. If you don't use the optional type, a null
can still be returned, but that can be considered an error.
Well, I will not say one is better than the other. As long as it is clear from the method signature that it may or may not return a value, it is OK for me. So, if the method doesn't return an optional type and the name doesn't say that either, the best is to consider that null
s are invalid and always validate input/output values that are considered required. The worst thing is to have a mixed solution where half the parameters and returns types can return null
and the other half can't without any clear indication in which category they are.
The End... or Not
The null
situation is only one of many of the non-enforced guarantees that we must deal when coding. Unfortunately, interfaces and method signatures don't give us lots of guarantees but we can still live with it by creating our own rules, allowing us to make some assumptions, thanks to the names or even the documentation of some methods.
I want to explore that further but, again, this post is getting long and I believe it will lose focus if I continue in that direction. So I will stop it right here and maybe someday I will post something about how to deal with those non-enforced guarantees. I am saying maybe because I can simply change focus and start posting things about completely unrelated topics, so I am not promising anything.
I hope this post was helpful to understand some problems and to give an idea on how to solve them, as well as to give an idea about when interfaces can or can't be bound to actual classes.
CodeProject