Introduction
Lately I have been talking a lot about Quality in software development, and it is something that's not really easy to pin down.
First, many managers say that quality is delivering products on time. I surely agree that when we have deadlines, not respecting them reduces the overall quality of the work. But after a product is delivered, its quality has nothing to do with the time spent to deliver it. Someone who looks at the product (in our case, the software) for the first time may see quality as the sum of these factors:
- The program does what it needs to do;
- The program does not crash;
- The program is easy to use;
- It has a good-looking interface.
But if the person is the developer who will continue the development of such software, or simply correct bugs found in it, then quality may be the sum of these:
- The program has almost no bugs (so the new developer can focus on new things instead of chasing the source of those bugs);
- The program is easy to read (it has good method and variable names, useful comments, follows a standard etc.);
- The program has a good structure that makes expanding it easy (you don't need to change configuration files and create 5 different classes if you only want to add a form with a single button in it);
- And, even if the programmer is not aware of it at first, all the methods the programmer will use never corrupt states and throw immediate exceptions if something is wrong.
SOLID
A lot of articles are talking about SOLID these days and many of them present SOLID as being real OOP. One of the main arguments is that by using SOLID (or should I say, by respecting the SOLID principles?) a program is easier to maintain and evolve. So, that's quality, right?
Well, I am not going to discuss SOLID in this article, as there are a lot of articles out there that already do it (I already talked about the L, which means Liskov Substitution, here), but what I will say is that by respecting the SOLID principles we are only respecting the third item on the list I just gave. After all, using SOLID we can still have bugs (item 1), we can still have ugly method names, even if they are really loosely coupled (item 2), and we can still have methods that simply don't validate input parameters, filling lists and other objects with invalid values and throwing exceptions when used by other methods later (item 4).
So, I am done talking about SOLID here. From now on I will focus on the fourth item.
Rock Solid Quality - Never corrupt states
Rock Solid Quality has nothing to do with the SOLID principles and it is not a combination of many words like SOLID is. It really means extremely resistant to state corruption.
The idea is simple from one point of view but very problematic from another. Most developers immediately understand that internal object states should never be corrupted, as corrupting such states causes the errors to happen later and makes it harder to find the source of the bug, yet most developers simply consider validating inputs a waste of time (theirs and the CPU's).
Worse than that, they consider that everything is OK if their actual code is not corrupting data, even if they don't do anything to protect data from being corrupted elsewhere. The most common arguments are:
"It is working, isn't it?"
"Hey, if someone does such invalid call it's their fault, not mine!"
"It's the Quality Assurance's team responsibility to do the right tests and guarantee that bugged code is not going to production."
And, if you believe those arguments are right, I should ask: Why do we have those visibility modifiers (public, protected, internal and private)?
If we get a working application that has the right visibilities applied, we can change everything to public and the code will continue to work (well, as long as there is no reflection searching for non-public members, but that's a corner case and can also be easily corrected). What I mean is: by making everything public the code can still work fine, as it will continue to access the right methods and properties instead of accessing the internal fields directly. But if we do so we will make the code much more susceptible to bugs, as users of such code may start to access fields directly, either corrupting them or reading values that weren't initialized yet.
So, what do you think? Is "the code is working" a good measure of quality?
In my opinion it is not. To me, working code is the minimum that should be given, but to have quality, working code must be guaranteed to behave properly even if, in the future, new units make all kinds of bad calls to it. In such cases, the Rock Solid components must throw exceptions, making the new code crash, but the components must not be left with corrupted states. That is, independently of what kind of stupid things the new code does, the working modules, which were already tested, will remain functional.
This is easily seen in server applications. If new pages are passing null values or doing other wrong calls to shared components, those pages can crash, yet the users using the old pages will not receive exceptions caused by garbage left in the internal structures of such shared components. So, those shared components are the Rock Solid ones.
How to achieve Rock Solid Quality
Rock Solid Quality requires components to guarantee that they will never have corrupted states, independently of the reasons that may cause such corrupted states. So I will present some guidelines to achieve that.
It is important to note that when I say never have corrupted states there are some considerations to be made:
- I said "never have corrupted states" instead of "never corrupt states" because of the implications of both. In "never corrupt states" it simply means that the actual code will never do something bad that will create a corrupted state, yet it may still allow someone to put invalid states in it (like letting a field be
public
or returning a mutable and non-observable inner collection); - Never is a strong word. We can't really give such guarantee. In .NET, reflection may be used to access and modify internal states. Also we can't solve possible hardware problems, but we should do our best to avoid all cases that we can have control;
- Performance-critical code is an exception to most rules. Some of the Rock Solid guarantees affect performance, so you will probably not apply these principles if you are writing a kernel driver.
So, ignoring the impossible cases, let's understand what must be done to give the Rock Solid Quality guarantee:
Part 1 - Encapsulation
We start with encapsulation. As already shown, letting states be changed directly (or even read directly) can become a problem, so fields should never be public and they should always be accessed through methods or properties (which internally use get/set methods).
For example, if later we decide to apply lazy loading, the external code that is calling a get method will continue to work fine, as it will use the new lazy loading technique.
As all state changes happen through methods, invalid changes can be validated, throw exceptions and keep the state intact. So, here is the other key point: always validate input (and the possible state combinations) before doing the actual state change.
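As a minimal sketch of these two points (the Account class and its rule are hypothetical, not taken from any real library), state changes go through a method that validates before touching the field:
using System;

public sealed class Account
{
    // The state is never exposed directly; it can only change through Deposit.
    private decimal _balance;

    public decimal Balance
    {
        get { return _balance; }
    }

    public void Deposit(decimal amount)
    {
        // Validate before touching the state, so an invalid call throws
        // immediately and _balance is never left corrupted.
        if (amount <= 0)
            throw new ArgumentOutOfRangeException("amount", "The amount must be positive.");

        _balance += amount;
    }
}
If the balance later becomes lazy-loaded, only the get accessor changes and the callers are unaffected.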
Who applies those rules?
We can say that these first rules are well respected in the base .NET libraries. For example, we can't insert an item at position 8 if a list has only 1 item, even if its internal array (its capacity) is of size 32. We can never corrupt memory by accessing invalid array indexes, and methods that can't accept null values throw ArgumentNullExceptions before doing any work.
There are some public static readonly fields, though, which can be considered problematic if a change in the internal behavior is wanted, yet static readonly fields are only initialized when the classes that declare them are first needed, so the static constructor can use any logic to initialize them and that already gives a lazy-loading result.
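As a small illustration of that behavior (the GlobalSettings class and its initialization logic are hypothetical), the static constructor only runs when the declaring class is first used, so the readonly field is effectively lazy:
using System;

public static class GlobalSettings
{
    // Callers can read but never replace this field, and its value is only
    // computed when GlobalSettings is used for the first time.
    public static readonly string DefaultCulture;

    static GlobalSettings()
    {
        // Any logic can run here (reading a file, checking the environment...).
        DefaultCulture = Environment.GetEnvironmentVariable("DEFAULT_CULTURE") ?? "en-US";
    }
}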
Part 2 - Thread Safety / Thread Ownership
Even with all the input validations, most of the .NET base classes can still get corrupted because they aren't thread-safe at all, so it is possible that one thread reads an intermediate state while another thread does a change, or even that two (or more) threads change states at the same time, effectively finishing with a state that isn't the complete state of any thread. This can be said to be "by design" because, by not being thread-safe and avoiding thread-checks, the code is faster and the objects can still be accessed by multiple threads if the user knows what he is doing and does the appropriate locks.
But such thinking is not shared by all components. WPF dependency properties, for example, always do thread-checks. It is simply impossible for another thread to change a dependency property directly, even if the user knows how to do the locks. The only way to do such changes is to ask the component's owner thread to do the change (achieved by using the Dispatcher). So, we can say that the Dependency Properties (and WPF components in general) achieve better Rock Solidity than the BCL components.
That gives us the rule that Rock Solid components must be thread-safe. Whether they are thread-safe because they do locks, because they are immutable or because they can only be used by an owner thread is not important to the rule. It may still be important to the expected use of the component, as locks are slower than thread-checks if the component is expected to be used by a single thread, while thread-checks can be a problem if the component could be accessed by many concurrent readers.
About immutable objects, it is important to note that a "read-only" object that uses lazy loading is still mutable. Two calls to the same property, done by the same thread, will return the same result, yet the initial state will be "not loaded" and the load process is a change in the state, so such loading must also be thread-safe in some way.
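To illustrate the thread-ownership approach (this is my own sketch, inspired by but not the actual WPF implementation; all the names are hypothetical), a component can record its creating thread and refuse calls from any other thread:
using System;
using System.Threading;

public sealed class ThreadOwnedCounter
{
    private readonly Thread _ownerThread = Thread.CurrentThread;
    private int _count;

    public void Increment()
    {
        VerifyAccess();
        _count++;
    }

    public int Count
    {
        get
        {
            VerifyAccess();
            return _count;
        }
    }

    private void VerifyAccess()
    {
        // Calls from any thread other than the creator fail immediately,
        // so no intermediate state can ever be observed or corrupted.
        if (Thread.CurrentThread != _ownerThread)
            throw new InvalidOperationException("This object can only be used by the thread that created it.");
    }
}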
Part 3 - All state transitions must go from one valid state to another in a "single step"
This case is usually seen in database operations, yet it is not limited to database objects and we can say it is still related to encapsulation.
In database operations we achieve the "single step" (also known as atomic) behavior by using a transaction. In our code we can still do many separate changes (like updating two different records in two different operations) yet the result will be seen as "all or nothing" because we will commit or rollback the entire work.
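For example (a sketch using System.Data.SqlClient; the table, the columns and the connection string are placeholders, not something taken from the article), two separate updates become a single all-or-nothing step:
using System.Data.SqlClient;

public static class MoneyTransfer
{
    public static void Transfer(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var transaction = connection.BeginTransaction())
            {
                try
                {
                    // Two separate updates, committed as a single atomic step.
                    using (var command = new SqlCommand("UPDATE Accounts SET Balance = Balance - 10 WHERE Id = 1", connection, transaction))
                        command.ExecuteNonQuery();
                    using (var command = new SqlCommand("UPDATE Accounts SET Balance = Balance + 10 WHERE Id = 2", connection, transaction))
                        command.ExecuteNonQuery();
                    transaction.Commit();
                }
                catch
                {
                    // If anything fails, no change is applied at all.
                    transaction.Rollback();
                    throw;
                }
            }
        }
    }
}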
When working with memory objects we can also face that kind of situation, in different ways. So, let's see some of them:
- Allocating many objects and setting fields.
It is a common practice to do all our allocations first and only at the end set the state fields to the newly allocated objects.
For example, see these two situations:
_x = new X();
_y = new Y();
And
X x = new X();
Y y = new Y();
_x = x;
_y = y;
I am using the names X and Y on purpose. It is not important what X and Y do. But imagine they can be large objects with complex logic, and their allocations can throw exceptions (an OutOfMemoryException is always a possibility).
In the first example, if the new Y() throws, we have already lost the previous value of _x. In the second case, if the new Y() throws, the values of _x and _y will still be intact and the allocated X will be collected sometime in the future, as there are no more references to it.
But, if X and Y are disposable, we can improve such code to be like this:
X x = new X();
Y y;
try
{
    y = new Y();
}
catch
{
    x.Dispose();
    throw;
}

if (_x != null)
    _x.Dispose();

if (_y != null)
    _y.Dispose();

_x = x;
_y = y;
Note: This code also assumes that Dispose() will never throw, but if a Dispose() is throwing, then there's something very wrong.
- Different properties that must be set together
The fact that we can read properties independently sometimes gives us the bad habit of allowing them to be set independently, when that should not always be possible.
In any situation where two or more properties should be either all set or none set, we should do such a state transition through a call to a single method. Considering we are already making the code thread-safe, it is not important that there will be many internal instructions to do the job; what is important is that a caller will not be able to set the first property and never set the other properties that are mandatory.
Compared to database situations, this is like a rule saying that if there is a discount, a reason for such discount must be given. So, setting only the discount is not valid, but the method that does such an action will receive both parameters at once, will be able to validate both and, if everything is OK, will update both properties (as sketched below).
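A minimal sketch of that rule (the Order class and its validations are hypothetical):
using System;

public sealed class Order
{
    public decimal Discount { get; private set; }
    public string DiscountReason { get; private set; }

    // Both values must be given together; there is no way to set only one of them.
    public void ApplyDiscount(decimal discount, string reason)
    {
        if (discount <= 0)
            throw new ArgumentOutOfRangeException("discount", "The discount must be positive.");

        if (string.IsNullOrWhiteSpace(reason))
            throw new ArgumentException("A reason must be given for the discount.", "reason");

        // Only after both parameters are validated do the properties change.
        Discount = discount;
        DiscountReason = reason;
    }
}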
- Many, many changes with a single validation
In some situations we have a large hierarchy of objects that may be changed. We really want to add many items, remove many other items and so on, but the validation must be done only at the end, as any validation in the middle may fail.
Many common patterns for this include methods like DisableValidations/EnableValidations or something like using(DisableValidations()) { /* some code */ }. But doing that is still problematic for shared components. What will happen if you do many changes and, when validating, the validation fails? Or even if the EnableValidations (or the Dispose) is never invoked?
A better solution, one that will guarantee the Rock Solid Quality, is to create a separate collection of changes. In such a collection you register everything you plan to do, but nothing is executed yet. Then you hand that collection to a single method (like Apply). All the validations can be done before a single change is applied and, if everything is OK, all the changes are applied.
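A minimal sketch of that pattern (ItemChangeSet, ItemStore and their validation rules are hypothetical):
using System;
using System.Collections.Generic;

public sealed class ItemChangeSet
{
    // Only the intended changes are recorded here; nothing touches the real component yet.
    private readonly List<string> _itemsToAdd = new List<string>();
    private readonly List<string> _itemsToRemove = new List<string>();

    public void Add(string item) { _itemsToAdd.Add(item); }
    public void Remove(string item) { _itemsToRemove.Add(item); }

    internal IEnumerable<string> ItemsToAdd { get { return _itemsToAdd; } }
    internal IEnumerable<string> ItemsToRemove { get { return _itemsToRemove; } }
}

public sealed class ItemStore
{
    private readonly HashSet<string> _items = new HashSet<string>();

    public void Apply(ItemChangeSet changes)
    {
        // First validate everything; the real state is still untouched.
        foreach (var item in changes.ItemsToAdd)
            if (string.IsNullOrEmpty(item))
                throw new ArgumentException("Null or empty items are not allowed.");

        foreach (var item in changes.ItemsToRemove)
            if (!_items.Contains(item))
                throw new ArgumentException("Cannot remove an item that is not in the store: " + item);

        // Only when everything is valid are the changes applied.
        foreach (var item in changes.ItemsToRemove)
            _items.Remove(item);
        foreach (var item in changes.ItemsToAdd)
            _items.Add(item);
    }
}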
Part 4 - All results must be either immutable, copies or other Rock Solid components.
It is usually said that we should never return collections directly and that we should always use some kind of abstraction. Many people then use beautiful explanations like "we should program to interfaces, not to implementations", but they don't actually explain what's wrong with returning a list directly.
So, let's explain the problem: if you return a mutable list directly, anyone will be able to add, remove and replace items without any validation. So, users may be able to add null values, duplicated values and any other kind of value that is invalid for your component. This is already very problematic, but there are other problems, like: what will happen if you decide to change from a List<T> to a HashSet<T>? Changing the return type of a public method is a breaking change. This particular change (the return type from one class to another) is the reason to say "program to an interface, not to an implementation". List<T> and HashSet<T> are the implementations. IEnumerable<T>, for example, is an interface that both implement.
But the Rock Solid Quality is not the same as the "program to an interface" idea. In fact, returning a List<T> or HashSet<T> cast as IEnumerable<T> is still dangerous. Users can always cast received objects back to their original types and modify them, so they will be able to corrupt states. Also, for the Rock Solid Quality it is not important whether you program to an interface or to an implementation. You can return collections of a specific type as long as such collections do the right validations. That is, you can program to an implementation and still have Rock Solid Quality (we may not have the same overall quality, but the Rock Solid Quality is related to guaranteed states, not to abstractions).
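For example (the Component class and its list are hypothetical), the cast back is trivial:
using System.Collections.Generic;

public class Component
{
    private readonly List<int> _values = new List<int> { 1, 2, 3 };

    // Exposing the inner list as IEnumerable<T> does not protect it.
    public IEnumerable<int> Values { get { return _values; } }
}

public static class Attacker
{
    public static void CorruptState(Component component)
    {
        // The received object is still the original List<int>, so a cast exposes it.
        var inner = (List<int>)component.Values;
        inner.Clear(); // The component's internal state is now gone, with no validation at all.
    }
}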
Reducing the problem to what is really important to the Rock Solid Quality, the results must either (a short sketch follows this list):
- Be a copy. This may be inefficient for performance, but there is a guarantee that the object referenced internally will be fine even if the result is modified. I don't recommend this option because it may give the impression that you can modify the result to modify the original content, which is a problem, but that doesn't affect the Rock Solid Quality. Also, it is important that a deep copy is given, as returning a copy of a list where each item is a mutable reference is still a problem;
- Be immutable. In this case we have results of primitive types and strings (which are the most common results) and we can have our own immutable types. Most value types are immutable, even if that's not mandatory, yet value types are always copied, so they will usually fit into the previous item (but if a struct has a reference, then such a reference must be immutable or have Rock Solid Quality);
- Be Rock Solid too. If your result is a reference to another Rock Solid component that knows how to modify states respecting the rules of the actual component, then everything is fine. For example, you can return a modifiable ColumnsCollection if such a collection is able to do the right validations and use the right update process, even if in this case we are programming to an implementation (the ColumnsCollection) and not to an interface.
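A minimal sketch of the first two options (the Sensor class is hypothetical; note that the read-only wrapper is a live view over the inner list, not a snapshot):
using System.Collections.Generic;
using System.Collections.ObjectModel;

public sealed class Sensor
{
    private readonly List<double> _readings = new List<double>();

    public void AddReading(double value) { _readings.Add(value); }

    // Option 1: a copy. Callers can do whatever they want with it without touching our state.
    public List<double> GetReadingsCopy()
    {
        return new List<double>(_readings);
    }

    // Option 2: a read-only view. Any attempt to modify it throws.
    public ReadOnlyCollection<double> Readings
    {
        get { return _readings.AsReadOnly(); }
    }
}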
Abort Resistance
Even if the previous rules already make the components Rock Solid, there are still some things that may go wrong: thread aborts and asynchronous exceptions in general. For the moment, let's focus only on thread aborts. For those components that can only be accessed by the owner thread, that is usually not a problem: if their owner thread is aborted, no other thread will access them. That is, the thread may be aborted and leave those components with inconsistent states, but that will be a problem only if they have finalizers that use such states. If not, no one will ever see they were corrupted.
For those components that can be accessed by multiple threads, they may still end up in inconsistent states even using all the right locks. That's because a thread abort may happen at any IL instruction, so when setting two fields with already allocated values, for example, it is possible that only the first field gets set. It is possible to solve that situation by putting the code in finally blocks (as finally blocks are executed even when Abort() happens), but the truth is: you will never be completely safe against thread aborts.
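As an illustration of the finally trick (a sketch only; the PairHolder class is hypothetical and the locking needed for real thread-safety is left out for brevity):
using System;

public sealed class PairHolder
{
    private object _first;
    private object _second;

    public void SetBoth(object first, object second)
    {
        if (first == null)
            throw new ArgumentNullException("first");
        if (second == null)
            throw new ArgumentNullException("second");

        try { }
        finally
        {
            // In the .NET Framework, Thread.Abort does not interrupt code running
            // inside a finally block, so either both fields are set or none is.
            _first = first;
            _second = second;
        }
    }
}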
The reason, in this case, is related to the .NET libraries themselves. They are not Abort() safe, except for rare components, and so any code that depends on non-abort-safe libraries may fail when facing thread aborts.
If you really want to know more details about that I suggest these two articles that I wrote about the problem:
Using Keyword Is Not Abort Safe
AbortIfSafe
Other asynchronous exceptions
Thread aborts generate asynchronous exceptions and are the most common of their kind, but there are other asynchronous exceptions. In general, any method that you invoke may not be jitted yet and, to jit it, memory must be allocated (so an OutOfMemoryException may be thrown), or the call stack may be full and a StackOverflowException may be thrown.
There are ways to try to protect your code from them (CERs - Constrained Execution Regions - exist exactly for that), but I really don't consider that necessary to create Rock Solid components, and the reason is that they are extremely rare and, when they happen, the application will usually die immediately (in fact, I never faced such exceptions in real situations, only when forcing them to happen). And again, even if you protect your components against them, .NET itself is not protected from such exceptions.
Conclusion
Rock Solid Quality is a concept directly related to the internal states of the components, giving the guarantee that, once tested and bug-free, such components can be shared by many modules without the risk of one module jeopardizing the others by using the components incorrectly. Exceptions will only come from Rock Solid components if they are receiving invalid input parameters or if they really depend on an external factor (like the network) that is not working, but never because a previous call corrupted the component.
This concept is not a direct result of other principles and does not intend to replace them. We will not avoid the need for unit tests by using this concept, but if well applied it will make unit tests easier, as there is a guarantee that a given component will not respond incorrectly because it was accessed by another module or because a previous call gave invalid parameters. It will not help you in documenting the application or solving business issues, yet it will help you identify buggy future code by throwing exceptions immediately when invalid parameters are received.
Breaking your teeth with Rock Solid Spaghetti
I know that usually the conclusion comes at the end, but I decided to add this topic after the conclusion because it is not directly related to what the Rock Solid Quality is, but to the problems you may face if you try to apply it to already existing Spaghetti Code.
I know that many developers are afraid of touching any old Spaghetti Code. You find a bug, you correct it and, because of that correction, other bugs appear. Maybe such code was returning a negative value that conceptually should be positive, but there are places that are already subtracting such a negative value (to make it positive), so correcting such a method will, inevitably, make the code that worked with the wrong result stop working.
The same will happen if you start to apply the Rock Solid Quality principles to old Spaghetti Components. You add a validation and now the code that worked, but made the entire system unstable, will not work at all, and that's the decisive moment. When such things happen, managers go crazy; they say that the new version is a regression, with many more bugs than the previous one, and that all those modifications should be rolled back.
Well, I will not say that all managers are exactly like that, but that's a real risk. When transforming normal components into Rock Solid components, all the errors become more apparent. Instead of finishing an action and making the system unstable (which may not affect everyone immediately and can be solved by resetting the server or application), the action will not finish at all. Surely the server will not become unstable, but as the action never completes because of the new validation, there are big chances that someone will get pissed off.
The new errors will probably be very easy to solve, as they will be happening at the exact place where there is an invalid parameter in the call, but until the call is corrected, such an action is not finishing anymore. So the normal view is: before, things were working sometimes and failing sometimes. Now the "system never works".
Surely this is false. The entire system continues to work and the problematic action is not allowed to complete in order to protect the rest of the system, but that's not how the clients and the managers usually see it.
So, if you are the developer adding the new validations and you are facing such a problem, I should say that when doing this kind of modification you should make the possible problems clear, but you should also emphasize the advantages of receiving such exceptions, so all the errors can be corrected instead of living with that "it works sometimes, it doesn't work some other times".
I should say that in my experience there were many situations in which all the developers were "chasing a specific bug" while I was simply making the components more Rock Solid. In many of those situations I was able to find many errors (while doing my own tests) and correct them, effectively correcting the bug that everyone else was chasing, without really knowing which of those many errors was its real source. Yet, if some other place (almost never used) started to give errors, I was accused of being the one who "introduced a new bug". But at the end of the day I was always able to solve that "new bug" that was always there and to prove that "a single action never working" is better than "that action finishes and now the entire system is unstable".
A New Conclusion
So, to give a conclusion involving all the topics: the Rock Solid Quality is extremely useful, but like most principles it should be there from the start. Changing an existing project to apply new principles is always possible, but there will always be problematic moments. In this case I think that the best solution is good communication and, if possible, a good test team to try to test every situation that may fail with the changes. I still consider it extremely useful to apply the Rock Solid Quality, as any existing bug will become a constant failure at a specific place instead of a "sometimes it fails at random places". As with any principle, it's up to you to decide when to apply it and whether in your case it will bring any good or not.
Version History
- July, 3rd, 2013. Formatted some text and added the topic "Breaking your teeth with Rock Solid Spaghetti";
- July, 2nd, 2013. Initial version.