
Step 1. Start TDD, Step 2. ???, Step 3. $Profit$

5 Jun 2013 · CPOL · 23 min read
Use cost of market delay and your learning costs to figure out whether TDD, or its alternatives, makes sense for you

The magnitude of learning Test Driven Development (TDD) is nothing to be sneezed at. This is largely why the TDD debate is so polarized. The really experienced TDDers learned the techniques long ago and now think they are completely obvious. The inexperienced developers are overwhelmed by the amount they need to learn. And many of the foolishly brave ones who try to use TDD-style techniques end up feeling burned. They cry into their beer, branding the technique Tedium Driven Development. Nonetheless, a solid test harness enforces clear separation of concerns, constantly gives feedback on whether your code works as expected, and leaves you with code that is cheaper to maintain. Huh?

At one end of the spectrum, automated testing can make it very difficult to introduce changes into your code. This is true regardless of whether you are adding a new feature or changing an existing one. For example, if you change the method signature of a very popular class or interface, there are ripple effects across your entire code base. At the other end, if you make your code too flexible, there will be massive amounts of extra code, just to provide you with greater "flexibility", most of which you won't use anyway. In either case, it's helpful to know how the software's users will expect to interact with it.

Moreover, serious problems can happen even to extremely talented developers. Properly done TDD, i.e. with automated acceptance tests, requires solid non-technical domain knowledge. You need to know what tests are sensible to write. Typically, these requirements are utterly non-technical. If you don't understand "the business", you risk tying a noose around your own neck, the one from which you will hang. So basically, TDD implicitly requires close work with domain experts, who may be unavailable or uninterested. When you do get access to them, you don't want to be struggling with the concepts.

From a business point of view, TDD is problematic because it potentially costs time. For example, in the case of a startup, there is a very real opportunity cost of releasing late. You risk making fewer sales if your competition beats you to market. Even if they have more bugs, they will already be earning cash. In most markets, according to Ries and Trout's book The 22 Immutable Laws of Marketing, there is space for three players in your users' minds. How many brands of toothpaste can you come up with, without checking a store? More poignantly, do you know who the second person to cross the Atlantic was, without checking Google or Wikipedia? (Hint: it wasn't Charles Lindbergh or Amelia Earhart.) You are up against the limits of human perception and memory, at least within your niche.

In addition to the cost of delay, there are the obvious costs related to having and housing a development team: developers, testers, computers, office space. Even if it's all virtual, there is still an actual cost. Once you have a clear sense of the value of time, you can evaluate your approach to TDD.

Starting From Scratch

On one hand, you need to learn the relevant patterns and technologies, and then invest the time to apply them. Starting from a clean slate, you need a functional understanding of a number of techniques and technologies in order to do TDD effectively. If your whole team is experienced, this may not be a major concern; however, if you are starting completely from scratch... You don't need to go all the way and be a TDD purist; you may get enough practical mileage from just a subset of the following:

  • Interfaces
  • Dependency Injection and Inversion of Control (IoC)
  • Stubs and Mocks
  • Unit Testing
  • Automated Acceptance Testing

To be honest, it took some time for me to grasp all of the above concepts well enough to be productive with them as a developer. I quickly realized that I needed to understand the mechanics of writing unit tests, dependency injection, and mocking before I even had a chance at going after TDD. Once I understood these concepts, I had to figure out what they actually meant within Visual Studio, which wasn't always as clear as I would have hoped, especially for the particular code in C# and C++ I was working with. This meant finding the right tools, learning how they worked, followed by actually writing a sensible test.

All of these patterns are key parts of automated testing and, as a result, they are prerequisites for effective TDD when you actually do it. Unfortunately, even if you "just" want to do "spec-first development", you need to understand all of the above!

If you manage to cover all of this ground, you will reap a number of significant rewards. TDD may save you massive amounts of time, once your code base becomes large and complex enough. XProgramming.com explains: "During the life of a project an automated test can save you a hundred times the cost to create it by finding and guarding against bugs. The harder the test is to write the more you need it because the greater your savings will be. Automated unit tests offer a pay back far greater than the cost of creation." This effect is empirically visible in the studies done by Capers Jones, whose dataset consists of thousands of software projects. Despite both being "agile", Extreme Programming (XP) clearly outperformed Scrum on projects of up to 1000 function points, and Jones considers TDD the key difference.

There is more good news. In addition to being prerequisites, the techniques above are also alternatives to TDD you can employ on their own. It doesn't make sense to imprison yourself with perfectionist visions of a test suite that prevents you from ever checking in a regression, one that pours "holy water" over your program. It's beneficial to learn and apply these prerequisites on their own. Each one is useful in solving a certain class of problems. Once you understand how each one works, you will be able to apply it to solve immediate problems you actually have.

Doing Nothing At All, Nada, Zip

Ok, I'm being a bit facetious, but it is true. Sometimes you are doing a prototype, just to learn something. A programming class comes to mind. Sometimes, you are just creating a tool for your own purposes, such as a one-off script to automate something tedious. Sometimes, you don't need to care about bugs at all; no one will call you a dork for having a bug, for example, since no one will see what you made.

More realistically, if you are spiking a technical solution to a problem, you simply want to know if a few technologies and techniques work together. That "info nugget" could have massive value to you. Ward Cunningham describes spiking on the C2.com wiki: “I would often ask Kent [Beck], ‘What is the simplest thing we can program that will convince us we are on the right track?’ Such stepping outside the difficulties at hand often led us to simpler and more compelling solutions. Kent dubbed this a Spike. I found the practice particularly useful while maintaining large frameworks.” Writing automated tests would just make creating the spike take longer, particularly if you don't think you are going to use the spike as part of your production code.

Before you even consider your efficiency, consider your effectiveness. To borrow Mary Poppendieck's phrase, first "build the right thing", then "build the thing right". From a purely technical perspective, TDD-based techniques may get in the way of that first step.

Interfaces

The biggest bang-for-the-buck approach, without really changing how you work, is to use more interfaces. What, you aren't already using them? Interfaces separate the concept you understand from any particular implementation. You already think this way naturally; interfaces help your code behave the same way.

The concept of, say, a table is independent of any particular instance of a table. Regardless of the instance, there is a fixed set of uses for a table; you would expect every table to support certain features. You can put something on a table, e.g. a cup or a plate. A table can gather people around it, for a morning breakfast for example, and thus instigate a discussion. Certain specialized tables can be used as a desk, but that does not invalidate how a table can be used. To some extent, regardless of whether you use the word "table" in English, "tabla" in Spanish, or "Biǎo" in Chinese, the speakers of that language would have the same assumptions about how a table is used and, to a lesser extent, about its inherent characteristics.

In short, an abstract concept underlies any instance of an object, so it behooves you to organize your code into sensible abstractions. It will be much easier for you to change classes as you build out functionality.
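
To make that concrete, here is a rough sketch in C# (the names are mine, purely illustrative): the interface captures the concept of a table, and any number of concrete tables can implement it.

using System.Collections.Generic;

// Hypothetical example: the concept of a table, independent of any particular table.
public interface ITable
{
    void PlaceItem(string item);   // a cup, a plate...
    void SeatPeople(int count);    // gather people around it for breakfast
}

// One concrete table. A StandingDesk or a PicnicTable could implement the same
// contract, and code written against ITable would not need to change.
public class KitchenTable : ITable
{
    private readonly List<string> _items = new List<string>();
    private int _seatedPeople;

    public void PlaceItem(string item) { _items.Add(item); }
    public void SeatPeople(int count) { _seatedPeople = count; }
}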

Moreover, even back-fitting interfaces onto existing code provides an additional architectural benefit: you prevent unintended side effects. The aggressive act of introducing an interface into existing code forces the code to be self-contained. The interface is a conceptual scoping tool. In the same way that a variable can be scoped locally to a method, a class, or a namespace, an interface acts like a domain-specific scope. It forces specific, conceptually related components to work together, without interacting with other code. This seriously reduces code complexity, because the code will map to your intuitive understanding of the problem.

Using too many interfaces, though, can easily lead to an explosion in the number of classes. For example, in C#, if you create an interface for each class, you will increase your code base by 50%, without adding any functionality at all. In large scale systems, this adds up to a lot of extra lines of code.

To get real benefits from this technique, add interfaces where they serve a function, or where they lie on an important boundary. For example, having an interface for sending messages, with a single "send" method, completely breaks your dependency on any particular messaging technology. It also makes it easier to add a new message bus to your code in the future. Having interfaces at the boundaries of your components means that your code is more modular, self-contained, and easy to modify.
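
A minimal sketch of such a boundary interface (hypothetical names, not from any particular library):

using System;

// A single boundary interface hides the messaging technology entirely.
public interface IMessageSender
{
    void Send(string destination, string payload);
}

// Swapping one message bus for another, or for an in-memory fake in tests,
// only means writing another implementation of IMessageSender.
public class ConsoleMessageSender : IMessageSender
{
    public void Send(string destination, string payload)
    {
        Console.WriteLine("[{0}] {1}", destination, payload);
    }
}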

Inversion of Control and Dependency Injection

Programming is a focused form of problem-solving via learning. Inevitably, you will know less about your problem before you have implemented a solution than after that solution goes into production. Using interfaces will make it easier to try multiple alternatives.

Once you grok how interfaces themselves work, dependency injection and inversion of control (IoC) are a great way to use them. They are not easy to grasp the first time around, but worth studying in detail. While IoC describes a general relationship in terms of contracts, dependency injection specifically uses interfaces as a type of code-based contract.

Ragu Pattabi explains inversion of control this way:

If you follow these simple two steps, you have achieved inversion of control:
1. Separate what-to-do part from when-to-do part.
2. Ensure that when part knows as little as possible about what part; and vice versa.

With IoC, not only is there conceptual separation between code and implementation, but you also decouple your code further...based on time, i.e. when, and assumptions about the preexisting conditions for your code to be called.

In a small code base, this might seem to make the code more complicated. In a large system, it's a project saver. Each component of a system, i.e. the what-to-do part, is very clearly defined and self contained. It's completely separate from the when-to-do part; for example, the bootstrap loading resources at the start of a program. Then, it's much easier to snap together the individual components, like Lego blocks. It is easier to think about your code, as it maps to your understanding of the interactions over time.
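
Here is a small illustrative sketch of that separation (all names are hypothetical): the "what" lives in small task classes, and the "when" lives in a bootstrapper that decides order and timing.

using System.Collections.Generic;

// The "what-to-do" part: each task knows nothing about when, or in what order, it runs.
public interface IStartupTask
{
    void Run();
}

public class LoadConfiguration : IStartupTask { public void Run() { /* read settings */ } }
public class WarmUpCache : IStartupTask { public void Run() { /* prime the cache */ } }

// The "when-to-do" part: the bootstrapper owns timing and ordering, and knows the
// tasks only through their contract.
public class Bootstrapper
{
    private readonly IEnumerable<IStartupTask> _tasks;

    public Bootstrapper(IEnumerable<IStartupTask> tasks)
    {
        _tasks = tasks;
    }

    public void Start()
    {
        foreach (IStartupTask task in _tasks)
        {
            task.Run();
        }
    }
}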

What do you get from IoC?

  • decoupling of run time implementations
  • avoiding "plumbing code" that causes coupling
  • components only care about contracts, and don't need to make any more assumptions
  • no side effects on other modules

This ability to get rid of many assumptions is extremely powerful. It simplifies your code, as it gets rid of interdependencies. Often there are hidden assumptions in code; IoC guarantees that all assumptions are at the interface or contract level only. Priceless.

Dependency injection is a common pattern to help you achieve IoC, specifically using interfaces. Externals are passed into a class as interfaces, so that the class only uses the interface, and the actual code being executed can be determined at runtime.

There are three common approaches to DI:

  1. passing interfaces at the constructor level: on initialization, you pass in any object which implements a specific interface. Within the class, only the interface methods are used.
  2. passing interfaces at the getter/setter level: like the constructor level, although the implementation can be passed in at any moment during the object's life. While more flexible, it's also slightly less predictable.
  3. using a specialized container which resolves dependencies: given an interface, the container returns a specific implementation of that interface.

DI is a tool to separate out dependencies explicitly, so that they can be added as needed, at run time. DI also makes it easy to inject test stubs or mocks when testing class functionality. It forces you to create classes with high cohesion and low coupling. What should be together is together. What should be able to vary independently is separate.
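
As a minimal sketch of the first approach, constructor injection, reusing the hypothetical IMessageSender interface from the earlier sketch:

// The class depends only on the contract; the concrete sender is chosen by the
// caller, by an IoC container, or by a test.
public class OrderProcessor
{
    private readonly IMessageSender _sender;

    // Constructor injection: any IMessageSender will do.
    public OrderProcessor(IMessageSender sender)
    {
        _sender = sender;
    }

    public void Confirm(int orderId)
    {
        _sender.Send("orders", string.Format("Order {0} confirmed", orderId));
    }
}

// Production wiring:
//   var processor = new OrderProcessor(new ConsoleMessageSender());
// Test wiring: pass in a stub or mock instead.

An IoC container such as Castle Windsor or Unity automates the third approach, resolving OrderProcessor's constructor arguments for you based on a one-time registration of interface-to-implementation mappings.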

Pretending With Stubs and Mocks (Nanny, nanny, boo, boo)

Once you can inject interfaces into your object, you suddenly have the ability to change objects and interactions quickly. Immediately the question arises: what do you change them to? In order to understand your code and test it effectively, you want to build a "scientific" model in the form of a test harness. Such a model helps you hold everything constant, such as:

  • other classes
  • external assemblies
  • inputs

so that you can isolate each individual method, or some part of your component. From the point of view of a hypothesis test, the only thing that matters is how the object looks and acts from the perspective of its dependencies. The object's privates are up to you.

Test doubles are typically the most effective way to do this. A form of the GoF proxy pattern, test doubles help you pretend the object being tested is in production conditions, without it actually being in production conditions. Effectively, test doubles help you conduct a thought experiment. They are a tool to help your computer "imagine", to work through hypothetical situations, in order to simulate scenarios which you expect will occur.

In case you aren't sure: you only use test doubles to tease out specific behavior from the "real" class you are testing. Surprisingly, the online documentation I found assumed this was obvious. There is no point in testing the test doubles themselves, unless you are writing a testing framework, of course.

There are many test double types which will do the trick, with a wide variety of weird and wonderful names: Dummy vs. Stub vs. Spy vs. Fake vs. Mock. In practice, stubs and mocks cover most of the typical scenarios you will encounter.

Stubs are the workhorse of automated testing. When you can't control the indirect inputs of a test, stubs stand in for the real dependencies of the object you are testing, and they are far easier to instantiate. A stub provides canned answers when the dependency is called, without requiring you to instantiate the dependency's own dependencies. The upshot? You can create a "bubble" around your object, and interact with it very precisely using stubs, in order to confirm that it does what you want in each scenario.

For example, if your object determines you have a lottery ticket worth over $1,000,000, it should print "Congratulations" to standard output. You would use a stub which implements an interface shared with a lottery ticket. In order to test whether your object has good manners and congratulates you, this stub would simulate giving you a ticket worth $1,000,001. In that case, your object needs to give its congratulations. A stub makes it easy to create such a simulated ticket.
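
A hand-rolled stub for that scenario might look something like this (all names are hypothetical):

public interface ILotteryTicket
{
    decimal Value { get; }
}

// Stub: a canned ticket worth whatever the test needs, with no real lottery behind it.
public class StubLotteryTicket : ILotteryTicket
{
    private readonly decimal _value;

    public StubLotteryTicket(decimal value) { _value = value; }

    public decimal Value { get { return _value; } }
}

// The object under test only ever sees the interface.
public class TicketChecker
{
    public string Check(ILotteryTicket ticket)
    {
        return ticket.Value > 1000000m ? "Congratulations" : "Better luck next time";
    }
}

// In a test:
//   var result = new TicketChecker().Check(new StubLotteryTicket(1000001m));
//   Assert.AreEqual("Congratulations", result);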

Mocks are a little more sophisticated, helping you make your assumptions explicit about how the object behaves. Mocks help you ensure that your object calls the methods of its dependencies. Instead of looking at input data, mocks help you determine whether a specific dependency's method is called at runtime.

If your object tells a dependency to quack, it really should quack.

Martin Fowler's article on stubs and mocks was helpful when I was getting started, even though his examples are in Java. Nobody's perfect. Martin recommends avoiding more than one mock object per method under test; otherwise, you are testing multiple interactions in one unit test. If necessary, you can use any number of stubs in order to isolate your class and the specific method.

A good mocking framework helps you generate stubs and mocks from interfaces that you define. This helps you avoid coding up masses of throwaway classes, just for the purpose of isolating code. You also don't need to worry about dependencies. You can wire up your classes quickly, and all you need are interfaces.
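
As a sketch of how a framework tightens this up, here are the same hypothetical types exercised with Moq and NUnit (my tool choices for illustration, not ones this article prescribes):

using Moq;
using NUnit.Framework;

[TestFixture]
public class MockingFrameworkExamples
{
    [Test]
    public void Confirm_SendsOneMessageToOrdersQueue()
    {
        // Mock: we care that the dependency's method gets called.
        var sender = new Mock<IMessageSender>();
        var processor = new OrderProcessor(sender.Object);

        processor.Confirm(42);

        sender.Verify(s => s.Send("orders", It.IsAny<string>()), Times.Once());
    }

    [Test]
    public void Check_JackpotTicket_Congratulates()
    {
        // Stub: we only need a canned answer from the dependency.
        var ticket = new Mock<ILotteryTicket>();
        ticket.Setup(t => t.Value).Returns(1000001m);

        Assert.AreEqual("Congratulations", new TicketChecker().Check(ticket.Object));
    }
}

Note that each test uses at most one mock, plus as many stubs as needed, exactly in the spirit of Fowler's advice above.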

Unit Testing

Finally, we are at the stage where unit testing makes sense, as you have the pieces in place. Roy Osherove's book The Art of Unit Testing was very helpful here. At any given moment, you write a unit test in order to check that a specific method behaves according to your expectations. Because your class is instantiated with interfaces at its boundaries, you can easily create stubs and mocks from those interfaces. Cha ching!

In unit testing, the fundamental atomic element is the method under test (MUT). Ideally, each test confirms only one aspect of one method within one class. If each test is named appropriately, you know immediately which tests are failing. Try to test one logical path through the code, to as fine a granularity as makes practical sense. Once you have enough tests, you can show that each method works as expected, just by running your tests.

When writing unit tests, ideally you should:
1. start with "happy cases" or tests of the intended functionality,
2. then boundary cases, followed by
3. the smelly cases, i.e. reported bugs
Usually, with unit tests, creating the happy cases is enough to start with; the others can easily be added once you are convinced you need them, because your architecture will be flexible enough to make that easy.
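
A sketch of that ordering, with a hypothetical method under test and the MethodUnderTest_Scenario_ExpectedBehaviour naming convention (NUnit shown here, but any test framework will do):

using NUnit.Framework;

public static class Shipping
{
    // Method under test: orders of 100 or more ship free.
    public static decimal Cost(decimal orderTotal)
    {
        return orderTotal >= 100m ? 0m : 4.99m;
    }
}

[TestFixture]
public class ShippingTests
{
    [Test]  // 1. Happy case: the intended functionality.
    public void Cost_LargeOrder_ShipsFree()
    {
        Assert.AreEqual(0m, Shipping.Cost(250m));
    }

    [Test]  // 2. Boundary case: exactly at the threshold.
    public void Cost_OrderExactlyAtThreshold_ShipsFree()
    {
        Assert.AreEqual(0m, Shipping.Cost(100m));
    }

    [Test]  // 3. Smelly case: a reported bug, pinned down so it cannot quietly return.
    public void Cost_ZeroValueOrder_ChargesStandardRate()
    {
        Assert.AreEqual(4.99m, Shipping.Cost(0m));
    }
}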

By creating automated unit tests, you can be relatively sure that:
  • the functionality of your methods doesn't accidentally change
  • the class continues to do everything you expect if it passes your tests after a refactoring
  • interactions amongst classes are clear

Often they will help you find any problems with your code early, even before you give it to someone else to look at. You won't need to use the debugger. The test is also a software contract, as it immediately tells you if the code stopped working as specified. To some extent, it aids design. It specifies how the solution will look, without going into implementation details. It's easier for you to focus on the simplest possible way to address the spec.

Automated Acceptance Testing

Unit testing is really niggly, though. It's down in the details, which means you risk missing something big and obvious, yet typically, big and obvious is what your customers care about the most. Big and obvious tends to be what they actually see. Customers expect features to work together, without knowing how each part hangs together with another. They want to drive their car, without having to adjust engine parts in order to get something done.

Enter automated acceptance testing. This is my catch-all term for a number of approaches and techniques, like behavior driven development (BDD), integration testing, or customer-facing tests. They help you confirm that your customer will still be happy, as you haven't squashed his pet feature with a change you just checked in. They're tests which try to capture the essence of the customer's needs. As a result, they tend to operate at a much higher level than a unit, pulling together lots of related classes in order to check that they work as expected.

Most types of automated acceptance testing take dependencies as given. If a particular object requires a dependency, it simply instantiates it; the test doesn't mess around with trying to decouple things. This is both its greatest strength and its greatest weakness. It's generally easier to work at this level, because the tests confirm that a particular feature works the way your customer expects it to work. That confirmation has a lot of business value. At the same time, it tempts you to keep putting off decoupling your code. The process of decoupling is painful and time consuming, yet if you put it off too long, your code will be a mess. It will be difficult to work with. It will be difficult to change.

In contrast, properly implemented unit tests, i.e. with dependency injection, slice up your code into fully isolated methods and classes. All changes remain extremely local. This is true regardless of whether you write the tests before or after the code exists. If you have enough unit test coverage, changes are very easy to implement. The tests confirm you haven't broken any other existing functionality. They also make your estimates much more accurate and reliable. You don't even need to use the debugger. Automated acceptance tests don't keep you as honest.

Automated acceptance testing can still be really useful when refactoring. You prevent "accidental regressions" which are important to the customer, which actually happen pretty frequently on longer projects juggling many features. Often potential regressions will happen many times, even before your initial beta release, as you work on your software. If you know you just broke something, you can fix it, without bothering anyone else.

As you discover new requirements, or "expected customer expectations", you can write acceptance tests to confirm whether or not your code actually meets them. As you add features, you necessarily experience a combinatorial explosion of possible paths. Having automated acceptance tests can be quite helpful in making sure you have the basics covered. You can focus on writing a good algorithm, because a large number of simple tests will let you know whether you've satisfied them.

They serve as a safety net as you experiment. For example, thanks to good acceptance tests, you don't need to constantly retest manually that your approach to overwriting a renamed file doesn't break the scenario of simply writing out a new, non-existent file. Each logical path would have a few atomic tests, giving you immediate feedback as you try things out.

My favorite tool for this is FitNesse. Tables on a wiki serve as examples. These are then hooked into test harnesses, which instantiate a clump of classes and confirm that they do what you expect. Because everything is on a wiki, everyone on the team can contribute to building out the test suite: the business analysts, the developers, and the testers. It makes it easy to discuss the problems you are solving, in order to get a precise understanding of the problem. Because the developers participate in that discussion, they are much more likely to produce code which solves the right problem.
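
As a rough, heavily hypothetical sketch of how that wiring tends to look (the table and fixture are mine, they reuse the Shipping class from the unit-testing sketch, and the exact conventions depend on whether you run classic Fit or Slim via fitSharp):

// A wiki decision table such as:
//
//   |shipping cost|
//   |order total|cost?|
//   |250        |0    |
//   |100        |0    |
//   |99.99      |4.99 |
//
// can be backed by a plain fixture class: input columns map to setters,
// and the "cost?" output column maps to the Cost() method.
public class ShippingCost
{
    public decimal OrderTotal { get; set; }

    public decimal Cost()
    {
        return Shipping.Cost(OrderTotal);
    }
}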

Misunderstood requirements are a major form of waste on many software projects, particularly with significant requirements churn, say, over 50%. Moreover, they can often have a big negative impact on a project. Scott Ambler summarizes: "The costs to recover from a problem can be substantial if the error, which could very well be the result of a misunderstood requirement, corrupts a large amount of data. Or in the case of commercial software, or at least “customer facing” software that is used by the customers of your organization, the public humiliation of faulty software could be substantial (customers no longer trust you for example)." As a result, reducing the probability of such an error increases the chances that the project will actually give the customer what he wants.

Why bother, then?

While this list was presented somewhat linearly, in fact you can use any combination of the above, and still reap some benefits as you go. In some cases, just introducing interfaces will make your code more maintainable, so you don't really need to go all the way. Having your concerns separated may be good enough, since you can be sure that there won't be side effects on either side of an interface. In other cases, only a full battery of unit and acceptance tests will be good enough, particularly for the core logic of a mission critical system. For example, in air traffic control software, where people's lives are at stake, a bug is just not acceptable, particularly since these types of techniques can help you prevent most of them...certainly the big and ugly ones which creep in because of neglect. In this case, you might even want to use formal methods to prove that it's not possible to have a bug in the calculation of a critical value.

A lot of useful technical detail gets lost in the ideological warfare about whether or not TDD is the best approach to use on a specific project. Sometimes it is. Sometimes it isn't, and something more lightweight is. It's worth teasing out which of these will work best for you, yet ultimately, your decision will depend on whether you think it will save or cost you time. It should also take into consideration when your product starts needing those tests.

In my opinion, TDD is so controversial because individual team members have drastically different assumptions about time. Product consultant Don Reinertsen popularizes the cost of delay as a highly relevant metric when launching new (read: greenfield) products, both in large and small companies. At the start of his in-house training sessions, within most product teams, the range of initial cost of delay estimates spans about two orders of magnitude. Each time it's the same company, the same product, the same target market, the same technologies, and the same people. Nonetheless, this lack of agreement is consistent across companies. Presumably, no one even bothers bringing it up, so everyone just assumes that everyone else has the same assumptions.

As a software developer, you are expected to provide extremely accurate estimates of your own work to "the business", sometimes at the risk of sacrificing your first-born child if you are wrong. Typically, though, we don't even ask "the business" for a cost of delay estimate. While it might require some basic accounting and marketing skills to produce such a number, you don't really need to know how it's made in order to take advantage of it. In the same way, your product owner or project sponsor doesn't need to know why your development effort estimates are big or small in order to take advantage of them. Even if the cost of delay figure isn't exactly accurate, it's likely to be more accurate than any value in your team's current range (which, if you recall, may be 100x too large or too small).

Until your team has a working agreement on how to value time to market, or at least an externally provided estimate of the cost of delay, you can't possibly determine whether you can afford to do TDD, or one of its "younger cousin" prerequisites. Once you do have a clear picture of the total cost of delay, you can figure out whether the long-term savings, and the bug prevention from test driven development, will actually help your product make you money.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)