Introduction
I want to provide thorough information for the everyday coder - without the
"I want to sell you something so I have to look extra smart" obfuscation layer.
These are my personal views, acquired by analyzing my own development
"challenges", browsing the web, discussing it at CP and elsewhere, and trying it
myself. It won't be a "brief introduction", so here's an overview in case you
want to skip something:
We all know this: Your project has a neat design you're really proud
of. You did care for all eventualities that came up in the early design studies,
the schedule is approved, there's even an extra week "padding for the
unexpected" - and you are happy you can finally start coding. Two weeks
into it, the first change requests arrive. Nothing special, just the usual "can
we do this, too?" - "Yes, no big problem, we just need to plug a Carbunkulator
into the Arglebargle".
Halfway through it, things look less shiny. A few more functionality tweaks,
a few bugs, your best coder one week in the hospital - the schedule lags behind
big time. Your boss returns from a talk with a client, after they played around
with the first beta. It turns out they never really needed an Arglebargle, it is
just in the spec because their old system had a big one that was very expensive.
What they really need is a big gonkulator, and it must be fast - much faster
than now. Oh, and the one feature that gave you headaches while designing - you
can scrap that: the one guy who insisted on this feature (although no one
understood why) moved on to greener pastures.
Whatever the reasons - the application ends up different from what it was
envisioned. Chances are, it's a mess of kludges and shortcuts across a baroque,
utterly inefficient infrastructure. You might even become afraid of touching it -
'cause a little change here breaks something there. Every time you try to fix
some nasty behavior, you have to wade through tons of interdependent code, and
every function, every class you see screams "rewrite me". Far from what you
wanted.
Interestingly, you can arrive at the same place by leaving out the
formal design process altogether: You have an idea, a rough plan how you can
make it, and start coding. It starts well, but after some time, it gets tricky:
an important library refuses to do what you expect, some things didn't work out
as you thought, you're forced to hold much more distributed state information
than you can juggle in your head.
The whole thing turns out a bit fragile,
and although it mostly does what you want it to, it's a pain to use. Just as
it's brittle to the user, the code feels brittle to you. Probably no one will be
able or willing to continue working on it, and you're reluctant to change
anything yourself because, once you start to weed out the kludges, you wish you
had the strength to start over again.
What went wrong? In the first scenario, the design (likely perfect for
the initial requirements) did not live up to the changes that are inevitable in
the course of a project. In the second, a reasonable design failed to
evolve.
The solutions I discuss here are aimed at the course of the project, to help
you avoid situations like this. Once you are stuck with a huge unmaintainable
code base, it's much harder to stay on the success track (or get back on it
again). At least, even when you feel you're stuck, many of the techniques here
can help you not to give up on the way - neither economically nor stress-wise.
Agile Programming (AP) is a collection of principles and techniques that try to overcome the
inflexibility of the strictly-design-based development cycle. Three things make
AP very powerful:
- You are not required to model the entire development process after AP (of
course you can). You can change project management slowly and incrementally, and
you need to adopt only what really helps you.
- The Agile process does not require extreme excellence at design or
development - rather, it's aimed at the average team with some
experience.
- The techniques are simple, so simple that most old-timers consider them
"common knowledge" - if only they were!
Here is what I understand as the core rules:
- Simple Design: Use the simplest design that solves your immediate needs.
- Design as you go: Always scrub and exercise the code you work on while the
project develops, to make sure it remains well structured, designed and written.
(Techniques for this are called Refactoring.)
- Incremental steps: When changing or adding code, take the smallest step you
can, then compile and test again.
- Independent steps: Don't mix up the things you do - when you fix a bug, fix
the bug, when you add a feature, add the feature.
- Know and use your tools with purpose: Especially for tasks beyond writing
code - like design and documentation, know the available tools, use those that
help you (not just a single one), and always understand why you do what you do.
- The Meta Rule: Use only the principles and techniques that actually work for
you.
Simple Design and Design as you go
I consider this the very heart of the AP approach - and the one with the
biggest potential to change the development process.
Instead of planning ahead for all nooks and crannies, make sure "Version 0.1"
works out well. Concentrate on your next task, and pick the most simple design
that makes it possible. This does not mean forget about design! Design
remains an important part of the entire process, and classic good/bad rules
still apply. The additional rule is: make your next step happen, not the
10th. Don't go far out of your way for something you think you need
later. When you really need it, new possibilities will have opened, and
priorities sure will have changed.
To keep the design evolving with the project, you always need to pay
attention to the code base. With only some primitive techniques and trust in
your instincts, you can get along very well for most of the time - so you're
less burdened when you have to face the real challenges. Exercising the part
you're working on means: over time the "hot spots" of your application get most
attention automatically.
Although individual things, like renaming
variables, might appear silly by themselves, the cumulative effect is impressive.
It's wonderful when the feeling of understanding your code base kicks in - don't
miss it!
The initial design will have a great impact on your project as well (although
you typically end up more flexible than with a strict design based approach).
But don't worry too much: Different designs can support the same product, a
simple one will give you something to work with, and refactoring will make sure
your design grows with the application.
If the analogy is allowed: Agile Modeling replaces the intention of a
"perfect creation" with an evolutionary process: Although seven neck vertebrae
can't be the perfect design for both the mouse and the giraffe, they do their job
very well in both cases.
Advantages
- You can react much more flexibly to requirement changes and additions
- The overall design remains simple almost "by itself" - baroque arabesques are
usually rooted out very early, before they grow big
- By scrubbing the code you're working on, the most important parts get most
attention, and you don't invest extra time into changing what doesn't need to be
changed.
- When cleaning up your code is technically part of the development process,
you have much better chances to end up with a well-commented, documented,
orthogonal and readable code base
- You might be able to start coding earlier (although you won't necessarily be
faster overall)
- You won't end up in dead ends
A good designer/developer can achieve the same with a "less agile" approach.
But chances are, a wizard will get very close to the agile approach himself, if
you let him do as he pleases. And we non-wizards are all susceptible to the
stress and strains of development, and forget to follow idolized "good practice"
in those dreaded all-nighters.
Incremental, independent Changes
Incremental changes are the key to happiness, and the core idea of
refactoring. However, I want to separate the principle from the techniques,
that's why it gets its own section. To repeat the two rules:
Take the smallest step possible in the direction you want to go. Then
compile and do a basic test that it's still working. And always do only one step
- don't try to sneak in a feature while you refactor - tempting as it may be.
For me, these rules still require some discipline, and a conscious effort.
Sometimes it just seems easiest to scrap a class, and write it anew. Yet, when I
get interrupted, it's much easier to say "five minutes" - and finish the
search&replace at hand; or jot down a quick note what I was doing. When I
return to my desk, I can continue without looking back and forth for where I
left off, and without the fear of forgetting something.
Advantage: You always have working code you can deliver. Don't take
this literally and skip QA - but in case of emergency (e.g. a bug at a customer
site) you're much more ready to leave your current task in a working
condition. In-house testing can get a new version anytime. You are quicker
to react to new requirements: No more "I need to finish the Gonkulator rewrite
before I can add this graphic feature that everybody suddenly seems to need
urgently."
Also, your code passes much more often through the compiler, and a basic
"does it work?" test - especially so if you do Automated Unit Testing. This
gives a bit more confidence in complex code, and can be a real life saver.
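A basic "does it work?" test does not need a framework. The following is a minimal sketch using only the standard assert macro; Gonkulate and TestGonkulate are hypothetical names that stand in for whatever function you just changed and its quick check.

```cpp
#include <cassert>
#include <string>

// Hypothetical function under test; stands in for the code you just touched.
std::string Gonkulate(const std::string& input)
{
    return "[" + input + "]";
}

// Quick smoke test: call it after every small step.
// A failing assert aborts the program at the first broken expectation.
void TestGonkulate()
{
    assert(Gonkulate("") == "[]");
    assert(Gonkulate("abc") == "[abc]");
}
```

Calling TestGonkulate() at program start in debug builds is one simple way to run it on every compile-and-run cycle. Note that assert is compiled out when NDEBUG is defined, so this kind of check only runs in builds without that define.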
There's one human reason behind this rule: Only a limited amount of state
information is present in your mind (the often-mentioned "seven things").
Conscious splitting into steps with the least state information tries to save
you from a "short term memory overflow", which makes you forget things you
wanted to do, and feel overwhelmed by the complexity of the code. And there is a
Murphy reason: Every step you take will be a little bit more complex, have a few
more dependencies and side effects than you expected. E.g. when rewriting two
classes into one, a problem with header inclusion order can sidetrack you so far
that you just forget to initialize an important variable again.
Know your Tools, and know your reasons
Besides writing code, many things belong to the development process: Design,
Documentation, QA...
The first question should be: Why do I do that? The importance of these
artifacts is well known, and so is a rich number of techniques and methods for
them, which all too often claim or at least suggest to be exclusive. But, to take
an example: why do you actually document your code? Do you still want to
understand your code in 6 months? Should a 3rd party be able to write plugins based on your
API? Is it to inform co-workers of changes in the interface or implementation
specifics? Is it because you plan to retire to the Bahamas, so the code base
needs to be passed on to a still-to-be-hired guy? These are quite different
goals, and for each of them, different techniques are appropriate.
In the example, documentation comes in many flavors. UML charts, formal code
comments that can be extracted by a parser, inline comments, a separate Word
file describing your intentions, Source Control change logs, etc. You are much
better off when you understand and use more than one tool. Look out for new
tools, and don't forget about unused features of the tools you have.
Advantage: The time spent on non-coding tasks is used more effectively,
and doesn't feel wasted. Again: It's just to make you happy!
Refactoring is nothing magic, refactoring is a fancy word for cleaning
up the code. A more formal definition would be:
Refactoring means continuously improving the design and appearance of
your code base in small steps confined to surveyable
areas.
All techniques are allowed that:
- improve code quality, readability, design
- are simple, or even "dumb" (such as automatic search-and-replace of a
variable name)
- are small, and independent steps
Refactoring has two major uses: first, to keep an application well designed,
to enable the "design as you go" principle. Second, instead of rewriting a
larger module or class, you can refactor it into something much better. This
requires much more discipline (controlling one's enthusiasm to make it
better) than a rewrite, but is often the less risky yet more rewarding
route.
I split the discussion into two parts - a formal list of basic techniques, and
a real life example that contains suggestions for less automatable ones.
Basic Refactoring Techniques:
- Rename a variable / type / class / function
Every developer or team has its coding standards - usually both formally
defined and informal. Apply them to your code! If you have a function ReadData,
and a complementary function DataWrite - rename one, so they are consistent. If
you have a member that misses the m_ prefix everyone else uses, spend the one
or two minutes to change that. When you plan to change multiple identifiers,
change only one at each step, then compile and run. Use unique names - so when
you forget to rename one place, the compiler catches it. (Oh, if the
function is in an interface declaration shared by all modules of your
10-developer project, ask your co-workers before you do!)
- Reformat a function to conform to your coding standards
We want readable code - make it so! All-nighters tend to produce interesting,
almost-working code that's horrible to understand. We don't want to throw it
away, so first apply some formal beautification to it, then look deeper.
- Turn a code sequence into a function
If the complexity of a method increases beyond what you feel comfortable with,
or if you notice that similar functionality is used in different places, make
it a function.
- Move functionality shared by multiple classes to a common base class (or
helper class)
This can break down the complexity of a single class back to a reasonable
level.
- Separate independent functionality into different classes / functions
The inverse of the above.
Notice a theme in the last three items: We introduce a base class only when it
seems necessary. Early design decisions are often intentionally immutable:
because the design guru said so, because it was the result of heated
discussions, etc. Along the way, the technical reason for a decision often gets
lost, and with it: simplicity.
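As an illustration of "turn a code sequence into a function", here is a small sketch of the state after the step. All names (TrimSpaces, IsBlankLine, TrimAll) are made up for this example; the assumption is that the trimming logic used to be written out inline in both callers before it was pulled into one helper.

```cpp
#include <string>
#include <vector>

// Extracted helper: this sequence was (by assumption) duplicated inline
// in both functions below before the refactoring step.
std::string TrimSpaces(const std::string& text)
{
    const std::string::size_type first = text.find_first_not_of(' ');
    if (first == std::string::npos)
        return "";                                   // empty or only spaces
    const std::string::size_type last = text.find_last_not_of(' ');
    return text.substr(first, last - first + 1);
}

// Caller 1: now a one-liner on top of the helper.
bool IsBlankLine(const std::string& line)
{
    return TrimSpaces(line).empty();
}

// Caller 2: applies the same helper to every line.
std::vector<std::string> TrimAll(const std::vector<std::string>& lines)
{
    std::vector<std::string> result;
    for (std::vector<std::string>::size_type i = 0; i < lines.size(); ++i)
        result.push_back(TrimSpaces(lines[i]));
    return result;
}
```

The step changes no behavior: both callers produce the same results as before, which is what the compile-and-test after the step is meant to confirm.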
All these steps will take around 5 to 10 minutes - usually including "compile
and test". You can take them anytime: when you're bored, while you're waiting
for another project to compile, when you don't want to leave shortly before your
boss. Whatever. Even taking one step will make your code base a little bit
better, and you will have working code. They are easily undone (assuming you
know how to use your tools: editor and source control).
While deciding what to do requires that you understand the
structure of the code you're working on, executing the step does not: these are
simple search-and-replace or copy-and-paste tasks, and under VS.NET there are
nifty tools available that can automate them safely.
Also, AP does not tell you how to design, only when. You still need to know
what makes a good design, and find it yourself.
Other Refactoring techniques - A real life example
When I plan to refactor a complex class or module, I start with the things
mentioned above. This has two purposes: First, the code gets easier to read,
more compact, and unnecessary arabesques are removed in these steps, so I have a
much easier time later on. Second, I get fairly accustomed to the class again,
refreshing my memory. I find out which members are hot spots, discover old
comments telling me what I wanted to do, etc.
Only when I'm through with the basics, I begin the actual changes. Again, I
try to take the smallest step that takes me closer to my goal and keeps
the code working. Here you need to be more creative - the techniques are
not that straightforward anymore, and you need to plan ahead. That's why I'll
take a real example, to illustrate some possibilities.
Recently, I refactored a class implementation that simulated a map<int,
struct> by two arrays: a data array holding the values, and a key array,
holding the key for each value at the same index as the data array. To speed
things up, I tried to store the values at their "native" position: e.g. the
value for key 17 I would first try to insert into index 17. To look up a value,
I checked the "native position", then I had to search the key array for the
index where the key was stored, then retrieve the value from the same index in
the data array. The whole thing looked like this:
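    // First try the "native" position, where the key doubles as the index.
    if (keyArray.size() > key && keyArray[key] == key)
        return dataArray[key];
    else {
        // Otherwise search the key array for the index of this key.
        int index = FindKeyInKeyArray(key);
        if (index >= 0)
            return dataArray[index];
    }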
(This atrocity to common sense grew from a quick side hack into a generic
datakeeper class. I'm really ashamed of this - well, no more)
The first step was to rename the arrays (originally named data and
map) to the ones above, so I wouldn't get a name clash later on - neither
in the code, nor in my mind.
Ultimately, I would have to remove the keyArray index lookups completely, and
replace the dataArray lookups. So I did a "Find in Files" for "keyArray[" and
"dataArray[", just to see how often they were used. I was shocked - over 20
times each. I needed to break this down a bit further, before I "injected" the
map<>.
So I moved some rarely used extra functionality that affected most functions
to a derived class - due to the prior usage this wouldn't break any client code.
While this moved no "hot spots" out of the class, the complexity of the hot
spots themselves was greatly reduced. Compile and run - still working. (Later I
found I had introduced a bug in this step, one that even escaped my quickly
written unit test. But thanks to the new cleaner code structure, it was found
quickly.)
The remaining lookup complexity, especially when inserting/changing values,
was dominated by the "native position" handling - it probably didn't help much,
and made everything ugly. I decided to remove this altogether. While the code
would still work, performance might take a hit - this was a small risk I had to
take. The worst thing that could happen would be rolling back to before
this step (so I made a check-in at this point).
After removing the extra lookup, most of the hot spot functions did something
similar to this:
    int index = MapID(key);
    if (index >= 0) {
        // key found: work with dataArray[index]
    }
    else {
        // key not found
    }
I figured that replacing this with a map wouldn't be a simple m_map[key] - the
dataArray[index] was often used multiple times, but I wanted the map lookup to
happen only once, and I didn't need operator[]'s feature of silently inserting
a new element. So I wrote a helper function that contained all the
functionality that I intended to change:
    ValueType * GetValPtr(int key)
    {
        int index = MapID(key);
        if (index >= 0)
            return &dataArray[index];   // address of the stored value
        else
            return NULL;                // key not present
    }
And started replacing the lookups by
    ValueType * pVal = GetValPtr(key);
    if (pVal) {
        // key found: work with *pVal
    }
    else {
        // key not found
    }
Again, very simple replacements, especially since I had made sure beforehand
that local variable and parameter names were consistent. I renamed dataArray
and MapID() in the class declaration and the GetValPtr implementation, so the
compiler caught all occurrences where I was still relying on them. I picked
"pVal" as name for the new local variable, since this name was used nowhere in
the class.
After this step, I had a sleek implementation of a horrible idea. Quite an
improvement.
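To make this intermediate state concrete, here is a self-contained sketch of what such a class can look like. Only GetValPtr, MapID, keyArray and dataArray come from the description above; the class name DataKeeper, the Set and Contains functions, the std::vector storage and the ValueType typedef are assumptions for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef std::string ValueType;          // assumption: stand-in value type

class DataKeeper                        // hypothetical class name
{
public:
    // "Hot spot" functions only talk to GetValPtr, never to the arrays.
    bool Contains(int key) { return GetValPtr(key) != NULL; }

    void Set(int key, const ValueType& value)
    {
        ValueType* pVal = GetValPtr(key);
        if (pVal) {
            *pVal = value;              // key exists: overwrite in place
        }
        else {
            keyArray.push_back(key);    // new key: append to both arrays
            dataArray.push_back(value);
        }
    }

    // The single place that knows how a key is turned into a value.
    ValueType* GetValPtr(int key)
    {
        int index = MapID(key);
        if (index >= 0)
            return &dataArray[index];
        else
            return NULL;
    }

private:
    // Linear search through the key array; returns the index or -1.
    int MapID(int key) const
    {
        for (std::size_t i = 0; i < keyArray.size(); ++i)
            if (keyArray[i] == key)
                return static_cast<int>(i);
        return -1;
    }

    std::vector<int>       keyArray;    // key stored at index i ...
    std::vector<ValueType> dataArray;   // ... belongs to the value at index i
};
```

The point of the shape is that the lookup callers no longer mention an index at all, so the storage behind GetValPtr can change without touching them. Note that the returned pointer is only valid until the next insertion, since a vector may reallocate.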
Everything worked fine, so I took the last step: introducing an
std::map<int, ValueType> member into the class, commenting out the dataArray
and keyArray declarations, and replacing the GetValPtr implementation with a
std::map::find call:
    ValueType * GetValPtr(int key)
    {
        std::map<int, ValueType>::iterator it = m_map.find(key);
        if (it == m_map.end())
            return NULL;
        else
            return &(it->second);
    }
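For completeness, a self-contained sketch of a class built around this map-based GetValPtr. The class name DataKeeper and the Set, Contains and Size functions are assumptions made for the example, as is the ValueType typedef; only GetValPtr and m_map follow the snippet above.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

typedef std::string ValueType;          // assumption: stand-in value type

class DataKeeper                        // hypothetical class name
{
public:
    bool Contains(int key) { return GetValPtr(key) != NULL; }

    std::size_t Size() const { return m_map.size(); }

    void Set(int key, const ValueType& value)
    {
        ValueType* pVal = GetValPtr(key);
        if (pVal)
            *pVal = value;                              // overwrite existing
        else
            m_map.insert(std::make_pair(key, value));   // explicit insert
    }

    // One lookup via find(); unlike m_map[key], this never inserts.
    ValueType* GetValPtr(int key)
    {
        std::map<int, ValueType>::iterator it = m_map.find(key);
        if (it == m_map.end())
            return NULL;
        else
            return &(it->second);
    }

private:
    std::map<int, ValueType> m_map;
};
```

Because GetValPtr uses find rather than operator[], asking for a missing key leaves the map unchanged: Size() is the same before and after a failed lookup. With m_map[key] a default-constructed value would have been inserted silently.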
Of course, replacing the two arrays with a map had some other side effects,
temporarily breaking the storage functions (that needed to iterate over all
values), and turning the array allocation/cleanup functions into syntax errors.
This was a single big step; I had no idea how to break this down further (and
maybe started to get a little bit impatient). But thanks to all the
preparation, it took no more than 40 minutes to make the change, replace the
keyArray iteration with a map iterator, and get the code to compile and run
again. The thing is working fine now, I felt very happy, and I sleep much
better.
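Replacing an index loop over keyArray with a map iterator can look roughly like the sketch below. StoreAll and its "key=value" output format are invented for this example and are not the storage functions of the original class.

```cpp
#include <map>
#include <ostream>
#include <sstream>
#include <string>

typedef std::string ValueType;          // assumption: stand-in value type

// Hypothetical storage function: writes one "key=value" line per entry.
// Before the change, this kind of loop would have run an index over
// keyArray and dataArray in parallel.
void StoreAll(const std::map<int, ValueType>& values, std::ostream& out)
{
    for (std::map<int, ValueType>::const_iterator it = values.begin();
         it != values.end(); ++it)
    {
        out << it->first << '=' << it->second << '\n';
    }
}
```

One thing to keep in mind when testing such a change: std::map iterates in ascending key order, so the output order can differ from the insertion order the two arrays used to produce.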
While scrubbing the code, I marked commented-out sequences with a special
comment tag, so I could search for these places. Thus, removing all the dead
code (that I left in initially for reference and rollback), was a matter of a
minute or two.
Of course, a few things still could be done. There's still a naming
inconsistency in the "insert new item" implementation, and the GetValPtr
function could be removed altogether, replacing the ValueType * with a
map::iterator. But the task at hand was done, and a new task was waiting for
the next day, so I left it at that.
Refactoring techniques used in the example
A short overview of the things I used:
- Move "hot spot" functionality to be changed to a helper function that has
the same calling syntax for the old and the desired new implementation - so you
separate syntactic changes at many places (that are semi-automatic and can be
caught by the compiler) from functional changes (that need to be tested if they
still do the same)
- Move functionality to be replaced "inline" to a temporary helper function
that you can remove later
- Use "Find in files" to find occurrences of a certain construct in your
project - so you find hot spots, and know if you can replace it in one step.
- Pick names that make the compiler catch mistakes, or places you forgot to
change
- Use simple refactoring techniques, like generalizing variable names, until
you feel you can handle the complexity of the trickier steps
OK, since you didn't fall asleep yet, you're probably pondering one question:
How do you convince your boss that renaming variables is worth your pay?
- The best selling point is success.
Just try some of the techniques and
ideas presented here on a small scale. In an ideal situation, they help you
solve a tricky problem efficiently - maybe one that has been bugging your team
for a time. Your boss might ask you "Nice! How did you do that?" Just mention
that you "tried some new techniques you read about recently"...
- Refactoring is a fancy word for cleaning up code
There is a reason
to use a fancy word: it sounds new, it sounds smart, and it makes you think
about "usual things" from a different perspective. "Agile Programming" and
Refactoring are buzzwords; chances are your boss has already heard
something about them and wonders whether he is missing something.
- Unless you have a very strict development process, Agile techniques can
sneak in step-by-step. A key point of all Agile techniques is: only do what
works for you. You don't have to revolutionize the entire development process.
Start with "Incremental changes" for tasks assigned to you. If your task is to
rewrite something, consider refactoring it instead. Try new tools, and unused
features of existing tools, first for minor design and documentation tasks.
- Remember the prime strength and original intent of Agile Programming:
Additional flexibility towards requirement changes. Requirements do change over
time. Clients can change their priorities at a large scale after trying the
first beta. New features need to be added. 3rd party components (libraries, or
OS components) change over time.
AP is not the holy grail either. There are some requirements that must be met
to make it work:
- You need an open, friendly team
If you have to put up with mind games among
the coders, if communication is bad, or if your co-workers take changes to
"their" code as personal insults, it won't work.
- You need some experience in the team*
While you may not need a design
ueberguru, you need a decent amount of real-life experience in your team. AP can
stand a certain percentage of newbies, but if your 10-person team consists of 9
freshmen and one experienced developer to guide them, you're probably better off
with a more formal approach.
- AP alone is not sufficient
You will need other techniques. Focusing
solely on AP techniques, you can quickly lose the "big picture" of your
application. That's a bad thing - you still need to know what you do, how things
interact, etc. AP is one tool to make some of these tasks easier.
- Refactoring won't change the construction plans
The basic structure of
your code base can rarely be changed through refactoring. Usually, you can work
towards something that might even look completely different, but uses the same
basic mechanisms as the old code. If the implementation is good but the
structure is wrong, a rewrite might be faster. Relying on AP alone might
stall the large-scale changes that are necessary from time to time.
- AP doesn't tell you how to design
Although certain techniques became
popular together with AP (designing around user stories, design patterns, etc.),
there is no formal mechanism. As I said, old design principles still hold true,
but some designs work better with AP than others.**
*) One strain of Agile Techniques (Extreme Programming, the one that
gets all the press) strongly emphasizes knowledge propagation. Definitely worth
looking at when you have a diversely skilled team.
**) In my (still small) experience, data-centric
designs with well separated layers tend to go along best with Agile Techniques -
but that might be due to the fact that I personally prefer them.
Links
- Here on CP, Marc Clifton's Organic Programming Environment and Automation
Application Layer are well worth reading if you're looking for design concepts.
According to Marc, they go along very well with agile techniques. (Sorry to the
CPians with valuable related articles - I'm just not aware of them. If you know
a related article, why not leave a comment?)
- The WIKI - An interesting
"open database" mainly concerned with modern design and development techniques -
a good starting point is WhatIsRefactoring
- Agile Modeling - A
very good web site, with much more information than I can (or want to) present
here, well written and not too heady.
Why is Refactoring called Refactoring?
Although there are different explanations, the one that feels most natural to
me is this one: Refactoring stems from the mathematician's "desire" to
reorganize a term like
F = xyz + 2xy - 7xz + 3yz - 14x + 6y - 21z - 42
into its factors:
F = (x+3)*(y-7)*(z+2)
While both are absolutely identical, the second one exposes its inner structure
and important information at a glance. Also, there are parallels between the
processes.
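The two forms can be checked against each other by multiplying the factors back out, two at a time. A short worked expansion:

```latex
\begin{align*}
(x+3)(y-7) &= xy - 7x + 3y - 21 \\
(xy - 7x + 3y - 21)(z+2) &= xyz + 2xy - 7xz - 14x + 3yz + 6y - 21z - 42
\end{align*}
```

Reordering the terms of the last line gives exactly the expanded form of F shown above.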