Introduction
You can often avoid repeating code by placing distinct parts of it into an array:
if(strcmpi(command, "north") == 0) {
if(cur_location->north)
GoToLocation(cur_location->north);
else
Print("Cannot go there");
}
else if(strcmpi(command, "east") == 0) {
if(cur_location->east)
GoToLocation(cur_location->east);
else
Print("Cannot go there");
}
else if(strcmpi(command, "south") == 0) {
if(cur_location->south)
GoToLocation(cur_location->south);
else
Print("Cannot go there");
}
else if(strcmpi(command, "west") == 0) {
if(cur_location->west)
GoToLocation(cur_location->west);
else
Print("Cannot go there");
}
A shorter equivalent:
enum SIDE {SIDE_NORTH = 0, SIDE_EAST, SIDE_SOUTH, SIDE_WEST};
struct COMMAND {
const char * name;
SIDE side;
};
static const COMMAND commands[] = {
{"north", SIDE_NORTH},
{"east", SIDE_EAST},
{"south", SIDE_SOUTH},
{"west", SIDE_WEST},
};
for(int i = 0; i < NUM_OF(commands); i++)
if(strcmpi(commands[i].name, command) == 0) {
SIDE d = commands[i].side;
if(cur_location->sides[d])
GoToLocation(cur_location->sides[d]);
else
Print("Cannot go there");
}
The second version is not only smaller, but also more abstract and hence easier to maintain. To implement short command names, you have to modify the code in just one place. If more commands will be added, you can easily replace the array with a hash table.
Such arrays are indispensable in many programs: from compilers and disassemblers to tax calculators and web applications. This article highlights some examples of their usage.
Price Calculation
In Refactoring, Martin Fowler discusses the following code:
double result = 0;
switch(movieType) {
case Movie.REGULAR:
result += 2;
if(daysRented > 2)
result += (daysRented - 2) * 1.5;
break;
case Movie.NEW_RELEASE:
result += daysRented * 3;
break;
case Movie.CHILDRENS:
result += 1.5;
if(daysRented > 3)
result += (daysRented - 3) * 1.5;
break;
}
In order to improve it, he replaces the switch statement with inheritance (one class for calculating the regular price, one for new releases, and another one for children's movies).
A simpler solution uses arrays:
enum MovieType {Regular = 0, NewRelease = 1, Childrens = 2};
const double initialCharge[] = {2, 0, 1.5};
const double initialDays[] = {2, 0, 3};
const double multiplier[] = {1.5, 3, 1.5};
double price = initialCharge[movie_type];
if(daysRented > initialDays[movie_type])
price += (daysRented - initialDays[movie_type]) * multiplier[movie_type];
It's evident that the same formula is used in all three cases, so you don't have to make a complicated class hierarchy for them. The table-driven code is faster, much shorter, and easier to maintain.
Generally, when you think in terms of the only paradigm (in this case, OOP), you will often miss a better solution. By learning more different approaches (including table-driven approach), you can extend your mental toolbox and solve more programming problems with less lines of code.
Shipping Rates and Progressive Taxation
Another situation when the tables are useful is a simple or a piecewise linear function.
Imagine you are developing a web shop script. The total price shown to a user should include a shipping rate, which depends on the weight of items and the shipping distance. For example, the rates of Russian postal service are:
Distance | Price for each 0.5 kg |
less than 601 km | 27.60 rubles |
601..2000 km | 35.00 rubles |
2001..5000 km | 39.60 rubles |
5001..8000 km | 46.35 rubles |
more than 8000 km | 50.75 rubles |
The dependence between distance and price cannot be represented with a formula, so the program has to search in an array to find the applicable rate:
def shipping_fee(weight, distance, value):
# weight in grams, distance in kilometers, declared value in rubles
rates = [
# max distance, price
( 600, 27.60),
(2000, 35.00),
(5000, 39.60),
(8000, 46.35)
]
# Find the applicable rate for this distance
for r in rates:
if distance <= r[0]:
rate = r[1]
break
else:
rate = 50.75
# For each 500 grams of the parcel
fee = rate * math.ceil(weight / 500)
# For each ruble of the declared value
fee += 0.03 * math.ceil(value)
# Apply VAT (18%)
return round(fee * 1.18, 2)
The array is usually small, so linear scanning works faster than binary search.
For example, sending The practice of programming from Moscow to Khabarovsk (0.3 kg, more than 8000 km, 214 rubles) should cost 67.46 rubles (around $2.20).
Progressive tax calculation is a similar, but slightly more complex example:
def get_tax(income):
if income < 0:
return income
MAX_INCOME = 0; RATE = 1 # array indexes
tax_brackets = [
( 8350, 0.10),
( 33950, 0.15),
( 82250, 0.25),
(171550, 0.28),
(372950, 0.33)
]
tax = 0
prev_bracket = 0
for bracket in tax_brackets:
tax += ( min(bracket[MAX_INCOME], income) - prev_bracket ) * bracket[RATE]
if income <= bracket[0]:
break
prev_bracket = bracket[MAX_INCOME]
else:
tax += (income - prev_bracket) * 0.35
return tax
Disclaimer: These functions are only meant to illustrate the table-driven technique; the author doesn't claim they calculate the shipping rate in Russia and the income tax in USA correctly. For the latter, please use the calculator from IRS site, which takes into account various deductions.
Assemblers, Disassemblers, and Code Generators
Writing an assembler or disassembler is a good exercise in table-driven design. Typically, you have arrays for one-byte and two-byte instructions. Mnemonic and operand types of each instruction are stored in these arrays. Gary Shute provides a small example of RISC assembler that uses this technique.
x86 code is harder to assemble because of its irregular and complicated structure. The best assemblers/disassemblers use additional tables to deal with opcode extensions (see, for example, CADT by Ms-Rem). The Table Driven Assembler by Jan Nikitenko shows that it's possible to write a generalized assembler, which can be retargeted for any ISA.
Character Properties
The ctype.h header file in C includes macros for character classification (isdigit, isupper, isalnum, etc.) Their comparison-based implementation is clumsy and ineffective:
int isalnum(int ch) {
return 'a' <= ch && ch <= 'z' ||
'A' <= ch && ch <= 'Z' ||
'0' <= ch && ch <= '9';
}
If there are several character properties, you can assign a bit to each of them, and then make a 256-element array storing the properties for each character. Most C run-time libraries use this method for ctype.h macros:
static const unsigned char properties[] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 16, 16, 16, 16, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
16,160,160,160,160,160,160,160,160,160,160,160,160,160,160,160,
204,204,204,204,204,204,204,204,204,204,160,160,160,160,160,160,
160,202,202,202,202,202,202,138,138,138,138,138,138,138,138,138,
138,138,138,138,138,138,138,138,138,138,138,160,160,160,160,160,
160,201,201,201,201,201,201,137,137,137,137,137,137,137,137,137,
137,137,137,137,137,137,137,137,137,137,137,160,160,160,160, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
};
#define islower(ch) (properties[ch] & 0x01)
#define isupper(ch) (properties[ch] & 0x02)
#define isdigit(ch) (properties[ch] & 0x04)
#define isalnum(ch) (properties[ch] & 0x08)
#define isspace(ch) (properties[ch] & 0x10)
#define ispunct(ch) (properties[ch] & 0x20)
#define isxdigit(ch) (properties[ch] & 0x40)
#define isgraph(ch) (properties[ch] & 0x80)
If you have only one property to store, use a bit array, which is smaller (256 bits = 32 bytes), but needs extra operations to retrieve the value:
inline int isalnum(int ch) {
static const unsigned int alnum[] = {
0x0, 0x3ff0000, 0x7fffffe, 0x7fffffe, 0x0, 0x0, 0x0, 0x0,
};
return (alnum[ch >> 5]) & (1 << (ch & 31));
}
The alnum
array has ones in the positions corresponding to alphanumeric characters, and zeros in other positions.
The range of matching characters is often limited. If you have a range of 32 or less characters, you can store the bits in an integer constant:
inline int isvowel(int ch) {
int x = ch - 'a';
return (0x1104111 >> x) & ((x & 0xffffffe0) == 0);
}
When x is negative or greater than 31, the result of shift by x is undefined in C language. An x86 processor uses the lower 5 bits of x; some other processors always return 0 for the big shift amounts. For portability, we have to do an extra check: (x & 0xffffffe0) == 0
. Both MSVC++ and MinGW generate a branchless x86 code for this condition (using SBB or SETE instruction). The method is not as fast as the table-driven approach, but the code is smaller.
The arrays for these methods can be generated with a script. For Unicode character properties, use multi-stage tables. If the range of characters is continuous, a comparison should be more effective than a table lookup.
Decision Tables
A complex condition can often be replaced with a short decision table. Here is an example. A program counts the number of digraphs to determine the encoding and language of document (e.g., Windows Latin-1, German language). All characters are divided into 3 classes:
- letters of the language (e.g., a to z, ä, ö, ü, and ß for German)
- letters not used by the language (e.g., è or ô for German);
- separators (space, period, comma, etc.)
If two adjacent separators are found, they should be ignored, because somebody may use spaces for indentation. A non-used character should not appear near a letter, so the expected frequency of this digraph is zero (e.g., if we found êt digraph, the text is probably in French, not German). A word consisting of non-used characters should be ignored: this will improve statistics for multilingual texts (e.g., English trademarks mentioned in an Arabic text).
A series of if statements for these rules would be long and hard to read (try to write it!). Such conditions are better to be represented with a decision table. Here is a fragment of code that builds the expected_frequencies
matrix:
enum LETTER_CLASS { ST_NONUSED = 0, ST_LETTER, ST_SEPARATOR };
enum DECISION { DT_SETDONTCARE, DT_SETZERO, DT_LEAVEASIS };
const DECISION decision_table[3][3] = {
{DT_SETDONTCARE, DT_SETZERO, DT_SETDONTCARE}, {DT_SETZERO, DT_LEAVEASIS, DT_LEAVEASIS}, {DT_SETDONTCARE, DT_LEAVEASIS, DT_SETDONTCARE}};
...
switch (decision_table[letter_class1][letter_class2]) {
case DT_SETDONTCARE:
expected_frequencies[letter1][letter2] = DONTCARE;
break;
case DT_SETZERO:
if (expected_frequencies[letter1][letter2] != 0)
OutputMessage("Illegal letter combination %c%c\n",
letter1, letter2);
expected_frequencies[letter1][letter2] = 0;
break;
case DT_LEAVEASIS:
break;
}
Further Reading
Books
- Code Complete by Steve McConnell, chapter 18 (Table-driven methods). Covers access by index and linear search in array. McConnell also criticizes the mechanical application of OOP.
- Programming Pearls by Jon Bentley, chapter 3.1, 3.3 (A Survey Program, An Array of Examples). The chapter shows how to make a statistical program shorter by using arrays and provides more examples of where table-driven methods can be useful.
Web Pages
History
- 29th September, 2009: Initial post