Introduction
When I was once involved in the design of a routing optimization software for a telecom consultancy, a required component was a mechanism for finding least cost routes
for traffic demands between communicating network sites. In other words, an efficient means of searching for the best tariffs provided by all of the telecom carriers.
There are a huge number of tariffs for routing different types of telephone calls, be they geographic, non-geographic, mobile, premium rate, freefone, international, etc.,
which are frequently subject to change. Of concern was the problem of creating a mapping between strings representing telephone numbers and their associated tariffs
in ways that were accurate and quick.
When looking up tariffs for many thousands of numbers on large networks, a sequential search for anything but trivial examples is impractical for a commercial
least-cost routing package. Binary searches on sorted lists were investigated and were a big improvement, giving a good performance even on very large lists,
akin to an average of O( log n ). It was also realized that hash tables are among the fastest means of lookup, usually executing in constant time O( 1 ).
About Hash Tables
A hash table is a software data structure based on the idea of using an indexing lookup table to significantly accelerate the mapping. Given that the telephone
numbers are usually represented by strings and the tariffs represented by numbers, we cannot directly use the strings to index an array. Instead we calculate
the string’s index value using a special hash function, and use this index value to access the array containing the pre-defined tariffs that we want.
Real-world Hashing
In reality, perfect hashing is less than straightforward to implement. What usually happens is that the hashing function will sometimes map two or more different
strings on to the same index value, known as collisions. To get around this, our hash table needs to map each string onto a linked list of potential choices,
not just a single one, so that by searching the candidate linked list we can find the string we are after, as well as the tariff value it contains.
When using sequential searches (for loops), the average number of comparisons needed to find what we are after is n/2. When using hash tables, the average number
of comparisons required is one, plus the one hash function calculation.
Defining a Hash Table Class
Here is the definition of the class to implement the hash table. The hash table itself is an indexed array of linked lists. The majority of these linked lists
will contain none or one entry. When collisions occur with more than one string getting hashed to the same index value, the entry gets added to the same linked list
via the Add
function:
class HashTable
{
private:
int hash( const char* str ) const;
LinkedList list[ TableSize ];
LinkedList const & FindList( const char* str ) const;
public:
void Add( const char* str, int id );
bool Get( const char* str, int& value );
};
Here is the choice of hash function, which is based on a shift-and-add algorithm as a means of randomizing the strings.
int HashTable::hash( const char* str ) const
{
assert( str != NULL && str[ 0 ] != NULL );
unsigned h = str[ 0 ];
for ( int i = 1; str[ i ] != NULL; ++i )
{
h = (h << 4) + str[ i ];
}
return h % TableSize;
}
To find the appropriate linked list in the hash table, it is a simple matter of calculating the hash value and using this value to index the correct linked list:
LinkedList const & HashTable::FindList( const char* str ) const
{
int h = hash( str );
return list[ h ];
}
Example Usage
Here is some code outlining how to use the HashTable APIs to add new items and search for items using the hashing algorithm:
HashTable hTable;
int value = 0;
hTable.Add( "01316530969", 11 );
hTable.Add( "01314405220", 12 );
hTable.Add( "02920591318", 10 );
hTable.Add( "02920565967", 14 );
bool success = hTable.Get( "01314405220", value );
The Visual Studio project is also downloadable from here.