Hash Tables as an Efficient Means of Search

AndyUk06

4.64/5 (4 votes)

16 Nov 2011CPOL3 min read

34.2K

710

An implementation of hash tables as a means of fast lookup in C++.

Download source code - 2.26 KB

Introduction

When I was once involved in the design of a routing optimization software for a telecom consultancy, a required component was a mechanism for finding least cost routes for traffic demands between communicating network sites. In other words, an efficient means of searching for the best tariffs provided by all of the telecom carriers.

There are a huge number of tariffs for routing different types of telephone calls, be they geographic, non-geographic, mobile, premium rate, freefone, international, etc., which are frequently subject to change. Of concern was the problem of creating a mapping between strings representing telephone numbers and their associated tariffs in ways that were accurate and quick.

When looking up tariffs for many thousands of numbers on large networks, a sequential search for anything but trivial examples is impractical for a commercial least-cost routing package. Binary searches on sorted lists were investigated and were a big improvement, giving a good performance even on very large lists, akin to an average of O( log n ). It was also realized that hash tables are among the fastest means of lookup, usually executing in constant time O( 1 ).

About Hash Tables

A hash table is a software data structure based on the idea of using an indexing lookup table to significantly accelerate the mapping. Given that the telephone numbers are usually represented by strings and the tariffs represented by numbers, we cannot directly use the strings to index an array. Instead we calculate the string’s index value using a special hash function, and use this index value to access the array containing the pre-defined tariffs that we want.

Real-world Hashing

In reality, perfect hashing is less than straightforward to implement. What usually happens is that the hashing function will sometimes map two or more different strings on to the same index value, known as collisions. To get around this, our hash table needs to map each string onto a linked list of potential choices, not just a single one, so that by searching the candidate linked list we can find the string we are after, as well as the tariff value it contains. When using sequential searches (for loops), the average number of comparisons needed to find what we are after is n/2. When using hash tables, the average number of comparisons required is one, plus the one hash function calculation.

Defining a Hash Table Class

Here is the definition of the class to implement the hash table. The hash table itself is an indexed array of linked lists. The majority of these linked lists will contain none or one entry. When collisions occur with more than one string getting hashed to the same index value, the entry gets added to the same linked list via the Add function:

C++

class HashTable
{
private:
    int hash( const char* str ) const;
    LinkedList list[ TableSize ];
    LinkedList const & FindList( const char* str ) const;

public:

    void Add( const char* str, int id );
    bool Get( const char* str, int& value );
};

Here is the choice of hash function, which is based on a shift-and-add algorithm as a means of randomizing the strings.

C++

// Hashing function to convert string into integer index
int HashTable::hash( const char* str ) const
{
    assert( str != NULL && str[ 0 ] != NULL );
    unsigned h = str[ 0 ];

    for ( int i = 1; str[ i ] != NULL; ++i )
    {
        h = (h << 4) + str[ i ];
    }

    // Return remainder
    return h % TableSize;
}

To find the appropriate linked list in the hash table, it is a simple matter of calculating the hash value and using this value to index the correct linked list:

C++

// Return the linked list containing one or more items
LinkedList const & HashTable::FindList( const char* str ) const
{
    int h = hash( str );
    return list[ h ];
}

Example Usage

Here is some code outlining how to use the HashTable APIs to add new items and search for items using the hashing algorithm:

C++

HashTable hTable;  
int value = 0;    
      
// Add some entries  
hTable.Add( "01316530969", 11 );  
hTable.Add( "01314405220", 12 );  
hTable.Add( "02920591318", 10 );  
hTable.Add( "02920565967", 14 );  
      
// Get the value from the return linked list  
bool success = hTable.Get( "01314405220", value );

The Visual Studio project is also downloadable from here.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)