Contents
Introduction
Hashtables are a convenient choice when it comes to searching, particularly because in the collection, an element is stored depending upon the information in the element itself. This information is extracted in an (almost) exclusive way by use of hash-functions. While hashtables are a useful data structure, they do lack one thing - the order in which elements are stored in a hashtable is not preserved. That means, given a Hashtable
, I can get an Iterator
on the set of Hashtable
keys, (by keySet()
method) or I can get an Enumeration
(by keys()
method). But none of the two is guaranteed to retrieve elements in the same order as they were put in. This is plausible because Hashtable
s have generally been used more for their hashing capability rather than anything else.
Why Ordering
In our project, we had a situation where the results returned by an SQL query was stored in a Hashtable
(so that it could be searched easily later). But at the same time, we needed to preserve the ordering returned by SQL, since ordering at database side is much faster than doing it programmatically. Moreover, database can handle multiple and complex sort orders quite well (using the order by
clause). Here was a situation where we needed to implement ordering capabilities into a Hashtable
.
The Algorithm
The obvious place to look was into the source code of the Java programming language, which gave us Hashtable.java
. The solution came up right in the form of writing another class - OrderedHashtable.java
which uses some techniques like inner classes.
The algo was simple: Take a Hashtable
and a Vector
. Whenever you put the objects into the Hashtable
, put the corresponding keys in the Vector
. Since Vector
is ordered, Enumeration
s or Iterator
s on it are guaranteed to traverse the keys in the same order as they were put in. So if later you need to iterate through all the objects in the Hashtable
(in the order they were put in), just take hold of the Iterator
of the Vector
and get the corresponding objects from the Hashtable
. There you have an OrderedHashtable
! Of course it still has the hashing capabilities of a Hashtable
, and you can of course retrieve an object directly if you have its key (same way as in a Hashtable
). Only new thing is that, Iterator
(or Enumeration
) is guaranteed to be ordered.
Some Issues to be Careful of!
Of course, there are some important things to take care of, like synchronization. Since items can be added as well as removed from OrderedHashtable
, we need to make sure that the internal Vector
and Hashtable
of the class remains thread-safe. If two threads access it at the same time and one adds and other deletes the same object, it could leave the Hashtable
in an arbitrary, indeterminate state, where either the Vector
or the Hashtable
or the internal COUNT
of number of elements (or all of the three) may be incorrect. This calls for certain sections of code to be synchronized.
Similarly, there is another important issue of overflow. Both Hashtable
and Vector
are resizable, but the way their size increases is different. In Hashtable
, we have a loadfactor
which specifies how full the Hashtable
is allowed to become before the size increases (by calling the rehash()
method). Whenever the number of entries in a Hashtable
exceeds the product of loadfactor
and current capacity
, the capacity is increased by twice the current capacity
. Default values of initial capacity
and loadfactor
for Hashtable
are 10 and 0.75 respectively. So in the default case, when the 9th object is put in, the Hashtable
is doubled to 20.
The Vector
however operates differently. In a Vector
, you can specify a capacityIncrement
value. Whenever the Vector
becomes full, its capacity is incremented by capacityIncrement
(or doubled in case the capacityIncrement
is 0). Default values of initialcapacity
and capacityIncrement
for a Vector
are 10 and 0. Here in the default case, the size grows only when the 11th object is put in and the size grows to 20. Again when 21st object comes in, size is doubled to 40 and so on. One would need to be careful in put()
and get()
methods to handle this correctly. Other issues like deletion etc, can also be handled easily with slight care.
Implementation
Now comes the magic of inner classes
. Iterator
s and Enumeration
s are implemented as inner classes. (Obviously, how else?). When either Enumeration
(enumerateKeys()
method) or an Iterator
(iterateKeys()
method) is called, it is this inner class which iterates on a private Vector
and returns the corresponding object from the Hashtable
and makes sure to keep it within error limits.
Performance
A final thought that comes to mind immediately after this is - "What is the performance overhead of implementing this ordering?" Since we are using a Vector
, which is known to be one of the slow performing classes among all Java classes, we do expect our OrderedHashtable
to be slower than Hashtable
. I have included two small test files apart from the source code of OrderedHashtable
, in the download package. The code is well commented, and I hope it is easily understood. While one of them (Test.java) establishes the fact that ordering is properly implemented (after all that is what this article is about!) the other file (Performance.java) does some basic performance testing. It populates a Hashtable
and an OrderedHashtable
with one million objects and compares the time spent in accessing them. While Test.java prints the items retrieved to the standard output stream to show that they are indeed in the same order as they were inserted, Performance.java doesn't print the objects themselves. There are two reasons for this - one, I don't want to be flooding the screen with two million lines, and second (more importantly) the System.out.println()
will itself take some time, whereas we just want to compare the time spent in accessing the elements of the Hashtable
s.
On my system, a number of trial runs indicate that for very large values (like 1 million), the time required by both are almost same (31 milliseconds). This is surprising because I had expected the Vector
to slow down the OrderedHashtable
quite a bit. Yes, once in a while, the OrderedHashtable
takes twice the time as that of Hashtable
, but such instances are rare. You are welcome to test-run it on your system. Your mileage may vary because of the system, the system settings and the current load on it. If there are any issues, comments, suggestions, or new findings, you are most welcome to enlighten me and other fellow programmers at CP.
Final Notes
The source code presented here implements most of the methods of java.util.Hashtable
. But there are a few methods which have been left out, some because they are not in line with the concept of ordering (e.g. those which return a Set
), and some because of their being declared as protected
or private
in the underlying Hashtable
class (e.g. rehash()
). However, enough methods have been implemented to provide the full functionality of a Hashable
(for almost all situations) while maintaining ordering at the same time. But if you feel that it should be having some other methods also, which could improve the functionality/usefulness of the OrderedHashtable
, then you are most welcome to provide any feedback and discuss it with the CP community. :-)
Thanks for reading the article. Have a great day!