
Persistent Hash Table

A key-value store

Introduction 

I was inspired by NoSQL and wanted to write a store myself. I had seen a few very good open source C++ NoSQL projects like MongoDB and LevelDB, and wanted to write a key-value store of my own, starting with a minimal set of goals that could be completed in a weekend. With those parameters, I am posting this one. I hope it will be helpful to anyone who wants to develop a key store as a beginner.

Background

There are many ways to create a table and access it, and each has its own advantages and disadvantages. Here, I use a hash function to locate a record in a file. Other methods include maintaining index tables, sequential access, B+ trees, persistent pointers, etc.

Using the Code

I have used simple logic to construct the persistent hash table. If we fix the number of buckets, the bucket size, and the record size up front, storing and retrieving records becomes straightforward. To write or read a record, the key is passed to the hash function; from the hash value, the bucket offset is derived. We then seek to that offset and write or read the record. Before writing, a check is made whether a record is already present at that location; if one is, we jump to the next record slot, and the record is written into the first vacant slot found. The public interface is summarized below, followed by a usage sketch.
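For a user-defined key, the hash function passed to the second constructor must map the key to a bucket number in the range 0 to nBuck-1, because GetbucketOffset (shown below) multiplies the hash value directly by the bucket size. A minimal sketch of such a function, assuming a C-string key and the djb2 hash (both my choices for illustration, not part of the library; NUM_BUCKETS stands in for the nBuck value given to the constructor):

C++
//Hypothetical user hash matching the unsigned long (*uh)(void *)
//signature expected by the CHash constructor. NUM_BUCKETS is assumed
//to equal the nBuck value passed to CHash.
const unsigned long NUM_BUCKETS = 1024;

unsigned long StringHash(void *key)
{
	const char *s = (const char *)key;
	unsigned long h = 5381;//djb2 seed
	while (*s)
		h = h * 33 + *s++;//h = h*33 + c for each character
	return h % NUM_BUCKETS;//must land inside the table
}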

  • CHash(long nBuck,long bucksize,long recsize): one of the overloaded constructors, to which we pass the number of buckets, the bucket size, and the size of each record (the table size). For this table, the key must be long.
  • CHash(long nBuck,long bucksize,long recsize,unsigned long (*uh)(void *)): another constructor, to which a user-defined hash function pointer can be passed. The key can then be a user-defined type.
  • createdb: creates the hdb file with the given name, after which writes and reads can be made.
  • writedb: opens the file for writing.
  • readdb: opens the file for reading.
  • closedb: closes the file handle.
  • writerec: has two overloads, one for a long key and another for a user-defined key type. It writes the user's data into the bucket; if a record is already present (i.e., a collision occurred), it jumps to the next record slot until it finds a vacant place to write.
  • readrec: has two overloads, one for a long key and another for a user-defined key type. It reads the record at the bucket offset; if the value found is not the one searched for, readnextrec is used to fetch the next record, with the offset taken from the previous readrec call.
  • readnextrec: as mentioned above, used to read consecutive records in the same bucket by passing in the offset from readrec.
  • writeoffset, readoffset, GetbucketOffset: helper functions to move around the file for writing and reading.
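The exact argument lists for createdb, writedb, readdb, and closedb are not shown in this article, so the sketch below assumes createdb takes a file name and the others take no arguments; it also assumes the long-key writerec overload mirrors the readrec(long key, ...) overload shown further down. The Record struct is hypothetical.

C++
//Usage sketch only: Record, the no-argument open/close calls, and the
//long-key writerec are assumptions based on the interface described above.
struct Record { long key; char name[56]; };//hypothetical fixed-size record

CHash table(1024, 8 * sizeof(Record), sizeof(Record));//1024 buckets, 8 slots each
table.createdb("test.hdb");//create the backing file

Record in = { 42, "hello" };
table.writedb();//open for writing
table.writerec(in.key, &in);//hash the key, probe, write
table.closedb();

Record out;
long offset = 0;
table.readdb();//open for reading
table.readrec(in.key, &out, offset);//first record in the bucket
table.closedb();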
C++
int CHash::GetbucketOffset(long buckno)//returns the byte offset of a bucket
{
	return buckno * m_bucketsize;//bucket number times bucket size
}
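For example, with a bucket size of 4096 bytes, bucket 3 begins at byte offset 3 * 4096 = 12288 in the file.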
C++
int CHash::writerec(void* key,void *data)
{
	long buckno = GetbucketOffset(hash(key));//start of the key's bucket
	void *temp = malloc(m_recsize);
	while(0 != readoffset(buckno,temp,m_recsize))//slot already occupied?
	{
		buckno += m_recsize;//collision: go to the next record slot
	}
	free(temp);
	writeoffset(buckno,data,m_recsize);//write into the vacant slot
	return 0;
}
C++
int CHash::readrec(long key,void *data,long &offset)
{
	long buckno = GetbucketOffset(hash(key));//locate the bucket
	readoffset(buckno,data,m_recsize);//read its first record
	offset = buckno;//returned so readnextrec can continue from here
	return 0;
}
C++
int CHash::readrec(void* key,void *data,long &offset)
{
	long buckno = GetbucketOffset(hash(key));//user-defined hash this time
	readoffset(buckno,data,m_recsize);//read its first record
	offset = buckno;//returned so readnextrec can continue from here
	return 0;
}
C++
int CHash::readnextrec(long offset,void *data)
{
	offset += m_recsize;//advance one record slot
	readoffset(offset,data,m_recsize);
	return 0;
}
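Because readrec only returns the first record in a bucket, resolving a collision is the caller's job: compare the record found against the key you wanted, and keep calling readnextrec until it matches. A sketch of that loop, reusing the hypothetical Record from the usage sketch above and assuming a vacant slot reads back as zeroed bytes (a file layout assumption, not something the class guarantees):

C++
//Lookup loop sketch: probe the bucket until the stored key matches or
//an empty (zero-key) slot marks the end of the chain. Like writerec,
//this loop can walk past the bucket end if the bucket is full.
Record out = {0};
long offset = 0;
long wanted = 42;
table.readrec(wanted, &out, offset);//first slot of the bucket
while (out.key != wanted && out.key != 0)
{
	table.readnextrec(offset, &out);//reads the slot at offset + recsize
	offset += sizeof(Record);//readnextrec does not update offset for us
}
//out now holds the record when out.key == wanted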

Testhash.cpp is included to demonstrate how to use this class.

Limitations

Maybe in my next article, I will improve on the pre-definition of the bucket and table sizes, building the tables with some other class or technique instead. Another limitation I found is that if sufficient space is not allocated to a bucket, a full bucket overflows into its neighbour and data corruption can occur. All of these issues stem from the fixed sizes.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)