Introduction
I got inspired by NoSQL databases and wanted to write one myself. After looking at a few very good open source C++ NoSQL projects such as MongoDB and LevelDB, I decided to build a key-value store of my own. I wanted a minimal set of goals that I could start and complete in a weekend, and this article is the result. I hope it is helpful to anyone who wants to develop a key-value store as a beginner.
Background
There are many ways to create a table and access it, and each has its own advantages and disadvantages. Here, I use a hash function to locate a record in a file. Other methods include maintaining index tables, sequential access, B+ trees, persistent pointers, etc.
Using the Code
I used a simple scheme to construct the persistent hash table. If we fix the number of buckets, the bucket size, and the record size up front, storing and retrieving records becomes easy. To write or read a record, the key is passed to the hash function, and the bucket offset is derived from the hash value. We then seek to that offset and write or read the record. Before writing, a check is made whether a record is already present at that location; if it is, we jump to the next record slot and write there.
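The internal hash() member is not shown in the listings below; a minimal sketch for the long-key case could look like the following. The member name m_numbuckets (holding the nBuck value from the ctor) is my assumption for illustration, and the second ctor would substitute the user-supplied function pointer instead:

// Sketch only: maps a long key to a bucket index in [0, m_numbuckets).
// m_numbuckets is an assumed member name for the ctor's nBuck argument.
unsigned long CHash::hash(void *key)
{
    // Default ctor case: the key type is long. Fold it into the bucket range.
    unsigned long k = (unsigned long)(*(long *)key);
    return k % (unsigned long)m_numbuckets;
}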
CHash(long nBuck, long bucksize, long recsize): one of the overloaded ctors, taking the number of buckets, the bucket size, and the size of each record. For this ctor, the key must be of type long.
CHash(long nBuck, long bucksize, long recsize, unsigned long (*uh)(void *)): another ctor, which additionally takes a user-defined hash function pointer, so the key can be of a user-defined type.
createdb: creates the hdb file with the given name, after which writes and reads can be made.
writedb: opens the file for writing.
readdb: opens the file for reading.
closedb: closes the file handle.
writerec: has two overloads, one for a long key and another for a user-defined key type. Writes the user data into the bucket; if a record is already present at that slot (i.e., a collision occurred), it jumps to the next record slot until it finds a vacant place to write.
readrec: has two overloads, one for a long key and another for a user-defined key type. Reads the record at the bucket offset; if the value found differs from the one searched for, readnextrec is used to fetch the next record, starting from the offset returned by the previous readrec call.
readnextrec: as mentioned above, reads consecutive records in the same bucket, given the offset from readrec.
writeoffset, readoffset, GetbucketOffset: helper functions to move around the file for writing and reading. A short usage sketch follows.
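Here is a minimal sketch of a typical round trip through the class. The record struct, the file name, and the exact createdb/writedb/readdb signatures are my assumptions for illustration, since those listings are not shown below:

struct Rec { long key; char name[56]; }; // hypothetical fixed-size record

int main()
{
    // 8 buckets, each sized to hold 4 records -- the sizes are illustrative.
    CHash table(8, 4 * sizeof(Rec), sizeof(Rec));
    table.createdb("demo.hdb");          // assumed to take the file name

    Rec in = { 42, "hello" };
    table.writedb();                     // open for writing
    table.writerec(in.key, &in);         // long-key overload
    table.closedb();

    Rec out;
    long offset = 0;
    table.readdb();                      // open for reading
    table.readrec(in.key, &out, offset);
    while (out.key != in.key)            // collision: probe within the bucket
    {                                    // (assumes the key is present)
        table.readnextrec(offset, &out);
        offset += sizeof(Rec);           // track where readnextrec just read
    }
    table.closedb();
    return 0;
}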
long CHash::GetbucketOffset(long buckno)
{
    // Each bucket occupies m_bucketsize bytes, so the file offset
    // of bucket N is simply N * m_bucketsize.
    return buckno * m_bucketsize;
}
int CHash::writerec(void *key, void *data)
{
    // Hash the key to find the bucket's starting offset in the file.
    long buckno = GetbucketOffset(hash(key));
    void *temp = malloc(m_recsize);
    // Linear probe: while a record already occupies this slot,
    // advance by one record size until a vacant slot is found.
    while (0 != readoffset(buckno, temp, m_recsize))
    {
        buckno += m_recsize;
    }
    free(temp);
    writeoffset(buckno, data, m_recsize);
    return 0;
}
int CHash::readrec(long key, void *data, long &offset)
{
    // Read the first record in the key's bucket and report its offset,
    // so the caller can continue with readnextrec on a mismatch.
    long buckno = GetbucketOffset(hash(&key));
    readoffset(buckno, data, m_recsize);
    offset = buckno;
    return 0;
}
int CHash::readrec(void *key, void *data, long &offset)
{
    // Same as above, but for a user-defined key type hashed by the
    // user-supplied function.
    long buckno = GetbucketOffset(hash(key));
    readoffset(buckno, data, m_recsize);
    offset = buckno;
    return 0;
}
int CHash::readnextrec(long offset, void *data)
{
    // Advance one record past the previously read offset and read there;
    // used to walk a bucket when readrec returned a different key.
    offset += m_recsize;
    readoffset(offset, data, m_recsize);
    return 0;
}
Testhash.cpp is included to give a fuller demo of how to use this class.
Limitations
Maybe in my next article, I can improve on the pre-definition of the bucket and table sizes; instead, I would build these tables with some other class or use some other technique. Another limitation I found is that if sufficient space is not allocated to a bucket, the probing in writerec can run past the end of the bucket and corrupt the data of the neighbouring bucket. All of this stems from the fixed sizes.
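For example, a bucket-boundary guard in writerec (not present in the code above) would at least turn that silent corruption into an explicit failure:

int CHash::writerec(void *key, void *data)
{
    long buckstart = GetbucketOffset(hash(key));
    long buckno = buckstart;
    void *temp = malloc(m_recsize);
    while (0 != readoffset(buckno, temp, m_recsize))
    {
        buckno += m_recsize;
        if (buckno + m_recsize > buckstart + m_bucketsize)
        {
            // Bucket is full: fail instead of spilling into the next bucket.
            free(temp);
            return -1;
        }
    }
    free(temp);
    writeoffset(buckno, data, m_recsize);
    return 0;
}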