Introduction
The Standard Template Library bitset
class is a useful class suitable for most bitset operations, but if you are dealing with lots of large bitsets, then it may be that the STL bitset
class is missing a trick.
In many applications using bitsets, you will find that a typical bitset is either sparsely populated with bits, or conversely almost fully set. In these cases, a large bitset can use a lot of memory for what is actually being represented.
To overcome this, we have developed the CMJBitset
class as a plug in replacement for bitset. The CMJBitset
class depending on compilation options may take as little as 7 bytes to represent a bitset of any size, assuming all the bits are set or reset. In comparison, a 1024 bitset will take 128 bytes. In essence, the CMJBitset
operates by run length encoding a bitset if the bitset is either almost all set/reset, but otherwise uses the STL bitset
class. We will refer to the different encodings as normal, sparse and full.
Using the code
General Usage
To use the CMJBitset
class should be simple, replace bitset<N>
in your code with CMJBitset<N>
and include the CMJBitset.h header file. Since CMJBitset
presents the same interface as bitset
then this should be enough to get things working.
Loading/Saving the bitset
The class would not give a fraction of its potential benefit if you were unable to save and load the bitset to and from file. Hence member functions:
bool load(FILE *)
and:
bool save(FILE *)
are provided, but perhaps more to the point, all members of the class are public, enabling you to load and save at will.
Compilation Options
If you examine MJBitset.h, then there are a number of compilation options.
For debugging
This is used by the example project to test the CMJBitset
class. I would also recommend that when trying out the CMJBitset
class, you do some testing with this as well. As of the current date, a "full coverage" test has not been carried out on the class, and hence bugs are a definite possibility.
For maximum bitset size you need to deal with
The default is for a maximum of 0xffff.
#define CMJBITSET_USE_SHORT
The example project
The example project is a simple Win32 console application that attempts to regress the CMJBitset
by creating a succession of bitmaps with random population levels and performing a reasonably comprehensive set of operations on the bitsets.
You may easily alter the regression test to compare performance between CMJBitset
and bitset
.
Points of Interest
Performance
The performance of the CMJBitset
class, is quite hard to quantify. For bitsets that it does not encode as they are neither full or sparse, the CMJBitset
class is clearly several times slower than bitset
.
If you run the example regression test and make it use sparse bitsets, then the times will still be a little faster than bitset
. But the comparative speed will dramatically increase once the operations cease to be completely in memory either due to paging or loading/saving to files.
We have tested the class in the WhereWasI product and saw dramatic reductions in memory usages and very significant speed increases.
It is also notable that some operations with CMJBitset
are dramatically faster than bitset
, flip()
, for example, simply needs to invert the full and sparse representation type.
At the top of the source file, there are a number of compilation options which affect the way CMJBitset
allocates memory, the key option is set by default and reserves space within the class for bitsets with up to 4 bits set/unset. This often avoids the overhead of time and memory of using malloc
/free
, and in our examples, is a definite performance bonus. Your mileage may vary and you may want to tune performance by changing some of these values.
History
- June 14th 2004 - 1.00 released.
- June 18th 2004 - 1.01 released. Contains a number of optimizations