Introduction
The Bit Array structure provides a compacted arrays of Booleans, with one bit for each Boolean value. A 0 (1) bit corresponds to the Boolean value false (true), respectively. We can look at a stream of bytes as a stream of bits; each byte contains 8 bits, so any n bytes hold n*8 bits. And the operation to manipulate this stream or bits array is so easy, jut read or change the bits' state or make any Boolean operation on the whole bits array, like �AND�, �OR�, or �XOR�.
As each byte contains 8 bits, we need to divide the bit number by 8 to reach the byte that holds the bit. Then, we can seek to the right bit in the reached byte by the remainder of dividing the bit number by 8. So to read or change the bit state, the operations will be like that:
#define GetBit(a, b) ((((BYTE*)a)[(b)>>3] >> ((b)&7)) & 1)
#define SetBit(a, b) (((BYTE*)a)[(b)>>3] |= (1 << ((b)&7)))
#define ResetBit(a, b) (((BYTE*)a)[(b)>>3] &= ~(1 << ((b)&7)))
#define XOrBit(a, b) (((BYTE*)a)[(b)>>3] ^= (1 << ((b)&7)))
Note that to divide by 8, we need only to shift right by 3 (>>3), and to get the remainder of dividing by 8, we need only to AND with 7 (&7).
Bits Array Encapsulation:
To deal with the bits stream in more sophisticated operations like get �1�s count in the bits stream, or get the order of any bit that includes �1� value in the �1�s values, or AND all bits stream with another bits stream, it�ll be so hard to do all operation steps each time in the code. So, it is needed to encapsulate all bits stream operations in a class to handle all or most of bits stream functions. So, I did that in a simple class and named it CBitArray
.
class CBitArray
{
...
}
This class includes some useful functions like:
GetCount()
GetAt()
SetAt()
ResetAt()
XOrAt()
Compress()
Decompress()
All boolean operators can be applied with any two CBitArray
objects (&|^). You can enjoy your time using this class and it has very fast and useful functions. I added many tricks to make it so fast like GetCount
function which includes a very good trick to get the count in a very direct way without iterating each bit value; just keep all bits count of all byte values in an array, and just by each byte value, we can get bits count from direct array like this code:
int CBitArray::GetCount()
{
if(m_nCount == -1)
{
static BYTE bitCount[256] = {
0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8 };
BYTE by;
m_nCount = 0;
for(int nByte = 0; nByte < m_nLength; nByte++)
if(by = m_pBuffer[nByte])
m_nCount += bitCount;
}
return m_nCount;
}
int CBitArray::GetCount()
{
if(m_nCount == -1)
{
static BYTE bitCount[16] = { 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4 };
BYTE by;
m_nCount = 0;
for(int nByte = 0; nByte < m_nLength; nByte++)
if(by = m_pBuffer[nByte])
m_nCount += by == 0xff ? 8 : bitCount +
bitCount[(by&0xf0) >> 4];
}
return m_nCount;
}
int CBitArray::GetCount()
{
if(m_nCount == -1)
{
BYTE by;
m_nCount = 0;
for(int nByte = 0; nByte < m_nLength; nByte++)
if(by = m_pBuffer[nByte])
{
by = ((by&0xaa)>>1) +(by&0x55);
by = ((by&0xcc)>>2) + (by&0x33);
m_nCount += by + (by>>4);
}
}
return m_nCount;
}
The best thing that is done in this class is the indexing of the one's in the bits array, that is done by indexing all one's locations in the bitmap zero based, and keeping these indices in an array, taking in consideration that if number of '1's exceeds a certain value (SEG_COUNT
), the indexing will be half index and the operation of seeking for any '1' index will be delayed a little bit to reach the right index that is not indexed depending on the previous indexed '1'; you can check that in the function Index()
in the class code.
#define SEG_COUNT 10240
void CBitArray::Index()
{
if(GetLength() == 0)
return;
m_nBitSeg = GetCount()/SEG_COUNT + 1;
if(m_nIndexes)
free(m_nIndexes);
m_nIndexes = (int *)malloc(sizeof(int)*(m_nCount/m_nBitSeg+1));
m_nIndexesCount = m_nCount = 0;
BYTE by;
for(int nBit, nByte = 0; nByte < m_nLength; nByte++)
if(by = m_pBuffer[nByte])
{
nBit = nByte<<3;
while(by)
{
if(by&1)
if(m_nCount++ % m_nBitSeg == 0)
m_nIndexes[m_nIndexesCount++] = nBit;
by >>= 1, nBit++;
}
}
}
So, at any time after initializing the bitmap, you can get any '1' index bit, or get any bit index using the functions:
int CBitArray::GetIndexBit(int nIndex);
int CBitArray::GetBitIndex(int nBit);
The previous two functions contain a very complicated code, specially GetBitIndex
. It is so difficult to describe it here, but any one can contact me for more details about bits array operations. And I'll write an article about bits array compression soon (run length), its code is included in the zip file, but it is not described here to not complicate this article.
CBitArray and STL bitset
There are a lot of differences between
CBitArray
and STL
bitset
:
CBitArray
can change its length (number of bits) dynamically during run time through many functions like: SetLength
(), SetAt
(), and others. So it can change dynamically while setting any bit out of its range, while bitset
has a specific size that is fixed at compile time in accordance with the size specified by the template parameter N
when the bitset<N>
is declared.
CBitArray
can hold any size depending on the available memory, and can be in a compressed format to be compacted in the memory: CBitArray bitArray(true);
And the function SetAt(int nBit)
checks for the internal flag m_bCompressed
and sets in bit in the compressed buffer, which is a very complicated operation. You can check its code in the static function SetAt
:
static bool SetAt(BYTE *&src, int &nSrcLen, int nBit);
which can set nBit
in the compressed input buffer.
CBitArray
supports all bitset
functions and more except shift
operators and to_string
, as I didn't find any benefit in them.
CBitArray
indexes its bits to make it so fast to get any '1' index or any index bit, and to iterate through its '1's, while the bitset
class does not have iterators and is not a Standard Template Library container.
int nCount = bitArray.GetCount();
for(int nIndex = 0; nIndex < nCount; nIndex++)
{
int nBit = bitArray.GetIndexBit(nIndex);
}
CBitArray Usage
CBitArray a;
a.SetAt(4578);
a.SetAt(323);
int nCount = a.GetCount();
a.XOrAt(323);
nCount = a.GetCount();
CBitArray b;
b.Attach(buffer, length);
b &= a;
CBitArray c = b;
Source code files
BitArray.cpp, BitArray.h, mem.cpp, mem.h
Thanks to...
I owe a lot to my colleagues for helping me in implementing and testing this code. (JAK)