Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

Bits Array Encapsulation

0.00/5 (No votes)
12 Dec 2004 1  
Encapsulate all bit stream operations in a class to handle all or most of bit stream functions.

Introduction

The Bit Array structure provides a compacted arrays of Booleans, with one bit for each Boolean value. A 0 (1) bit corresponds to the Boolean value false (true), respectively. We can look at a stream of bytes as a stream of bits; each byte contains 8 bits, so any n bytes hold n*8 bits. And the operation to manipulate this stream or bits array is so easy, jut read or change the bits' state or make any Boolean operation on the whole bits array, like �AND�, �OR�, or �XOR�.

Sample Image

As each byte contains 8 bits, we need to divide the bit number by 8 to reach the byte that holds the bit. Then, we can seek to the right bit in the reached byte by the remainder of dividing the bit number by 8. So to read or change the bit state, the operations will be like that:

    // returns bit state 0 or 1

    #define GetBit(a, b)                ((((BYTE*)a)[(b)>>3] >> ((b)&7)) & 1)
    // set bit Boolean value to 1

    #define SetBit(a, b)                (((BYTE*)a)[(b)>>3] |= (1 << ((b)&7)))
    // set bit Boolean value to 0

    #define ResetBit(a, b)              (((BYTE*)a)[(b)>>3] &= ~(1 << ((b)&7)))
    // toggle bit Boolean value

    #define XOrBit(a, b)                (((BYTE*)a)[(b)>>3] ^= (1 << ((b)&7)))
    
    // where 'a' is the byte stream pointer, and b is the bit number.

Note that to divide by 8, we need only to shift right by 3 (>>3), and to get the remainder of dividing by 8, we need only to AND with 7 (&7).

Bits Array Encapsulation:

To deal with the bits stream in more sophisticated operations like get �1�s count in the bits stream, or get the order of any bit that includes �1� value in the �1�s values, or AND all bits stream with another bits stream, it�ll be so hard to do all operation steps each time in the code. So, it is needed to encapsulate all bits stream operations in a class to handle all or most of bits stream functions. So, I did that in a simple class and named it CBitArray.

    class CBitArray
    {
        ...
    }

This class includes some useful functions like:

    // to get '1' bits count in the stream of bits

    GetCount()
    // to get the state of any bit '0' or '1'

    GetAt() 
    // to set the state of any bit to '1',

    // with dynamically allocating class internal buffer

    SetAt() 
    // to reset the state of any bit to '0'

    ResetAt()
    // to xor the state of any bit, with dynamically

    // allocating class internal buffer if needed

    XOrAt()    
    // to keep the bits stream in a compressed form

    // to save memory in cases of larg bitmaps

    Compress() 
    // to decompress bitmap to its normal state buffer

    Decompress()

All boolean operators can be applied with any two CBitArray objects (&|^). You can enjoy your time using this class and it has very fast and useful functions. I added many tricks to make it so fast like GetCount function which includes a very good trick to get the count in a very direct way without iterating each bit value; just keep all bits count of all byte values in an array, and just by each byte value, we can get bits count from direct array like this code:

int CBitArray::GetCount()
{
    if(m_nCount == -1)
    {
        static BYTE bitCount[256] = { 
            0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
            1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
            1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
            2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
            1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
            2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
            2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
            3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8 };
        BYTE by;
        m_nCount = 0;
        for(int nByte = 0; nByte < m_nLength; nByte++)
            if(by = m_pBuffer[nByte])
                m_nCount += bitCount;
    }
    return m_nCount;
}
// or

int CBitArray::GetCount()
{
    if(m_nCount == -1)
    {
        static BYTE bitCount[16] = { 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4 };
        BYTE by;
        m_nCount = 0;
        for(int nByte = 0; nByte < m_nLength; nByte++)
            if(by = m_pBuffer[nByte])
                m_nCount += by == 0xff ? 8 : bitCount + 
                bitCount[(by&0xf0) >> 4];
    }
    return m_nCount;
}
// or (thanks for Don Clugston)

int CBitArray::GetCount()
{
    if(m_nCount == -1)
    {
        BYTE by;
        m_nCount = 0;
        for(int nByte = 0; nByte < m_nLength; nByte++)
            if(by = m_pBuffer[nByte])
            {
                // add neighbouring bits. Each bit is 0 or 1.

                by = ((by&0xaa)>>1) +(by&0x55);
                // now each two bits of 'by' is a number 00,01 or 10.

                // now add neighbouring pairs

                by = ((by&0xcc)>>2) + (by&0x33);
                // now each nibble holds 0000-0100.

                // now add the nibbles

                m_nCount += by + (by>>4);
            }
    }
    return m_nCount;
}

The best thing that is done in this class is the indexing of the one's in the bits array, that is done by indexing all one's locations in the bitmap zero based, and keeping these indices in an array, taking in consideration that if number of '1's exceeds a certain value (SEG_COUNT), the indexing will be half index and the operation of seeking for any '1' index will be delayed a little bit to reach the right index that is not indexed depending on the previous indexed '1'; you can check that in the function Index() in the class code.

#define     SEG_COUNT        10240

void CBitArray::Index()
{    // (^_^) 21/2/2004 hmh

    if(GetLength() == 0)
        return;
    // calculate number of ones that will be include in each index

    m_nBitSeg = GetCount()/SEG_COUNT + 1;
    if(m_nIndexes)
        free(m_nIndexes);
    // allocate buffe of the indices array

    m_nIndexes = (int *)malloc(sizeof(int)*(m_nCount/m_nBitSeg+1));
    m_nIndexesCount = m_nCount = 0;
    BYTE by;
    // loop in the bitmap buffer to index '1's locations

    for(int nBit, nByte = 0; nByte < m_nLength; nByte++)
        // copy buffer byte into by and check if it is not 0

        if(by = m_pBuffer[nByte])
        {    // get bit number by multiply by 8 (or left shift by 3)

            nBit = nByte<<3;
            while(by)
            {    // if the first bit in the byte is '1'

                if(by&1)
                    // check if the bit in the head of the index

                    if(m_nCount++ % m_nBitSeg == 0)
                        // add this bit to the indices

                        m_nIndexes[m_nIndexesCount++] = nBit;
                // shift right to move second bit to the byte head

                by >>= 1, nBit++;
            }
        }
}

So, at any time after initializing the bitmap, you can get any '1' index bit, or get any bit index using the functions:

    int CBitArray::GetIndexBit(int nIndex);
    int CBitArray::GetBitIndex(int nBit);

The previous two functions contain a very complicated code, specially GetBitIndex. It is so difficult to describe it here, but any one can contact me for more details about bits array operations. And I'll write an article about bits array compression soon (run length), its code is included in the zip file, but it is not described here to not complicate this article.

CBitArray and STL bitset

There are a lot of differences between CBitArray and STL bitset:

  1. CBitArray can change its length (number of bits) dynamically during run time through many functions like: SetLength(), SetAt(), and others. So it can change dynamically while setting any bit out of its range, while bitset has a specific size that is fixed at compile time in accordance with the size specified by the template parameter N when the bitset<N> is declared.
  2. CBitArray can hold any size depending on the available memory, and can be in a compressed format to be compacted in the memory:
    CBitArray bitArray(true);

    And the function SetAt(int nBit) checks for the internal flag m_bCompressed and sets in bit in the compressed buffer, which is a very complicated operation. You can check its code in the static function SetAt:

    static bool SetAt(BYTE *&src, int &nSrcLen, int nBit);

    which can set nBit in the compressed input buffer.

  3. CBitArray supports all bitset functions and more except shift operators and to_string, as I didn't find any benefit in them.

  4. CBitArray indexes its bits to make it so fast to get any '1' index or any index bit, and to iterate through its '1's, while the bitset class does not have iterators and is not a Standard Template Library container.
    // example for fast iterate CBitArray '1's
    
    int nCount = bitArray.GetCount();
    for(int nIndex = 0; nIndex < nCount; nIndex++)
    {
        int nBit = bitArray.GetIndexBit(nIndex);
        // use the bit
    
    }

CBitArray Usage

// initialize CBitArray object

CBitArray a;
// set the bit 4578 in the bit array buffer (at byte 4578/8, at bit 4578%8)

a.SetAt(4578);
// set the bit 323 in the bit array buffer (at byte 323/8, at bit 323%8)

a.SetAt(323);
// get the count of '1's

int nCount = a.GetCount(); // return 2

// xor bit number 323

a.XOrAt(323);
// get the count of '1's

nCount = a.GetCount(); // return 1

// initialize CBitArray object

CBitArray b;
// attach buffer which is allocated with some bytes

b.Attach(buffer, length); 
// AND b with a

b &= a;
// initialize CBitArray object with the value of b

CBitArray c = b;
// ...

// and so on

Source code files

BitArray.cpp, BitArray.h, mem.cpp, mem.h

Thanks to...

I owe a lot to my colleagues for helping me in implementing and testing this code. (JAK)

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here