Alchemy: BitLists Mk1

Paul M Watt

0.00/5 (No votes)

12 Feb 2015CPOL8 min read

4.8K

Alchemy: BitLists Mk1

A continuation of a series of blog entries that documents the design and implementation process of a library. The library is called Network Alchemy[^]. Alchemy performs low-level data serialization. It is written in C++ using template meta-programming.

At this point, I was still in the make it work phase of implementation. However, I learned quite a bit from experiencing the progression of design and multiple implementations. So much that I thought it was valuable enough to share.

After the initial creation of the Datum type, it was relatively simple to abstract the design and implement the Bit-Field object. The next step was to combine the collection of BitField objects into a single group that could be statically-verified as type-safe. This was also relatively simple. Again I just had to model the implementation of the message structure. I called this collection a BitList. Primarily to differentiate it from the std::bitset.

The real trouble appeared when I began the integration of the BitList with the Message in a way that was similar to the Datum. At this point, I started to see the potential this library could provide. Through experience, I have learned that it is important to get the programming interface correct in the initial design. Any mistakes that were made, or code that is not as clean as you would like can be refactored or even re-implemented behind the original interface.

The BitList

My initial approach was straight-forward. As meta-programming was new to me, I looked for opportunities to take advantage of the compiler to verify invariants wherever I could apply them. Therefore, there is a type_check that verifies the number of bits specified by the user does not exceed the capacity of the data type used for storage.

C++

template < typename BaseT,
            size_t B0,   size_t B1=0, size_t B2=0, size_t B3=0,
            size_t B4=0, size_t B5=0, size_t B6=0, size_t B7=0
          >
struct BitSet
  // Are you receiving a compiler Error: 
  //    'Hg::type_check< false >' : base class undefined?
  // The sum of the provided bits must be &lt;= sizeof(T) in bits
  : type_check< (B0+B1+B2+B3+B4+B5+B6+B7)
              <=(sizeof(typename BaseT::value_type)*8)>
  , public BaseT
{...

I thought the best approach at the time was to store a reference to the primary value, to which all of the BitFields referred. This was an attempt to avoid a mechanism where the child calls the parent for the value then the operation is performed. I was focused on the general usage of the object, and didn't pay too much attention to the construction of this type. As you can see, the construction of this BitList is quite expensive compared to the amount of information that is encoded in the type.

C++

//  Typedef **************************************
typedef typename
  BaseT::value_type                      value_type;

BitList(value_type &data_field)
  : m_data(data_field)
  , m_field_0(GetFieldAddress< 0 >(m_field_0))
  , m_field_1(GetFieldAddress< 1 >(m_field_1))
  , m_field_2(GetFieldAddress< 2 >(m_field_2))
  , m_field_3(GetFieldAddress< 3 >(m_field_3))
  , m_field_4(GetFieldAddress< 4 >(m_field_4))
  , m_field_5(GetFieldAddress< 5 >(m_field_5))
  , m_field_6(GetFieldAddress< 6 >(m_field_6))
  , m_field_7(GetFieldAddress< 7 >(m_field_7))
{
  m_field_0.attach(m_data);  // The assignment in the
  m_field_1.attach(m_data);  // constructor assigns the
  m_field_2.attach(m_data);  // reference of data_field to
  m_field_3.attach(m_data);  // the data member, m_data.
  m_field_4.attach(m_data);
  m_field_5.attach(m_data);
  m_field_6.attach(m_data);
  m_field_7.attach(m_data);
}

There are two parts to the previous section of code, the initializer list, and the constructor body. The initializer list is used to associate address of the <var>BitField</var> object with the local reference held by the <var>BitList</var> container. This is how the container is able to enumerate over the values in the class. The second part attaches a reference of the internal data field with the child fields.

Because of this twisted setup, the majority of what remains is simple and looks much like the other value classes I have previously shown. The one oddity that remains is a set of static constants defined to help navigate the set of sub-fields. The code below depends on the knowledge of the size and relative offset for each value.

C++

static const size_t k_size_0   = B0;
  static const size_t k_offset_0 = 0;

  static const size_t k_size_1   = B1;
  static const size_t k_offset_1 = value_if< (B1>0), size_t, k_size_0, 0 >::value; 

  static const size_t k_size_2   = B2;
  static const size_t k_offset_2 = value_if< (B2>0 ), size_t, 
                                            k_offset_1 + k_size_1, 
                                            0>::value; 
  static const size_t k_size_3   = B3;
  static const size_t k_offset_3 = value_if< (B3>0 ), size_t, 
                                            k_offset_2 + k_size_2,
                                            0>::value; 
  // ... continue through 7

Those definitions allow the actual value types fields to be defined with the code below:

C++

value_type                                     &m_data;

  Bit_field< k_offset_0, B0, value_type>          &m_field_0;
  Bit_field< k_offset_1, B1, value_type>          &m_field_1;
  Bit_field< k_offset_2, B2, value_type>          &m_field_2;
  Bit_field< k_offset_3, B3, value_type>          &m_field_3;
  Bit_field< k_offset_4, B4, value_type>          &m_field_4;
  Bit_field< k_offset_5, B5, value_type>          &m_field_5;
  Bit_field< k_offset_6, B6, value_type>          &m_field_6;
  Bit_field< k_offset_7, B7, value_type>          &m_field_7;

Unfortunately with this design, there will always be eight sub-fields per <var>BitList</var> object, regardless of the actual number of bit-fields that are used. Imagine my surprise when shortly after I released this into the wild, I had requests to support 32 bit-fields per <var>Bitlist</var>. At that point, I started to think of ways that I could eliminate all of these unused references. Unfortunately, I was not able to implement my ideas until the second version of this type. So those ideas will have to wait.

Euphemism

To organize the Hg::BitList similarly to the message required an outer container to be defined for the user type. I am not going to show all of the details because they are not that interesting. Also, the current implementation for this feature in Alchemy barely resembles this concept at all. Here is a high-level overview of it's original structure.

Essentially, the MACRO below is a euphemism for the morass of partially defined parameterized errors waiting to be compiled.

C++

// MACRO declaration of a BitList
HG_BEGIN_BIT_LIST(uint32_t, info)
  HG_BIT_FIELD(0, 4, first)
  HG_BIT_FIELD(1, 4, second)
  HG_BIT_FIELD(2, 4, third)
  HG_BIT_FIELD(3, 4, fourth)
  HG_BIT_FIELD(4, 4, fifth)
  HG_BIT_FIELD(5, 4, sixth)
  HG_BIT_FIELD(6, 4, seventh)
  HG_BIT_FIELD(7, 4, eighth)
HG_END_BIT_LIST

The name <var>info</var> defines a new BitList type the user will place in the type-list definition that defines the overall message format.

In the end, I spent much more time on this section of code that I am comfortable admitting. So let me just show you a few snippets of what the MACRO definition hides and move away from this.

C++

// ...
// The header starts the declaration of a new 
// container type that becomes the BitList.
struct C                                                 \
    : public Base_bit_list                                 \
  {                                                        \
    typedef C                             this_type; \
    typedef T                             value_type;  \
    typedef Bit_field<0,0,value_type>     nil_bits_t;\
                                                          \
    enum { k_offset_0 = 0 };                               \
                                                          \
    template < typename IndexT,                           \
              typename BitT>                                \
    BitT& GetField(const BitT &)                         \
    { return GetFieldData(BitT()); }                       \
                                                           \
    nil_bits_t& GetFieldData(const nil_bits_t&)            \
    {                                                      \
      static nil_bits_t nil_bits;                          \
      return nil_bits;                                     \
    }

For each field that is defined, a collection of definitions is created by the MACRO that is similar to this:

C++

typedef FieldIndex< 2, this_type,4> idx_2;
typedef Bit_field  < k_offset_2, 4, value_type > third_t;
enum { k_offset_3 = k_offset_2 + N };
third_t   third;
third_t& GetFieldData(const third_t&)
{ return P; }

The definition above declares a new Hg::BitField type that meets that criteria for the proper index, offset and size of that field. A new instance of that time is defined with the name the user has specified, and a function is provided that returns a reference to that field. The function requires a const reference of an object of the same type as the desired field (I have since realized that pointers are much less expensive than const references and just as effective). Since all of the fields have different indices and offsets, there will be no problems with duplicate types.

The type definitions use a well-defined format that can be programmatically indexed. Therefore, developing functions to operate over the fields will be possible. However, there is still more work to integrate this type into the Alchemy framework.

Type-deduced Definitions

Throughout the development of this project, the template definitions continued to grow longer and more complex. In many cases, they weren't that difficult to deal with at all. Rather, they were tedious and required the parameterized types to be propagated through the depths of the interfaces to which they applied. It didn't take long for me to recognize that I could wrap all of the complex definitions into parent-type, and reference my types much like the meta-functions I have been creating. The only difference is in this case I truly do want a simpler definition of the type.

Here is what the definition of a Hg::BitSet with 8 fields that each used 4 bits would look like:

C++

Hg::BitSet< uint32_t, 4,4,4,4,4,4,4,4 >   info;

Because the type information is packaged in the convenient message type-list, all of the details required to reconstruct the Hg::BitList definition is still available, even when packaged like this:

C++

template< size_t    Idx,
          typename  format_type>
struct Format_bitset
{
  typedef 
    typename Hg::TypeAt < Idx,
                          format_type
                        >::type                   base_t;

  typedef Hg::BitSet< base_t,
                      base_t::idx_0::k_size,
                      base_t::idx_1::k_size,
                      base_t::idx_2::k_size,
                      base_t::idx_3::k_size,
                      base_t::idx_4::k_size,
                      base_t::idx_5::k_size,
                      base_t::idx_6::k_size,
                      base_t::idx_7::k_size
                    >                             type;
};

The type-list and index make the Hg::BitList much more accessible, considering all of the other utility functions and definitions that have been created to support the fundamental types.

Integrating as a Datum

At first glance, this does not appear to be a difficult issue to resolve. The primary issue is we need to make the Hg::BitList compatible with the existing Hg::Datum type. For a basic template, it would already meet that requirement. Through the use of static-polymorphism, a type only needs to contain the same member data and functions for all of the types to be compatible in an algorithm. They are not required to be tied together through inheritance, which is the basis of dynamic-polymorphism.

The difficulty I ran into, was the logic that I had created to programmatically navigate each field of a message were specialized for the Hg::Datum type. To specialize on a second distinct type would cause ripples through the entire API, and cause almost complete duplicate implementations for alternative specialization. This seems to defeat the purpose of templates. So I searched for a solution to bring the two types together.

BitListDatum

The solution I used for this first version was to create a class, Hg::BitListDatum, which derived from both of these classes. Here is the declaration of the type:

C++

template< size_t IdxT,
          typename format_type
        >
struct BitListDatum
  : public Hg::Format_bitlist< IdxT, format_type >::type
  , public Hg::Datum< IdxT, format_type >
{ };

The purpose of this class is to become a shallow adapter that is compatible with the definitions and types already defined for the Hg::Datum. The only challenge that I ran into was creating a definition that would allow the Hg::BitList type to be returned for the user to interact with, but still be a Hg::Datum. This is what I created:

C++

typedef typename
  Hg::Datum < Idx,
              format_type,
              StoragePolicy,
              kt_offset
            >               base_type;

operator base_type&()
{
  return *static_cast< base_type *>(this);
}
operator value_type() const
{
  return static_cast< const base_type*>(this)->operator value_type();
}

The base_type operator properly handles the interactions where a reference to the Hg::Datum type is required. In all other cases, the user can interact with the value_type operator that returns access to the child Hg::BitField objects by name.

Summary

After dynamically-sized arrays, the support of bit-fields has been the most tedious feature to implement in Alchemy. In fact, I do not think implementing support for arrays would have gone nearly as smoothly for me if I hadn't learned the lessons from implementing bit-field support.

Bit-fields are a feature that are not used much. However, they tend to appear quite frequently when dealing with network protocols or data-storage formats. Once you consider that the bit-field feature is not well defined for C and C++, it becomes clear that an alternate solution for providing bit-level access to the data is important. Mask and shift code is tedious and error prone, not to mention difficult to read if you do not see it often. Considering that the simplification of software maintenance is one of the primary goals of Alchemy, providing a feature such as the Hg::BitList is well worth the effort.

Original post blogged at Code of the Damned.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)