This entry continues the series of blog entries that document the design and implementation process of a library called Network Alchemy. Alchemy performs data serialization and is written in C++. It is an Open Source project and can be found on GitHub.
Previously, I posted the first prototype, which demonstrates that the concept of Alchemy is both feasible and useful. However, that article ended up being much longer than I had anticipated, and I was unable to cover serializing the user object to and from a data stream. This entry finishes the prototype by adding serialization capabilities for the basic datum fields that have already been specified.
Message Buffer
One topic that has been glossed over up to this point is how memory will be managed for the messages that are passed around with Alchemy. The Alchemy message itself is a class object that holds a composited collection of Datum fields that are convenient for a user to access, just like a struct. Unfortunately, this format is not binary compatible or portable for message transfer on a network or storage to a file.
We will need a strategy to manage memory buffers. We could go with something similar to the standard BSD socket API and require that the user simply manage the memory buffer. This path is unsatisfying to me for two reasons:
- BSD sockets ignore the format of the data and simply set up end-points as well as read/write capabilities.
- Alchemy is an API that handles the preparation of binary data formats to create ABI compatible data-streams.
Ignoring the memory buffer used to serialize the data would provide only a marginal service to the user, not enough to make Alchemy a compelling choice whenever data must be serialized. Adding a memory management strategy to Alchemy requires only a small amount of extra effort on our part, yet provides enormous value to the user.
Considerations
It would be possible for us to create a solution that is completely transparent to the user with respect to memory management. The Message object could simply hide the allocations and management internally. A const shared_ptr could be given to the user once they call an accessor function like data(). However, experience has shown that developers have often already tackled memory management on their own.
Furthermore, even if they have not yet tackled the memory management problem, the abstractions that they have created around their sockets and other transport protocols have forced a mechanism upon the user. Therefore, I propose that we develop a generic memory buffer, one that meets our immediate development needs and also provides the flexibility to integrate other strategies in the future.
The Basics
There are four operations that must be considered when memory management is discussed. "FOUR?! I thought there were only two!" Go ahead and silently snicker at the other readers that you know made that exclamation because you were aware of the four operations:
- Allocation
- De-allocation
- Read
- Write
It is very easy to overlook that read and write must be considered when we discuss memory allocation. If we simply talk in terms of malloc/free, new/delete, or simply new for Java and C#, you allocate a buffer, and reads and writes are implicitly built into the language. This is only true for the fundamental types native to the language.
However, when you create an object, you control read and write access to the data with accessor functions for the specific fields of your object. In most cases, we are interested in keeping the concept of raw memory abstract inside of an object. We are managing a buffer of memory, and it is important for us to provide proper access to the locations within the buffer that correspond to the values advertised to the user through the Datum interfaces.
That brings to mind one last piece of information that we will want to have readily available at all times: the size of the buffer. This is true whether we choose a strategy that uses a fixed-size block of buffers, dynamically allocates the buffers, or adapts a buffer previously defined by the user.
The Policy Design Pattern
Strictly speaking, this is better known as the Strategy design pattern. I am sure there are other names as well, probably as many as there are ways to implement it. We are developing in C++, and this solution is traditionally implemented with a policy-based design. We want to create a memory buffer object that is universal to our message implementation in Alchemy. So far, we have not provided any hint of a special memory object to deal with in the Alchemy interface. I do not plan on changing this either.
However, we have already established that there are multiple ways that memory will be used to transfer and store data. A policy-based design will allow us to implement a single object that handles the details of managing a memory buffer and provides the correct read/write access, while still allowing the user to integrate their own memory management system with Alchemy. This design pattern is an example of the 'O' in the SOLID object-oriented methodology. The 'O' stands for the Open/Closed Principle: open for extension, closed for modification.
In order for a user to integrate their custom component, they will be required to implement a policy class that maps the four memory management functions mentioned above to a standard form that will be accessed by our memory buffer class. A policy class is a collection of constants and static member functions. Generally a struct is used because its members are public by default. The host class that is being extended expects a certain set of functions to be available in the policy type. The policy class is associated with the host class as a template parameter. The only requirement is that the policy class implements all of the functions and constants accessed by the policy host.
Policy Declaration
Here is the declaration for an Alchemy storage policy:
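#include <cstddef>   // size_t, std::ptrdiff_t
#include <memory>    // std::shared_ptr

struct StoragePolicy
{
  typedef unsigned char                  data_type;
  typedef data_type*                     pointer;
  typedef const data_type*               const_pointer;
  typedef std::shared_ptr< data_type >   s_pointer;

  // Allocates a buffer of size bytes.
  static
  s_pointer allocate(size_t size);

  // Releases the buffer held by spBuffer.
  static
  void deallocate(s_pointer &spBuffer);

  // Copies size bytes from (pBuffer + offset) into pStorage.
  static
  bool read ( const_pointer   pBuffer,
              void*           pStorage,
              size_t          size,
              std::ptrdiff_t  offset);

  // Copies size bytes from pStorage into (pBuffer + offset).
  static
  bool write( pointer         pBuffer,
              const void*     pStorage,
              size_t          size,
              std::ptrdiff_t  offset);
};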
The typedefs can be defined to any type that makes sense for the user's storage policy. The class doesn't even need to be named StoragePolicy or derived from it, because it will be used as a parameterized input type. The only requirement is that the type supports all of the declarations defined above. When this is put to use, it becomes an example of static polymorphism. This is the foundation that most of the C++ Standard Library (formerly STL) is built upon. The polymorphism is invoked implicitly, rather than explicitly by way of deriving from a base class and overriding virtual functions.
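To make static polymorphism a little more concrete, here is a minimal sketch, not Alchemy code. The names PlainCopyPolicy, StrictPolicy, store_byte, and policy_demo are hypothetical and exist only for illustration. Both policies expose the same static write function, and the host function template selects one at compile time without any base class or virtual function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical policy: a plain memcpy with no extra checks.
struct PlainCopyPolicy
{
  static bool write(unsigned char* pBuffer, const void* pStorage,
                    std::size_t size, std::ptrdiff_t offset)
  {
    std::memcpy(pBuffer + offset, pStorage, size);
    return true;
  }
};

// Hypothetical policy with the same interface that rejects null
// pointers and empty writes before copying.
struct StrictPolicy
{
  static bool write(unsigned char* pBuffer, const void* pStorage,
                    std::size_t size, std::ptrdiff_t offset)
  {
    if (!pBuffer || !pStorage || size == 0)
      return false;
    std::memcpy(pBuffer + offset, pStorage, size);
    return true;
  }
};

// The host only requires that PolicyT::write exists with this shape.
template <typename PolicyT>
bool store_byte(unsigned char* pBuffer, unsigned char value,
                std::ptrdiff_t offset)
{
  return PolicyT::write(pBuffer, &value, sizeof(value), offset);
}

// Usage: the policy is chosen through the template argument.
inline void policy_demo()
{
  unsigned char buffer[4] = {0, 0, 0, 0};
  assert(store_byte<PlainCopyPolicy>(buffer, 0x2A, 1));
  assert(buffer[1] == 0x2A);
}
```

Swapping the policy changes the behavior of store_byte without touching its body, which is the property the message buffer relies on.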
Policy Implementation
At this point, I am only concerned with leaving the door open to extensibility without major modifications in the future. That is my front-loaded excuse for why the implementations of these policy interface functions are so damn simple. Frankly, this code was originally implemented inline with the original message buffer class. I thought that it would be better to introduce this policy extension now, so that some other decisions that you will see in the near future make much more sense. Don't blink as you scroll down, or you may miss the implementation for the functions of the storage policy below:
Allocate
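static
s_pointer allocate(size_t size)
{
  // std::make_shared cannot adopt a raw pointer, so construct the
  // shared_ptr directly and supply an array deleter for the new[] block.
  // Requires <memory> and <new>. The result is null if allocation fails.
  s_pointer spBuffer( new(std::nothrow) data_type[size],
                      std::default_delete< data_type[] >());
  return spBuffer;
}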
Deallocate
static
void deallocate(s_pointer &spBuffer)
{
  spBuffer.reset();
}
Read
static
bool read ( const_pointer   pBuffer,
            void*           pStorage,
            size_t          size,
            std::ptrdiff_t  offset)
{
  ::memcpy( pStorage,
            pBuffer + offset,
            size);
  return true;
}
Write
static
bool write( pointer         pBuffer,
            const void*     pStorage,
            size_t          size,
            std::ptrdiff_t  offset)
{
  ::memcpy( pBuffer + offset,
            pStorage,
            size);
  return true;
}
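A quick round trip shows how the four functions fit together. This is only a sketch: it repeats the policy in compact form so that it compiles on its own, it assumes the shared_ptr is built with an array deleter, and round_trip is a hypothetical helper that is not part of Alchemy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <new>

struct StoragePolicy
{
  typedef unsigned char               data_type;
  typedef data_type*                  pointer;
  typedef const data_type*            const_pointer;
  typedef std::shared_ptr<data_type>  s_pointer;

  static s_pointer allocate(std::size_t size)
  {
    // Memory from new[] needs an array deleter inside shared_ptr<T>.
    return s_pointer(new (std::nothrow) data_type[size],
                     std::default_delete<data_type[]>());
  }

  static void deallocate(s_pointer& spBuffer) { spBuffer.reset(); }

  static bool read(const_pointer pBuffer, void* pStorage,
                   std::size_t size, std::ptrdiff_t offset)
  {
    std::memcpy(pStorage, pBuffer + offset, size);
    return true;
  }

  static bool write(pointer pBuffer, const void* pStorage,
                    std::size_t size, std::ptrdiff_t offset)
  {
    std::memcpy(pBuffer + offset, pStorage, size);
    return true;
  }
};

// Hypothetical helper: write a 32-bit value at an offset inside a
// 16-byte buffer, read it back, and release the buffer.
// The caller must keep offset + 4 within the 16 bytes.
inline std::uint32_t round_trip(std::uint32_t value, std::ptrdiff_t offset)
{
  StoragePolicy::s_pointer spBuffer = StoragePolicy::allocate(16);
  if (!spBuffer)
    return 0;

  std::uint32_t result = 0;
  StoragePolicy::write(spBuffer.get(), &value, sizeof(value), offset);
  StoragePolicy::read(spBuffer.get(), &result, sizeof(result), offset);

  StoragePolicy::deallocate(spBuffer);
  assert(!spBuffer);   // the shared_ptr no longer owns anything
  return result;
}
```

Note that the policy functions perform no bounds checking of their own; in this design that responsibility belongs to the buffer class that calls them.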
Message Buffer (continued)
I have covered all of the important concepts related to the message buffer: basic needs, extensibility, and adaptability. There isn't much left except to present the class declaration and clarify anything particularly tricky within the implementation of the actual class. Keep in mind this is an internal class, and we don't intend on providing direct user access to this particular object. The Alchemy class Hg::Message will be the consumer of this object.
Class Definition and Typedefs
Typedefs are extremely important when practicing generic programming techniques in C++. They provide the flexibility to substitute different types in the function declarations. In some cases, the types defined may seem silly, such as the size_type fields used in the STL. However, in our case, the definitions for data_type, pointer, and const_pointer become invaluable.
If it isn't obvious, the policy class that we just created is used as the template parameter for the MsgBuffer below. Further below, the function implementations show how the calls are made through the policy. We declared the functions static, so there is no need to create an instance of the policy.
One last note: Starting with C++11, alias declarations (the using keyword) are preferred over typedef. There are many advantages, some of which include partially defined template aliases, a more intuitive definition for function pointers, and the compiler preserving the name of the aliased type. Preservation of the type in the compiler error messages goes a long way towards improving the readability of template programming errors, especially template meta-programming errors.
template < typename StorageT >
class MsgBuffer
{
public:
  typedef StorageT                  storage_type;
  typedef typename
    storage_type::data_type         data_type;
  typedef typename
    storage_type::s_pointer         s_pointer;
  // Note: the policy declaration shown earlier does not list w_pointer;
  // any policy used with this class must provide it.
  typedef typename
    storage_type::w_pointer         w_pointer;
  typedef data_type*                pointer;
  typedef const data_type*          const_pointer;
};
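For comparison with the C++11 note above, here is a small sketch that places typedef next to the equivalent alias declarations. The names data_type_old, callback_old, data_type_new, callback_new, and shared_buffer are hypothetical and are not part of Alchemy.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>

// C++03 style: the new name is buried in the middle of the declaration.
typedef unsigned char data_type_old;
typedef void (*callback_old)(const data_type_old*, std::size_t);

// C++11 alias declarations: the new name always appears on the left.
using data_type_new = unsigned char;
using callback_new  = void (*)(const data_type_new*, std::size_t);

// An alias template, which typedef cannot express directly.
template <typename T>
using shared_buffer = std::shared_ptr<T>;

// Both spellings name exactly the same types.
static_assert(std::is_same<data_type_old, data_type_new>::value,
              "typedef and using name the same type");
static_assert(std::is_same<callback_old, callback_new>::value,
              "function pointer aliases are equivalent");
static_assert(std::is_same<shared_buffer<unsigned char>,
                           std::shared_ptr<unsigned char>>::value,
              "alias template expands to the underlying template");
```

The typedefs in MsgBuffer could be written either way; the choice mostly affects readability.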
Construction
MsgBuffer();
explicit
MsgBuffer(size_t n);
MsgBuffer(const MsgBuffer& rhs);
~MsgBuffer();
MsgBuffer& operator=(const MsgBuffer& rhs);
Status
For a construct like the message buffer, I like to use functions that are consistent with the naming and behavior of the standard library. If my development fits more closely in context with some other API, I will select names that match that environment instead.
bool empty() const;
size_t capacity() const;
size_t size() const;
void clear();
void resize(size_t n);
void resize(size_t n, byte_t val);
MsgBuffer clone() const;
const_pointer data() const;
Basic Methods
There was one mistake, or rather a learning experience, from my first attempt at this library. I did not provide a simple way for users to initialize an Alchemy buffer directly from a buffer of raw memory, even though in many cases that is how their memory was managed or made accessible to them. I had encouraged and intended for users to develop StoragePolicy objects to suit their needs. Instead, they would create convoluted wrappers around the main Message object to allocate and copy data into the message construct.
This time, I was sure to add an assign operation that would allow the initialization of the internal buffer from raw memory.
void zero();
void assign(const_pointer pBuffer, size_t n);
std::ptrdiff_t offset() const;
void offset(std::ptrdiff_t new_offset);
I would like to briefly mention the offset() property. It will not be used immediately; however, it becomes useful once I add nested Datum support, which will allow a message format to contain sub-message formats. The offset property allows a single MsgBuffer to be passed to the serialization of sub-structures without requiring a distinction between a top-level format and a nested format. When this becomes more relevant to the project, I will elaborate further on this topic.
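The idea behind the offset can be sketched without any of the Alchemy machinery. In the example below, OffsetView, read_byte, and first_field are hypothetical names used only for illustration. The same field-reading function serves a top-level format and a nested format because every position is interpreted relative to the view's offset.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a buffer that carries an offset property.
struct OffsetView
{
  const unsigned char* data;
  std::size_t          size;
  std::ptrdiff_t       offset;   // where this (sub-)format begins
};

// Reads one byte at pos, relative to the view's offset.
// Returns false if the resulting position is outside the buffer.
inline bool read_byte(const OffsetView& view, std::ptrdiff_t pos,
                      unsigned char& out)
{
  const std::ptrdiff_t total = view.offset + pos;
  if (total < 0 || static_cast<std::size_t>(total) >= view.size)
    return false;
  out = view.data[total];
  return true;
}

// Field-reading code that does not know, or care, whether it was
// handed a top-level format or a nested one.
inline unsigned char first_field(const OffsetView& view)
{
  unsigned char value = 0;
  read_byte(view, 0, value);
  return value;
}

// Usage: a nested format that starts four bytes into the same memory.
inline void offset_demo()
{
  const unsigned char bytes[6] = {0x10, 0x11, 0x12, 0x13, 0xA0, 0xA1};
  const OffsetView top    = {bytes, sizeof(bytes), 0};
  const OffsetView nested = {bytes, sizeof(bytes), 4};
  assert(first_field(top) == 0x10);
  assert(first_field(nested) == 0xA0);
}
```

Only the offset differs between the two views; the underlying memory is shared and nothing is copied.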
Getting Values
This function deserves an explanation. It is a template member function, that is, a parameterized member function that takes a template type parameter. An instance of the function will be generated for every type it is called with.
This function provides two benefits beyond allowing data to be extracted.
- A convenient interface is created for the user to get values without a typecast.
- Type-safety is introduced with this type-specific function. All operations on the value can have the appropriate type associated with them up through this function call. The call performs the conversion to a void* at the final moment, when data is read into the data type.
template < typename T >
size_t get_data(T& value, std::ptrdiff_t pos) const
{
  if (empty())
    return 0;

  std::ptrdiff_t total_offset = offset() + pos;
  size_t         bytes_read   = 0;

  // Only read when the whole value lies inside the buffer.
  if (  total_offset >= 0
     && static_cast< size_t >(total_offset) + sizeof(T) <= size())
  {
    bytes_read =
      storage_type::read( data(),
                          &value,
                          sizeof(T),
                          total_offset)
      ? sizeof(T)
      : 0;
  }

  return bytes_read;
}
Setting Values
This function is similar to get_data, and provides the same advantages. The only difference is this function writes user data to the buffer rather than reading it.
template < typename T >
size_t set_data(const T& value, size_t pos)
{
  if (empty())
    return 0;

  size_t total_offset =
    static_cast< size_t >(offset()) + pos;

  size_t bytes_written = 0;
  size_t total_size    = size();

  // total_offset is unsigned, so only the upper bound needs checking.
  if ((total_offset + Hg::SizeOf< T >::value) <= total_size)
  {
    bytes_written =
      storage_type::write ( raw_data(),
                            &value,
                            Hg::SizeOf< T >::value,
                            total_offset)
      ? Hg::SizeOf< T >::value
      : 0;
  }

  return bytes_written;
}
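To see the two accessors working together, here is a simplified, self-contained sketch. MiniBuffer is a hypothetical stand-in for MsgBuffer that uses a std::vector in place of the storage policy and has no offset property; it assumes T is a trivially copyable type such as an integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified stand-in for MsgBuffer.
class MiniBuffer
{
public:
  explicit MiniBuffer(std::size_t n) : m_bytes(n, 0) {}

  std::size_t size() const { return m_bytes.size(); }

  // Copies value into the buffer at pos; returns the bytes written,
  // or 0 if the value would not fit.
  template <typename T>
  std::size_t set_data(const T& value, std::size_t pos)
  {
    if (pos > size() || sizeof(T) > size() - pos)
      return 0;
    std::memcpy(&m_bytes[pos], &value, sizeof(T));
    return sizeof(T);
  }

  // Copies bytes at pos into value; returns the bytes read,
  // or 0 if the read would run past the end of the buffer.
  template <typename T>
  std::size_t get_data(T& value, std::size_t pos) const
  {
    if (pos > size() || sizeof(T) > size() - pos)
      return 0;
    std::memcpy(&value, &m_bytes[pos], sizeof(T));
    return sizeof(T);
  }

private:
  std::vector<unsigned char> m_bytes;
};

// Usage: write two fields back to back, then read them out again.
inline void mini_buffer_demo()
{
  MiniBuffer buffer(8);
  const std::uint16_t id    = 0x1234;
  const std::uint32_t value = 0xCAFEBABEu;
  buffer.set_data(id, 0);
  buffer.set_data(value, sizeof(id));

  std::uint32_t value_out = 0;
  buffer.get_data(value_out, sizeof(id));
  assert(value_out == value);
}
```

Because the type is deduced from the argument, the caller never writes a cast, and a request that would overrun the buffer simply reports zero bytes transferred.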
Summary
I have just presented the internal memory management construct that will be used in an Alchemy Message. We now have the final piece that will allow us to move forward and serialize the message fields programmatically into a buffer. My next entry on Alchemy will demonstrate how this is done.