While many of the principles of developing robust software are easy to explain, it is much more difficult to know how and when to apply them. Practice and learning from mistakes is generally the most productive way to internalize these principles. However, it is far more desirable to understand the principles before a flawed system is built; the Tacoma Narrows bridge is a classic example from the physical world. Therefore, I am starting a journey to demonstrate how to create robust software. This is not a simple task that can be summarized in a magazine article, a chapter in a book, or a reference application with full source code.
This journey starts now, and I expect it to continue over the next few months in a series of entries. I will demonstrate and document the design and development of a small library intended to improve the quality of network communication software. Developing code for network data transmission is cumbersome and error prone. The bugs are generally subtle defects that may only surface once you use a new compiler or move to a new operating system or hardware platform. As with any type of software, there are also the bugs that can be introduced by changes. So let's begin by identifying what we need and the problems we are trying to avoid.
Inter-process Communication
The core topic of discussion is Inter-process Communication (IPC). IPC is required when two processes do not share the same memory pool by default. IPC mechanisms are used to communicate between the two separate processes, in the form of message transmissions and data transfers. There are many mediums used to communicate between two programs, such as:
- Shared Memory Pools
- Pipes
- Network Sockets
- Files
Regardless of the medium used to transfer data, the process of preparing the data remains the same. This process is repetitive, mundane, and very error prone, and it is the portion of IPC that I will focus on first. Let's break down the typical tasks required to communicate with IPC.
Serialize Structured Data
Data has many forms. A set of numbers, complex classes, raw buffers, and bitmap images are all examples of information that may be transferred between processes. It is important to remember that two different programs will most likely structure the data differently. Even though the information is the same, differences in compilers, languages, operating systems, and hardware platforms mean the data may be represented differently than in the program that originally created it. Therefore, it is very important to convert the data into a well-defined format shared between the two endpoints of the IPC session. A packet format definition is the standard method commonly used to define these serialized message formats.
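As a minimal sketch of the kind of repetitive code this implies, consider a hypothetical three-field message serialized by hand; the Position structure and its offsets are invented for illustration, and fixed-width types are used for clarity:
C/C++
#include <arpa/inet.h>  // htons, htonl; use <winsock2.h> on Windows
#include <cstdint>
#include <cstring>

// Hypothetical message used only for this illustration.
struct Position
{
  uint16_t id;
  uint32_t x;
  uint32_t y;
};

// Write each field at an explicit offset, in network byte order.
size_t Serialize(const Position& msg, char* buffer)
{
  uint16_t id = htons(msg.id);
  uint32_t x  = htonl(msg.x);
  uint32_t y  = htonl(msg.y);

  ::memcpy(buffer + 0, &id, sizeof(id));
  ::memcpy(buffer + 2, &x,  sizeof(x));
  ::memcpy(buffer + 6, &y,  sizeof(y));

  return 10;  // total bytes written to the wire
}
Every field needs the correct offset, width, and conversion; each one is a small opportunity for error, repeated for every message in the protocol.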
Interpret Serialized Data
Once the data has been transferred and received by the destination endpoint, it must be interpreted. This process is the inverse of serialization. However, the consumer of the data was not necessarily written by the same author as the source endpoint that created it. Therefore, the packet format definition should be used to interpret the data that has been received. A common implementation takes the data from the binary buffer and assigns the values into the fields of a structure for ease of access. This approach makes the code much clearer to understand.
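Continuing the hypothetical Position example from above, interpretation mirrors each step of serialization; transposing an offset or a conversion function in either direction is the classic mistake:
C/C++
// Counterpart to the Serialize sketch above: read each field from its
// defined offset and restore host byte order.
size_t Deserialize(Position& msg, const char* buffer)
{
  uint16_t id;
  uint32_t x;
  uint32_t y;

  ::memcpy(&id, buffer + 0, sizeof(id));
  ::memcpy(&x,  buffer + 2, sizeof(x));
  ::memcpy(&y,  buffer + 6, sizeof(y));

  msg.id = ntohs(id);
  msg.x  = ntohl(x);
  msg.y  = ntohl(y);

  return 10;  // total bytes consumed from the buffer
}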
Translate Data Messages
Another common operation is to take an incoming message in one format and convert it to a different message format before further processing. This is usually a simple task; however, if you have a large number of messages to convert, it quickly becomes mundane. In my experience, mundane tasks that require a large amount of repetitive code are very error prone due to copy/paste errors.
This is a great example of a frequently occurring problem that is usually solved by brute force. That should be a major concern from a maintenance perspective, because the brute-force solution generates an enormous amount of code to maintain, and there is a high probability that errors will be introduced if fundamental changes are ever made to this body of code. I plan on creating a simple and maintainable solution to this problem as part of Network Alchemy.
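To make the mundane nature concrete, here is a sketch of a single hand-written translation between two hypothetical versions of the same message; multiply this by dozens of message types and the copy/paste risk becomes evident:
C/C++
#include <cstdint>

// Two hypothetical versions of the same logical message.
struct PositionV1 { uint16_t id; uint32_t x; uint32_t y; };
struct PositionV2 { uint32_t id; uint32_t x; uint32_t y; uint32_t z; };

// Translate the old format into the new one, field by field.
PositionV2 Translate(const PositionV1& in)
{
  PositionV2 out;
  out.id = in.id;   // widened from 16 to 32 bits
  out.x  = in.x;
  out.y  = in.y;
  out.z  = 0;       // no source field; default the new value
  return out;
}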
Transmit Data
Transmitting the data is the reason we started this journey. Transmission of data is simple in concept, but in practice there are many potential pitfalls. The problems that must be handled depend on the medium used to transfer the data. Fortunately, many libraries have already been written to tackle these issues. Therefore, my primary focus will be to prepare the data for use with an API designed to transfer serialized data. Because of this, it will most likely be necessary to research a few of these APIs to be sure that our library can be integrated with relative ease.
Common Mistakes
We have identified the primary goals of our library. However, I would also like to take into account experience gained from similar projects. If I don't have personal experience related to a project, I at least try to heed the wisdom offered by others; I can always decide later whether the advice is useful or relevant to my situation. In this particular case, I have both made these mistakes myself and found and fixed similar bugs caused by others. I would like to know what I can do to keep these issues from appearing in the future.
Byte Order Conversions
Byte order conversion is necessary when data is transferred between two machines with different endianness. As long as your application or library will only ever run on a single platform type, this should not be a concern. However, because software tends to outlive its original purpose, I would not ignore this issue. Your application will work properly on the original platform, but if it is ever run between two different types of platforms, the issues will begin to appear.
Standard convention is to convert numbers to network byte order (Big Endian) before transferring them over the wire. Data extracted from the wire should then be converted back to host byte order. On Big Endian systems these conversions are no-ops; the work really only needs to be performed on Little Endian platforms. A small set of standard functions exists to minimize the amount of duplicate code that must be written:
C/C++
// Declared in <arpa/inet.h> on POSIX systems and <winsock2.h> on Windows.
uint16_t htons(uint16_t hostShort);   // host-to-network, 16 bits
uint32_t htonl(uint32_t hostLong);    // host-to-network, 32 bits
uint16_t ntohs(uint16_t netShort);    // network-to-host, 16 bits
uint32_t ntohl(uint32_t netLong);     // network-to-host, 32 bits
The issue appears when the data you transfer contains numeric fields of two or more bytes and the correct endianness is not used. The symptoms can be as subtle as incorrectly interpreted numbers, or as immediate and apparent as a program or thread crash. To further obfuscate the problem, it is quite common for the correct conversion functions to be used in most places, while a single instance is incorrectly converted in one of potentially hundreds of uses. During maintenance, this often occurs when the size of a field is changed but the conversion function used to swap the byte order before transport or interpretation is not.
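Here is a sketch of that exact maintenance defect, with hypothetical names and offsets:
C/C++
#include <arpa/inet.h>  // ntohs, ntohl; use <winsock2.h> on Windows
#include <cstdint>
#include <cstring>

// Hypothetical maintenance defect: field2 was widened from 16 to 32
// bits, but the byte-order conversion was never updated to match.
uint32_t ReadField2(const char* buffer)
{
  uint32_t field2;
  ::memcpy(&field2, buffer + 2, sizeof(field2));  // field2 at offset 2

  return ntohs(field2);    // Wrong: swaps (and truncates to) 2 bytes.
  // return ntohl(field2); // Correct: width matches the field.
}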
Abuse of the Type System
Serializing data between different processes requires a predefined agreement, a protocol, that dictates what should be sent and received. Data read from any source should be questioned and verified before it is accepted and processed further. This is true whether the source is a network communication stream, a file, or especially user input. The type information for the data structures we work with cannot be statically enforced by the compiler until we have verified the data and put it back into a well-defined and reliable type. More often than not, I have observed implementations that are content to pass around pointers to buffers and simply trust the contents encoded in the buffer.
Here is a very simple example to illustrate the concept. An incoming message is received and assumed to be one of the types of messages that can be decoded by the switch statement. The first byte of the message indicates the type of message, and from there, the void* is blindly cast to one of 256 possible message types indicated by that byte.
C/C++
// Hypothetical message types and tags, assumed for illustration.
struct MessageA { char type; /* ... */ };
struct MessageB { char type; /* ... */ };
const char k_typeA = 'A';
const char k_typeB = 'B';

void ProcessMessageA(MessageA* pMsg);
void ProcessMessageB(MessageB* pMsg);

void ProcessMsg(void* pVoid)
{
  if (!pVoid)
  {
    return;
  }

  // The first byte is trusted to identify the message type.
  char* pType = (char*)pVoid;
  switch (*pType)
  {
  case k_typeA:
    // Blind cast: nothing verifies the buffer really holds a MessageA.
    ProcessMessageA((MessageA*)pVoid);
    break;
  case k_typeB:
    ProcessMessageB((MessageB*)pVoid);
    break;
  }
}
Any address could be passed into this function, and it would be interpreted as if it were a valid message. This particular problem cannot be completely eliminated, simply because the information is eventually reduced to a block of encoded bytes. However, much can be done to improve the robustness of the code while continuing to provide convenience to the developer serializing data for transport. It is important that we keep as much type information as possible, through as much of the code as possible. When the type system is subverted, even more elusive bugs can creep into your system. It is possible to create code that appears to work correctly on one system yet becomes extremely difficult to port to other platforms; and even on the original system, the program may be running at reduced performance.
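As one modest, incremental improvement, sketched here with the same hypothetical declarations used above: carry the buffer length alongside the pointer, and verify both the type tag and the size before any cast is performed:
C/C++
#include <cstddef>

// Carry the buffer length with the pointer, and verify both the type
// tag and the size before any cast. Reuses the hypothetical MessageA,
// MessageB, and tag constants declared above.
bool ProcessMsgChecked(char* pBuffer, size_t length)
{
  if (!pBuffer || length == 0)
  {
    return false;
  }

  switch (pBuffer[0])
  {
  case k_typeA:
    if (length < sizeof(MessageA))
    {
      return false;        // Too small to hold a valid MessageA.
    }
    ProcessMessageA((MessageA*)pBuffer);
    return true;
  case k_typeB:
    if (length < sizeof(MessageB))
    {
      return false;
    }
    ProcessMessageB((MessageB*)pBuffer);
    return true;
  }

  return false;            // Unrecognized type tag.
}
This does not restore full type safety, but it rejects the most obviously invalid inputs at the boundary.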
Properly Aligned Memory Access
One of the key features of C++ is its ability to let us develop at a high level of abstraction or drop down and code right on top of the hardware. This allows it to run efficiently in all types of environments. Unfortunately, when you peek below the abstractions provided by the language, you must understand what you are now responsible for that the language normally provides for you. Memory access alignment is one of these concepts that is often hidden by the language. When its rules are broken, the best you can hope for is decreased performance; more likely, an unaligned memory access exception will be triggered by the processor.
Generally, a processor requires an address to be aligned with a granularity that matches the size of the data to be read. By definition, reading a single byte is always aligned. A 16-bit (2 byte) value, however, must be located at an address that is divisible by 2, and a 32-bit (4 byte) value must be aligned at an address divisible by 4. This rule is not absolute, due to the different ways processors perform memory access. Many processors will automatically perform a misaligned 4-byte read as a series of smaller one- or two-byte operations and combine the results into the expected 4-byte value. This convenience comes at a cost in efficiency. If a processor does not handle misaligned access, an exception is raised.
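For example, an address such as 0x1000 is valid for a 4-byte read, while 0x1002 is not. A quick alignment check, sketched here, makes the rule concrete:
C/C++
#include <cstdint>

// True when the address is a multiple of 4, and therefore safe to
// dereference as a 4-byte value on strict-alignment processors.
bool IsAligned4(const void* p)
{
  return (reinterpret_cast<uintptr_t>(p) % 4) == 0;
}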
This issue often confounds developers when they first run into it: "It works on system A, why does it crash on system B?" There is nothing syntactically wrong with the code, and even a thorough inspection would not reveal any glaring issues. Below I demonstrate two common implementations I have run across when working with serialized data communications, each of which can lead to a misaligned memory access violation.
Both examples use this structure definition to provide simplified access to the fields of the message. This structure is purposely designed to create a layout where field2 is not aligned on a 4-byte boundary:
C/C++
struct MsgFormat
{
  unsigned short field1;   // 2 bytes
  unsigned long  field2;   // 4 bytes (assumed)
  unsigned char  field3;   // 1 byte
};
Blind Copy
The code below assumes that the MsgFormat structure is laid out in memory exactly as the fields would be if you added up the size of each field and offset it from the beginning of the structure. Note also that this example ignores byte-order conversion.
C/C++
#include <cstring>   // ::memcpy

const size_t k_size = sizeof(MsgFormat);
char buffer[k_size];   // Assume this was filled by a network read.

MsgFormat packet;
::memcpy(&packet, buffer, k_size);

short value_a = packet.field1;
long  value_b = packet.field2;
The problem is that the compiler is allowed to take certain liberties with the layout of data structures in memory in order to produce optimal code. This is usually done by adding extra padding bytes throughout the structure to ensure the individual data fields are optimally aligned for the target architecture. This is what the structure will most likely look like in memory:
C/C++
struct MsgFormat
{
  unsigned short field1;
  unsigned short padding;   // inserted by the compiler to align field2
  unsigned long  field2;
  unsigned char  field3;    // likely followed by 3 tail-padding bytes
};
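A quick way to verify the actual layout on your target is to print the structure's size and field offsets. Assuming the MsgFormat definition above, on a platform where unsigned long is 4 bytes, the likely output is 12, 4, and 8:
C/C++
#include <cstddef>  // offsetof
#include <cstdio>

int main()
{
  printf("sizeof(MsgFormat)           : %zu\n", sizeof(MsgFormat));
  printf("offsetof(MsgFormat, field2) : %zu\n", offsetof(MsgFormat, field2));
  printf("offsetof(MsgFormat, field3) : %zu\n", offsetof(MsgFormat, field3));
  return 0;
}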
Dereferencing Unaligned Memory
This example demonstrates a different way to trigger an unaligned memory access violation. We recognized that byte-order conversion was not handled in the previous example, so let's rectify that:
C/C++
const size_t k_size = sizeof(MsgFormat);
char buffer[k_size];   // Assume this was filled by a network read.

MsgFormat packet;
packet.field1 = ntohs(*(unsigned short*)buffer);

// field2 follows field1 in the serialized stream, but this address is
// only 2-byte aligned; dereferencing it as a 4-byte value can fault.
char* p_field2 = buffer + sizeof(unsigned short);
packet.field2 = ntohl(*(unsigned long*)p_field2);
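One portable remedy, continuing from the declarations in the snippet above, is to memcpy each field into a properly aligned local variable before converting it; memcpy places no alignment requirements on its source address:
C/C++
// Copy each field into an aligned local before converting; memcpy has
// no alignment requirement on its source address.
unsigned short raw1;
unsigned long  raw2;

::memcpy(&raw1, buffer, sizeof(raw1));
::memcpy(&raw2, buffer + sizeof(raw1), sizeof(raw2));

packet.field1 = ntohs(raw1);
packet.field2 = ntohl(raw2);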
Even more insidious memory alignment issues can appear when message definitions become large sets of nested structures. Nesting is a good design decision in many ways; it helps organize and abstract groups of data. However, care must be taken to ensure that adding a field in a deeply nested child does not cause memory alignment issues. One must also always be aware that a pointer passed into a function may not be properly aligned. In practice, I think this is a responsibility that should fall upon the function caller; it is a rare but very real possibility to be aware of.
One final practice that could cause future pain is pointers embedded in the message structure. It is common to see offset fields that indicate the location of a block of data within a message. This can be done safely, as long as validity checks are made along the way.
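Here is a sketch of the kind of validity check I mean, using a hypothetical header layout; the offset and length are checked against the actual buffer size before a pointer is ever formed from them:
C/C++
#include <cstddef>
#include <cstdint>

// Hypothetical header locating a payload block within the message.
struct BlockHeader
{
  uint32_t payloadOffset;  // from the start of the message buffer
  uint32_t payloadLength;
};

// Check the offsets against the real buffer size before forming a
// pointer from them; the subtraction order avoids integer overflow.
const char* GetPayload(const char* pMsg, size_t msgSize, const BlockHeader& hdr)
{
  if (hdr.payloadOffset > msgSize ||
      hdr.payloadLength > msgSize - hdr.payloadOffset)
  {
    return 0;   // The header describes a block outside the buffer.
  }

  return pMsg + hdr.payloadOffset;
}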
Message Buffer Management
Much of the code that I have worked with did a fine job of managing buffers appropriately: allocations are freed as expected, and buffer lengths are verified. However, it appears that many of the misused techniques described above are actually driven by the developer's attempt to reduce memory allocations and eliminate copying. Therefore, another goal that I will try to integrate into Alchemy is efficient and flexible buffer management, specifically aiming for minimal allocations, minimal copying of buffers, and robust memory management.
Summary
We have identified a handful of common problems that we would like to address with the Network Alchemy library. The next step is to explore some strategies and determine what could feasibly solve these problems in a realistic and economical way. I will soon add an entry that describes the unit-test framework I prefer, as well as the tools I have developed over the years to improve my productivity when developing in an independent test harness.
Original post blogged at Code of the Damned.