Introduction
Let's think a little about binary serialization. This problem is quite common and everyone has solved it at least once. But what if we introduce a number of serious restrictions to this problem:
- Our binary serializer must support versioning and, when needed, fall back to the closest compatible older version
- We want to know if no serializer version is compatible with the current one
- We don't want to describe each field's serialization for POD-structures
- We really want recursion when serializing structure members
- Each serialized structure version can have different fields, which may not be present in the structures of other versions
- And of course, we want INCREDIBLE speed, how else :), so all of the above should be checked during compilation.
The task is clear, and it seems there is nothing complicated about it :).
Background
Let's solve the problem. We need 4 common operations to be supported by our serializer:
- serialize
- deserialize (who needs our data buffer if we cannot recover useful data from it)
- compute the size needed for the operations above
- move intermediate objects
How can we serialize data? By copying all serializable members one after another, of course.
And what about data arrays and strings? Easy: we push the element count of the array first, and then serialize all elements one by one.
For example, an int32 will be serialized as 4 bytes:
XXXX
Arrays and strings will be laid out as follows (for x64):
SSSSSSSS DDDDDDDDDDD, where S is the count of items and D is the item data
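The layout above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code from the attachment: the helper name write_int_array is hypothetical, and the count is written as a 64-bit value in host byte order, as the x64 layout suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: writes the element count (8 bytes, host byte order),
// then every element. Returns the number of bytes written, or 0 when the
// buffer is too small (in which case nothing is written).
inline std::size_t write_int_array(const std::vector<int32_t>& items,
                                   uint8_t* buffer, std::size_t capacity) {
    assert(buffer != nullptr);
    const uint64_t count = items.size();
    const std::size_t needed = sizeof(count) + items.size() * sizeof(int32_t);
    if (capacity < needed) {
        return 0;
    }
    std::memcpy(buffer, &count, sizeof(count));             // SSSSSSSS
    if (!items.empty()) {
        std::memcpy(buffer + sizeof(count), items.data(),
                    items.size() * sizeof(int32_t));        // DDDD DDDD ...
    }
    return needed;
}
```

A vector of three int32 values therefore takes 8 + 3 * 4 = 20 bytes in this sketch.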
Using the Code
Let's define a version for our structures:
#define API_VERSION_MAJOR 1
#define API_VERSION_MINOR 30
The compiler must check our current structure version and then try to find the closest implementation. We must also keep the POD-structure check in mind, because every POD structure can be serialized by a simple copy. Here, you can also take into account the endianness of the platform and of the data in the buffer. The general shape of our serialization code will be:
template<typename __Type, uint32_t __IsPOD> struct Core {
    // Default case: no serializer is defined for this type.
    template<int32_t __Maj, int32_t __Min> struct Serializer {
        static bool proc(__Type& t, uint8_t*& buffer, uint32_t& size) {
            return false;
        }
    };
};
// POD types are serialized with a plain memory copy.
template<typename __Type> struct Core<__Type, 1> {
    template<int32_t __Maj, int32_t __Min> struct Serializer {
        static bool proc(__Type& t, uint8_t*& buffer, uint32_t& size) {
            const uint32_t typeSz = bsr_size(t);
            if (size >= typeSz) {
                ::memcpy(buffer, &t, typeSz);
                buffer += typeSz;   // advance the write position
                size -= typeSz;     // shrink the remaining space
                return true;
            }
            SERIALIZER_ASSERT(!"Too less buffer size for serialization!");
            return false;
        }
    };
};
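The POD branch above copies bytes in host order. If buffers have to travel between platforms with different endianness, one possible approach is to write integers byte by byte in a fixed order. The sketch below is an assumption about how that could look, not something taken from the attached code, and the names put_u32_le and get_u32_le are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: stores v in little-endian order on any host.
inline void put_u32_le(uint32_t v, uint8_t* out) {
    assert(out != nullptr);
    out[0] = static_cast<uint8_t>(v & 0xFFu);
    out[1] = static_cast<uint8_t>((v >> 8) & 0xFFu);
    out[2] = static_cast<uint8_t>((v >> 16) & 0xFFu);
    out[3] = static_cast<uint8_t>((v >> 24) & 0xFFu);
}

// Hypothetical helper: reads back a value written by put_u32_le.
inline uint32_t get_u32_le(const uint8_t* in) {
    assert(in != nullptr);
    return static_cast<uint32_t>(in[0]) |
           (static_cast<uint32_t>(in[1]) << 8) |
           (static_cast<uint32_t>(in[2]) << 16) |
           (static_cast<uint32_t>(in[3]) << 24);
}
```

The price is that a whole structure can no longer be copied with a single memcpy, so this trades some speed for portability.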
The standard type trait std::is_pod reports whether a type is POD (it is deprecated since C++20, but available in C++11 and C++14). In C++11 and C++14, POD is defined as follows:
A POD struct is a non-union class that is both a trivial class and a standard-layout class, and has no non-static data members of type non-POD struct, non-POD union (or array of such types). Briefly, PODs have no non-trivial constructors, copy or move operations, or destructors, no virtual functions or virtual base classes, and no non-POD members; their non-static data members must share one access level and be declared in a single class of the inheritance hierarchy.
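The definition can be checked directly at compile time. The sketch below spells it out as "trivial and standard-layout"; the trait name is_pod_like and the three sample structures are hypothetical and serve only as an illustration.

```cpp
#include <string>
#include <type_traits>

// Hypothetical trait: the C++11 POD definition written out explicitly.
template <typename T>
struct is_pod_like
    : std::integral_constant<bool, std::is_trivial<T>::value &&
                                   std::is_standard_layout<T>::value> {};

struct Point { int x; int y; };                   // plain data
struct Account { std::string login; int id; };    // has a non-POD member
struct Shape { virtual ~Shape() {} int sides; };  // has a virtual function

static_assert(is_pod_like<Point>::value, "Point can be copied with memcpy");
static_assert(!is_pod_like<Account>::value, "std::string member is not POD");
static_assert(!is_pod_like<Shape>::value, "virtual functions rule out POD");
```

Only Point would take the memcpy branch here; the other two need member-by-member serialization.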
The function that calls the serializer routine looks like this:
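template<typename __Type> SERIALIZER_INLINE
bool bsr_serialize(__Type& t, uint8_t*& buffer, uint32_t& size) {
    SERIALIZER_ASSERT(buffer != nullptr);
    SERIALIZER_ASSERT(size >= bsr_size(t));
    // Pick the POD or non-POD core, then start from the current API version.
    // The "template" keyword is required by conforming compilers for a dependent member template.
    return Core<__Type, std::is_pod<__Type>::value>::
        template Serializer<API_VERSION_MAJOR, API_VERSION_MINOR>::proc(t, buffer, size);
}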
Now we should look at the serializer's template specialization for each type. First of all, we need a default implementation, because the version must be decremented while the compiler is looking for a compatible one. Something like this:
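template<> struct Core<Test, 0> {
    template<int32_t __Maj, int32_t __Min> struct Serializer {
        static bool proc(Test& t, uint8_t*& buffer, uint32_t& size) {
            // No exact match for this version: fail below 0, otherwise try the previous minor version.
            // Concatenating __FUNCTION__ with a literal relies on MSVC treating it as a string literal.
            static_assert(__Min >= 0, __FUNCTION__ " is not defined for this version.");
            return Serializer<__Maj, __Min - 1>::proc(t, buffer, size);
        }
    };
};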
It is useful to put the struct Serializer code into a macro, because it must be declared in every template specialization for each type.
It's just a recursive call that decrements the minor version and stops with static_assert. So, if nothing is found, static_assert will help us detect this. The general scheme is now clear: there is a class Core, and there is a specialization for a serializable non-POD type that decrements the version while searching for a suitable candidate. If it doesn't find one, it falls into the default implementation.
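The fallback search can be seen in isolation in the following sketch. It is a simplified stand-in for the scheme described above, not the attached code: VersionedSerializer and resolved_minor are hypothetical names, and the templates live at namespace scope instead of inside Core.

```cpp
// Primary template: no serializer was written for <Maj, Min>,
// so fall back to the previous minor version.
template <int Maj, int Min>
struct VersionedSerializer {
    static_assert(Min >= 0, "no compatible serializer version was found");
    static constexpr int resolved_minor() {
        return VersionedSerializer<Maj, Min - 1>::resolved_minor();
    }
};

// Explicit specializations mark the versions that really exist.
template <>
struct VersionedSerializer<1, 2> {
    static constexpr int resolved_minor() { return 2; }
};

template <>
struct VersionedSerializer<1, 1> {
    static constexpr int resolved_minor() { return 1; }
};

// Version 1.30 has no serializer of its own, so the search stops at 1.2.
static_assert(VersionedSerializer<1, 30>::resolved_minor() == 2,
              "1.30 falls back to 1.2");
```

Because the whole search happens during template instantiation, the chosen version costs nothing at run time, and a major version without any specialization fails to compile as soon as the minor version drops below zero.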
Now, according to the fifth requirement from the introduction, we need to handle the fact that structures of different versions may lack some class members. The SFINAE ("Substitution Failure Is Not An Error") principle will help us. In short, when function overloads are resolved, template instantiations that fail substitution do not cause a compilation error; they are simply discarded from the list of candidate overloads. See the documentation for more information. The following macro defines a structure that checks at compile time whether a member exists.
#define SFINAE_DECLARE_MEMBER(parent,type,name) \
template<typename T> struct __sfiname_has_mem_ ## parent ## name { \
struct Fallback { type name; }; \
struct Derived : T, Fallback { }; \
template<typename C, C> struct ChT; \
template<typename C> static char(&f(ChT<type Fallback::*, &C::name>*))[1]; \
template<typename C> static char(&f(...))[2]; \
static bool const value = sizeof(f<Derived>(0)) == 2; \
};
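For comparison, the same question ("does this type have a member called id_2?") can be answered with the decltype-based detection idiom. Note that this is a different technique from the Fallback/Derived trick used in the macro above; the names make_void, has_id_2, TestV1 and TestV2 are hypothetical and exist only in this sketch.

```cpp
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch_detect {

// Maps any list of types to void (a C++11 stand-in for std::void_t).
template <typename...> struct make_void { typedef void type; };

// Primary template: assume the member is absent.
template <typename T, typename = void>
struct has_id_2 : std::false_type {};

// Chosen only when the expression "t.id_2" is well-formed for T;
// otherwise substitution fails silently and the primary template is used.
template <typename T>
struct has_id_2<T, typename make_void<decltype(std::declval<T&>().id_2)>::type>
    : std::true_type {};

}  // namespace sketch_detect

struct TestV1 { std::vector<int> id; std::vector<int> id_2; std::string login; };
struct TestV2 { std::vector<int> id; std::string login; };

static_assert(sketch_detect::has_id_2<TestV1>::value, "TestV1 has id_2");
static_assert(!sketch_detect::has_id_2<TestV2>::value, "TestV2 has no id_2");
```

Unlike the macro, this form only detects members that are accessible from the trait, so it is not a drop-in replacement in every case.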
The value member of the structure generated by SFINAE_DECLARE_MEMBER can be used as a template parameter. The following code calls the serialization function only if the template parameter is not 0, i.e., the member of the class/struct exists.
template<int enabled> struct InternalSerialize {
    // Used when the member exists: serialize t.name recursively.
    template<typename __Type> static bool proc(__Type& t, uint8_t*& buffer, uint32_t& size) {
        bool res = false;
        DEFINE_INIT_SIZE;
        res = binary_serialization::bsr_serialize(t.name, buffer, size);
        CHECK_BEC_SIZE(unique,type,name);
        return res;
    }
};
template<> struct InternalSerialize<0> {
    // Used when the member is absent: this branch must never be called.
    template<typename __Type> static bool proc(__Type& , uint8_t*& , uint32_t& ) {
        SERIALIZER_ASSERT(!"Unexpected serialize routine!");
        return false;
    }
};
Note that this code must be defined for every serializable member of every __Type (it refers to the member by name), so it is convenient to wrap it in a macro.
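The interplay between the member check and the enabled/disabled pair can be sketched without macros. Everything here is hypothetical (has_extra, CountExtra, extra_count and the two sample structures), and unlike the listing above, the disabled branch returns 0 instead of asserting.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical detection trait: true when T has a member named "extra".
template <typename T, typename = void>
struct has_extra : std::false_type {};
template <typename T>
struct has_extra<T, decltype(void(std::declval<T&>().extra))> : std::true_type {};

// Enabled branch: it may touch t.extra because it is only instantiated
// for types that have the member.
template <int Enabled>
struct CountExtra {
    template <typename T>
    static std::size_t proc(const T& t) { return t.extra.size(); }
};

// Disabled branch: never mentions t.extra, so it compiles for any type.
template <>
struct CountExtra<0> {
    template <typename T>
    static std::size_t proc(const T&) { return 0; }
};

// The detection result selects the branch at compile time.
template <typename T>
std::size_t extra_count(const T& t) {
    return CountExtra<has_extra<T>::value ? 1 : 0>::proc(t);
}

struct WithExtra { std::vector<int> id; std::vector<int> extra; };
struct WithoutExtra { std::vector<int> id; };
```

Calling extra_count on a WithoutExtra object compiles even though the structure has no such member, because only CountExtra<0>::proc is instantiated for it.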
Some useful serializable types need their own function overloads, for example (I will not consider custom allocators, because the code can easily be adapted to use them):
template<typename __Type> SERIALIZER_INLINE
bool bsr_serialize(__Type& t, uint8_t*& buffer, uint32_t& size);
template<typename __Type> SERIALIZER_INLINE
bool bsr_serialize(std::vector<__Type>& t, uint8_t*& buffer, uint32_t& size);
template<uint32_t __Sz> SERIALIZER_INLINE
bool bsr_serialize(wchar_t(&t)[__Sz], uint8_t*& buffer, uint32_t& size);
template<typename __Type> SERIALIZER_INLINE
bool bsr_serialize(std::basic_string<__Type>& t, uint8_t*& buffer, uint32_t& size);
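Declaring all overloads up front is what makes recursion over nested containers possible: each template can call the others regardless of definition order. The sketch below shows this with a hypothetical put function that appends to a growing byte vector, which is a simplification of the buffer pointer and size pair used in the article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

namespace sketch_put {

typedef std::vector<uint8_t> Bytes;

// Declare every overload first so that the templates can call each other.
template <typename T> void put(const T& v, Bytes& out);
template <typename T> void put(const std::vector<T>& v, Bytes& out);
template <typename C> void put(const std::basic_string<C>& s, Bytes& out);

// Generic case: trivially copyable values are appended byte by byte.
template <typename T>
void put(const T& v, Bytes& out) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "this type needs its own overload");
    const uint8_t* p = reinterpret_cast<const uint8_t*>(&v);
    out.insert(out.end(), p, p + sizeof(T));
}

// Vector: 64-bit count first, then each element through the overload set.
template <typename T>
void put(const std::vector<T>& v, Bytes& out) {
    put(static_cast<uint64_t>(v.size()), out);
    for (const T& item : v) {
        put(item, out);  // may recurse into vector or string overloads
    }
}

// String: 64-bit length first, then the characters.
template <typename C>
void put(const std::basic_string<C>& s, Bytes& out) {
    put(static_cast<uint64_t>(s.size()), out);
    for (C ch : s) {
        put(ch, out);
    }
}

}  // namespace sketch_put
```

With these three overloads, a std::vector<std::string> is written as its count followed by each string's own length and characters, without any extra code for the nested case.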
As a result, we can verify that a member exists and, if it does, run its serialization, which takes POD types into account and checks the version. The deserialization, size and move functions are defined similarly. You can find the final code in the attachment.
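To give an idea of the reverse direction, here is a minimal sketch of reading back a "count, then items" array with bounds checks. It is not the deserializer from the attachment: read_int_array is a hypothetical name, and the 64-bit count in host byte order is an assumption carried over from the x64 layout shown earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical reader: on success it fills `out`, advances `buffer` and
// shrinks `size`; on failure it leaves all three untouched.
inline bool read_int_array(std::vector<int32_t>& out,
                           const uint8_t*& buffer, std::size_t& size) {
    assert(buffer != nullptr);
    uint64_t count = 0;
    if (size < sizeof(count)) {
        return false;  // not even the count fits
    }
    std::memcpy(&count, buffer, sizeof(count));
    // Reject counts that cannot fit into the remaining bytes.
    if (count > (size - sizeof(count)) / sizeof(int32_t)) {
        return false;
    }
    const std::size_t items = static_cast<std::size_t>(count);
    out.resize(items);
    if (items != 0) {
        std::memcpy(out.data(), buffer + sizeof(count), items * sizeof(int32_t));
    }
    const std::size_t used = sizeof(count) + items * sizeof(int32_t);
    buffer += used;
    size -= used;
    return true;
}
```

Checking the count against the remaining size before resizing keeps a corrupted buffer from triggering a huge allocation or an out-of-bounds read.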
Example Usage
Let's talk about usage. I defined several macros in the final code, to make the usage simpler.
For example, we can use the following structure:
#define API_VERSION_MAJOR 1
#define API_VERSION_MINOR 30
struct Test {
    std::vector<int> id;
    std::string login;
};
namespace binary_serialization {
# include "binary_serializer.hpp"
DECLARE_SERIALIZABLE_MEMBER(Test, std::vector<int>, id);
DECLARE_SERIALIZABLE_MEMBER(Test, std::vector<int>, id_2);
DECLARE_SERIALIZABLE_MEMBER(Test, std::string, login);
template<> struct Core<Test, 0> {
    typedef Test Type_t;
    DEFAULT_IMPLEMENTATION(Type_t);
    template<> struct Serializer<1, 2> {
        static bool proc(Type_t& t, uint8_t*& buffer, uint32_t& size) {
            return
                _INTERNAL_SERIALIZE(Test, Type_t, std::vector<int>, id) &&
                _INTERNAL_SERIALIZE(Test, Type_t, std::string, login);
        }
    };
    template<> struct Serializer<1, 1> {
        static bool proc(Type_t& t, uint8_t*& buffer, uint32_t& size) {
            return
                _INTERNAL_SERIALIZE(Test, Type_t, std::vector<int>, id) &&
                _INTERNAL_SERIALIZE(Test, Type_t, std::vector<int>, id_2) &&
                _INTERNAL_SERIALIZE(Test, Type_t, std::string, login);
        }
    };
    template<> struct Move<1, 1> {
        static bool proc(Type_t& src, Type_t& dst) {
            return
                _INTERNAL_MOVE(Test, Type_t, std::vector<int>, id) &&
                _INTERNAL_MOVE(Test, Type_t, std::vector<int>, id_2) &&
                _INTERNAL_MOVE(Test, Type_t, std::string, login);
        }
    };
    template<> struct Move<1, 2> {
        static bool proc(Type_t& src, Type_t& dst) {
            return
                _INTERNAL_MOVE(Test, Type_t, std::vector<int>, id) &&
                _INTERNAL_MOVE(Test, Type_t, std::string, login);
        }
    };
    template<> struct Deserializer<1, 1> {
        static bool proc(Type_t& t, const uint8_t*& buffer, uint32_t& size) {
            return
                _INTERNAL_DESERIALIZE(Test, Type_t, std::vector<int>, id) &&
                _INTERNAL_DESERIALIZE(Test, Type_t, std::vector<int>, id_2) &&
                _INTERNAL_DESERIALIZE(Test, Type_t, std::string, login);
        }
    };
    template<> struct Deserializer<1, 2> {
        static bool proc(Type_t& t, const uint8_t*& buffer, uint32_t& size) {
            return
                _INTERNAL_DESERIALIZE(Test, Type_t, std::vector<int>, id) &&
                _INTERNAL_DESERIALIZE(Test, Type_t, std::string, login);
        }
    };
    template<> struct Size<1, 1> {
        static uint32_t proc(Type_t& t) {
            return
                _INTERNAL_SIZE(Test, Type_t, std::vector<int>, id) +
                _INTERNAL_SIZE(Test, Type_t, std::vector<int>, id_2) +
                _INTERNAL_SIZE(Test, Type_t, std::string, login);
        }
    };
    template<> struct Size<1, 2> {
        static uint32_t proc(Type_t& t) {
            return
                _INTERNAL_SIZE(Test, Type_t, std::vector<int>, id) +
                _INTERNAL_SIZE(Test, Type_t, std::string, login);
        }
    };
};
}
The main function of our test will look like this:
int main() {
    uint8_t buffer[1024];
    {
        // Serialize an initialized object into the buffer.
        Test t_1 = { { 1, 2, 3, 4, 5 }, "test_login" };
        uint8_t* buffer_ptr = buffer;
        uint32_t buffer_size = sizeof(buffer);
        binary_serialization::bsr_serialize(t_1, buffer_ptr, buffer_size);
    }
    {
        // Deserialize the same buffer into a fresh object.
        Test t_1;
        uint8_t const* buffer_ptr = buffer;
        uint32_t buffer_size = sizeof(buffer);
        binary_serialization::bsr_deserialize(t_1, buffer_ptr, buffer_size);
        printf("%s", t_1.login.c_str());
    }
    return 0;
}
This program just creates and initializes an object, serializes it to a buffer, and then deserializes it into another object.
In the disassembly (Visual Studio 2017, with /O2 optimization), you can see:
00000000011710FC mov ecx,8
0000000001171101 xor eax,eax
0000000001171103 mov r10,qword ptr [rsp+58h]
0000000001171108 mov r8,qword ptr [t_1]
000000000117110D sub r10,r8
0000000001171110 sar r10,2
0000000001171114 test r10,r10
0000000001171117 je main+0BCh (0117112Ch)
0000000001171119 nop dword ptr [rax]
0000000001171120 add rcx,4 ; the size for each element of 'id'
0000000001171124 inc rax
0000000001171127 cmp rax,r10
000000000117112A jb main+0B0h (01171120h)
000000000117112C cmp ecx,400h
0000000001171132 ja main+156h (011711C6h)
0000000001171138 mov qword ptr [rsp+20h],r10
000000000117113D movsd xmm0,mmword ptr [rsp+20h]
0000000001171143 movsd mmword ptr [rbp-70h],xmm0
0000000001171148 lea r9,[rbp-68h]
000000000117114C mov ecx,3F8h
00000000003B1151 xor edx,edx
00000000003B1153 test r10,r10
00000000003B1156 je main+114h (03B1184h)
00000000003B1158 cmp ecx,4 ; where vector elements are serialized
00000000003B115B jb main+156h (03B11C6h)
00000000003B115D mov eax,dword ptr [r8+rdx*4]
00000000003B1161 mov dword ptr [r9],eax
00000000003B1164 add r9,4
00000000003B1168 add ecx,0FFFFFFFCh
00000000003B116B inc rdx
00000000003B116E mov rax,qword ptr [rsp+58h]
00000000003B1173 mov r8,qword ptr [t_1]
00000000003B1178 sub rax,r8
00000000003B117B sar rax,2
00000000003B117F cmp rdx,rax
00000000003B1182 jb main+0E8h (03B1158h) ; loop if counter is below
00000000003B1184 mov r8,qword ptr [rsp+78h]
00000000003B1189 mov qword ptr [rsp+20h],r8
00000000003B118E cmp ecx,8
00000000003B1191 jb main+156h (03B11C6h)
00000000003B1193 movsd xmm0,mmword ptr [rsp+20h] ; to the buffer
00000000003B1199 movsd mmword ptr [r9],xmm0
00000000003B119E add ecx,0FFFFFFF8h
00000000003B11A1 mov eax,ecx
00000000003B11A3 cmp r8,rax
00000000003B11A6 ja main+156h (03B11C6h)
00000000003B11A8 test r8,r8
00000000003B11AB je main+156h (03B11C6h)
00000000003B11AD lea rdx,[rsp+68h]
00000000003B11B2 cmp qword ptr [rbp-80h],10h
00000000003B11B7 cmovae rdx,qword ptr [rsp+68h]
00000000003B11BD lea rcx,[r9+8]
00000000003B11C1 call memcpy (03B2B23h)
00000000003B11C6 lea rcx,[t_1]
00000000003B11CB call Test::~Test (03B1280h)
00000000003B11D0 xorps xmm0,xmm0
You can move the type serializer to another header file, define the different type versions in separate namespaces, and use a single serializer for these structures as follows:
namespace ver_2 {
#define API_VERSION_MAJOR 1
#define API_VERSION_MINOR 2
struct Test {
    std::vector<int> id;
    std::string login;
};
# include "serializer_base.hpp"
#undef API_VERSION_MAJOR
#undef API_VERSION_MINOR
}
namespace ver_1 {
#define API_VERSION_MAJOR 1
#define API_VERSION_MINOR 1
struct Test {
    std::vector<int> id;
    std::vector<int> id_2;
    std::string login;
};
# include "serializer_base.hpp"
#undef API_VERSION_MAJOR
#undef API_VERSION_MINOR
}
The code was tested in Visual Studio 2017.
Happy coding!
History
- 29th October, 2019: Initial version