C++ serialization framework/library

bishopnator29a

26 Jun 2016, CPOL, 23 min read

Saving, loading and data migration in a compact library

Download Serialization_v0.1.zip

Introduction

I needed to integrate a serialization mechanism (saving and loading of data) into my application, but I could not find an appropriate library that fulfilled my requirements. I focused mostly on the boost.serialization library, which also played a significant role in the design of the library presented here. My application uses custom allocators for all its objects, and I could not integrate them with the boost.serialization code, so I decided to write my own library that meets all my requirements.

Background

I have been using MFC serialization (CObject::Serialize) for a long time, but I didn't want to bring MFC into my code, so it was not an option. The goals listed at http://www.boost.org/doc/libs/1_61_0/libs/serialization/doc/ mostly applied to my application, but not all of them (portability, for example). For my library I focused on the following goals:

Serialization methods/functions without templates. I need to export serialization code from DLLs, and templates make that harder.
Custom memory management (arbitrary schemes: not only custom allocators but also class-specific new/delete operators)
Independent versioning for each class definition. That is, when a class definition changes, older files can still be imported into the new version of the class.
Deep pointer save and restore. That is, save and restore of pointers saves and restores the data pointed to.
Proper restoration of pointers to shared data.
Serialization of STL containers and other commonly used templates.
Non-intrusive. Permit serialization to be applied to unaltered classes. That is, don't require that classes to be serialized be derived from a specific base class or implement specified member functions. This is necessary to easily permit serialization to be applied to classes from class libraries that we cannot or don't want to have to alter.
The archive interface must be simple enough to easily permit creation of a new type of archive.
Support for complex migration (described below in the article)

Points 3-8 are taken from the boost page linked above.

I won't focus on boost.serialization in this article. I think it is a great library, useful in many cases, and it has excellent documentation. I should also say that I am not an expert in boost.serialization: I believed the problems I had in my code were impossible to overcome with it, and writing my own library seemed the only way to go. In the end I think the library I wrote can be useful to others, so I didn't want to keep it to myself.

Building the library

The library uses Boost and Visual Leak Detector.

http://www.boost.org/

https://vld.codeplex.com/

Download the libraries from the links above, extract them to your disk, edit the StartVS.bat file to set up the library paths, and start Visual Studio by running this batch file. The SerializationTests project requires the Boost.Test library to be built, but it is not needed to build the library itself.

Using the code

Let's start with an example (it is the example from the attached zip file, adapted into a single file):

C++

// Serialization
#include <Serialization/Archive/BinaryOutArchive.h>
#include <Serialization/Archive/BinaryInArchive.h>
#include <Serialization/Archive/OutArchiveStdFunctors.h>
#include <Serialization/Archive/InArchiveStdFunctors.h>
#include <Serialization/File/MemoryBinaryOutFile.h>
#include <Serialization/File/MemoryBinaryInFile.h>
#include <Serialization/DeclareMacros.h>
#include <Serialization/ImplementMacros.h>

// std
#include <cstdio>   // printf_s
#include <cstdlib>  // free
#include <string>

using namespace Serialization;

/// class to serialize
class MyData
{
public:
    /// construction
    MyData()
    : m_Value(0)
    { }
    MyData(int value, const std::string& name)
    : m_Value(value)
    , m_Name(name)
    { }

    ~MyData()
    { }

    /// get/set value
    int GetValue() const        { return m_Value; }
    void SetValue(int value)    { m_Value = value; }

    /// get/set name
    const std::string& GetName() const        { return m_Name; }
    void SetName(const std::string& name)    { m_Name = name; }

private:
    int                m_Value;
    std::string        m_Name;
};

/// serialization routines
void Save(BinaryOutArchive& ar, const MyData& data, const int /*classVersion*/)
{
    ar << data.GetValue();
    ar << data.GetName();
}
void Load(BinaryInArchive& ar, MyData& data, const int /*classVersion*/)
{
    int value;
    std::string name;
    ar >> value;
    ar >> name;
    data.SetValue(value);
    data.SetName(name);
}

/// enable serialization for MyData
DECLARE_TYPE_INFO_STRING_KEY(MyData);
IMPLEMENT_TYPE_INFO(MyData, "MyData", 0);
REGISTER_KEY_SERIALIZATION(BinaryInArchive, BinaryOutArchive, const char*);
REGISTER_CLASS_SERIALIZATION(BinaryInArchive, BinaryOutArchive, MyData);

//////////////////////////////////////////////////////////////////////////
int main()
{
    // save data
    void* pBuffer = nullptr;
    size_t bufferSize = 0;
    try
    {
        MyData data1(1, "MyData#1"), data2(2, "MyData#2"), data3(3, "MyData#3");
        MemoryBinaryOutFile fo(1024);
        BinaryOutArchive ao(fo);
        ao << data1 << data2 << data3;
        pBuffer = fo.Release(bufferSize);

        printf_s("Buffer created: %zu\n", bufferSize);
        printf_s("\tData1: value = %d; name = %s\n", data1.GetValue(), data1.GetName().c_str());
        printf_s("\tData2: value = %d; name = %s\n", data2.GetValue(), data2.GetName().c_str());
        printf_s("\tData3: value = %d; name = %s\n", data3.GetValue(), data3.GetName().c_str());
    }
    catch(SerializationException& e)
    {
        printf_s("Save error: %s\n", e.what());
        return 1;
    }

    printf_s("-------------------------------------------\n");

    // load data
    try
    {
        MyData data1, data2, data3;
        MemoryBinaryInFile fi(pBuffer, bufferSize);
        BinaryInArchive ai(fi);
        ai >> data1 >> data2 >> data3;

        printf_s("Loaded data:\n");
        printf_s("\tData1: value = %d; name = %s\n", data1.GetValue(), data1.GetName().c_str());
        printf_s("\tData2: value = %d; name = %s\n", data2.GetValue(), data2.GetName().c_str());
        printf_s("\tData3: value = %d; name = %s\n", data3.GetValue(), data3.GetName().c_str());
    }
    catch(SerializationException& e)
    {
        printf_s("Load error: %s\n", e.what());
    }

    // release the buffer taken over from MemoryBinaryOutFile::Release
    free(pBuffer);
    bufferSize = 0;
    return 0;
}

Serialization of user classes is enabled through template specialization. To make this easier, the library provides a set of macros that generate the required specializations. In the example above they are used here:

C++

/// enable serialization for MyData
DECLARE_TYPE_INFO_STRING_KEY(MyData);
IMPLEMENT_TYPE_INFO(MyData, "MyData", 0);
REGISTER_KEY_SERIALIZATION(BinaryInArchive, BinaryOutArchive, const char*);
REGISTER_CLASS_SERIALIZATION(BinaryInArchive, BinaryOutArchive, MyData);

The DECLARE_TYPE_INFO_STRING_KEY macro defines the key type bound to the class. The key is required for serializing pointers to the class: the archive must store the type information of a serialized pointer so that it can later create exactly the stored type and load the data into it. DECLARE_TYPE_INFO_STRING_KEY binds a string key type to the class; arbitrary key types are supported through the general macro DECLARE_TYPE_INFO(class_name, key_type_name).

The IMPLEMENT_TYPE_INFO macro binds a key value to the class. This is the value the archive writes to the file whenever the type must be saved. Keys must be unique across all classes that support serialization.

The REGISTER_KEY_SERIALIZATION macro binds a key type to the archives. Different archives can use different key types. When a class is serialized, its bound key type must match the key type bound to the archive; otherwise an exception is thrown.

The REGISTER_CLASS_SERIALIZATION macro registers a class with the archives, so that references and pointers to that class can be stored in those archives.
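To make the role of the key more concrete, here is a minimal, self-contained sketch of the general technique: a key-to-factory registry. The writer stores a string key in front of the object's data, and the reader looks the key up to create exactly the stored dynamic type before loading into it. All names (Shape, Circle, Square, Registry, WritePtr, ReadPtr) are hypothetical, a text stream stands in for an archive, and this is not the library's actual implementation.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical polymorphic hierarchy (not part of the library).
struct Shape {
    virtual ~Shape() = default;
    virtual std::string Key() const = 0;  // plays the role of the IMPLEMENT_TYPE_INFO key value
    virtual void Save(std::ostream& os) const = 0;
    virtual void Load(std::istream& is) = 0;
};

struct Circle : Shape {
    double r = 0;
    std::string Key() const override { return "Circle"; }
    void Save(std::ostream& os) const override { os << r << ' '; }
    void Load(std::istream& is) override { is >> r; }
};

struct Square : Shape {
    int side = 0;
    std::string Key() const override { return "Square"; }
    void Save(std::ostream& os) const override { os << side << ' '; }
    void Load(std::istream& is) override { is >> side; }
};

// Key -> factory table; filling it is the job of a "registration" step.
using Factory = std::function<std::unique_ptr<Shape>()>;
std::map<std::string, Factory>& Registry() {
    static std::map<std::string, Factory> registry;
    return registry;
}

// Writing through a base reference: store the key first, then the object's data.
void WritePtr(std::ostream& os, const Shape& s) {
    os << s.Key() << ' ';
    s.Save(os);
}

// Reading: read the key, create exactly that dynamic type, then load into it.
std::unique_ptr<Shape> ReadPtr(std::istream& is) {
    std::string key;
    is >> key;
    auto it = Registry().find(key);
    if (it == Registry().end())
        throw std::runtime_error("unknown type key: " + key);
    std::unique_ptr<Shape> obj = it->second();
    obj->Load(is);
    return obj;
}

// Saves a Circle and a Square through base references and reads them back.
std::vector<std::unique_ptr<Shape>> RoundTrip() {
    Registry()["Circle"] = [] { return std::unique_ptr<Shape>(new Circle); };
    Registry()["Square"] = [] { return std::unique_ptr<Shape>(new Square); };

    Circle c;
    c.r = 2.5;
    Square q;
    q.side = 4;

    std::stringstream ss;
    WritePtr(ss, c);
    WritePtr(ss, q);

    std::vector<std::unique_ptr<Shape>> loaded;
    loaded.push_back(ReadPtr(ss));
    loaded.push_back(ReadPtr(ss));
    return loaded;
}
```

Without the key in the stream, ReadPtr would have no way to know whether to create a Circle or a Square; this is also why keys must be unique.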

More advanced example

The AdvancedExample shows a more realistic use of serialization. AppVersion.h contains macros that define the application version. Each version changes the data in City.h, so migration is needed. I included different kinds of migration there; all of them could be handled directly in the Load methods.

Important files

City.h: declaration of the data types to be serialized
City.cpp: implementation of the data types
metaCity.cpp: serialization of the data types
Archive.h: declaration of the archives used by the types in City.h
Archive.cpp: implementation of the archives
AppVersion.h: defines the current application version (change the #define)

Application description

The application creates 4 items (cities) with predefined data values. Depending on the application version set in AppVersion.h, it creates only the data supported by that version. The data can be stored to a file, and a target application version can be selected when saving. So you can export data, switch the application to a higher version and then import the file; or export a file from a higher version targeting a lower version and then load it in that lower version.

Code documentation

Macro description

DeclareMacros.h contains the macros to be used in header files, and ImplementMacros.h contains the macros to be used in implementation files.

DeclareMacros.h

DECLARE_TYPE_INFO(class_name, key_type_name)

Binds a type to a key type. Specializations of the DirectValueReader and DirectValueWriter templates must be provided for key_type_name.

DECLARE_TYPE_INFO_STRING_KEY(class_name)

Binds a type to the std::string key type.

DECLARE_TYPE_INFO_WSTRING_KEY(class_name)

Binds a type to the std::wstring key type.

ImplementMacros.h

IMPLEMENT_TYPE_INFO(class_name, key, version_number, ...)

Implementation of the DECLARE_TYPE_INFO macro. Parameters:

class_name: name of the class
key: key value (must be unique across all IMPLEMENT_TYPE_INFO usages)
version_number: used to enable loading of older archives. Every time the set of class members changes, version_number should be increased.
...: list of parent classes (only those which support serialization)

REGISTER_CLASS_SERIALIZATION(in_archive_name, out_archive_name, class_name)

Creates specializations of TypedInArchiveObjectBinder and TypedOutArchiveObjectBinder. These specializations expect the class to provide the following two methods:

C++

void Save(out_archive_name& ar, const int classVersion) const;
void Load(in_archive_name& ar, const int classVersion);

or as stand-alone functions:

C++

void Save(out_archive_name& ar, const class_name& obj, const int classVersion);
void Load(in_archive_name& ar, class_name& obj, const int classVersion);

REGISTER_KEY_SERIALIZATION(in_archive_name, out_archive_name, key_type_name)

Binds key_type_name to the given archives. Only types registered with the same key type can be serialized with these archives.

TypedSharedPtrHolder.h

DECLARE_SHARED_PTR0, DECLARE_SHARED_PTR1, DECLARE_SHARED_PTR2

These macros enable serialization of shared pointers to the class. The number specifies how many parent classes the type has.

C++

class C_no_parents { ... };
DECLARE_TYPE_INFO_STRING_KEY(C_no_parents);
DECLARE_SHARED_PTR0(C_no_parents, std::shared_ptr<C_no_parents>);

class C_single_parent : public A { };
DECLARE_TYPE_INFO_STRING_KEY(C_single_parent);
DECLARE_SHARED_PTR1(C_single_parent, std::shared_ptr<C_single_parent>, std::shared_ptr<A>);

class C_two_parents : public B, public A { };
DECLARE_TYPE_INFO_STRING_KEY(C_two_parents);
DECLARE_SHARED_PTR2(C_two_parents, std::shared_ptr<C_two_parents>, std::shared_ptr<B>, std::shared_ptr<A>);

The library supports shared pointers only for classes with at most 2 parent classes, but DECLARE_SHARED_PTR3, DECLARE_SHARED_PTR4, etc. can easily be written if needed.

Customizing archive type

See BinaryInArchive and BinaryOutArchive for details. The important step is to inherit from the InArchive and OutArchive templates, which provide the streaming operators. The InArchive template expects the main archive class to have the member method:

C++

void Read(void* pBuffer, size_t size);

The OutArchive template expects following method:

C++

void Write(const void* pBuffer, size_t size);

You can add the Read method yourself or obtain it by adding BinaryInFileComposition as another parent class. Likewise, you can add the Write method yourself or obtain it by adding BinaryOutFileComposition as another parent class.
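The "base template provides the operators, derived class provides Read/Write" arrangement reads like the curiously recurring template pattern (CRTP). Assuming that is roughly how InArchive and OutArchive work, here is a minimal standalone sketch of the idea. SketchOutArchive, SketchInArchive, MemoryOutArchive and MemoryInArchive are hypothetical names, and the sketch handles only trivially copyable values; the real templates also dispatch to user Save/Load functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <utility>
#include <vector>

// CRTP base: provides operator<< and forwards raw bytes to DerivedT::Write.
template<typename DerivedT>
class SketchOutArchive {
public:
    template<typename T>
    DerivedT& operator<<(const T& value) {
        static_assert(std::is_trivially_copyable<T>::value, "sketch handles only trivially copyable types");
        DerivedT& self = static_cast<DerivedT&>(*this);
        self.Write(&value, sizeof(T));
        return self;
    }
};

// CRTP base: provides operator>> and pulls raw bytes from DerivedT::Read.
template<typename DerivedT>
class SketchInArchive {
public:
    template<typename T>
    DerivedT& operator>>(T& value) {
        static_assert(std::is_trivially_copyable<T>::value, "sketch handles only trivially copyable types");
        DerivedT& self = static_cast<DerivedT&>(*this);
        self.Read(&value, sizeof(T));
        return self;
    }
};

// Concrete output archive: only Write has to be implemented.
class MemoryOutArchive : public SketchOutArchive<MemoryOutArchive> {
public:
    void Write(const void* pBuffer, std::size_t size) {
        const unsigned char* p = static_cast<const unsigned char*>(pBuffer);
        m_Bytes.insert(m_Bytes.end(), p, p + size);
    }
    const std::vector<unsigned char>& Bytes() const { return m_Bytes; }
private:
    std::vector<unsigned char> m_Bytes;
};

// Concrete input archive: only Read has to be implemented.
class MemoryInArchive : public SketchInArchive<MemoryInArchive> {
public:
    explicit MemoryInArchive(std::vector<unsigned char> bytes) : m_Bytes(std::move(bytes)) {}
    void Read(void* pBuffer, std::size_t size) {
        if (size > m_Bytes.size() - m_Pos)
            throw std::runtime_error("read past end of archive");
        std::memcpy(pBuffer, m_Bytes.data() + m_Pos, size);
        m_Pos += size;
    }
private:
    std::vector<unsigned char> m_Bytes;
    std::size_t m_Pos = 0;
};
```

Because each operator returns the derived archive, chained calls such as `out << i << d` keep working on the concrete type, and a new archive only has to supply the byte-level Read or Write.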

Helper templates

The whole library customizes serialization of user types through template specializations. These specializations are used by the archive-object binders described in the Archive-object binders section below.

Non-default constructible classes

If a class doesn't provide a default constructor, specializations of ReadConstructDataImpl and WriteConstructDataImpl must be provided.

C++

template<typename ArchiveT, typename ObjectT, typename Enabled = void>
struct ReadConstructDataImpl
{
    static void Invoke(ArchiveT& /*ar*/, ObjectT* pMemory, const int /*classVersion*/)
    {
        // read input parameters
        ...
        // call class constructor
        ::new(pMemory) ObjectT(...); // pass input parameters to constructor
    }
};

template<typename ArchiveT, typename ObjectT, typename Enable = void>
struct WriteConstructDataImpl
{
    static void Invoke(ArchiveT& /*ar*/, const ObjectT& /*obj*/, const int /*classVersion*/)
    {
        // write all parameters to be able to call constructor in ReadConstructDataImpl::Invoke call
    }
};

Note that the last template parameter can be used to cover a whole group of classes with std::enable_if. Suppose you have an intermediate class A whose constructor takes 2 strings, and 3 classes derived from A with the same constructor signature (so that they can pass the arguments on to A's constructor). In this case you can write:

C++

template<typename ArchiveT, typename ObjectT>
struct WriteConstructDataImpl<ArchiveT, ObjectT, typename std::enable_if<std::is_base_of<A, ObjectT>::value>::type>
{
    static void Invoke(ArchiveT& ar, const ObjectT& obj, const int /*classVersion*/)
    {
        ar << obj.GetString1();
        ar << obj.GetString2();
    }
};

template<typename ArchiveT, typename ObjectT>
struct ReadConstructDataImpl<ArchiveT, ObjectT, typename std::enable_if<std::is_base_of<A, ObjectT>::value>::type>
{
    static void Invoke(ArchiveT& ar, ObjectT* pMemory, const int /*classVersion*/)
    {
        std::string s1, s2;
        ar >> s1 >> s2;
        ::new(pMemory) ObjectT(s1, s2);
    }
};
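The enable_if trick above can be tried outside the library. The following standalone sketch uses a hypothetical ConstructInPlace template (a stand-in for ReadConstructDataImpl with the archive replaced by plain arguments): one partial specialization handles A and every class derived from it, while other classes fall back to the primary template, which default-constructs. Memory comes from malloc and the object is built with placement new, mirroring the allocate-then-construct flow described in this article. A, B, C, Plain, Create and Destroy are illustrative names.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical hierarchy: A and its subclasses share the (string, string) constructor.
struct A {
    A(std::string s1, std::string s2) : m_S1(std::move(s1)), m_S2(std::move(s2)) {}
    virtual ~A() = default;
    const std::string& GetString1() const { return m_S1; }
    const std::string& GetString2() const { return m_S2; }
private:
    std::string m_S1, m_S2;
};
struct B : A { using A::A; };  // inherits A(std::string, std::string)
struct C : A { using A::A; };
struct Plain { int value = 7; };

// Primary template: default-construct in the provided memory.
template<typename ObjectT, typename Enabled = void>
struct ConstructInPlace {
    static ObjectT* Invoke(void* pMemory, const std::string&, const std::string&) {
        return ::new (pMemory) ObjectT();
    }
};

// One partial specialization covers A and every class derived from A.
template<typename ObjectT>
struct ConstructInPlace<ObjectT, typename std::enable_if<std::is_base_of<A, ObjectT>::value>::type> {
    static ObjectT* Invoke(void* pMemory, const std::string& s1, const std::string& s2) {
        return ::new (pMemory) ObjectT(s1, s2);
    }
};

// Allocate raw memory, then construct through whichever specialization matches.
template<typename ObjectT>
ObjectT* Create(const std::string& s1, const std::string& s2) {
    void* pMemory = std::malloc(sizeof(ObjectT));
    if (!pMemory)
        throw std::bad_alloc();
    try {
        return ConstructInPlace<ObjectT>::Invoke(pMemory, s1, s2);
    } catch (...) {
        std::free(pMemory);  // constructor failed: release the raw memory
        throw;
    }
}

// Matching cleanup: explicit destructor call, then free.
template<typename ObjectT>
void Destroy(ObjectT* pObject) {
    pObject->~ObjectT();
    std::free(pObject);
}
```

Adding a fourth subclass of A needs no new specialization; the is_base_of condition picks it up automatically.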

Serialization of types without DECLARE_TYPE_INFO

Types can also be serialized without DECLARE_TYPE_INFO. In that case, however, the library needs specializations of the DirectValueWriter and DirectValueReader templates.

C++

struct MyHelperDataType
{
    int a, b, c;
};
template<>
struct DirectValueWriter<MyOutArchive, MyHelperDataType>
{
    static void Invoke(MyOutArchive& ar, const MyHelperDataType& value)
    {
        ar << value.a << value.b << value.c;
    }
};

template<>
struct DirectValueReader<MyInArchive, MyHelperDataType>
{
    static void Invoke(MyInArchive& ar, MyHelperDataType& value)
    {
        ar >> value.a >> value.b >> value.c;
    }
};

The disadvantage is that pointers to such types cannot be serialized to or from archives. Data migration is also difficult when the members change, because DirectValueWriter and DirectValueReader receive no class version.
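The dispatch behind DirectValueWriter/DirectValueReader can be illustrated with a small standalone sketch: the archive's streaming operators always forward to a trait specialized per (archive, type) pair, and composite specializations reuse the leaf ones. TextOutArchive and TextInArchive are hypothetical text archives, and the traits here are local look-alikes with the same shape as the ones shown above, not the library's own definitions. Note that Invoke has no class-version argument, which is the migration limitation just mentioned.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Local look-alikes of the trait templates; no generic definition on purpose,
// so an unsupported type fails to compile instead of being written silently.
template<typename ArchiveT, typename T> struct DirectValueWriter;
template<typename ArchiveT, typename T> struct DirectValueReader;

// Hypothetical text archives whose operators always go through the traits.
struct TextOutArchive {
    std::ostringstream os;
    template<typename T>
    TextOutArchive& operator<<(const T& value) {
        DirectValueWriter<TextOutArchive, T>::Invoke(*this, value);
        return *this;
    }
};
struct TextInArchive {
    std::istringstream is;
    explicit TextInArchive(const std::string& text) : is(text) {}
    template<typename T>
    TextInArchive& operator>>(T& value) {
        DirectValueReader<TextInArchive, T>::Invoke(*this, value);
        return *this;
    }
};

// Leaf specializations for int.
template<> struct DirectValueWriter<TextOutArchive, int> {
    static void Invoke(TextOutArchive& ar, const int& v) { ar.os << v << ' '; }
};
template<> struct DirectValueReader<TextInArchive, int> {
    static void Invoke(TextInArchive& ar, int& v) { ar.is >> v; }
};

// A composite type serialized member by member through the int specializations.
struct MyHelperDataType { int a, b, c; };

template<> struct DirectValueWriter<TextOutArchive, MyHelperDataType> {
    static void Invoke(TextOutArchive& ar, const MyHelperDataType& value) {
        ar << value.a << value.b << value.c;
    }
};
template<> struct DirectValueReader<TextInArchive, MyHelperDataType> {
    static void Invoke(TextInArchive& ar, MyHelperDataType& value) {
        ar >> value.a >> value.b >> value.c;
    }
};
```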

Custom memory allocation support

C++

template<typename ArchiveT, typename ObjectT, typename Enabled = void>
struct AllocateDataImpl
{
    static void* Invoke(ArchiveT& /*ar*/, const int /*classVersion*/)
    {
        return malloc(sizeof(ObjectT));
    }
};
template<typename ArchiveT, typename ObjectT, typename Enabled = void>
struct DeallocateDataImpl
{
    static void Invoke(ArchiveT& /*ar*/, const int /*classVersion*/, void* pMemory)
    {
        free(pMemory);
    }
};

You can write your own specialization of those templates to provide your own allocation.
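As a sketch of such a specialization, the following routes the allocation of one type (a hypothetical Widget) to a counting heap, while every other type keeps the malloc/free default. So that the example compiles on its own, it re-declares local templates with the same shape as AllocateDataImpl and DeallocateDataImpl above; in real code you would specialize the library's templates instead. CountingHeap and DummyArchive are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Local stand-ins with the same shape as the primary templates shown above.
template<typename ArchiveT, typename ObjectT, typename Enabled = void>
struct AllocateDataImpl {
    static void* Invoke(ArchiveT& /*ar*/, const int /*classVersion*/) { return std::malloc(sizeof(ObjectT)); }
};
template<typename ArchiveT, typename ObjectT, typename Enabled = void>
struct DeallocateDataImpl {
    static void Invoke(ArchiveT& /*ar*/, const int /*classVersion*/, void* pMemory) { std::free(pMemory); }
};

struct DummyArchive {};
struct Widget { int id; };

// Hypothetical custom heap that counts live blocks.
struct CountingHeap {
    static int& LiveBlocks() { static int count = 0; return count; }
    static void* Allocate(std::size_t size) { ++LiveBlocks(); return ::operator new(size); }
    static void Release(void* p) { --LiveBlocks(); ::operator delete(p); }
};

// Partial specializations for Widget, valid for any archive type.
template<typename ArchiveT>
struct AllocateDataImpl<ArchiveT, Widget> {
    static void* Invoke(ArchiveT& /*ar*/, const int /*classVersion*/) {
        return CountingHeap::Allocate(sizeof(Widget));
    }
};
template<typename ArchiveT>
struct DeallocateDataImpl<ArchiveT, Widget> {
    static void Invoke(ArchiveT& /*ar*/, const int /*classVersion*/, void* pMemory) {
        CountingHeap::Release(pMemory);
    }
};
```

Allocation and deallocation must always be specialized as a pair, otherwise memory from one heap would be released into another.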

When the library needs to construct a value directly, it uses the ConstructDefaultValue class template. This template is also useful for custom memory allocation. For example, when a container of containers is read from an archive, there is otherwise no way to customize how the inner containers are constructed. The std container support uses ConstructDefaultValue to construct the stored elements, so an implementation can pass custom allocators to the constructors of those inner containers.

Serializing std::unique_ptr may also involve custom deleters. The library provides WriteUniquePtrDeleter and ReadUniquePtrDeleter to allow writing and reading custom data bound to deleters.

Support for shared pointers is discussed in a separate section.

Serializing parent class content

Because serialization can be implemented either as member methods or as stand-alone functions, it is not obvious how to serialize parent class data. To unify the calls regardless of how the parent class implements serialization, the library provides the template BaseObject<T>, whose template parameter is the parent type. When storing data, the parent type should be const T.

C++

void B::Save(MyOutArchive& ar, const int /*classVersion*/) const
{
    ar << BaseObject<const A>(*this);
}
void B::Load(MyInArchive& ar, const int /*classVersion*/)
{
    ar >> BaseObject<A>(*this);
}
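One way such a wrapper can unify the two styles is compile-time detection: if the parent type has a member Save, call it; otherwise call the free Save function found by argument-dependent lookup. The standalone sketch below shows that technique with hypothetical names (BaseObjectSketch, OutArchiveSketch, HasMemberSave); it illustrates the idea only and is not necessarily how the library's BaseObject is implemented.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

struct OutArchiveSketch { std::string log; };

// Hypothetical BaseObject-like wrapper: a reference typed as the parent class.
template<typename T>
struct BaseObjectSketch {
    explicit BaseObjectSketch(T& obj) : m_Obj(obj) {}
    T& m_Obj;
};

// Detects a member "void Save(OutArchiveSketch&, int) const".
template<typename T, typename = void>
struct HasMemberSave : std::false_type {};
template<typename T>
struct HasMemberSave<T, decltype(std::declval<const T&>().Save(std::declval<OutArchiveSketch&>(), 0), void())>
    : std::true_type {};

// Member style.
template<typename T>
void CallSave(OutArchiveSketch& ar, const T& obj, std::true_type) { obj.Save(ar, 0); }
// Stand-alone style: the free Save is found by argument-dependent lookup.
template<typename T>
void CallSave(OutArchiveSketch& ar, const T& obj, std::false_type) { Save(ar, obj, 0); }

template<typename T>
OutArchiveSketch& operator<<(OutArchiveSketch& ar, const BaseObjectSketch<const T>& base) {
    CallSave(ar, base.m_Obj, HasMemberSave<T>());
    return ar;
}

// Parent serialized with a member method.
struct ParentWithMember {
    void Save(OutArchiveSketch& ar, const int /*classVersion*/) const { ar.log += "member;"; }
};
// Parent serialized with a stand-alone function.
struct ParentWithFree {};
void Save(OutArchiveSketch& ar, const ParentWithFree& /*obj*/, const int /*classVersion*/) { ar.log += "free;"; }

// The child does not need to know which style each parent uses.
struct Child : ParentWithMember, ParentWithFree {
    void Save(OutArchiveSketch& ar, const int /*classVersion*/) const {
        ar << BaseObjectSketch<const ParentWithMember>(*this);
        ar << BaseObjectSketch<const ParentWithFree>(*this);
        ar.log += "child;";
    }
};
```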

Serialization::Access and friend access

If serialization is implemented with member methods, those methods can be private as long as Serialization::Access is granted friend access to the class.

C++

class MyClass
{
    friend Serialization::Access;
private:
    void Save(MyOutArchive& ar, const int classVersion) const;
    void Load(MyInArchive& ar, const int classVersion);
};
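The point of a single access class is that the serialization machinery calls user methods only through it, so one friend declaration is enough. A minimal standalone sketch of the pattern follows; Access here is a hypothetical local class, not the library's Serialization::Access, and SerializeThroughAccess stands in for whatever library code performs the call.

```cpp
#include <cassert>
#include <string>

struct MyOutArchive { std::string text; };

// Hypothetical stand-in for Serialization::Access: the single entry point
// through which serialization code calls a class's Save method.
class Access {
public:
    template<typename ArchiveT, typename T>
    static void Save(ArchiveT& ar, const T& obj, const int classVersion) {
        obj.Save(ar, classVersion);  // allowed because T befriends Access
    }
};

class MyClass {
    friend class Access;  // only Access may call the private Save
public:
    explicit MyClass(int value) : m_Value(value) {}
private:
    void Save(MyOutArchive& ar, const int /*classVersion*/) const { ar.text += std::to_string(m_Value); }
    int m_Value;
};

// Code elsewhere (e.g. an archive-object binder) goes through Access.
void SerializeThroughAccess(MyOutArchive& ar, const MyClass& obj) {
    Access::Save(ar, obj, 0);
}
```

Calling `obj.Save(...)` directly from SerializeThroughAccess would not compile, since Save is private.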

Shared pointers support

A shared pointer type must be declared for the class with the DECLARE_SHARED_PTR0 macro (or DECLARE_SHARED_PTR1 or DECLARE_SHARED_PTR2, depending on how many parent classes the type has). A specialization of the following template must also be provided:

C++

template<typename ArchiveT, typename SharedPtrT, typename Enabled = void>
struct CreateSharedPtrImpl
{
    static SharedPtrT Invoke(ArchiveT& /*ar*/, typename SharedPtrT::value_type* /*pMemory*/)
    {
        // wrap a created object to a shared pointer
        ...    
    }
};

Together with the specialization above, the library also expects the following specializations:

C++

template<typename T>
struct SharedPtrValueGetter
{
    static void* Invoke(const T& sharedPtr)
    {
        // extract raw pointer from a shared pointer
        ...
    }
};
template<typename WeakPtrT>
struct ToSharedPtr
{
    using SharedPtrT = ...; // shared pointer type corresponding to WeakPtrT
    static SharedPtrT Invoke(const WeakPtrT& ptr)
    {
        // convert a weak pointer to a shared pointer
        return ...;
    }
};
template<typename T, typename U>
struct UpCastSharedPtr
{
    static U Invoke(const T& /*ptr*/)
    {
        // convert shared pointer T to shared pointer U; it is up to the specialization to verify that the types are related and convertible
        return ...;
    }
};

The library has built-in support for std::shared_ptr and std::weak_ptr (see StdSharedPtrImpl.h).
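For orientation, here is one plausible way to fill in three of those hooks for std::shared_ptr and std::weak_ptr, written as a standalone sketch in a separate namespace. It is not the contents of StdSharedPtrImpl.h; the names mirror the templates above, and CreateSharedPtrImpl is left out because it depends on the library's allocation hooks. Animal and Dog are hypothetical test types.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

namespace sketch {
// Primary templates are only declared; each smart-pointer family gets specializations.
template<typename T> struct SharedPtrValueGetter;
template<typename WeakPtrT> struct ToSharedPtr;
template<typename T, typename U> struct UpCastSharedPtr;

// Raw address held by a std::shared_ptr (usable to recognize already-written shared data).
template<typename T>
struct SharedPtrValueGetter<std::shared_ptr<T>> {
    static void* Invoke(const std::shared_ptr<T>& sharedPtr) {
        return const_cast<void*>(static_cast<const void*>(sharedPtr.get()));
    }
};

// std::weak_ptr -> std::shared_ptr via lock(); empty if the object has expired.
template<typename T>
struct ToSharedPtr<std::weak_ptr<T>> {
    using SharedPtrT = std::shared_ptr<T>;
    static SharedPtrT Invoke(const std::weak_ptr<T>& ptr) { return ptr.lock(); }
};

// Upcast std::shared_ptr<T> to std::shared_ptr<U>, sharing ownership.
template<typename T, typename U>
struct UpCastSharedPtr<std::shared_ptr<T>, std::shared_ptr<U>> {
    static std::shared_ptr<U> Invoke(const std::shared_ptr<T>& ptr) {
        static_assert(std::is_convertible<T*, U*>::value, "T must derive from U");
        return std::shared_ptr<U>(ptr);
    }
};
} // namespace sketch

// Hypothetical types for trying the sketch out.
struct Animal { virtual ~Animal() = default; };
struct Dog : Animal {};
```

The static_assert turns an unrelated-types upcast into a compile-time error, which is the verification the comment in UpCastSharedPtr leaves to the specialization.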

Exception handling

Errors are reported through exceptions. All exceptions thrown by the library derive from Serialization::SerializationException and are declared in Exceptions.h.

STD containers

The library has built-in support for serializing STD containers; to use it, include InArchiveStdFunctors.h and OutArchiveStdFunctors.h. Serialization of Boost containers could be written as well, but Boost has so many containers that I decided not to write them.
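A common way to serialize a container is to write the element count first and then each element, recursing for nested containers. The standalone sketch below shows that layout with hypothetical ByteWriter/ByteReader helpers; the library's actual on-disk format for STD containers is not documented here and may differ. Counts are stored as 32-bit values in native byte order, so the format is not portable across architectures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical byte-buffer writer/reader used only for this illustration.
struct ByteWriter {
    std::vector<unsigned char> bytes;
    void Append(const void* p, std::size_t n) {
        const unsigned char* b = static_cast<const unsigned char*>(p);
        bytes.insert(bytes.end(), b, b + n);
    }
    void WriteU32(std::uint32_t v) { Append(&v, sizeof v); }
};

struct ByteReader {
    explicit ByteReader(std::vector<unsigned char> b) : bytes(std::move(b)) {}
    void Take(void* p, std::size_t n) {
        if (n == 0) return;
        if (n > bytes.size() - pos) throw std::runtime_error("truncated archive");
        std::memcpy(p, bytes.data() + pos, n);
        pos += n;
    }
    std::uint32_t ReadU32() { std::uint32_t v = 0; Take(&v, sizeof v); return v; }
    std::vector<unsigned char> bytes;
    std::size_t pos = 0;
};

// Strings: length, then characters.
void Write(ByteWriter& w, const std::string& s) {
    w.WriteU32(static_cast<std::uint32_t>(s.size()));
    w.Append(s.data(), s.size());
}
void Read(ByteReader& r, std::string& s) {
    s.resize(r.ReadU32());
    if (!s.empty()) r.Take(&s[0], s.size());
}

// Vectors: element count, then each element (nested containers recurse).
template<typename T>
void Write(ByteWriter& w, const std::vector<T>& v) {
    w.WriteU32(static_cast<std::uint32_t>(v.size()));
    for (const T& item : v) Write(w, item);
}
template<typename T>
void Read(ByteReader& r, std::vector<T>& v) {
    const std::uint32_t count = r.ReadU32();
    v.clear();
    v.reserve(count);
    for (std::uint32_t i = 0; i < count; ++i) {
        T item{};  // the point where a hook like ConstructDefaultValue could inject an allocator
        Read(r, item);
        v.push_back(std::move(item));
    }
}
```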

Serialization of template classes

Template classes are tricky: different template arguments produce different types, and the serialization library needs a unique key for every type it supports.

C++

template<typename T>
class MyTemplate
{
...
};
DECLARE_TYPE_INFO(MyTemplate, std::string); // <-- DOESN'T WORK !!

For templates, you can expand and adapt the code generated by the DECLARE_TYPE_INFO macro:

C++

namespace Serialization
{
namespace Detail
{
    template<typename T>
    struct TypeInfoTraits<MyTemplate<T>> // partial specialization
    {
        using value_type = MyTemplate<T>;
        using key_type = std::string; // or whatever key type is needed
    };
}
}

However, IMPLEMENT_TYPE_INFO cannot be handled this way, because a key value (like the string in the example above) must be provided for every type. Hence there must be one IMPLEMENT_TYPE_INFO per template instantiation.

C++

IMPLEMENT_TYPE_INFO(MyTemplate<int>, "MyTemplateInt", 0);
IMPLEMENT_TYPE_INFO(MyTemplate<char>, "MyTemplateChar", 0);
IMPLEMENT_TYPE_INFO(MyTemplate<bool>, "MyTemplateBool", 0);

Because IMPLEMENT_TYPE_INFO must be written for every template instantiation, I also write DECLARE_TYPE_INFO separately for those types, and I export the instantiated template types from the DLL if serialization lives in a separate module.
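The asymmetry between the two macros can be reproduced in plain C++. In this standalone sketch, a hypothetical TypeInfoTraitsSketch gets the key type for every MyTemplate<T> from one partial specialization, while the key value (TypeKeySketch) has to be spelled out per instantiation, just as one IMPLEMENT_TYPE_INFO is needed per instantiated template. These are local illustrations, not the library's Detail types.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

template<typename T>
class MyTemplate { public: T value{}; };

// Key *type*: one partial specialization covers every MyTemplate<T>.
template<typename T> struct TypeInfoTraitsSketch;
template<typename T>
struct TypeInfoTraitsSketch<MyTemplate<T>> {
    using value_type = MyTemplate<T>;
    using key_type = std::string;
};

// Key *value*: no generic definition; each instantiation needs its own entry.
template<typename T> struct TypeKeySketch;
template<> struct TypeKeySketch<MyTemplate<int>>  { static const char* Value() { return "MyTemplateInt"; } };
template<> struct TypeKeySketch<MyTemplate<char>> { static const char* Value() { return "MyTemplateChar"; } };
template<> struct TypeKeySketch<MyTemplate<bool>> { static const char* Value() { return "MyTemplateBool"; } };

// Returns the key of T as its declared key type.
template<typename T>
typename TypeInfoTraitsSketch<T>::key_type KeyOf() {
    return TypeKeySketch<T>::Value();
}
```

TypeInfoTraitsSketch<MyTemplate<double>> is valid, but KeyOf<MyTemplate<double>>() does not compile until a key value for that instantiation is added.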

Archive-object binders

ArchiveObjectBinder

Binds archive and object types together so that specific methods can be called on both types without common base classes. This class is the base class that defines the interface of the input and output archive-object binders.

InArchiveObjectBinder

Interface class for reading data from an archive to an object

void Read(BaseInArchive& ar, void* ptr, const int classVersion) const

Read content of an object.

void ReadConstructData(BaseInArchive& ar, void* ptr, const int classVersion) const

Read input data for constructing an object.

void* AllocateObject(BaseInArchive& ar, const int classVersion) const

Allocate an object (constructor is not called yet).

void DeallocateObject(BaseInArchive& ar, const int classVersion, void* pMemory) const

Release memory previously allocated by an AllocateObject call.

void DestructObject(BaseInArchive& ar, const int classVersion, void* pMemory) const

Call the destructor of the bound object.

std::unique_ptr<SharedPtrWrapper> CreateSharedPtr(BaseInArchive& ar, void* ptr) const

Wrap the created and loaded pointer in a shared pointer wrapper. Described in the Shared pointers support section.

void GetInputObjects(BaseInArchive& ar, void* ptr, LoadedPointerInfoArray& inputObjects) const

Support for complex object migration. Described in Complex migration support section.

bool PostLoad(BaseInArchive& ar, void* ptr, const int classVersion) const

Support for complex object migration. Described in Complex migration support section.

InArchiveKeyBinder

Interface for reading keys from an archive. The key describes the class type written in the archive. This binder is essential for writing and reading polymorphic pointers.

const TypeInfo::Key& Read(BaseInArchive& ar) const

Read a key from an archive.

DeferredInArchiveObjectBinder

Supports reading objects directly, for example objects stored in containers. This archive-object binder allows non-default-constructible objects to be used in std::vector-like containers.

OutArchiveObjectBinder

Interface class for writing data to an archive from an object.

void Write(BaseOutArchive& ar, const void* ptr, const int classVersion) const

Write the content of the input object.

void WriteConstructData(BaseOutArchive& ar, const void* ptr, const int classVersion) const

Write the data required by the constructor of the input object.

OutArchiveKeyBinder

Interface for writing keys to an archive. The key is bound to the class type to support deep writing of pointers.

void Write(BaseOutArchive& ar, const TypeInfo::Key& key) const

Write a key to an archive.

DeferredOutArchiveObjectBinder

Supports writing objects directly so that they can be loaded with DeferredInArchiveObjectBinder. It writes all data required to call the constructor of the written object, followed by the object's content.

template<typename ArchiveT, typename ObjectT> TypedInArchiveObjectBinder

Main implementation of InArchiveObjectBinder. It binds the actual archive and object types together. The implementation is customized through further template classes, so there is no need to reimplement the class for specific archives and class types; specializing the sub-templates is enough. ArchiveT must inherit from BaseInArchive, and ObjectT must be a non-pointer, non-const type. Instances of this class are created by the REGISTER_CLASS_SERIALIZATION macro. BaseInArchive ensures that all method arguments are of the correct types.

void Read(BaseInArchive& ar, void* ptr, const int classVersion) const

Call ObjectT::Load member method or stand-alone void Load(ArchiveT&, ObjectT&, int) function.

void ReadConstructData(BaseInArchive& ar, void* ptr, const int classVersion) const

For abstract classes it does nothing and must not be called (it throws an exception if called). For non-abstract classes it calls the default constructor. The behavior can be customized with the ReadConstructDataImpl template.

void* AllocateObject(BaseInArchive& ar, const int classVersion) const

Allocate memory for an object. For abstract classes it does nothing and must not be called (it throws an exception if called). For non-abstract classes it allocates memory with malloc. The behavior can be customized by specializing the AllocateDataImpl template.

void DeallocateObject(BaseInArchive& ar, const int classVersion, void* pMemory) const

Deallocates the memory of an object previously allocated by an AllocateObject call. For abstract classes it does nothing and must not be called (it throws an exception if called). For non-abstract classes it releases the memory with free. The behavior can be customized by specializing the DeallocateDataImpl template.

void DestructObject(BaseInArchive& ar, const int classVersion, void* pMemory) const

Calls the destructor of an object previously constructed by the ReadConstructData method. It must not be called for abstract classes (it throws an exception), because ReadConstructData cannot be called for abstract classes either. The behavior can be customized by specializing the DestructDataImpl template.

std::unique_ptr<SharedPtrWrapper> CreateSharedPtr(BaseInArchive& ar, void* ptr) const

Create a shared pointer wrapper. Described in Shared pointers support section.

void GetInputObjects(BaseInArchive& ar, void* ptr, LoadedPointerInfoArray& inputObjects) const

If complex migration is enabled for ArchiveT, it calls the ObjectT::GetInputObjects member method or the stand-alone void GetInputObjects(ArchiveT&, ObjectT&, LoadedPointerInfoArray&) function. Described in more detail in the Complex migration support section.

bool PostLoad(BaseInArchive& ar, void* ptr, const int classVersion) const

If complex migration is enabled for ArchiveT, it calls the ObjectT::PostLoad member method or the stand-alone bool PostLoad(ArchiveT&, ObjectT&, const int) function. Described in more detail in the Complex migration support section.

template<typename ArchiveT, typename KeyT> TypedInArchiveKeyBinder

Implementation of InArchiveKeyBinder for reading keys of a specific type. The specialization of this template is created by the REGISTER_KEY_SERIALIZATION macro.

template<typename ArchiveT, typename ObjectT> TypedDeferredInArchiveObjectBinder

Implementation of DeferredInArchiveObjectBinder. Instances of this class are created directly on the stack when an object must be loaded directly from the archive.

ObjectT Read(BaseInArchive& ar);

Read an object from the archive. ObjectT must be movable but does not need to be copyable.

template<typename ArchiveT, typename ObjectT> TypedOutArchiveObjectBinder

Main implementation of OutArchiveObjectBinder. It binds the actual archive and object types together. The implementation is customized through further template classes, so there is no need to reimplement the class for specific archives and class types; specializing the sub-templates is enough. ArchiveT must inherit from BaseOutArchive, and ObjectT must be a non-pointer, non-const type. Instances of this class are created by the REGISTER_CLASS_SERIALIZATION macro. BaseOutArchive ensures that all method arguments are of the correct types.

void Write(BaseOutArchive& ar, const void* ptr, const int classVersion) const

Call ObjectT::Save member method or stand-alone void Save(ArchiveT&, const ObjectT&, int) function.

void WriteConstructData(BaseOutArchive& ar, const void* ptr, const int classVersion) const

For abstract classes it does nothing and must not be called (it throws an exception if called). For non-abstract classes it uses the WriteConstructDataImpl template to customize behavior. By default the template does nothing, but you can write a specialization that stores the input data needed to construct the class.

template<typename ArchiveT, typename KeyT> TypedOutArchiveKeyBinder

Implementation of OutArchiveKeyBinder for writing keys of a specific type. The specialization of this template is created by the REGISTER_KEY_SERIALIZATION macro.

template<typename ArchiveT, typename ObjectT> TypedDeferredOutArchiveObjectBinder

Implementation of DeferredOutArchiveObjectBinder. Instances of this class are created directly on the stack when an object must be saved directly to the archive.

void Write(BaseOutArchive& ar, const ObjectT& obj)

Write an object to the archive. It writes the construct data followed by the content of the object.
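The write-construct-data-then-content scheme of the deferred binders is what makes non-default-constructible elements loadable into a std::vector. The standalone sketch below shows the idea with a hypothetical Point class and text streams instead of archives; WriteDeferred/ReadDeferred are illustrative names, not library functions. Labels are read with `>>`, so in this sketch they must not contain whitespace.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical element type without a default constructor.
class Point {
public:
    Point(int x, int y) : m_X(x), m_Y(y) {}
    int X() const { return m_X; }
    int Y() const { return m_Y; }
    const std::string& GetLabel() const { return m_Label; }
    void SetLabel(const std::string& label) { m_Label = label; }
private:
    int m_X, m_Y;
    std::string m_Label;
};

// Deferred write: construct data first (what the constructor needs), then the content.
void WriteDeferred(std::ostream& os, const Point& p) {
    os << p.X() << ' ' << p.Y() << ' ';  // construct data
    os << p.GetLabel() << ' ';           // content (whitespace-free in this sketch)
}

// Deferred read: construct on the stack from the construct data, load the content,
// then return by value so the caller can move it into a container.
Point ReadDeferred(std::istream& is) {
    int x = 0, y = 0;
    is >> x >> y;
    Point p(x, y);
    std::string label;
    is >> label;
    p.SetLabel(label);
    return p;
}

void WriteVector(std::ostream& os, const std::vector<Point>& v) {
    os << v.size() << ' ';
    for (const Point& p : v) WriteDeferred(os, p);
}

std::vector<Point> ReadVector(std::istream& is) {
    std::size_t count = 0;
    is >> count;
    std::vector<Point> v;
    v.reserve(count);  // reserve does not default-construct, unlike resize
    for (std::size_t i = 0; i < count; ++i)
        v.push_back(ReadDeferred(is));
    return v;
}
```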

Creating files compatible with older version of an application

In general, the library doesn't support this. If an application must export files that older versions of the same application can load, I recommend not using class versioning at all. With class versioning, the version of a file is defined by the set of all class versions serialized in it, so you would have to track those sets and export the matching class versions whenever one of the classes is about to change. Instead, I suggest using a single number to track the application file version and passing it to the archive. Every Save/Load method can then read this number and store/load only what existed in that version. A good practice here is to create an intermediate (not yet released) file version: when a class is about to change, its Save/Load code is copied and kept for saving/loading older versions, an if statement selects between the copies, and only the copy for the new version is adapted. This is very similar to class versioning, except that the same number is used for all classes.

C++

void Save(MyArchive& ar, const int classVersion) const
{
    if(ar.GetFileVersion() <= APP_FILE_VERSION_1_0)
    {
        // file version 1.0
        ar << m_Data1;
    }
    else if(ar.GetFileVersion() <= APP_FILE_VERSION_2_0)
    {
        // file version 2.0
        ar << m_Data1;
        ar << m_Data2; // new to 2.0 version
    }
    else
    {
        // most recent file version
        ar << m_Data1;
        ar << m_Data2; // new to 2.0 version
        ar << m_Data3; // new to 3.0 version
    }
}

Note that the last branch is also active for versions 4.0, 5.0, etc. If, for example, the class didn't change between versions 3.0 and 5.0 but changes in 6.0, the Save method would look like this:

C++

void Save(MyArchive& ar, const int classVersion) const
{
    if(ar.GetFileVersion() <= APP_FILE_VERSION_1_0)
    {
        // file version 1.0
        ar << m_Data1;
    }
    else if(ar.GetFileVersion() <= APP_FILE_VERSION_2_0)
    {
        // file version 1.1 - 2.0
        ar << m_Data1;
        ar << m_Data2; // new to 2.0 version
    }
    else if(ar.GetFileVersion() <= APP_FILE_VERSION_5_0)
    {
        // file versions 2.1 - 5.0
        ar << m_Data1;
        ar << m_Data2; // new to 2.0 version
        ar << m_Data3; // new to 3.0 version
    }
    else
    {
        // most recent file version
        ar << m_Data1;
        ar << m_Data2; // new to 2.0 version
        ar << m_Data3; // new to 3.0 version
        ar << m_Data4; // new to 6.0 version
    }
}

Loading looks similar, but it is very important not to forget to initialize new members when the class is loaded from an older archive.

C++

void Load(MyArchive& ar, const int classVersion)
{
    if(ar.GetFileVersion() <= APP_FILE_VERSION_1_0)
    {
        // file version 1.0
        ar >> m_Data1;
        m_Data2 = ...;
        m_Data3 = ...;
        m_Data4 = ...;
    }
    else if(ar.GetFileVersion() <= APP_FILE_VERSION_2_0)
    {
        // file version 1.1 - 2.0
        ar >> m_Data1;
        ar >> m_Data2; // new to 2.0 version
        m_Data3 = ...;
        m_Data4 = ...;
    }
    else if(ar.GetFileVersion() <= APP_FILE_VERSION_5_0)
    {
        // file versions 2.1 - 5.0
        ar >> m_Data1;
        ar >> m_Data2; // new to 2.0 version
        ar >> m_Data3; // new to 3.0 version
        m_Data4 = ...;
    }
    else
    {
        // most recent file version
        ar >> m_Data1;
        ar >> m_Data2; // new to 2.0 version
        ar >> m_Data3; // new to 3.0 version
        ar >> m_Data4; // new to 6.0 version
    }
}
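To see the rule about initializing new members in action, here is a self-contained toy. ToyArchive and Item are hypothetical stand-ins (the archive just keeps ints in a std::vector and carries a major * 100 + minor file version), and the default values are arbitrary. Each Load section reads exactly what the matching Save section wrote and assigns defaults to the members that the file version doesn't contain:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t APP_FILE_VERSION_1_0 = 100;
constexpr std::uint32_t APP_FILE_VERSION_2_0 = 200;
constexpr std::uint32_t APP_FILE_VERSION_5_0 = 500;

// Hypothetical in-memory archive: writes append, reads consume in order.
class ToyArchive
{
public:
    explicit ToyArchive(std::uint32_t fileVersion) : m_FileVersion(fileVersion) {}
    std::uint32_t GetFileVersion() const { return m_FileVersion; }

    ToyArchive& operator<<(int value) { m_Data.push_back(value); return *this; }
    ToyArchive& operator>>(int& value) { value = m_Data.at(m_ReadPos++); return *this; }

private:
    std::uint32_t m_FileVersion;
    std::vector<int> m_Data;
    std::size_t m_ReadPos = 0;
};

class Item
{
public:
    // Hypothetical defaults for members missing from older files.
    static constexpr int kDefault2 = -2;
    static constexpr int kDefault3 = -3;
    static constexpr int kDefault4 = -4;

    int m_Data1 = 0, m_Data2 = 0, m_Data3 = 0, m_Data4 = 0;

    void Save(ToyArchive& ar) const
    {
        if(ar.GetFileVersion() <= APP_FILE_VERSION_1_0)
            ar << m_Data1;
        else if(ar.GetFileVersion() <= APP_FILE_VERSION_2_0)
            ar << m_Data1 << m_Data2;
        else if(ar.GetFileVersion() <= APP_FILE_VERSION_5_0)
            ar << m_Data1 << m_Data2 << m_Data3;
        else
            ar << m_Data1 << m_Data2 << m_Data3 << m_Data4;
    }

    void Load(ToyArchive& ar)
    {
        if(ar.GetFileVersion() <= APP_FILE_VERSION_1_0)
        {
            ar >> m_Data1;
            m_Data2 = kDefault2; m_Data3 = kDefault3; m_Data4 = kDefault4;
        }
        else if(ar.GetFileVersion() <= APP_FILE_VERSION_2_0)
        {
            ar >> m_Data1 >> m_Data2;
            m_Data3 = kDefault3; m_Data4 = kDefault4;
        }
        else if(ar.GetFileVersion() <= APP_FILE_VERSION_5_0)
        {
            ar >> m_Data1 >> m_Data2 >> m_Data3;
            m_Data4 = kDefault4;
        }
        else
        {
            ar >> m_Data1 >> m_Data2 >> m_Data3 >> m_Data4;
        }
    }
};
```

Saving and then loading with APP_FILE_VERSION_2_0 restores m_Data1 and m_Data2, while m_Data3 and m_Data4 receive the defaults; with a 6.0 archive all four members round-trip.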

So when a new member is added, it must be handled in every section. I don't recommend any optimization here, such as trying to avoid the duplicated code like this:

C++

void Load(MyArchive& ar, const int classVersion)
{
    ar >> m_Data1;
    // every condition must mirror the section boundaries used in Save()
    if(ar.GetFileVersion() > APP_FILE_VERSION_1_0)
        ar >> m_Data2; // new to 2.0 version
    else
        m_Data2 = ...;
    if(ar.GetFileVersion() > APP_FILE_VERSION_2_0)
        ar >> m_Data3; // new to 3.0 version
    else
        m_Data3 = ...;
    if(ar.GetFileVersion() > APP_FILE_VERSION_5_0)
        ar >> m_Data4; // new to 6.0 version
    else
        m_Data4 = ...;
}

First of all, it is a mess. Believe me, migration code gets ugly over time, and it is very easy to make a mistake in code like this. The most important rule is: once the application has been released to the public and files must still be exchangeable with that released version, keep the code for that version unchanged. That is why it is better to copy the code, wrap it in a condition, and alter only the copy for the current application version.

Complex migration support

Data migration is needed when classes change but files that were already created (and possibly already shipped to customers) must still be loadable. The simplest kind of migration is when a class changes its own members directly, for example when a member is added or removed or its data type changes. For this simple case the library provides class versioning: if the class version is increased with every change, Load can check the class version and adapt the data accordingly.

Real-world cases are often more complicated, and it is not always possible to migrate a class's data inside its Load method/function. During the life cycle of an application a single class can grow rapidly, and it may become necessary to split it into two types, or conversely to merge two types into one. Another kind of migration computes data from several objects. The order in which objects are stored and loaded cannot be controlled during serialization, so reading data through a member pointer inside Load can lead to problems.

The library supports executing more complex migration after all objects have been loaded. If this support is required, the input archive (the implementation of InArchive) must be used with the ENABLE_ARCHIVE_MIGRATION macro. With this macro, all classes registered with REGISTER_CLASS_SERIALIZATION must additionally declare the following interface:

C++

// As stand-alone functions:
void GetInputObjects(ArchiveT& ar, T& obj, LoadedPointerInfoArray& inputObjects);
bool PostLoad(ArchiveT& ar, T& obj, const int classVersion);

// or as member methods of T:
void GetInputObjects(ArchiveT& ar, LoadedPointerInfoArray& inputObjects);
bool PostLoad(ArchiveT& ar, const int classVersion);

Migration code belongs in the PostLoad methods/functions. GetInputObjects defines the order in which PostLoad is called. Consider the following class:

C++

class A
{
    ...
public:
    B* m_pB;
    C* m_pC;
    D* m_pD; // added in classVersion 1
};
void Save(MyOutArchive& ar, const A& a, const int classVersion)
{
    ar << a.m_pB << a.m_pC;
    if(classVersion >= 1)
        ar << a.m_pD;    
}
void Load(MyInArchive& ar, A& a, const int classVersion)
{
    ar >> a.m_pB >> a.m_pC;
    if(classVersion >= 1)
        ar >> a.m_pD;
    else
    {
        // m_pD should be initialized, but how?
        ...
    }
}

Set aside the questionable design (Load() should make sure that a.m_pB and a.m_pC are nullptr before loading, or release the pointers if class A owns them) and assume that class A needs to extract some data from both B and C to initialize m_pD when classVersion == 0. Inside Load() it is not guaranteed that B and C are already loaded: if B or C points back to A and the pointer to B was stored first, then while A is being loaded, B is only partially initialized. B's constructor has already run, but B::Load() has not finished yet, so not all of its members have been read from the archive. Such cases require PostLoad. We have to tell the input archive that A::PostLoad should be called only after B and C are fully initialized, and GetInputObjects() exists exactly for this purpose:

C++

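void GetInputObjects(MyInArchive& ar, A& a, Serialization::LoadedPointerInfoArray& inputObjects)
{
    // A::PostLoad reads from B and C, so both must be post-loaded first
    if(a.m_pB != nullptr)
        Serialization::AddInputObject(ar, inputObjects, *a.m_pB);
    if(a.m_pC != nullptr)
        Serialization::AddInputObject(ar, inputObjects, *a.m_pC);
}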

LoadedPointerInfoArray cannot be filled directly; instead, the library provides the helper template function AddInputObject, which creates an item and stores it in the array. This ensures that PostLoad is called on B and C first (the order between them is undefined unless their own GetInputObjects introduce further dependencies) and then on A, so A's PostLoad implementation can correctly initialize the m_pD member pointer. The notifications are sent when MigrationManager::Execute is called. If a class derives from other classes, both its PostLoad and GetInputObjects must also call the parent implementations. As with Load/Save, the parent methods/functions should be called through the BaseObject template; the library's InArchiveMigration has member methods that accept a BaseObject reference and call the corresponding notification.
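Conceptually, MigrationManager::Execute has to find an order in which every object comes after all of its input objects, i.e. a topological sort of the dependency graph. The library does this with the Boost Graph Library; the following is only an independent sketch using Kahn's algorithm, in which objects are identified by hypothetical string ids instead of loaded pointers, and a cycle (which has no valid order) is reported as an exception:

```cpp
#include <map>
#include <queue>
#include <stdexcept>
#include <string>
#include <vector>

// object id -> ids it listed in GetInputObjects (hypothetical representation)
using InputGraph = std::map<std::string, std::vector<std::string>>;

// Returns an order in which each object appears after all of its inputs.
std::vector<std::string> PostLoadOrder(const InputGraph& inputsOf)
{
    std::map<std::string, int> pendingInputs;                   // inputs not yet processed
    std::map<std::string, std::vector<std::string>> dependents; // input -> objects waiting on it

    for(const auto& entry : inputsOf)
    {
        pendingInputs.emplace(entry.first, 0);
        for(const auto& input : entry.second)
        {
            pendingInputs.emplace(input, 0); // an input may have no entry of its own
            ++pendingInputs[entry.first];
            dependents[input].push_back(entry.first);
        }
    }

    // Objects without inputs can receive PostLoad immediately.
    std::queue<std::string> ready;
    for(const auto& entry : pendingInputs)
        if(entry.second == 0)
            ready.push(entry.first);

    std::vector<std::string> order;
    while(!ready.empty())
    {
        const std::string current = ready.front();
        ready.pop();
        order.push_back(current);
        for(const auto& dependent : dependents[current])
            if(--pendingInputs[dependent] == 0)
                ready.push(dependent);
    }

    if(order.size() != pendingInputs.size())
        throw std::runtime_error("cyclic input objects: no valid PostLoad order");
    return order;
}
```

For A with inputs B and C, A always comes last; whether B precedes C is not significant, which matches the undefined order between B and C described above.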

If an even deeper migration is needed that involves several objects which are not directly related, e.g. migrating a whole array of objects, MigrationManager provides the concept of packets and migrators. To use it, all classes need access to the MigrationManager. The library provides the template InArchiveMigration, whose member method GetMigrationManager() provides this access; just change the base class of your custom archive from InArchive to InArchiveMigration.

Packets are small classes that only hold data. During loading, the Load and PostLoad functions can collect objects and other data and store them in one or more packets. Later, when the migrators are executed, they can access those packets and process the data. Packets are managed with the MigrationManager methods RegisterPacket, UnregisterPacket and GetPacket. The library contains the helper template class Serialization::PacketImpl, which binds a packet to a specified key type. Unlike serializable classes, packets with different key types can be registered with the same MigrationManager (since packets are never stored in the archive, this doesn't matter much). It is convenient to implement packets with a static getter that extracts the packet from the archive:

C++

#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// MyArchive must derive from Serialization::InArchiveMigration,
// which provides GetMigrationManager().
class MyDataPacket : public Serialization::PacketImpl<std::string>
{
public:
    MyDataPacket()
    : Serialization::PacketImpl<std::string>("MyDataPacket_key")
    {
    }

    // Returns the packet registered with the archive's MigrationManager,
    // creating and registering it on first use.
    static MyDataPacket& Get(MyArchive& ar)
    {
        MyDataPacket ref; // only used to obtain the key
        auto* pPacketRawPtr = static_cast<MyDataPacket*>(ar.GetMigrationManager().GetPacket(ref.GetKey()));
        if(pPacketRawPtr == nullptr)
        {
            auto pPacketPtr = std::make_unique<MyDataPacket>();
            pPacketRawPtr = pPacketPtr.get();
            if(!ar.GetMigrationManager().RegisterPacket(std::move(pPacketPtr)))
            {
                // or whatever exception you prefer
                throw std::runtime_error("Packet was not registered!!");
            }
        }
        return *pPacketRawPtr;
    }
private:
    ... // data members
};

The disadvantage is that you need one type per packet, because with a fixed key the same packet type cannot be registered twice. On the other hand, reusing one packet type for several packets requires maintaining some kind of list of keys.
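Reusing one packet type under several keys can be sketched as follows. Packet, PacketRegistry, CountPacket and GetCountPacket are hypothetical stand-ins, not the library's classes; the registry only mimics the behavior documented for MigrationManager's RegisterPacket, UnregisterPacket and GetPacket (a duplicate key is rejected, a missing key yields nullptr), and it uses plain std::string keys:

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical packet base holding its key.
class Packet
{
public:
    explicit Packet(std::string key) : m_Key(std::move(key)) {}
    virtual ~Packet() = default;
    const std::string& GetKey() const { return m_Key; }
private:
    std::string m_Key;
};

// Hypothetical registry mirroring the documented packet semantics.
class PacketRegistry
{
public:
    // Returns false if another packet already uses the same key.
    bool RegisterPacket(std::unique_ptr<Packet> pPacket)
    {
        const std::string key = pPacket->GetKey();
        return m_Packets.emplace(key, std::move(pPacket)).second;
    }

    // Does nothing if no packet uses the key.
    void UnregisterPacket(const std::string& key) { m_Packets.erase(key); }

    // Returns nullptr if no packet uses the key.
    Packet* GetPacket(const std::string& key) const
    {
        auto it = m_Packets.find(key);
        return it == m_Packets.end() ? nullptr : it->second.get();
    }

private:
    std::map<std::string, std::unique_ptr<Packet>> m_Packets;
};

// One packet type reused under several keys.
class CountPacket : public Packet
{
public:
    using Packet::Packet;
    int m_Count = 0;
};

// Returns the CountPacket registered under `key`, creating it on first use.
// Assumes these keys are only ever used for CountPacket instances.
CountPacket& GetCountPacket(PacketRegistry& registry, const std::string& key)
{
    if(Packet* pExisting = registry.GetPacket(key))
        return static_cast<CountPacket&>(*pExisting);
    auto pPacket = std::make_unique<CountPacket>(key);
    CountPacket& ref = *pPacket;
    registry.RegisterPacket(std::move(pPacket));
    return ref;
}
```

Here the caller is responsible for remembering which keys it used (e.g. "walls" and "doors"), which is exactly the bookkeeping cost mentioned above.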

Migrators are classes that perform complex migration. They form the last stage of migration and run after the whole document has been loaded and migrated/initialized in the PostLoad methods. Migrators are also registered during Load/PostLoad calls, usually together with creating packets. The order in which migrators are called cannot be defined, so it is not good practice to create dependencies between them. If there are dependencies, it is better to merge two or more migrators into one and resolve the dependencies within that single migrator.
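The one-migrator-per-type rule and the unspecified execution order can be illustrated with a hypothetical registry keyed by std::type_index. Migrator, MigratorRegistry and CountingMigrator are assumptions for this sketch, not the library's classes; the documented MigrationManager::AddMigrator returns void, whereas this version returns bool so that the rejection is visible:

```cpp
#include <map>
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Hypothetical migrator interface; the library's Migrator may differ.
class Migrator
{
public:
    virtual ~Migrator() = default;
    // Called once, after all objects are loaded and post-loaded.
    virtual bool Migrate() = 0;
};

class MigratorRegistry
{
public:
    // At most one migrator per dynamic type; pMigrator must not be null.
    bool AddMigrator(std::unique_ptr<Migrator> pMigrator)
    {
        const std::type_index type(typeid(*pMigrator));
        return m_Migrators.emplace(type, std::move(pMigrator)).second;
    }

    // Returns nullptr if no migrator of that type is registered.
    Migrator* FindMigrator(const std::type_info& info) const
    {
        auto it = m_Migrators.find(std::type_index(info));
        return it == m_Migrators.end() ? nullptr : it->second.get();
    }

    // Runs every migrator even if one fails. The iteration order follows
    // std::type_index ordering, i.e. it is unrelated to registration order,
    // so migrators must not depend on each other.
    bool ExecuteAll()
    {
        bool ok = true;
        for(auto& entry : m_Migrators)
            ok = entry.second->Migrate() && ok;
        return ok;
    }

private:
    std::map<std::type_index, std::unique_ptr<Migrator>> m_Migrators;
};

// Example migrator that only counts how often it ran.
class CountingMigrator : public Migrator
{
public:
    explicit CountingMigrator(int* pRuns) : m_pRuns(pRuns) {}
    bool Migrate() override { ++*m_pRuns; return true; }
private:
    int* m_pRuns;
};
```

A real migrator would typically fetch the packets filled during Load/PostLoad inside Migrate() and process the collected data there.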

MigrationManager description

bool Execute(BaseInArchive& ar)

Calls GetInputObjects on all loaded objects (pointers) from archive, builds dependency graph and calls PostLoad notifications in requested order.

template<typename T> void AddExternObject(BaseInArchive& ar, T& inputObject)

Registers an external object with the archive. During loading, the archive doesn't store the addresses of objects loaded as references, because those addresses can later be invalidated when other objects are loaded. Consider objects stored in a std::vector: if a new object is inserted, the storage may be reallocated, and the archive cannot track changes like this. It is up to the user to register such objects with the MigrationManager if the notifications should be called on them. Note that all objects added through GetInputObjects also receive the PostLoad notification, even if they were loaded as references from the archive. So AddExternObject should mainly be called for top-level references that are not input objects of any other object.

void AddMigrator(MigratorPtr pMigrator)

Registers a migrator with the MigrationManager. The migrator type must be unique; trying to register two migrators of the same type fails.

Migrator* FindMigrator(const type_info& migratorInfo) const

Finds a migrator by type. Returns nullptr if none is found.

bool RegisterPacket(PacketPtr pPacket)

Registers a packet with the MigrationManager. If the packet's key is already used by another packet, registration fails.

void UnregisterPacket(const TypeInfo::Key& key)

Unregisters an already registered packet. If no packet uses the key, the method does nothing.

Packet* GetPacket(const TypeInfo::Key& key) const

Finds a packet by key. Returns nullptr if no packet is found.
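The reallocation problem described under AddExternObject can be observed with plain standard C++. The hypothetical helper below appends elements and reports whether the vector's element storage moved; the address is compared as an integer so that the stale pointer is never dereferenced:

```cpp
#include <cstdint>
#include <vector>

// Appends `count` elements and reports whether the element storage moved,
// i.e. whether an address recorded before the call would now dangle.
bool StorageMoved(std::vector<int>& values, int count)
{
    const auto before = reinterpret_cast<std::uintptr_t>(values.data());
    for(int i = 0; i < count; ++i)
        values.push_back(i);
    return reinterpret_cast<std::uintptr_t>(values.data()) != before;
}
```

Once size() would exceed capacity(), push_back must allocate new storage, so any address recorded for an element during loading becomes invalid; while the elements stay within reserved capacity they do not move.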

Compilation errors, template mess, etc.

Initially I wanted a library that is easy to integrate. For me, easy integration means that compilation errors are easy to fix when a mistake is made during integration. I think I failed here: the compilation errors are as cryptic as those that appear when integrating the boost.serialization library. The main problem lies in templates themselves. A template specialization is a completely new class in which the programmer can write whatever they like. The library of course expects certain features from those classes, such as static functions with a particular interface, but I didn't find a way to verify that a class's interface is correct, so the compiler produces a strange error at the call site instead. Another kind of error comes from the visibility of templates: if the library expects a specialization of a template, your specialization must be visible at that point. I added as many static_asserts to the code as possible to make clear what is going wrong, but it was not always possible.
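For static_asserts of this kind, one common C++17 technique is the detection idiom with std::void_t: it checks whether a particular call expression compiles and turns a missing function into a readable message. It cannot check semantics, and it is not necessarily what the library does; the trait names and the checked Load signatures below are hypothetical:

```cpp
#include <type_traits>
#include <utility>

// True if `obj.Load(ar, int)` compiles for object type T and archive type A.
template<typename T, typename A, typename = void>
struct HasMemberLoad : std::false_type {};

template<typename T, typename A>
struct HasMemberLoad<T, A,
    std::void_t<decltype(std::declval<T&>().Load(std::declval<A&>(), 0))>>
    : std::true_type {};

// True if a free function `Load(ar, obj, int)` is found (via argument-dependent
// lookup at instantiation time).
template<typename T, typename A, typename = void>
struct HasFreeLoad : std::false_type {};

template<typename T, typename A>
struct HasFreeLoad<T, A,
    std::void_t<decltype(Load(std::declval<A&>(), std::declval<T&>(), 0))>>
    : std::true_type {};

// Placed at the top of a library entry point, this reports a missing
// interface in one readable line instead of a deep template error.
template<typename T, typename A>
void RequireLoadable()
{
    static_assert(HasMemberLoad<T, A>::value || HasFreeLoad<T, A>::value,
                  "T needs either T::Load(Archive&, int) or Load(Archive&, T&, int)");
}
```

Such a check still depends on the visibility issue mentioned above: a free function or specialization declared after the point of instantiation may not be found.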

Implementation notes

The library uses Boost only in ObjectDependencyGraph.cpp, for sorting the dependencies (using the Boost Graph Library). The unit tests are also written with Boost (the Boost.Test framework), but they are not needed to build the library itself.

History

21-06-2016 Initial release

27-06-2016 Added PointerOwnershipTests and fixed a bug when loading two null pointers of the same type as std::unique_ptr

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)