A practical guide to C++ serialization

14 Jul 2011 · CPOL · 7 min read

In a nutshell, serialization consists of writing data and objects to a medium (a file, a buffer, a socket) so that they can be reconstructed later in the memory of the same or another computing host. The reconstruction process is also known as deserialization.

Serializing a primitive type like a bool, int, or float is trivial: just write the data as it is (assuming that no compression is used). Serializing a pointer is different: the object it points to must be serialized first. That way deserializing the pointer simply consists of setting its value to the memory address at which the object has been reconstructed.

We can distinguish three levels of complexity in serialization, depending on how complex the pointer (and reference) graph is:

  1. The pointer graph is a forest (i.e., a set of trees). Data can simply be serialized bottom up with a depth first traversal of the trees.
  2. The pointer graph is a directed acyclic graph (DAG), i.e., a graph without loops. We can still serialize the data bottom up, making sure we write and restore shared data only once.
  3. The pointer graph is a general graph, i.e., it may have loops. We need to write and restore data with forward references so that loops are handled properly.

Image 1: Pointer graph as a tree, a DAG, and with loops
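
The second and third cases rely on tracking: each object is written once, and any later pointer to it is written as a reference to the copy already saved. The following is a minimal sketch of that idea in plain C++. It is not how Boost implements it; the `Node`, `Saver`, and `Loader` types and the `obj`/`ref`/`null` text format are assumptions made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical node type: a value plus a pointer that may be shared.
struct Node {
  int value;
  Node* next;
};

// Writes each distinct node once; later occurrences become "ref <id>".
class Saver {
public:
  explicit Saver(std::ostream& os) : os_(os) {}

  void save(const Node* n) {
    if (n == nullptr) { os_ << "null "; return; }
    std::map<const Node*, std::size_t>::const_iterator it = ids_.find(n);
    if (it != ids_.end()) { os_ << "ref " << it->second << ' '; return; }
    const std::size_t id = ids_.size();
    ids_[n] = id;  // register before descending into n->next
    os_ << "obj " << n->value << ' ';
    save(n->next);
  }

private:
  std::ostream& os_;
  std::map<const Node*, std::size_t> ids_;
};

// Rebuilds nodes in the same order, so IDs match those of the saver.
// The loader owns the restored nodes.
class Loader {
public:
  explicit Loader(std::istream& is) : is_(is) {}

  Node* load() {
    std::string tag;
    is_ >> tag;
    if (tag == "null") return nullptr;
    if (tag == "ref") {
      std::size_t id = 0;
      is_ >> id;
      return nodes_.at(id).get();
    }
    assert(tag == "obj");
    nodes_.push_back(std::unique_ptr<Node>(new Node{0, nullptr}));
    Node* n = nodes_.back().get();  // address stays valid if the vector grows
    is_ >> n->value;
    n->next = load();
    return n;
  }

  std::size_t objectCount() const { return nodes_.size(); }

private:
  std::istream& is_;
  std::vector<std::unique_ptr<Node>> nodes_;
};

// Two nodes share a third one; after loading, the sharing is preserved.
bool sharedNodeRestoredOnce() {
  Node shared{7, nullptr};
  Node a{1, &shared};
  Node b{2, &shared};

  std::stringstream ss;
  Saver saver(ss);
  saver.save(&a);
  saver.save(&b);

  Loader loader(ss);
  Node* ra = loader.load();
  Node* rb = loader.load();
  assert(ra != nullptr && rb != nullptr);
  return ra->next == rb->next && ra->next->value == 7 &&
         loader.objectCount() == 3;
}
```

Because both classes register a node before following its `next` pointer, the same scheme should also cope with loops, which is the forward-reference behavior described in the third case.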

It is always an option to serialize objects using your own customized code. However, serialization is much more complex than a simple pretty-print method. One would like serialization to support the following features:

  1. Serialization should be able to handle any pointer graph (i.e., with loops).
  2. Serializing a pointer or a reference should automatically trigger the serialization of the referred object.
  3. Serializing an entire data model can require a lot of code – from simple scalar fields (bool, int, float), to containers (vector, list, hash table, etc.), to intricate data structures (graph, quad-tree, sparse matrices, etc). One would like templates that carry most of the burden.
  4. The save and load functions must always be in sync: if the ‘save’ function is modified, the ‘load’ function must be changed appropriately. One would like that process to be automated as much as possible.
  5. One should have a way of serializing objects without changing their .hpp files – this is known as non-intrusive serialization. The reason is that in many cases, one does not want (or one cannot) change the source files of existing libraries.
  6. Serialization needs to support versioning. As objects evolve, data members are added or removed, and it is desirable to be backward compatible, meaning that one can still deserialize archives from older versions into the most recent data model.
  7. Serialization should be cross-platform compatible (32 and 64 bit machines, Windows, Linux, Solaris, etc.).

The boost library provides serialization that meets all the requirements above, and more:

  • It is extremely efficient, it supports versioning, and it automatically serializes STL containers.
  • Serialization (the save function) and deserialization (the load function) are expressed with one single template, which reduces the size of the code and resolves the synchronization problem.
  • With a little bit of help, boost serialization is also 32 and 64 bit compatible, which means that a database serialized on a 32 bit machine can be read on a 64 bit machine and conversely.
  • Also, boost serialization (respectively deserialization) takes an output (respectively input) argument that is very similar to a std::ostream (respectively std::istream), meaning that it can be a file on a disk, a buffer, or a socket. You can literally serialize your data over a network.

The best way to understand how to serialize with boost is to walk through increasingly complex serialization scenarios.

Basic serialization

The code for serialization, as well as an example that saves and restores simple objects, is given below.

C++
#pragma once
// File obj.hpp

// Forward declaration of class boost::serialization::access
namespace boost {
namespace serialization {
class access;
}
}

class Obj {
public:
  // Serialization expects the object to have a default constructor
  Obj() : d1_(-1), d2_(false) {}
  Obj(int d1, bool d2) : d1_(d1), d2_(d2) {}
  bool operator==(const Obj& o) const {
    return d1_ == o.d1_ && d2_ == o.d2_;
  }
private:
  int d1_;
  bool d2_;

  // Allow serialization to access non-public data members.
  friend class boost::serialization::access;

  template<typename Archive>
  void serialize(Archive& ar, const unsigned version) {
    ar & d1_ & d2_; // Simply serialize the data members of Obj
  }
};

The template ‘serialize’ defines both the save and load. This is achieved because the operator & will be defined as << (respectively >>) for an output (respectively input) archive. Note the friend declaration to allow the save/load template to access the private data members of the objects. Also note that serialization expects the object to have a default constructor (which can be private).
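
To make the role of operator & concrete, here is a small sketch that does not use Boost at all. `OutArchive`, `InArchive`, and `Point` are hypothetical types written for this illustration; they only mimic the idea that one `serialize` template can drive both directions, and they leave out everything else a real archive does (versioning, tracking, and so on).

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical output archive: operator& forwards to operator<<.
class OutArchive {
public:
  explicit OutArchive(std::ostream& os) : os_(os) {}
  template <typename T>
  OutArchive& operator&(const T& t) {
    os_ << t << ' ';
    return *this;
  }
private:
  std::ostream& os_;
};

// Hypothetical input archive: operator& forwards to operator>>.
class InArchive {
public:
  explicit InArchive(std::istream& is) : is_(is) {}
  template <typename T>
  InArchive& operator&(T& t) {
    is_ >> t;
    return *this;
  }
private:
  std::istream& is_;
};

struct Point {
  int x;
  int y;
  bool visible;

  // One template describes the layout for both saving and loading.
  template <typename Archive>
  void serialize(Archive& ar) {
    ar & x & y & visible;
  }
};

// Saves p to a string stream and loads it back into a fresh Point.
Point roundTrip(Point p) {
  std::stringstream ss;
  OutArchive out(ss);
  p.serialize(out);

  Point q{0, 0, false};
  InArchive in(ss);
  q.serialize(in);
  assert(!ss.fail());
  return q;
}
```

Since the members are listed once, the save and load orders cannot drift apart, which is the synchronization benefit mentioned earlier.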

C++
#include "obj.hpp"
#include <assert.h>
#include <fstream>
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>

int main() {
  const char* fileName = "saved.txt";

  // Create some objects
  const Obj o1(-2, false);
  const Obj o2;
  const Obj o3(21, true);
  const Obj* const p1 = &o1;

  // Save data
  {
    // Create an output archive
    std::ofstream ofs(fileName);
    boost::archive::text_oarchive ar(ofs);

    // Write data
    ar & o1 & o2 & o3 & p1;
  }

  // Restore data
  Obj restored_o1;
  Obj restored_o2;
  Obj restored_o3;
  Obj* restored_p1;
  {
    // Create an input archive
    std::ifstream ifs(fileName);
    boost::archive::text_iarchive ar(ifs);

    // Load data
    ar & restored_o1 & restored_o2 & restored_o3 & restored_p1;
  }

  // Make sure we restored the data exactly as it was saved
  assert(restored_o1 == o1);
  assert(restored_o2 == o2);
  assert(restored_o3 == o3);
  assert(restored_p1 != p1);
  assert(restored_p1 == &restored_o1);

  return 0;
}

In main.cpp, we first include the files declaring the input and output text archives, where objects will be loaded from and saved to, respectively. We create an output archive (here, a file on a disk), and write three instances of class Obj, as well as a pointer to one of the instances. We then read them back and make sure we restore the data as they were. Note how the restored pointer restored_p1 points to the restored object restored_o1.

More on pointer serialization

Whenever we call serialization on a pointer (or reference), this triggers the serialization of the object it points to (or refers to) whenever necessary. So we do not need to explicitly serialize pointed objects as boost serialization will make sure the appropriate objects reached in the pointer's graph are serialized.

For instance, the code below shows that serializing the pointer p1 triggers the serialization of o1, the object it points to. When restoring the pointer restored_p1, we automatically create a clone of the object o1.

C++
#include "obj.hpp"
#include <assert.h>
#include <fstream>
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>

int main()
{
  const char* fileName = "saved.txt";

  // Create one object o1.
  const Obj o1(-2, false);
  const Obj* const p1 = &o1;

  // Save data
  {
    // Create an output archive
    std::ofstream ofs(fileName);
    boost::archive::text_oarchive ar(ofs);
    // Save only the pointer. This will trigger serialization
    // of the object it points to, i.e., o1.
    ar & p1;
  }

  // Restore data
  Obj* restored_p1;
  {
    // Create an input archive
    std::ifstream ifs(fileName);
    boost::archive::text_iarchive ar(ifs);
    // Load
    ar & restored_p1;
  }

  // Make sure we read exactly what we saved.
  assert(restored_p1 != p1);
  assert(*restored_p1 == o1);

  return 0;
}

When deserializing a pointer, the object it points to will be automatically deserialized if this object has not been deserialized yet. This means that one should not attempt to deserialize an object after a pointer to this object has been deserialized. The reason is that once the pointer deserialization has forced the object deserialization, one cannot rebuild this object at a different address.

C++
#include "obj.hpp"
#include <fstream>
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>

int main()
{
  const char* fileName = "saved.txt";
  std::ofstream ofs(fileName);

  // Create one object o1 and a pointer p1 to that object.
  const Obj o1(-2, false);
  const Obj* const p1 = &o1;

  // Serialize object, then pointer.
  // This works fine: the object is saved first, so on loading the
  // pointer is simply set to the address of the restored object.
  {
    boost::archive::text_oarchive ar(ofs);
    ar & o1 & p1;
  }

  // Serialize pointer, then object.
  // This does not work: saving p1 has already saved the object it
  // points to, so the object cannot be saved again in its own right;
  // on loading, it could not be rebuilt at a different address.
  // This will throw an instance of 'boost::archive::archive_exception'
  // at runtime.
  {
    boost::archive::text_oarchive ar(ofs);
    ar & p1 & o1;
  }

  return 0;
}

In the example above, the second serialization will result in a runtime error:

C++
ocoudert@MyMacBookPro $ a.out
terminate called after throwing an instance of 'boost::archive::archive_exception'
    what(): pointer conflict
Abort trap
ocoudert@MyMacBookPro $

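This means that an object must never be serialized explicitly after a pointer to it has been serialized. The simplest rule is: when pointers need to be serialized, do not also serialize the objects they point to explicitly, and let the pointers trigger their serialization.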

Explicit save and load function definitions

We need an explicit definition of the save and load functions whenever they are not fully symmetric. This is typical when versioning is involved. Note the use of the macro BOOST_SERIALIZATION_SPLIT_MEMBER(), which is responsible for calling save/load when using an output/input archive.

C++
#pragma once

#include <boost/serialization/split_member.hpp>

class Obj {
public:
  Obj() : d1_(-1), d2_(false) {}
  Obj(int d1, bool d2) : d1_(d1), d2_(d2) {}
  bool operator==(const Obj& o) const {
    return d1_ == o.d1_ && d2_ == o.d2_;
  }

private:
  int d1_;
  bool d2_;

  friend class boost::serialization::access;

  template<class Archive>
  void save(Archive & ar, const unsigned int version) const {
    ar & d1_ & d2_;
  }

  template<class Archive>
  void load(Archive & ar, const unsigned int version) {
    ar & d1_ & d2_;
  }

  BOOST_SERIALIZATION_SPLIT_MEMBER()
};

Serialization of C-strings

A C-string cannot be directly serialized because doing so assumes a specific interpretation of a char*, namely an array of char terminated by a null character (‘\0’). Thus we need to explicitly serialize a C-string. The class below is a simple helper to serialize C-strings (note that this can be optimized by avoiding the construction of the std::string).

C++
#pragma once
// File SerializeCStringHelper.hpp

#include <string.h> // strdup (POSIX)
#include <string>
#include <boost/serialization/string.hpp>
#include <boost/serialization/split_member.hpp>

class SerializeCStringHelper {
public:
  SerializeCStringHelper(char*& s) : s_(s) {}
  SerializeCStringHelper(const char*& s) : s_(const_cast<char*&>(s)) {}

private:

  friend class boost::serialization::access;

  template<class Archive>
  void save(Archive& ar, const unsigned version) const {
    bool isNull = (s_ == 0);
    ar & isNull;
    if (!isNull) {
      std::string s(s_);
      ar & s;
    }
  }

  template<class Archive>
  void load(Archive& ar, const unsigned version) {
    bool isNull;
    ar & isNull;
    if (!isNull) {
      std::string s;
      ar & s;
      s_ = strdup(s.c_str());
    } else {
      s_ = 0;
    }
  }

  BOOST_SERIALIZATION_SPLIT_MEMBER()

private:
  char*& s_;
};

A simple example of its usage is as follows.

C++
#include "SerializeCStringHelper.hpp"
#include <assert.h>
#include <string.h>
#include <fstream>
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>

int main()
{
  const char* fileName = "saved.txt";
  const char* str = "This is an example of a C-string";

  // Save data
  {
    // Create an output archive
    std::ofstream ofs(fileName);

    boost::archive::text_oarchive ar(ofs);
    // Save
    SerializeCStringHelper helper(str);
    ar & helper;
  }

  // Restore data
  char* restored_str;
  {
    // Create an input archive
    std::ifstream ifs(fileName);
    boost::archive::text_iarchive ar(ifs);

    // Load
    SerializeCStringHelper helper(restored_str);
    ar & helper;
  }

  // Make sure we read exactly what we saved
  assert(restored_str != str);
  assert(strcmp(restored_str, str) == 0);

  return 0;
}

Non-intrusive serialization

So far, the serialization code has been added inside the class definition. A non-intrusive serialization, outside of the class, might be preferable. For instance, we would like to serialize a class from a library without altering the library’s hpp file. This is easy when the data members are public:

C++
#pragma once

class Obj {
public:
  Obj() : d1_(-1), d2_(false) {}
  Obj(int d1, bool d2) : d1_(d1), d2_(d2) {}
  bool operator==(const Obj& o) const {
    return d1_ == o.d1_ && d2_ == o.d2_;
  }

public:
  int d1_;
  bool d2_;
};

namespace boost {
namespace serialization {

template<typename Archive>
void serialize(Archive& ar, Obj& o, const unsigned int version) {
  ar & o.d1_ & o.d2_;
}

} // namespace serialization
} // namespace boost

If we want to protect the data members, the code is a bit more complicated because the serialization template needs to be declared as a friend. This requires a forward declaration of the template.

C++
#pragma once

//// Declaration of the template
class Obj;

namespace boost {
namespace serialization {

template<typename Archive>
void serialize(Archive& ar, Obj& o, const unsigned int version);

} // namespace serialization
} // namespace boost

//// Definition of the class
class Obj {
public:
  Obj() : d1_(-1), d2_(false) {}
  Obj(int d1, bool d2) : d1_(d1), d2_(d2) {}
  bool operator==(const Obj& o) const {
    return d1_ == o.d1_ && d2_ == o.d2_;
  }

private:
  int d1_;
  bool d2_;

  // Allow serialization to access data members.
  template<typename Archive> friend
  void boost::serialization::serialize(Archive& ar, Obj& o, 
                                       const unsigned int version);
};

//// Definition of the template
namespace boost {
namespace serialization {

template<typename Archive>
void serialize(Archive& ar, Obj& o, const unsigned int version) {
  ar & o.d1_ & o.d2_;
}

} // namespace serialization
} // namespace boost

Non-intrusive explicit save and load function definitions

This combines the two previous serialization styles, except that the include file and macro are different. For the sake of simplicity, we give the version for public data members.

C++
#pragma once

#include <boost/serialization/split_free.hpp>

class Obj {
public:
  Obj() : d1_(-1), d2_(false) {}
  Obj(int d1, bool d2) : d1_(d1), d2_(d2) {}
  bool operator==(const Obj& o) const {
    return d1_ == o.d1_ && d2_ == o.d2_;
  }

public:
  int d1_;
  bool d2_;
};

namespace boost {
namespace serialization {

template<class Archive>
void save(Archive & ar, const Obj& o, const unsigned int version) {
  ar & o.d1_ & o.d2_;
}

template<class Archive>
void load(Archive & ar, Obj& o, const unsigned int version) {
  ar & o.d1_ & o.d2_;
}

} // namespace serialization
} // namespace boost

BOOST_SERIALIZATION_SPLIT_FREE(Obj)

Serialization of STL containers

The boost library comes with templates to automatically serialize STL containers, as well as some STL objects (e.g., std::string). Instead of saving/loading a vector with the following code:

C++
template<typename Archive>
void save(Archive& ar, const std::vector<Obj>& objs, const unsigned version) {
  ar << objs.size();
  for (size_t i = 0; i < objs.size(); ++i) {
    ar << objs[i];
  }
}

template<typename Archive>
void load(Archive& ar, std::vector<Obj>& objs, const unsigned version) {
  size_t size;
  ar >> size;
  objs.resize(size);
  for (size_t i = 0; i < size; ++i) {
    ar >> objs[i];
  }
}

One simply writes:

C++
#include <boost/serialization/vector.hpp>

template<typename Archive>
void serialize(Archive& ar, std::vector<Obj>& objs, const unsigned version) {
  ar & objs;
}

All the STL containers are supported using the appropriate include files:

C++
#include <boost/serialization/array.hpp>
#include <boost/serialization/vector.hpp>
#include <boost/serialization/hash_map.hpp>
#include <boost/serialization/hash_set.hpp>
#include <boost/serialization/list.hpp>
#include <boost/serialization/slist.hpp>
#include <boost/serialization/map.hpp>
#include <boost/serialization/set.hpp>
#include <boost/serialization/bitset.hpp>
#include <boost/serialization/string.hpp>

Serialization of base class

When a class inherits from another, the base class needs to be serialized as well.

C++
#include <boost/serialization/base_object.hpp>

class Base {
public:
  Base() : c_('\0') {}
  Base(char c) : c_(c) {}
  bool operator==(const Base& o) const { return c_ == o.c_; }

private:
  char c_;

  friend class boost::serialization::access;

  template <typename Archive>
  void serialize(Archive& ar, const unsigned version) {
    ar & c_;
  }
};

class Obj : public Base {
private:
  // Note: names starting with an underscore followed by an uppercase
  // letter are reserved in C++, hence Super rather than _Super.
  typedef Base Super;
public:
  Obj() : Super(), d1_(-1), d2_(false) {}
  Obj(int d1, bool d2) : Super('a'), d1_(d1), d2_(d2) {}
  bool operator==(const Obj& o) const {
    return Super::operator==(o) && d1_ == o.d1_ && d2_ == o.d2_;
  }

private:
  int d1_;
  bool d2_;

  friend class boost::serialization::access;

  template <typename Archive>
  void serialize(Archive& ar, const unsigned version) {
    // Serialize the base class part first, then the derived members.
    ar & boost::serialization::base_object<Super>(*this);
    ar & d1_ & d2_;
  }
};

Versioning

We want to maintain backward compatibility when the class Obj evolves. For instance, if a new data member ‘ID_’ is added, we want to read an old archive and build a new Obj with the missing data member taking the default value.

C++
#pragma once

#include <boost/serialization/split_member.hpp>
#include <boost/serialization/version.hpp>

class Obj {
public:
  Obj() : d1_(-1), d2_(false), ID_(0) {}
  Obj(int d1, bool d2, unsigned id) : d1_(d1), d2_(d2), ID_(id) {}
  bool operator==(const Obj& o) const {
    return d1_ == o.d1_ && d2_ == o.d2_ && ID_ == o.ID_;
  }

private:
  int d1_;
  bool d2_;
  unsigned ID_;

  friend class boost::serialization::access;

  template<class Archive>
  void save(Archive & ar, const unsigned int version) const {
    ar & d1_ & d2_ & ID_;
  }

  template<class Archive>
  void load(Archive & ar, const unsigned int version) {
    ar & d1_ & d2_;
    // If archive’s version is 0 (i.e., is old), ID_ keeps
    // its default value from the new data model,
    // else we read ID_’s value from the archive.
    if (version > 0) {
      ar & ID_;
    }
  }

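  BOOST_SERIALIZATION_SPLIT_MEMBER()
};

// Archives written from now on carry version 1 for class Obj
// (the default is 0), so that load() knows ID_ is present.
BOOST_CLASS_VERSION(Obj, 1)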
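
What the version number buys us can be illustrated without Boost. The sketch below is a hand-rolled text format, with hypothetical names (`Record`, `saveRecord`, `loadRecord`, `loadFrom`), in which the writer stores a version number in front of the data and the reader uses it to decide whether the newer field is present. It is only meant to show the mechanism under these assumptions, not the layout of a real Boost archive.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical data model: `id` was added in version 1 of the format.
struct Record {
  int d1;
  bool d2;
  unsigned id;
  Record() : d1(-1), d2(false), id(0) {}
};

const unsigned kCurrentVersion = 1;

// Always writes the most recent layout, prefixed by its version.
void saveRecord(std::ostream& os, const Record& r) {
  os << kCurrentVersion << ' ' << r.d1 << ' ' << r.d2 << ' ' << r.id << ' ';
}

// Reads any known layout: version 0 has no `id`, so the default is kept.
void loadRecord(std::istream& is, Record& r) {
  unsigned version = 0;
  is >> version >> r.d1 >> r.d2;
  if (version > 0) {
    is >> r.id;
  }
}

// Convenience helper: loads a Record from archive text.
Record loadFrom(const std::string& text) {
  std::istringstream is(text);
  Record r;
  loadRecord(is, r);
  assert(!is.fail());
  return r;
}
```

For instance, `loadFrom("0 5 1 ")` reads an archive in the old layout and leaves `id` at its default of 0, while text produced by `saveRecord` restores all three fields.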

Serialization of const data or objects

Attempting to serialize const data or a const object triggers a long trail of error messages, which includes something that looks like:

C++
[snip]

/opt/local/include/boost/archive/detail/check.hpp:162: error:
  invalid application of ‘sizeof’ to incomplete 
    type ‘boost::STATIC_ASSERTION_FAILURE<false>’

[snip]

/opt/local/include/boost/archive/basic_text_iprimitive.hpp:88: error:
  ambiguous overload for ‘operator>>’ in
  ‘((boost::archive::basic_text_iprimitive<std::basic_istream<char,
       std::char_traits<char> > >*)this)->
             boost::archive::basic_text_iprimitive<std::basic_istream<char,
         std::char_traits<char> > >::is >> t’

This means that the input archive expects the recipient of the data to be non-const. Thus const data members must be const_cast<>()’ed to be serialized. For example:

C++
#pragma once

#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>

class Obj {
public:
  Obj() : d1_(-1), d2_(false) {}
  Obj(int d1, bool d2) : d1_(d1), d2_(d2) {}

private:
  const int d1_;
  bool d2_;

  // Allow serialization to access data members.
  friend class boost::serialization::access;

  template<typename A>
  void serialize(A& ar, const unsigned version) {
    ar & const_cast<int&>(d1_) & d2_;
  }
};

Text, XML, and binary archives

The text archive is an ASCII file that is somewhat human readable. There are other archive types available in boost/archive/*.hpp, e.g.:

C++
// Text archive that defines boost::archive::text_oarchive
// and boost::archive::text_iarchive
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>

// XML archive that defines boost::archive::xml_oarchive
// and boost::archive::xml_iarchive
#include <boost/archive/xml_oarchive.hpp>
#include <boost/archive/xml_iarchive.hpp>

// XML archive which uses wide characters (use for UTF-8 output),
// defines boost::archive::xml_woarchive
// and boost::archive::xml_wiarchive
#include <boost/archive/xml_woarchive.hpp>
#include <boost/archive/xml_wiarchive.hpp>

// Binary archive that defines boost::archive::binary_oarchive
// and boost::archive::binary_iarchive
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>

The text and XML archives are portable across 32 and 64 bit platforms.

Having a binary archive that is portable between 32 and 64 bits is not trivial, because C++ does not specify the exact size of primitive types. For instance, a long is usually 4 bytes on a 32 bit machine and 8 bytes on a 64 bit Linux or macOS machine, while it stays at 4 bytes on 64 bit Windows. In practice, the binary archive works across machines that share the same type sizes and byte order; for other cases, there is a non-official portable binary archive.
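
The usual way around the size problem is to never write a type whose width depends on the platform, and to fix the byte order explicitly. The sketch below shows that idea in plain C++ with two hypothetical helpers, `writeU64` and `readU64`; it is an assumption about how one could encode integers portably, not a description of what any Boost archive does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends v as exactly 8 bytes, least significant byte first,
// regardless of the host's word size or byte order.
void writeU64(std::vector<unsigned char>& buf, std::uint64_t v) {
  for (int i = 0; i < 8; ++i) {
    buf.push_back(static_cast<unsigned char>((v >> (8 * i)) & 0xFF));
  }
}

// Reads back the 8 bytes starting at pos, in the same byte order.
std::uint64_t readU64(const std::vector<unsigned char>& buf, std::size_t pos) {
  assert(pos + 8 <= buf.size());
  std::uint64_t v = 0;
  for (int i = 0; i < 8; ++i) {
    v |= static_cast<std::uint64_t>(buf[pos + i]) << (8 * i);
  }
  return v;
}
```

With helpers like these, a non-negative `long` would be converted to `std::uint64_t` before writing, so the archive holds 8 bytes whether the value was produced on a 32 bit or a 64 bit host.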

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)