Introduction
The world of data structures is a vast one. When we need to write and read large blobs of data to or from disk, memory, or sockets, MFC serialization is a powerful tool in every programmer’s toolbox.
Background
Serialization has been part of the MFC (Microsoft Foundation Classes) library since its very first release, but I feel it has never received its due, because it was largely undocumented. The SDK samples that demonstrated serialization were very limited and covered only plain old data, CObject derived classes, and MFC collections. However, with the right extensions we can serialize practically any data structure: STL collections, user defined collections, and any other collections, including flat C style arrays. It is arguably the most powerful, efficient, and fastest way to store and retrieve hierarchical data to and from disk, memory, or sockets. Writing to memory is very useful for interprocess communication such as clipboard cut/copy/paste operations, and writing to sockets is useful when networking with remote machines.
In this article I will cover plain old MFC serialization with the MFC provided classes, how to serialize STL collections, plain Windows SDK data structures, and C style arrays, how to serialize to process and shared memory, and how to serialize to and from sockets. I will also demonstrate how to use MFC serialization with or without the Document/View architecture, for example inside console applications and TCP/IP servers.
What is Serialization
MSDN documentation gives us the best description:
Serialization is the process of converting an object into a stream of bytes in order to store the object or transmit it to memory, a database, or a file. Its main purpose is to save the state of an object in order to be able to recreate it when needed. The reverse process is called deserialization.
MFC serialization implements both binary and text serialization. Binary serialization is handled via the shift operators (<<, >>) and the WriteObject / ReadObject functions. Text serialization is handled with the ReadString / WriteString functions.
MFC serialization provides serialization of C++ CObject derived classes with versioning. With the right extensions it can also serialize classes that are not derived from CObject; however, in those cases the versioning needs to be handled manually.
At the heart of MFC serialization lies the CArchive object. CArchive has no base class, and it is tightly coupled to CFile and CFile derived classes such as CSocketFile, CSharedFile, or CMemFile. Internally, CArchive encapsulates a byte buffer (4096 bytes by default) that is flushed to, or refilled from, the underlying CFile or CFile derived object as needed.
- CFile – provides serialization to or from disk
- CMemFile – provides serialization to or from process memory
- CSharedFile – provides serialization to or from shared memory, which is accessible to other processes
- CSocketFile – provides serialization to or from a CSocket for network communications
- You can also serialize over named pipes, RPC, and other Windows interprocess communication mechanisms
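CArchive only ever talks to the CFile interface, which is why the same Serialize code can target a disk file, process memory, or a socket. The following portable C++ sketch illustrates that layering. ByteSink, MemorySink, StreamSink, and SerializePoint are hypothetical names standing in for CFile, CMemFile, a disk-backed CFile, and a Serialize function; none of them is part of MFC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical analog of the CFile interface that CArchive writes through.
struct ByteSink {
    virtual ~ByteSink() = default;
    virtual void Write(const void* data, std::size_t n) = 0;
};

// Analog of CMemFile: keeps the bytes in process memory.
struct MemorySink : ByteSink {
    std::vector<std::uint8_t> bytes;
    void Write(const void* data, std::size_t n) override {
        const auto* p = static_cast<const std::uint8_t*>(data);
        bytes.insert(bytes.end(), p, p + n);
    }
};

// Analog of a disk-backed CFile; in real use the stream would be a std::ofstream.
struct StreamSink : ByteSink {
    std::ostream& os;
    explicit StreamSink(std::ostream& s) : os(s) {}
    void Write(const void* data, std::size_t n) override {
        os.write(static_cast<const char*>(data), static_cast<std::streamsize>(n));
    }
};

// The "Serialize" routine only sees the interface, so it works unchanged
// for every sink, just as Serialize(CArchive&) works for every CFile.
void SerializePoint(ByteSink& sink, std::int32_t x, std::int32_t y) {
    sink.Write(&x, sizeof x);
    sink.Write(&y, sizeof y);
}
```

Swapping the sink is the only change needed to redirect the output, which mirrors handing a different CFile derived object to the CArchive constructor.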
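CArchive provides serialization of plain old data and of C++ CObject derived classes with versioning. To make a CObject derived class serializable, add the DECLARE_SERIAL macro to the class declaration and the IMPLEMENT_SERIAL macro to its implementation file (the class also needs a default constructor, because the generated CreateObject calls new):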
DECLARE_SERIAL(CRoot)
IMPLEMENT_SERIAL(CRoot, CObject, VERSIONABLE_SCHEMA | 1)
These two macros add a global extraction operator >> (which calls CArchive::ReadObject), a static CreateObject function, a virtual GetRuntimeClass function, and a static CRuntimeClass member to your class. The CRuntimeClass structure has an m_lpszClassName member, which stores the text representation of your class name, and an m_wSchema member, which holds the version information of your class.
These macros internally expand to the following code:
public:
static CRuntimeClass classCRoot;
virtual CRuntimeClass* GetRuntimeClass() const;
static CObject* PASCAL CreateObject();
AFX_API friend CArchive& AFXAPI operator >> (CArchive& ar, CRoot* &pOb);
CObject* PASCAL CRoot::CreateObject()
{
return new CRoot;
}
extern AFX_CLASSINIT _init_CRoot;
AFX_CLASSINIT _init_CRoot (RUNTIME_CLASS(CRoot));
CArchive& AFXAPI operator >> (CArchive& ar, CRoot * &pOb)
{
pOb = (CRoot *)ar.ReadObject(RUNTIME_CLASS(CRoot));
return ar;
}
AFX_COMDAT CRuntimeClass CRoot::classCRoot =
{
"CRoot",
sizeof(class CRoot),
VERSIONABLE_SCHEMA | 1,
CRoot::CreateObject,
RUNTIME_CLASS(CObject),
NULL,
&_init_CRoot
};
CRuntimeClass* CRoot::GetRuntimeClass() const
{
return RUNTIME_CLASS(CRoot);
}
There is no class specific insertion operator <<, because CArchive stores every CObject derived object through a single global operator that takes a base class pointer:
CArchive& AFXAPI operator<<(CArchive& ar, const CObject* pOb);
Plain old data is handled rather straightforwardly. Here is how CArchive writes the float data type:
CArchive& CArchive::operator<<(float f)
{
if(!IsStoring())
AfxThrowArchiveException(CArchiveException::readOnly,m_strFileName);
if (m_lpBufCur + sizeof(float) > m_lpBufMax)
Flush();
*(UNALIGNED float*)m_lpBufCur = f;
m_lpBufCur += sizeof(float);
return *this;
}
The following code loads the float data type:
CArchive& CArchive::operator>>(float& f)
{
if(!IsLoading())
AfxThrowArchiveException(CArchiveException::writeOnly,m_strFileName);
if (m_lpBufCur + sizeof(float) > m_lpBufMax)
FillBuffer(UINT(sizeof(float) - (m_lpBufMax - m_lpBufCur)));
f = *(UNALIGNED float*)m_lpBufCur;
m_lpBufCur += sizeof(float);
return *this;
}
Reading and writing CObject derived classes is a bit more complex, and it is covered in the next sections.
Because all data is stored in a contiguous byte stream, it must be read in exactly the same order as it was stored. Failing to do so produces garbage values and often ends with a CArchiveException thrown during the load.
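To make the buffer mechanics and the ordering rule concrete, here is a small portable C++ sketch. MiniArchive is a hypothetical stand-in, not the MFC class: it copies plain values into a fixed-size buffer that is flushed into a byte vector when full (echoing the Flush() call in the float listing), and it throws a std::runtime_error where CArchive would throw a CArchiveException. For simplicity, loading reads the byte vector directly instead of refilling a buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

class MiniArchive {
public:
    explicit MiniArchive(std::vector<unsigned char>& file, std::size_t bufSize = 16)
        : file_(file), buf_(bufSize) {}

    template <typename T>
    MiniArchive& operator<<(const T& value) {
        static_assert(std::is_trivially_copyable<T>::value, "plain old data only");
        if (used_ + sizeof(T) > buf_.size())
            Flush();                                   // make room, like CArchive::Flush
        if (sizeof(T) > buf_.size()) {                 // larger than the whole buffer: write through
            const auto* p = reinterpret_cast<const unsigned char*>(&value);
            file_.insert(file_.end(), p, p + sizeof(T));
            return *this;
        }
        std::memcpy(buf_.data() + used_, &value, sizeof(T));  // memcpy sidesteps alignment issues
        used_ += sizeof(T);
        return *this;
    }

    template <typename T>
    MiniArchive& operator>>(T& value) {
        static_assert(std::is_trivially_copyable<T>::value, "plain old data only");
        if (readPos_ + sizeof(T) > file_.size())
            throw std::runtime_error("read past end of archive");
        std::memcpy(&value, file_.data() + readPos_, sizeof(T));
        readPos_ += sizeof(T);
        return *this;
    }

    // Pushes the buffered bytes into the "file", as closing a storing archive would.
    void Flush() {
        file_.insert(file_.end(), buf_.begin(), buf_.begin() + static_cast<std::ptrdiff_t>(used_));
        used_ = 0;
    }

private:
    std::vector<unsigned char>& file_;
    std::vector<unsigned char> buf_;
    std::size_t used_ = 0;
    std::size_t readPos_ = 0;
};
```

Nothing in the byte stream records which type was written where, so reading an int where a float was stored silently yields the float's bit pattern; only running off the end is detected.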
Put simply, you cannot call GetObjectSchema more than once per object load, for the following reason:
UINT CArchive::GetObjectSchema()
{
UINT nResult = m_nObjectSchema;
m_nObjectSchema = (UINT)-1;
return nResult;
}
Why is this so? My best guess is that it is a legacy issue. The member variable CArchive::m_nObjectSchema is very different from CRuntimeClass::m_wSchema: the CArchive object schema is read from the file, which can contain many objects with many schemas, and it holds the schema of the object currently being read. Think about it. When you deserialize an object, as in the following example (hypothetically assuming that m_nObjectSchema is left alone):
void CMyClass::Serialize(CArchive& ar)
{
if (ar.IsStoring())
{
}
else
{
UINT nSchema = ar.GetObjectSchema();
switch(nSchema)
{
case 1:
ar >> m_pObject1;
ar >> m_pObject2;
ar >> m_pObject3;
ar >> m_pObject4;
}
}
if(ar.IsLoading())
{
UINT nSchema = ar.GetObjectSchema();
}
}
The object schema in the above example would have changed four times by the time you finished the loading section of the code. My guess is that, to eliminate such subtle erroneous behavior, the MFC framework decided to cut the problem off at the source, rather than leave programmers scratching their heads over why their precious data was hosed.
GetObjectSchema can be called only once per object load, because the framework forcefully resets the schema to (UINT)-1 after each call to CArchive::GetObjectSchema.
Even without that reset, the above example would be foolproof in today’s MFC library, because CArchive::ReadObject contains the following code:
TRY
{
pOb = pClassRef->CreateObject();
UINT nSchemaSave = m_nObjectSchema;
m_nObjectSchema = nSchema;
pOb->Serialize(*this);
m_nObjectSchema = nSchemaSave;
}
As you can see, it saves the current m_nObjectSchema into nSchemaSave, assigns the current object schema to m_nObjectSchema, calls Serialize, and then pops the saved schema back into m_nObjectSchema. Thus the object schema never goes astray.
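The two behaviors just described, the one-shot GetObjectSchema and the save/restore around each nested Serialize call, can be modeled in a few lines of portable C++. SchemaState, its members, and kNoSchema are hypothetical names that only imitate CArchive's bookkeeping; a lambda plays the role of the object's Serialize function.

```cpp
#include <cassert>
#include <functional>

// Value handed out once the schema has been consumed, like (UINT)-1 in MFC.
constexpr unsigned kNoSchema = static_cast<unsigned>(-1);

class SchemaState {
public:
    // One-shot getter: the second call in the same object load sees kNoSchema.
    unsigned GetObjectSchema() {
        const unsigned result = schema_;
        schema_ = kNoSchema;
        return result;
    }

    // Mirrors the TRY block in CArchive::ReadObject: save the caller's schema,
    // install the new object's schema, run its Serialize, then restore.
    void ReadObject(unsigned objectSchema, const std::function<void(SchemaState&)>& serialize) {
        const unsigned saved = schema_;   // nSchemaSave
        schema_ = objectSchema;
        serialize(*this);                 // pOb->Serialize(*this)
        schema_ = saved;
    }

private:
    unsigned schema_ = kNoSchema;
};
```

Because the restore happens after every nested load, an inner object can never leak its schema into the outer object's Serialize.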
There are four ways to handle serialization of derived and base classes in MFC.
But first, let’s look at the subtle problem. Back in the days of the 16 bit MFC implementation, disk space was a precious commodity, as was RAM. Presumably for that reason, no matter how many derived classes you have in the class hierarchy, the object schema always equals the schema of the final child class and is written only once!
class CBase : public CObject
{
DECLARE_SERIAL(CBase)
public:
int m_i;
float m_f;
double m_d;
virtual void Serialize(CArchive& ar);
};
class CDerived : public CBase
{
DECLARE_SERIAL(CDerived)
public:
long m_l;
unsigned short m_us;
long long m_ll;
virtual void Serialize(CArchive& ar);
};
IMPLEMENT_SERIAL(CBase, CObject, VERSIONABLE_SCHEMA | 1)
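void CBase::Serialize(CArchive& ar)
{
if (ar.IsStoring())
{
ar << m_i;
ar << m_f;
ar << m_d;
}
else
{
// When a CDerived object is loaded, nSchema is 2 (the CDerived schema), so case 1 never matches.
UINT nSchema = ar.GetObjectSchema();
switch (nSchema)
{
case 1:
ar >> m_i;
ar >> m_f;
ar >> m_d;
break;
}
}
}
IMPLEMENT_SERIAL(CDerived, CBase, VERSIONABLE_SCHEMA | 2)
void CDerived::Serialize(CArchive& ar)
{
CBase::Serialize(ar);
if (ar.IsStoring())
{
ar << m_l;
ar << m_us;
ar << m_ll;
}
else
{
// Returns (UINT)-1, because CBase::Serialize has already consumed the schema.
UINT nSchema = ar.GetObjectSchema();
switch (nSchema)
{
case 1:
case 2:
ar >> m_l;
ar >> m_us;
ar >> m_ll;
break;
}
}
}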
Why is that? A quick look at the binary file dump reveals that for the CDerived class the schema is written only once, and it always equals the schema of the instantiated object, in this case the CDerived class schema, even if the base class schema is something else.
Tracing into CArchive::WriteObject reveals this code:
CRuntimeClass* pClassRef = pOb->GetRuntimeClass();
WriteClass(pClassRef);
Tracing into CArchive::WriteClass shows that, for a class that has not yet been written to this archive, the framework first writes the WORD value wNewClassTag, which is equal to 0xFFFF, and then calls the CRuntimeClass::Store function:
*this << wNewClassTag;
pClassRef->Store(*this);
The CRuntimeClass::Store function obtains the length of the class name and writes the object schema, followed by the length of the class name and the class name itself. Herein lies the answer to the question of why the object schema is written only once, for the most derived class: only the CRuntimeClass of the actual object is stored.
void CRuntimeClass::Store(CArchive& ar) const
{
WORD nLen = (WORD)AtlStrLen(m_lpszClassName);
ar << (WORD)m_wSchema << nLen;
ar.Write(m_lpszClassName, nLen*sizeof(char));
}
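To see what actually lands in the file, the sketch below reproduces the header bytes described above for a class written for the first time: the 0xFFFF tag, the schema WORD, the name length WORD, and the name without a terminator. PutWord and EncodeNewClassHeader are hypothetical helpers; the little-endian WORD layout matches x86/x64 Windows, 0x80000000 is the value of VERSIONABLE_SCHEMA, and the sketch does not model what MFC writes when the same class appears again later in the archive.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Appends a WORD in little-endian order, as CArchive ends up storing it on x86/x64.
void PutWord(std::vector<std::uint8_t>& out, std::uint16_t w) {
    out.push_back(static_cast<std::uint8_t>(w & 0xFF));
    out.push_back(static_cast<std::uint8_t>(w >> 8));
}

// Hypothetical re-creation of the bytes emitted by WriteClass plus CRuntimeClass::Store.
std::vector<std::uint8_t> EncodeNewClassHeader(const std::string& className, std::uint32_t schema) {
    std::vector<std::uint8_t> out;
    PutWord(out, 0xFFFF);                                        // wNewClassTag
    PutWord(out, static_cast<std::uint16_t>(schema));            // (WORD)m_wSchema keeps only the low 16 bits
    PutWord(out, static_cast<std::uint16_t>(className.size()));  // nLen
    out.insert(out.end(), className.begin(), className.end());   // class name, no '\0'
    return out;
}
```

Note that the (WORD) cast drops the VERSIONABLE_SCHEMA flag, so only the numeric version of the most derived class reaches the file.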
After the CRuntimeClass information has been written to the file, the framework finally calls the virtual Serialize function of our object:
((CObject*)pOb)->Serialize(*this);
The exact opposite happens during object load. First the extraction operator, which is provided by the IMPLEMENT_SERIAL macro, is called:
CArchive& AFXAPI operator >> (CArchive& ar, CDerived* &pOb)
{
pOb = (CDerived*)ar.ReadObject(RUNTIME_CLASS(CDerived));
return ar;
}
Tracing into CArchive::ReadObject reveals the following code:
UINT nSchema;
DWORD obTag;
CRuntimeClass* pClassRef = ReadClass(pClassRefRequested, &nSchema, &obTag);
The CArchive::ReadClass function first reads the object tag:
DWORD obTag;
WORD wTag;
*this >> wTag;
if (wTag == wBigObjectTag)
*this >> obTag;
else
obTag = ((wTag & wClassTag) << 16) | (wTag & ~wClassTag);
CRuntimeClass* pClassRef;
UINT nSchema;
if (wTag == wNewClassTag)
{
if ((pClassRef = CRuntimeClass::Load(*this, &nSchema)) == NULL)
AfxThrowArchiveException(CArchiveException::badClass, m_strFileName);
}
Following is the listing of the CRuntimeClass::Load function. Please note that the class name must be shorter than 64 characters: if the length of the class name is greater than or equal to 64, or if CArchive::Read fails to read the class name from the file, the function returns NULL. If the class name is successfully read, szClassName is NULL terminated at position nLen and looked up with CRuntimeClass::FromName:
CRuntimeClass* PASCAL CRuntimeClass::Load(CArchive& ar, UINT* pwSchemaNum)
{
if(pwSchemaNum == NULL)
{
return NULL;
}
WORD nLen;
char szClassName[64];
WORD wTemp;
ar >> wTemp; *pwSchemaNum = wTemp;
ar >> nLen;
if (nLen >= _countof(szClassName) ||
ar.Read(szClassName, nLen*sizeof(char)) != nLen*sizeof(char))
{
return NULL;
}
szClassName[nLen] = '\0';
CRuntimeClass* pClass = FromName(szClassName);
if (pClass == NULL)
{
TRACE(traceAppMsg, 0, "Warning: Cannot load %hs from archive. Class not defined.\n",
szClassName);
}
return pClass;
}
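Going the other way, a loader has to validate what it reads before trusting it. The sketch below decodes the schema, length, and name in the order CRuntimeClass::Load reads them and applies the same 64-character limit. ClassHeader and DecodeClassHeader are hypothetical names, the input is assumed to start right after the 0xFFFF tag, and the function returns false where MFC returns NULL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct ClassHeader {
    std::uint16_t schema = 0;
    std::string name;
};

// Reads WORD schema, WORD length, then the name; rejects names that would not fit
// a 64-char buffer with its terminator, and archives that end before the name does.
bool DecodeClassHeader(const std::vector<std::uint8_t>& in, std::size_t pos, ClassHeader& out) {
    auto getWord = [&](std::uint16_t& w) {
        if (pos + 2 > in.size()) return false;
        w = static_cast<std::uint16_t>(in[pos] | (in[pos + 1] << 8));  // little-endian WORD
        pos += 2;
        return true;
    };
    std::uint16_t schema = 0, len = 0;
    if (!getWord(schema) || !getWord(len))
        return false;
    if (len >= 64 || pos + len > in.size())
        return false;
    out.schema = schema;
    const auto first = in.begin() + static_cast<std::ptrdiff_t>(pos);
    out.name.assign(first, first + len);  // std::string supplies the terminator MFC adds by hand
    return true;
}
```

The name that comes back is exactly what the next step, the lookup by name, receives.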
CRuntimeClass::FromName simply iterates through AFX_MODULE_STATE::m_classList (and, in _AFXDLL builds, through the class lists of the loaded extension DLLs) and compares the names. If the class is found, its CRuntimeClass pointer is returned. CRuntimeClass discovery through AFX_MODULE_STATE is a whole other topic that deserves its own article. Suffice it to say that this feature was implemented before compilers supported RTTI (Run Time Type Information), and it allows runtime type discovery of MFC classes with the RTTI compiler switch turned off. As a matter of fact, the default setting of the Visual C++ 6.0 RTTI switch was off.
CRuntimeClass* PASCAL CRuntimeClass::FromName(LPCSTR lpszClassName)
{
CRuntimeClass* pClass=NULL;
ENSURE(lpszClassName);
AFX_MODULE_STATE* pModuleState = AfxGetModuleState();
AfxLockGlobals(CRIT_RUNTIMECLASSLIST);
for (pClass = pModuleState->m_classList; pClass != NULL;
pClass = pClass->m_pNextClass)
{
if (lstrcmpA(lpszClassName, pClass->m_lpszClassName) == 0)
{
AfxUnlockGlobals(CRIT_RUNTIMECLASSLIST);
return pClass;
}
}
AfxUnlockGlobals(CRIT_RUNTIMECLASSLIST);
#ifdef _AFXDLL
AfxLockGlobals(CRIT_DYNLINKLIST);
for (CDynLinkLibrary* pDLL = pModuleState->m_libraryList; pDLL != NULL;
pDLL = pDLL->m_pNextDLL)
{
for (pClass = pDLL->m_classList; pClass != NULL;
pClass = pClass->m_pNextClass)
{
if (lstrcmpA(lpszClassName, pClass->m_lpszClassName) == 0)
{
AfxUnlockGlobals(CRIT_DYNLINKLIST);
return pClass;
}
}
}
AfxUnlockGlobals(CRIT_DYNLINKLIST);
#endif
return NULL;
}
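At its core, FromName is a name-keyed search over a linked list that static AFX_CLASSINIT objects build before main runs. The following portable sketch shows that mechanism in isolation. RuntimeClass, ClassInit, ClassListHead, FromName, Object, Root, and Age are hypothetical stand-ins; the locking and the extension DLL lists of the real function are omitted.

```cpp
#include <cassert>
#include <cstring>
#include <memory>

struct Object {
    virtual ~Object() = default;
    virtual const char* Name() const = 0;
};

// Stand-in for CRuntimeClass: name, schema, factory, and the m_pNextClass link.
struct RuntimeClass {
    const char* name;
    unsigned schema;
    Object* (*create)();
    RuntimeClass* next;
};

// Head of the registration list (plays the role of m_classList).
RuntimeClass*& ClassListHead() {
    static RuntimeClass* head = nullptr;
    return head;
}

// Plays the role of AFX_CLASSINIT: its constructor links a class into the list.
struct ClassInit {
    explicit ClassInit(RuntimeClass* rc) {
        rc->next = ClassListHead();
        ClassListHead() = rc;
    }
};

// Linear search by name, as FromName does with lstrcmpA.
RuntimeClass* FromName(const char* name) {
    for (RuntimeClass* rc = ClassListHead(); rc != nullptr; rc = rc->next)
        if (std::strcmp(name, rc->name) == 0)
            return rc;
    return nullptr;
}

struct Root : Object { const char* Name() const override { return "CRoot"; } };
struct Age : Object { const char* Name() const override { return "CAge"; } };
Object* CreateRoot() { return new Root; }
Object* CreateAge() { return new Age; }

// What IMPLEMENT_SERIAL generates per class, reduced to its essence.
RuntimeClass classRoot = {"CRoot", 1, &CreateRoot, nullptr};
ClassInit initRoot(&classRoot);
RuntimeClass classAge = {"CAge", 1, &CreateAge, nullptr};
ClassInit initAge(&classAge);
```

Once the name resolves to a descriptor, its factory pointer is all the loader needs to create an object of the right dynamic type, with no compiler RTTI involved.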
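Back in CArchive::ReadClass, the function returns the CRuntimeClass pointer and hands the schema and the object tag back through the pSchema and pObTag pointers (or stores the schema in m_nObjectSchema when pSchema is NULL):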
if (pSchema != NULL)
*pSchema = nSchema;
else
m_nObjectSchema = nSchema;
if (pObTag != NULL)
*pObTag = obTag;
return pClassRef;
After the CRuntimeClass pointer has been successfully obtained, the framework calls CreateObject, which is provided by the DECLARE_SERIAL and IMPLEMENT_SERIAL macros, and then:
- stores the current CArchive::m_nObjectSchema into nSchemaSave
- assigns the schema of the current CRuntimeClass to CArchive::m_nObjectSchema
- calls the virtual Serialize function
- pops nSchemaSave back into CArchive::m_nObjectSchema
TRY
{
pOb = pClassRef->CreateObject();
UINT nSchemaSave = m_nObjectSchema;
m_nObjectSchema = nSchema;
pOb->Serialize(*this);
m_nObjectSchema = nSchemaSave;
ASSERT_VALID(pOb);
}
So now you know why your object has only one schema, regardless of how many classes there are in your class hierarchy.
How do we address this issue? There are four ways around it, some more elegant than others. Let us look at all of them. Of course, this applies only when you must maintain versions throughout all of your classes. The easiest way is not to version anything; however, in real life, if your application’s life expectancy is measured in decades, it is absolutely imperative to maintain versioning right from the start.
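The first approach is to serialize all of the base class members directly in the derived class Serialize function, without calling CBase::Serialize. It is the least elegant solution, but it works and eliminates all surprises. For our example above, the code looks like this: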
IMPLEMENT_SERIAL(CDerived, CBase, VERSIONABLE_SCHEMA | 2)
void CDerived::Serialize(CArchive& ar)
{
if (ar.IsStoring())
{
ar << m_i;
ar << m_f;
ar << m_d;
ar << m_l;
ar << m_us;
ar << m_ll;
}
else
{
UINT nSchema = ar.GetObjectSchema();
switch (nSchema)
{
case 1:
case 2:
ar >> m_i;
ar >> m_f;
ar >> m_d;
ar >> m_l;
ar >> m_us;
ar >> m_ll;
break;
}
}
}
This solution is not very pretty, and if your base class has many members, your Serialize function can become enormous.
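The second approach has each base class put the schema back with CArchive::SetObjectSchema after reading it, so that the next class in the hierarchy can read it again. This solution is a bit more elegant; however, whenever the schema changes you still need to add the new schema number to the case labels of all of the base classes: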
IMPLEMENT_SERIAL(CBase, CObject, VERSIONABLE_SCHEMA | 1)
void CBase::Serialize(CArchive& ar)
{
if (ar.IsStoring())
{
ar << m_i;
ar << m_f;
ar << m_d;
}
else
{
UINT nSchema = ar.GetObjectSchema();
switch (nSchema)
{
case 1:
case 2:
ar >> m_i;
ar >> m_f;
ar >> m_d;
break;
}
ar.SetObjectSchema(nSchema);
}
}
IMPLEMENT_SERIAL(CDerived, CBase, VERSIONABLE_SCHEMA | 2)
void CDerived::Serialize(CArchive& ar)
{
CBase::Serialize(ar);
if (ar.IsStoring())
{
ar << m_l;
ar << m_us;
ar << m_ll;
}
else
{
UINT nSchema = ar.GetObjectSchema();
switch (nSchema)
{
case 1:
case 2:
ar >> m_l;
ar >> m_us;
ar >> m_ll;
break;
}
}
}
The third approach adds a protected virtual function, SerializeImpl(CArchive& ar, UINT nSchema), which eliminates the need to call CArchive::GetObjectSchema more than once. It has to be protected rather than private, because CDerived calls the base class version.
class CBase : public CObject
{
DECLARE_SERIAL(CBase)
public:
int m_i;
float m_f;
double m_d;
virtual void Serialize(CArchive& ar);
protected:
virtual void SerializeImpl(CArchive& ar, UINT nSchema);
};
class CDerived : public CBase
{
DECLARE_SERIAL(CDerived)
public:
long m_l;
unsigned short m_us;
long long m_ll;
protected:
virtual void SerializeImpl(CArchive& ar, UINT nSchema);
};
IMPLEMENT_SERIAL(CBase, CObject, VERSIONABLE_SCHEMA | 1)
void CBase::Serialize(CArchive& ar)
{
if (ar.IsStoring())
{
ar << m_i;
ar << m_f;
ar << m_d;
SerializeImpl(ar, (UINT)-1);
}
else
{
UINT nSchema = ar.GetObjectSchema();
switch (nSchema)
{
case 1:
case 2:
ar >> m_i;
ar >> m_f;
ar >> m_d;
break;
}
SerializeImpl(ar, nSchema);
}
}
void CBase::SerializeImpl(CArchive& ar, UINT nSchema)
{
}
IMPLEMENT_SERIAL(CDerived, CBase, VERSIONABLE_SCHEMA | 2)
void CDerived::SerializeImpl(CArchive& ar, UINT nSchema)
{
CBase::SerializeImpl(ar, nSchema);
if (ar.IsStoring())
{
ar << m_l;
ar << m_us;
ar << m_ll;
}
else
{
switch (nSchema)
{
case 1:
case 2:
ar >> m_l;
ar >> m_us;
ar >> m_ll;
break;
}
}
}
This is somewhat more elegant, but it still requires us to add the new version number to all of the base classes whenever the schema changes.
And here comes the most elegant solution: each base class writes and reads its own schema. This addresses the shortcomings of the MFC serialization mechanism. You have access to your base class schema through the m_wSchema field of the static CRuntimeClass member, classCBase.m_wSchema in our example.
IMPLEMENT_SERIAL(CBase, CObject, VERSIONABLE_SCHEMA | 1)
void CBase::Serialize(CArchive& ar)
{
if (ar.IsStoring())
{
WORD wSchema = (WORD)classCBase.m_wSchema;
ar << wSchema;
ar << m_i;
ar << m_f;
ar << m_d;
}
else
{
WORD wSchema = 0;
ar >> wSchema;
switch (wSchema)
{
case 1:
ar >> m_i;
ar >> m_f;
ar >> m_d;
break;
}
}
}
IMPLEMENT_SERIAL(CDerived, CBase, VERSIONABLE_SCHEMA | 2)
void CDerived::Serialize(CArchive& ar)
{
CBase::Serialize(ar);
if (ar.IsStoring())
{
ar << m_l;
ar << m_us;
ar << m_ll;
}
else
{
UINT nSchema = ar.GetObjectSchema();
switch (nSchema)
{
case 1:
case 2:
ar >> m_l;
ar >> m_us;
ar >> m_ll;
break;
}
}
}
This is the most elegant solution because it frees you from maintaining the base classes, at the cost of adding sizeof(WORD) bytes to your file for every base class of every stored object.
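The essence of this fourth approach can be reproduced outside MFC. In the sketch below the "framework" writes the most derived schema once (as CRuntimeClass::Store does), while Base additionally writes its own schema WORD ahead of its members, so Base can change its version without touching Derived. Stream, Base, Derived, StoreObject, and LoadObject are hypothetical names, and the framework header is reduced to that single schema WORD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Minimal byte stream standing in for CArchive (hypothetical helper, not MFC).
struct Stream {
    std::vector<std::uint8_t> bytes;
    std::size_t pos = 0;
    template <typename T> void Put(const T& v) {
        const auto* p = reinterpret_cast<const std::uint8_t*>(&v);
        bytes.insert(bytes.end(), p, p + sizeof(T));
    }
    template <typename T> void Get(T& v) {
        if (pos + sizeof(T) > bytes.size()) throw std::runtime_error("archive underrun");
        std::memcpy(&v, bytes.data() + pos, sizeof(T));
        pos += sizeof(T);
    }
};

struct Base {
    static constexpr std::uint16_t kSchema = 1;
    int i = 0;
    double d = 0;
    // Like CBase::Serialize above: the base class stores its own schema WORD first.
    void Store(Stream& s) const { const std::uint16_t schema = kSchema; s.Put(schema); s.Put(i); s.Put(d); }
    void Load(Stream& s) {
        std::uint16_t schema = 0;
        s.Get(schema);
        if (schema != 1) throw std::runtime_error("unknown Base schema");
        s.Get(i);
        s.Get(d);
    }
};

struct Derived : Base {
    static constexpr std::uint16_t kSchema = 2;
    long long ll = 0;
    void Store(Stream& s) const { Base::Store(s); s.Put(ll); }
    // objectSchema is the single, most derived schema that the framework wrote.
    void Load(Stream& s, std::uint16_t objectSchema) {
        Base::Load(s);
        if (objectSchema != 1 && objectSchema != 2) throw std::runtime_error("unknown Derived schema");
        s.Get(ll);
    }
};

// "Framework" side: write the most derived schema once, then the object.
void StoreObject(Stream& s, const Derived& obj) {
    const std::uint16_t schema = Derived::kSchema;
    s.Put(schema);
    obj.Store(s);
}
void LoadObject(Stream& s, Derived& obj) {
    std::uint16_t objectSchema = 0;
    s.Get(objectSchema);
    obj.Load(s, objectSchema);
}
```

The cost is visible in the byte count: one extra WORD for Base in addition to the framework's header.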
Suppose you have a CObject derived class with pure virtual functions.
class CPureBase : public CObject
{
DECLARE_SERIAL(CPureBase)
public:
CPureBase();
virtual ~CPureBase();
virtual void Serialize(CArchive& ar);
virtual CString CanSerialize() const = 0;
virtual CString GetObjectSchema() const = 0;
virtual CString GetObjectRunTimeName() const = 0;
};
Under normal circumstances this will not compile, because the IMPLEMENT_SERIAL macro adds the following function to your code, and an abstract class cannot be instantiated with new:
CObject* PASCAL CPureBase::CreateObject()
{
return new CPureBase;
}
To work around this issue, we need our own version of the IMPLEMENT_SERIAL macro that returns nullptr from the CreateObject function:
#define IMPLEMENT_SERIAL_PURE_BASE(class_name, base_class_name, wSchema)\
CObject* PASCAL class_name::CreateObject() \
{ return nullptr; } \
extern AFX_CLASSINIT _init_##class_name; \
_IMPLEMENT_RUNTIMECLASS(class_name, base_class_name, wSchema, \
class_name::CreateObject, &_init_##class_name) \
AFX_CLASSINIT _init_##class_name(RUNTIME_CLASS(class_name)); \
CArchive& AFXAPI operator>>(CArchive& ar, class_name* &pOb) \
{ pOb = (class_name*) ar.ReadObject(RUNTIME_CLASS(class_name)); \
return ar; }
Now you can declare your pure base class serializable.
IMPLEMENT_SERIAL_PURE_BASE(CPureBase, CObject, VERSIONABLE_SCHEMA | 1)
CPureBase::CPureBase()
{
}
CPureBase::~CPureBase()
{
}
void CPureBase::Serialize(CArchive& ar)
{
if (ar.IsStoring())
{
}
else
{
}
}
Serialization with the Document/View architecture is the type most covered in the MFC literature. If your application uses the Document/View architecture, serialization is already part of the CDocument derived class, and the Serialize override provides the necessary code. A typical structure looks like this:
void CSerializeDemoDoc::Serialize(CArchive& ar)
{
if (ar.IsStoring())
{
ar << m_pRoot;
}
else
{
ar >> m_pRoot;
}
}
To serialize without the Document/View architecture, say in a console application, add the following code to write to a file:
CFile file;
if (!file.Open(_T("Test.my_ext"), CFile::modeCreate | CFile::modeReadWrite | CFile::shareExclusive))
return false;
CArchive ar(&file, CArchive::store | CArchive::bNoFlushOnDelete);
ar << val;
ar.Close();
file.Close();
To deserialize (read) without the Document/View architecture, use the following code:
CFile file;
if (!file.Open(_T("Test.my_ext"), CFile::modeRead | CFile::shareExclusive))
return false;
CArchive ar(&file, CArchive::load);
ar >> val;
ar.Close();
file.Close();
In just a few lines of code you have harnessed the power of the CArchive object.
CArchive provides the following insertion and extraction operators for storing and retrieving plain old data:
CArchive& operator<<(BYTE by);
CArchive& operator<<(WORD w);
CArchive& operator<<(LONG l);
CArchive& operator<<(DWORD dw);
CArchive& operator<<(float f);
CArchive& operator<<(double d);
CArchive& operator<<(LONGLONG dwdw);
CArchive& operator<<(ULONGLONG dwdw);
CArchive& operator<<(int i);
CArchive& operator<<(short w);
CArchive& operator<<(char ch);
#ifdef _NATIVE_WCHAR_T_DEFINED
CArchive& operator<<(wchar_t ch);
#endif
CArchive& operator<<(unsigned u);
template < typename BaseType , bool t_bMFCDLL>
CArchive& operator<<(const ATL::CSimpleStringT<BaseType, t_bMFCDLL>& str);
template< typename BaseType, class StringTraits >
CArchive& operator<<(const ATL::CStringT<BaseType, StringTraits>& str);
template < typename BaseType , bool t_bMFCDLL>
CArchive& operator>>(ATL::CSimpleStringT<BaseType, t_bMFCDLL>& str);
template< typename BaseType, class StringTraits >
CArchive& operator>>(ATL::CStringT<BaseType, StringTraits>& str);
CArchive& operator<<(bool b);
CArchive& operator>>(BYTE& by);
CArchive& operator>>(WORD& w);
CArchive& operator>>(DWORD& dw);
CArchive& operator>>(LONG& l);
CArchive& operator>>(float& f);
CArchive& operator>>(double& d);
CArchive& operator>>(LONGLONG& dwdw);
CArchive& operator>>(ULONGLONG& dwdw);
CArchive& operator>>(int& i);
CArchive& operator>>(short& w);
CArchive& operator>>(char& ch);
#ifdef _NATIVE_WCHAR_T_DEFINED
CArchive& operator>>(wchar_t& ch);
#endif
CArchive& operator>>(unsigned& u);
CArchive& operator>>(bool& b);
...
If you need to serialize data types that CArchive does not support, you need to write your own implementation. We will look at this a bit later, when I cover serializing Windows SDK structures.
MFC provides serialization support for nearly all of its collections; to serialize an MFC collection, all you need to do is call the collection’s own Serialize(CArchive& ar). CArray is different because it is a template, so the element type isn’t known in advance and may or may not be derived from CObject. The default implementation of CArray::Serialize is listed below. All it does is write the size of the CArray during a store and, during a load, read the size from the archive and resize the CArray. It then forwards the call to the SerializeElements<TYPE>() function.
template<class TYPE, class ARG_TYPE>
void CArray<TYPE, ARG_TYPE>::Serialize(CArchive& ar)
{
ASSERT_VALID(this);
CObject::Serialize(ar);
if (ar.IsStoring())
{
ar.WriteCount(m_nSize);
}
else
{
DWORD_PTR nOldSize = ar.ReadCount();
SetSize(nOldSize, -1);
}
SerializeElements<TYPE>(ar, m_pData, m_nSize);
}
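The count-then-elements pattern of CArray::Serialize carries over directly to STL containers. Here is a hedged sketch in portable C++ for trivially copyable element types; PutPod, GetPod, StoreVector, and LoadVector are hypothetical helpers operating on a plain byte vector instead of a CArchive, and the fixed 32-bit count is a simplification of this sketch, not MFC's count encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

template <typename T>
void PutPod(Bytes& out, const T& v) {
    static_assert(std::is_trivially_copyable<T>::value, "plain data only");
    const auto* p = reinterpret_cast<const std::uint8_t*>(&v);
    out.insert(out.end(), p, p + sizeof(T));
}

template <typename T>
void GetPod(const Bytes& in, std::size_t& pos, T& v) {
    static_assert(std::is_trivially_copyable<T>::value, "plain data only");
    if (pos + sizeof(T) > in.size()) throw std::runtime_error("archive underrun");
    std::memcpy(&v, in.data() + pos, sizeof(T));
    pos += sizeof(T);
}

// Store: element count first, then each element (the SerializeElements step).
template <typename T>
void StoreVector(Bytes& out, const std::vector<T>& v) {
    PutPod(out, static_cast<std::uint32_t>(v.size()));
    for (const T& e : v) PutPod(out, e);
}

// Load: read the count, resize (like SetSize), then read the elements back.
template <typename T>
void LoadVector(const Bytes& in, std::size_t& pos, std::vector<T>& v) {
    std::uint32_t count = 0;
    GetPod(in, pos, count);
    // Refuse counts the remaining bytes cannot hold, so a corrupt file
    // cannot trigger a huge allocation.
    if (count > (in.size() - pos) / sizeof(T)) throw std::runtime_error("bad element count");
    v.resize(count);
    for (T& e : v) GetPod(in, pos, e);
}
```

Copying elements bitwise is only safe for plain data; containers of pointers or objects need a per-element routine, exactly as CArray needs a SerializeElements specialization.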
The default SerializeElements<TYPE>() simply reads and writes the elements bitwise, which is only correct for plain data. For pointers and other non-trivial types, you must provide an appropriate SerializeElements<TYPE>() specialization for the type being stored or retrieved. The following listing demonstrates a SerializeElements<TYPE> implementation for the CAge class. Please refer to the SerializeDemo project for the implementation details.
class CAge : public CObject
{
DECLARE_SERIAL(CAge)
public:
CAge();
CAge(int nAge);
virtual ~CAge();
virtual void Serialize(CArchive& ar);
UINT m_nAge;
};
template<> inline void AFXAPI SerializeElements(CArchive& ar, CAge** pAge, INT_PTR nCount)
{
for (INT_PTR i = 0; i < nCount; i++, pAge++)
{
if (ar.IsStoring())
{
ar << *pAge;
}
else
{
CAge* p = nullptr;
ar >> p;
*pAge = p;
}
}
}
Serialization to and from memory is supported via CMemFile
. CMemFile
does not require a file name.
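CMemFile file;
CArchive ar(&file, CArchive::store);
ar << val;
ar.Close(); // flushes the archive buffer into the CMemFile
// Copy the serialized bytes out, for example into a CByteArray member.
m_aBytes.SetSize((INT_PTR)file.GetLength());
file.SeekToBegin();
file.Read(m_aBytes.GetData(), (UINT)m_aBytes.GetSize());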
Deserialization from memory is done in the following manner:
CMemFile file;
file.Attach(m_aBytes.GetData(), (UINT)m_aBytes.GetSize());
CArchive ar(&file, CArchive::load);
ar >> val;
ar.Close();
Serialization to and from shared memory is supported via CSharedFile. This is very useful if you want to transfer your serialized object to the clipboard, for pasting into another instance of your application or for passing it to another application.
UINT m_nClipboardFormat = RegisterClipboardFormat(_T("MY_APP_DATA"));
CSharedFile file(GMEM_MOVEABLE | GMEM_SHARE | GMEM_ZEROINIT);
CArchive ar(&file, CArchive::store | CArchive::bNoFlushOnDelete);
GetDocument()->Serialize(ar);
ar.Close(); // flush the archive buffer into the shared memory block before detaching it
if (!OpenClipboard())
return;
EmptyClipboard();
SetClipboardData(m_nClipboardFormat, file.Detach()); // the clipboard now owns the memory
CloseClipboard();
Deserialization from shared memory, as in a paste operation from the clipboard:
UINT m_nClipboardFormat = RegisterClipboardFormat(_T("MY_APP_DATA"));
if (!OpenClipboard())
return;
CSharedFile file(GMEM_MOVEABLE | GMEM_SHARE | GMEM_ZEROINIT);
HGLOBAL hMem = GetClipboardData(m_nClipboardFormat);
if (hMem == nullptr)
{
CloseClipboard();
return;
}
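file.SetHandle(hMem, FALSE); // read-only view of the clipboard's memory block
CArchive ar(&file, CArchive::load);
GetDocument()->DeleteContents();
GetDocument()->Serialize(ar);
ar.Close();
file.Detach(); // the clipboard still owns hMem; keep CSharedFile from freeing it
CloseClipboard();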
Serialization to and from sockets is done via the CSocketFile class. You can use a CArchive with a CSocket only if the CSocket is of type SOCK_STREAM. This topic is a bit more complex than the MSDN documentation describes. The official documentation says that you can write to and read from a CSocket with CSocketFile. This is true for writing, but not necessarily for reading. If your transmitted data is only a few bytes, then yes, you can use CSocketFile to receive it. However, if your data is megabytes in size (or any size greater than the reading buffer), you will likely receive it in several reads, and you will have to accumulate all of it into a CByteArray first. Only after all of the data has been received can you attach it to a CMemFile, rather than a CSocketFile, and deserialize it. Trying to read partial data from a CSocketFile usually results in a CArchiveException.
CSocket sock;
if (!sock.Create())
return;
if (!sock.Connect(_T("127.0.0.1"), 1011))
return;
CSocketFile file(&sock);
CArchive ar(&file, CArchive::store | CArchive::bNoFlushOnDelete);
ar << m_pRoot;
ar.Close();
file.Close();
sock.Close();
Serialization from the socket is a bit more complicated. I am giving the full listing of the class to demonstrate how to properly read a large binary data set from the socket. For the full source code listing, please refer to the example project SerializeTcpServer.
class CSockThread;
class CRecvSocket : public CSocket
{
public:
CRecvSocket();
virtual ~CRecvSocket();
virtual void OnReceive(int nErrorCode);
CSockThread* m_pThread;
CByteArray m_aBytes;
private:
DWORD m_dwReads;
void Display(CRoot* pRoot);
};
#define INCOMING_BUFFER_SIZE 65536
CRecvSocket::CRecvSocket(): m_pThread(nullptr)
, m_dwReads(0)
{
}
CRecvSocket::~CRecvSocket()
{
}
void CRecvSocket::OnReceive(int nErrorCode)
{
Sleep(10);
BYTE btBuffer[INCOMING_BUFFER_SIZE] = { 0 };
int nRead = Receive(btBuffer, INCOMING_BUFFER_SIZE);
switch (nRead)
{
case 0:
m_pThread->PostThreadMessage(WM_QUIT, 0, 0);
break;
case SOCKET_ERROR:
if (GetLastError() != WSAEWOULDBLOCK)
{
m_pThread->PostThreadMessage(WM_QUIT, 0, 0);
}
break;
default:
m_dwReads++;
CByteArray aBytes;
aBytes.SetSize(nRead);
CopyMemory(aBytes.GetData(), btBuffer, nRead);
m_aBytes.Append(aBytes);
DWORD dwReceived = 0;
if (IOCtl(FIONREAD, &dwReceived))
{
if (dwReceived == 0)
{
CMemFile file;
file.Attach(m_aBytes.GetData(), m_aBytes.GetSize());
CArchive ar(&file, CArchive::load);
CRoot* pRoot = nullptr;
TRY
{
ar >> pRoot;
}
CATCH(CArchiveException, e)
{
std::cout << "Error reading data " << std::endl;
}
END_CATCH
if (pRoot)
{
Display(pRoot);
delete pRoot;
}
ar.Close();
file.Close();
m_pThread->PostThreadMessage(WM_QUIT, 0, 0);
}
}
}
CSocket::OnReceive(nErrorCode);
}
In practice you will rarely receive an entire transmission of binary or text data in a single OnReceive call. Thus you need to accumulate all of the data into an array of bytes, and only then can you successfully deserialize it by attaching the accumulated CByteArray to a CMemFile. The above example calls IOCtl(FIONREAD, &dwReceived) to determine whether more data is pending. The rule of thumb is this: because our reading buffer is 65536 bytes, any transmission larger than the buffer will take more than one read, and TCP may split smaller transmissions as well.
The CSockThread class referenced through m_pThread is implemented in the example project SerializeTcpServer.
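Because TCP delivers a byte stream, a FIONREAD result of 0 only means that nothing is buffered at that instant; the sender may simply not have caught up. A common alternative, sketched here as a swapped-in technique rather than what SerializeTcpServer does, is to prefix every serialized archive with its length so the receiver knows exactly when the whole message has arrived. Frame and MessageAccumulator are hypothetical names; in an MFC receiver the completed message would then be attached to a CMemFile and loaded with CArchive, as in the listing above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sender side: prefix the serialized archive bytes with a 4-byte little-endian length.
std::vector<std::uint8_t> Frame(const std::vector<std::uint8_t>& payload) {
    const auto len = static_cast<std::uint32_t>(payload.size());
    std::vector<std::uint8_t> out;
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>(len >> shift));
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Receiver side: append each OnReceive chunk and hand out a message only
// once the length prefix says all of its bytes have arrived.
class MessageAccumulator {
public:
    bool Append(const std::uint8_t* data, std::size_t n, std::vector<std::uint8_t>& message) {
        buffer_.insert(buffer_.end(), data, data + n);
        if (buffer_.size() < 4) return false;           // prefix not complete yet
        std::uint32_t len = 0;
        for (int k = 0; k < 4; ++k)
            len |= static_cast<std::uint32_t>(buffer_[k]) << (8 * k);
        if (buffer_.size() - 4 < len) return false;     // payload not complete yet
        const auto begin = buffer_.begin() + 4;
        const auto end = begin + static_cast<std::ptrdiff_t>(len);
        message.assign(begin, end);
        buffer_.erase(buffer_.begin(), end);            // keep bytes of any following message
        return true;
    }

private:
    std::vector<std::uint8_t> buffer_;
};
```

The design trades four bytes per message for a completion test that does not depend on timing, so the Sleep and FIONREAD heuristics become unnecessary.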
An arbitrary byte stream is basically any binary file whose internal structure you do not know or do not care about. For example, you may want to store JPEG images or MPEG-4 movie files inside your class data without any knowledge of their underlying structure, deserialize them later, and use them with the appropriate application. MFC serialization makes storing such data easy.
In the following code we store the byte streams of four JPEG pictures:
class CMyPicture : public CObject
{
DECLARE_SERIAL(CMyPicture)
public:
CMyPicture();
virtual ~CMyPicture();
virtual void Serialize(CArchive& ar);
CString GetHeader() const;
CString m_strName;
CString m_strNewName;
CByteArray m_bytes;
};
typedef CTypedPtrArray<CObArray, CMyPicture*> CMyPictureArray;
The following listing is the implementation of the class:
IMPLEMENT_SERIAL(CMyPicture, CObject, VERSIONABLE_SCHEMA | 1)
CMyPicture::CMyPicture()
{
}
CMyPicture::~CMyPicture()
{
}
void CMyPicture::Serialize(CArchive& ar)
{
if (ar.IsStoring())
{
ar << m_strName;
ar << m_strNewName;
}
else
{
UINT nSchema = ar.GetObjectSchema();
switch (nSchema)
{
case 1:
ar >> m_strName;
ar >> m_strNewName;
break;
}
}
m_bytes.Serialize(ar);
}
To populate the picture array with JPEG image data, all you need is the following:
m_aPictures.Add(InitPicture("Water lilies.jpg", "Water lilies Output.jpg"));
m_aPictures.Add(InitPicture("Blue hills.jpg", "Blue hills Output.jpg"));
m_aPictures.Add(InitPicture("Sunset.jpg", "Sunset Output.jpg"));
m_aPictures.Add(InitPicture("Winter.jpg", "Winter Output.jpg"));
UpdateAllViews(nullptr, HINT_GENERATED_DATA);
SetModifiedFlag();
}
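std::vector<BYTE> CSerializeDemoDoc::ReadBinaryFile(const char* filename)
{
// std::basic_ifstream<BYTE> depends on std::char_traits<unsigned char>, which the
// standard does not guarantee, so read through a plain char stream instead.
std::ifstream file(filename, std::ios::binary);
std::vector<char> raw((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());
return std::vector<BYTE>(raw.begin(), raw.end());
}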
CMyPicture* CSerializeDemoDoc::InitPicture(const char* sFileName, const char* sOutFileName)
{
std::vector<BYTE> vJPG = ReadBinaryFile(sFileName);
CMyPicture* pPicture = new CMyPicture;
pPicture->m_strName = sFileName;
pPicture->m_strNewName = sOutFileName;
if (!vJPG.empty()) // skip the copy when the file could not be read
{
pPicture->m_bytes.SetSize((INT_PTR)vJPG.size());
CopyMemory(pPicture->m_bytes.GetData(), vJPG.data(), vJPG.size() * sizeof(BYTE));
}
return pPicture;
}
void CSerializeDemoDoc::OnTestdataWriteimagedatatodisk()
{
for (INT_PTR i = 0; i < m_pRoot->m_aPictures.GetSize(); i++)
{
CMyPicture* pPic = m_pRoot->m_aPictures.GetAt(i);
std::ofstream fout(pPic->m_strNewName, std::ios::out | std::ios::binary);
fout.write((char*)pPic->m_bytes.GetData(), pPic->m_bytes.GetSize());
fout.close();
}
AfxMessageBox(_T("Finished writing images back to disk"), MB_ICONINFORMATION);
}
The CArchive class does not provide serialization of the Windows SDK structures. However, it is nearly effortless to add support for it. The following code demonstrates how to serialize the LOGFONT SDK structure.
inline CArchive& AFXAPI operator <<(CArchive& ar, const LOGFONT& lf)
{
CString strFace(lf.lfFaceName);
ar << lf.lfHeight;
ar << lf.lfWidth;
ar << lf.lfEscapement;
ar << lf.lfOrientation;
ar << lf.lfWeight;
ar << lf.lfItalic;
ar << lf.lfUnderline;
ar << lf.lfStrikeOut;
ar << lf.lfCharSet;
ar << lf.lfOutPrecision;
ar << lf.lfClipPrecision;
ar << lf.lfQuality;
ar << lf.lfPitchAndFamily;
ar << strFace;
return ar;
}
inline CArchive& AFXAPI operator >> (CArchive& ar, LOGFONT& lf)
{
CString strFace;
ar >> lf.lfHeight;
ar >> lf.lfWidth;
ar >> lf.lfEscapement;
ar >> lf.lfOrientation;
ar >> lf.lfWeight;
ar >> lf.lfItalic;
ar >> lf.lfUnderline;
ar >> lf.lfStrikeOut;
ar >> lf.lfCharSet;
ar >> lf.lfOutPrecision;
ar >> lf.lfClipPrecision;
ar >> lf.lfQuality;
ar >> lf.lfPitchAndFamily;
ar >> strFace;
_tcscpy_s(lf.lfFaceName, strFace);
return ar;
}
Once you have defined the LOGFONT insertion and extraction operators, all you need is the following code snippet:
void CRoot::Serialize(CArchive& ar)
{
CBase::Serialize(ar);
if (ar.IsStoring())
{
ar << m_lf;
}
else
{
UINT nSchema = ar.GetObjectSchema();
switch (nSchema)
{
case 1:
ar >> m_lf;
break;
}
}
}
The next code snippet serializes the WINDOWPLACEMENT SDK structure:
inline CArchive& AFXAPI operator <<(CArchive& ar, const WINDOWPLACEMENT& val)
{
ar << val.flags;
ar << val.length;
ar << val.ptMaxPosition.x;
ar << val.ptMaxPosition.y;
ar << val.ptMinPosition.x;
ar << val.ptMinPosition.y;
ar << val.rcNormalPosition.bottom;
ar << val.rcNormalPosition.left;
ar << val.rcNormalPosition.right;
ar << val.rcNormalPosition.top;
ar << val.showCmd;
return ar;
}
inline CArchive& AFXAPI operator >> (CArchive& ar, WINDOWPLACEMENT& val)
{
ar >> val.flags;
ar >> val.length;
ar >> val.ptMaxPosition.x;
ar >> val.ptMaxPosition.y;
ar >> val.ptMinPosition.x;
ar >> val.ptMinPosition.y;
ar >> val.rcNormalPosition.bottom;
ar >> val.rcNormalPosition.left;
ar >> val.rcNormalPosition.right;
ar >> val.rcNormalPosition.top;
ar >> val.showCmd;
return ar;
}
Reading and writing the WINDOWPLACEMENT structure then becomes as trivial as this:
void CRoot::Serialize(CArchive& ar)
{
CBase::Serialize(ar);
if (ar.IsStoring())
{
ar << m_wp;
}
else
{
UINT nSchema = ar.GetObjectSchema();
switch (nSchema)
{
case 1:
ar >> m_wp;
break;
}
}
}
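The LOGFONT and WINDOWPLACEMENT operators write each member individually instead of dumping the whole structure with a single memory copy, which keeps compiler padding and the in-memory layout out of the file. The sketch below applies the same field-by-field idea to FontInfo, a hypothetical struct loosely modeled on a few LOGFONT members (it is not the SDK type). MiniArchive is likewise a hypothetical stand-in for CArchive so the code runs without MFC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for MFC's CArchive (NOT the MFC API): values are
// appended to an in-memory byte buffer and read back in the same order.
struct MiniArchive {
    std::vector<unsigned char> buf;  // serialized bytes
    std::size_t pos = 0;             // read cursor
};

// Arithmetic values are copied byte for byte, in native byte order.
template <typename T>
std::enable_if_t<std::is_arithmetic<T>::value, MiniArchive&>
operator<<(MiniArchive& ar, T v) {
    const auto* p = reinterpret_cast<const unsigned char*>(&v);
    ar.buf.insert(ar.buf.end(), p, p + sizeof v);
    return ar;
}
template <typename T>
std::enable_if_t<std::is_arithmetic<T>::value, MiniArchive&>
operator>>(MiniArchive& ar, T& v) {
    if (ar.pos + sizeof v > ar.buf.size()) throw std::runtime_error("read past end");
    std::memcpy(&v, ar.buf.data() + ar.pos, sizeof v);
    ar.pos += sizeof v;
    return ar;
}

// Hypothetical struct loosely modeled on a few LOGFONT members.
struct FontInfo {
    std::int32_t height = 0;  // like lfHeight
    std::int32_t weight = 0;  // like lfWeight
    std::uint8_t italic = 0;  // like lfItalic
    std::string faceName;     // like lfFaceName
};

// Members go out one by one in a fixed order; the face name is length-prefixed.
MiniArchive& operator<<(MiniArchive& ar, const FontInfo& f) {
    ar << f.height << f.weight << f.italic;
    ar << static_cast<std::uint32_t>(f.faceName.size());
    ar.buf.insert(ar.buf.end(), f.faceName.begin(), f.faceName.end());
    return ar;
}

// Members come back in exactly the same order.
MiniArchive& operator>>(MiniArchive& ar, FontInfo& f) {
    std::uint32_t n = 0;
    ar >> f.height >> f.weight >> f.italic >> n;
    if (ar.pos + n > ar.buf.size()) throw std::runtime_error("truncated face name");
    f.faceName.assign(reinterpret_cast<const char*>(ar.buf.data() + ar.pos), n);
    ar.pos += n;
    return ar;
}
```

For a face name of "Arial" the archive holds exactly 18 bytes (4 + 4 + 1 + 4 + 5); whatever padding the compiler places after italic never reaches the file.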
Serialization of STL collections is just as trivial as serialization of the SDK data structures. Let’s define insertion and extraction operators for the popular STL collections. To serialize std::vector<int> we need the following definitions:
inline CArchive& AFXAPI operator <<(CArchive& ar, const std::vector<int>& val)
{
ar << (int)val.size();
for (int k : val)
{
ar << k;
}
return ar;
}
To read the data back into a std::vector<int>, we do the following:
inline CArchive& AFXAPI operator >> (CArchive& ar, std::vector<int>& val)
{
int nSize;
ar >> nSize;
val.resize(nSize);
for (size_t i = 0; i < (size_t)nSize; i++)
{
ar >> val[i];
}
return ar;
}
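The count written first is what lets the extraction operator size the vector before reading the elements. Below is a standalone sketch of the same round trip, using MiniArchive, a hypothetical stand-in for CArchive, so it runs without MFC. The count is written as std::int32_t so its size does not depend on the platform, and the reader rejects a negative count, a defensive check added for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for MFC's CArchive (NOT the MFC API): values are
// appended to an in-memory byte buffer and read back in the same order.
struct MiniArchive {
    std::vector<unsigned char> buf;  // serialized bytes
    std::size_t pos = 0;             // read cursor
};

// Arithmetic values are copied byte for byte, in native byte order.
template <typename T>
std::enable_if_t<std::is_arithmetic<T>::value, MiniArchive&>
operator<<(MiniArchive& ar, T v) {
    const auto* p = reinterpret_cast<const unsigned char*>(&v);
    ar.buf.insert(ar.buf.end(), p, p + sizeof v);
    return ar;
}
template <typename T>
std::enable_if_t<std::is_arithmetic<T>::value, MiniArchive&>
operator>>(MiniArchive& ar, T& v) {
    if (ar.pos + sizeof v > ar.buf.size()) throw std::runtime_error("read past end");
    std::memcpy(&v, ar.buf.data() + ar.pos, sizeof v);
    ar.pos += sizeof v;
    return ar;
}

// Element count first, then each element.
MiniArchive& operator<<(MiniArchive& ar, const std::vector<int>& val) {
    ar << static_cast<std::int32_t>(val.size());
    for (int k : val) ar << k;
    return ar;
}

// Resize to the stored count, then overwrite every element from the archive.
MiniArchive& operator>>(MiniArchive& ar, std::vector<int>& val) {
    std::int32_t n = 0;
    ar >> n;
    if (n < 0) throw std::runtime_error("negative element count");
    val.resize(static_cast<std::size_t>(n));
    for (int& k : val) ar >> k;
    return ar;
}
```

A vector that already holds more elements than the stored count ends up with exactly the stored elements, because resize() drops the extras before the reads.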
Serialization of the std::map<char, int> collection: first we store the size of the map. Because the underlying element of std::map<char, int> is a std::pair<const char, int>, we then store the first and second members of each pair.
inline CArchive& AFXAPI operator <<(CArchive& ar, const std::map<char, int>& val)
{
ar << (int)val.size();
for (const auto& k : val)
{
ar << k.first;
ar << k.second;
}
return ar;
}
The reading code for std::map<char, int> is as follows:
inline CArchive& AFXAPI operator >> (CArchive& ar, std::map<char, int>& val)
{
int nSize;
ar >> nSize;
val.clear(); // insert() never overwrites existing keys, so discard stale entries first
for (size_t i = 0; i < (size_t)nSize; i++)
{
std::pair<char, int> k;
ar >> k.first;
ar >> k.second;
val.insert(k);
}
return ar;
}
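A std::map is written the same way: the count, then each key/value pair. In the standalone sketch below, MiniArchive is a hypothetical stand-in for CArchive. The reader clears the map before inserting, because std::map insertion never overwrites a key that is already present, so stale entries would otherwise survive a reload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for MFC's CArchive (NOT the MFC API): values are
// appended to an in-memory byte buffer and read back in the same order.
struct MiniArchive {
    std::vector<unsigned char> buf;  // serialized bytes
    std::size_t pos = 0;             // read cursor
};

// Arithmetic values are copied byte for byte, in native byte order.
template <typename T>
std::enable_if_t<std::is_arithmetic<T>::value, MiniArchive&>
operator<<(MiniArchive& ar, T v) {
    const auto* p = reinterpret_cast<const unsigned char*>(&v);
    ar.buf.insert(ar.buf.end(), p, p + sizeof v);
    return ar;
}
template <typename T>
std::enable_if_t<std::is_arithmetic<T>::value, MiniArchive&>
operator>>(MiniArchive& ar, T& v) {
    if (ar.pos + sizeof v > ar.buf.size()) throw std::runtime_error("read past end");
    std::memcpy(&v, ar.buf.data() + ar.pos, sizeof v);
    ar.pos += sizeof v;
    return ar;
}

MiniArchive& operator<<(MiniArchive& ar, const std::map<char, int>& val) {
    ar << static_cast<std::int32_t>(val.size());
    for (const auto& kv : val) ar << kv.first << kv.second;  // key, then value
    return ar;
}

MiniArchive& operator>>(MiniArchive& ar, std::map<char, int>& val) {
    std::int32_t n = 0;
    ar >> n;
    val.clear();  // start empty: existing keys would not be overwritten
    for (std::int32_t i = 0; i < n; ++i) {
        char key = 0;
        int value = 0;
        ar >> key >> value;
        val.emplace(key, value);
    }
    return ar;
}
```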
Serialization of the fixed-size STL std::array<int, 3>:
inline CArchive& AFXAPI operator <<(CArchive& ar, const std::array<int, 3>& val)
{
for (int k : val)
{
ar << k;
}
return ar;
}
The std::array<int, 3> reading operator:
inline CArchive& AFXAPI operator >> (CArchive& ar, std::array<int, 3>& val)
{
for (size_t i = 0; i < (size_t)val.size(); i++)
{
ar >> val[i];
}
return ar;
}
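The std::array operators above are hard-wired to three ints. Because the size is part of the type, a template over the element type T and the size N covers every std::array whose elements are themselves serializable, and still writes no count. This sketch uses MiniArchive, a hypothetical stand-in for CArchive, so it runs without MFC; the same template shape should carry over to CArchive, assuming the element type has CArchive operators.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for MFC's CArchive (NOT the MFC API): values are
// appended to an in-memory byte buffer and read back in the same order.
struct MiniArchive {
    std::vector<unsigned char> buf;  // serialized bytes
    std::size_t pos = 0;             // read cursor
};

// Arithmetic values are copied byte for byte, in native byte order.
template <typename T>
std::enable_if_t<std::is_arithmetic<T>::value, MiniArchive&>
operator<<(MiniArchive& ar, T v) {
    const auto* p = reinterpret_cast<const unsigned char*>(&v);
    ar.buf.insert(ar.buf.end(), p, p + sizeof v);
    return ar;
}
template <typename T>
std::enable_if_t<std::is_arithmetic<T>::value, MiniArchive&>
operator>>(MiniArchive& ar, T& v) {
    if (ar.pos + sizeof v > ar.buf.size()) throw std::runtime_error("read past end");
    std::memcpy(&v, ar.buf.data() + ar.pos, sizeof v);
    ar.pos += sizeof v;
    return ar;
}

// N is part of the type, so no element count is written.
template <typename T, std::size_t N>
MiniArchive& operator<<(MiniArchive& ar, const std::array<T, N>& val) {
    for (const T& k : val) ar << k;
    return ar;
}

template <typename T, std::size_t N>
MiniArchive& operator>>(MiniArchive& ar, std::array<T, N>& val) {
    for (T& k : val) ar >> k;
    return ar;
}
```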
Serialization of the std::set<std::string> collection:
inline CArchive& AFXAPI operator <<(CArchive& ar, const std::set<std::string>& val)
{
ar << (int)val.size();
for (const std::string& k : val)
{
ar << CStringA(k.c_str());
}
return ar;
}
The reading code for the std::set<std::string> collection:
inline CArchive& AFXAPI operator >> (CArchive& ar, std::set<std::string>& val)
{
int nSize;
ar >> nSize;
val.clear(); // discard any strings left over from a previous load
for (size_t i = 0; i < (size_t)nSize; i++)
{
CStringA str;
ar >> str;
val.insert(std::string(str));
}
return ar;
}
Serialization of STL types such as std::string is just as trivial as serialization of the SDK data structures. As before, we first need insertion and extraction operator definitions. To serialize or deserialize std::string, add the following operators:
inline CArchive& AFXAPI operator <<(CArchive& ar, const std::string& val)
{
ar << CStringA(val.c_str());
return ar;
}
Deserializing std::string:
inline CArchive& AFXAPI operator >> (CArchive& ar, std::string& val)
{
CStringA str;
ar >> str;
val = str;
return ar;
}
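Going through CStringA and c_str() means a std::string is cut at its first embedded '\0'. A minimal alternative, sketched below, writes the length explicitly and then the raw characters, and builds the std::set<std::string> operators on top of the string ones. MiniArchive is a hypothetical stand-in for CArchive so the code runs without MFC; its byte layout is not what CArchive produces for a CStringA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <set>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for MFC's CArchive (NOT the MFC API): values are
// appended to an in-memory byte buffer and read back in the same order.
struct MiniArchive {
    std::vector<unsigned char> buf;  // serialized bytes
    std::size_t pos = 0;             // read cursor
};

// Arithmetic values are copied byte for byte, in native byte order.
template <typename T>
std::enable_if_t<std::is_arithmetic<T>::value, MiniArchive&>
operator<<(MiniArchive& ar, T v) {
    const auto* p = reinterpret_cast<const unsigned char*>(&v);
    ar.buf.insert(ar.buf.end(), p, p + sizeof v);
    return ar;
}
template <typename T>
std::enable_if_t<std::is_arithmetic<T>::value, MiniArchive&>
operator>>(MiniArchive& ar, T& v) {
    if (ar.pos + sizeof v > ar.buf.size()) throw std::runtime_error("read past end");
    std::memcpy(&v, ar.buf.data() + ar.pos, sizeof v);
    ar.pos += sizeof v;
    return ar;
}

// std::string: 32-bit length, then the raw characters (embedded '\0' survives).
MiniArchive& operator<<(MiniArchive& ar, const std::string& s) {
    ar << static_cast<std::uint32_t>(s.size());
    ar.buf.insert(ar.buf.end(), s.begin(), s.end());
    return ar;
}
MiniArchive& operator>>(MiniArchive& ar, std::string& s) {
    std::uint32_t n = 0;
    ar >> n;
    if (ar.pos + n > ar.buf.size()) throw std::runtime_error("truncated string");
    s.assign(reinterpret_cast<const char*>(ar.buf.data() + ar.pos), n);
    ar.pos += n;
    return ar;
}

// The set reuses the string operators for every element.
MiniArchive& operator<<(MiniArchive& ar, const std::set<std::string>& val) {
    ar << static_cast<std::int32_t>(val.size());
    for (const std::string& k : val) ar << k;
    return ar;
}
MiniArchive& operator>>(MiniArchive& ar, std::set<std::string>& val) {
    std::int32_t n = 0;
    ar >> n;
    val.clear();
    for (std::int32_t i = 0; i < n; ++i) {
        std::string s;
        ar >> s;
        val.insert(std::move(s));
    }
    return ar;
}
```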
I will stop here with the implementation of STL data and container serialization. Once you have seen one STL collection and one STL type serialized, you have seen them all. I will leave serializing std::pair, std::tuple, std::unordered_map, etc. to the reader as an exercise.
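For a head start on that exercise, here is one possible sketch for std::pair and std::unordered_map, written as templates so they compose with whatever element operators already exist. It uses MiniArchive, a hypothetical stand-in for CArchive, so it runs without MFC; with CArchive the same templates would rely on the insertion and extraction operators available for the key and value types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for MFC's CArchive (NOT the MFC API): values are
// appended to an in-memory byte buffer and read back in the same order.
struct MiniArchive {
    std::vector<unsigned char> buf;  // serialized bytes
    std::size_t pos = 0;             // read cursor
};

// Arithmetic values are copied byte for byte, in native byte order.
template <typename T>
std::enable_if_t<std::is_arithmetic<T>::value, MiniArchive&>
operator<<(MiniArchive& ar, T v) {
    const auto* p = reinterpret_cast<const unsigned char*>(&v);
    ar.buf.insert(ar.buf.end(), p, p + sizeof v);
    return ar;
}
template <typename T>
std::enable_if_t<std::is_arithmetic<T>::value, MiniArchive&>
operator>>(MiniArchive& ar, T& v) {
    if (ar.pos + sizeof v > ar.buf.size()) throw std::runtime_error("read past end");
    std::memcpy(&v, ar.buf.data() + ar.pos, sizeof v);
    ar.pos += sizeof v;
    return ar;
}

// A pair is simply its two members in order.
template <typename A, typename B>
MiniArchive& operator<<(MiniArchive& ar, const std::pair<A, B>& p) {
    ar << p.first;
    return ar << p.second;
}
template <typename A, typename B>
MiniArchive& operator>>(MiniArchive& ar, std::pair<A, B>& p) {
    ar >> p.first;
    return ar >> p.second;
}

// Count first, then each entry; iteration order does not matter for reading.
template <typename K, typename V>
MiniArchive& operator<<(MiniArchive& ar, const std::unordered_map<K, V>& m) {
    ar << static_cast<std::int32_t>(m.size());
    for (const auto& kv : m) {
        ar << kv.first;
        ar << kv.second;
    }
    return ar;
}
template <typename K, typename V>
MiniArchive& operator>>(MiniArchive& ar, std::unordered_map<K, V>& m) {
    std::int32_t n = 0;
    ar >> n;
    m.clear();
    for (std::int32_t i = 0; i < n; ++i) {
        std::pair<K, V> kv;  // non-const key, unlike the map's value_type
        ar >> kv;
        m.insert(std::move(kv));
    }
    return ar;
}
```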
To serialize flat C arrays, follow the same procedure as with collections. But because a flat C-style array has a known size, there is no need to store its size in the file.
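// Take the array by reference: a parameter written as float val[3] is
// adjusted by the compiler to float*, which would accept any float pointer.
inline CArchive& AFXAPI operator <<(CArchive& ar, const float (&val)[3])
{
for (size_t i = 0; i < 3; i++)
{
ar << val[i];
}
return ar;
}
Reading a flat C-style array:
inline CArchive& AFXAPI operator >> (CArchive& ar, float (&val)[3])
{
for (size_t i = 0; i < 3; i++)
{
ar >> val[i];
}
return ar;
}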
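The three-element operators can be generalized into a template over the element type T and the length N. Taking the array by reference keeps N in the type, so the compiler deduces it; a parameter spelled T val[N] would decay to a plain pointer instead. This sketch uses MiniArchive, a hypothetical stand-in for CArchive, so it runs without MFC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for MFC's CArchive (NOT the MFC API): values are
// appended to an in-memory byte buffer and read back in the same order.
struct MiniArchive {
    std::vector<unsigned char> buf;  // serialized bytes
    std::size_t pos = 0;             // read cursor
};

// Arithmetic values are copied byte for byte, in native byte order.
template <typename T>
std::enable_if_t<std::is_arithmetic<T>::value, MiniArchive&>
operator<<(MiniArchive& ar, T v) {
    const auto* p = reinterpret_cast<const unsigned char*>(&v);
    ar.buf.insert(ar.buf.end(), p, p + sizeof v);
    return ar;
}
template <typename T>
std::enable_if_t<std::is_arithmetic<T>::value, MiniArchive&>
operator>>(MiniArchive& ar, T& v) {
    if (ar.pos + sizeof v > ar.buf.size()) throw std::runtime_error("read past end");
    std::memcpy(&v, ar.buf.data() + ar.pos, sizeof v);
    ar.pos += sizeof v;
    return ar;
}

// The array is taken by reference, so N is deduced and no count is written.
template <typename T, std::size_t N>
MiniArchive& operator<<(MiniArchive& ar, const T (&val)[N]) {
    for (std::size_t i = 0; i < N; ++i) ar << val[i];
    return ar;
}
template <typename T, std::size_t N>
MiniArchive& operator>>(MiniArchive& ar, T (&val)[N]) {
    for (std::size_t i = 0; i < N; ++i) ar >> val[i];
    return ar;
}
```

Nested arrays such as int grid[2][2] work too, because each row is itself an array and the template handles it recursively.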
To serialize an enumeration you strictly need only an extraction operator, because on insertion an enumeration is implicitly converted into an int. But providing both insertion and extraction operators for an enumeration results in a much cleaner solution and potentially eliminates nasty surprises in the future.
enum EMyTestEnum
{
ENUM_0,
ENUM_1,
};
Writing an enumeration:
inline CArchive& AFXAPI operator <<(CArchive& ar, const EMyTestEnum& val)
{
int iTemp = val;
ar << iTemp;
return ar;
}
Reading an enumeration:
inline CArchive& AFXAPI operator >> (CArchive& ar, EMyTestEnum& val)
{
int iTmp = 0;
ar >> iTmp;
val = (EMyTestEnum)iTmp;
return ar;
}
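Here is a standalone sketch of the same two operators, with MiniArchive as a hypothetical stand-in for CArchive so it runs without MFC. It stores the enumeration as a fixed 32-bit integer, whatever underlying size the compiler picks. As an extra safeguard added for this sketch, the reader rejects values that are not valid enumerators instead of casting them blindly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for MFC's CArchive (NOT the MFC API): values are
// appended to an in-memory byte buffer and read back in the same order.
struct MiniArchive {
    std::vector<unsigned char> buf;  // serialized bytes
    std::size_t pos = 0;             // read cursor
};

// Arithmetic values are copied byte for byte, in native byte order.
template <typename T>
std::enable_if_t<std::is_arithmetic<T>::value, MiniArchive&>
operator<<(MiniArchive& ar, T v) {
    const auto* p = reinterpret_cast<const unsigned char*>(&v);
    ar.buf.insert(ar.buf.end(), p, p + sizeof v);
    return ar;
}
template <typename T>
std::enable_if_t<std::is_arithmetic<T>::value, MiniArchive&>
operator>>(MiniArchive& ar, T& v) {
    if (ar.pos + sizeof v > ar.buf.size()) throw std::runtime_error("read past end");
    std::memcpy(&v, ar.buf.data() + ar.pos, sizeof v);
    ar.pos += sizeof v;
    return ar;
}

enum EMyTestEnum { ENUM_0, ENUM_1 };

// Always written as a fixed 32-bit integer.
MiniArchive& operator<<(MiniArchive& ar, EMyTestEnum val) {
    return ar << static_cast<std::int32_t>(val);
}

// Read the integer, check it, then convert back to the enumeration.
MiniArchive& operator>>(MiniArchive& ar, EMyTestEnum& val) {
    std::int32_t tmp = 0;
    ar >> tmp;
    if (tmp < ENUM_0 || tmp > ENUM_1) throw std::runtime_error("unknown enumerator");
    val = static_cast<EMyTestEnum>(tmp);
    return ar;
}
```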
Versioning of CObject-derived classes is a rather interesting topic, and it can be done in two ways. Let’s assume we have a class whose layout keeps evolving as new features are implemented in the core application.
class CMyObject : public CObject
{
DECLARE_SERIAL(CMyObject)
public:
CMyObject();
virtual ~CMyObject();
virtual void Serialize(CArchive& ar);
float m_f;
double m_d;
COLORREF m_backColor;
COLORREF m_foreColor;
CString m_strDescription;
CString m_strNotes;
};
To serialize such an object while still being able to read older version 1, 2, and 3 files, we can implement Serialize as follows:
IMPLEMENT_SERIAL(CMyObject, CObject, VERSIONABLE_SCHEMA | 4)
void CMyObject::Serialize(CArchive& ar)
{
if (ar.IsStoring())
{
ar << m_f;
ar << m_d;
ar << m_backColor;
ar << m_foreColor;
ar << m_strDescription;
ar << m_strNotes;
}
else
{
UINT nSchema = ar.GetObjectSchema();
switch (nSchema)
{
case 1:
ar >> m_f;
ar >> m_d;
break;
case 2:
ar >> m_f;
ar >> m_d;
ar >> m_backColor;
ar >> m_foreColor;
break;
case 3:
ar >> m_f;
ar >> m_d;
ar >> m_backColor;
ar >> m_foreColor;
ar >> m_strDescription;
break;
case 4:
ar >> m_f;
ar >> m_d;
ar >> m_backColor;
ar >> m_foreColor;
ar >> m_strDescription;
ar >> m_strNotes;
break;
}
}
}
This approach, although crystal clear, is tedious at best, with a lot of repetitive code. Another approach is to store the members in reverse order, newest first, and let the switch statement fall through from the file's version down to version 1.
IMPLEMENT_SERIAL(CMyObject, CObject, VERSIONABLE_SCHEMA | 4)
void CMyObject::Serialize(CArchive& ar)
{
if (ar.IsStoring())
{
ar << m_strNotes;
ar << m_strDescription;
ar << m_backColor;
ar << m_foreColor;
ar << m_f;
ar << m_d;
}
else
{
UINT nSchema = ar.GetObjectSchema();
switch (nSchema)
{
case 4:
ar >> m_strNotes;
// fall through
case 3:
ar >> m_strDescription;
// fall through
case 2:
ar >> m_backColor;
ar >> m_foreColor;
// fall through
case 1:
ar >> m_f;
ar >> m_d;
break;
}
}
}
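This is a much cleaner versioning solution that eliminates all of the repetitive code. Note that the two approaches write the members in different orders, so choose one from the very first version: files written with one layout cannot be read with the other.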
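The fall-through technique can be tried outside MFC with the sketch below. MyObject mirrors CMyObject's members, and MiniArchive is a hypothetical stand-in for CArchive. Because there is no IMPLEMENT_SERIAL here, the schema number is written explicitly as the first value, whereas MFC records it for you and hands it back through GetObjectSchema. The test data imitates a file written by a hypothetical version 2 of the program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for MFC's CArchive (NOT the MFC API).
struct MiniArchive {
    std::vector<unsigned char> buf;  // serialized bytes
    std::size_t pos = 0;             // read cursor
};
template <typename T>  // arithmetic values: raw bytes, native byte order
std::enable_if_t<std::is_arithmetic<T>::value, MiniArchive&>
operator<<(MiniArchive& ar, T v) {
    const auto* p = reinterpret_cast<const unsigned char*>(&v);
    ar.buf.insert(ar.buf.end(), p, p + sizeof v);
    return ar;
}
template <typename T>
std::enable_if_t<std::is_arithmetic<T>::value, MiniArchive&>
operator>>(MiniArchive& ar, T& v) {
    if (ar.pos + sizeof v > ar.buf.size()) throw std::runtime_error("read past end");
    std::memcpy(&v, ar.buf.data() + ar.pos, sizeof v);
    ar.pos += sizeof v;
    return ar;
}
// Length-prefixed strings stand in for the CString members.
MiniArchive& operator<<(MiniArchive& ar, const std::string& s) {
    ar << static_cast<std::uint32_t>(s.size());
    ar.buf.insert(ar.buf.end(), s.begin(), s.end());
    return ar;
}
MiniArchive& operator>>(MiniArchive& ar, std::string& s) {
    std::uint32_t n = 0;
    ar >> n;
    if (ar.pos + n > ar.buf.size()) throw std::runtime_error("truncated string");
    s.assign(reinterpret_cast<const char*>(ar.buf.data() + ar.pos), n);
    ar.pos += n;
    return ar;
}

constexpr std::uint32_t kCurrentSchema = 4;

struct MyObject {  // mirrors CMyObject's members
    float f = 0;
    double d = 0;
    std::uint32_t backColor = 0;  // COLORREF is a 32-bit value
    std::uint32_t foreColor = 0;
    std::string description;
    std::string notes;
};

// Newest members first, so every older layout is a suffix of the newest one.
void Store(MiniArchive& ar, const MyObject& o) {
    ar << kCurrentSchema;
    ar << o.notes << o.description << o.backColor << o.foreColor << o.f << o.d;
}

void Load(MiniArchive& ar, MyObject& o) {
    std::uint32_t schema = 0;
    ar >> schema;
    switch (schema) {
    case 4:
        ar >> o.notes;
        // fall through
    case 3:
        ar >> o.description;
        // fall through
    case 2:
        ar >> o.backColor >> o.foreColor;
        // fall through
    case 1:
        ar >> o.f >> o.d;
        break;
    default:  // a newer (or corrupt) file: refuse rather than misread
        throw std::runtime_error("unsupported schema");
    }
}
```

Members that an old file does not contain simply keep their default values.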
To serialize a class that is not derived from CObject, we simply follow the same approach as with the Windows SDK structures.
class CMyObject
{
public:
CMyObject();
virtual ~CMyObject();
static const short VERSION = 1;
float m_f;
double m_d;
};
Write the version number as the very first member. Then, when reading, you can dispatch on the version stored in the file to the read procedure that corresponds to that version.
inline CArchive& AFXAPI operator <<(CArchive& ar, const CMyObject & val)
{
ar << val.VERSION;
ar << val.m_f;
ar << val.m_d;
return ar;
}
inline CArchive& AFXAPI operator >> (CArchive& ar, CMyObject & val)
{
short nVersion = 0;
ar >> nVersion;
switch(nVersion)
{
case 1:
ar >> val.m_f;
ar >> val.m_d;
break;
}
return ar;
}
Never serialize pointer-sized WIN32/WIN64 typedefs! If you upgrade your application to 64 bit and try to read a file created by the 32 bit version of the application that serialized such typedefs (for example DWORD_PTR), it will fail miserably. DWORD_PTR is 4 bytes long on 32 bit Windows and 8 bytes long on WIN64, so reading 4 bytes into 8 bytes, or vice versa, misaligns every read that follows. That typically ends in garbage data or a CArchiveException, and it makes your file useless to the other bitness of your application. Serialize only types with a known, fixed size. If you must use a 64 bit integer, serialize it explicitly as __int64 in both the 32 and 64 bit versions of your application. This is especially important if you are serializing SDK structures: carefully examine the structure declaration, and if it contains WIN32/64 typedefs, explicitly cast them to the largest size when you build the 32 bit application if you plan to upgrade it to 64 bit in the future.
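A minimal sketch of that advice: a pointer-sized value is widened to a fixed 64-bit integer before it is written, so 32 bit and 64 bit builds produce and expect the same 8 bytes. Here std::uintptr_t stands in for DWORD_PTR, and WritePointerSized, ReadPointerSized and MiniArchive are hypothetical names (MiniArchive replaces CArchive so the code runs without MFC). With CArchive you would write a ULONGLONG, for which CArchive provides insertion and extraction operators.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for MFC's CArchive (NOT the MFC API): values are
// appended to an in-memory byte buffer and read back in the same order.
struct MiniArchive {
    std::vector<unsigned char> buf;  // serialized bytes
    std::size_t pos = 0;             // read cursor
};

// Arithmetic values are copied byte for byte, in native byte order.
template <typename T>
std::enable_if_t<std::is_arithmetic<T>::value, MiniArchive&>
operator<<(MiniArchive& ar, T v) {
    const auto* p = reinterpret_cast<const unsigned char*>(&v);
    ar.buf.insert(ar.buf.end(), p, p + sizeof v);
    return ar;
}
template <typename T>
std::enable_if_t<std::is_arithmetic<T>::value, MiniArchive&>
operator>>(MiniArchive& ar, T& v) {
    if (ar.pos + sizeof v > ar.buf.size()) throw std::runtime_error("read past end");
    std::memcpy(&v, ar.buf.data() + ar.pos, sizeof v);
    ar.pos += sizeof v;
    return ar;
}

// Write: widen to 64 bits so the stored size never depends on the build.
MiniArchive& WritePointerSized(MiniArchive& ar, std::uintptr_t v) {
    return ar << static_cast<std::uint64_t>(v);
}

// Read: always 8 bytes; refuse values this build cannot represent.
MiniArchive& ReadPointerSized(MiniArchive& ar, std::uintptr_t& v) {
    std::uint64_t wide = 0;
    ar >> wide;
    if (wide > std::numeric_limits<std::uintptr_t>::max())
        throw std::runtime_error("value does not fit in this build");
    v = static_cast<std::uintptr_t>(wide);
    return ar;
}
```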
Stick to either UNICODE or ANSI, period. If for whatever reason you must maintain both ANSI and UNICODE versions of your application, then serialize exclusively CStringA or exclusively CStringW so that the other version can read the file. Suffice it to say that the characters of a string such as "hello" take 5 bytes in an ANSI string but 10 bytes in a UNICODE string.
Link to MFC statically to eliminate the runtime dependency on MFCXX.DLL, and do the same for any other 3rd party library, for that matter. Hypothetically, if sizeof(WhateverClass) changes in a newer version of a 3rd party DLL, and your application links to that DLL dynamically and serializes the class, your application will fail to read its files. Better safe than sorry: if you are not in control of the 3rd party library code, link to it statically. A little planning ahead goes a long way.
I have supplied the SerializeDemo solution project that demonstrates all aspects described in this article. This solution contains 4 subprojects:
SerializeData – houses data structures and operators that are used by all projects
SerializeDemo – MFC Document/View application
SerializeTcpServer – a console server application running on localhost "127.0.0.1", port 1011. You may need to change the port number if port 1011 is already occupied on your machine. The SerializeDemo application can connect to this server to transmit serialized data.
SerializationWithoutDocView – console application that demonstrates usage of CArchive without the Document/View architecture
SerializeDemo application in action.
Try playing with this application using the following menu commands.
Try using Edit > Copy, then Edit > Paste in a new instance of the SerializeDemo application.
Serialize TCP Server in action.
The SerializeDemo app sends serialized data to the server, and the server prints the received binary data.
March 16th 2017. Original article.
Sep 28 2018. Fixed a few typos.
Jan 9 2019. Added table of contents since the article is quite long.