Introduction
Data marshalling is the process of converting data objects into a data stream that matches the packet structure of the network transfer protocol. In other words, it represents data objects in a standard format that network protocols can send and receive, and that the other side can translate back into objects.
Many approaches to data marshalling exist, but they share one idea: each data object is preceded by its type. They differ in the format used to represent it:
- XML: uses XML tags and attributes to describe the data and XML text to hold the values. The XML is sent as text (an array of characters) and parsed on the other side to reconstruct the data objects.
- Binary: uses binary header before each data object to identify its type and length if needed. (Figure 1)
- Text: uses only text to represent data, for example (string:plapla,int:3232,short:43,...), with simple parsing on the other side.
The first and third approaches are expressive and human readable, but they are slow for applications that need performance, so I followed the second approach in my class to gain speed. In this article, I will try to simplify the idea by introducing a simple and fast marshal class that can collect data of many types and send it to another marshal object across a socket connection.
Binary Marshalling:
Binary marshalling means putting data objects in a binary format, with each data object preceded by its type, as in Figure 1:
73: 's' character means the current element is a string
0d 00: 2 bytes to keep string length
...: string ASCII bytes
69: 'i' character means the current element is a short
and so on: 'type', 'value', ...
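The layout listed above can be sketched in a few lines of portable C++. This is only an illustration, not the code of the CMarshal class: the helper names `PushStringSketch` and `PushShortSketch` are hypothetical, and the little-endian byte order is an assumption taken from the `0d 00` length bytes (13 stored low byte first).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helpers; the real CMarshal interface may differ.
using Buffer = std::vector<std::uint8_t>;

// Append a string element: type byte 's', a 2-byte length
// (assumed little-endian), then the string bytes.
void PushStringSketch(Buffer& buf, const std::string& s)
{
    assert(s.size() <= 0xFFFF);  // the length must fit in 2 bytes
    buf.push_back('s');
    buf.push_back(static_cast<std::uint8_t>(s.size() & 0xFF));
    buf.push_back(static_cast<std::uint8_t>((s.size() >> 8) & 0xFF));
    buf.insert(buf.end(), s.begin(), s.end());
}

// Append a short element: type byte 'i' (as in the listing above),
// then the 2 value bytes (assumed little-endian).
void PushShortSketch(Buffer& buf, std::int16_t v)
{
    const std::uint16_t u = static_cast<std::uint16_t>(v);
    buf.push_back('i');
    buf.push_back(static_cast<std::uint8_t>(u & 0xFF));
    buf.push_back(static_cast<std::uint8_t>(u >> 8));
}
```

Pushing a 13-character string with this sketch yields `73 0d 00` followed by the 13 character bytes, which matches the start of the listing.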
The advantage here is that the type is always stored in one byte, each fixed-size type is stored at its full size, and for a variable-length type such as string, the length is kept in 2 bytes. Parsing is therefore fast: just direct access. Some special cases need extra handling, however:
- Marshalling objects: To marshal objects such as classes and structures, the class must inherit from a simple class, CMarshalObject, which has two functions for serializing and deserializing the object's data. The class you want to marshal must implement these two functions, because the CMarshal class calls them internally during marshalling and unmarshalling. In the marshal buffer, the type of an object is the character 'o'.
- Marshalling vectors: To marshal a vector of any type, the element type is simply preceded by the character 'v', meaning vector. Marshalling an array of characters therefore produces a buffer like this:
Text: vcHatem Mostafa
Binary: 76 63 48 61 74 65 6d 20 4d 6f 73 74 61 66 61
- Marshalling vectors of objects: You can marshal a vector of objects by preceding the object type 'o' with the character 'v', as in the previous point.
Remember, you don't have to do any of this yourself; the class comes with helper functions that do it all.
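To make the vector case concrete, here is a minimal sketch that reproduces the 'v' + 'c' example byte for byte. The function names are hypothetical, and the sketch copies only what the example shows; how the real class records the number of elements is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: 'v' marker, element type 'c', then the characters.
std::string PushCharVectorSketch(const std::vector<char>& v)
{
    std::string buf;
    buf += 'v';
    buf += 'c';
    buf.append(v.begin(), v.end());
    return buf;
}

// Render a buffer as space-separated lowercase hex bytes.
std::string ToHex(const std::string& buf)
{
    std::string out;
    char tmp[4];
    for (std::size_t i = 0; i < buf.size(); ++i) {
        std::snprintf(tmp, sizeof tmp, "%02x",
                      static_cast<unsigned>(static_cast<unsigned char>(buf[i])));
        if (i != 0)
            out += ' ';
        out += tmp;
    }
    return out;
}
```

Calling `ToHex(PushCharVectorSketch(v))` on the characters of "Hatem Mostafa" gives the same hex string as the Binary line of the example.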
Class functions
High level functions:
- Marshal: Marshals any number of data items in a single call, using a variable argument list.
- Unmarshal: Unmarshals any number of data items in a single call, using a variable argument list.
- Send: Sends marshalled data through the connected socket.
- Recv: Receives marshalled data through the connected socket.
bool Marshal(LPCSTR lpcsFormat, ...);
bool Unmarshal(LPCSTR lpcsFormat, ...);
Ex:
Client side:
char c;
int n;
vector<string> vs;
...
CMarshal obj;
obj.Marshal("%c%vs%d", c, &vs, n);
obj.Send(socket);
Server side:
CMarshal obj;
obj.Recv(socket);
obj.Unmarshal("%c%vs%d", &c, &vs, &n); // output arguments are passed by address
Usage is simple: <marshal, send> on the client side, and <receive, unmarshal> on the server side.
Note: Either side can send and receive.
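How a format string can drive a variable argument list is easier to see in a stripped-down sketch. The two functions below are hypothetical and handle only `%c` and `%d`; the type tags 'c' and 'd' and the native-byte-order `int` are assumptions made for illustration, not the tags used by CMarshal. Note that the unmarshal side has to receive pointers, because a variadic function cannot write back into arguments passed by value.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical sketch: append each argument as <type tag><raw bytes>.
bool MarshalSketch(std::string& buf, const char* fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    bool ok = true;
    for (const char* p = fmt; ok && *p; p += 2) {
        if (p[0] != '%') { ok = false; break; }
        if (p[1] == 'c') {
            // char is promoted to int when passed through "..."
            buf += 'c';
            buf += static_cast<char>(va_arg(args, int));
        } else if (p[1] == 'd') {
            const int n = va_arg(args, int);
            buf += 'd';
            buf.append(reinterpret_cast<const char*>(&n), sizeof n);
        } else {
            ok = false;  // unsupported specifier
        }
    }
    va_end(args);
    return ok;
}

// Hypothetical sketch: read values back into the pointed-to variables,
// checking that each stored type tag matches the format specifier.
bool UnmarshalSketch(const std::string& buf, const char* fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    std::size_t pos = 0;
    bool ok = true;
    for (const char* p = fmt; ok && *p; p += 2) {
        ok = (p[0] == '%') && pos < buf.size() && buf[pos] == p[1];
        if (!ok) break;
        ++pos;  // skip the type tag
        if (p[1] == 'c' && pos + 1 <= buf.size()) {
            *va_arg(args, char*) = buf[pos];
            pos += 1;
        } else if (p[1] == 'd' && pos + sizeof(int) <= buf.size()) {
            std::memcpy(va_arg(args, int*), buf.data() + pos, sizeof(int));
            pos += sizeof(int);
        } else {
            ok = false;  // unsupported specifier or truncated buffer
        }
    }
    va_end(args);
    return ok;
}
```

A round trip then looks like `MarshalSketch(buf, "%c%d", 'x', 1234)` on one side and `UnmarshalSketch(buf, "%c%d", &c, &n)` on the other; a format string whose order does not match the buffer makes the second call return false.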
Low level functions:
- PopType: Pop the type at the current index in the marshalled buffer.
- Pop: Pop the data at the current index in the marshalled buffer.
- PopObject: Pop the object at the current index in the marshalled buffer.
- PopVector: Pop the vector at the current index in the marshalled buffer.
- PopObjectVector: Pop the object vector at the current index in the marshalled buffer.
- Push: Push data at the index of the marshal buffer.
- PushVector: Push a vector at the index of the marshal buffer.
- PushObjectVector: Push an object vector at the index of the marshal buffer.
All of these functions work directly on the internal buffer of the marshal object, either to build the buffer as in Figure 1 or to parse it and fill data objects during unmarshalling.
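The pop side can be pictured as a cursor that walks the buffer: read one type byte, then read the value that the type implies. The class below is a hypothetical sketch of that idea, not the CMarshal implementation; it assumes the 's' and 'i' elements described earlier with little-endian 2-byte fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical reader that keeps a current index into a marshalled buffer.
class ReaderSketch {
public:
    explicit ReaderSketch(const std::vector<std::uint8_t>& buf)
        : m_buf(buf), m_index(0) {}

    // Return the type byte at the current index and advance; 0 when exhausted.
    char PopType()
    {
        return m_index < m_buf.size() ? static_cast<char>(m_buf[m_index++]) : 0;
    }

    // Read a 2-byte value (assumed little-endian) at the current index.
    bool PopShort(std::int16_t& out)
    {
        if (m_index + 2 > m_buf.size()) return false;
        const std::uint16_t u = static_cast<std::uint16_t>(
            m_buf[m_index] | (m_buf[m_index + 1] << 8));
        out = static_cast<std::int16_t>(u);
        m_index += 2;
        return true;
    }

    // Read a 2-byte length followed by that many string bytes.
    bool PopString(std::string& out)
    {
        if (m_index + 2 > m_buf.size()) return false;
        const std::size_t len = m_buf[m_index] | (m_buf[m_index + 1] << 8);
        if (m_index + 2 + len > m_buf.size()) return false;
        out.assign(m_buf.begin() + m_index + 2, m_buf.begin() + m_index + 2 + len);
        m_index += 2 + len;
        return true;
    }

private:
    const std::vector<std::uint8_t>& m_buf;  // buffer must outlive the reader
    std::size_t m_index;                     // current read position
};
```

A caller would loop on `PopType()` and dispatch on the returned character, which is the "direct access" parsing the binary format is meant to allow.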
Points of Interest
- The marshal object uses a String class for all internal buffer handling. I added some helpful operators to the String class, such as:
const String & operator+=(const String & string);
const String & operator+=(LPCTSTR lpsz);
const String & operator+=(LPTSTR lpsz);
const String & operator+=(const unsigned char* lpsz);
const String & operator+=(int n);
const String & operator+=(short s);
const String & operator+=(double d);
const String & operator+=(float f);
const String & operator+=(char c);
which help me push any data type onto the stack of the marshal object.
- The String class that I use in this code is like the MFC CString class, with some added operators as in the previous point.
- Socket synchronization is the most interesting part of this article.
- The Send and Recv functions of the marshal object can be used from the client or the server side. But what happens if a client uses a marshal object in two threads and wants to send at the same time on the same socket? According to the sockets library documentation, sockets are not thread safe, so on the client side you should be careful about calling Send from multiple threads with the same socket. Use synchronization objects to serialize calls to the Send function.
- If the client calls Send from many threads (using synchronization objects), and each thread calls Recv on the same socket, how can each thread get its own reply? Whichever thread holds the current time slice will receive first. I used the following technique to solve this problem:
- Each thread sends its unique ID to the server in the Send function.
- The client receives all replies for this socket in one place (one thread).
- Each thread is suspended in the Recv function, waiting for its reply from that common place (thread).
- The server precedes each client reply with the client ID.
This is what I did in my code:
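- In the Send function:
// Create an auto-reset event that Recv will wait on.
m_hEvent = ::CreateEvent(NULL, FALSE, FALSE, NULL);
// Prefix the data with this object's address as its ID (assumes 32-bit pointers).
m_data.Insert(0, (int)this);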
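- On the client, I use a common thread for receiving from this socket:
void ClientRecv(void *lpv)
{
    SOCKET sock = (SOCKET)lpv;
    CMarshal* pMarshal;
    try
    {
        while(true)
        {
            // Read the ID echoed back by the server: the address of the
            // CMarshal object that sent the request (assumes 32-bit pointers).
            if(recv(sock, (char*)&pMarshal, sizeof(int), 0) != sizeof(int))
                break;
            // Sanity check before trusting the received pointer.
            if(pMarshal->m_fVer != 1)
                continue;
            // Read the reply into that object, then wake the thread waiting in Recv.
            if(pMarshal->RecvData(sock) > 0)
                ::SetEvent(pMarshal->m_hEvent);
        }
    }
    catch(...)
    {
        // Any exception ends the receiver thread.
    }
}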
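- In the client thread's Recv, I wait on the event created in the Send function:
if(m_hEvent)
{
    // Block for up to 60 seconds until the receiver thread signals our reply.
    if(::WaitForSingleObject(m_hEvent, 60000) == WAIT_TIMEOUT)
        return 0;
    // The reply has arrived: release the event and report the data length.
    ::CloseHandle(m_hEvent);
    m_hEvent = 0;
    return GetLength();
}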
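For readers who are not on Windows, the same "one receiver, many waiting senders" idea can be sketched with standard C++ primitives. This is a swapped-in illustration, not the article's code: the class `ReplyRouterSketch` and its methods are hypothetical, it uses integer IDs instead of the object's address, and a `std::condition_variable` instead of a Win32 event.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Hypothetical router: senders register an ID, the single receiver thread
// delivers each reply to the matching ID, and senders wait for their own slot.
class ReplyRouterSketch {
public:
    // Called by a sender before sending: reserve a slot and return its ID.
    std::uint32_t Register()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        const std::uint32_t id = m_nextId++;
        m_slots[id];  // default-construct an empty slot
        return id;
    }

    // Called by the receiver thread when a reply tagged with `id` arrives.
    bool Deliver(std::uint32_t id, const std::string& payload)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_slots.find(id);
        if (it == m_slots.end()) return false;  // unknown or timed-out request
        it->second.payload = payload;
        it->second.ready = true;
        m_cv.notify_all();
        return true;
    }

    // Called by the sender: block until its own reply arrives or the timeout expires.
    bool Wait(std::uint32_t id, std::string& out, std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        auto it = m_slots.find(id);
        if (it == m_slots.end()) return false;
        const bool ok = m_cv.wait_for(lock, timeout, [&] { return it->second.ready; });
        if (ok) out = it->second.payload;
        m_slots.erase(it);  // the slot is released on success and on timeout
        return ok;
    }

private:
    struct Slot { bool ready = false; std::string payload; };
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::map<std::uint32_t, Slot> m_slots;
    std::uint32_t m_nextId = 1;
};
```

Looking replies up by ID in a table, rather than dereferencing a pointer read from the network, means that a malformed or late reply is simply rejected by `Deliver`.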
Source code files
- Marshal.cpp, Marshal.h
- Socket.cpp, Socket.h
- String.cpp, String.h
- mem.cpp, mem.h
Thanks to...
I owe a lot to my colleagues for helping me implement and test this module. (JAK)