Introduction
ESJ is a JSON mapper for C++ with modest compiler requirements (No C++11) and zero reliance on 3rd party libraries. It is a very light-weight, easy to use system for inter-operating with web and database services. ESJ can be quickly added to existing code thus generating robust and well-formed JSON data.
JSON (JavaScript Object Notation) has become the format of choice for Web-based data exchange. JSON is very expressive, easy to parse and read, and, of course, has an extremely good fit with the JavaScript language itself. In addition to its ubiquity in AJAX (or more accurately AJAJ) contexts, JSON is also a great fit with Web-socket based communication.
Less obviously perhaps, JSON is also very useful for persistent storage in suitably enabled databases. See PostgreSQL and MonetDB for two excellent examples of JSON database support.
Possibly more unusually, the code has been deployed in embedded environments (via mbed on Freescale ARM Cortex-M4 K64F parts), greatly simplifying Web-socket data exchange for 'Internet of Things' devices.
The attached ZIP file includes projects for Visual Studio (2012) and Xcode (Clang). The code is also warning free with g++, the on-line mbed compiler, and the Keil ARM compiler.
The code is also hosted on Github. If you have any contributions or fixes you'd like to share, please do so via the ESJ repository.
Background
For those unfamiliar with JSON, do visit http://www.json.org to see the language specification and links to a great variety of other resources, including language bindings, useful documentation, tools and the like.
Another extremely useful web resource is the JSON "lint" tool at http://jsonlint.com/. This proved invaluable during the development of ESJ, so thanks to all concerned.
The motivation for this project was the need to quickly and accurately generate JSON from existing C++ code for both JavaScript and database consumers. There are a number of libraries which attempt to mimic JavaScript's dynamic typing and flexible object structure in C++, providing bi-directional JSON serialization via the same. This is quite the reverse of the approach taken by ESJ - here the intention is to maximize the benefits of C++'s strong, static typing in providing well-formed, strongly structured content.
Using the Code
Let us start with the canonical example for JSON serialization.
class JSONExample
{
public:
std::string text;
public:
void serialize(JSON::Adapter& adapter)
{
JSON::Class root(adapter,"JSONExample");
JSON_T(adapter,text);
}
};
#include "json_writer.h"
#include "json_reader.h"
- For each class you wish to serialize, add a public member function with a signature identical to that below: void serialize(JSON::Adapter& adapter)
- Inside serialize(), add a single declaration: JSON::Class root(adapter,"JSONExample");
- For each member variable you wish to serialize, add a single declaration using the JSON_E or JSON_T macro. Given a std::string member called text, we have JSON_T(adapter,text);
- Finally, use the templated JSON::producer() and JSON::consumer() functions as below:
int main(int argc,char* argv[])
{
JSONExample source;
source.text = "Hello JSON World";
std::string json = JSON::producer<JSONExample>::convert(source);
JSONExample sink = JSON::consumer<JSONExample>::convert(json);
}
And that is it. The results of the serialization process can be seen below:
{"JSONExample":{"text":"Hello JSON World"}}
That pretty much covers the essentials. Now allow me to draw a somewhat more detailed picture of the code snippets so far.
In-box support is provided for the following C++ types:
- std::string maps to JSON string.
- std::wstring maps to JSON string with support for \uXXXX encoding and decoding.
- int maps to JSON number (ignores the fractional part when de-serializing).
- double also maps to JSON number.
- bool maps to JSON true or false.
- std::vector<T> maps directly to a JSON array. If T implements the correct serialize() function, then the serializer will work as expected for vectors of T.
- The serializer will also correctly handle nested serializable instances, thus allowing fairly complex constructs to be easily transformed to and from JSON.
As stated previously, the class needs to implement the serialize function. Members will be serialized when this function is called, with order, not surprisingly, following the order of the declarations. It is imperative that the JSON::Class instance always appears first, as it controls some behind-the-curtains magic required to get object declarations out in the correct JSON format. As ever, the use of macros is restricted to one-liners which are used for brevity. Somewhat annoyingly, there are two macros which are used to add the serialization code for member variables, and one needs to ensure they are ordered correctly. JSON_E (JSON Element) provides serialization support for all members save the last. Why? A quick look at the resulting JSON shows that the code called by JSON_E generates a trailing comma character whilst JSON_T (JSON Terminator) does not. Thus the requisite pattern of declarations is:
JSON::Class root(adapter,"name of C++ class");
JSON_E(adapter,first_member_variable);
JSON_E(adapter,...);
JSON_T(adapter,last_member_variable);
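To make the ordering rule concrete, here is a minimal, self-contained sketch of how an element/terminator macro pair could emit the trailing comma. SketchWriter, SKETCH_E, SKETCH_T and Point are hypothetical stand-ins written for this illustration; they are not the ESJ classes or macros, and the use of the stringizing operator to derive the key is an assumption.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for a JSON writer; NOT the ESJ JSON::Adapter.
class SketchWriter
{
public:
    void open()  { m_out << '{'; }
    void close() { m_out << '}'; }
    // Write one "key":value pair; 'more' decides whether a comma follows.
    void pair(const std::string& key, int value, bool more)
    {
        m_out << '"' << key << "\":" << value << (more ? "," : "");
    }
    std::string str() const { return m_out.str(); }
private:
    std::ostringstream m_out;
};

// Hypothetical macros mirroring the JSON_E / JSON_T idea.
// #member turns the variable name into the JSON key (an assumption).
#define SKETCH_E(writer, member) (writer).pair(#member, member, true)
#define SKETCH_T(writer, member) (writer).pair(#member, member, false)

struct Point
{
    int x, y, z;
    void serialize(SketchWriter& writer)
    {
        writer.open();
        SKETCH_E(writer, x);   // element: followed by a comma
        SKETCH_E(writer, y);   // element: followed by a comma
        SKETCH_T(writer, z);   // terminator: no trailing comma
        writer.close();
    }
};

// Helper that serializes one Point and returns the text.
std::string point_to_json(int x, int y, int z)
{
    Point p;
    p.x = x; p.y = y; p.z = z;
    SketchWriter writer;
    p.serialize(writer);
    return writer.str();
}
```

Calling point_to_json(1, 2, 3) yields {"x":1,"y":2,"z":3}. Swapping the last SKETCH_T for SKETCH_E would leave a dangling comma before the closing brace, which is exactly the malformed output the ordering rule prevents.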
Any code which uses the JSON functions should be wrapped in try/catch blocks to ensure correct exception recovery.
Finally, note that all directly relevant classes and functions are in the JSON namespace.
Security
Very little contemporary Internet-related code can ignore security issues. In this particular case, predictable attack vectors would be malformed or overlong strings for 'buffer-busting', or illegal character sequences that might end up as executable code.
The JSON scanner can be set to accept a maximum length string, which helps mitigate resource-exhaustion type attacks. Character conversions, notably those from escaped hexadecimal \uXXXX to UTF16 or UTF32, are carefully handled, with the decoder throwing exceptions if there are illegal codepoints or truncated sequences.
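The kind of checking described above can be sketched as follows. This is an illustrative, self-contained decoder written for this article, not the ESJ scanner: decode_u_escapes, its max_length parameter and the exception types are all assumptions. It converts \uXXXX escapes to UTF-8 and throws on overlong input, truncated escapes, bad hex digits and unpaired surrogates.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Convert one hexadecimal digit to its value, throwing on anything else.
static unsigned hex_value(char c)
{
    if (c >= '0' && c <= '9') return static_cast<unsigned>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<unsigned>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<unsigned>(c - 'A' + 10);
    throw std::runtime_error("illegal hex digit in \\u escape");
}

// Read the 16-bit unit of a "\uXXXX" escape; 'pos' indexes the backslash.
static unsigned read_unit(const std::string& s, std::size_t pos)
{
    if (pos + 6 > s.size() || s[pos] != '\\' || s[pos + 1] != 'u')
        throw std::runtime_error("malformed or truncated \\u escape");
    unsigned unit = 0;
    for (std::size_t i = 2; i < 6; ++i)
        unit = (unit << 4) | hex_value(s[pos + i]);
    return unit;
}

// Append a Unicode code point to 'out' as UTF-8.
static void append_utf8(std::string& out, unsigned long cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: decode \uXXXX escapes with a length cap.
// Only the \u escape is handled in this sketch.
std::string decode_u_escapes(const std::string& in, std::size_t max_length)
{
    if (in.size() > max_length)
        throw std::length_error("string exceeds configured maximum");
    std::string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        if (in[i] != '\\') { out += in[i++]; continue; }
        unsigned long cp = read_unit(in, i);
        i += 6;
        if (cp >= 0xD800 && cp <= 0xDBFF)
        {
            // A high surrogate must be followed by a low surrogate.
            unsigned low = read_unit(in, i);
            if (low < 0xDC00 || low > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
            i += 6;
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
        {
            throw std::runtime_error("unpaired low surrogate");
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The length check comes first so that an oversized input is rejected before any decoding work is done; every other failure is reported by exception rather than by silently emitting a replacement character.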
The JSON parser, which uses the recursive descent idiom, will obviously consume increasing amounts of stack when presented with a very deeply nested set of encodings. Although this condition is not explicitly checked in the parser, it is extremely easy to add: the JSON::Class constructor actually monitors the nesting of scopes and could throw an exception if an application-specific limit is reached.
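One way such a limit might be enforced is with a small RAII guard created at each level of recursion. The sketch below is an assumption-laden illustration, not code from ESJ: DepthGuard, parse_array and parses_within_limit are hypothetical names, and the toy grammar accepts only nested empty arrays (no commas or values).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical RAII guard: counts nested scopes and throws past a limit.
class DepthGuard
{
public:
    DepthGuard(std::size_t& depth, std::size_t limit) : m_depth(depth)
    {
        if (m_depth >= limit)
            throw std::runtime_error("JSON nesting limit exceeded");
        ++m_depth;
    }
    ~DepthGuard() { --m_depth; }
private:
    std::size_t& m_depth;
    DepthGuard(const DepthGuard&);             // non-copyable (C++03 style)
    DepthGuard& operator=(const DepthGuard&);
};

// Toy recursive-descent routine: parses nested arrays such as "[[[]]]".
static void parse_array(const std::string& s, std::size_t& pos,
                        std::size_t& depth, std::size_t limit)
{
    DepthGuard guard(depth, limit);            // one guard per recursion level
    if (pos >= s.size() || s[pos] != '[')
        throw std::runtime_error("expected '['");
    ++pos;
    while (pos < s.size() && s[pos] == '[')
        parse_array(s, pos, depth, limit);
    if (pos >= s.size() || s[pos] != ']')
        throw std::runtime_error("expected ']'");
    ++pos;
}

// Returns true if 's' is well-formed and nests no deeper than 'limit'.
bool parses_within_limit(const std::string& s, std::size_t limit)
{
    std::size_t pos = 0, depth = 0;
    try { parse_array(s, pos, depth, limit); }
    catch (const std::runtime_error&) { return false; }
    return pos == s.size();
}
```

Because the counter is decremented in the destructor, the depth stays correct even when an exception unwinds through several levels of recursion.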
No warnings or errors are generated when the test bed is compiled using Visual Studio's Code Analysis mode.
Points of Interest
ESJ is implemented as a set of C++ header files. This significantly reduces the complexities of cross-platform tool-chain management and the like. The principal files of interest are:
- json_adapter.h Contains the definition of the interface to, and key streaming functions for the JSON::Adapter serializer code.
- json_writer.h Contains the implementations of the primitives for writing the supported types into a UTF8 string.
- json_reader.h Implements the primitives for the reader.
- json_lexer.h Contains a complete, stand-alone JSON tokeniser (useful in its own right, especially if you are operating in a really resource constrained environment).
- stringer.h Light-weight and type-safe replacement for sprintf and friends which overloads operator << for creating formatted strings.
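As a rough idea of what a type-safe sprintf replacement built on operator<< can look like, here is a minimal sketch. MiniStringer and describe are hypothetical names invented for this example; the actual class in stringer.h may be organised quite differently.

```cpp
#include <sstream>
#include <string>

// Hypothetical miniature of a type-safe formatter; not the stringer.h class.
class MiniStringer
{
public:
    // Any type that std::ostream can print is accepted; a mismatched
    // format specifier simply cannot occur.
    template <typename T>
    MiniStringer& operator<<(const T& value)
    {
        m_buffer << value;
        return *this;
    }
    std::string str() const { return m_buffer.str(); }
private:
    std::ostringstream m_buffer;
};

// Example use: build a short diagnostic string from mixed types.
std::string describe(const std::string& key, int count, double ratio)
{
    MiniStringer s;
    s << key << '=' << count << " ratio=" << ratio;
    return s.str();
}
```

With these definitions, describe("n", 3, 0.5) produces n=3 ratio=0.5, with the compiler checking every argument type.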
Principal components with associations rendered in slightly non-standard UML (These diagrams are included in the source distribution as SVG files for better viewing).
Structurally speaking, JSON is written to an ISink derived class by the Writer; in the hierarchy to hand, the sink is a StringSink. JSON is read from an ISource derived class, in this case a StringSource. As their names imply, the internal containers for the JSON content are actually std::strings, which will be a very good fit in many cases. However, it is worth pointing out that this architecture is also pretty flexible. If, for example, you wish to write your JSON direct to a socket or a file (let us say to avoid potentially large amounts of buffering), then you simply need to inherit from the JSON::ISink class and implement the relevant operator<<() functions as shown in the UML class diagram above.
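The idea of swapping the destination can be sketched in isolation. The interface below is a hypothetical stand-in: SketchSink is not JSON::ISink, and the real class may declare a different set of operator<<() overloads, so check json_writer.h before adapting this. OstreamSink forwards everything to a std::ostream, which could equally be a std::ofstream.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical sink interface; the real JSON::ISink may differ.
struct SketchSink
{
    virtual ~SketchSink() {}
    virtual SketchSink& operator<<(const std::string& text) = 0;
    virtual SketchSink& operator<<(char c) = 0;
};

// Sink that forwards output to any std::ostream instead of buffering
// the whole document in a std::string.
class OstreamSink : public SketchSink
{
public:
    explicit OstreamSink(std::ostream& os) : m_os(os) {}
    virtual SketchSink& operator<<(const std::string& text)
    {
        m_os << text;
        return *this;
    }
    virtual SketchSink& operator<<(char c)
    {
        m_os << c;
        return *this;
    }
private:
    std::ostream& m_os;
};

// A writer only ever sees the abstract interface.
void write_pair(SketchSink& sink, const std::string& key, const std::string& value)
{
    sink << '"' << key << '"' << ':' << '"' << value << '"';
}

// Demonstration: an ostringstream stands in for a file or socket stream.
std::string demo_sink()
{
    std::ostringstream buffer;
    OstreamSink sink(buffer);
    sink << '{';
    write_pair(sink, "text", "Hello");
    sink << '}';
    return buffer.str();
}
```

Since write_pair depends only on the abstract interface, the same writer code works unchanged whether the bytes end up in memory, in a file, or on the wire.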
Subsidiary components:
The main principle at work here is combining a set of free functions (generically called stream() and all implemented within the adapter class) with another set of overloaded virtual functions implemented within the Reader and Writer classes, both of which inherit from Adapter.
There are overloaded stream() functions for all of the core data types. Then there is a catch-all templated stream() which expects its value parameter to implement the serialize() function. It is with this pattern of decomposition that the mechanism works.
void stream(Adapter& adapter,std::string& value);
void stream(Adapter& adapter,int& value);
void stream(Adapter& adapter,double& value);
void stream(Adapter& adapter,bool& value);
template <typename T> void stream(Adapter& adapter,T& arg)
{
arg.serialize(adapter);
}
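The following self-contained sketch shows why this decomposition works: overload resolution prefers the non-template stream() for core types and falls back to the template for anything that provides serialize(). MiniAdapter, Inner, Outer and stream_demo are hypothetical names used only for this illustration; the real Adapter is an abstract class with virtual functions rather than a plain buffer.

```cpp
#include <sstream>
#include <string>

// Hypothetical miniature adapter that simply accumulates output text.
struct MiniAdapter
{
    std::ostringstream out;
};

// Overloads for core types: exact matches win over the template below.
void stream(MiniAdapter& a, int& value)         { a.out << value; }
void stream(MiniAdapter& a, bool& value)        { a.out << (value ? "true" : "false"); }
void stream(MiniAdapter& a, std::string& value) { a.out << '"' << value << '"'; }

// Catch-all: any other type must provide serialize(MiniAdapter&).
template <typename T> void stream(MiniAdapter& a, T& arg)
{
    arg.serialize(a);
}

struct Inner
{
    int n;
    void serialize(MiniAdapter& a)
    {
        a.out << "{\"n\":";
        stream(a, n);              // resolves to the int overload
        a.out << '}';
    }
};

struct Outer
{
    std::string name;
    Inner inner;
    bool ok;
    void serialize(MiniAdapter& a)
    {
        a.out << "{\"name\":";
        stream(a, name);           // std::string overload
        a.out << ",\"inner\":";
        stream(a, inner);          // template, which calls Inner::serialize
        a.out << ",\"ok\":";
        stream(a, ok);             // bool overload
        a.out << '}';
    }
};

// Serialize a small nested object and return the resulting text.
std::string stream_demo()
{
    Outer o;
    o.name = "x";
    o.inner.n = 7;
    o.ok = true;
    MiniAdapter a;
    stream(a, o);
    return a.out.str();
}
```

Nesting falls out naturally: each serialize() calls stream() on its members, and the compiler picks either a primitive overload or another serialize() at every level.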
Along with the single-argument functions, there is another overloaded set which will stream key/value pairs. In the case of the writer, the implementations are trivial, simply creating a correctly quoted string when required and appending (or outputting) the result to the destination. For example:
virtual void serialize(const std::string& key,std::string& value,bool more)
{
m_content << "\"" << key << Quote() << ':' << Quote();
m_content << Chordia::escape(value) << Quote() << (more ? "," : "");
}
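For readers curious about what the escaping step has to do, here is a minimal sketch of a JSON string escaper and a key/value writer built on it. escape_json and quoted_pair are hypothetical helpers written for this article; they are not Chordia::escape, which may handle more cases (for example non-ASCII input).

```cpp
#include <cstddef>
#include <string>

// Hypothetical escaper: handles quotes, backslashes and control characters.
std::string escape_json(const std::string& in)
{
    static const char hex[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        switch (c)
        {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n";  break;
        case '\r': out += "\\r";  break;
        case '\t': out += "\\t";  break;
        default:
            if (c < 0x20)
            {
                // Remaining control characters become \u00XX.
                out += "\\u00";
                out += hex[c >> 4];
                out += hex[c & 0x0F];
            }
            else
            {
                out += static_cast<char>(c);
            }
        }
    }
    return out;
}

// Build "key":"escaped value" with an optional trailing comma.
std::string quoted_pair(const std::string& key, const std::string& value, bool more)
{
    return "\"" + key + "\":\"" + escape_json(value) + "\"" + (more ? "," : "");
}
```

The more flag plays the same role as in the writer above: it is the only thing that distinguishes an element from the terminating member.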
The equivalent read function works in concert with the JSON scanner like so:
virtual void serialize(const std::string& key,std::string& value,bool more)
{
GetNext(key,T_STRING,value,more);
}
void GetNext(const std::string& key,TokenType type,std::string& value,bool more)
{
GetNext(T_STRING);
throw_if(key != m_token.text,"key does not match");
GetNext(T_COLON);
GetNext(type);
value = m_token.text;
if (more)
{
Next();
}
}
virtual void GetNext(TokenType type)
{
TokenType next = Next();
throw_if(next != type,"GetNext: type mismatch");
}
The only really tricky bit in the implementation is the code required to support JSON arrays. This is again handled in the adapter and uses the primitives shown in the previous snippets. This is the only case in which reading and writing are asymmetric. Firstly, the reader has to correctly handle the case where it encounters an empty array [], so the reader uses the lexer/scanner's peek capabilities to check the next token and proceed accordingly:
adapter.serialize(key);
adapter.serialize(T_COLON);
adapter.serialize(T_ARRAY_BEGIN);
if (adapter.peek(T_ARRAY_END))
{
adapter.serialize(T_ARRAY_END);
}
else
{
}
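The peek-then-branch pattern can be seen end to end in the following stand-alone sketch, which reads a JSON array of integers. MiniScanner and read_int_array are hypothetical and much simpler than the ESJ lexer; in particular, the handling of the non-empty case is an assumption about one reasonable approach, not a reconstruction of the branch left empty in the excerpt above.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical miniature scanner over a JSON array of integers.
class MiniScanner
{
public:
    explicit MiniScanner(const std::string& text) : m_text(text), m_pos(0) {}

    // True if the next non-space character is 'c'; does not consume it.
    bool peek(char c)
    {
        skip();
        return m_pos < m_text.size() && m_text[m_pos] == c;
    }

    // Consume 'c' or throw.
    void expect(char c)
    {
        if (!peek(c)) throw std::runtime_error("unexpected token");
        ++m_pos;
    }

    // Consume an optionally signed run of digits and return its value.
    int number()
    {
        skip();
        const std::size_t start = m_pos;
        if (m_pos < m_text.size() && m_text[m_pos] == '-') ++m_pos;
        const std::size_t digits = m_pos;
        while (m_pos < m_text.size() &&
               std::isdigit(static_cast<unsigned char>(m_text[m_pos])))
            ++m_pos;
        if (m_pos == digits) throw std::runtime_error("expected a number");
        return std::atoi(m_text.substr(start, m_pos - start).c_str());
    }

private:
    void skip()
    {
        while (m_pos < m_text.size() &&
               std::isspace(static_cast<unsigned char>(m_text[m_pos])))
            ++m_pos;
    }
    std::string m_text;
    std::size_t m_pos;
};

std::vector<int> read_int_array(const std::string& text)
{
    MiniScanner scanner(text);
    std::vector<int> values;
    scanner.expect('[');
    if (scanner.peek(']'))
    {
        scanner.expect(']');               // empty array: nothing to read
    }
    else
    {
        for (;;)
        {
            values.push_back(scanner.number());
            if (!scanner.peek(',')) break; // no comma means no more elements
            scanner.expect(',');
        }
        scanner.expect(']');
    }
    return values;
}
```

Without the initial peek, the reader would try to parse a value immediately after the opening bracket and fail on the perfectly legal input [].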
In Conclusion
It is useful to see how the C++ derived JSON is (correctly) represented within a JavaScript environment. The image below shows a JSON string that has been pasted into a Chrome console. The resulting JavaScript object (j) is shown in the debugger. Note that the Hiragana string has been correctly translated from its Unicode representation.
None of the code should be controversial or compiler unfriendly. However, users of older versions of Visual Studio may require a stdint.h clone to handle some of the uintN_t typedefs, which appear in some of the UTF8/UTF16/UTF32 conversion functions.
A final word on the example code: this is essentially a set of modest unit tests for each component. There is also a somewhat more complex example, test_nesting(), that demonstrates and tests the serialization of a pair of more complex classes, one containing a vector of the other. It is the output of this test which generates the JSON shown in the Chrome console above.
History
- 1.01 - 24th December, 2014
- 1.02 - 24th January, 2014: Update to sync to Github. Fix problem with malformed quoting of unary int and double values
- 1.03 - An updated tarball which fixes a nesting issue can be downloaded here: https://github.com/g40/esj/archive/master.zip
- 1.04 - Fix to handle std::vector<T> of JSON primitive types (string/number/bool) available here: https://github.com/g40/esj/archive/master.zip
- 1.05 - Updated ZIP to mirror the latest code on Github. Adds another test case suggested by Sebastian F.