Introduction
Here you will find a small (15KB) .NET assembly written in C# that allows reading and writing data formatted according to the JSON standard with features that mimic those of LinqToXML.
Background
I am very impressed by how easy it is to read and write XML data with LinqToXml in .NET Framework 4.0, and I wanted something very similar for JSON. I searched around a bit but could find nothing to my liking.
What I found was mostly JSON serializer/deserializer assemblies, which is not what I was interested in, and the few LinqToJSON helpers that I found have issues in my humble opinion:
- They rely on an underlying serializer
- They do not seem to respect the specifications of JSON (this is mostly a consequence of using the serializer)
- They do not offer writing JSON data the way LinqToXml offers writing XML data
When I write "do not seem to respect the specifications of JSON", I mean that the serializer writes data that does not conform to JSON as it is described at www.json.org. For instance, in the examples for the JSON libraries, I have often seen something similar to:
"Expiry" = new Date(1234567890)
In my understanding of JSON, this is wrong, because the value new Date(1234567890) is neither:
- a JSON string - it would be: "new Date(1234567890)"
- a JSON number, a JSON boolean value or JSON null - this is clear
- a JSON array - it would need the delimiters [ and ] and it would need to contain valid array elements, which it does not
- a JSON object - it would need the delimiters { and } and it would need to contain valid object members, which it does not
A date can be written using proper JSON formatting: as a string ("01/01/2011"), as an array ([1, 1, 2011]), or just as a number (the usual number of seconds since ...). Whichever form you pick, choose a descriptive name. For instance:
"ExpiryDate" : "01/01/2011"
or (without leading zeros, which JSON numbers do not allow):
"TestDates" : [[1,1,2011], [1,1,2012], [1,1,2013]]
Of course, you have to convert the string, the array or whatever afterwards to the actual 'Date' object, but this should not be an issue, especially as the .NET Framework already provides the necessary mechanisms to achieve that.
One more piece of advice: when you have decided on the representation of a non-trivial item (whether it's a 'Date' or binary data, or whatever), stick to it! Please.
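To illustrate the conversion step mentioned above, here is a minimal sketch that is not part of the library. The class name JsonDateExample and its methods are hypothetical, and the day/month/year ordering as well as the 1970 UTC epoch are assumptions I make for the example, since the representation is entirely up to you.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of Ranslant.Linq.Json.
public static class JsonDateExample
{
    // Assumption: the JSON string "01/01/2011" means day/month/year.
    public static DateTime FromJsonString(string text)
    {
        return DateTime.ParseExact(text, "dd/MM/yyyy", CultureInfo.InvariantCulture);
    }

    // Assumption: the JSON array [1, 1, 2011] is ordered day, month, year.
    public static DateTime FromJsonArray(double[] parts)
    {
        if (parts == null || parts.Length != 3)
            throw new FormatException("Expected an array of exactly three numbers.");
        return new DateTime((int)parts[2], (int)parts[1], (int)parts[0]);
    }

    // Assumption: the number counts seconds since 1970-01-01 00:00:00 UTC.
    public static DateTime FromSeconds(double seconds)
    {
        return new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc).AddSeconds(seconds);
    }
}
```

Whichever of the three forms is used, the point remains the same: the JSON text stays valid, and the interpretation as a date happens in your own code.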
Anyway, only one thing to do then: write my own library.
Disclaimer
I do not claim that this LinqToJson library is the fastest, the leanest, the smallest or the best. I just claim that it does exactly what I planned it to do, and that it does it well enough for me. And that in "only" 14KB. :)
Besides, please note that I am not dismissing JSON serializers. My point is that I did not want/need a serializer. In other words, JSON serializers are not better or worse than my LinqToJSON library, they just serve a different purpose, which did not fit my needs. So, if you need a serializer, use one.
Specifications (What I Want)
Public Objects
Similar to XDocument, XElement and XAttribute from LinqToXml, I want to have:
- JDocument, as the root object. Derived class of JObject, to allow directly using the properties and methods of JObject.
- Generic containers JObject and JArray hold values. They implement the interface IJValue and provide accessors to the JSON content.
- Specific containers
  - JString - I do not want to use the built-in type System.String directly, because conceptually a JSON string and a C# string are two different items. JString implements the interface IJValue and checks the validity of the given C# string, according to the JSON specifications.
  - JNumber - Same story here as to why I do not use the type Double directly. This class implements the interface IJValue and checks the validity of the given C# string, regarding the JSON specifications for numbers.
  - JTrue, JFalse and JNull - They also implement the interface IJValue. Again, I did not want to use the types bool and object directly.
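As an illustration of what "checking the validity regarding the JSON specifications for numbers" can look like, here is a minimal sketch based on the number grammar from www.json.org. It is not the code of JNumber; the class JsonNumberCheck and the exact regular expression are my assumptions for the example.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not the library's JNumber implementation.
public static class JsonNumberCheck
{
    // Optional minus, integer part without leading zeros,
    // optional fraction, optional exponent. \A and \z anchor the whole text.
    private static readonly Regex NumberPattern =
        new Regex(@"\A-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?\z");

    public static bool IsValid(string text)
    {
        return text != null && NumberPattern.IsMatch(text);
    }

    // Validates first, then converts with the invariant culture so that
    // '.' is always the decimal separator, whatever the machine settings are.
    public static double Parse(string text)
    {
        if (!IsValid(text))
            throw new FormatException("Not a valid JSON number: " + text);
        return double.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture);
    }
}
```

With this pattern, "-15.64" and "1e10" are accepted, while "01", ".5" and "1." are rejected, which is stricter than what double.Parse alone would accept.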
Reading JSON
- Provide a static method JDocument.Load()
- Accessors will return
  - IEnumerable<IJValue> when the calling object is a JObject or JArray, to allow for Linq-style querying
  - a value of the appropriate type and content for the containers: 'bool' for JTrue and JFalse, 'double' for JNumber, 'string' for JString and 'object' for JNull.
- The generic type is IJValue. Filtering the values returned can be done over the type, using the keyword is: if(value is JNumber).
See the classes' description below and the code of the demo application for more details.
Code snippet from the demo application
JDocument jDoc = JDocument.Load(filename);
var members = from member in jDoc.GetMembers() select member;
foreach (var member in members)
{
    if (member.Value is JString || member.Value is JNumber)
    {
        // handle simple values here (the body is not shown in this excerpt)
    }
    if (member.Value is JObject)
    {
        JObject obj = member.Value as JObject;
        var names = from name in obj.GetNames() select name;
        foreach (var name in names)
        {
            IJValue value = obj[name];
            if (value is JFalse || value is JTrue || value is JNull)
            {
                // handle literals here (the body is not shown in this excerpt)
            }
        }
        // The inner names must differ from the outer 'members' and 'member',
        // otherwise the C# compiler rejects the redeclaration.
        var subMembers = from subMember in obj.GetMembers() select subMember;
        foreach (var subMember in subMembers)
        {
            Console.WriteLine("\t{0} : {1}", subMember.Name, subMember.Value.ToString());
        }
    }
}
Writing JSON
- Create a new JDocument instance
- Add content in several ways:
  - either in the constructors of the JSON values directly
  - or with the family of methods Add()
- Each class implementing the interface IJValue provides a method returning a string that contains the correct representation. For most objects, I do not use ToString() because I want to keep the JSON string representation of this object independent of C#.
- Provide a JDocument instance method Save()
See the classes' description below and the code of the demo application for more details.
Code snippet #1 (from the demo application)
new JDocument(
    new JObjectMember("D'oh",
        new JObject(
            new JObjectMember("First Name", new JString("Homer")),
            new JObjectMember("Family Name", new JString("Simpson")),
            new JObjectMember("Is yellow?", new JTrue()),
            new JObjectMember("Children",
                new JArray(
                    new JString("Bart"),
                    new JString("Lisa"),
                    new JString("Maggie"))
            )
        )
    ),
    new JObjectMember("never gets older",
        new JObject(
            new JObjectMember("First Name", new JString("Bart")),
            new JObjectMember("Family Name", new JString("Simpson"))
        )
    )
).Save("simpsons.json");
There are several interesting aspects in the example above:
- Except for the final call to Save(), all the data is created through constructors
- The code can be presented in a very visual way that mimics the structure of the JSON data
- This JSON data can most likely not be deserialized, mostly because of the non-alphanumerical characters in the names of the object members
Code snippet #2 (from the demo application)
JObject obj = new JObject();
obj.Add("_number", new JNumber(-3.14));
obj.Add("_true", new JTrue());
obj.Add("_null", new JNull());
JArray arr = new JArray();
arr.Add(new JNumber("-15.64"));
arr.Add(new JString(@"Unicode: \u12A0"),
        new JString(@"\n\\test\""me:{}"));
JDocument doc = new JDocument(
    new JObjectMember("_false", new JFalse()),
    new JObjectMember("_false2", new JFalse())
);
doc.Add(
    new JObjectMember("_array", arr),
    new JObjectMember("_string1", new JString("string1"))
);
doc.Add("_obj", obj);
doc.Save(filename);
Interfaces and Classes
Interfaces
IJValue - generic type representing all the different items a JSON value can be.
This interface declares two methods: ToString() and ToString(int indentLevel).
Public Classes
Only the important public methods and properties are listed here (the keyword public is deliberately omitted).
class JDocument : JObject
{
JDocument()
JDocument(JObject jObject)
JDocument(params JObjectMember[] jObjectMembers)
static JDocument Load(string uri)
static JDocument Load(string uri, System.Text.Encoding encoding)
static JDocument Load(Stream stream)
static JDocument Load(Stream stream, System.Text.Encoding encoding)
// If you provide null as parameter for the encoding, then Load() will try to detect
// the encoding type based on the byte order mark (currently detected: UTF16 (BE & LE),
// UTF8 and ASCII). An exception is thrown if it fails.
void Save(Stream stream)
void Save(Stream stream, System.Text.Encoding encoding, bool addByteOrderMark)
void Save(string uri)
void Save(string uri, System.Text.Encoding encoding, bool addByteOrderMark)
static JDocument Parse(string text)
}
class JObject : IJValue
{
JObject()
JObject(params JObjectMember[] jObjectMembers)
int Count
IJValue this[string name]
IJValue GetValue(string name)
IEnumerable<string> GetNames()
IEnumerable<IJValue> GetValues()
IEnumerable<JObjectMember> GetMembers()
void Add(params JObjectMember[] jObjectMembers)
void Add(string name, IJValue jValue)
}
class JObjectMember
{
string Name
IJValue Value
JObjectMember(string name, IJValue value)
}
class JArray : IJValue
{
JArray()
JArray(int capacity)
JArray(params IJValue[] jValues)
int Count
IJValue this[int index]
void Add(params IJValue[] jValues)
IEnumerable<IJValue> GetValues()
}
class JNumber : IJValue
{
double Content
JNumber(double number)
JNumber(string text)
}
class JString : IJValue
{
string Content
JString(string content)
}
class JNull : IJValue
{
object Content
}
class JFalse : IJValue
{
bool Content
}
class JTrue : IJValue
{
bool Content
}
Error Handling
Errors are reported solely through exceptions; no error codes are returned.
Demo Console Application
The demo application is a bit of code thrown together to show several ways to read and write JSON data using the assembly Ranslant.Linq.Json.
It also shows what would trigger errors when parsing or saving.
Points of Interest
Source Code
I have tried to write code that is easy to understand yet efficient enough. I have avoided complex constructs and design patterns, because reading/writing JSON data is, in itself, not a complex task and therefore should not require complex code.
A few things are worth mentioning for beginners in C#:
- Use of regular expressions (in JNumber for instance)
- Lazy creation of the data structures in JObject and JArray (Add()). The private data containers are initialized only once they are actually needed.
- Use of an extension method (ConsumeToken())
- Use of an interface as generic base type (IJValue), in combination with a simple factory pattern (JParser.ParseValue())
- Methods with a variable number of arguments (e.g. Add(params JObjectMember[] jObjectMembers))
- Use of IEnumerable<T> to benefit from Linq's querying power
- Use of the modifier new to hide an inherited method (ToString())
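For beginners, three of the points above (lazy creation, params and the modifier new) can be shown together in a few lines. The class LazyBag below is a hypothetical stand-in that I made up for this illustration; it is not the code of JArray.

```csharp
using System.Collections.Generic;

// Hypothetical container, not the library's JArray.
public class LazyBag
{
    // Lazy creation: the list stays null until the first real Add().
    private List<string> items;

    public int Count
    {
        get { return items == null ? 0 : items.Count; }
    }

    // 'params' lets the caller pass any number of arguments: Add("a"), Add("a", "b"), ...
    public void Add(params string[] values)
    {
        if (values == null || values.Length == 0) return;
        if (items == null) items = new List<string>(values.Length);
        items.AddRange(values);
    }

    // 'new' hides object.ToString() instead of overriding it.
    public new string ToString()
    {
        return items == null ? "[]" : "[" + string.Join(",", items) + "]";
    }
}
```

One consequence of hiding rather than overriding is worth knowing: the hidden method is only used when the variable is typed as LazyBag. Through a variable of type object (and therefore also inside string.Format or Console.WriteLine with a format string), the original object.ToString() is called and returns the type name.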
About the Parser
There are hundreds of ways to write a text parser. It is my opinion that all of them are right as long as the text is properly parsed and you get the specified (and hopefully expected) result. Which means: my parser is good because it works. This is all I expect from it.
What I do here is simple, and rather standard. I parse the text looking for tokens. Those tokens represent delimiters for specific JSON value types. In other words, a token or a set of tokens gives away the expected value type. The text between those tokens is then checked for validity, which means I check whether what I have actually represents the value type that I expect from the delimiters.
For instance, the delimiter '{' means I get a JSON object, and therefore everything between this delimiter and the corresponding closing delimiter ('}') should represent a JSON object and should therefore be written according to the grammatical rules (taken from www.json.org):
- a JSON object is {} or {members}
- members are pair or pair,members
- pair is string : value
- string is "" or "text", where text is …
- value is object or array or string or number, etc.
- etc.
What the parser does is follow these grammatical rules and report an error (by throwing an exception) when a rule is violated.
The way the string is technically parsed is by eating it up, so to speak, until it is empty. In other words, what is checked (whether tokens or content) is removed from the string to be parsed. Picture Pac-Man eating the pills in the maze, one pill at a time. Well, same thing here. :)
To achieve this, I have chosen the following code structure:
- A class field containing the string that will be consumed. This field is not static since I want to stay thread safe. Each parsing step will modify this string, by removing the current token or content.
- A set of parsing functions, each one devoted to recognizing and/or filling one and only one type of value. Which one is called depends on the token found.
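The structure described in the two points above can be sketched as follows. This is a deliberately tiny, hypothetical parser that I wrote for this illustration: it only understands arrays and the literals true, false and null, and the names ConsumingParser and TryConsume are made up (the library uses its own ConsumeToken() extension method, whose code is not reproduced here).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mini-parser: arrays and the literals true/false/null only.
public class ConsumingParser
{
    // Instance field (not static): each parser owns the text it eats.
    private string text;

    public ConsumingParser(string input)
    {
        text = input ?? "";
    }

    // What is left to parse; empty once everything has been eaten.
    public string Remaining
    {
        get { return text; }
    }

    // Skips leading whitespace, then removes 'token' from the front if present.
    private bool TryConsume(string token)
    {
        text = text.TrimStart();
        if (!text.StartsWith(token, StringComparison.Ordinal)) return false;
        text = text.Substring(token.Length);
        return true;
    }

    // The leading token decides which kind of value is expected.
    public object ParseValue()
    {
        if (TryConsume("[")) return ParseArrayBody();
        if (TryConsume("true")) return true;
        if (TryConsume("false")) return false;
        if (TryConsume("null")) return null;
        throw new FormatException("Unexpected content: " + text);
    }

    // Called after '[' has been eaten; eats the elements and the closing ']'.
    private List<object> ParseArrayBody()
    {
        var result = new List<object>();
        if (TryConsume("]")) return result;
        do
        {
            result.Add(ParseValue());
        } while (TryConsume(","));
        if (!TryConsume("]"))
            throw new FormatException("Missing ']' before: " + text);
        return result;
    }
}
```

Parsing "[true, [false, null], []]" with this sketch returns a list of three elements and leaves Remaining empty, while "[true" throws a FormatException because the closing delimiter is never found.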
I also considered the following alternative, where no string would be stored in the class, but rather passed from parse function to parse function:
ParseSomething(string textIn, out string textOut)
With textIn being the string fed to the specific parse function and textOut being the string left after successful parsing. But I left this solution out, out of concerns for performance. Indeed, the recursive nature of JSON would have created situations where many string instances are necessary, requiring a lot of memory.
Converting a Base Class to One of its Derived Classes
The first specific difficulty of the parser was getting ParseDocument(string) to return a JDocument instance although it was actually parsing a JObject.
First I had the following code:
JDocument jDoc = (JDocument)ParseObject();
It compiled, but threw an InvalidCastException (see http://msdn.microsoft.com/en-us/library/ms173105.aspx and search for 'Giraffe'). Indeed, the object returned by ParseObject() is a plain JObject at run time, and no cast can turn an instance of a base class into an instance of one of its derived classes. Even the C# operator as, the closest equivalent to the C++ dynamic_cast, would only return null here.
There are a few solutions for this issue:
- SELECTED SOLUTION - Copy the members of the object returned by ParseObject() into the base of the JDocument object (using the constructor JDocument(JObject jObject)). This works well and requires only very little code. Besides, this solution makes it possible to create a JDocument out of any JObject, which can also be useful.
- Encapsulate a JObject instance in JDocument. I found this to be not as nice, in terms of usability, as the first solution.
- Use the CopyTo() method of JObject. This uses reflection (and can therefore bring performance issues) and is not guaranteed to work according to http://social.msdn.microsoft.com/Forums/en-US/csharplanguage/thread/f7362ba9-48cd-49eb-9c65-d91355a3daee
- Derive JDocument from IJValue. But then I lose inheriting the JObject methods and I would have needed to duplicate them in JDocument. This is definitely not clean.
- I checked if I could use the keywords in and out, to see if I could use the covariance and contravariance features of C#, but since I do not use generics, this won't work.
I found solution #1 to be the only clean and elegant solution, but it works only because in JObject the fields and properties (there's actually only one field) are accessed for writing only through public accessor functions (the Add() functions) that can be used freely in JDocument thanks to the inheritance.
This means that this pattern cannot be used generically.
Read the source code and the comments there for more information.
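The failing cast and the copy-constructor solution can be reproduced in isolation with the sketch below. BaseNode and DocumentNode are hypothetical stand-ins for JObject and JDocument (with a string dictionary instead of the real data container); the actual library code differs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for JObject.
public class BaseNode
{
    private readonly Dictionary<string, string> members = new Dictionary<string, string>();

    public int Count
    {
        get { return members.Count; }
    }

    // The only way to write to the private container.
    public void Add(string name, string value)
    {
        members.Add(name, value);
    }

    public IEnumerable<KeyValuePair<string, string>> GetMembers()
    {
        return members;
    }
}

// Hypothetical stand-in for JDocument.
public class DocumentNode : BaseNode
{
    public DocumentNode() { }

    // Copies the members of an existing base instance through the inherited Add().
    public DocumentNode(BaseNode source)
    {
        if (source == null) throw new ArgumentNullException("source");
        foreach (var member in source.GetMembers())
            Add(member.Key, member.Value);
    }
}

public static class CastDemo
{
    // Returns true when the direct downcast does not yield a DocumentNode.
    public static bool DowncastFails(BaseNode node)
    {
        try
        {
            DocumentNode doc = (DocumentNode)node; // throws for a plain BaseNode
            return doc == null;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

Casting a plain BaseNode fails, whereas new DocumentNode(baseNode) yields a real DocumentNode with the same members. Note that this is a copy: the two objects do not share their content afterwards.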
Unicode Encodings
The second difficulty was taking care of the different text encodings. ASCII is not a problem, but UTF8/16/32 are: you need to provide the right encoder to get the correct C# string.
Besides, internally, a C# string is encoded with UTF16, and depending on the original encoding of the data, you may or may not get a Byte Order Mark (check the Wikipedia articles about UTF 8 and UTF 16 for more information) when converting to a C# string.
To deal with this, I added two variants of the Load() / Save() functions in JDocument:
- Overloads where the encoding can be specified (System.Text.Encoding)
- When the encoding is not specified, UTF8 is the default
- If null is passed as parameter for the encoding in the function Load(), then the library will try to detect the encoding, based on the Byte Order Mark. If it fails, an exception is thrown.
Additionally, since UTF 32 is seldom used, I just didn't consider it at all.
I would advise reading the MSDN documentation about System.Text.Encoding carefully.
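To give an idea of what detection based on the Byte Order Mark involves, here is a minimal sketch. It is not the code used by Load(); the class BomSniffer is hypothetical, and like the library it ignores UTF 32 (whose little endian BOM starts with the same two bytes as the UTF 16 one).

```csharp
using System;
using System.Text;

// Hypothetical helper, not the detection code of JDocument.Load().
public static class BomSniffer
{
    // Returns the encoding announced by a byte order mark, or null if there is none.
    public static Encoding Detect(byte[] data)
    {
        if (data == null) return null;
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            return new UTF8Encoding(true);
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
            return new UnicodeEncoding(true, true);   // UTF 16, big endian
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
            return new UnicodeEncoding(false, true);  // UTF 16, little endian
        return null;
    }

    // Decodes the bytes and skips the BOM, so that it does not end up in the string.
    // 'fallback' is used when no BOM is found.
    public static string Decode(byte[] data, Encoding fallback)
    {
        if (data == null) throw new ArgumentNullException("data");
        if (fallback == null) throw new ArgumentNullException("fallback");
        Encoding detected = Detect(data);
        int skip = detected == null ? 0 : detected.GetPreamble().Length;
        return (detected ?? fallback).GetString(data, skip, data.Length - skip);
    }
}
```

Data without a BOM cannot be identified this way, which is why a fallback (or an exception, as in the library) is needed in that case.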
Possible Improvements
Source Code Cleanness and Efficiency
Obviously YOU would have written everything not just completely differently, but better. :D
More seriously, if you have suggestions concerning code cleanness (I am referring to clean code development) and efficiency of data and classes structures, then you are welcome, I always want to learn.
Bug fixes
I have not included the unit tests in the package, but I have tested the code thoroughly. Some issues may still have slipped past me though, so if you find a bug, you're welcome to report it (please be descriptive enough).
If you have a JSON file that cannot be read with this code:
- Make sure the JSON text is properly formatted
- Send me the file if possible and I will check
Direct Conversion
It may turn out to be simpler to use bool for JTrue and JFalse, double for JNumber, string for JString and object for JNull more directly. I would still keep those classes because I still want to use the interface IJValue, but I could add converters so that the following could be written:
var myQuery = from value in jDocument.GetValues() select value;
foreach(IJValue value in myQuery)
{
    if(value is JTrue || value is JFalse)
    {
        if(value)
        {
        }
    }
}
Currently you have to write: if(value.Content), which is not too much to ask for, but if(value) is more comfortable, and LINQ is all about comfort, isn't it?
Would it help?
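Here is a minimal sketch of what such a converter could look like, using a user-defined implicit conversion. All the types below (IValue, BoolValue, TrueValue, FalseValue) are hypothetical stand-ins that I made up for the illustration; nothing like this exists in the library yet.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for IJValue, JTrue and JFalse.
public interface IValue { }

public abstract class BoolValue : IValue
{
    public abstract bool Content { get; }

    // Implicit conversion: a BoolValue can be used where a bool is expected.
    // ReferenceEquals avoids comparing with '==', which could itself go
    // through this conversion.
    public static implicit operator bool(BoolValue value)
    {
        return !object.ReferenceEquals(value, null) && value.Content;
    }
}

public sealed class TrueValue : BoolValue
{
    public override bool Content { get { return true; } }
}

public sealed class FalseValue : BoolValue
{
    public override bool Content { get { return false; } }
}

public static class ConversionDemo
{
    public static int CountTrue(IEnumerable<IValue> values)
    {
        int count = 0;
        foreach (IValue value in values)
        {
            // C# does not allow user-defined conversions from an interface type,
            // so the value is narrowed to the class first (null if it is not a boolean).
            BoolValue flag = value as BoolValue;
            if (flag) count++; // implicit conversion; null converts to false
        }
        return count;
    }
}
```

The sketch also shows a limit of the idea: since the conversion cannot be declared for the interface itself, if(value) would still not compile for a variable of the interface type, only for a variable of the concrete class.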
JSON to XML Converter
Not too sure about this one. The user could perhaps provide a schema describing the desired structure, because I definitely do not want to impose an XML schema on the user. I'm not convinced this is needed here though.
Parser
- Use more regular expressions? For instance, to check the validity of the content of a string
- Improve parsing speed? How? Why?
- Support comments (delimiter: @"//") even though this is not defined in the JSON standard?
Generic
- Improve memory footprint?
- Have the classes that implement IJValue also implement IDisposable? Why?
History
- 2011/11/23
- 2011/11/24
- I found a few things in the code that I could refactor, and I am now using public new ToString() to write the JSON data as text.
- 2011/11/29
- JDocument - changed the Save(...) and Load(...) methods to use Stream objects, with the option to give a specific text encoding.
- 2011/11/30
- I found an issue with the encoders and with the Byte Order Marks of UTF8/16 encoded data. So I added the following changes:
  - If you provide 'null' as parameter for the encoding, then Load() will try to detect the encoding type based on the Byte Order Mark (currently detected: UTF16 (BE & LE), UTF8 and ASCII)
  - When saving, you can specify whether you'd like a BOM or not
  - When parsing, the BOM is ignored if it is detected
- 2011/12/06
- Hopefully the last update
- Added missing text (Unicode encoding) in the article
- Fixed a few typos
- Uploaded the latest (final?) versions of the library and its source code (there was an issue with detecting the ASCII encoding when the data did not start with a visible character)