Small LINQ to JSON Library

Guillaume Ranslant

6 Dec 2011, CPOL

A small LinqToJSON library in C#, and how it works

Introduction

Here you will find a small (15KB) .NET assembly written in C# that allows reading and writing data formatted according to the JSON standard with features that mimic those of LinqToXML.

Background

I am very impressed by how easy it is to read and write XML data with LinqToXml in .NET Framework 4.0, and I wanted something very similar for JSON. I searched around a bit but could find nothing to my liking.

What I found was mostly JSON serializer/deserializer assemblies, which is not what I was interested in, and the few LinqToJSON helpers that I found have issues in my humble opinion:

They rely on an underlying serializer
They do not seem to respect the specifications of JSON (this is mostly a consequence of using the serializer)
They do not offer writing JSON data the way LinqToXml offers writing XML data

When I write “do not seem to respect the specifications of JSON”, I mean that the serializer writes data that does not conform to JSON as described on www.json.org. For instance, in the examples for these JSON libraries, I have often seen something similar to:

"Expiry" = new Date(1234567890)

In my understanding of JSON, this is wrong, because the value new Date(1234567890) is neither:

a JSON string - it would be: "new Date(1234567890)"
a JSON number, a JSON boolean value or JSON null – this is clear
a JSON array element – it would need the delimiters [ and ] and it would need to be a valid array element, which it is not
a JSON object - it would need the delimiters { and } and it would need to be a valid object member, which it is not

A date can be written using proper JSON formatting: either as a string ("01/01/2011"), as an array ([1, 1, 2011]), or just as a number (the usual amount of seconds since ...). In each case, choose a descriptive name. For instance:

"ExpiryDate" : "01/01/2011"

or (without leading zeros, which JSON numbers do not allow):

"TestDates" : [[1,1,2011], [1,1,2012], [1,1,2013]]

Of course, you have to convert the string or array or whatever afterwards to the actual 'Date' object, but this should not be an issue, especially as the .NET Framework already provides the necessary mechanisms to achieve that.
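
As an illustration of that conversion step, here is a minimal sketch that turns the string form and the array form into a System.DateTime. The class name JsonDateSketch and the day/month/year ordering are assumptions made for this example; they are not part of the library.

```csharp
using System;
using System.Globalization;

public static class JsonDateSketch
{
    // "01/01/2011" -> DateTime. The "dd/MM/yyyy" order is an assumption;
    // use whatever order you decided on for your own data.
    public static DateTime FromString(string text)
    {
        return DateTime.ParseExact(text, "dd/MM/yyyy", CultureInfo.InvariantCulture);
    }

    // [1, 1, 2011] -> DateTime, assuming the order day, month, year.
    public static DateTime FromArray(int[] parts)
    {
        if (parts == null || parts.Length != 3)
            throw new ArgumentException("Expected exactly three numbers: day, month, year.");
        return new DateTime(parts[2], parts[1], parts[0]);
    }

    public static void Main()
    {
        DateTime fromString = FromString("01/01/2011");
        DateTime fromArray = FromArray(new[] { 1, 1, 2011 });
        Console.WriteLine(fromString == fromArray);
    }
}
```

Both helpers throw when the input does not match the agreed representation, which is one more reason to stick to a single representation.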

One more piece of advice: when you have decided over the representation of a non-trivial item (whether it's a 'Date' or binary data, or whatever), stick to it! Please.

Anyway, only one thing to do then: write my own library.

Disclaimer

I do not claim that this LinqToJson library is the fastest, the leanest, the smallest or the best. I only claim that it does exactly what I planned it to do, and that it does it well enough for me. And all that in "only" 14KB. :)

Besides, please note that I am not dismissing JSON serializers. My point is that I did not want or need a serializer. In other words, JSON serializers are not better or worse than my LinqToJSON library; they just serve a different purpose, which did not fit my needs. So, if you need a serializer, use one.

Specifications (What I Want)

Public Objects

Similar to XDocument, XElement and XAttribute from LinqToXml, I want to have:

JDocument, as the root object. Derived class of JObject, to allow directly using the properties and methods of JObject.
Generic containers
- JObject and JArray hold values. They implement the interface IJValue and provide accessors to the JSON content.
Specific containers
- JString – I do not want to use the built-in type System.String directly, because conceptually a JSON string and a C# string are two different items. JString implements the interface IJValue and checks the validity of the given C# string, according to the JSON specifications.
- JNumber – Same story here as to why I do not use the type Double directly. This class implements the interface IJValue and checks the validity of the given C# string, regarding the JSON specifications for numbers.
- JTrue, JFalse and JNull - They also implement the interface IJValue. Again, I did not want to use the types bool and object directly.
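
To make the validity check for numbers more concrete, here is a minimal sketch of how a string could be tested against the number grammar from www.json.org with a regular expression. The class JsonNumberSketch and its pattern are assumptions written for this example; the actual JNumber implementation may differ.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class JsonNumberSketch
{
    // Optional minus, integer part without leading zeros, optional fraction,
    // optional exponent. [0-9] is used instead of \d because \d also matches
    // non-ASCII digits in .NET.
    private static readonly Regex NumberPattern = new Regex(
        @"\A-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?\z",
        RegexOptions.CultureInvariant);

    public static bool IsValid(string text)
    {
        return text != null && NumberPattern.IsMatch(text);
    }

    // Validates first, then converts using the invariant culture so that
    // the decimal separator is always '.'.
    public static double Parse(string text)
    {
        if (!IsValid(text))
            throw new FormatException("Not a valid JSON number: " + text);
        return double.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("-15.64"));
        Console.WriteLine(IsValid("01"));
        Console.WriteLine(IsValid("1e10"));
    }
}
```

With this pattern, "-15.64" and "1e10" are accepted, while "01", ".5" and "1." are rejected.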

Reading JSON

Provide a static method JDocument.Load()
Accessors will return
1. IEnumerable<IJValue> when the calling object is a JObject or JArray, to allow for Linq-style querying
2. a value of the appropriate type and content for the containers: 'bool' for JTrue and JFalse, 'double' for JNumber, 'string' for JString and 'object' for JNull.
The generic type is IJValue. Filtering the values returned can be done over the type, using the keyword is:

if(value is JNumber).

See the classes' description below and the code of the demo application for more details.

Code snippet from the demo application

// first, load a file
JDocument jDoc = JDocument.Load(filename);

// then, you can use LINQ to parse the document
// the line below does nothing useful, but it works, and that's why it's here
var members = from member in jDoc.GetMembers() select member;

foreach (var member in members)
{
    // you can filter the values using their type
    if (member.Value is JString || member.Value is JNumber)
    {
        // do something ...
    }

    // you can filter the values using their type
    if (member.Value is JObject)
    {
        JObject obj = member.Value as JObject;

        var names = from name in obj.GetNames() select name;
        foreach (var name in names)
        {
            // you can get an object with its name
            IJValue value = obj[name];

            if (value is JFalse || value is JTrue || value is JNull)
            {
                // do something ...
            }
        }

        // the names must differ from the outer 'members' and 'member'
        var innerMembers = from innerMember in obj.GetMembers() select innerMember;
        foreach (var innerMember in innerMembers)
        {
            Console.WriteLine("\t{0} : {1}", innerMember.Name, innerMember.Value.ToString());
        }
    }
}

Writing JSON

Create a new JDocument instance
Add content in several ways.
1. either in the constructors of the JSON values directly
2. or with the family of methods Add()
Each class implementing the interface IJValue provides a method returning a string that contains the correct representation. For most objects, I do not use ToString() because I want to keep the JSON string representation of this object independent of C#.
Provide a JDocument instance method Save()

See the classes' description below and the code of the demo application for more details.

Code snippet #1 (from the demo application)

// example of JDocument written only through the constructor:
new JDocument(
    new JObjectMember("D'oh",
        new JObject(
            new JObjectMember("First Name", new JString("Homer")),
            new JObjectMember("Family Name", new JString("Simpson")),
            new JObjectMember("Is yellow?", new JTrue()),
            new JObjectMember("Children", 
                new JArray(
                    new JString("Bart"),
                    new JString("Lisa"),
                    new JString("Maggie"))
            )
        )
    ),
    new JObjectMember("never gets older",
        new JObject(
            new JObjectMember("First Name", new JString("Bart")),
            new JObjectMember("Family Name", new JString("Simpson"))
        )
    )
).Save("simpsons.json");

There are several interesting aspects in the example above:

Except the final call to Save(), all the data is created through constructors
The code can be presented in a very visual way, that mimics the structure of the JSON data
This JSON data can most likely not be deserialized, mostly because of the non-alphanumerical characters in the names of the object members

Code snippet #2 (from the demo application)

// you can create an empty object and add values
JObject obj = new JObject();
obj.Add("_number", new JNumber(-3.14));
obj.Add("_true", new JTrue()); // notice the use of JTrue 
obj.Add("_null", new JNull()); // notice the use of JNull

// you can create an empty array and add values
JArray arr = new JArray();
// ... either only one value
arr.Add(new JNumber("-15.64"));
// ... or more than one at once
// Notice that prefixing your strings with @ will help keeping them as valid JSON strings
arr.Add(new JString(@"Unicode: \u12A0"),
    new JString(@"\n\\test\""me:{}"));

JDocument doc = new JDocument(
    new JObjectMember("_false", new JFalse()),
    new JObjectMember("_false2", new JFalse())
); // the same name cannot be used twice!

// Add() has two forms:
// 1. with JObjectMember, you can give one or more
doc.Add(
    new JObjectMember("_array", arr),
    new JObjectMember("_string1", new JString("string1"))
);
// 2. directly give the name and the value
doc.Add("_obj", obj);

doc.Save(filename);

Interfaces and Classes

Interfaces

IJValue – generic type representing all the different items a JSON value can be.

This interface declares two methods: ToString() and ToString(int indentLevel).

Public Classes

Only the important public methods and properties are listed here (the keyword public is deliberately omitted).

class JDocument : JObject
{
    JDocument()

    // Creates a new JDocument instance and copies the members from jObject
    JDocument(JObject jObject)

    // Creates a new JDocument instance and fills it with the jObjectMembers
    JDocument(params JObjectMember[] jObjectMembers)

    // Loads JSON data from a file. An exception is thrown if something goes wrong.
    // The file has to contain nothing but JSON data.
    // * "uri" : URI (path) to the file. Must contain only JSON data
    // * "encoding" : specific encoding (default is UTF8Encoding)
    public static JDocument Load(string uri)
    public static JDocument Load(string uri, System.Text.Encoding encoding)
 
    // Loads JSON data from a stream. An exception is thrown if something goes wrong.
    // The stream has to contain nothing but JSON data from its current position to its end.
    // * "stream" : stream from where to read.
    // The stream object is allowed to not support stream.Length
    // * "encoding" : specific encoding (default is UTF8Encoding)
    public static JDocument Load(Stream stream)
    public static JDocument Load(Stream stream, System.Text.Encoding encoding)

If you provide null as parameter for the encoding, then Load() will try to detect the encoding type based on the ByteOrderMark (currently detected: UTF16(BE & LE), UTF8 and ASCII). An exception is thrown if it fails.

    // Writes the JDocument as JObject to the stream. The stream is not closed.
    // An exception is thrown if something goes wrong.
    // * "stream" : stream where to write
    // * "encoding" : specific encoding (default is UTF8Encoding)
    // * "addByteOrderMark" : whether or not the Byte Order Mark has to be added to the file
    //           The BOM may be empty depending on the properties of "encoding".
    public void Save(Stream stream)
    public void Save(Stream stream, System.Text.Encoding encoding, bool addByteOrderMark)
 
    // Saves the JSON data to a file.
    // * "uri" : URI (path) of the file. If it already exists, it will be overwritten
    // * "encoding" : specific encoding (default is UTF8Encoding)
    // * "addByteOrderMark" : whether or not the Byte Order Mark has to be added to the file
    public void Save(string uri)
    public void Save(string uri, System.Text.Encoding encoding, bool addByteOrderMark)
    
    // Parses the given text. Only JSON data is expected. 
    // An exception is thrown if something goes wrong, for instance if the JSON data 
    // is not properly formatted
    static JDocument Parse(string text)
}

class JObject : IJValue
{
    // Creates an empty JSON object. Use one of the Add() functions to fill it.
    JObject()

    // Creates a JSON object pre-filled with the jObjectMembers. 
    // Further members can still be added using one of the Add() functions
    JObject(params JObjectMember[] jObjectMembers)

    // Returns the number of members in the object
    int Count

    // Returns the object member value that is associated to 'name'
    IJValue this[string name]

    // Returns the object member value that is associated to 'name'
    IJValue GetValue(string name)

    // Returns a list of all the names stored in the object, without their values
    IEnumerable<string> GetNames()

    // Returns a list of all the values stored in the object, without their names
    IEnumerable<IJValue> GetValues()

    // Returns a list of all members stored in the object
    IEnumerable<JObjectMember> GetMembers()

    // Adds JSON object members (the values and their associated name are 
    // stored in a JObjectMember object) to the JSON object
    void Add(params JObjectMember[] jObjectMembers)

    // Adds one JSON object member to the object. 
    // A name cannot be added twice, it has to be unique in the object
    void Add(string name, IJValue jValue)
}

// data container class, used solely with JObject
class JObjectMember
{
    string Name // only the getter is public
    IJValue Value // only the getter is public
    JObjectMember(string name, IJValue value)
}

class JArray : IJValue
{
    // Creates an empty JSON array. Use one of the Add() functions to fill it.
    JArray()

    // Allows to set the initial capacity of the private value container. 
    // Use one of the Add() functions to fill it.
    JArray(int capacity)

    // Creates a JSON array pre-filled with the jValues. 
    // Use one of the Add() functions to fill it further.
    JArray(params IJValue[] jValues)

    // Returns the number of elements in the array
    int Count

    // Returns the value stored at a specific index in the JArray
    IJValue this[int index]

    // Adds JSON values to the JSON array
    void Add(params IJValue[] jValues)

    // Returns a list of all elements stored in the JSON array
    IEnumerable<IJValue> GetValues()
}

class JNumber : IJValue
{
    double Content
    JNumber(double number)

    // 'text' is checked for validity according to the format description on
    // www.json.org, an exception is thrown if an
    // issue is found.
    JNumber(string text)
}

class JString : IJValue
{
    string Content

    // 'content' is checked for validity according to the format description on
    // www.json.org, an exception is thrown if an
    // issue is found.
    JString(string content)
}

class JNull : IJValue
{
    object Content // returns 'null'
}

class JFalse : IJValue
{
    bool Content // returns 'false'
}

class JTrue : IJValue
{
    bool Content // returns 'true'
}

Error Handling

Errors are reported solely through exceptions; no error codes are returned.

Demo Console Application

The demo application is a bit of code thrown together to show several ways to read and write JSON data using the assembly Ranslant.Linq.Json.

It also shows what would trigger errors when parsing or saving.

Points of Interest

Source Code

I have tried to write code that is easy to understand yet efficient enough. I have avoided complex constructs and design patterns, because reading/writing JSON data is, in itself, not a complex task and therefore should not require complex code.
A few things are worth mentioning for beginners in C#:

Use of regular expressions (in JNumber for instance)
Lazy creation of the data structures in JObject and JArray (Add()). The private data containers are initialized only once they are actually needed.
Use of an extension method (ConsumeToken())
Use of an interface as generic base type (IJValue) , in combination with a simple factory pattern (JParser.ParseValue())
Methods with a variable number of arguments (e.g. Add(params JObjectMember[] jObjectMembers))
Use of IEnumerable<T> to benefit from Linq's querying power
Use of the new modifier to hide an inherited method (ToString())
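
Three of the points above (lazy creation, params, and hiding with new) can be sketched in one small class. SketchArray is a hypothetical class written for this illustration, not the library's JArray, and it stores plain strings that are assumed to be already formatted JSON text.

```csharp
using System;
using System.Collections.Generic;

public class SketchArray
{
    // Created lazily: stays null until the first Add() call that brings values.
    private List<string> items;

    public int Count
    {
        get { return items == null ? 0 : items.Count; }
    }

    // 'params' lets callers pass one value, several values, or an array.
    public void Add(params string[] values)
    {
        if (values == null || values.Length == 0)
            return;
        if (items == null)
            items = new List<string>(values.Length);
        items.AddRange(values);
    }

    // 'new' hides Object.ToString() instead of overriding it, so this
    // version is only used when the static type is SketchArray.
    public new string ToString()
    {
        return items == null ? "[]" : "[" + string.Join(",", items) + "]";
    }
}

public static class SketchArrayDemo
{
    public static void Main()
    {
        SketchArray arr = new SketchArray();
        arr.Add("1");
        arr.Add("true", "null");
        Console.WriteLine(arr.ToString());

        // Through an object reference, the hidden Object.ToString() runs instead.
        object asObject = arr;
        Console.WriteLine(asObject.ToString());
    }
}
```

The last two lines of Main show the price of hiding rather than overriding: code that only sees an object reference does not get the JSON text.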

About the Parser

There are hundreds of ways to write a text parser. It is my opinion that all of them are right as long as the text is properly parsed and you get the specified (and hopefully expected) result. Which means: my parser is good because it works. This is all I expect from it.

What I do here is simple, and rather standard. I parse the text looking for tokens. Those tokens represent delimiters for specific JSON value types. In other words, a token or a set of tokens give away the expected value type. The text between those tokens is then checked for validity, which means I check if what I have actually represents the value type that I expect from the delimiters.

For instance, the delimiter '{' means I get a JSON object, and therefore everything between this delimiter and the corresponding closing delimiter ('}') should represent a JSON object and should therefore be written according to the grammatical rules (taken from www.json.org):

a JSON object is {} or {members}
members are pair or pair,members
pair is string : value
string is "" or "text", where text is …
value is object or array or string or number, etc.
etc.

What the parser does is follow these grammatical rules and report an error (by throwing an exception) when a rule is violated.

The way the string is technically parsed is by eating it up, so to speak, until it is empty. In other words, what is checked (whether tokens or content) is removed from the string to be parsed. Picture Pac-Man eating the pills in the maze, one pill at a time. Well, same thing here. :)

To achieve this, I have chosen the following code structure:

A class field containing the string that will be consumed. This field is not static since I want to stay thread safe. Each parsing step will modify this string, by removing the current token or content.
A set of parsing functions, each one devoted to recognizing and/or filling one and only one type of value. Which one is called depends on the token found.
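
The structure described above can be sketched as follows. This is a deliberately reduced validator, not the library's parser: it only handles objects, arrays, strings without escape sequences, true, false and null, and it builds no values. The names SketchParser, TryConsume and the signature of ConsumeToken are assumptions made for this example.

```csharp
using System;

public static class StringExtensions
{
    // Hypothetical take on a ConsumeToken() extension method: skips leading
    // whitespace, checks for the token and returns what is left after it.
    public static string ConsumeToken(this string text, string token)
    {
        string trimmed = text.TrimStart();
        if (!trimmed.StartsWith(token, StringComparison.Ordinal))
            throw new FormatException("Expected '" + token + "'");
        return trimmed.Substring(token.Length);
    }
}

public class SketchParser
{
    // Instance field (not static): the text still to be consumed.
    private string remaining;

    public SketchParser(string text)
    {
        if (text == null) throw new ArgumentNullException("text");
        remaining = text;
    }

    public static bool IsValid(string text)
    {
        try { new SketchParser(text).ParseDocument(); return true; }
        catch (FormatException) { return false; }
    }

    public void ParseDocument()
    {
        ParseValue();
        if (remaining.Trim().Length != 0)
            throw new FormatException("Unexpected trailing text");
    }

    // The first character decides which parsing function is called.
    private void ParseValue()
    {
        remaining = remaining.TrimStart();
        if (remaining.Length == 0) throw new FormatException("Unexpected end of text");
        switch (remaining[0])
        {
            case '{': ParseObject(); break;
            case '[': ParseArray(); break;
            case '"': ParseString(); break;
            case 't': remaining = remaining.ConsumeToken("true"); break;
            case 'f': remaining = remaining.ConsumeToken("false"); break;
            case 'n': remaining = remaining.ConsumeToken("null"); break;
            default: throw new FormatException("Unexpected character '" + remaining[0] + "'");
        }
    }

    private void ParseObject()
    {
        remaining = remaining.ConsumeToken("{");
        if (TryConsume("}")) return;            // object is {}
        do
        {
            ParseString();                      // pair is string : value
            remaining = remaining.ConsumeToken(":");
            ParseValue();
        } while (TryConsume(","));              // members are pair or pair,members
        remaining = remaining.ConsumeToken("}");
    }

    private void ParseArray()
    {
        remaining = remaining.ConsumeToken("[");
        if (TryConsume("]")) return;
        do { ParseValue(); } while (TryConsume(","));
        remaining = remaining.ConsumeToken("]");
    }

    private void ParseString()
    {
        remaining = remaining.ConsumeToken("\"");
        int end = remaining.IndexOf('"');       // no escape handling in this sketch
        if (end < 0) throw new FormatException("Unterminated string");
        remaining = remaining.Substring(end + 1);
    }

    private bool TryConsume(string token)
    {
        string trimmed = remaining.TrimStart();
        if (!trimmed.StartsWith(token, StringComparison.Ordinal)) return false;
        remaining = trimmed.Substring(token.Length);
        return true;
    }
}
```

Calling SketchParser.IsValid("{\"a\": [true, null]}") walks the text until nothing is left, while a missing colon or a dangling comma ends in a FormatException, mirroring the "report an error when a rule is violated" behavior.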

I also considered the following alternative, where no string would be stored in the class, but rather passed from parse function to parse function:

ParseSomething(string textIn, out string textOut)

With textIn being the string fed to the specific parse function and textOut being the string left after successful parsing. But I left this solution out, out of concerns for performance. Indeed, the recursive nature of JSON would have created situations where many string instances are necessary, requiring a lot of memory.

Converting a Base Class to One of its Derived Classes

The first specific difficulty of the parser was getting ParseDocument(string) to return a JDocument instance although it was actually parsing a JObject.
First I had the following code:

JDocument jDoc = (JDocument)ParseObject();

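It compiled, but threw an InvalidCastException (see http://msdn.microsoft.com/en-us/library/ms173105.aspx and search for 'Giraffe'). Indeed, the object returned by ParseObject() really is a JObject and not a JDocument, and an instance of a base class cannot be converted into one of its derived classes by a cast. Even the as operator, the closest C# equivalent to the C++ dynamic_cast, would only return null here.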

There are a few solutions for this issue:

SELECTED SOLUTION - Copy the members of the object returned by ParseObject() into the base of the JDocument object (using the constructor JDocument(JObject jObject)). This works well and requires only very little code. Besides, this solution makes it possible to create a JDocument out of any JObject, which can also be useful.
Encapsulate a JObject instance in JDocument. I found this to be not as nice, in terms of usability, as the first solution.
Use the CopyTo() method of JObject. This uses reflection (and can therefore bring performance issues) and is not guaranteed to work according to http://social.msdn.microsoft.com/Forums/en-US/csharplanguage/thread/f7362ba9-48cd-49eb-9c65-d91355a3daee
Derive JDocument from IJValue. But then I lose inheriting the JObject methods and I would have needed to duplicate them in JDocument. This is definitely not clean.
I checked if I could use the keywords in and out, to see if I could use the covariance and contravariance features of C#, but since I do not use generics, this won't work.

I found solution #1 to be the only clean and elegant one, but it works only because in JObject the fields and properties (there's actually only one field) are accessed for writing only through public accessor functions (the Add() functions) that can be used freely in JDocument thanks to the inheritance.
This means that this pattern cannot be used generically.
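
The selected solution can be sketched with two stand-in classes. BaseBag and DerivedBag are hypothetical names that play the roles of JObject and JDocument; the sketch only shows the copy-constructor idea, not the library's code.

```csharp
using System;
using System.Collections.Generic;

public class BaseBag
{
    // The only field, written exclusively through the public Add().
    private readonly Dictionary<string, string> members = new Dictionary<string, string>();

    public int Count { get { return members.Count; } }

    public void Add(string name, string value) { members.Add(name, value); }

    public IEnumerable<KeyValuePair<string, string>> GetMembers() { return members; }
}

public class DerivedBag : BaseBag
{
    public DerivedBag() { }

    // "Converts" a base instance by copying its members through the
    // inherited public Add(); no cast is involved.
    public DerivedBag(BaseBag source)
    {
        if (source == null) throw new ArgumentNullException("source");
        foreach (KeyValuePair<string, string> member in source.GetMembers())
            Add(member.Key, member.Value);
    }
}

public static class CopyConstructorDemo
{
    public static void Main()
    {
        BaseBag parsed = new BaseBag();
        parsed.Add("name", "Homer");

        // The downcast yields null because 'parsed' really is a BaseBag.
        DerivedBag viaCast = parsed as DerivedBag;
        Console.WriteLine(viaCast == null);

        // The copy constructor produces a genuine DerivedBag with the same members.
        DerivedBag viaCopy = new DerivedBag(parsed);
        Console.WriteLine(viaCopy.Count);
    }
}
```

The copy is shallow: the member values are shared between the two instances, which is harmless as long as the values themselves are not modified afterwards.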

Read the source code and the comments there for more information.

Unicode Encodings

The second difficulty was taking care of the different text encodings. ASCII is not a problem, but UTF8/16/32 are: you need to provide the right encoding to get the correct C# string.

Besides, internally, a C# string is encoded with UTF16, and depending on the original encoding of the data, you may or may not get a Byte Order Mark (check the Wikipedia articles about UTF 8 and UTF 16 for more information) when converting to a C# string.

To deal with this, I added two variants of the Load() / Save() functions in JDocument:

Overloads where the encoding can be specified (System.Text.Encoding)
When the encoding is not specified, UTF8 is the default
If null is passed as parameter for the encoding in the function Load(), then the library will try to detect the encoding, based on the Byte Order Mark. If it fails, an exception is thrown.
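
Detection based on the Byte Order Mark can be sketched as below. BomSketch is a hypothetical helper, not the library's code, and it makes one simplifying assumption that differs from the behavior described above: when no BOM is found it falls back to UTF8 instead of throwing. UTF 32 is ignored, so a UTF 32 little-endian BOM would be mistaken for UTF16.

```csharp
using System;
using System.Text;

public static class BomSketch
{
    // Returns the encoding indicated by a leading BOM, or null if none is recognized.
    public static Encoding DetectFromBom(byte[] data)
    {
        if (data == null) throw new ArgumentNullException("data");
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            return new UTF8Encoding(true);
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
            return new UnicodeEncoding(true, true);    // UTF16, big endian
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
            return new UnicodeEncoding(false, true);   // UTF16, little endian
        return null;
    }

    // Decodes the bytes into a C# string and leaves the BOM out of the result.
    public static string Decode(byte[] data)
    {
        Encoding detected = DetectFromBom(data);
        if (detected == null)
            return new UTF8Encoding(false).GetString(data);   // assumption: no BOM means UTF8/ASCII

        int bomLength = detected.GetPreamble().Length;
        return detected.GetString(data, bomLength, data.Length - bomLength);
    }

    public static void Main()
    {
        byte[] utf16LittleEndian = { 0xFF, 0xFE, 0x7B, 0x00, 0x7D, 0x00 };
        Console.WriteLine(Decode(utf16LittleEndian));
    }
}
```

Skipping the preamble bytes explicitly keeps the BOM character out of the decoded string, so the parser never sees it.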

Additionally, since UTF 32 is seldom used, I just didn't consider it at all.

I advise reading the MSDN documentation about System.Text.Encoding carefully.

Possible Improvements

Source Code Cleanness and Efficiency

Obviously YOU would have written everything not just completely differently, but better. :D

More seriously, if you have suggestions concerning code cleanness (I am referring to clean code development) and efficiency of data and classes structures, then you are welcome, I always want to learn.

Bug fixes

I have not included the unit tests in the package, but I have tested the code thoroughly. Some issues may still have slipped past me though, so if you find a bug, you're welcome to report it (please be descriptive enough).
If you have a JSON file that cannot be read with this code:

Make sure the JSON text is properly formatted
Send me the file if possible and I will check

Direct Conversion

It may turn out to be simpler to use bool for JTrue and JFalse, double for JNumber, string for JString and object for JNull more directly. I would still keep those classes because I still want to use the interface IJValue, but I could add converters so that the following could be written:

var myQuery = from value in jDocument.GetValues() select value;
foreach(IJValue value in myQuery)
{
    if(value is JTrue || value is JFalse)
    {
        if(value)
        {
            // do something...
        }
    }
}

Currently you have to write: if(value.Content), which is not too much to ask for, but if(value) is more comfortable, and LINQ is all about comfort, isn't it?
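
One way such a converter could look is a user-defined implicit conversion to bool. The sketch below uses hypothetical stand-ins (ISketchValue, SketchBool) rather than the library's types. Note one limitation, as far as I can tell: the conversion is declared on the class, so it is not found through a variable typed as the interface, and the value has to be narrowed to the class first.

```csharp
using System;

public interface ISketchValue { }

public sealed class SketchBool : ISketchValue
{
    private readonly bool content;

    public SketchBool(bool content) { this.content = content; }

    public bool Content { get { return content; } }

    // Lets a SketchBool be used wherever a bool is expected.
    // ReferenceEquals is used for the null check because '== null' could
    // itself be resolved through this conversion.
    public static implicit operator bool(SketchBool value)
    {
        if (object.ReferenceEquals(value, null))
            throw new ArgumentNullException("value");
        return value.content;
    }
}

public static class ConversionDemo
{
    public static void Main()
    {
        ISketchValue value = new SketchBool(true);

        // Narrow from the interface to the class, then use it as a condition.
        SketchBool flag = value as SketchBool;
        if (!object.ReferenceEquals(flag, null))
        {
            if (flag)
            {
                Console.WriteLine("the value is true");
            }
        }
    }
}
```

So if(value) would only become possible for variables typed as the concrete class, which limits the comfort gained when iterating over IJValue items.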

Would it help?

JSON to XML Converter

Not too sure about this one. The user could perhaps provide a schema describing the desired structure, because I definitely do not want to impose an XML schema on the user. I'm not convinced this is needed here though.

Parser

Use more regular expressions? For instance, to check the validity of a string content
Improve parsing speed? How? Why?
Support comments (delimiter: @"//") even though this is not defined in the JSON standard?

Generic

Improve memory footprint?
Have the classes that implement IJValue also implement IDisposable? Why?

History

2011/11/23

Initial article

2011/11/24

I found a few things in the code that I could refactor, and I am now using public new ToString() to write the JSON data as text.

2011/11/29

JDocument - changed Save(...) and Load(...) methods to use Stream objects, with the option to give a specific text encoding.

2011/11/30

I found an issue with the encoders and with the Byte Order Marks of UTF8/16 encoded data. So I added the following changes:

If you provide 'null' as parameter for the encoding, then Load() will try to detect the encoding type based on the ByteOrderMark (currently detected: UTF16(BE & LE), UTF8 and ASCII)
When saving, you can specify whether you'd like a BOM or not
When parsing, the BOM is ignored if it is detected

2011/12/06

Hopefully the last update

Added missing text (Unicode encoding) in the article
Fixed a few typos
Uploaded the latest (final?) versions of the library and its source code (there was an issue with detecting the ASCII encoding when the data did not start with a visible character)

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)