↵
Introduction
Note: This covers one aspect of my Json library. For more, please see my main Json article.
Loading JSON into objects is a great way to abstract it. However, it doesn't work well, if at all, with large amounts of data. This is where the easy, JSON-path-selectable trees fall down.
Ideally, you want to be able to move to the sections of the document you care about, load just those into an object tree, work on that small subset, and then continue. That way, you never load the entire document into memory at one time.
Enter JsonTextReader.
Background
Like all my parser libraries, this one exposes a streaming pull-parser interface to JSON that works quite a bit like System.Xml.XmlReader, with a few extra features. It's suitable for streams that do not seek, and for streams that are extremely large. It supports forward-only navigation through the document.
The general idea is to call Read() in a loop, check the NodeType property, and then work with the Value property or the RawValue property. The latter returns the string data exactly as it came from the stream, while the former "cooks" it by converting it into its corresponding .NET type.
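To make the distinction concrete, here's a minimal sketch. The file name and its contents are my own assumptions; it just contrasts the two properties on a value node:

```
// Assumes data.json contains something like {"count": 42}
using (var reader = JsonTextReader.CreateFrom(@"..\..\data.json"))
{
    while (reader.Read())
    {
        if (JsonNodeType.Value == reader.NodeType)
        {
            // RawValue is the text exactly as it appeared in the stream: "42"
            Console.WriteLine(reader.RawValue);
            // Value is the cooked .NET value - here, a numeric type, not a string
            Console.WriteLine(reader.Value.GetType());
        }
    }
}
```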
For example, here's the code for printing out every node in the document as a list.
using (var reader = JsonTextReader.CreateFrom(@"..\..\data.json"))
{
    while (reader.Read())
    {
        Console.Write(reader.NodeType);
        if (JsonNodeType.Value == reader.NodeType
            || JsonNodeType.Key == reader.NodeType)
            Console.Write(" " + reader.Value);
        Console.WriteLine();
    }
}
If you're familiar with XmlReader/XmlTextReader, the above should look familiar.
Using the Code
The above isn't very useful, but it illustrates the basic concept. Now let's make it more real-world.
We'll be using the data at http://api.themoviedb.org/3/tv/2129?api_key=c83a68923b7fe1d18733e8776bba59bb.
To load from the URL, you'd do:
var url = "http://api.themoviedb.org/3/tv/2129?api_key=c83a68923b7fe1d18733e8776bba59bb";
using (var reader = JsonTextReader.CreateFromUrl(url))
{
}
Aside from that, reading is exactly the same as from a file. You can also read from a string using Create(), but don't load huge data into a string. Disposing the reader is important if it was opened on a file or a URL. On a string it doesn't matter, but it's good practice.
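For completeness, here's a minimal sketch of reading from a string with Create(). The JSON literal is my own invention, and I'm assuming Create() accepts the JSON text directly; as noted above, only do this for small payloads:

```
// Small, hand-rolled JSON payload - fine to hold in a string
var json = "{\"name\": \"Doctor Who\", \"seasons\": 26}";
using (var reader = JsonTextReader.Create(json))
{
    while (reader.Read())
    {
        if (JsonNodeType.Key == reader.NodeType)
            Console.Write(reader.Value + ": ");
        else if (JsonNodeType.Value == reader.NodeType)
            Console.WriteLine(reader.Value);
    }
}
```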
Remember that the reader is forward only. That complicates things. You cannot move to the parent of a node, and if whatever you're looking for isn't found, it's back to the drawing board: the reader has already seeked to the end of the region it searched, and you can't go back and try again - we're forward only. Because of this, in huge documents you need to know what you're looking for - you can't simply do random access queries on it.
Now let's do some selecting.
var url = "http://api.themoviedb.org/3/tv/2129?api_key=c83a68923b7fe1d18733e8776bba59bb";
using (var reader = JsonTextReader.CreateFromUrl(url))
{
    if (reader.SkipToField("created_by"))
    {
        if (reader.Read())
            Console.WriteLine(reader.ParseSubtree());
        else
            Console.WriteLine("Sanity check failed, key has no value");
    }
    else
        Console.WriteLine("Not found");
}
We landed on an array. What if we want the second value?
using (var reader = JsonTextReader.CreateFromUrl(
    "http://api.themoviedb.org/3/tv/2129?api_key=c83a68923b7fe1d18733e8776bba59bb"))
{
    if (reader.SkipToField("created_by"))
    {
        if (reader.Read())
        {
            if (reader.SkipToIndex(1))
                Console.WriteLine(reader.ParseSubtree());
            else
                Console.WriteLine("Couldn't find the index.");
        }
        else
            Console.WriteLine("Sanity check failed, key has no value");
    }
    else
        Console.WriteLine("Not found");
}
Okay, admittedly, that's a lot of code just for skipping to two places.
Fortunately, we can shorten it by skipping entire paths:
using (var reader = JsonTextReader.CreateFromUrl(
    "http://api.themoviedb.org/3/tv/2129?api_key=c83a68923b7fe1d18733e8776bba59bb"))
{
    if (reader.SkipTo("created_by", 1))
        Console.WriteLine(reader.ParseSubtree());
    else
        Console.WriteLine("Not found");
}
Above, we skipped to "created_by" and then to index 1 in a single call. You can chain as many field names and indices as you need, but be careful: if the selection can't be found, it's hard to tell which part of it failed.
Finally, if we just wanted the creator's name, we'd do:
var url = "http://api.themoviedb.org/3/tv/2129?api_key=c83a68923b7fe1d18733e8776bba59bb";
using (var reader = JsonTextReader.CreateFromUrl(url))
{
    if (reader.SkipTo("created_by", 1, "name"))
    {
        if (reader.Read())
            Console.WriteLine(reader.ParseSubtree());
        else
            Console.WriteLine("Sanity check failed, key has no value");
    }
    else
        Console.WriteLine("Not found");
}
Occasionally, you may not know the field beforehand; it could be any one of a number of possible fields. I don't have anything but a contrived scenario for this with the dataset we're using, but it looks something like this:
var url = "http://api.themoviedb.org/3/tv/2129?api_key=c83a68923b7fe1d18733e8776bba59bb";
using (var reader = JsonTextReader.CreateFromUrl(url))
{
    if (reader.SkipToAnyOfFields("seasons", "production_companies"))
    {
        if (reader.Read())
            Console.WriteLine(reader.ParseSubtree());
        else
            Console.WriteLine("Sanity check failed, key has no value");
    }
    else
        Console.WriteLine("Not found");
}
If you want to do multiple queries through a document, things get a bit more complicated. The reason is that with the pull-reader, after you've found the first result, you'll be somewhere in the inner branches of the tree, and you need to keep calling Read() to pull yourself back out again before you can parse the next section. JsonTextReader provides two helper methods to deal with some of that: SkipToEndObject() and SkipToEndArray(). However, you still need to know when to call them, which means knowing where you ended up in the first place.
var url = "http://api.themoviedb.org/3/tv/2129?api_key=c83a68923b7fe1d18733e8776bba59bb";
using (var reader = JsonTextReader.CreateFromUrl(url))
{
    if (reader.SkipTo("created_by", 0, "name"))
    {
        if (reader.Read())
            Console.WriteLine(reader.ParseSubtree());
        else
            Console.WriteLine("Sanity check failed, key has no value");
        reader.SkipToEndObject();
        if (reader.Read())
        {
            reader.SkipToField("name");
            if (reader.Read())
                Console.WriteLine(reader.ParseSubtree());
            else
                Console.WriteLine("Sanity check failed, key has no value");
        }
        else
            Console.WriteLine("Sanity check failed, unexpected end of document");
    }
    else
        Console.WriteLine("Not found");
}
As I said, things get a bit more complicated, because you have to use SkipToEndObject()/SkipToEndArray() to move back outward in the tree before you can run the next query. I find you sometimes have to experiment to figure out where you are in the document at a given point, as it can be hard to keep track of. Remember that you can call ParseSubtree() at any point and pretty-print the result, which should give you a good read on where you are - although this won't work if the subtree is very large. In that case, you'll just have to step through and log it, or use the watch window in the debugger to figure it out. Such is the nature of forward-only streaming over a nested document structure.
As you can see, streaming JSON is much more complicated than simply parsing it, but with JsonTextReader it's not unmanageable. I'd love to support an (extremely restricted) subset of JSON path in the future, but currently that's not feasible. Fortunately, SkipTo() gives you about 70% of that functionality.
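To make the comparison concrete, here's roughly how a simple JSON path query would map onto SkipTo(). The path in the comment is standard JSON path notation for reference only - the library does not parse it:

```
// JSON path equivalent (not supported by the library): $.created_by[1].name
var url = "http://api.themoviedb.org/3/tv/2129?api_key=c83a68923b7fe1d18733e8776bba59bb";
using (var reader = JsonTextReader.CreateFromUrl(url))
{
    // Each path segment becomes one SkipTo() argument: field, index, field
    if (reader.SkipTo("created_by", 1, "name") && reader.Read())
        Console.WriteLine(reader.Value); // the creator's name
    else
        Console.WriteLine("Not found");
}
```

What SkipTo() can't express is the rest of JSON path: wildcards, recursive descent, and filter expressions all require backtracking or buffering that a forward-only reader can't provide.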
History
- 10th September, 2019 - Initial submission