
Lazy Deserialization of Large JSON Files with Dynamic Proxy

26 Jul 2018, MIT license, 5 min read
Dynamic proxy for lazy deserialization of large JSON files

Introduction

The solutions described in this article were developed in response to a problem I encountered while implementing Standard Music Font Layout (https://www.smufl.org/) in the Manufaktura.Controls controls library (https://www.codeproject.com/Articles/1252423/Music-Notation-in-NET).

Standard Music Font Layout (abbreviated SMuFL) is a font standard initiated by Steinberg and currently developed by the W3C Music Notation Community Group. Music fonts that comply with the SMuFL specification are distributed with a JSON metadata file that describes default engraving settings (such as barline thickness and beam thickness) and relations between glyphs, such as glyph sizes and glyph cutouts (used to position glyphs closely to one another). The full SMuFL specification can be found here: http://w3c.github.io/smufl/gitbook/specification/.

All these settings are stored in a JSON file that contains thousands of nodes. The single class Manufaktura.Controls.Model.SMuFL.GlyphBBoxes, which maps only a fragment of the JSON file, contains 2964 properties. According to my measurements, deserializing the whole metadata file with the popular Newtonsoft.Json framework takes 4.8 seconds on my machine (which has a 7th-generation i7 CPU). Other users reported that it can take up to 8 seconds on some machines.

Most of the data in the JSON metadata file is unnecessary for most applications. If you are familiar with music, you probably know that most scores do not use microtonal alterations, yet the SMuFL specification offers characters from several microtonal systems. You probably also don't need glyphs such as PictJingleBells, PictMusicalSawPeinkofer or WindRimOnly, whatever that means…

The obvious solution is to read only the parts of the JSON file you need, but it is tempting to map the JSON tree to strongly typed objects that give you access to all the needed properties without traversing the JSON tree manually. Notice how simple this method looks (ISMuFLFontMetadata is the deserialized JSON file):

C#
[Units(Units.Linespaces)]
public BoundingBox GetSMuFLBoundingBox(ISMuFLFontMetadata metadata)
{
    if (RepeatSign == RepeatSignType.Backward) return metadata.GlyphBBoxes.RepeatRight;
    else if (RepeatSign == RepeatSignType.Forward) return metadata.GlyphBBoxes.RepeatLeft;
    else return null;
}
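For contrast, here is roughly what reading only part of the file looks like without a typed model. This sketch uses System.Text.Json's JsonDocument instead of Newtonsoft.Json (a substitution, so that it runs on the .NET Core 3.0+ base class library alone), and ManualPartialRead and GetBBox are hypothetical names. It assumes the SMuFL layout in which each entry under glyphBBoxes has bBoxNE and bBoxSW arrays of two numbers. Every property name is a string literal, and every caller has to know the shape of the tree:

```csharp
using System;
using System.Text.Json;

public static class ManualPartialRead
{
    // Reads one glyph's bounding box by navigating the JSON tree by hand.
    // Assumed layout (as in SMuFL metadata files):
    //   "glyphBBoxes": { "<glyphName>": { "bBoxNE": [x, y], "bBoxSW": [x, y] } }
    // Returns null when the section or the glyph is missing.
    public static (double[] NorthEast, double[] SouthWest)? GetBBox(string json, string glyphName)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        if (!doc.RootElement.TryGetProperty("glyphBBoxes", out JsonElement bboxes) ||
            !bboxes.TryGetProperty(glyphName, out JsonElement glyph))
        {
            return null;
        }

        // Copy the numbers out before the document is disposed.
        return (ReadPoint(glyph.GetProperty("bBoxNE")), ReadPoint(glyph.GetProperty("bBoxSW")));
    }

    private static double[] ReadPoint(JsonElement array) =>
        new[] { array[0].GetDouble(), array[1].GetDouble() };
}
```

JsonDocument.Parse still reads through the whole file, but it only builds a lightweight index instead of materializing objects. The cost of this style is in readability and type safety rather than speed.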

A dynamic proxy makes it possible to read JSON partially while keeping the simplicity of strongly typed objects.

How Does It Work

There are many implementations of dynamic proxies, but I wanted my code to target .NET Standard, so I decided to use Microsoft's official System.Reflection.DispatchProxy. A dynamic proxy is a type generated at runtime that decorates another type and adds extra logic. It can, for example, override virtual methods or implement interface methods and wrap the calls in blocks that measure performance, handle exceptions, and so on. This technique is part of aspect-oriented programming.
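To make the decorator idea concrete, here is a minimal DispatchProxy that wraps an existing object and times every call, which is the "measuring performance" aspect mentioned above. ICalculator, Calculator and TimingProxy are hypothetical names for illustration; the sketch assumes a modern .NET runtime where DispatchProxy is available (.NET Standard targets may need the System.Reflection.DispatchProxy package):

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Reflection;

public interface ICalculator
{
    int Add(int a, int b);
}

public class Calculator : ICalculator
{
    public int Add(int a, int b) => a + b;
}

// Runtime-generated decorator: every interface call goes through Invoke,
// which forwards it to the wrapped object and records timing around it.
public class TimingProxy<T> : DispatchProxy where T : class
{
    private T target = null!;

    public int CallCount { get; private set; }
    public TimeSpan TotalTime { get; private set; }

    public static T Wrap(T target)
    {
        T proxy = Create<T, TimingProxy<T>>();
        ((TimingProxy<T>)(object)proxy).target = target;
        return proxy;
    }

    protected override object? Invoke(MethodInfo? targetMethod, object?[]? args)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            // Exceptions thrown by the target arrive wrapped in TargetInvocationException.
            return targetMethod!.Invoke(target, args);
        }
        finally
        {
            stopwatch.Stop();
            CallCount++;
            TotalTime += stopwatch.Elapsed;
        }
    }
}
```

The caller only ever sees an ICalculator; the timing statistics are reachable by casting the proxy back to TimingProxy<ICalculator>, the same trick the tests in this article use to read deserialization times from LazyLoadJsonProxy.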

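DispatchProxy is a simple proxy that implements a given interface and lets the programmer provide the implementation in the Invoke method, which is called every time any method of the interface is invoked. Property accessors are methods as well (get_PropertyName and set_PropertyName), so reading a property also goes through Invoke: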

C#
protected override object Invoke(MethodInfo targetMethod, object[] args)
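A quick way to see what Invoke receives: in this sketch (hypothetical ISettings and DictionaryProxy types, no JSON involved), reading a property arrives as a call to the compiler-generated get_FontName method, so the proxy strips the get_ prefix and looks the value up by property name. The JSON proxy in this article relies on the same naming convention:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Reflection;

public interface ISettings
{
    string FontName { get; }
    double FontVersion { get; }
}

// A proxy with no target object: property reads reach Invoke as calls to
// "get_<PropertyName>", and the value is looked up by property name.
public class DictionaryProxy<T> : DispatchProxy where T : class
{
    private const string GetterPrefix = "get_";
    private IReadOnlyDictionary<string, object?> values = new Dictionary<string, object?>();

    // Names of the interface methods that were invoked, in call order.
    public List<string> Calls { get; } = new List<string>();

    public static T FromValues(IReadOnlyDictionary<string, object?> values)
    {
        T proxy = Create<T, DictionaryProxy<T>>();
        ((DictionaryProxy<T>)(object)proxy).values = values;
        return proxy;
    }

    protected override object? Invoke(MethodInfo? targetMethod, object?[]? args)
    {
        string name = targetMethod!.Name;
        Calls.Add(name);
        if (name.StartsWith(GetterPrefix, StringComparison.Ordinal) &&
            values.TryGetValue(name.Substring(GetterPrefix.Length), out object? value))
        {
            return value;
        }
        throw new NotSupportedException($"No value available for {name}");
    }
}
```

Value-typed properties such as FontVersion are returned boxed from Invoke, and DispatchProxy unboxes them for the caller.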

This is our interface that maps JSON nodes into objects:

C#
public interface ISMuFLFontMetadata
{
    [JsonProperty("fontName")]
    string FontName { get; set; }

    [JsonProperty("fontVersion")]
    double FontVersion { get; set; }

    [JsonProperty("engravingDefaults")]
    Dictionary<string, double> EngravingDefaults { get; set; }

    [JsonProperty("glyphBBoxes")]
    IGlyphBBoxes GlyphBBoxes { get; set; }

    [JsonProperty("glyphsWithAlternates")]
    Dictionary<string, GlyphsWithAlternate> GlyphsWithAlternates { get; set; }

    [JsonProperty("glyphsWithAnchors")]
    GlyphsWithAnchors GlyphsWithAnchors { get; set; }

    [JsonProperty("ligatures")]
    Dictionary<string, Ligature> Ligatures { get; set; }

    [JsonProperty("optionalGlyphs")]
    OptionalGlyphs OptionalGlyphs { get; set; }

    [JsonProperty("sets")]
    Dictionary<string, GlyphSet> Sets { get; set; }
}

Our goal is to deserialize each JSON property when it is first accessed; if a property is never used, it is never deserialized. This is a sample implementation:

C#
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Reflection;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public abstract class LazyLoadJsonProxy : DispatchProxy
{
    // Non-generic entry point for nested proxies: the interface type is only
    // known at runtime, so the generic Create(string) is invoked via reflection.
    public static object Create(Type interfaceType, string json)
    {
        var proxyType = typeof(LazyLoadJsonProxy<>).MakeGenericType(interfaceType);
        var method = proxyType.GetTypeInfo()
            .GetDeclaredMethods(nameof(Create))
            .First(m => m.GetParameters().First().ParameterType == typeof(string));
        return method.Invoke(null, new object[] { json });
    }
}

public class LazyLoadJsonProxy<TInterface> : LazyLoadJsonProxy
{
    // Values that have already been deserialized, keyed by accessor name.
    private readonly ConcurrentDictionary<string, object> cache =
        new ConcurrentDictionary<string, object>();
    private string jsonString;

    public static TInterface Create(string json)
    {
        var proxy = Create<TInterface, LazyLoadJsonProxy<TInterface>>()
            as LazyLoadJsonProxy<TInterface>;
        proxy.jsonString = json;
        return (TInterface)(object)proxy;
    }

    protected override object Invoke(MethodInfo targetMethod, object[] args)
    {
        if (cache.TryGetValue(targetMethod.Name, out var cached)) return cached;

        // Property getters arrive as "get_<PropertyName>"; other methods
        // (e.g. setters) have no matching property and fall back to a default.
        const string getterPrefix = "get_";
        var propertyName = targetMethod.Name.StartsWith(getterPrefix, StringComparison.Ordinal)
            ? targetMethod.Name.Substring(getterPrefix.Length)
            : targetMethod.Name;

        var jsonPropertyAttribute = targetMethod.DeclaringType
            .GetTypeInfo()
            .GetDeclaredProperty(propertyName)?
            .GetCustomAttribute<JsonPropertyAttribute>();

        if (jsonPropertyAttribute == null)
            return TryAddDefaultValue(targetMethod.Name, targetMethod.ReturnType);

        // Scan the JSON until the reader reaches the requested top-level property.
        using (var textReader = new StringReader(jsonString))
        using (var reader = new JsonTextReader(textReader))
        {
            while (reader.Read())
            {
                if (reader.Path != jsonPropertyAttribute.PropertyName) continue;

                // Positioned on a property name, JToken.Load returns a JProperty.
                var token = JToken.Load(reader);
                var valueToken = token is JProperty property ? property.Value : token;

                // Interface-typed properties get a nested proxy instead of
                // deserializing the whole subtree at once.
                if (targetMethod.ReturnType.GetTypeInfo().IsInterface)
                    return TryAddValue(targetMethod.Name,
                        Create(targetMethod.ReturnType, valueToken.ToString()));

                return TryAddValue(targetMethod.Name,
                    valueToken.ToObject(targetMethod.ReturnType));
            }
        }

        return TryAddDefaultValue(targetMethod.Name, targetMethod.ReturnType);
    }

    private object TryAddDefaultValue(string name, Type type)
    {
        // Default of a value type; null for reference types and for void (setters).
        var value = type.GetTypeInfo().IsValueType && type != typeof(void)
            ? Activator.CreateInstance(type)
            : null;
        return TryAddValue(name, value);
    }

    private object TryAddValue(string name, object value)
    {
        cache.TryAdd(name, value);
        return value;
    }
}

The code attached to this article also contains performance measurements (such as the TotalTimeSpentOnDeserialization property used in the tests), but I removed them from this example for the sake of clarity.

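The first time a property is accessed, the proxy deserializes only the part of the JSON file associated with that property and stores the result in its cache, so later accesses are served from memory. Locating the property still requires reading the JSON text from the beginning with a JsonTextReader until the matching path is found. Some properties contain large objects; for example, GlyphBBoxes contains more than 2000 nodes. Deserializing such a property is time-consuming, so I implemented proxy nesting: if a property's type is an interface, another proxy is created for that subtree instead of deserializing the whole subtree at once.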
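The same lazy, cached and nested behavior can be sketched without Newtonsoft.Json. This is a variant, not the attached code: it swaps in System.Text.Json (.NET Core 3.0 or later), maps names with [JsonPropertyName] instead of [JsonProperty], and uses a single non-generic proxy class. IBBox, IGlyphBBoxes, IFontMetadata, LazyJsonProxy, Load and MaterializedCount are all hypothetical names, and the interfaces are a cut-down stand-in for the real SMuFL model:

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Threading;

// Hypothetical, cut-down model: interface-typed properties are proxied lazily,
// everything else is deserialized in one go on first access.
public interface IBBox
{
    [JsonPropertyName("bBoxNE")] double[] NorthEast { get; }
    [JsonPropertyName("bBoxSW")] double[] SouthWest { get; }
}

public interface IGlyphBBoxes
{
    [JsonPropertyName("repeatLeft")] IBBox RepeatLeft { get; }
    [JsonPropertyName("repeatRight")] IBBox RepeatRight { get; }
}

public interface IFontMetadata
{
    [JsonPropertyName("fontName")] string FontName { get; }
    [JsonPropertyName("engravingDefaults")] Dictionary<string, double> EngravingDefaults { get; }
    [JsonPropertyName("glyphBBoxes")] IGlyphBBoxes GlyphBBoxes { get; }
}

public class LazyJsonProxy : DispatchProxy
{
    // Open generic DispatchProxy.Create<T, TProxy>(), closed at runtime for each interface.
    private static readonly MethodInfo CreateDefinition = typeof(DispatchProxy)
        .GetMethods(BindingFlags.Public | BindingFlags.Static)
        .First(m => m.Name == "Create" && m.IsGenericMethodDefinition);

    private readonly ConcurrentDictionary<string, object?> cache =
        new ConcurrentDictionary<string, object?>();
    private JsonElement node;
    private int materializedCount;

    // Number of property values this proxy has produced (nested proxies included).
    public int MaterializedCount => materializedCount;

    public static T Load<T>(string json) where T : class
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        // Clone so the element stays valid after the document is disposed.
        return (T)Wrap(typeof(T), doc.RootElement.Clone());
    }

    private static object Wrap(Type interfaceType, JsonElement element)
    {
        object proxy = CreateDefinition
            .MakeGenericMethod(interfaceType, typeof(LazyJsonProxy))
            .Invoke(null, null)!;
        ((LazyJsonProxy)proxy).node = element;
        return proxy;
    }

    protected override object? Invoke(MethodInfo? targetMethod, object?[]? args)
    {
        string name = targetMethod!.Name;
        if (!name.StartsWith("get_", StringComparison.Ordinal))
            throw new NotSupportedException($"{name}: only property getters are supported");

        // GetOrAdd may run the factory twice under contention; acceptable for a sketch.
        return cache.GetOrAdd(name, _ =>
        {
            PropertyInfo property = targetMethod!.DeclaringType!.GetProperty(name.Substring(4))!;
            string jsonName = property.GetCustomAttribute<JsonPropertyNameAttribute>()?.Name
                              ?? property.Name;
            Type type = property.PropertyType;

            if (!node.TryGetProperty(jsonName, out JsonElement value) ||
                value.ValueKind == JsonValueKind.Null)
                return type.IsValueType ? Activator.CreateInstance(type) : null;

            Interlocked.Increment(ref materializedCount);
            return type.IsInterface
                ? Wrap(type, value)                                      // defer the subtree
                : JsonSerializer.Deserialize(value.GetRawText(), type);  // small leaf: eager
        });
    }
}
```

Whether a subtree is deferred is decided purely by the declared property type: an interface gets a nested proxy, a concrete type is deserialized in one go. Unlike the Newtonsoft-based proxy above, this variant parses the file once into a JsonDocument and then walks it by property name instead of rescanning the text on every first access; whether that is faster for SMuFL metadata would need to be measured.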

How to Use

This is how you deserialize JSON the traditional way (all at once):

C#
var metadata = JsonConvert.DeserializeObject<SMuFLFontMetadata>(jsonString);

This is how you do it with proxy:

C#
var metadata = LazyLoadJsonProxy<ISMuFLFontMetadata>.Create(jsonString);

In both examples, metadata can be used as an ISMuFLFontMetadata, but the implementations differ: in the first example, it is a regular SMuFLFontMetadata object (implemented with auto-properties), and in the second, it is our proxy object.

The Tests

I created some unit tests to measure the performance of this approach:

C#
[TestMethod]
public void JsonDeserializationTestWithoutProxy()
{
    var assembly = typeof(SerializationTests).Assembly;
    var resourceName = $"{typeof(SerializationTests).Namespace}.Assets.bravura_metadata.json";

    using (var stream = assembly.GetManifestResourceStream(resourceName))
    using (var reader = new StreamReader(stream))
    {
        string result = reader.ReadToEnd();
        var sw = new Stopwatch();
        sw.Start();
        var traditionallyLoadedMetadata = JsonConvert.DeserializeObject<SMuFLFontMetadata>(result);
        sw.Stop();

        Debug.WriteLine(sw.Elapsed);
    }
}

[TestMethod]
public void JsonDeserializationTestWithProxy()
{
    var assembly = typeof(SerializationTests).Assembly;
    var resourceName = $"{typeof(SerializationTests).Namespace}.Assets.bravura_metadata.json";

    using (var stream = assembly.GetManifestResourceStream(resourceName))
    using (var reader = new StreamReader(stream))
    {
        string result = reader.ReadToEnd();
        var metadata = LazyLoadJsonProxy<ISMuFLFontMetadata>.Create(result);
        var defaults = metadata.EngravingDefaults;
            
        var bboxes = metadata.GlyphBBoxes;
        var prop1 = bboxes.AccdnCombDot;
        var prop2 = bboxes.WindTightEmbouchure;
        var prop3 = bboxes.WindRimOnly;
        var prop4 = bboxes.MensuralLongaVoidStemDownRight;
        var prop5 = bboxes.AccSagittalFlat11LDown;
        var prop6 = bboxes.NoteheadSquareBlackWhite;
        var prop7 = bboxes.NoteheadWholeWithX;
        var prop8 = bboxes.ElecMixingConsole;
        var prop9 = bboxes.ElecPause;
        var prop10 = bboxes.PictBeaterWoodTimpaniUp;
        var prop11 = bboxes.AccdnCombLh2RanksEmpty;
        var prop12 = bboxes.AccSagittalSharp5V13LUp;
        var prop13 = bboxes.MensuralNoteheadLongaWhite;
        var prop14 = bboxes.OrnamentTrill;
        var prop15 = bboxes.OrnamentTremblementCouperin;
        var prop16 = bboxes.AccSagittalSharp19SUp;
        var prop17 = bboxes.NoteShapeRoundDoubleWhole;
        var prop18 = bboxes.WindWeakAirPressure;
        var prop19 = bboxes.WindRelaxedEmbouchure;
        var prop20 = bboxes.AccdnCombLh2RanksEmpty; // same as prop11, so this read is served from the cache

        var metadataAsProxy = (LazyLoadJsonProxy)metadata;
        var bboxesAsProxy = (LazyLoadJsonProxy)bboxes;
        var elapsedWithProxy = metadataAsProxy.TotalTimeSpentOnDeserialization + 
                               bboxesAsProxy.TotalTimeSpentOnDeserialization;

        Debug.WriteLine(elapsedWithProxy);
    }
}

The first test measures the time needed to deserialize the whole JSON at once. The second creates a proxy object and accesses a selection of random properties. Manufaktura.Controls actually uses no more than a few dozen properties, so I decided to pick 20 random ones.
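A single Stopwatch run is noisy, and the results report three measurements and an average per scenario. A small helper along these lines repeats an action and averages the elapsed times; Benchmark.Measure is a hypothetical name and is not part of the attached code:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class Benchmark
{
    // Runs the action several times and returns each elapsed time plus their mean.
    public static (IReadOnlyList<TimeSpan> Runs, TimeSpan Average) Measure(Action action, int runs = 3)
    {
        if (runs < 1) throw new ArgumentOutOfRangeException(nameof(runs));

        var times = new List<TimeSpan>(runs);
        for (int i = 0; i < runs; i++)
        {
            var stopwatch = Stopwatch.StartNew();
            action();
            stopwatch.Stop();
            times.Add(stopwatch.Elapsed);
        }

        long meanTicks = (long)times.Average(t => t.Ticks);
        return (times, TimeSpan.FromTicks(meanTicks));
    }
}
```

When measuring the proxy this way, create a fresh proxy inside the action on every run; otherwise the second run would only read values that are already cached.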

In real use, it is unlikely that 20 properties will be accessed at once, so I created another test that checks performance during actual rendering:

C#
[TestMethod]
public void JsonDeserializationTestWithProxyOnRealExample()
{
    var assembly = typeof(SerializationTests).Assembly;
    var resourceName = 
      $"{typeof(SerializationTests).Namespace}.Assets.bravura_metadata.json";
    var scoreResourceName = 
      $"{typeof(SerializationTests).Namespace}.Assets.JohannChristophBachFull3.0.xml";

    using (var stream = assembly.GetManifestResourceStream(resourceName))
    using (var scoreStream = assembly.GetManifestResourceStream(scoreResourceName))
    using (var reader = new StreamReader(stream))
    using (var scoreReader = new StreamReader(scoreStream))
    {
        var sw = new Stopwatch();
        sw.Start();

        string metadataJson = reader.ReadToEnd();
        var metadata = LazyLoadJsonProxy<ISMuFLFontMetadata>.Create(metadataJson);

        var scoreString = scoreReader.ReadToEnd();
        var settings = new HtmlScoreRendererSettings();
        settings.RenderSurface = HtmlScoreRendererSettings.HtmlRenderSurface.Svg;
        settings.LoadSMuFLFont(metadata, "Bravura", 24, "/fakeuri");
        settings.Scale = 1;
        settings.CustomElementPositionRatio = 0.8;
        settings.IgnorePageMargins = true;

        var renderer = 
           new HtmlSvgScoreRenderer(new XElement("root"), "testCanvas", settings);
        renderer.Render(scoreString.ToScore());

        sw.Stop();
    
        var metadataAsProxy = (LazyLoadJsonProxy)metadata;

        Debug.WriteLine($"All rendering done in {sw.Elapsed}");
  
        var deserTime = metadataAsProxy.GetTotalDeserializationTimeWithChildElements();

        Debug.WriteLine($"Deserialization done in {deserTime}");
    }
}

The Results

These are the test results (in seconds):

| | Without proxy (deserializing whole JSON at once) | With proxy on 1st level | With nested proxies | With nested proxies on real example |
|---|---|---|---|---|
| Measurement 1 | 4.97 | 3.93 | 1.68 | 1.51 |
| Measurement 2 | 4.71 | 4.35 | 1.68 | 1.51 |
| Measurement 3 | 4.71 | 4.00 | 1.69 | 1.49 |
| Average | 4.80 | 4.10 | 1.68 | 1.50 |
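Dividing the averages gives the speedup of each variant relative to deserializing the whole file at once. The last ratio compares against a different workload (rendering a real score), so it is only indicative:

$$
\frac{4.80}{4.10} \approx 1.17, \qquad \frac{4.80}{1.68} \approx 2.86, \qquad \frac{4.80}{1.50} = 3.20
$$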

The startup time of the test WPF application Manufaktura.Controls.WPF.Test (with the debugger attached) was also reduced on my machine from 12 seconds to 5 seconds.

Conclusion

The proposed method offers a significant performance improvement while still letting the programmer use strongly typed models. Deserialization of specific nodes is postponed until they are requested, so the cost moves to the first access of each property, which makes timing less predictable. To ensure stable performance, the programmer must know the structure of the JSON file and decide which subtrees are deserialized at once and which are deserialized lazily by proxies (in this implementation, interface-typed properties get nested proxies, while all other types are deserialized in one go).

Keep in mind also that if you inspect the contents of a proxy while debugging, the whole JSON will be deserialized at once, because the debugger resolves all of its properties. This can cause performance issues while debugging.

License

This article, along with any associated source code and files, is licensed under The MIT License