Cinchoo ETL - Parsing Huge JSON File as Stream

14 Oct 2021 | CPOL | 2 min read
Quick tutorial on parsing a large JSON file as a stream
In this tip, you will learn how to parse a large JSON file as a stream using the Cinchoo ETL framework. It takes only a few lines of code. Because the parsing is stream based, it is quite fast, has a low memory footprint, and works on large files.

1. Introduction

ChoETL is an open source ETL (extract, transform and load) framework for .NET. It is a code-based library for extracting data from multiple sources, transforming it, and loading it into your own data warehouse in a .NET environment. You can have data in your data warehouse in no time.

This article covers parsing a very large JSON file using the Cinchoo ETL framework. Parsing takes only a few lines of code and is stream based, so large files can be processed quickly and with a low memory footprint.

2. Requirement

The framework library is written in C# and targets .NET Framework 4.5 / .NET Core 3.x.

3. How to Use

3.1 Sample Data

Let's begin by looking at the JSON input file below. Assume the file is very large (100+ MB) and contains many objects with the same structure. The goal is to read the objects from the stream one at a time, which eliminates the need to load the entire contents into RAM as objects.

Listing 3.1.1. Sample JSON Data Input File (sample.json)
JSON
[
  {
    "id": 1,
    "value": "hello",
    "another_value": "world",
    "value_obj": {
      "name": "obj1"
    },
    "value_list": [
      1,
      2,
      3
    ]
  },
  {
    "id": 2,
    "value": "foo",
    "another_value": "bar",
    "value_obj": {
      "name": "obj2"
    },
    "value_list": [
      4,
      5,
      6
    ]
  }
]
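
To make the one-object-at-a-time goal concrete, here is a minimal sketch of streaming over a JSON array like the one above. It does not use Cinchoo ETL and is not a description of its internals; it uses Json.NET's `JsonTextReader` (the Newtonsoft.Json package) purely to illustrate the idea. The `StreamingSketch` class and `ReadItems` method are hypothetical names, and the sketch assumes the root of the document is an array of objects.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class StreamingSketch
{
    // Hypothetical helper (not part of Cinchoo ETL): lazily yields each
    // element of a root-level JSON array as a JObject.
    public static IEnumerable<JObject> ReadItems(TextReader text)
    {
        using (var reader = new JsonTextReader(text))
        {
            while (reader.Read())
            {
                // JObject.Load consumes the whole object, including nested
                // objects, so each StartObject seen by this loop is the
                // start of a new array element.
                if (reader.TokenType == JsonToken.StartObject)
                    yield return JObject.Load(reader);
            }
        }
    }
}
```

Calling `StreamingSketch.ReadItems(File.OpenText("sample.json"))` in a `foreach` loop would hand back one `JObject` per array element, so only the current element needs to be held in memory.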

The first thing to do is to install the ChoETL.JSON / ChoETL.JSON.NETStandard NuGet package. To do this, run the appropriate command below in the Package Manager Console.

.NET Framework

Install-Package ChoETL.JSON

.NET Core

Install-Package ChoETL.JSON.NETStandard

Now add the ChoETL namespace to the program:

using ChoETL;

3.2 Quick Parsing

This approach shows how to parse a large JSON file with very little code. No setup or POCO classes are needed.

Listing 3.2.1. Quick JSON file parsing
C#
private static void QuickParse()
{
    // Open the file; records are read as the loop advances
    using (var r = new ChoJSONReader("sample.json"))
    {
        foreach (var rec in r)
            Console.WriteLine(rec.Dump()); // print a text dump of the record
    }
}

Create an instance of ChoJSONReader to load the JSON file (sample.json). The Cinchoo ETL library parses the file one item at a time and yields the items as a stream.

Sample fiddle: https://dotnetfiddle.net/PQz3ck
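
Beyond dumping whole records, individual fields can usually be read from each record as well. The following is a minimal sketch, assuming the records yielded by the non-generic `ChoJSONReader` are dynamic objects whose members are named after the JSON keys (`id`, `value`). That behavior is an assumption here, not something shown in the listing above, and `DynamicAccessSketch` and `Describe` are hypothetical names.

```csharp
using System.Collections.Generic;
using ChoETL;

public static class DynamicAccessSketch
{
    // Hypothetical helper: builds an "id: value" line for each record.
    public static List<string> Describe(string path)
    {
        var lines = new List<string>();
        using (var r = new ChoJSONReader(path))
        {
            // Assumption: each record is a dynamic object exposing the
            // JSON keys as members, resolved at run time.
            foreach (dynamic rec in r)
            {
                string line = $"{rec.id}: {rec.value}";
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

Because member access on `dynamic` is resolved at run time, a misspelled key is not caught by the compiler; the POCO approach in the next section trades this flexibility for compile-time checks.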

3.3 Using POCO Object

This approach shows how to define POCO entity classes and use them in the parsing process. It is more type safe and gives finer control over the parsing process, such as property validation, callback mechanisms, etc.

First, create classes with properties matching the JSON file. The JsonProperty attribute comes from the Newtonsoft.Json namespace:

Listing 3.3.1. Mapping Class
C#
using Newtonsoft.Json;

public class SampleObject
{
    [JsonProperty("id")]
    public int Id { get; set; }
    [JsonProperty("value")]
    public string Value { get; set; }
    [JsonProperty("another_value")]
    public string AnotherValue { get; set; }
    [JsonProperty("value_obj")]
    public ValueObject ValueObject { get; set; }
    [JsonProperty("value_list")]
    public int[] ValueList { get; set; }
}

public class ValueObject
{
    [JsonProperty("name")]
    public string Name { get; set; }
}

Then use these classes to parse the file, as shown below.

Listing 3.3.2. Using POCO object to parse JSON file
C#
private static void UsingPOCO()
{
    using (var r = new ChoJSONReader<SampleObject>("sample.json"))
    {
        foreach (var rec in r)
            Console.WriteLine(rec.Dump());
    }
}

Sample fiddle: https://dotnetfiddle.net/7gN02f
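
With typed records, the compiler checks property access, and the records can be filtered as they are read. The sketch below assumes `ChoJSONReader<SampleObject>` can be enumerated as an `IEnumerable<SampleObject>` (the `foreach` in Listing 3.3.2 suggests this), so standard LINQ operators apply. `TypedQuerySketch` and `IdsWithListSumAbove` are hypothetical names, and the mapping classes are repeated only so the sketch compiles on its own.

```csharp
using System.Collections.Generic;
using System.Linq;
using ChoETL;
using Newtonsoft.Json;

// Same mapping classes as Listing 3.3.1, repeated for completeness.
public class SampleObject
{
    [JsonProperty("id")]
    public int Id { get; set; }
    [JsonProperty("value")]
    public string Value { get; set; }
    [JsonProperty("another_value")]
    public string AnotherValue { get; set; }
    [JsonProperty("value_obj")]
    public ValueObject ValueObject { get; set; }
    [JsonProperty("value_list")]
    public int[] ValueList { get; set; }
}

public class ValueObject
{
    [JsonProperty("name")]
    public string Name { get; set; }
}

public static class TypedQuerySketch
{
    // Hypothetical helper: ids of records whose value_list sums to more
    // than the given threshold.
    public static List<int> IdsWithListSumAbove(string path, int threshold)
    {
        using (var r = new ChoJSONReader<SampleObject>(path))
        {
            // ToList() runs the query before the reader is disposed;
            // returning the lazy query itself would enumerate a disposed reader.
            return r.Where(rec => rec.ValueList.Sum() > threshold)
                    .Select(rec => rec.Id)
                    .ToList();
        }
    }
}
```

Only the matching ids are kept in the result list, so memory use stays tied to the size of the result rather than the size of the file.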

For more information about Cinchoo ETL, please visit the below Code Project article:

History

  • 14th October, 2021: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)