In this tip, you will learn how to flatten complex, nested JSON objects using Cinchoo ETL framework. It is very simple to use. With few lines of code, the conversion can be done. You can perform such conversion process as stream based, quite fast and with low memory footprint.
ChoETL is an open source ETL (extract, transform and load) framework for .NET. It is a code based library for extracting data from multiple sources, transforming, and loading into your very own data warehouse in .NET environment. You can have data in your data warehouse in no time.
This tip talks about flattening complex, nested JSON file using Cinchoo ETL framework. It is very simple to use, with few lines of code, the conversion can be done. You can convert large files as the conversion process is stream based, quite fast and with low memory footprint.
This framework library is written in C# using .NET 4.5 / .NET Core 3.x Framework.
3.1 Sample Data
Let's begin by looking into the below JSON file containing complex, nested structure. Say, for example, we have two companies and the first one has two branches and the second has just one branch.
Listing 3.1.1 Sample JSON File (company.json)
[
{
"ID": "1",
"Name": "C1",
"Branches": [
{
"City": "New York",
"Country": "USA"
},{
"City": "California",
"Country": "USA"
}
]
},
{
"ID": "2",
"Name": "C2",
"Branches": [
{
"City": "Mexico City",
"Country": "Mexico"
}
]
}
]
We are going to flatten using Cinchoo ETL Framework to get the result as shown in the below table:
ID | Name | Branches/0/City | Branches/0/Country | Branches/1/City | Branches/1/Country |
1 | C1 | New York | USA | California | USA |
2 | C2 | Mexico City | Mexico | null | null |
Expected JSON output looks as below:
[
{
"ID": "1",
"Name": "C1",
"Branches/0/City": "New York",
"Branches/0/Country": "USA",
"Branches/1/City": "California",
"Branches/1/Country": "USA"
},
{
"ID": "2",
"Name": "C2",
"Branches/0/City": "Mexico City",
"Branches/0/Country": "Mexico",
"Branches/1/City": null,
"Branches/1/Country": null
}
]
3.2 Install Library
Next, install ChoETL.JSON / ChoETL.JSON.NETStandard
nuget package. To do this, run the following command in the Package Manager Console.
.NET Standard Framework
Install-Package ChoETL.JSON
.NET Core
Install-Package ChoETL.JSON.NETStandard
Now add ChoETL
namespace to the program.
using ChoETL;
3.3 Quick Conversion
Let's use the library to convert the complex structured JSON file to flatten format. It is as simple as can be done with few lines of code. No POCO class needed. It is fast, stream based, and consumes low memory.
Listing 3.3.1. Quick JSON file conversion
private static void QuickConversion()
{
using (var r = new ChoJSONReader.LoadText("companies.json"))
{
using (var w = new ChoJSONWriter("companies_out.json")
.Configure(c => c.ThrowAndStopOnMissingField = false)
.Configure(c => c.FlattenNode = true)
.Configure(c => c.DefaultArrayHandling = false)
.Configure(c => c.NestedKeySeparator = '/')
.WithMaxScanNodes(3)
)
w.Write(r);
}
}
Create an instance of ChoJSONWriter
for producing flatten JSON (companies_out.json) output file. Then create an instance of ChoJSONReader
object for reading complex JSON (companies.json) file.
Where:
Configure(c => c.FlattenNode = true)
- tells the writer to flatten the json Configure(c => c.ThrowAndStopOnMissingField = false)
- configure the writer to ignore missing field while writing. In the sample above, the second node, comes with 1 branch object, but expect to output the second branch with null
values. Configure(c => c.NestedKeySeparator = '/')
- tells the writer to use '/'
as nested key separator. Configure(c => c.DefaultArrayHandling = false)
- tells the writer to turn off the treating the empty nodes as simple object (not as array). WithMaxScanNodes(3)
- tells the writer to scan the number of nodes to discover the optimal maximum number of fields required to output the flat out JSON.
Sample Fiddle: https://dotnetfiddle.net/uzZdKU
3.4 Using POCO Conversion
This approach uses POCO model classes to load the JSON file. First, define the POCO classes matches the companies.json file.
Listing 3.4.1. POCO classes
public class Company
{
public string ID { get; set; }
public string Name { get; set; }
public List<Branch> Branches { get; set; }
}
public class Branch
{
public string City { get; set; }
public string Country { get; set; }
}
Listing 3.4.2. Conversion using POCO classes
private static void SelectiveColumnTest()
{
using (var r = new ChoJSONReader<Company>("companies.json"))
{
using (var w = new ChoJSONWriter<Company>("companies_out.json")
.Configure(c => c.FlattenNode = true)
.Configure(c => c.ThrowAndStopOnMissingField = false)
.Configure(c => c.NestedKeySeparator = '/')
.WithMaxScanNodes(3)
)
w.Write(r);
}
}
Create an instance of ChoJSONWriter
for producing flatten JSON (companies_out.json) output file. Then create an instance of ChoJSONReader
object for reading complex JSON (companies.json) file.
Where:
Configure(c => c.FlattenNode = true)
- tells the writer to flatten the json Configure(c => c.ThrowAndStopOnMissingField = false)
- configures the writer to ignore missing field while writing. In the sample above, the second node, comes with 1 branch object, but expect to output the second branch with null
values. Configure(c => c.NestedKeySeparator = '/')
- tells the writer to use '/'
as nested key separator. WithMaxScanNodes(3)
- tells the writer to scan the number of nodes to discover the optimal maximum number of fields required output the flat out JSON.
Sample fiddle: https://dotnetfiddle.net/AX02BN
Download the sample attached above, try it.
For more information about Cinchoo ETL, please visit the other CodeProject articles:
History
- 25th October, 2021: Initial version