
JsonPathToModel: Generic Data Importer

Implementing a configurable Data Importer that remains independent of input file formats is straightforward with JSONPath mappings.
Importing clients' data from other systems is a routine part of the onboarding process. However, it can become expensive and require developer involvement when the data import is inflexible and expects one particular input file format. We can use JSONPath to map any file to the internal model for processing.

Introduction

To demonstrate how JSONPath can be used in data importing, I created a Visual Studio solution with the standard set of projects for an Onion architecture:

  • App (Console)
  • Domain
  • Application
  • Infrastructure (MS SQL database context)
  • Unit tests

You can download it from GitHub at https://github.com/euklad/BlogCode, folder Story-11-Generic-Data-Import.

This article explains the code and the approach I used.

Background

JsonPathToModel is an open-source C# .NET library for navigating in-memory models, extracting and setting property values using JSONPath dot notation.

GitHub: https://github.com/Binoculars-X/JsonPathToModel

NuGet: https://www.nuget.org/packages/JsonPathToModel
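
As a quick orientation, here is a minimal usage sketch. The SetValue call and NavigatorConfigOptions appear in the code later in this article; the GetValue call is an assumption based on the library's description of extracting values:

C#
using JsonPathToModel;

// create a navigator and bind values by JSONPath dot notation
var navigator = new JsonPathModelNavigator(new NavigatorConfigOptions());
var person = new Person { Name = "Ann" };

navigator.SetValue(person, "$.Name", "Joe");

// GetValue is assumed here, based on the library's description of "extracting" values
var name = navigator.GetValue(person, "$.Name"); // "Joe"

public class Person
{
    public string? Name { get; set; }
}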

Solution Overview

Let's start by cloning the code locally and opening the solution file .\Story-11-Generic-Data-Import\Story-11-Generic-Data-Import.sln in Visual Studio 2022.

The DataImport.Domain project contains DbContext entities and model classes. They contain no logic and exist only to store data and represent database tables.

The DataImport.Application project is responsible for the business logic; the mapping algorithms are implemented there, along with functional helpers.

The DataImport.Infrastructure project holds the DbContext and data persistence logic; it also manages database migrations.

The DataImport.Console project is a console application that assembles all the pieces and executes the data import process in a few simple steps:

  • Accept command-line arguments
  • Execute the data import application algorithm
  • Execute the persistence layer to save the imported data

Domain

Let's have a look at the Entities folder; it has three classes: Customer, Product, and Purchase. These classes are mapped directly to the SQL database tables and used in the Model classes to store data. As you can see, the entities have some extra properties that exist only to implement relationships between entities. These relationships are required for Entity Framework to work correctly.

For example, Purchase.cs:

C#
namespace DataImport.Domain.Entities;

public class Purchase
{
    public int Id { get; set; }
    public string? ExternalId { get; set; }
    public string? CustomerExternalId { get; set; }
    public string? ProductExternalId { get; set; }
    public decimal? TotalAmount { get; set; }
    public DateTime? PeriodStartDate { get; set; }
    public DateTime? PeriodEndDate { get; set; }
    public DateTime? OrderDate { get; set; }
    public DateTime? PaymentDate { get; set; }

    public string? Source { get; set; }
    public DateTime Created { get; set; } = DateTime.UtcNow;

    public int CustomerId { get; set; }
    public Customer? Customer { get; set; }

    public int ProductId { get; set; }
    public Product? Product { get; set; }
}

The Model folder has DataSnapshotModel, a class representing a full snapshot of the imported records, with one collection for each entity.

Another model class, DataLineModel, is used by the record-by-record processing logic to store one CSV record at a time.

C#
using DataImport.Domain.Entities;

namespace DataImport.Domain.Model;

public class DataSnapshotModel
{
    public List<Customer> Customers { get; set; } = [];
    public List<Product> Products { get; set; } = [];
    public List<Purchase> Purchases { get; set; } = [];
}

public class DataLineModel
{
    public Customer Customer { get; set; } = new();
    public Product Product { get; set; } = new();
    public Purchase Purchase { get; set; } = new();
}

The last model class, MappingConfig, is used to deserialize the JSON configuration into an in-memory configuration object accessible from C# code.

C#
namespace DataImport.Domain.Model;
public class MappingConfig
{
    public Dictionary<string, List<FileMapping>> FileMappings { get; set; } = [];
}

public class FileMapping
{
    public string JsonPath { get; set; } = null!;
    public int Position { get; set; }
}

As you can see, there is no logic in the Domain project, only classes with properties to store data.

Application

The Application project encapsulates all the business logic and algorithms that do the actual work, except persistence.

FileHelper is a static utility class that encapsulates the file-handling logic. It has methods to load a CSV file into an in-memory List<string> and to deserialize a MappingConfig from a JSON file.

C#
using DataImport.Domain.Model;
using System.Text.Json;

namespace DataImport.Application.Helpers;
public static class FileHelper
{
    public static List<string> GetCsvFileLines(string fileName)
    {
        var result = new List<string>();

        using Stream stream = File.OpenRead(fileName)!;
        using StreamReader reader = new StreamReader(stream);
            
        while (!reader.EndOfStream)
        {
            result.Add(reader.ReadLine()!);
        }

        return result;
    }

    public static MappingConfig LoadConfig(string fileName)
    {
        using Stream stream = File.OpenRead(fileName)!;
        using StreamReader reader = new StreamReader(stream);

        var json = reader.ReadToEnd();
        var config = JsonSerializer.Deserialize<MappingConfig>(json)!;
        return config;
    }
}
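
For example, a caller can combine the two helpers like this (the file names are those used by the demo's test data):

C#
// load the mapping configuration and one CSV file into memory
var config = FileHelper.LoadConfig("TestMappingConfig.json");
var lines = FileHelper.GetCsvFileLines("PersonExport.csv");

Console.WriteLine($"{lines.Count} lines read, {config.FileMappings.Count} file mappings configured");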

The most interesting class is DataImporter, which implements all the mapping logic and converts a CSV file of any column layout to the corresponding Model class. To achieve this flexibility, the MappingConfig must provide a mapping from each CSV column position to the corresponding Model property, expressed in JSONPath dot notation.

The JSON config looks like this:

{
  "FileMappings": {
    "PersonExport.csv": [
      {
        "JsonPath": "$.Customer.ExternalId",
        "Position": 0
      },
      {
        "JsonPath": "$.Customer.FirstName",
        "Position": 1
      },
...

You can find the full configuration in .\DataImport.Tests\TestData\TestMappingConfig.json.

The configuration provides a list of mappings for each file; each mapping has a JsonPath and a Position.
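
After deserialization, the fragment above becomes an in-memory object graph equivalent to this hand-built version (shown only for illustration):

C#
var config = new MappingConfig
{
    FileMappings = new Dictionary<string, List<FileMapping>>
    {
        ["PersonExport.csv"] =
        [
            new FileMapping { JsonPath = "$.Customer.ExternalId", Position = 0 },
            new FileMapping { JsonPath = "$.Customer.FirstName", Position = 1 },
            // ... the remaining mappings
        ]
    }
};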

Let's go through DataImporter and see how it works:

C#
using DataImport.Domain.Model;
using JsonPathToModel;
using DataImport.Application.Helpers;

namespace DataImport.Application.Mappings;
public class DataImporter
{
    private readonly IJsonPathModelNavigator _navigator;

    public DataImporter()
    {
        _navigator = new JsonPathModelNavigator(
            new NavigatorConfigOptions
            {
                OptimizeWithCodeEmitter = false
            });
    }

    public DataSnapshotModel ReadModelFromFiles(List<string> fileList, MappingConfig config)
    {
        var model = new DataSnapshotModel();

        foreach (var filePath in fileList)
        {
            var csv = FileHelper.GetCsvFileLines(filePath);

            // skip the 1st line with column definitions
            foreach (var line in csv.Skip(1))
            {
                var cells = line.Split(',');
                ImportFileLine(model, config, filePath, cells);
            }
        }

        return model;
    }

    private void ImportFileLine(DataSnapshotModel model, MappingConfig config, string filePath, string[] cells)
    {
        var fileName = Path.GetFileName(filePath);
        var mappings = config.FileMappings[fileName];
        var target = new DataLineModel();

        // iterate through mappings and populate record values from cells
        foreach (var mapping in mappings)
        {
            _navigator.SetValue(target, mapping.JsonPath, cells[mapping.Position]);
        }

        // add to model only those entities that have at least ExternalId updated  
        if (target.Customer.ExternalId != null)
        {
            target.Customer.Source = fileName;
            model.Customers.Add(target.Customer);
        }
        if (target.Product.ExternalId != null)
        {
            target.Product.Source = fileName;
            model.Products.Add(target.Product);
        }
        if (target.Purchase.ExternalId != null)
        {
            target.Purchase.Source = fileName;
            model.Purchases.Add(target.Purchase);
        }
    }
}

In the constructor, we create an instance of JsonPathModelNavigator that will be used to set Model property values using JSONPath bindings like "$.Customer.ExternalId".

ReadModelFromFiles accepts a list of CSV files to process and the MappingConfig. It iterates through the files, reading each file's content with the GetCsvFileLines helper. Then, for each line of the file, it splits the line into cells and calls the ImportFileLine method.

The ImportFileLine method looks up the mapping configuration for the file and creates a DataLineModel for the current line. For each configured mapping, it calls the navigator's SetValue method, copying the CSV cell into the model property addressed by the JsonPath.

For example, for the first mapping of the PersonExport.csv file, the model update looks like this:

C#
_navigator.SetValue(target, "$.Customer.ExternalId", cells[0]);

which is equivalent to a direct assignment in .NET:

C#
target.Customer.ExternalId = cells[0];

The JsonPathModelNavigator finds the appropriate property in the target model, converts the value to the target data type, and assigns it.
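
For example, Purchase.TotalAmount is declared as decimal?, so a mapping to it relies on that conversion. Assuming the conversion behaves as described, this hypothetical call:

C#
// the cell value is a string; the target property is decimal?
_navigator.SetValue(target, "$.Purchase.TotalAmount", "199.95");

would be equivalent to the direct assignment target.Purchase.TotalAmount = 199.95m;.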

Once the line model is populated, we check which entity objects were affected by comparing their ExternalId to null. The affected objects are added to the final snapshot, which eventually contains all CSV records mapped to Model objects.

Infrastructure

The demo stores data in a SQL Server database using Entity Framework. The MyDbContext class contains all the required database definitions and is fairly standard. It applies database migrations automatically, creating the database schema on the first run and updating it when new migrations appear. It uses a named connection string, which is added to the Console project's appsettings.json file.
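
The article doesn't show where the migrations are triggered; a common pattern for applying them automatically at startup, sketched here under the assumption that it happens when the host starts, is a Database.Migrate() call:

C#
// A sketch of automatic migration on startup (assuming MyDbContext is resolvable from DI):
// Database.Migrate() creates the database if it is missing and applies any pending migrations.
using var scope = app.Services.CreateScope();
var db = scope.ServiceProvider.GetRequiredService<MyDbContext>();
db.Database.Migrate();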

The database schema (pictured in a diagram in the original article) is simple: the Purchase table references the Customer and Product tables through its CustomerId and ProductId foreign keys.

The project already contains the Migrations needed to generate the database, but if you want to make changes and add more migrations, there are instructions in the solution's readme.md file.

DataAccessService implements the persistence logic and converts a model snapshot into database insert operations. The ImportDataSnapshot method accepts the snapshot and inserts Customers, Products, and Purchases in one transaction:

C#
public void ImportDataSnapshot(DataSnapshotModel snapshot)
{
    using (var transaction = _db.Database.BeginTransaction())
    {
        InsertCustomers(snapshot);
        InsertProducts(snapshot);
        InsertPurchases(snapshot);

        transaction.Commit();
    }
}
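
InsertCustomers and InsertProducts are not listed in the article; assuming they mirror the InsertPurchases method below, a minimal sketch of InsertCustomers would be:

C#
// A sketch only: add each customer to the context and save, so that EF Core
// populates the generated Customer.Id values that InsertPurchases relies on.
private void InsertCustomers(DataSnapshotModel snapshot)
{
    foreach (var customer in snapshot.Customers)
    {
        _db.Add(customer);
    }

    _db.SaveChanges();
}

This also explains the insertion order: SaveChanges populates the generated Ids on the tracked entities, so customers and products must be saved before purchases can reference them.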

Before insertion, the InsertPurchases method resolves the foreign key properties to Customer and Product by ExternalId:

C#
private void InsertPurchases(DataSnapshotModel snapshot)
{
    // resolve Id by ExternalId
    var customerDict = snapshot.Customers.ToDictionary(c => c.ExternalId!, c => c.Id);
    var productDict = snapshot.Products.ToDictionary(p => p.ExternalId!, p => p.Id);

    foreach (var item in snapshot.Purchases)
    {
        item.CustomerId = customerDict[item.CustomerExternalId!];
        item.ProductId = productDict[item.ProductExternalId!];
        _db.Add(item);
    }

    _db.SaveChanges();
}

Console

The Console application is a runnable project that you can test locally. Its .\Properties\launchSettings.json file specifies the details needed for a run:

{
  "profiles": {
    "DataImport.Console": {
      "commandName": "Project",
      "commandLineArgs": "TestMappingConfig.json PersonExport.csv SubscriptionExport.csv OrderExport.csv"
    }
  }
}

The property "commandLineArgs" contains test JSON and CSV data file names that are used for debugging.

The Console project references these files as links; the actual files are kept in the .\DataImport.Tests\TestData\ folder.

The Program class starts by defining the Host builder, standard boilerplate that sets up an IServiceProvider for Dependency Injection (DI). It also adds a reference to the appsettings.json file that holds the connection string:

{
  "ConnectionStrings": {
    "Default": "Data Source=.;Initial Catalog=mydatabase;Integrated Security=True;TrustServerCertificate=True;"
  }
}

DI is used to construct the DataAccessService. We then read the MappingConfig and assemble the CSV file names into one list, supplying all of it to DataImporter.ReadModelFromFiles. It returns the snapshot, which we save to the database using the ImportDataSnapshot method:

C#
using DataImport.Infrastructure;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using DataImport.Application.Mappings;
using DataImport.Application.Helpers;

namespace DataImport.Console;

class Program
{
    static void Main(string[] args)
    {
        // app builder
        var app = Host.CreateDefaultBuilder()
            .ConfigureAppConfiguration((hostingContext, config) =>
            {
                config.Sources.Clear();
                config.AddConfiguration(hostingContext.Configuration);
                config.AddJsonFile("appsettings.json");
                config.AddJsonFile("appsettings.Development.json", true, true);
            })
            .ConfigureServices(services =>
            {
                services.AddDbContext<MyDbContext>(options => options.UseSqlServer(), contextLifetime: ServiceLifetime.Singleton);
                services.AddSingleton<IDataAccessService, DataAccessService>();
            })
            .Build();

        var dataService = app.Services.GetService<IDataAccessService>()!;

        // read arguments
        var configFile = args[0];
        var config = FileHelper.LoadConfig(configFile);
        var csvFiles = new List<string>();

        for (int i = 1; i < args.Length; i++)
        {
            csvFiles.Add(args[i]);
        }

        // read snapshot
        var dataImporter = new DataImporter();
        var snapshot = dataImporter.ReadModelFromFiles(csvFiles, config);

        // save snapshot to db
        dataService.ImportDataSnapshot(snapshot);

        System.Console.ReadKey();
    }
}
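
Note that the ConfigureServices block above calls options.UseSqlServer() without a connection string, so MyDbContext is expected to pick up the "Default" connection string elsewhere. A more explicit alternative, shown here as a hypothetical drop-in replacement for that block, would pass the connection string at registration time:

C#
// Hypothetical alternative wiring: resolve the "Default" connection string
// from configuration and pass it to UseSqlServer explicitly.
.ConfigureServices((hostingContext, services) =>
{
    var connectionString = hostingContext.Configuration.GetConnectionString("Default");
    services.AddDbContext<MyDbContext>(
        options => options.UseSqlServer(connectionString),
        contextLifetime: ServiceLifetime.Singleton);
})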

If you run the application, it should create the database for you and populate all three tables with data from the test files.

If you select data from the tables, you should see the imported rows (the original article includes a screenshot of the query results).

Unit Tests

The DataImport.Tests project has a TestData folder with the test data files and the test mapping configuration. The Console application reuses these files at run time.

Instead of running the Console app, you can also run the unit tests from the UnitTest1 class.

For example, this test calls ReadModelFromFiles and checks that all three lines from the CSV file are imported into the snapshot correctly:

C#
[Fact]
public void DataImporter_Should_ReadCustomer()
{
    var config = LoadConfig("DataImport.Tests.TestData.TestMappingConfig.json");
    List<string> fileList = ["./TestData/PersonExport.csv"];
    var importer = new DataImporter();
    var model = importer.ReadModelFromFiles(fileList, config);

    Assert.NotNull(model);
    Assert.Equal(3, model.Customers.Count);
    Assert.Equal("C00006", model.Customers[0].ExternalId);
    Assert.Empty(model.Products);
    Assert.Empty(model.Purchases);
}
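
Along the same lines, a focused test of the navigator binding itself (a sketch, not one of the repository's tests, assuming Customer exposes the FirstName property mapped in the configuration) could look like this:

C#
[Fact]
public void Navigator_Should_SetValue_ByJsonPath()
{
    // bind a value through JSONPath and verify it landed on the model property
    var navigator = new JsonPathModelNavigator(
        new NavigatorConfigOptions { OptimizeWithCodeEmitter = false });
    var line = new DataLineModel();

    navigator.SetValue(line, "$.Customer.FirstName", "John");

    Assert.Equal("John", line.Customer.FirstName);
}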

I believe that if your code is easily covered by unit tests, it indicates good design.

Summary

This article is intended to demonstrate the flexibility of using JSONPath to access in-memory models when you cannot predict all possible future use cases.

We discussed a generic model mapping approach: you configure a set of mappings between a column position in a CSV file and a target model property, and that is enough to import the data into your system.

Thanks for reading.


License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)