Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C#

Cinchoo ETL - Compare Two CSV Files for Add, Changed or Deleted Records (Master vs Detail)

3.00/5 (1 vote)
13 Dec 2021CPOL2 min read 5.7K  
How to compare two CSV files for Add, Deleted and Changed records using Cinchoo ETL
In this tip, you will learn how to compare two CSV files for Add, Deleted and Changed records using Cinchoo ETL framework. It is very simple to use, with few lines of code, the conversion can be done. You can convert large files as the conversion process is stream based, quite fast and with low memory footprint.

1. Introduction

ChoETL is an open source ETL (extract, transform and load) framework for .NET. It is a code based library for extracting data from multiple sources, transforming, and loading into your very own data warehouse in .NET environment. You can have data in your data warehouse in no time.

This article talks about comparing two CSV files for Add, Deleted and Changed records using Cinchoo ETL framework. It is very simple to use, with few lines of code, the comparison can be done. You can do the comparison of large files easily as the operation is stream based, quite fast and with low memory footprint.

2. Requirement

This framework library is written in C# using .NET 4.5 / .NET Core 3.x Framework.

3. How to Use

3.1 Sample Data

Let's begin by looking into the below sample CSV files. Assume these CSV files are large in size, come with different fields, may have column counts vary on them.

Listing 3.1.1. Master CSV file (master.csv)
ID,name
1,Danny
2,Fred
3,Sam
Listing 3.1.2. Detail file (detail.csv)
csv
ID,name
1,Danny
3,Pamela
4,Fernando

After successful comparison, the expected CSV file should be produced as below:

In here, each record we captured the statuses from the comparison operation and output them to the file.

Listing 3.1.3. CSV output (output.csv)
csv
ID,name,Status
1,Danny,Unchanged
2,Fred,Deleted
3,Pamela,Changed
4,Fernando,New

The first thing to do is to install ChoETL / ChoETL.NETStandard nuget package. To do this, run the following command in the Package Manager Console.

.NET Framework

Install-Package ChoETL

.NET Core

Install-Package ChoETL.NETStandard

Now add ChoETL namespace to the program.

using ChoETL;

3.2 Comparison Operation

As input files may be large in sizes, we need to consider ways to merge them efficiently. Here is an approach to adapt to merge such CSV files.

  1. First, open each CSV file (master and detail CSV files), put them into variables.
  2. Then, open ChoCSVWriter for writing.
  3. Finally, call the Compare extension method on input CSV file streams for comparison. In here, specify the key ("ID") columns as well as compare ("name") columns. Key columns used to match the records between CSV files. Compare columns used to find if there is any change in the matched records. Compare operation yields Tuple<Master, Detail, Status> records.

Where CompareStatus enum holds below statuses

  • Unchanged
  • Changed
  • New
  • Deleted
Listing 3.2.1. Compare Master - Detail CSV files
JavaScript
private static void CompareCSVFiles()
{
    //Open master CSV file
    var r1 = ChoCSVReader.LoadText("master.csv").
             WithFirstLineHeader().WithMaxScanRows(1).OfType<ChoDynamicObject>();

    //Open detail CSV file
    var r2 = ChoCSVReader.LoadText("detail.csv").
             WithFirstLineHeader().WithMaxScanRows(1).OfType<ChoDynamicObject>();

    //Open final output CSV file
    using (var w = new ChoCSVWriter(Console.Out).WithFirstLineHeader())
    {
        //Compare Master and Detail records
        foreach (var t in r1.Compare(r2, "ID", "name" ))
        {
            dynamic v1 = t.MasterRecord as dynamic;
            dynamic v2 = t.DetailRecord as dynamic;

            //Check on the status and capture it
            if (t.Status == CompareStatus.Unchanged || t.Status == CompareStatus.Deleted)
            {
                v1.Status = t.Status.ToString();
                w.Write(v1);
            }
            else 
            {
                v2.Status = t.Status.ToString();
                w.Write(v2);
            }
        }
    }
}

Sample fiddle: https://dotnetfiddle.net/uPR5Sq

For more information about Cinchoo ETL, please visit the other CodeProject articles:

History

  • 13th December, 2021: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)