In this tip, you will learn how to compare two CSV files for Add, Deleted and Changed records using Cinchoo ETL framework. It is very simple to use, with few lines of code, the conversion can be done. You can convert large files as the conversion process is stream based, quite fast and with low memory footprint.
ChoETL is an open source ETL (extract, transform and load) framework for .NET. It is a code based library for extracting data from multiple sources, transforming, and loading into your very own data warehouse in .NET environment. You can have data in your data warehouse in no time.
This article talks about comparing two CSV files for Add, Deleted and Changed records using Cinchoo ETL framework. It is very simple to use, with few lines of code, the comparison can be done. You can do the comparison of large files easily as the operation is stream based, quite fast and with low memory footprint.
This framework library is written in C# using .NET 4.5 / .NET Core 3.x Framework.
3.1 Sample Data
Let's begin by looking into the below sample CSV files. Assume these CSV files are large in size, come with different fields, may have column counts vary on them.
Listing 3.1.1. Master CSV file (master.csv)
ID,name
1,Danny
2,Fred
3,Sam
Listing 3.1.2. Detail file (detail.csv)
ID,name
1,Danny
3,Pamela
4,Fernando
After successful comparison, the expected CSV file should be produced as below:
In here, each record we captured the statuses from the comparison operation and output them to the file.
Listing 3.1.3. CSV output (output.csv)
ID,name,Status
1,Danny,Unchanged
2,Fred,Deleted
3,Pamela,Changed
4,Fernando,New
The first thing to do is to install ChoETL / ChoETL.NETStandard
nuget package. To do this, run the following command in the Package Manager Console.
.NET Framework
Install-Package ChoETL
.NET Core
Install-Package ChoETL.NETStandard
Now add ChoETL
namespace to the program.
using ChoETL;
3.2 Comparison Operation
As input files may be large in sizes, we need to consider ways to merge them efficiently. Here is an approach to adapt to merge such CSV files.
- First, open each CSV file (master and detail CSV files), put them into variables.
- Then, open
ChoCSVWriter
for writing. - Finally, call the
Compare
extension method on input CSV file streams for comparison. In here, specify the key
("ID
") columns as well as compare
("name
") columns. Key columns used to match the records between CSV files. Compare columns used to find if there is any change in the matched records. Compare operation yields Tuple<Master, Detail, Status>
records.
Where CompareStatus enum holds below statuses
- Unchanged
- Changed
- New
- Deleted
Listing 3.2.1. Compare Master - Detail CSV files
private static void CompareCSVFiles()
{
var r1 = ChoCSVReader.LoadText("master.csv").
WithFirstLineHeader().WithMaxScanRows(1).OfType<ChoDynamicObject>();
var r2 = ChoCSVReader.LoadText("detail.csv").
WithFirstLineHeader().WithMaxScanRows(1).OfType<ChoDynamicObject>();
using (var w = new ChoCSVWriter(Console.Out).WithFirstLineHeader())
{
foreach (var t in r1.Compare(r2, "ID", "name" ))
{
dynamic v1 = t.MasterRecord as dynamic;
dynamic v2 = t.DetailRecord as dynamic;
if (t.Status == CompareStatus.Unchanged || t.Status == CompareStatus.Deleted)
{
v1.Status = t.Status.ToString();
w.Write(v1);
}
else
{
v2.Status = t.Status.ToString();
w.Write(v2);
}
}
}
}
Sample fiddle: https://dotnetfiddle.net/uPR5Sq
For more information about Cinchoo ETL, please visit the other CodeProject articles:
History
- 13th December, 2021: Initial version