In this tip, you will learn how to merge several CSV files into one CSV file using the Cinchoo ETL framework. It is simple to use: the merge takes only a few lines of code. Because processing is stream-based, large files can be merged quickly and with a low memory footprint.
ChoETL is an open source ETL (extract, transform and load) framework for .NET. It is a code-based library for extracting data from multiple sources, transforming it, and loading it into your own data warehouse in a .NET environment.
This framework library is written in C# and targets .NET Framework 4.5 / .NET Core 3.x.
3.1 Sample Data
Let's begin by looking at the sample CSV files below. Assume these files are large, contain different fields, and may vary in column count.
Listing 3.1.1. CSV file 1 (sample1.csv)
col1, col2, col3
val1, val2, val3
val11, val21, val31
Listing 3.1.2. CSV file 2 (sample2.csv)
col1, col3
val4, val5
val41, val51
Listing 3.1.3. CSV file 3 (sample3.csv)
col1, col4
val6, val7
val61, val71
After a successful merge, the expected CSV file should look like the following:
Listing 3.1.4. CSV output (merge.csv)
col1, col2, col3, col4
val1, val2, val3,
val11, val21, val31,
val4, , val5,
val41, , val51,
val6, , , val7
val61, , , val71
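To make the shape of the expected output concrete, here is a minimal, library-independent sketch of the two ideas behind it: the merged header is the ordered union of all input headers, and each row is re-projected onto that header with empty values for missing columns. The `MergeShape` class and its `UnionHeaders` and `Align` methods are hypothetical names used only for illustration; they are not part of Cinchoo ETL, and this is not how the library works internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of Cinchoo ETL.
public static class MergeShape
{
    // Builds the ordered union of all header names (first-seen order is kept).
    public static List<string> UnionHeaders(IEnumerable<string[]> headers)
    {
        var seen = new HashSet<string>();
        var merged = new List<string>();
        foreach (var header in headers)
            foreach (var name in header)
                if (seen.Add(name))
                    merged.Add(name);
        return merged;
    }

    // Re-projects one row onto the merged header; missing columns become "".
    public static string[] Align(string[] header, string[] row, IReadOnlyList<string> merged)
    {
        var byName = new Dictionary<string, string>();
        for (int i = 0; i < header.Length && i < row.Length; i++)
            byName[header[i]] = row[i];

        return merged
            .Select(name => byName.TryGetValue(name, out var value) ? value : "")
            .ToArray();
    }
}
```

With the three sample headers above, `UnionHeaders` yields `col1, col2, col3, col4`, and aligning the `sample3.csv` row `val6, val7` against that header leaves `col2` and `col3` empty.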
The first thing to do is to install the ChoETL.JSON / ChoETL.JSON.NETStandard NuGet package. To do this, run the following command in the Package Manager Console.
.NET Framework
Install-Package ChoETL.JSON
.NET Core
Install-Package ChoETL.JSON.NETStandard
Now add the ChoETL namespace to the program.
using ChoETL;
3.2 Merge Operation
As the input files may be large, we need a way to merge them efficiently. Here is an approach for merging such CSV files:
- First, open each CSV file and read its first record. Put these records into a collection.
- Next, determine all possible columns across the input CSV files by writing the collection to a dummy ChoCSVWriter. Use the WithMaxScanRows() call to scan for the columns from all CSV files. Capture the Configuration object (containing all the scanned CSV columns) for later use.
- Finally, open each CSV file and write its records to a ChoCSVWriter using the captured configuration object.
Listing 3.2.1. Merge CSV files
// Requires: using System; using System.Collections.Generic;
//           using System.Linq; using System.Text; using ChoETL;
private static void MergeCSVFiles()
{
    ChoCSVRecordConfiguration config = null;
    List<object> items = new List<object>();

    // Step 1: read the first record of each input file.
    using (var r1 = new ChoCSVReader("sample1.csv").WithFirstLineHeader())
    using (var r2 = new ChoCSVReader("sample2.csv").WithFirstLineHeader())
    using (var r3 = new ChoCSVReader("sample3.csv").WithFirstLineHeader())
    {
        items.Add(r1.First());
        items.Add(r2.First());
        items.Add(r3.First());
    }

    // Step 2: write the sampled records to a dummy writer so that it scans
    // them for columns, then capture the resulting configuration.
    StringBuilder csv = new StringBuilder();
    using (var w = new ChoCSVWriter(csv)
        .WithFirstLineHeader()
        .WithMaxScanRows(5)
        .ThrowAndStopOnMissingField(false)
        )
    {
        w.Write(items);
        config = w.Configuration;
    }

    // Step 3: reopen the files and stream all records through a writer
    // that uses the captured configuration.
    using (var r1 = new ChoCSVReader("sample1.csv").WithFirstLineHeader())
    using (var r2 = new ChoCSVReader("sample2.csv").WithFirstLineHeader())
    using (var r3 = new ChoCSVReader("sample3.csv").WithFirstLineHeader())
    using (var w = new ChoCSVWriter(Console.Out, config)
        .WithFirstLineHeader()
        )
    {
        w.Write(Enumerable.Concat(r1, r2).Concat(r3));
    }
}
Sample fiddle: https://dotnetfiddle.net/4L8f0k
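Listing 3.2.1 hard-codes three input files. As a rough sketch, the same two-pass approach could be generalized to any number of files. The code below uses only the Cinchoo ETL calls that already appear in Listing 3.2.1; the `CsvMerger` class and its `MergeCsvFiles` and `ReadAll` methods are hypothetical names introduced for illustration. Setting the scan-row count to the number of sampled records, and skipping files that yield no record, are assumptions rather than documented requirements.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using ChoETL;

// Hypothetical helper for illustration only; not part of Cinchoo ETL.
public static class CsvMerger
{
    public static void MergeCsvFiles(IReadOnlyList<string> files, TextWriter output)
    {
        // Pass 1: sample the first record of every input file.
        var samples = new List<object>();
        foreach (var file in files)
        {
            using (var r = new ChoCSVReader(file).WithFirstLineHeader())
            {
                object first = r.FirstOrDefault();
                if (first != null)      // assumption: files without records are skipped
                    samples.Add(first);
            }
        }
        if (samples.Count == 0)
            return;

        // Write the samples to a dummy writer so that it discovers all columns.
        // Assumption: scanning as many rows as there are samples is sufficient.
        ChoCSVRecordConfiguration config;
        using (var w = new ChoCSVWriter(new StringBuilder())
            .WithFirstLineHeader()
            .WithMaxScanRows(samples.Count)
            .ThrowAndStopOnMissingField(false))
        {
            w.Write(samples);
            config = w.Configuration;
        }

        // Pass 2: stream every record of every file through the configured writer.
        using (var w = new ChoCSVWriter(output, config).WithFirstLineHeader())
        {
            w.Write(ReadAll(files));
        }
    }

    // Lazily yields the records of each file in turn, one reader open at a time.
    private static IEnumerable<dynamic> ReadAll(IEnumerable<string> files)
    {
        foreach (var file in files)
            using (var r = new ChoCSVReader(file).WithFirstLineHeader())
                foreach (var rec in r)
                    yield return rec;
    }
}
```

A call such as `CsvMerger.MergeCsvFiles(new[] { "sample1.csv", "sample2.csv", "sample3.csv" }, Console.Out)` would correspond to Listing 3.2.1. Because `ReadAll` opens the files one after another instead of all at once, only a single reader is held open during the second pass.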
For more information about Cinchoo ETL, please see the other CodeProject articles on the framework.
History
- 1st November, 2021: Initial version