Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C#

Flatmapper

4.97/5 (27 votes)
9 Feb 2016CPOL4 min read 35.7K  
FlatMapper is a library to import and export data from and to plain text files

Introduction

FlatMapper is a library to import and export data from and to plain text files.

Plain text files are still very widely used on legacy systems and still a favorite format for Human-to-System interface. This project was born from a need to read and write data from plain text files. I needed a lightweight module that would do just that, no extra needless weight. Since most of the libraries that I found at the time were either code intrusive, or had extra dependencies, I decided to write my own.

My goal was to write it with a nice fluent API, must be simple to use, minimal dependencies and dependents and fast! It must work with any POCO and should not be bloated with features that are not part of the core. I had a great time developing it (I still do).

Since the project that originated this library, I already used it again on 2 other projects.

Key Features

  • Fast - Uses Static Reflection and Dynamic methods
  • LINQ Compatible
  • It supports character delimited and fixed length files
  • Non intrusive - You don't have to change your code. Any POCO will work
  • No external dependencies
  • Iterative reads - Doesn't need to load the entire file into memory
  • Multi-line support (Only on character delimited and quoted)
  • Nullables support
  • Enums support
  • Vitually any type support with FieldValueConverters
  • Fluent Interface
  • Per line/record Error handling
  • Simple to use

How to Install

To install Flatmapper, run the following command in the Package Manager Console:

Install-Package Flatmapper

How to Use

Before we start reading or writing from files, we need to specify the layout of the file. We only need to do this once.
Imagine the following scenario. We need to read and write from text files to and from this class:

C#
public class TestObject
{
    public int Id { get; set; }
    public string Description { get; set; }
    public int? NullableInt { get; set; }
    public Gender? NullableEnum { get; set; }
    public DateTime Date { get; set; }
}

In the following sections, you'll find out how to setup the file layout with both fixed length and character delimited.

Fixed Length Layout

C#
    var layout = new Layout<TestObject>.FixedLengthLayout()
                    .HeaderLines(1)
                    .WithMember(o => o.Id, set => 
                    set.WithLength(5).WithLeftPadding('0'))
                    .WithMember(o => o.Description, set => 
                    set.WithLength(25).WithRightPadding(' '))
                    .WithMember(o => o.NullableInt, 
			set => set.WithLength(5).AllowNull("=Null").WithLeftPadding('0'))
                    .WithMember(o => o.NullableEnum, set => set.WithLength(10).AllowNull("======NULL").WithLeftPadding(' '))
                    .WithMember(o => o.Date, set => set.WithLength(19).WithFormat(new CultureInfo("pt-PT"))); //PT-pt default dates are always fixed 19 chars "13-12-2015 23:41:41";    

Delimited Layout

C#
    var layout = new Layout<TestObject>.DelimitedLayout()
                    .WithDelimiter(";")
                    .WithQuote("\"")
                    .HeaderLines(1)
                    .WithMember(o => o.Id, set => set.WithLength(5))
                    .WithMember(o => o.Description, set => set.WithLength(25))
                    .WithMember(o => o.NullableInt, set => set.AllowNull("NULL"))
                    .WithMember(o => o.NullableEnum, set => set.AllowNull("NULL"))
                    .WithMember(o => o.Date, set => set.WithFormat(new CultureInfo("pt-PT")));

With this setup is also possible to have multi-line fields, as long they are quoted.

Reading and Writing

The reading is interactive, meaning that only when a new item is requested, the data will be read. This helps to avoid reading the entire file into memory and only then parsing the data. Data is parsed on demand.

This library connects into the `Stream` class of the core framework. This way, don't have restrictions in the encoding and it's outside the scope of the library to free any resource.

C#
//Reading data
using (var fileStream = File.OpenRead("c:\temp\data.txt"))
{
    var flatfile = new FlatFile<TestObject>(layout, fileStream);
    foreach(var objectInstance in flatfile.Read())
    {
        //Do Something....
    }
}

//Writing data
using (var fileStream = File.OpenWrite("c:\temp\data.txt"))
{
    var flatfile = new FlatFile<TestObject>(layout, fileStream);
    flatfile.Write(listOfObjects);
}

Error Handling

Optionally, per line/object instance, you can control the behavior if any error is thrown due to some unexpected format or any other error, for that matter.
By specifying a `Func<string, Exception, bool>` into the `handleEntryReadError` parameter of the constructor of `FlatFile<T>`, every-time any input error occurs, that function is executed, with the line and the `Exception` that was thrown. If that function returns `true`, the Exception is ignored, and the import continues. If not, the `Exception` that originated the call, will be re-thrown.

C#
private bool HandleEntryReadError(string line, Exception exception)
{
    Log.LogError("Error reading line :" + line, exception);
    return true;
}

/* ... */
var flatfile = new FlatFile<TestObject>(layout, fileStream, HandleEntryReadError);

Plays Well With Others

One of the core philosophies behind this library is that there are other libraries that already excel on their goal, so this should not be a one library make all, but a one goal library that follows the standards and can play well with others.

An example of that is how this library can work with Dapper in order to import and export data from the database.

C#
public void ImportData(IDbConnection connection)
{
    using (var file = File.OpenRead(ImportFile))
    {
        var flatfile = new FlatFile<TestObject>(layout, file);
        var itemsEnumerable = flatfile.Read();
        connection.Execute("spInsertItem", itemsEnumerable);
    }
}

public void ExportData(IDbConnection connection)
{
    using (var file = File.OpenWrite(ImportFile))
    {
        var flatfile = new FlatFile<TestObject>(layout, file);
        var query = "select Id, Description, NullableInt from TestDataTable";
        var items = connection.Query<TestObject>(query);
        flatfile.Write(items);
    }
}

Since is works with every POCO and there is no code intrusion in any way, there is no outstanding reason why this won't work as well with EntityFramework or any other micro or full fledged ORM.

Performance

Basic performance tests are made on https://github.com/kappy/FlatMapper.PerformanceTests.
For the test, I used a record with 3 parameters (very similar to the previous examples), 1.000.000 of them.
The files generated were about 22 MB, and these were the results:

Performance Test results

The test was run on a i7 Q740 @ 1.73GHz, SSD, and 6GB Ram.
This test used the nuget package. On recent tests, I discovered that a local build from the sources, using VS 2015, results in quite some performance gain, maybe because of different compiler versions.

The Road to 1.0

Before the 1.0 release, there is some stuff that I would like to have in the library.

  • Async (for this to actually work, it will need to work by blocks, dropping the Iterative reads feature)

Ideas

Some other ideas for after 1.0.

  • Per line/record Layout.. using a discriminator as base.

License and Contributions

Contributions are welcome, either bug-fixes, new features, or just filing up issues with ideas and suggestions. Any feedback is appreciated.

This library is open source and licensed under Apache License 2.0.

Thanks

  • Need to thank my wife for all those nights where the attention was on the computer and not on her;
  • My company Mindbus that allowed me not only to use my library on live projects but helped me fine in tuning it;
  • My colleague Nuno Santos, the man that originally wrote the Multi-line parser.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)