Contents
- Introduction
- Requirement
- "Hello World!" Sample
- Quick load - Data First Approach
- Code First Approach
- Configuration First Approach
- Code First with declarative configuration
- Reading All Records
- Read Records Manually
- Customize FixedLength Record
- Customize FixedLength Header
- Customize FixedLength Fields
- DefaultValue
- ChoFallbackValue
- Type Converters
- Declarative Approach
- Configuration Approach
- Validations
- Callback Mechanism
- BeginLoad
- EndLoad
- BeforeRecordLoad
- AfterRecordLoad
- RecordLoadError
- BeforeRecordFieldLoad
- AfterRecordFieldLoad
- RecordFieldLoadError
- Customization
- AsDataReader Helper Method
- AsDataTable Helper Method
- Using Dynamic Object
- DefaultValue
- ChoFallbackValue
- FieldType
- Type Converters
- Validations
- Working with sealed POCO object
- Exceptions
- Tips
- Multiline FixedLength column value
- FixedLength column value with single quotes
- Using MetadataType Annotation
- Configuration Choices
- Manual Configuration
- Auto Map Configuration
- Attaching MetadataType class
- LoadText Helper Method
- Advanced Topics
- Override Converters Format Specs
- Currency Support
- Enum Support
- Boolean Support
- DateTime Support
- Fluent API
- WithRecordLength
- WithFirstLineHeader
- WithField
- QuoteAllFields
- History
ChoETL is an open source ETL (extract, transform and load) framework for .NET. It is a code-based library for extracting data from multiple sources, transforming it, and loading it into your own data warehouse in a .NET environment. You can have data in your data warehouse in no time.
This article covers the FixedLengthReader component offered by the ChoETL framework. It is a simple utility class for extracting FixedLength data from a file or other source.
Features
- Follows FixedLength standard file rules. Gracefully handles data fields that contain commas and line breaks
- Exposes an IEnumerable list of objects, which is often used with LINQ queries for projection, aggregation, filtration, etc.
- Supports deferred reading
- Supports processing files with culture specific date, currency and number formats.
- Supports different character encoding
- Recognizes a wide variety of date, currency, enum, boolean and number formats when reading files
- Provides fine control of date, currency, enum, boolean, number formats when writing files
- Detailed and robust error handling, allowing you to quickly find and fix problems
- Shorten your development time
This framework library is written in C# using .NET 4.5 Framework.
- Open VS.NET 2013 or higher.
- Create a sample VS.NET (.NET Framework 4.5) Console Application project.
- Install ChoETL via the Package Manager Console using the NuGet command:
Install-Package ChoETL
- Use the ChoETL namespace.
Let's begin by looking at a simple example of reading a FixedLength file with a record length of 18 chars and 2 columns (Id: 8 chars long, Name: 10 chars long).
Listing 3.1 Sample FixedLength Data File (RecordLength: 18 chars)
Id      Name      
1       Carl      
2       Mark      
There are a number of ways to get FixedLength file parsing started with minimal setup.
The quick load (data first) approach is the fastest way to load a FixedLength file. No POCO object is required. The sample code below shows how to load the file.
Listing 3.1.1 Load FixedLength File using Iterator
static void QuickDynamicLoadTestUsingIterator()
{
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
{
writer.WriteLine("Id      Name      ");
writer.WriteLine("1       Carl      ");
writer.WriteLine("2       Mark      ");
writer.Flush();
stream.Position = 0;
foreach (dynamic row in new ChoFixedLengthReader(reader).WithFirstLineHeader())
{
Console.WriteLine(String.Format("Id: {0}", row.Id));
Console.WriteLine(String.Format("Name: {0}", row.Name));
}
}
}
Listing 3.1.2 Load FixedLength File using Loop
static void QuickDynamicLoadTest()
{
dynamic row = null;
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser = new ChoFixedLengthReader(reader).WithFirstLineHeader())
{
writer.WriteLine("Id      Name      ");
writer.WriteLine("1       Carl      ");
writer.WriteLine("2       Mark      ");
writer.Flush();
stream.Position = 0;
while ((row = parser.Read()) != null)
{
Console.WriteLine(String.Format("Id: {0}", row.Id));
Console.WriteLine(String.Format("Name: {0}", row.Name));
}
}
}
This model uses the generic FixedLengthReader object for parsing the file. FixedLengthReader automatically discovers the file layout schema from the first row, but ONLY if the column values of the first row are separated by spaces. If auto discovery yields an incorrect file layout schema, you can spell out the schema yourself via configuration, as discussed below.
If the input feed does not have a header, FixedLengthReader automatically names the columns Column1, Column2, ... in the dynamic object model.
The code first approach is another way to parse and load a FixedLength file, this time using a POCO class. First, define a simple data class to match the underlying FixedLength file layout.
Listing 3.2.1 Simple POCO Entity Class
public partial class EmployeeRecSimple
{
public int Id { get; set; }
public string Name { get; set; }
}
In the above, the class defines properties matching the sample FixedLength file template.
Listing 3.2.2 Load FixedLength File
static void CodeFirstApproach()
{
EmployeeRecSimple row = null;
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser = new ChoFixedLengthReader<EmployeeRecSimple>
(reader).WithFirstLineHeader())
{
writer.WriteLine("Id      Name      ");
writer.WriteLine("1       Carl      ");
writer.WriteLine("2       Mark      ");
writer.Flush();
stream.Position = 0;
while ((row = parser.Read()) != null)
{
Console.WriteLine(String.Format("Id: {0}", row.Id));
Console.WriteLine(String.Format("Name: {0}", row.Name));
}
}
}
FixedLengthReader automatically discovers the file layout schema from the first row if the column values of the header row are separated by spaces. If auto discovery leads to an incorrect file layout schema, you can spell out the schema via configuration.
In the configuration first approach, we define the FixedLength configuration with all the necessary parsing parameters, along with FixedLength columns matching the underlying FixedLength file.
Listing 3.3.1 Define FixedLength Configuration
ChoFixedLengthRecordConfiguration config = new ChoFixedLengthRecordConfiguration();
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Id", 0, 8)
{ FieldType = typeof(int) });
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Name", 8, 10)
{ FieldType = typeof(string) });
In the above, we define the configuration matching the sample FixedLength file template.
Listing 3.3.2 Load FixedLength file without POCO Object
static void ConfigFirstApproachReadAsDynamicRecords()
{
ChoFixedLengthRecordConfiguration config = new ChoFixedLengthRecordConfiguration();
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Id", 0, 8)
{ FieldType = typeof(int) });
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Name", 8, 10)
{ FieldType = typeof(string) });
dynamic row = null;
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser = new ChoFixedLengthReader(reader, config).WithFirstLineHeader())
{
writer.WriteLine("Id      Name      ");
writer.WriteLine("1       Carl      ");
writer.WriteLine("2       Mark      ");
writer.Flush();
stream.Position = 0;
while ((row = parser.Read()) != null)
{
Console.WriteLine(String.Format("Id: {0}", row.Id));
Console.WriteLine(String.Format("Name: {0}", row.Name));
}
}
}
Listing 3.3.3 Load FixedLength file with POCO object
static void ConfigFirstApproachReadAsTypedRecords()
{
ChoFixedLengthRecordConfiguration config = new ChoFixedLengthRecordConfiguration();
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Id", 0, 8)
{ FieldType = typeof(int) });
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Name", 8, 10)
{ FieldType = typeof(string) });
EmployeeRecSimple row = null;
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser =
new ChoFixedLengthReader<EmployeeRecSimple>(reader, config).WithFirstLineHeader())
{
writer.WriteLine("Id      Name      ");
writer.WriteLine("1       Carl      ");
writer.WriteLine("2       Mark      ");
writer.Flush();
stream.Position = 0;
while ((row = parser.Read()) != null)
{
Console.WriteLine(String.Format("Id: {0}", row.Id));
Console.WriteLine(String.Format("Name: {0}", row.Name));
}
}
}
Code first with declarative configuration is the combined approach: define the POCO entity class and decorate it declaratively with the FixedLength configuration parameters.
Listing 3.4.1 Define POCO Object
public partial class EmployeeRec
{
[ChoFixedLengthRecordField(0, 8)]
public int Id { get; set; }
[ChoFixedLengthRecordField(8, 10)]
public string Name { get; set; }
}
The code above illustrates defining a POCO object to carry the values of each record line in the input file. First, define a property for each record field and decorate it with ChoFixedLengthRecordFieldAttribute to qualify it for FixedLength record mapping. Each property must specify StartIndex and Size in order to be mapped to a FixedLength column. StartIndex is 0 based.
It is very simple, and you are ready to extract FixedLength data in no time.
Listing 3.4.2 Main Method
static void CodeFirstWithDeclarativeApproachRead()
{
EmployeeRec row = null;
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser =
new ChoFixedLengthReader<EmployeeRec>(reader).WithFirstLineHeader())
{
writer.WriteLine("Id      Name      ");
writer.WriteLine("1       Carl      ");
writer.WriteLine("2       Mark      ");
writer.Flush();
stream.Position = 0;
while ((row = parser.Read()) != null)
{
Console.WriteLine(String.Format("Id: {0}", row.Id));
Console.WriteLine(String.Format("Name: {0}", row.Name));
}
}
}
We start by creating a new instance of the ChoFixedLengthReader object. That's all. All the heavy lifting of parsing and loading the FixedLength data stream into objects is done by the parser under the hood.
By default, FixedLengthReader discovers and uses default configuration parameters while loading a FixedLength file. These can be overridden according to your needs. The following sections give details about each configuration attribute.
Once the POCO object is set up to match the FixedLength file structure, you can read the whole file as an enumerable. Execution is deferred, but be careful when performing any aggregate operation on the records, because that loads all of the file's records into memory.
Listing 4.1 Read FixedLength File
foreach (var e in new ChoFixedLengthReader<EmployeeRec>("Emp.txt"))
{
Console.WriteLine(String.Format("Id: {0}", e.Id));
Console.WriteLine(String.Format("Name: {0}", e.Name));
}
or:
Listing 4.2 Read FixedLength File Stream
foreach (var e in new ChoFixedLengthReader<EmployeeRec>(textReader))
{
Console.WriteLine(String.Format("Id: {0}", e.Id));
Console.WriteLine(String.Format("Name: {0}", e.Name));
}
This model keeps your code elegant, clean, and easy to read and maintain. It also lets you leverage LINQ extension methods to perform grouping, joining, projection, aggregation, etc.
Listing 4.3 Using LINQ
var list = (from o in new ChoFixedLengthReader<EmployeeRec>("Emp.txt")
where o.Name != null && o.Name.StartsWith("R")
select o).ToArray();
foreach (var e in list)
Console.WriteLine(e.ToStringEx());
Once the POCO object is set up to match the FixedLength file structure, you can also read the file manually, one record at a time, by calling Read in a loop until it returns null.
Listing 5.1 Read FixedLength File
using (var reader = new ChoFixedLengthReader<EmployeeRec>("Emp.txt"))
{
    // Declare the record with its concrete type so Id and Name are accessible.
    EmployeeRec rec = null;
    while ((rec = reader.Read()) != null)
    {
        Console.WriteLine(String.Format("Id: {0}", rec.Id));
        Console.WriteLine(String.Format("Name: {0}", rec.Name));
    }
}
Using ChoFixedLengthRecordObjectAttribute, you can customize the POCO entity object declaratively.
Listing 6.1 Customizing POCO Object for Each Record
[ChoFixedLengthRecordObject(18)]
public class EmployeeRec
{
[ChoFixedLengthRecordField(0, 8, FieldName = "id")]
public int Id { get; set; }
[ChoFixedLengthRecordField(8, 10, FieldName ="Name", QuoteField = true)]
[Required]
[DefaultValue("ZZZ")]
public string Name { get; set; }
}
Here are the available attributes to carry out customization of the FixedLength load operation on a file.
- RecordLength - Optional. If not specified, it will be calculated from all the properties defined in the POCO class.
- EOLDelimiter - The value used to separate FixedLength rows. Default is \r\n (NewLine).
- CultureName - The culture name (e.g., en-US, en-GB) used to read and write FixedLength data.
- IgnoreEmptyLine - A flag to let the reader know if a record should be skipped when reading if it's empty. A record is considered empty if all fields are empty.
- Comments - The value used to denote a line that is commented out. Multiple comments can be specified. They must be separated by commas.
- QuoteChar - The value used to escape fields that contain a delimiter, quote, or line ending.
- QuoteAllFields - N/A for reader.
- Encoding - The encoding of the FixedLength file.
- HasExcelSeperator - N/A for reader. The reader seamlessly recognizes the Excel separator if specified in the FixedLength file and uses it for parsing.
- ColumnCountStrict - This flag indicates if an exception should be thrown if an expected field is missing when reading.
- ColumnOrderStrict - This flag indicates if an exception should be thrown if an expected field is in the wrong position in the file when reading. This check is performed only when ColumnCountStrict is true.
- BufferSize - The size of the internal buffer that is used when the reader is created from a StreamReader.
- ErrorMode - This flag indicates if an exception should be thrown if an expected field fails to load when reading. This can be overridden per property. Possible values are:
  - IgnoreAndContinue - Ignore the error; the record will be skipped and the reader continues with the next one.
  - ReportAndContinue - Report the error to the POCO entity if it is of IChoNotifyRecordRead type.
  - ThrowAndStop - Throw the error and stop the execution.
- IgnoreFieldValueMode - A flag to let the reader know if a record should be skipped when reading if it's empty / null. This can be overridden per property. Possible values are:
  - Null - N/A
  - DBNull - N/A
  - Empty - skipped if the record value is empty
  - WhiteSpace - skipped if the record value contains only whitespaces
- ObjectValidationMode - A flag to let the reader know about the type of validation to be performed on the record object. Possible values are:
  - Off - No object validation is performed.
  - MemberLevel - Validation is performed at the time each FixedLength property gets loaded with a value.
  - ObjectLevel - Validation is performed after all the properties are loaded into the POCO object.
If the FixedLength file has a header, you can indicate this on the POCO entity by using ChoFixedLengthFileHeaderAttribute.
Listing 6.1 Customizing POCO object for file header
[ChoFixedLengthFileHeader]
public class EmployeeRec
{
[ChoFixedLengthRecordField(0, 8, FieldName = "id")]
public int Id { get; set; }
[ChoFixedLengthRecordField(8, 10, FieldName ="Name", QuoteField = true)]
[Required]
[DefaultValue("ZZZ")]
public string Name { get; set; }
}
Here are the available members to customize it according to your needs.
- FillChar - N/A for reader
- Justification - N/A for reader
- TrimOption - This flag tells the reader to trim whitespace from the beginning and end of the FixedLength column header when reading. Possible values are Trim, TrimStart, TrimEnd.
- Truncate - N/A for reader
For each FixedLength column, you can specify the mapping on the POCO entity property using ChoFixedLengthRecordFieldAttribute.
Listing 6.1 Customizing POCO Object for FixedLength Columns
[ChoFixedLengthFileHeader]
public class EmployeeRec
{
[ChoFixedLengthRecordField(0, 8, FieldName = "id")]
public int Id { get; set; }
[ChoFixedLengthRecordField(8, 10, FieldName ="Name", QuoteField = true)]
[Required]
[DefaultValue("ZZZ")]
public string Name { get; set; }
}
Here are the available members to customize each property:
- StartIndex - The zero-based starting character position of a column in the fixed length row.
- Size - Number of characters in the column.
- FieldName - When mapping by name, you specify the name of the FixedLength column that you want to use for that property. For this to work, the FixedLength file must have a header record. The name you specify must match the name in the header record.
- FillChar - N/A for reader.
- FieldValueJustification - N/A for reader.
- FieldValueTrimOption - This flag tells the reader to trim whitespace from the beginning and end of the field value when reading. Possible values are Trim, TrimStart, TrimEnd.
- Truncate - N/A for reader.
- Size - N/A for reader.
- QuoteField - A flag that tells the reader that the FixedLength column value is surrounded by quotes.
- ErrorMode - This flag indicates if an exception should be thrown if an expected field fails to load when reading. Possible values are:
  - IgnoreAndContinue - Ignore the error and continue to load the other properties of the record.
  - ReportAndContinue - Report the error to the POCO entity if it is of IChoRecord type.
  - ThrowAndStop - Throw the error and stop the execution.
- IgnoreFieldValueMode - A flag to let the reader know if a record should be skipped when reading if it's empty / null. Possible values are:
  - Null - N/A
  - DBNull - N/A
  - Empty - skipped if the record value is empty
  - WhiteSpace - skipped if the record value contains only whitespaces
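To make StartIndex and Size concrete, here is a minimal sketch in plain C# of how a fixed-length line can be cut into column values. It does not call ChoETL and is not the library's implementation; the class name FixedWidthSketch and the method Slice are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical helper, not part of ChoETL: illustrates zero-based StartIndex + Size slicing.
public static class FixedWidthSketch
{
    // Returns the trimmed text found at [startIndex, startIndex + size) in a record line.
    // Lines shorter than expected are right-padded first so Substring cannot run past the end.
    public static string Slice(string line, int startIndex, int size)
    {
        if (line == null) throw new ArgumentNullException("line");
        string padded = line.PadRight(startIndex + size);
        return padded.Substring(startIndex, size).Trim();
    }
}
```

With the sample layout used in this article, FixedWidthSketch.Slice(line, 0, 8) yields the Id text and FixedWidthSketch.Slice(line, 8, 10) yields the Name text.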
DefaultValue is the value set on the property when the FixedLength value is empty or whitespace (controlled via IgnoreFieldValueMode).
Any POCO entity property can be given a default value using System.ComponentModel.DefaultValueAttribute.
The fallback value is the value set on the property when the FixedLength value fails to be set. The fallback value is only set when ErrorMode is either IgnoreAndContinue or ReportAndContinue.
Any POCO entity property can be given a fallback value using ChoETL.ChoFallbackValueAttribute.
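The difference between the default value and the fallback value can be sketched in plain C#. This is only an illustration of the rule described above (blank input takes the default, failed conversion takes the fallback). It is not ChoETL code, it leaves out the ErrorMode check, and FieldLoadSketch and LoadInt are hypothetical names.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of ChoETL: default value vs. fallback value for an int field.
public static class FieldLoadSketch
{
    // Blank input -> defaultValue; unparsable input -> fallbackValue; otherwise the parsed number.
    public static int LoadInt(string raw, int defaultValue, int fallbackValue)
    {
        if (String.IsNullOrWhiteSpace(raw))
            return defaultValue;

        int parsed;
        if (Int32.TryParse(raw.Trim(), NumberStyles.Integer, CultureInfo.InvariantCulture, out parsed))
            return parsed;

        return fallbackValue;
    }
}
```

The two values answer different questions: the default covers "nothing was supplied", while the fallback covers "something was supplied but could not be loaded".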
Most primitive types are automatically converted and set on the properties. If the value of a FixedLength field can't be converted automatically into the type of the property, you can specify custom or built-in .NET converters to convert the value. These can be either IValueConverter or TypeConverter converters.
There are a couple of ways to specify the converters for each field:
- Declarative Approach
- Configuration Approach
The declarative approach is applicable to POCO entity objects only. If you have a POCO class, you can attach converters to each property to carry out the necessary conversion. The samples below show how to do it.
[ChoFixedLengthFileHeader]
public class EmployeeRec
{
[ChoFixedLengthRecordField(0, 8)]
[ChoTypeConverter(typeof(IntConverter))]
public int Id { get; set; }
[ChoFixedLengthRecordField(8, 10, QuoteField = true)]
[Required]
[DefaultValue("ZZZ")]
public string Name { get; set; }
}
Listing 8.3.1.2 IntConverter Implementation
public class IntConverter : IValueConverter
{
public object Convert(object value, Type targetType,
object parameter, CultureInfo culture)
{
return value;
}
public object ConvertBack(object value, Type targetType,
object parameter, CultureInfo culture)
{
return value;
}
}
In the example above, we defined a custom IntConverter class and showed how to use it with the 'Id' FixedLength property.
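The IntConverter above returns the value unchanged. Since the article states that converters can also be TypeConverter based, a converter that actually parses padded text could be sketched as follows. It uses only the standard System.ComponentModel.TypeConverter API; the class name PaddedIntConverter is hypothetical, and attaching it through ChoTypeConverter has not been verified here.

```csharp
using System;
using System.ComponentModel;
using System.Globalization;

// Hypothetical converter: turns space-padded text such as "42      " into an int.
public class PaddedIntConverter : TypeConverter
{
    // Accept strings in addition to whatever the base converter supports.
    public override bool CanConvertFrom(ITypeDescriptorContext context, Type sourceType)
    {
        return sourceType == typeof(string) || base.CanConvertFrom(context, sourceType);
    }

    // Trim the padding and parse; non-string input is delegated to the base class.
    public override object ConvertFrom(ITypeDescriptorContext context,
        CultureInfo culture, object value)
    {
        string text = value as string;
        if (text == null)
            return base.ConvertFrom(context, culture, value);

        return Int32.Parse(text.Trim(), NumberStyles.Integer,
            culture ?? CultureInfo.InvariantCulture);
    }
}
```

Int32.Parse throws a FormatException on bad input, which is the kind of failure the ErrorMode and fallback value settings described earlier are meant to handle.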
The configuration approach is applicable to both dynamic and POCO entity objects. It gives you the freedom to attach converters to each property at runtime, and it takes precedence over the declarative converters on POCO classes.
Listing 8.3.2.2 Specifying TypeConverters
ChoFixedLengthRecordConfiguration config = new ChoFixedLengthRecordConfiguration();
config.FileHeaderConfiguration.HasHeaderRecord = true;
config.ThrowAndStopOnMissingField = false;
ChoFixedLengthRecordFieldConfiguration idConfig =
new ChoFixedLengthRecordFieldConfiguration("Id", 0, 8);
idConfig.AddConverter(new IntConverter());
config.FixedLengthRecordFieldConfigurations.Add(idConfig);
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Name", 8, 10));
In the above, we construct and attach the IntConverter to the 'Id' field using the AddConverter helper method on the ChoFixedLengthRecordFieldConfiguration object.
Likewise, if you want to remove a converter from it, you can use RemoveConverter on the ChoFixedLengthRecordFieldConfiguration object.
FixedLengthReader leverages both System.ComponentModel.DataAnnotations and Validation Block validation attributes to specify validation rules for individual fields of a POCO entity. Refer to the MSDN site for a list of available DataAnnotations validation attributes.
Listing 8.4.1 Using Validation Attributes in POCO Entity
[ChoFixedLengthFileHeader]
[ChoFixedLengthRecordObject(18)]
public partial class EmployeeRec
{
[ChoFixedLengthRecordField(0, 8, FieldName = "id")]
[ChoTypeConverter(typeof(IntConverter))]
[Range(1, int.MaxValue, ErrorMessage = "Id must be > 0.")]
[ChoFallbackValue(1)]
public int Id { get; set; }
[ChoFixedLengthRecordField(8, 10, FieldName = "Name")]
[Required]
[DefaultValue("ZZZ")]
[ChoFallbackValue("XXX")]
public string Name { get; set; }
}
In the example above, the Range validation attribute is applied to the Id property and the Required validation attribute to the Name property. FixedLengthReader performs validation on them during load when Configuration.ObjectValidationMode is set to ChoObjectValidationMode.MemberLevel or ChoObjectValidationMode.ObjectLevel.
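To see what the Range and Required attributes check on their own, the following sketch validates a record with the standard System.ComponentModel.DataAnnotations Validator class. It runs without ChoETL and does not show how FixedLengthReader invokes validation internally; EmployeeValidationSample and ValidationSketch are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical record carrying the same validation attributes as the listing above.
public class EmployeeValidationSample
{
    [Range(1, int.MaxValue, ErrorMessage = "Id must be > 0.")]
    public int Id { get; set; }

    [Required]
    public string Name { get; set; }
}

public static class ValidationSketch
{
    // Runs all property-level validation attributes and returns the failures (empty when valid).
    public static IList<ValidationResult> Check(object record)
    {
        var results = new List<ValidationResult>();
        Validator.TryValidateObject(record,
            new ValidationContext(record, null, null), results, true);
        return results;
    }
}
```

Passing true as the last argument makes the validator evaluate every attribute on every property rather than only Required.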
Sometimes, you may want to override the declarative validation behaviors defined on the POCO class. You can do this with Cinchoo ETL via the configuration approach. The sample below shows how to override them.
static void ValidationOverridePOCOTest()
{
ChoFixedLengthRecordConfiguration config = new ChoFixedLengthRecordConfiguration();
var idConfig = new ChoFixedLengthRecordFieldConfiguration("Id", 0, 8);
idConfig.Validators = new ValidationAttribute[] { new RequiredAttribute() };
config.FixedLengthRecordFieldConfigurations.Add(idConfig);
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Name", 8, 10));
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Salary", 18, 10) // starts after Id (8) + Name (10)
{ FieldType = typeof(ChoCurrency) });
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser =
new ChoFixedLengthReader<EmployeeRecWithCurrency>(reader, config))
{
// Fixed-length rows: Id (8 chars), Name (10 chars), Salary (10 chars).
// The first row leaves Id blank to exercise the Required validator.
writer.WriteLine("        Carl      $100000   ");
writer.WriteLine("2       Mark      $50000    ");
writer.WriteLine("3       Tom       1000      ");
writer.Flush();
stream.Position = 0;
EmployeeRecWithCurrency rec; // typed so Id, Name and Salary are accessible
while ((rec = parser.Read()) != null)
{
Console.WriteLine(String.Format("Id: {0}", rec.Id));
Console.WriteLine(String.Format("Name: {0}", rec.Name));
Console.WriteLine(String.Format("Salary: {0}", rec.Salary));
}
}
}
public class EmployeeRecWithCurrency
{
public int? Id { get; set; }
public string Name { get; set; }
public ChoCurrency Salary { get; set; }
}
In some cases, you may want to take control and perform manual self-validation within the POCO entity class. This can be achieved by implementing the IChoValidatable interface on the POCO object.
Listing 8.4.2 Manual Validation on POCO Entity
[ChoFixedLengthFileHeader]
[ChoFixedLengthRecordObject(18)]
public partial class EmployeeRec : IChoValidatable
{
[ChoFixedLengthRecordField(0, 8, FieldName = "id")]
[ChoTypeConverter(typeof(IntConverter))]
[Range(1, int.MaxValue, ErrorMessage = "Id must be > 0.")]
[ChoFallbackValue(1)]
public int Id { get; set; }
[ChoFixedLengthRecordField(8, 10, FieldName = "Name")]
[Required]
[DefaultValue("ZZZ")]
[ChoFallbackValue("XXX")]
public string Name { get; set; }
public bool TryValidate
(object target, ICollection<ValidationResult> validationResults)
{
return true;
}
public bool TryValidateFor(object target, string memberName,
ICollection<ValidationResult> validationResults)
{
return true;
}
public void Validate(object target)
{
}
public void ValidateFor(object target, string memberName)
{
}
}
The sample above shows how to implement custom self-validation in a POCO object.
The IChoValidatable interface exposes the methods below:
- TryValidate - Validates the entire object; returns true if all validations pass, otherwise false.
- Validate - Validates the entire object; throws an exception if validation does not pass.
- TryValidateFor - Validates a specific property of the object; returns true if all validations pass, otherwise false.
- ValidateFor - Validates a specific property of the object; throws an exception if validation does not pass.
FixedLengthReader offers industry standard FixedLength parsing out of the box to handle most parsing needs. If the parser does not handle one of your needs, you can use the callback mechanism offered by FixedLengthReader to handle such situations. To participate in the callback mechanism, either the POCO entity object or the DataAnnotation's MetadataType type object must implement the IChoNotifyRecordRead interface.
Tip: Any exceptions raised out of these interface methods will be ignored.
IChoNotifyRecordRead exposes the methods below:
- BeginLoad - Invoked at the beginning of the FixedLength file load
- EndLoad - Invoked at the end of the FixedLength file load
- BeforeRecordLoad - Raised before the FixedLength record load
- AfterRecordLoad - Raised after the FixedLength record load
- RecordLoadError - Raised when the FixedLength record load errors out
- BeforeRecordFieldLoad - Raised before the FixedLength column value load
- AfterRecordFieldLoad - Raised after the FixedLength column value load
- RecordFieldLoadError - Raised when the FixedLength column value load errors out
Listing 9.1 Direct POCO Callback Mechanism Implementation
[ChoFixedLengthFileHeader]
[ChoFixedLengthRecordObject(18)]
public partial class EmployeeRec : IChoNotifyRecordRead
{
[ChoFixedLengthRecordField(0, 8, FieldName = "id")]
[ChoTypeConverter(typeof(IntConverter))]
[Range(1, int.MaxValue, ErrorMessage = "Id must be > 0.")]
[ChoFallbackValue(1)]
public int Id { get; set; }
[ChoFixedLengthRecordField(8, 10, FieldName = "Name", QuoteField = true)]
[Required]
[DefaultValue("ZZZ")]
[ChoFallbackValue("XXX")]
public string Name { get; set; }
public bool AfterRecordFieldLoad(object target, int index,
string propName, object value)
{
throw new NotImplementedException();
}
public bool AfterRecordLoad(object target, int index, object source)
{
throw new NotImplementedException();
}
public bool BeforeRecordFieldLoad(object target, int index,
string propName, ref object value)
{
throw new NotImplementedException();
}
public bool BeforeRecordLoad(object target, int index, ref object source)
{
throw new NotImplementedException();
}
public bool BeginLoad(object source)
{
throw new NotImplementedException();
}
public void EndLoad(object source)
{
throw new NotImplementedException();
}
public bool RecordFieldLoadError(object target, int index,
string propName, object value, Exception ex)
{
throw new NotImplementedException();
}
public bool RecordLoadError
(object target, int index, object source, Exception ex)
{
throw new NotImplementedException();
}
}
Listing 9.2 MetadataType Based Callback Mechanism Implementation
[ChoFixedLengthFileHeader]
[ChoFixedLengthRecordObject(18)]
public class EmployeeRecMeta : IChoNotifyRecordRead
{
[ChoFixedLengthRecordField(0, 8, FieldName = "id")]
[ChoTypeConverter(typeof(IntConverter))]
[Range(1, int.MaxValue, ErrorMessage = "Id must be > 0.")]
[ChoFallbackValue(1)]
public int Id { get; set; }
[ChoFixedLengthRecordField(8, 10, FieldName = "Name", QuoteField = true)]
[Required]
[DefaultValue("ZZZ")]
[ChoFallbackValue("XXX")]
public string Name { get; set; }
public bool AfterRecordFieldLoad(object target, int index,
string propName, object value)
{
throw new NotImplementedException();
}
public bool AfterRecordLoad(object target, int index, object source)
{
throw new NotImplementedException();
}
public bool BeforeRecordFieldLoad(object target, int index,
string propName, ref object value)
{
throw new NotImplementedException();
}
public bool BeforeRecordLoad(object target, int index, ref object source)
{
throw new NotImplementedException();
}
public bool BeginLoad(object source)
{
throw new NotImplementedException();
}
public void EndLoad(object source)
{
throw new NotImplementedException();
}
public bool RecordFieldLoadError(object target, int index,
string propName, object value, Exception ex)
{
throw new NotImplementedException();
}
public bool RecordLoadError
(object target, int index, object source, Exception ex)
{
throw new NotImplementedException();
}
}
[MetadataType(typeof(EmployeeRecMeta))]
public partial class EmployeeRec
{
public int Id { get; set; }
public string Name { get; set; }
}
The BeginLoad callback is invoked once at the beginning of the FixedLength file load. source is the FixedLength file stream object. Here, you have a chance to inspect the stream. Return true to continue the FixedLength load, or return false to stop the parsing.
Listing 9.1.1 BeginLoad Callback Sample
public bool BeginLoad(object source)
{
StreamReader sr = source as StreamReader;
return true;
}
The EndLoad callback is invoked once at the end of the FixedLength file load. source is the FixedLength file stream object. Here, you have a chance to inspect the stream and perform any post-load steps on it.
Listing 9.2.1 EndLoad Callback Sample
public void EndLoad(object source)
{
StreamReader sr = source as StreamReader;
}
The BeforeRecordLoad callback is invoked before each record line in the FixedLength file is loaded. target is the instance of the POCO record object. index is the line index in the file. source is the FixedLength record line. Here, you have a chance to inspect the line and override it with a new line if you want to.
TIP: If you want to skip the line from loading, set the source to null.
TIP: If you want to take control of parsing and loading the record properties yourself, set the source to String.Empty.
Return true to continue the load process; otherwise, return false to stop the process.
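As an illustration of the first tip, the sketch below skips lines that start with a marker character by setting source to null. The method has the same signature as the BeforeRecordLoad callback in this article, but the class is standalone so it compiles without ChoETL; in real use the method would live on a type implementing IChoNotifyRecordRead. The class name and the "#" marker are assumptions made for the example.

```csharp
using System;

// Hypothetical standalone class; mirrors the BeforeRecordLoad callback signature.
public class SkipCommentLinesSketch
{
    public bool BeforeRecordLoad(object target, int index, ref object source)
    {
        string line = source as string;

        // Assumed convention for this example: lines beginning with '#' are not data.
        if (line != null && line.TrimStart().StartsWith("#", StringComparison.Ordinal))
            source = null;   // per the tip above, a null source skips this line

        return true;         // keep loading the remaining records
    }
}
```

Returning true even for skipped lines matters here: returning false would stop the whole load instead of skipping one record.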
Listing 9.3.1 BeforeRecordLoad Callback Sample
public bool BeforeRecordLoad(object target, int index, ref object source)
{
string line = source as string;
return true;
}
The AfterRecordLoad callback is invoked after each record line in the FixedLength file is loaded. target is the instance of the POCO record object. index is the line index in the file. source is the FixedLength record line. Here, you have a chance to perform any post-step operation on the record line.
Return true to continue the load process; otherwise, return false to stop the process.
Listing 9.4.1 AfterRecordLoad Callback Sample
public bool AfterRecordLoad(object target, int index, object source)
{
string line = source as string;
return true;
}
The RecordLoadError callback is invoked if an error is encountered while loading a record line. target is the instance of the POCO record object. index is the line index in the file. source is the FixedLength record line. ex is the exception object. Here, you have a chance to handle the exception. This method is invoked only when Configuration.ErrorMode is ReportAndContinue.
Return true to continue the load process; otherwise, return false to stop the process.
Listing 9.5.1 RecordLoadError Callback Sample
public bool RecordLoadError(object target, int index, object source, Exception ex)
{
string line = source as string;
return true;
}
The BeforeRecordFieldLoad callback is invoked before each FixedLength record column is loaded. target is the instance of the POCO record object. index is the line index in the file. propName is the FixedLength record property name. value is the FixedLength column value. Here, you have a chance to inspect the FixedLength record property value and perform any custom validations, etc.
Return true to continue the load process; otherwise, return false to stop the process.
Listing 9.6.1 BeforeRecordFieldLoad Callback Sample
public bool BeforeRecordFieldLoad(object target, int index, string propName, ref object value)
{
return true;
}
This callback is invoked after each FixedLength
record column is loaded. target
is the instance of the POCO record object. index
is the line index in the file. propName
is the FixedLength
record property name. value
is the FixedLength
column value. Any post field operation can be performed here, like computing other properties, validations, etc.
Return true
to continue the load process, otherwise return false
to stop the process.
Listing 9.7.1 AfterRecordFieldLoad Callback Sample
public bool AfterRecordFieldLoad(object target, int index, string propName, object value)
{
return true;
}
The RecordFieldLoadError callback is invoked when an error is encountered while loading a FixedLength record column value. target is the instance of the POCO record object. index is the line index in the file. propName is the FixedLength record property name. value is the FixedLength column value. ex is the exception object. Here, you have a chance to handle the exception. This method is invoked only after the FixedLengthReader performs the following two steps:
- FixedLengthReader looks for the FallbackValue of each FixedLength property. If present, it tries to assign that value to the property.
- If the FallbackValue is not present and Configuration.ErrorMode is specified as ReportAndContinue, this callback is executed.
Return true to continue the load process; otherwise, return false to stop the process.
Listing 9.8.1 RecordFieldLoadError Callback Sample
public bool RecordFieldLoadError(object target, int index,
string propName, object value, Exception ex)
{
return true;
}
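The order of events around a field load error can be sketched as a small standalone model. This is not the library's code: the ErrorMode enum, the Resolve helper and the onError delegate (standing in for RecordFieldLoadError) are hypothetical names that only mirror the sequence described above, namely fallback value first, then the callback when the error mode is ReportAndContinue.

```csharp
using System;

// Hypothetical stand-in for the reader's error mode setting.
public enum ErrorMode { Throw, IgnoreAndContinue, ReportAndContinue }

public static class FieldErrorFlowSketch
{
    // Returns the value to assign to an int field.
    // 'onError' stands in for the RecordFieldLoadError callback.
    public static object Resolve(string raw, object fallback, ErrorMode mode,
        Func<Exception, bool> onError)
    {
        try
        {
            return int.Parse(raw.Trim());
        }
        catch (FormatException ex)
        {
            // Step 1: a configured fallback value wins and the callback is not called.
            if (fallback != null)
                return fallback;

            // Step 2: no fallback, so report through the callback if the mode allows it.
            if (mode == ErrorMode.ReportAndContinue && onError(ex))
                return null; // the callback asked to continue the load

            throw; // otherwise the load stops
        }
    }

    public static void Main()
    {
        bool called = false;
        Func<Exception, bool> callback = e => { called = true; return true; };

        object withFallback = Resolve("abc", 1, ErrorMode.ReportAndContinue, callback);
        Console.WriteLine(withFallback); // 1
        Console.WriteLine(called);       // False

        object withoutFallback = Resolve("abc", null, ErrorMode.ReportAndContinue, callback);
        Console.WriteLine(withoutFallback == null); // True
        Console.WriteLine(called);                  // True
    }
}
```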
FixedLengthReader automatically detects and loads the configured settings from the POCO entity. At runtime, you can customize and tweak these parameters before FixedLength parsing. FixedLengthReader exposes a Configuration property of type ChoFixedLengthRecordConfiguration. Using this property, you can customize them.
Listing 10.1 Customizing FixedLengthReader at Run-time
class Program
{
static void Main(string[] args)
{
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser = new ChoFixedLengthReader<EmployeeRec>(reader))
{
writer.WriteLine("1 Carl ");
writer.WriteLine("2 Mark ");
writer.Flush();
stream.Position = 0;
// Typed as EmployeeRec so that row.Id and row.Name compile
EmployeeRec row = null;
parser.Configuration.ColumnCountStrict = true;
while ((row = parser.Read()) != null)
{
Console.WriteLine(String.Format("Id: {0}", row.Id));
Console.WriteLine(String.Format("Name: {0}", row.Name));
}
}
}
}
FixedLengthReader exposes the AsDataReader helper method to retrieve the FixedLength records in a .NET datareader object. DataReader objects are forward-only streams of data. This datareader can be used in a few places, such as bulk copying data to a database using SqlBulkCopy, loading a disconnected DataTable, etc.
Listing 11.1 Reading as DataReader Sample
static void AsDataReaderTest()
{
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser = new ChoFixedLengthReader<EmployeeRec>(reader))
{
writer.WriteLine("1 Carl ");
writer.WriteLine("2 Mark ");
writer.Flush();
stream.Position = 0;
IDataReader dr = parser.AsDataReader();
while (dr.Read())
{
Console.WriteLine("Id: {0}, Name: {1}", dr[0], dr[1]);
}
}
}
FixedLengthReader exposes the AsDataTable helper method to retrieve the FixedLength records in a .NET DataTable object. It can then be persisted to disk, displayed in grids/controls, or stored in memory like any other object.
Listing 12.1 Reading as DataTable Sample
static void AsDataTableTest()
{
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser = new ChoFixedLengthReader<EmployeeRec>(reader))
{
writer.WriteLine("Id Name ");
writer.WriteLine("1 Carl ");
writer.Flush();
stream.Position = 0;
DataTable dt = parser.AsDataTable();
foreach (DataRow dr in dt.Rows)
{
Console.WriteLine("Id: {0}, Name: {1}", dr[0], dr[1]);
}
}
}
So far, the article has explained how to use FixedLengthReader with POCO objects. FixedLengthReader also supports loading a FixedLength file without a POCO object. It leverages the .NET dynamic feature. The sample below shows how to read a FixedLength stream without a POCO object.
If you have a FixedLength file, you can parse and load it with minimal or zero configuration. If the FixedLength file does not have a header record line, the parser automatically names the columns Column1, Column2, etc.
The sample below shows it:
Listing 13.1 Loading FixedLength File without Header Sample
class Program
{
static void Main(string[] args)
{
dynamic row;
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser = new ChoFixedLengthReader(reader))
{
writer.WriteLine("1 Carl ");
writer.WriteLine("2 Mark ");
writer.Flush();
stream.Position = 0;
while ((row = parser.Read()) != null)
{
Console.WriteLine(row.Column1);
}
}
}
}
If the FixedLength file has a header, you can state that in the configuration by setting HasHeaderRecord to true and parse the file as simply as shown below:
Listing 13.2 Loading FixedLength file with Header Sample
class Program
{
static void Main(string[] args)
{
ChoFixedLengthRecordConfiguration config =
new ChoFixedLengthRecordConfiguration();
config.FixedLengthFileHeaderConfiguration.HasHeaderRecord = true;
dynamic row;
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser = new ChoFixedLengthReader(reader, config))
{
writer.WriteLine("Id Name ");
writer.WriteLine("1 Carl ");
writer.WriteLine("2 Mark ");
writer.Flush();
stream.Position = 0;
while ((row = parser.Read()) != null)
{
Console.WriteLine(row.Name);
}
}
}
}
The above example automatically discovers the FixedLength columns from the header and parses the file.
You can override the default behavior of discovering columns automatically by adding field configurations manually and passing them to FixedLengthReader for parsing the file.
The sample below shows how to do it.
Listing 13.3 Loading FixedLength File with Configuration
class Program
{
static void Main(string[] args)
{
ChoFixedLengthRecordConfiguration config =
new ChoFixedLengthRecordConfiguration();
config.FixedLengthFileHeaderConfiguration.HasHeaderRecord = true;
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Id", 0, 8));
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Name", 8, 10));
dynamic row;
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser = new ChoFixedLengthReader(reader, config))
{
writer.WriteLine("Id Name ");
writer.WriteLine("1 Carl ");
writer.WriteLine("2 Mark ");
writer.Flush();
stream.Position = 0;
while ((row = parser.Read()) != null)
{
Console.WriteLine(row.Name);
}
}
}
}
To completely turn off the auto column discovery, you will have to set ChoFixedLengthRecordConfiguration.AutoDiscoverColumns
to false
.
DefaultValue is the value assigned to the property when the FixedLength value is empty or whitespace (controlled via IgnoreFieldValueMode).
Any POCO entity property can be given a default value using System.ComponentModel.DefaultValueAttribute.
For dynamic object members, or to override the declarative POCO object member's default value specification, you can do so through configuration as shown below:
ChoFixedLengthRecordConfiguration config = new ChoFixedLengthRecordConfiguration();
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Id", 0, 8));
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Name", 8, 10)
{ DefaultValue = "NoName" });
FallbackValue is the value assigned to the property when the FixedLength value fails to be set. The fallback value is only set when ErrorMode is either IgnoreAndContinue or ReportAndContinue.
Any POCO entity property can be given a fallback value using ChoETL.ChoFallbackValueAttribute.
For dynamic object members, or to override the declarative POCO object member's fallback values, you can do so through configuration as shown below:
ChoFixedLengthRecordConfiguration config = new ChoFixedLengthRecordConfiguration();
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Id", 0, 8));
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration
("Name", 8, 10) { FallbackValue = "Tom" });
In the typeless dynamic object model, the reader reads each field value and populates the dynamic object members with 'string' values. If you want to enforce the type and do extra type checking during load, you can do so by declaring the field type in the field configuration.
Listing 13.3.1 Defining FieldType
ChoFixedLengthRecordConfiguration config = new ChoFixedLengthRecordConfiguration();
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Id", 0, 8) { FieldType = typeof(int) });
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Name", 8, 10));
The above sample shows how to define the field type as 'int' for the 'Id' field. This instructs the FixedLengthReader to parse and convert the value to an integer before assigning it. This extra type safety prevents incorrect values from being loaded into the object while parsing.
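To show what a start position, a size and a field type amount to, here is a rough sketch in plain .NET that slices a fixed-width line and converts the 'Id' slice to an integer. The Slice helper and the FixedWidthSketch class are hypothetical and are not part of Cinchoo ETL; the sketch only illustrates the idea behind a typed field definition such as ("Id", 0, 8).

```csharp
using System;
using System.Globalization;

public static class FixedWidthSketch
{
    // Extracts the field that starts at 'start' and spans 'size' characters.
    public static string Slice(string line, int start, int size)
    {
        if (start >= line.Length)
            return string.Empty;

        int length = Math.Min(size, line.Length - start);
        return line.Substring(start, length).Trim();
    }

    public static void Main()
    {
        // Id occupies 8 characters, Name occupies the next 10.
        string line = "1".PadRight(8) + "Carl".PadRight(10);

        string rawId = Slice(line, 0, 8);
        string name = Slice(line, 8, 10);

        int id;
        bool ok = int.TryParse(rawId, NumberStyles.Integer,
            CultureInfo.InvariantCulture, out id);
        Console.WriteLine(ok && id == 1);  // True
        Console.WriteLine(name == "Carl"); // True

        // A non-numeric Id fails the typed conversion instead of loading silently.
        string badLine = "abc".PadRight(8) + "Mark".PadRight(10);
        Console.WriteLine(int.TryParse(Slice(badLine, 0, 8), out id)); // False
    }
}
```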
Most primitive types are automatically converted and assigned to the properties by FixedLengthReader. If the value of the FixedLength field can't automatically be converted into the type of the property, you can specify custom or built-in .NET converters to convert the value. These can be either IValueConverter or TypeConverter converters.
In the dynamic object model, you can specify these converters via configuration. See the example below for the approach taken to specify type converters for FixedLength columns.
Listing 13.4.1 Specifying TypeConverters
ChoFixedLengthRecordConfiguration config = new ChoFixedLengthRecordConfiguration();
config.FileHeaderConfiguration.HasHeaderRecord = true;
config.ThrowAndStopOnMissingField = false;
ChoFixedLengthRecordFieldConfiguration idConfig =
new ChoFixedLengthRecordFieldConfiguration("Id", 0, 8);
idConfig.AddConverter(new IntConverter());
config.FixedLengthRecordFieldConfigurations.Add(idConfig);
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Name", 8, 10));
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Name1", 18, 10));
In the above, we construct and attach the IntConverter to the 'Id' field using the AddConverter helper method of the ChoFixedLengthRecordFieldConfiguration object.
Likewise, if you want to remove any converter from it, you can use RemoveConverter on the ChoFixedLengthRecordFieldConfiguration object.
FixedLengthReader leverages both System.ComponentModel.DataAnnotations and Validation Block validation attributes to specify validation rules for individual FixedLength fields. Refer to the MSDN site for a list of available DataAnnotations validation attributes.
Listing 13.5.1 Specifying Validations
ChoFixedLengthRecordConfiguration config = new ChoFixedLengthRecordConfiguration();
config.FileHeaderConfiguration.HasHeaderRecord = true;
config.ThrowAndStopOnMissingField = false;
ChoFixedLengthRecordFieldConfiguration idConfig =
new ChoFixedLengthRecordFieldConfiguration("Id", 0, 8);
idConfig.Validators = new ValidationAttribute[] { new RangeAttribute(0, 100) };
config.FixedLengthRecordFieldConfigurations.Add(idConfig);
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Name", 8, 10));
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Name1", 18, 10));
In the example above, we used the Range validation attribute for the Id property. FixedLengthReader performs validation on them during load when Configuration.ObjectValidationMode is set to ChoObjectValidationMode.MemberLevel or ChoObjectValidationMode.ObjectLevel.
P.S.: Self validation is NOT supported in the dynamic object model.
If you already have an existing sealed POCO object, or the object is in a 3rd party library, you can use it with FixedLengthReader. All you need is the FixedLength file with a header in it.
Listing 14.1 Existing sealed POCO Object
public sealed class ThirdPartyRec
{
public int Id
{
get;
set;
}
public string Name
{
get;
set;
}
}
Listing 14.2 Consuming FixedLength File
class Program
{
static void Main(string[] args)
{
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser = new ChoFixedLengthReader<ThirdPartyRec>
(reader).WithFirstLineHeader())
{
writer.WriteLine("Id Name ");
writer.WriteLine("1 Carl ");
writer.WriteLine("2 Mark ");
writer.Flush();
stream.Position = 0;
// Typed as ThirdPartyRec so that row.Id and row.Name compile
ThirdPartyRec row = null;
while ((row = parser.Read()) != null)
{
Console.WriteLine(String.Format("Id: {0}", row.Id));
Console.WriteLine(String.Format("Name: {0}", row.Name));
}
}
}
}
In this case, FixedLengthReader reverse discovers the FixedLength columns from the FixedLength file and loads the data into the POCO object. If the FixedLength file structure and the POCO object match, the load succeeds, populating all corresponding data into its properties. If a property is missing for any FixedLength column, FixedLengthReader silently ignores it and continues with the rest.
You can override this behavior by setting the ChoFixedLengthRecordConfiguration.ThrowAndStopOnMissingField property to true. In this case, the FixedLengthReader will throw a ChoMissingRecordFieldException if a property is missing for a FixedLength column.
FixedLengthReader throws different types of exceptions in different situations:
- ChoParserException - The FixedLength file is bad and the parser is not able to recover.
- ChoRecordConfigurationException - Raised when any invalid configuration settings are specified.
- ChoMissingRecordFieldException - Raised when a property is missing for a FixedLength column.
If a FixedLength file contains column values with newline characters in them, ChoFixedLengthReader can handle them when the values are surrounded by quotes.
Listing 16.1.1 Multiline Column Values in FixedLength File
Id Name
1 "Carl
's"
2 Mark
In the above, the record with Id 1 has a name that spans multiple lines and is surrounded by quotes. ChoFixedLengthReader recognizes this situation and loads the value properly.
ChoFixedLengthReader can read FixedLength column values with single quotes in them seamlessly. No surrounding quotes are required.
Listing 16.2.1 FixedLength Column Value with Single Quotes
Id,Name
1,Tom Cassawaw
2,Carl'Malcolm
3,Mark
In the above, the record with Id 2 has a name with a single quote (') in it. ChoFixedLengthReader recognizes this situation and loads these values successfully.
Cinchoo ETL works well with the data annotations MetadataType model. It is a way to attach a metadata class to a data model class. In this associated class, you provide additional metadata information that is not in the data model. Its role is to add attributes to a class without having to modify that class. You add the MetadataType attribute, which takes a single parameter, to the data class, and the class it names carries all the attributes. This is useful when the POCO classes are auto generated by a tool (Entity Framework, MVC, etc.), because you can add new things without touching the generated file. It also promotes modularization by separating the concerns into multiple classes.
For more information about it, please search on MSDN.
Listing 17.1 MetadataType Annotation Usage Sample
[MetadataType(typeof(EmployeeRecMeta))]
public class EmployeeRec
{
public int Id { get; set; }
public string Name { get; set; }
}
[ChoFixedLengthFileHeader]
[ChoFixedLengthRecordObject(18)]
public class EmployeeRecMeta : IChoNotifyRecordRead, IChoValidatable
{
[ChoFixedLengthRecordField(0, 8, FieldName = "id",
ErrorMode = ChoErrorMode.ReportAndContinue )]
[ChoTypeConverter(typeof(IntConverter))]
[Range(1, 1, ErrorMessage = "Id must be > 0.")]
[ChoFallbackValue(1)]
public int Id { get; set; }
[ChoFixedLengthRecordField(8, 10, FieldName = "Name", QuoteField = true)]
[StringLength(1)]
[DefaultValue("ZZZ")]
[ChoFallbackValue("XXX")]
public string Name { get; set; }
public bool AfterRecordFieldLoad(object target, int index,
string propName, object value)
{
throw new NotImplementedException();
}
public bool AfterRecordLoad(object target, int index, object source)
{
throw new NotImplementedException();
}
public bool BeforeRecordFieldLoad(object target, int index,
string propName, ref object value)
{
throw new NotImplementedException();
}
public bool BeforeRecordLoad(object target, int index, ref object source)
{
throw new NotImplementedException();
}
public bool BeginLoad(object source)
{
throw new NotImplementedException();
}
public void EndLoad(object source)
{
throw new NotImplementedException();
}
public bool RecordFieldLoadError(object target, int index,
string propName, object value, Exception ex)
{
throw new NotImplementedException();
}
public bool RecordLoadError(object target, int index, object source, Exception ex)
{
throw new NotImplementedException();
}
public bool TryValidate(object target,
ICollection<ValidationResult> validationResults)
{
return true;
}
public bool TryValidateFor
(object target, string memberName,
ICollection<ValidationResult> validationResults)
{
return true;
}
public void Validate(object target)
{
}
public void ValidateFor(object target, string memberName)
{
}
}
In the above, EmployeeRec is the data class. It contains only domain specific properties and operations, which keeps it a very simple class to look at.
We separate the validation, callback mechanism, configuration, etc. into the metadata type class, EmployeeRecMeta.
If the POCO entity class is auto-generated, exposed via a library, or sealed, you cannot attach the FixedLength schema definition to it declaratively. In such cases, you can choose one of the options below to specify the FixedLength layout configuration:
- Manual Configuration
- Auto Map Configuration
- Attaching MetadataType class
I'm going to show you how to configure the below POCO entity class with each approach.
Listing 18.1 Sealed POCO Entity Class
public sealed class EmployeeRec
{
public int Id { get; set; }
public string Name { get; set; }
}
Define a brand new configuration object from scratch and add all the necessary FixedLength fields to the ChoFixedLengthRecordConfiguration.FixedLengthRecordFieldConfigurations collection property. This option gives you greater flexibility to control the configuration of FixedLength parsing. The downside is that it is easy to make mistakes, and the configuration is hard to manage if the FixedLength file layout is large.
Listing 18.1.1 Manual Configuration
ChoFixedLengthRecordConfiguration config = new ChoFixedLengthRecordConfiguration();
config.FixedLengthFileHeaderConfiguration.HasHeaderRecord = true;
config.ThrowAndStopOnMissingField = true;
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Id", 0, 8));
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Name", 8, 10));
This is an alternative and much less error-prone approach that auto maps the FixedLength columns for the POCO entity class.
First, define a schema class for the EmployeeRec POCO entity class as below:
Listing 18.2.1 Auto Map Class
public class EmployeeRecMap
{
[ChoFixedLengthRecordField(0, 8, FieldName = "id")]
public int Id { get; set; }
[ChoFixedLengthRecordField(8, 10, FieldName = "Name")]
public string Name { get; set; }
}
Then you can use it to auto map FixedLength
columns by using ChoFixedLengthRecordConfiguration.MapRecordFields
method.
Listing 18.2.2 Using Auto Map Configuration
ChoFixedLengthRecordConfiguration config = new ChoFixedLengthRecordConfiguration();
config.MapRecordFields<EmployeeRecMap>();
foreach (var e in new ChoFixedLengthReader<EmployeeRec>("Emp.txt", config))
Console.WriteLine(e.ToString());
This is another approach: attaching a MetadataType class to the POCO entity object. The previous approach takes care of auto mapping the FixedLength columns only. Other configuration properties like property converters, parser parameters, default/fallback values, etc. are not considered.
This model accounts for everything by defining a MetadataType class and specifying the FixedLength configuration parameters declaratively. This is useful when your POCO entity is sealed and not a partial class. It is also one of the more favorable and less error-prone approaches to configuring FixedLength parsing of a POCO entity.
Listing 18.3.1 Define MetadataType class
[ChoFixedLengthFileHeader()]
[ChoFixedLengthRecordObject(18)]
public class EmployeeRecMeta : IChoNotifyRecordRead, IChoValidatable
{
[ChoFixedLengthRecordField(0, 8, FieldName = "id",
ErrorMode = ChoErrorMode.ReportAndContinue )]
[ChoTypeConverter(typeof(IntConverter))]
[Range(1, 1, ErrorMessage = "Id must be > 0.")]
public int Id { get; set; }
[ChoFixedLengthRecordField(8, 10, FieldName = "Name", QuoteField = true)]
[StringLength(1)]
[DefaultValue("ZZZ")]
[ChoFallbackValue("XXX")]
public string Name { get; set; }
public bool AfterRecordFieldLoad(object target, int index,
string propName, object value)
{
throw new NotImplementedException();
}
public bool AfterRecordLoad(object target, int index, object source)
{
throw new NotImplementedException();
}
public bool BeforeRecordFieldLoad(object target, int index,
string propName, ref object value)
{
throw new NotImplementedException();
}
public bool BeforeRecordLoad(object target, int index, ref object source)
{
throw new NotImplementedException();
}
public bool BeginLoad(object source)
{
throw new NotImplementedException();
}
public void EndLoad(object source)
{
throw new NotImplementedException();
}
public bool RecordFieldLoadError(object target, int index,
string propName, object value, Exception ex)
{
return true;
}
public bool RecordLoadError
(object target, int index, object source, Exception ex)
{
throw new NotImplementedException();
}
public bool TryValidate(object target,
ICollection<ValidationResult> validationResults)
{
return true;
}
public bool TryValidateFor
(object target, string memberName, ICollection<ValidationResult> validationResults)
{
return true;
}
public void Validate(object target)
{
}
public void ValidateFor(object target, string memberName)
{
}
}
Listing 18.3.2 Attaching MetadataType Class
ChoMetadataObjectCache.Default.Attach<EmployeeRec>(new EmployeeRecMeta());
foreach (var e in new ChoFixedLengthReader<EmployeeRec>("Emp.txt"))
Console.WriteLine(e.ToString());
LoadText is a nifty little helper method to parse and load a FixedLength text string into objects.
Listing 19.1 Using LoadText Method
static void LoadTextTest()
{
string txt = "Id Name \r\n1 Carl \r\n2 Mark ";
foreach (dynamic e in ChoFixedLengthReader.LoadText(txt).WithFirstLineHeader())
{
Console.WriteLine(String.Format("Id: {0}", e.Id));
Console.WriteLine(String.Format("Name: {0}", e.Name));
}
}
Cinchoo ETL automatically parses and converts each FixedLength column value to the corresponding FixedLength column's underlying data type seamlessly. Most of the basic .NET types are handled automatically without any setup needed.
This is achieved through two key settings in the ETL system:
- ChoFixedLengthRecordConfiguration.CultureInfo - Represents information about a specific culture, including the names of the culture, the writing system, and the calendar used, as well as access to culture-specific objects that provide information for common operations, such as formatting dates and sorting strings. Default is 'en-US'.
- ChoTypeConverterFormatSpec - A global format specifier class that holds the formatting specs for all intrinsic .NET types.
In this section, I'm going to talk about changing the default format specs for each .NET intrinsic data type according to parsing needs.
ChoTypeConverterFormatSpec is a singleton class; the instance is exposed via the 'Instance' static member. It is thread local, which means that a separate instance copy is kept on each thread.
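The practical consequence of a thread-local instance is that a setting made on one thread is not visible on another. Here is a small sketch of that behavior using the standard ThreadLocal class; FormatSpecSketch is a hypothetical stand-in with a single property, not the library class itself.

```csharp
using System;
using System.Globalization;
using System.Threading;

// Hypothetical stand-in that mimics the thread-local singleton pattern.
public class FormatSpecSketch
{
    public static readonly ThreadLocal<FormatSpecSketch> Instance =
        new ThreadLocal<FormatSpecSketch>(() => new FormatSpecSketch());

    public NumberStyles? IntNumberStyle { get; set; }
}

public static class Program
{
    public static void Main()
    {
        // Set the style on the main thread only.
        FormatSpecSketch.Instance.Value.IntNumberStyle = NumberStyles.AllowParentheses;

        bool seenOnWorker = true;
        var worker = new Thread(() =>
        {
            // The worker thread gets its own fresh copy, so nothing is set here.
            seenOnWorker = FormatSpecSketch.Instance.Value.IntNumberStyle.HasValue;
        });
        worker.Start();
        worker.Join();

        Console.WriteLine(FormatSpecSketch.Instance.Value.IntNumberStyle.HasValue); // True
        Console.WriteLine(seenOnWorker);                                            // False
    }
}
```

If the load runs on a different thread than the one that configured the specs, it is worth setting them again on the loading thread.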
There are two sets of format spec members for each intrinsic type, one for loading and another for writing the value, except for the Boolean, Enum, and DateTime types. These types have only one member for both loading and writing operations.
Specifying an intrinsic data type's format specs through ChoTypeConverterFormatSpec has a system wide impact. For example, setting ChoTypeConverterFormatSpec.IntNumberStyle = NumberStyles.AllowParentheses causes all integer members of FixedLength objects to allow parentheses. If you want to override this behavior and have a specific FixedLength data member handle its own parsing of the FixedLength value, independent of the global system wide setting, you can do so by specifying a TypeConverter at the FixedLength field member level. Refer to section 13.4 for more information.
NumberStyles (optional) are used for loading values from a FixedLength stream, and Format strings are used for writing values to a FixedLength stream.
In this article, I'll briefly cover using NumberStyles for loading FixedLength data from a stream. These values are optional. They determine the styles permitted for each type during parsing of the FixedLength file. The system automatically figures out how to parse and load the values from the underlying Culture. In an odd situation, you may want to override this and set the styles yourself in order to load the file successfully. Refer to MSDN for more about NumberStyles and its values.
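To see what a NumberStyles value changes, here is a short standalone sketch that uses only the .NET int.TryParse API, not the reader. It shows that the default integer style rejects a value in parentheses, while adding NumberStyles.AllowParentheses reads it as a negative number. The class name NumberStylesSketch is just for illustration.

```csharp
using System;
using System.Globalization;

public static class NumberStylesSketch
{
    public static void Main()
    {
        CultureInfo culture = CultureInfo.InvariantCulture;
        int value;

        // The default integer style does not permit parentheses.
        bool plain = int.TryParse("(42)", NumberStyles.Integer, culture, out value);
        Console.WriteLine(plain); // False

        // With AllowParentheses, "(42)" is parsed as a negative number.
        bool ok = int.TryParse("(42)",
            NumberStyles.Integer | NumberStyles.AllowParentheses, culture, out value);
        Console.WriteLine(ok && value == -42); // True
    }
}
```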
Listing 20.1.1 ChoTypeConverterFormatSpec Members
public class ChoTypeConverterFormatSpec
{
public static readonly ThreadLocal<ChoTypeConverterFormatSpec>
Instance = new ThreadLocal<ChoTypeConverterFormatSpec>(() =>
new ChoTypeConverterFormatSpec());
public string DateTimeFormat { get; set; }
public ChoBooleanFormatSpec BooleanFormat { get; set; }
public ChoEnumFormatSpec EnumFormat { get; set; }
public NumberStyles? CurrencyNumberStyle { get; set; }
public string CurrencyFormat { get; set; }
public NumberStyles? BigIntegerNumberStyle { get; set; }
public string BigIntegerFormat { get; set; }
public NumberStyles? ByteNumberStyle { get; set; }
public string ByteFormat { get; set; }
public NumberStyles? SByteNumberStyle { get; set; }
public string SByteFormat { get; set; }
public NumberStyles? DecimalNumberStyle { get; set; }
public string DecimalFormat { get; set; }
public NumberStyles? DoubleNumberStyle { get; set; }
public string DoubleFormat { get; set; }
public NumberStyles? FloatNumberStyle { get; set; }
public string FloatFormat { get; set; }
public string IntFormat { get; set; }
public NumberStyles? IntNumberStyle { get; set; }
public string UIntFormat { get; set; }
public NumberStyles? UIntNumberStyle { get; set; }
public NumberStyles? LongNumberStyle { get; set; }
public string LongFormat { get; set; }
public NumberStyles? ULongNumberStyle { get; set; }
public string ULongFormat { get; set; }
public NumberStyles? ShortNumberStyle { get; set; }
public string ShortFormat { get; set; }
public NumberStyles? UShortNumberStyle { get; set; }
public string UShortFormat { get; set; }
}
The sample below shows how to load a FixedLength data stream containing 'sv-SE' (Swedish) culture specific data using FixedLengthReader. The input feed also comes with 'EmployeeNo' values containing parentheses. In order to make the load successful, we have to set ChoTypeConverterFormatSpec.IntNumberStyle to NumberStyles.AllowParentheses.
Listing 20.1.2 Using ChoTypeConverterFormatSpec in Code
static void UsingFormatSpecs()
{
ChoFixedLengthRecordConfiguration config = new ChoFixedLengthRecordConfiguration();
// "sv-SE" is the .NET culture name for Swedish (Sweden)
config.Culture = new System.Globalization.CultureInfo("sv-SE");
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Id", 0, 8) { FieldType = typeof(int) });
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Name", 8, 10));
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Salary", 18, 20)
{ FieldType = typeof(ChoCurrency) });
ChoTypeConverterFormatSpec.Instance.IntNumberStyle = NumberStyles.AllowParentheses;
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser = new ChoFixedLengthReader(reader, config).WithFirstLineHeader())
{
writer.WriteLine("Id Name Salary ");
writer.WriteLine("1 Carl 12.345679 kr ");
writer.WriteLine("2 Mark 50000 kr ");
writer.Flush();
stream.Position = 0;
dynamic row = null;
while ((row = parser.Read()) != null)
{
Console.WriteLine(String.Format("Id: {0}", row.Id));
Console.WriteLine(String.Format("Name: {0}", row.Name));
Console.WriteLine(String.Format("Salary: {0}", row.Salary));
}
}
}
Cinchoo ETL provides the ChoCurrency object to read and write currency values in FixedLength files. ChoCurrency is a wrapper class that holds the currency value as a decimal, along with support for serializing it in text format during FixedLength load.
Listing 20.2.1 Using Currency Members in Dynamic Model
static void CurrencyDynamicTest()
{
ChoFixedLengthRecordConfiguration config = new ChoFixedLengthRecordConfiguration();
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Id", 0, 8));
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Name", 8, 10));
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Salary", 18, 20)
{ FieldType = typeof(ChoCurrency) });
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser = new ChoFixedLengthReader(reader, config))
{
writer.WriteLine("Id Name Salary ");
writer.WriteLine("1 Carl 100000 ");
writer.WriteLine("2 Mark 250000 ");
writer.Flush();
stream.Position = 0;
dynamic row;
while ((row = parser.Read()) != null)
{
Console.WriteLine(String.Format("Id: {0}", row.Id));
Console.WriteLine(String.Format("Name: {0}", row.Name));
Console.WriteLine(String.Format("Salary: {0}", row.Salary));
}
}
}
The sample above shows how to load currency values using the dynamic object model. By default, all the members of a dynamic object are treated as string type unless specified explicitly via ChoFixedLengthRecordFieldConfiguration.FieldType. By specifying the field type as ChoCurrency for the 'Salary' FixedLength field, FixedLengthReader loads the values as currency objects.
P.S.: The format of the currency value is determined by FixedLengthReader through ChoRecordConfiguration.Culture and ChoTypeConverterFormatSpec.CurrencyNumberStyle.
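As a rough illustration of how a culture and a currency number style work together, the sketch below parses a currency string with plain .NET decimal.TryParse. It does not use ChoCurrency. The NumberFormatInfo is a hand-built stand-in for a culture (invariant separators with an assumed "$" symbol), chosen so that the example does not depend on the cultures installed on the machine.

```csharp
using System;
using System.Globalization;

public static class CurrencyParsingSketch
{
    public static void Main()
    {
        // Stand-in for a culture: invariant separators plus an assumed "$" symbol.
        var format = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
        format.CurrencySymbol = "$";

        decimal salary;

        // NumberStyles.Currency permits the currency symbol and group separators.
        bool ok = decimal.TryParse("$100,000.50", NumberStyles.Currency, format, out salary);
        Console.WriteLine(ok && salary == 100000.50m); // True

        // NumberStyles.Number does not permit a currency symbol.
        bool asNumber = decimal.TryParse("$100,000.50", NumberStyles.Number, format, out salary);
        Console.WriteLine(asNumber); // False
    }
}
```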
The sample below shows how to use a ChoCurrency FixedLength field in a POCO entity class.
Listing 20.2.2 Using Currency Members in POCO Model
public class EmployeeRecWithCurrency
{
[ChoFixedLengthRecordField(0, 8)]
public int Id { get; set; }
[ChoFixedLengthRecordField(8, 10)]
public string Name { get; set; }
[ChoFixedLengthRecordField(18, 28)]
public ChoCurrency Salary { get; set; }
}
static void CurrencyTest()
{
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser = new ChoFixedLengthReader<EmployeeRecWithCurrency>(reader))
{
writer.WriteLine("Id Name Salary ");
writer.WriteLine("1 Carl 100000 ");
writer.WriteLine("2 Mark 250000 ");
writer.Flush();
stream.Position = 0;
EmployeeRecWithCurrency row;
while ((row = parser.Read()) != null)
{
Console.WriteLine(String.Format("Id: {0}", row.Id));
Console.WriteLine(String.Format("Name: {0}", row.Name));
Console.WriteLine(String.Format("Salary: {0}", row.Salary));
}
}
}
Cinchoo ETL implicitly handles parsing of enum column values from FixedLength files. For finer control over how these values are parsed, set the format globally via ChoTypeConverterFormatSpec.EnumFormat. The default is ChoEnumFormatSpec.Value.
FYI, changing this value has a system-wide impact.
There are three possible values:
- ChoEnumFormatSpec.Value - The enum value is used for parsing.
- ChoEnumFormatSpec.Name - The enum key name is used for parsing.
- ChoEnumFormatSpec.Description - If each enum key is decorated with DescriptionAttribute, its value is used for parsing.
Listing 20.3.1 Specifying Enum Format Specs During Parsing
public enum EmployeeType
{
[Description("Full Time Employee")]
Permanent = 0,
[Description("Temporary Employee")]
Temporary = 1,
[Description("Contract Employee")]
Contract = 2
}
static void EnumTest()
{
ChoTypeConverterFormatSpec.Instance.EnumFormat = ChoEnumFormatSpec.Value;
ChoFixedLengthRecordConfiguration config = new ChoFixedLengthRecordConfiguration();
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Id", 0, 8)
{ FieldType = typeof(int) });
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Name", 8, 10));
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Salary", 18, 20)
{ FieldType = typeof(ChoCurrency) });
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("ET", 38, 2)
{ FieldType = typeof(EmployeeType) });
ChoTypeConverterFormatSpec.Instance.IntNumberStyle = NumberStyles.AllowParentheses;
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser = new ChoFixedLengthReader(reader, config))
{
// Columns are padded to the configured widths: Id (8), Name (10), Salary (20), ET (2).
writer.WriteLine("Id      Name      Salary              ET");
writer.WriteLine("1       Carl      100000              0 ");
writer.WriteLine("2       Mark      250000              2 ");
writer.Flush();
stream.Position = 0;
dynamic row = null;
while ((row = parser.Read()) != null)
{
Console.WriteLine(String.Format("Id: {0}", row.Id));
Console.WriteLine(String.Format("Name: {0}", row.Name));
Console.WriteLine(String.Format("Salary: {0}", row.Salary));
Console.WriteLine(String.Format("ET: {0}", row.ET));
}
}
}
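The three enum formats described above can be pictured with plain .NET calls. The following is a minimal sketch of what parsing by value, by name and by description might look like; it is not ChoETL's internal code. The EnumParsingSketch class and its method names are hypothetical, the enum repeats the one from Listing 20.3.1, and the case-insensitive matching is an assumption.

```csharp
using System;
using System.ComponentModel;
using System.Globalization;
using System.Reflection;

// Same shape as the enum in Listing 20.3.1, repeated so the sketch compiles alone.
public enum EmployeeType
{
    [Description("Full Time Employee")] Permanent = 0,
    [Description("Temporary Employee")] Temporary = 1,
    [Description("Contract Employee")] Contract = 2
}

// Hypothetical helper, not part of ChoETL.
public static class EnumParsingSketch
{
    // "Value" style: the text holds the underlying number, e.g. "2".
    public static EmployeeType ParseByValue(string text)
    {
        int number = int.Parse(text.Trim(), CultureInfo.InvariantCulture);
        return (EmployeeType)Enum.ToObject(typeof(EmployeeType), number);
    }

    // "Name" style: the text holds the member name, e.g. "Contract".
    public static EmployeeType ParseByName(string text)
    {
        return (EmployeeType)Enum.Parse(typeof(EmployeeType), text.Trim(), true);
    }

    // "Description" style: the text holds the DescriptionAttribute value.
    public static EmployeeType ParseByDescription(string text)
    {
        string wanted = text.Trim();
        foreach (FieldInfo field in typeof(EmployeeType)
            .GetFields(BindingFlags.Public | BindingFlags.Static))
        {
            var attribute = (DescriptionAttribute)Attribute.GetCustomAttribute(
                field, typeof(DescriptionAttribute));
            if (attribute != null && string.Equals(
                attribute.Description, wanted, StringComparison.OrdinalIgnoreCase))
            {
                return (EmployeeType)field.GetValue(null);
            }
        }
        throw new FormatException("No enum member has the description '" + wanted + "'.");
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(EnumParsingSketch.ParseByValue("2 "));
        Console.WriteLine(EnumParsingSketch.ParseByName("Temporary"));
        Console.WriteLine(EnumParsingSketch.ParseByDescription("Full Time Employee"));
    }
}
```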
Cinchoo ETL implicitly handles parsing of boolean column values from FixedLength files. For finer control over how these values are parsed, set the format globally via ChoTypeConverterFormatSpec.BooleanFormat. The default value is ChoBooleanFormatSpec.ZeroOrOne.
FYI, changing this value has a system-wide impact.
There are four possible values:
- ChoBooleanFormatSpec.ZeroOrOne - '0' for false, '1' for true.
- ChoBooleanFormatSpec.YOrN - 'Y' for true, 'N' for false.
- ChoBooleanFormatSpec.TrueOrFalse - 'True' for true, 'False' for false.
- ChoBooleanFormatSpec.YesOrNo - 'Yes' for true, 'No' for false.
Listing 20.4.1 Specifying Boolean Format Specs During Parsing
static void BoolTest()
{
ChoTypeConverterFormatSpec.Instance.BooleanFormat = ChoBooleanFormatSpec.ZeroOrOne;
ChoFixedLengthRecordConfiguration config = new ChoFixedLengthRecordConfiguration();
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Id", 0, 8)
{ FieldType = typeof(int) });
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Name", 8, 10));
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Salary", 18, 20)
{ FieldType = typeof(ChoCurrency) });
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("AT", 38, 40)
{ FieldType = typeof(bool) });
ChoTypeConverterFormatSpec.Instance.IntNumberStyle = NumberStyles.AllowParentheses;
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser = new ChoFixedLengthReader(reader, config))
{
writer.WriteLine("Id Name Salary AT");
writer.WriteLine("1 Carl 100000 0");
writer.WriteLine("2 Mark 250000 1");
writer.Flush();
stream.Position = 0;
dynamic row = null;
while ((row = parser.Read()) != null)
{
Console.WriteLine(String.Format("Id: {0}", row.Id));
Console.WriteLine(String.Format("Name: {0}", row.Name));
Console.WriteLine(String.Format("Salary: {0}", row.Salary));
Console.WriteLine(String.Format("AT: {0}", row.AT));
}
}
}
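As a rough picture of what the four boolean formats mean for the raw field text, the sketch below maps each format to its pair of literals using plain C#. It is an illustration only, not ChoETL's converter: the BoolTextFormat enum and BooleanParsingSketch class are hypothetical stand-ins, and ignoring case and surrounding whitespace is an assumption.

```csharp
using System;

// Hypothetical stand-in for the four formats listed above; not a ChoETL type.
public enum BoolTextFormat { ZeroOrOne, YOrN, TrueOrFalse, YesOrNo }

// Hypothetical helper, not part of ChoETL.
public static class BooleanParsingSketch
{
    // Converts field text to bool according to the chosen format.
    public static bool Parse(string text, BoolTextFormat format)
    {
        string trueText;
        string falseText;
        switch (format)
        {
            case BoolTextFormat.ZeroOrOne: trueText = "1"; falseText = "0"; break;
            case BoolTextFormat.YOrN: trueText = "Y"; falseText = "N"; break;
            case BoolTextFormat.TrueOrFalse: trueText = "True"; falseText = "False"; break;
            case BoolTextFormat.YesOrNo: trueText = "Yes"; falseText = "No"; break;
            default: throw new ArgumentOutOfRangeException("format");
        }

        // Assumption: padding is trimmed and case is ignored.
        string value = text.Trim();
        if (string.Equals(value, trueText, StringComparison.OrdinalIgnoreCase)) return true;
        if (string.Equals(value, falseText, StringComparison.OrdinalIgnoreCase)) return false;
        throw new FormatException("'" + value + "' is not valid for format " + format + ".");
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(BooleanParsingSketch.Parse("1 ", BoolTextFormat.ZeroOrOne));
        Console.WriteLine(BooleanParsingSketch.Parse("N", BoolTextFormat.YOrN));
        Console.WriteLine(BooleanParsingSketch.Parse("yes", BoolTextFormat.YesOrNo));
    }
}
```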
Cinchoo ETL implicitly handles parsing of datetime column values from FixedLength files, using the system culture or a custom culture. For finer control over how these values are parsed, set the format globally via ChoTypeConverterFormatSpec.DateTimeFormat. The default value is 'd'.
FYI, changing this value has a system-wide impact.
You can use any valid standard or custom .NET datetime format specification to parse the datetime values from the file.
Listing 20.5.1 Specifying Datetime Format Specs During Parsing
static void DateTimeTest()
{
ChoTypeConverterFormatSpec.Instance.DateTimeFormat = "d";
ChoFixedLengthRecordConfiguration config = new ChoFixedLengthRecordConfiguration();
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Id", 0, 8) { FieldType = typeof(int) });
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Name", 8, 10));
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("Salary", 18, 20)
{ FieldType = typeof(ChoCurrency) });
config.FixedLengthRecordFieldConfigurations.Add
(new ChoFixedLengthRecordFieldConfiguration("JoinedDate", 38, 10)
{ FieldType = typeof(DateTime) });
ChoTypeConverterFormatSpec.Instance.IntNumberStyle = NumberStyles.AllowParentheses;
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser = new ChoFixedLengthReader(reader, config))
{
// Columns are padded to the configured widths: Id (8), Name (10), Salary (20), JoinedDate (10).
writer.WriteLine("Id      Name      Salary              JoinedDate");
writer.WriteLine("1       Carl      100000              01/01/2001");
writer.WriteLine("2       Mark      250000              12/23/1996");
writer.Flush();
stream.Position = 0;
dynamic row = null;
while ((row = parser.Read()) != null)
{
Console.WriteLine(String.Format("Id: {0}", row.Id));
Console.WriteLine(String.Format("Name: {0}", row.Name));
Console.WriteLine(String.Format("Salary: {0}", row.Salary));
Console.WriteLine(String.Format("JoinedDate: {0}", row.JoinedDate));
}
}
}
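To show what a standard versus a custom .NET datetime format specification means in practice, the sketch below parses date text with DateTime.ParseExact from the base class library. It assumes nothing about how ChoETL applies the format internally; DateTimeParsingSketch is a hypothetical name, and the en-US culture is chosen only because the sample data uses month/day/year order.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of ChoETL.
public static class DateTimeParsingSketch
{
    // Parses field text with an explicit format specification and culture.
    public static DateTime Parse(string text, string format, CultureInfo culture)
    {
        return DateTime.ParseExact(text.Trim(), format, culture, DateTimeStyles.None);
    }
}

public static class Program
{
    public static void Main()
    {
        // Standard "d" (short date) format; for en-US this is month/day/year.
        DateTime joined = DateTimeParsingSketch.Parse(
            "12/23/1996", "d", new CultureInfo("en-US"));
        Console.WriteLine(joined.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));

        // Custom format for compact, separator-free dates.
        DateTime compact = DateTimeParsingSketch.Parse(
            "19961223", "yyyyMMdd", CultureInfo.InvariantCulture);
        Console.WriteLine(compact.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
    }
}
```

Because the standard "d" format depends on the culture, the same text may parse differently (or fail) under another culture; a custom format avoids that dependency.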
FixedLengthReader exposes a few frequently used configuration parameters via fluent API methods. This makes programming the parsing of FixedLength files quicker.
The WithRecordLength API method sets the record length of the fixed length data file.
static void QuickDynamicTest()
{
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser = new ChoFixedLengthReader(reader).WithRecordLength(18))
{
writer.WriteLine("1       Carl      ");
writer.WriteLine("2       Mark      ");
writer.Flush();
stream.Position = 0;
dynamic row;
while ((row = parser.Read()) != null)
{
Console.WriteLine(String.Format("Id: {0}", row.Id));
Console.WriteLine(String.Format("Name: {0}", row.Name));
}
}
}
The WithFirstLineHeader API method flags whether the first row of the FixedLength file is a header. The optional bool parameter specifies whether the first row is a header; the default is true.
static void QuickDynamicTest()
{
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser = new ChoFixedLengthReader(reader).WithFirstLineHeader())
{
writer.WriteLine("Id Name ");
writer.WriteLine("1 Carl ");
writer.WriteLine("2 Mark ");
writer.Flush();
stream.Position = 0;
dynamic row;
while ((row = parser.Read()) != null)
{
Console.WriteLine(String.Format("Id: {0}", row.Id));
Console.WriteLine(String.Format("Name: {0}", row.Name));
}
}
}
The WithField API method adds a FixedLength column with its StartIndex, Size and/or data type. It is helpful in the dynamic object model, where it lets you specify each individual FixedLength column with the appropriate data type.
static void QuickDynamicTest()
{
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser = new ChoFixedLengthReader(reader).WithFirstLineHeader()
.WithField("Id", 0, 8, typeof(int)).WithField("Name", 8, 10, typeof(string)))
{
writer.WriteLine("Id      Name      ");
writer.WriteLine("1       Carl      ");
writer.WriteLine("2       Mark      ");
writer.Flush();
stream.Position = 0;
dynamic row;
while ((row = parser.Read()) != null)
{
Console.WriteLine(String.Format("Id: {0}", row.Id));
Console.WriteLine(String.Format("Name: {0}", row.Name));
}
}
}
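The start index and size given to each field describe a slice of the record line. The sketch below shows that idea with plain string operations, independent of ChoETL; FixedLengthSliceSketch and Slice are hypothetical names, and trimming the padding and tolerating short lines are assumptions made for the example.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of ChoETL.
public static class FixedLengthSliceSketch
{
    // Returns the trimmed text of the field that starts at startIndex
    // and spans size characters; tolerates lines shorter than expected.
    public static string Slice(string record, int startIndex, int size)
    {
        if (startIndex >= record.Length) return string.Empty;
        int length = Math.Min(size, record.Length - startIndex);
        return record.Substring(startIndex, length).Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        // Id occupies columns 0-7, Name occupies columns 8-17.
        string record = "1       Carl      ";
        int id = int.Parse(
            FixedLengthSliceSketch.Slice(record, 0, 8), CultureInfo.InvariantCulture);
        string name = FixedLengthSliceSketch.Slice(record, 8, 10);
        Console.WriteLine("Id: {0}", id);
        Console.WriteLine("Name: {0}", name);
    }
}
```

This is why the padding inside each data line matters: a value that drifts out of its column range ends up in the neighbouring field's slice.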
The QuoteAllFields API method specifies whether all fields are surrounded by quotes.
static void QuickDynamicTest()
{
using (var stream = new MemoryStream())
using (var reader = new StreamReader(stream))
using (var writer = new StreamWriter(stream))
using (var parser =
new ChoFixedLengthReader(reader).WithFirstLineHeader().QuoteAllFields())
{
writer.WriteLine("Id Name ");
writer.WriteLine("1 Carl ");
writer.WriteLine("2 Mark ");
writer.Flush();
stream.Position = 0;
dynamic row;
while ((row = parser.Read()) != null)
{
Console.WriteLine(String.Format("Id: {0}", row.Id));
Console.WriteLine(String.Format("Name: {0}", row.Name));
}
}
}
- 23rd January, 2017: Initial version