
C# CSV File and String Reader Classes

3 Sep 2014 | CPOL | 5 min read
CSVFileReader and CSVStringReader are lightweight, fast classes that resemble a unidirectional (forward-only) data set

Introduction

Given how many free and open source CSV reader components and classes already exist, I was surprised that I could not settle on one. The reasons varied: some readers need additional work because they do not cover the formats I am interested in, some are not simple to use, some implement their logic in a way that is hard to analyze, and so on.

I wanted to resolve the “I need a CSV reader” issue once and for all, so I decided to come up with one more solution. Eventually, the main requirements for the reader crystallized into the following:

  • Ability to handle the majority, if not all, of the existing variations of CSV formats, including "TAB separated", etc.
  • Extremely intuitive and simple to use. Ideally, you get the idea of how to use it just by looking at the list of public methods and properties.
  • Extensible with regard to sources of CSV data.
  • Fast, straightforward and clean parser with minimal conditional logic.

I believe that the base class presented here, together with its descendants, satisfies the above requirements.

CSVFileReader and CSVStringReader are lightweight, fast classes that resemble a unidirectional (forward-only) data set. They are very simple to use and have properties that allow handling a number of existing variations of CSV and "CSV-like" formats.

Both classes are derived from the abstract CSVReader class, which does not specify a data source and instead works with an instance of the TextReader class.

CSVFileReader and CSVStringReader accept a file and a string as data sources, respectively. They introduce additional properties related to the CSV source and override the abstract method that returns an instance of a specific TextReader descendant:

C#
protected abstract TextReader CreateDataSourceReader();

Classes for other CSV data sources can be created in a similar way.
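
As an illustration of that extension point, here is a minimal sketch of a stream-based source. It does not use the real CSVReader: CsvReaderSketch is a hypothetical stand-in that reproduces only the abstract method shown above, and CsvStreamReaderSketch, DataStream and ReadAll are names invented for this example.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical stand-in for the article's abstract CSVReader. Only the
// factory method quoted in the article is reproduced here.
public abstract class CsvReaderSketch
{
    protected abstract TextReader CreateDataSourceReader();

    // Helper for this sketch only: reads everything the source provides.
    public string ReadAll()
    {
        using (TextReader reader = CreateDataSourceReader())
        {
            return reader.ReadToEnd();
        }
    }
}

// Hypothetical descendant that takes its CSV data from any Stream.
public class CsvStreamReaderSketch : CsvReaderSketch
{
    // "CSV source" related property, analogous to FileName or DataString.
    public Stream DataStream { get; set; }

    protected override TextReader CreateDataSourceReader()
    {
        if (DataStream == null)
            throw new InvalidOperationException("DataStream must be assigned first.");
        // UTF-8 is an assumption of this sketch, not something the article specifies.
        return new StreamReader(DataStream, Encoding.UTF8);
    }
}
```

A descendant of the real CSVReader would presumably follow the same shape: one source property plus the overridden factory method. Note that disposing the StreamReader also disposes the underlying stream.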

Input Data Format (CSV Format)

According to Wikipedia: “A general standard for the CSV file format does not exist, but RFC 4180 provides a de facto standard for some aspects of it.”

While this CSVReader is RFC 4180-compliant, it provides lots of “extras” (see the Appendix below for a summary of RFC 4180).

CSVReader Features

  • Supports three kinds of line delimiters: <CR>, <CR><LF> and <LF>, all of which can be present in the same CSV file simultaneously. Consequently, an <LF><CR> pair is read as two line breaks and produces an empty line; that situation can still be handled by setting the IgnoreEmptyLines property to true.
  • The presence of a header in the very first record of the file is controlled by the bool property HeaderPresent.
  • Empty lines can be ignored (by default, they are not ignored).
  • The number of fields is auto-detected (by default) based on the first record, or must be set explicitly if auto-detection is off.
  • The field separator is a comma (0x2C) by default, but virtually any (Unicode) character can be used, for example TAB.
  • Field quoting allows multi-line field values and the presence of quote and field separator characters within a field. By default, it is assumed that a field may or may not be enclosed in quotes, but the reader can be instructed not to use field quoting.
  • The quote character is the double quote (0x22) by default, but virtually any (Unicode) character can be used. It is assumed that the quote character is also used as the escape character.
  • The full Unicode range of character codes is assumed by default, but it can be limited to ASCII only by setting the ASCIIonly property to true.
  • Characters with codes below 0x20 are considered “special characters” and by default must not appear in the file. That requirement does not affect line delimiters, nor the field separator and quote character if they are from that range. Optionally, the reader can be instructed to simply ignore special characters.
  • The reader itself does not use buffering. It uses just enough memory to store the field names and the field values of the current record. If any buffering happens, standard .NET classes such as StreamReader and StringReader are responsible for it.
  • The reader should be fast, since it reads each character directly from the TextReader and analyzes each character just once, i.e., it does one-pass parsing. The parser also uses minimal conditional logic.
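
To make the line-delimiter and quoting rules above concrete, here is a small one-pass parser sketch. It is not the author's implementation: OnePassCsvSketch and Parse are hypothetical names, the sketch collects all records in memory instead of exposing one record at a time, and it uses TextReader.Peek() for one character of lookahead.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class OnePassCsvSketch
{
    // Reads characters one at a time and splits them into records and fields.
    // <CR>, <CR><LF> and <LF> all end a record; a doubled quote inside a
    // quoted field stands for one literal quote character.
    public static List<List<string>> Parse(TextReader reader, char separator = ',', char quote = '"')
    {
        var records = new List<List<string>>();
        var record = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;
        int code;
        while ((code = reader.Read()) != -1)
        {
            char c = (char)code;
            if (inQuotes)
            {
                if (c != quote)
                {
                    field.Append(c);              // separators and line breaks are data here
                }
                else if (reader.Peek() == quote)
                {
                    reader.Read();                // escaped quote: consume the second one
                    field.Append(quote);
                }
                else
                {
                    inQuotes = false;             // closing quote
                }
            }
            else if (c == quote)
            {
                inQuotes = true;                  // opening quote
            }
            else if (c == separator)
            {
                record.Add(field.ToString());
                field.Clear();
            }
            else if (c == '\r' || c == '\n')
            {
                if (c == '\r' && reader.Peek() == '\n')
                    reader.Read();                // treat <CR><LF> as one line break
                record.Add(field.ToString());
                field.Clear();
                records.Add(record);
                record = new List<string>();
            }
            else
            {
                field.Append(c);
            }
        }
        // Last record when the input does not end with a line break.
        if (field.Length > 0 || record.Count > 0)
        {
            record.Add(field.ToString());
            records.Add(record);
        }
        return records;
    }
}
```

With this sketch, the input "1\n\r2" yields three records, the middle one holding a single empty value, which mirrors the <LF><CR> behavior described in the first bullet.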

Public Class Members

Constructors

Each class has a single constructor with no parameters.

Input Properties

Attempting to change any of these values while the reader is in the Active (open) state throws an exception.

Common (CSVReader) properties
C#
bool HeaderPresent
/*- False (by default): All records are considered as records with values.
  - True: Values in the very first record are considered as field names.*/

bool FieldCount_AutoDetect
/*- False: In this situation FieldCount should be set before Open().
  - True (by default): FieldCount auto-detection is done within Open(),
  based on the very first record (or on the first non-empty record
  when IgnoreEmptyLines==true), and the result is available after Open() completes.*/

int FieldCount
/*- FieldCount is always >= 0. An attempt to assign a negative value is ignored.
If FieldCount_AutoDetect is true, then the assigned value is meaningless and will be replaced during Open().*/

int FieldSeparatorCharCode
/*- It is a code (!) and not a char. Default value is 0x2C (comma). 
Virtually any (see also property ASCIIonly) Unicode character including 
special characters like TAB, etc. can be used as field separator.*/

bool UseFieldQuoting
/*- False: In this case it is assumed that the "quote char" is never used
to surround field values; it is treated as an ordinary data character,
provided that its code is in the data char code range, i.e. it is not a special character.
The value specified as QuoteCharCode (see below) is then meaningless.
  - True (by default): A field value may or may not be enclosed
  in the character specified in QuoteCharCode (see below).*/

int QuoteCharCode
/*- It is a code (!) and not a char. Default value is 0x22 (double quotes). 
Virtually any (see also property ASCIIonly) Unicode character can be used as "quote char". 
It is assumed that this character is also used as escape character.*/
           
bool IgnoreEmptyLines
/*- False (by default): The presence of empty lines in the source,
which indicates a wrong input data format, causes an exception.
  - True: Empty lines are ignored.*/

bool ASCIIonly
/*- False (by default): Full Unicode range of characters is handled. 
Characters with codes less than 0x20 are considered as "special characters" 
(see property IgnoreSpecialCharacters below).
  - True: Only ASCII range of characters is handled. 
  Characters with codes outside range 0x20 – 0x7E are considered as 
  "special characters" (see property IgnoreSpecialCharacters below).*/

bool IgnoreSpecialCharacters
/*- False (by default): The presence of "special characters", as
defined above in the ASCIIonly property description, causes an exception.
This does not affect line breaks, or the field separator and quote characters
even if the last two are from the "special character" range.
  - True: "Special characters" are ignored, except line breaks and
  the field separator and quote characters even if the last two are from the
  "special character" range.*/

CSVFileReader specific properties
C#
string FileName
//- CSV data file path. 

CSVStringReader specific properties
C#
string DataString
//- CSV data string. 
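
The interplay of ASCIIonly with the "special character" definition can be summarized in a few lines. The following is only a sketch of the rules as described above, not code taken from the reader: SpecialCharSketch and IsSpecial are hypothetical names, and the sketch does not model UseFieldQuoting.

```csharp
public static class SpecialCharSketch
{
    // Returns true when a character code would count as a "special character"
    // under the rules described for ASCIIonly and IgnoreSpecialCharacters.
    // The separator and quote defaults mirror the documented defaults
    // of FieldSeparatorCharCode (0x2C) and QuoteCharCode (0x22).
    public static bool IsSpecial(int code, bool asciiOnly,
                                 int separatorCode = 0x2C, int quoteCode = 0x22)
    {
        // Line breaks, the field separator and the quote character are exempt,
        // even when they come from the "special character" range.
        if (code == 0x0D || code == 0x0A || code == separatorCode || code == quoteCode)
            return false;

        // ASCIIonly == true: anything outside 0x20 .. 0x7E is special.
        // ASCIIonly == false: only codes below 0x20 are special.
        return asciiOnly ? (code < 0x20 || code > 0x7E) : code < 0x20;
    }
}
```

For example, TAB (0x09) is special with the default comma separator, but is no longer special once it is chosen as the field separator.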

Other Common Properties

C#
bool Active
/*- Active indicates the current state of the reader. Active is true after a successful Open()
until Close(). Any exception related to reading source data changes the state to "inactive"
(Active == false). Setting Active = true is equivalent to calling Open(), and Active = false is equivalent to calling Close().*/

bool Bof
//- It is true at the beginning of data source.

bool Eof
//- It is true at the end of data source.

CSVFields Fields
/*- Indexed property that provides access to the fields of the current record.
Each Fields[i] element is an instance of the class CSVField, which exposes two properties:
string Value, which holds the value of the field, and string Name, which holds the name of the
field and is always empty if the property HeaderPresent is false.
Field names are available after Open() and until Close().*/

int RecordCountProcessedSoFar
//- Number of records (not lines!) from data source processed at any given moment

Methods

C#
void Open()

void Close()

void Next()
/*- Reads the next record. Calling Next() after the end of file has been reached
results in a record with empty field values.*/

Events

C#
event EventHandler FieldCountAutoDetectCompleted
/*- This event fires from within Open() if FieldCount_AutoDetect is true.
Use of this event is optional, since the auto-detected FieldCount is
available upon completion of Open() anyway.*/

Using the Code

Usage is straightforward: create an instance of the appropriate class, specify the source of CSV data, modify properties if necessary, call Open(), and iterate through the records by calling Next(). Within each record, iterate through the field values. Call Close() when done.

Using CSVFileReader Class

C#
using System;                // Console
using Nvv.Components.CSV;    // namespace of the CSV reader classes

using (CSVFileReader csvReader = new CSVFileReader())
{
    csvReader.FileName = "CSVFilePath"; // Assign CSV data file path
    // Modify values of other input properties if necessary. For example:
    csvReader.HeaderPresent = true;

    csvReader.Open();

    if (csvReader.HeaderPresent)
        for (int i = 0; i < csvReader.FieldCount; i++)
        {
            // Process field names csvReader.Fields[i].Name. For example:
            Console.WriteLine("Name{0}={1}", i, csvReader.Fields[i].Name);
        }
    while (!csvReader.Eof)
    {
        for (int i = 0; i < csvReader.FieldCount; i++)
        {
            // Process current record's field values csvReader.Fields[i].Value. For example:
            Console.WriteLine("Value{0}={1}", i, csvReader.Fields[i].Value);
        }
        csvReader.Next();
    }

    csvReader.Close(); //Recommended but optional if within "using"
}

Using CSVStringReader Class

C#
using System;                // Console
using Nvv.Components.CSV;    // namespace of the CSV reader classes

using (CSVStringReader csvReader = new CSVStringReader())
{
    csvReader.DataString = "A,B,C\r\n1,2,3"; // Assign string containing CSV data (header record, then one data record)
    // Modify values of other input properties if necessary. For example:
    csvReader.HeaderPresent = true;

    csvReader.Open();

    if (csvReader.HeaderPresent)
        for (int i = 0; i < csvReader.FieldCount; i++)
        {
            // Process field names csvReader.Fields[i].Name. For example:
            Console.WriteLine("Name{0}={1}", i, csvReader.Fields[i].Name);
        }
    while (!csvReader.Eof)
    {
        for (int i = 0; i < csvReader.FieldCount; i++)
        {
            // Process current record's field values csvReader.Fields[i].Value. For example:
            Console.WriteLine("Value{0}={1}", i, csvReader.Fields[i].Value);
        }
        csvReader.Next();
    }
    csvReader.Close(); //Recommended but optional if within "using"
}
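
As a variation on the examples above, the following sketch reads tab-separated data that contains a blank line. It uses only the properties and methods documented in this article and assumes that the CSVClasses assembly from the download is referenced; the class name TabSeparatedExample and the sample data are made up for illustration.

```csharp
using System;
using Nvv.Components.CSV;   // namespace named in the article; requires the CSVClasses assembly

public static class TabSeparatedExample
{
    public static void Main()
    {
        using (CSVStringReader csvReader = new CSVStringReader())
        {
            // Made-up sample: header, one record, a blank line, another record.
            csvReader.DataString = "id\tname\n1\tAda\n\n2\tGrace";
            csvReader.HeaderPresent = true;
            csvReader.FieldSeparatorCharCode = 0x09; // TAB; the property takes a code, not a char
            csvReader.IgnoreEmptyLines = true;       // without this, the blank line causes an exception

            csvReader.Open();
            while (!csvReader.Eof)
            {
                for (int i = 0; i < csvReader.FieldCount; i++)
                {
                    Console.WriteLine("{0}={1}", csvReader.Fields[i].Name, csvReader.Fields[i].Value);
                }
                csvReader.Next();
            }
        }
    }
}
```

All input properties are set before Open(), since changing them while the reader is active throws an exception.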

Downloading Source Code

The following source code, which was prepared in Visual C# 2010 Express, is available for download:

  • The C# solution and project of the assembly containing the classes CSVReader, CSVFileReader and CSVStringReader are in the CSVClasses folder.
  • The C# solution of an application that tests both the CSVFileReader and CSVStringReader classes is in the CSVTest folder. This solution also includes and references the above CSVClasses assembly project. If you are interested in this application, it probably makes sense to unzip everything together, exactly as it is in the zip file, to avoid breaking that reference.

Both solutions target .NET 4.0, though at least CSVClasses can most likely be “retargeted” to other versions as well.

Appendix

Brief summary of definition of the CSV Format from RFC 4180 (http://tools.ietf.org/html/rfc4180):

  • Each record is on separate line(s), delimited by a line break (<CR><LF> = 0x0D 0x0A); the line break after the last record is optional.
  • An optional header (with field names) can be present as the first record.
  • Each record should contain the same number of fields throughout the file. That effectively disallows empty lines, except in a CSV file with a single field, where an empty line just holds a single “empty” value.
  • The field separator is a comma (0x2C).
  • A field may or may not be enclosed in double quotes (0x22). Enclosing a field allows line breaks, double quotes and commas within it. The double quote also serves as the escape character: a double quote inside a quoted field is written as two double quotes.
  • Spaces are considered part of a field and should not be ignored.
  • Text data that can appear in a field is limited to the code range 0x20 – 0x7E (which obviously limits it to ASCII).
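
The quoting and escaping rules summarized above are easiest to see from the writing side. The sketch below is not part of the CSVReader classes (which only read CSV); Rfc4180WriterSketch, EscapeField and FormatRecord are hypothetical names used for illustration.

```csharp
using System.Collections.Generic;

public static class Rfc4180WriterSketch
{
    // Encloses a value in double quotes only when it contains a comma,
    // a double quote or a line break, doubling any embedded double quotes.
    public static string EscapeField(string value)
    {
        if (value == null)
            value = "";   // treating null as an empty field is a choice of this sketch
        bool mustQuote = value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        if (!mustQuote)
            return value; // spaces are kept as-is; they are part of the field
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    // Joins escaped fields with commas and ends the record with <CR><LF>.
    public static string FormatRecord(IEnumerable<string> fields)
    {
        var escaped = new List<string>();
        foreach (string field in fields)
            escaped.Add(EscapeField(field));
        return string.Join(",", escaped) + "\r\n";
    }
}
```

For instance, the value say "hi" is written as "say ""hi""" (enclosed in quotes, with each inner quote doubled), and a reader that follows the rules above restores the original value.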

History

Version 1.1 (2014-09-03)

1. Namespace changed

2. Significant performance improvement:

  • Use of StringBuilder where appropriate.
  • Merged frequently called methods into one big procedure at the expense of code structure and readability. Apparently, the cost of a method call is significant.

Version 1.0 (2014-05-20)

  • First release

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)