Introduction
Over time, I've frequently needed to read and write CSV (Comma Separated Value) files. While most follow a fairly standard format, there are many variations on this theme.
Some common variations include Tab Separated Value files, files with header rows, variations in quoting rules, and files that support comment characters / lines.
The classes described in this article simplify reading and writing of many different format variations. CSV in the code and this article refers
to Character Separated Value (of which Comma Separated Value is a sub-category).
While the demonstration project allows the display and manipulation of a few common CSV variants, it does not demonstrate all of the capabilities of the included classes.
While XML files frequently replace CSV files in common usage today, CSV remains a compact and simple alternative to this file format. A number of different applications,
including Excel, are capable of reading and writing these files.
Background
While learning the code, the user should focus on the CsvFormBase
, CsvFormReader
, and CsvFormWriter
classes, which provide some simple
use cases of the code. The user interface code is long and not of direct relevance to this article. However, it does demonstrate some related themes such
as the DataGridView
control and the BackgroundWorker
component.
Also, the mainBackgroundWorker_DoWork
, mainBackgroundWorker_ProgressChanged
, and mainBackgroundWorker_RunWorkerCompleted
methods
in the MainForm
class may be of particular interest to those developers who need to report progress during the time a CSV file is read or written.
Using the Code
The main classes in this library are CsvReader
and CsvWriter
. Because of the flexibility inherent in the code, they support a large number of methods.
The code itself, which is fully commented, remains the best source of information for most of these methods. This article will only describe some of the more common usage cases.
Reading with CsvReader
At its simplest, a user instantiates a CsvReader
and reads an entire DataTable
at once as follows. This is, essentially,
the approach used in the CsvFormReader
class for the demo project.
using (CsvReader reader = new CsvReader("MyFile.csv"))
{
bool hasHeader = true;
using (DataTable table = reader.ReadTable(hasHeader))
{
foreach (DataRow row in table.Rows)
{
foreach (DataColumn column in table.Columns)
{
string columnValue = (string)row[column];
}
string firstValue = (string)row["First"];
}
}
}
Alternatively, a user can bypass DataTable
altogether and read the file more directly. The following code demonstrates some of the methods for this approach:
using (CsvReader reader = new CsvReader("MyFile.csv"))
{
List<string> columns = reader.ReadRow();
while (reader.ReadRowStart())
{
while (reader.HasColumn)
{
string columnValue = reader.ReadString();
}
reader.ReadRowEnd();
}
}
One problem Windows developers often deal with is that of long running operations. A common approach to this problem is to perform these long running operations
using the BackgroundWorker
component. When using this component, it is usual to report progress. The CsvReader
class exposes
a RowEnd
event for this purpose. The event is raised each time the end of a row is read.
Writing with CsvWriter
The CsvWriter
class generally provides methods which are, in effect, complements of their CsvReader
equivalents. So, for example,
the WriteTable
method is provided to write an entire DataTable
. The WriteRow
, WriteRowStart
,
Write
(a column), and WriteRowEnd
methods are provided for more direct access to the file.
The CsvFormWriter
class has examples of both of these usages.
The Write
method provides many overloads for different data types. If none apply, the object version of the Write
method is used.
This method uses the ToString
method to convert to a string before writing.
File Format Variants
Both the CsvReader
and CsvWriter
classes provide support for many variants of the CSV file format. Principally, this is controlled
via the Options
property which is of type CsvOption
. The CsvOption
enumeration specifies the Flags
attribute,
so multiple options can be combined, as in the following example:
reader.Options = CsvOption.Comment | CsvOption.CommentLine;
Also, options are provided for special characters. For example, the user may specify the character that separates columns. To read a file containing tab separated values,
for example, the user might do the following:
reader.EndColumnChar = '\t';
Generally, the properties for these options and characters should be set immediately after instantiation and before the CsvReader
or CsvWriter
is used.
Points of Interest
Because of the unusually wide variety of options / methods, and the time demands of my job, I have not tested every possible combination. I have been a bit selfish
and tested the portions I used in the demo project and those that were required for my own needs. Hopefully, I haven't missed too much in my testing, but if I missed
something that affects you, the reader, I apologize in advance.
History
- 8/3/2011 - The original version.