Introduction
In this tip, we will see how to parse delimited files, which may have the delimiter as valid data.
Background
In my recent project, I had to parse data from pipe(|)
delimited files and insert that into database table. The code was completed, deployed and everything was fine until one day an exception was thrown saying the number of columns in the table does not match with number of columns we try to insert (not the exact error message). During analysis, I found that there was data in the file which had pipe(|)
as valid data, which is supposed to be considered as input for a single column, but the file parser logic split that data, which is:
Do you want to know how I solved this issue and parsed the file? Read on.
Using the Code
Before jumping to the code, let's learn about the class that I used for parsing and some of the important methods and properties of it.
The class that is used for parsing is TextFieldParser
. This class is found Microsoft.VisualBasic.FileIO.
SetDelimiters(params string[] delimiters)
- This method sets the delimiter for the file. LineNumber
- This property returns the current line number in the file TrimWhiteSpace
- This property if set to true
trims leading and trailing white spaces ReadFields()
- This method reads all fields on the current line, returns them as an array of string
s, and advances the cursor to the next line containing data. EndOfData
- This property indicates the end of the file.
For the purpose of this tip, I am not going to insert data into DB. Instead, I will be displaying the parsed data in a datagrid view.
Let us first see a working example, then a file causing the exception and see how to solve the exception.
Working Example
Here is my sample file data:
ID|FirstName|LastName|Phone|Address
1|John|John|951425|"Hollywood"
2|Eli|Eli|725486|"New York"
Here is the code for parsing and displaying the data from the file in a datagrid view.
private void btnParse_Click(object sender, EventArgs e)
{
try
{
string[] delimiter = new string[] {"|"};
DataTable fileData = new DataTable();
using (TextFieldParser pipeParser = new TextFieldParser(@"<filepath with file name>"))
{
pipeParser.SetDelimiters(delimiter);
string[] columnHeaders = pipeParser.ReadFields();
foreach (string column in columnHeaders)
{
if (!string.IsNullOrEmpty(column))
{
DataColumn datecolumn = new DataColumn(column);
datecolumn.AllowDBNull = true;
fileData.Columns.Add(datecolumn);
}
}
while (!pipeParser.EndOfData)
{
string[] dataFields = pipeParser.ReadFields();
fileData.Rows.Add(dataFields);
}
}
dataGridView1.DataSource = fileData;
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
Initial screen when the application is run.
Click the "Parse File and Display Data" button to fire the event. Now we can see the data displayed in the data grid view.
Exception Causing Scenario Example
Now, I am adding a third row to the input file.
ID|FirstName|LastName|Phone|Address
1|John|John|951425|"Hollywood"
2|Eli|Eli|725486|"New York"
3|Tom|Tom|235046|"123 street|Downtown|LA"
You can see that the address column of the third row has pipes(|)
, which is the actual delimiter. Let's see how the code works now. Run the application and click the button to parse and load data. The application throws the following exception.
Surprisingly, the code that worked before, now fails after adding the third row. Let us debug and see what is causing the exception.
In the above screenshot, we can see that the address column has been parsed into 3 different columns instead of a single column, thus causing the exception.
Now the question is how to properly parse this column. Before addressing how to parse, we have to figure out how to distinguish this column from the rest of the columns. There is a hint in the file.
If we look at the Address
column closely, we can find that the data for this column alone is enclosed within double quotes (""). This is what will help us distinguish it from other columns and parse properly.
Solution for the Issue
There is a property called HasFieldsEnclosedInQuotes
, in TextFieldParser
class, which provides the ability to parse the fields within double quotes properly, when set to TRUE
.
HasFieldsEnclosedInQuotes
- This property denotes whether fields are enclosed in quotation marks.
Add this line of code to the button click handler.
pipeParser.HasFieldsEnclosedInQuotes = true;
Now the code will look like this (highlighted the line added).
private void btnParse_Click(object sender, EventArgs e)
{
try
{
string[] delimiter = new string[] {"|"};
DataTable fileData = new DataTable();
using (TextFieldParser pipeParser = new TextFieldParser(@"<filepath with file name>"))
{
pipeParser.SetDelimiters(delimiter);
pipeParser.HasFieldsEnclosedInQuotes = true;
string[] columnHeaders = pipeParser.ReadFields();
foreach (string column in columnHeaders)
{
if (!string.IsNullOrEmpty(column))
{
DataColumn datecolumn = new DataColumn(column);
datecolumn.AllowDBNull = true;
fileData.Columns.Add(datecolumn);
}
}
while (!pipeParser.EndOfData)
{
string[] dataFields = pipeParser.ReadFields();
fileData.Rows.Add(dataFields);
}
}
dataGridView1.DataSource = fileData;
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
Just build and run the application. Click the button to parse and display data. Now you can the application is able to parse the data successfully.
Here is a screenshot of parsed data from debug mode.
Hope this helps you. If you have any other ideas or feedback, do let me know in the comments section.
Reference
History