Introduction
This article shows one way to parse a CSV line whose values are enclosed in double quotes.
Background
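This article is based on RFC 4180, which describes the common CSV format. Values are separated by commas, and each value may be enclosed in double quotes. A value that contains a comma, a double quote, or a line break should be enclosed in double quotes, and a double quote inside such a value is escaped by writing it twice (""). The code below handles lines in which every value is enclosed in double quotes; it does not unescape doubled quotes.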
Take a look at the RFC here: http://tools.ietf.org/html/rfc4180.
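To make the quoting rule concrete, here is a minimal sketch of the writer side. It assumes every field is quoted, which RFC 4180 allows and which is the shape the code in this article expects. The names Rfc4180Writer, QuoteField and WriteRecord are hypothetical and used only for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper class, not part of the original article.
public static class Rfc4180Writer
{
    // Encodes one field: every embedded double quote is doubled,
    // and the whole field is enclosed in double quotes.
    public static string QuoteField(string value)
    {
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    // Quotes each value and joins them with commas to form one record.
    public static string WriteRecord(params string[] values)
    {
        return string.Join(",", values.Select(QuoteField));
    }
}
```

For example, WriteRecord("a", "b,c", "say \"hi\"") produces "a","b,c","say ""hi""" which is exactly the kind of line a parser has to take apart again.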
The Code
using System;
using System.Collections.Generic;

public static class CsvLineSplitter
{
    // Splits a line whose values are all enclosed in double quotes,
    // e.g. "a","b,c","d" -> [a] [b,c] [d].
    // Only a comma with a double quote immediately on both sides is
    // treated as a separator; doubled quotes ("") are left as-is.
    public static string[] SplitOnDoubleQuotes(string line)
    {
        // Ignore a trailing line break (\n or \r\n), if present.
        line = line.TrimEnd('\r', '\n');

        // Record the index of every comma that sits between two quotes.
        // Starting at 1 and stopping before the last character avoids
        // reading outside the string.
        var separators = new List<int>();
        for (int i = 1; i < line.Length - 1; i++)
        {
            if (line[i] == ',' && line[i - 1] == '"' && line[i + 1] == '"')
                separators.Add(i);
        }

        // Cut the line at each separator; the commas themselves are dropped.
        var tokens = new List<string>();
        int startIdx = 0;
        foreach (int sep in separators)
        {
            tokens.Add(line.Substring(startIdx, sep - startIdx));
            startIdx = sep + 1;
        }
        tokens.Add(line.Substring(startIdx));

        // Strip the enclosing double quotes from each value.
        for (int t = 0; t < tokens.Count; t++)
        {
            string str = tokens[t];
            if (str.StartsWith("\"", StringComparison.Ordinal))
                str = str.Substring(1);
            if (str.EndsWith("\"", StringComparison.Ordinal))
                str = str.Substring(0, str.Length - 1);
            tokens[t] = str;
        }
        return tokens.ToArray();
    }
}
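The method above removes the outer quotes from each value, but a quote escaped as "" (per RFC 4180) stays doubled. A minimal post-processing sketch, assuming the values came from SplitOnDoubleQuotes; CsvValueHelpers and UnescapeQuotes are hypothetical names, not part of the original article.

```csharp
using System;

// Hypothetical helper class, not part of the original article.
public static class CsvValueHelpers
{
    // Replaces each doubled quote ("") with a single quote (")
    // and returns the results in a new array.
    public static string[] UnescapeQuotes(string[] values)
    {
        var result = new string[values.Length];
        for (int i = 0; i < values.Length; i++)
            result[i] = values[i].Replace("\"\"", "\"");
        return result;
    }
}
```

This does not remove a related limitation of the separator scan: a value that itself contains "," is written as "","" inside its quotes, and that escaped form also has a double quote on each side of the comma, so the line is still split there.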
Points of Interest
There are many ways to do this, and the code above is just one of them.
Other possible approaches include:
- Define a finite state automaton that parses the line character by character.
- Use String.Replace("\",\"",SpecialChar) to replace every "," pattern with a special character that cannot occur in the normal values of your CSV file, and then split the line on that character.
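The finite state automaton option can be sketched as follows. This is a minimal sketch, assuming one record per line (no line breaks inside quoted values) and lenient handling of malformed input such as a missing closing quote. CsvStateMachine, ParseLine and the state names are hypothetical. Unlike the article's method, it also accepts unquoted values and unescapes doubled quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical parser, not part of the original article.
public static class CsvStateMachine
{
    private enum State { FieldStart, Unquoted, Quoted, QuoteInQuoted }

    // Parses one CSV record (without embedded line breaks) into its fields.
    public static string[] ParseLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        State state = State.FieldStart;

        foreach (char c in line)
        {
            switch (state)
            {
                case State.FieldStart:
                    if (c == '"') state = State.Quoted;          // opening quote
                    else if (c == ',') fields.Add("");           // empty field
                    else { current.Append(c); state = State.Unquoted; }
                    break;
                case State.Unquoted:
                    if (c == ',') { fields.Add(current.ToString()); current.Clear(); state = State.FieldStart; }
                    else current.Append(c);
                    break;
                case State.Quoted:
                    if (c == '"') state = State.QuoteInQuoted;   // closing or escaped quote
                    else current.Append(c);                      // commas are data here
                    break;
                case State.QuoteInQuoted:
                    if (c == '"') { current.Append('"'); state = State.Quoted; } // "" -> "
                    else if (c == ',') { fields.Add(current.ToString()); current.Clear(); state = State.FieldStart; }
                    else current.Append(c);                      // lenient: keep a stray character
                    break;
            }
        }
        fields.Add(current.ToString()); // last field (also reached if a quote is left open)
        return fields.ToArray();
    }
}
```

For example, ParseLine("\"x,y\",\"z\"\"\",plain") returns three fields: x,y then z" then plain. Because the quote state is tracked explicitly, a comma inside quotes is never mistaken for a separator, whatever characters surround it.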
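The String.Replace option can be sketched like this, assuming the ASCII unit separator character (U+001F) never appears in the data. CsvReplaceSplit, SplitWithMarker and the choice of marker are hypothetical, standing in for SpecialChar in the bullet above.

```csharp
using System;

// Hypothetical helper class, not part of the original article.
public static class CsvReplaceSplit
{
    // Assumed never to occur inside a value.
    private const char Marker = '\u001F';

    public static string[] SplitWithMarker(string line)
    {
        // Turn every "," boundary into the marker, then split on it.
        string[] parts = line.Replace("\",\"", Marker.ToString()).Split(Marker);

        // Only the first part still has its opening quote
        // and only the last part still has its closing quote.
        if (parts[0].StartsWith("\"", StringComparison.Ordinal))
            parts[0] = parts[0].Substring(1);
        int last = parts.Length - 1;
        if (parts[last].EndsWith("\"", StringComparison.Ordinal))
            parts[last] = parts[last].Substring(0, parts[last].Length - 1);
        return parts;
    }
}
```

For example, SplitWithMarker("\"a\",\"b,c\",\"d\"") returns a, b,c and d. Like the article's method, this approach does not unescape doubled quotes.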
I hope this code is useful for beginners.