Introduction
This is an extension method that splits a string into substrings much like String.Split, only better: when requested, it will not split on delimiters that appear within quotes or that have been "escaped".
Background
Often when I have to split a string (from a CSV file or a command line, perhaps), I can't use Split because the values may contain the delimiter character. Many years ago, I wrote a string splitter function in C and later ported it to C#, but I haven't been very happy with it. This week I decided to begin afresh and write a new version. This version doesn't have all the features of the old one, but it is easier to read and has more flexibility than Split does.
Option enumeration
As with Split, an enumeration controls which features to use during the split operation; however, Rive supports more options, specifically the ability to ignore delimiters within quotes. I also threw in the ability to escape characters so they won't be treated as delimiters or quotes.
[System.FlagsAttribute()]
public enum Option
{
    None = 0
,
    RemoveEmptyEntries = 1
,
    HonorEscapes = 2
,
    HonorQuotes = 4
,
    HonorApostrophes = 8
}
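As a small illustration of how a flags enumeration like this is combined and tested, here is a minimal sketch. The enumeration is repeated only so the snippet compiles on its own; the OptionDemo class and its IsSet and CsvOptions members are hypothetical names that are not part of the article's code.

```csharp
using System;

// Repeated from the article so that the snippet is self-contained.
[Flags]
public enum Option
{
    None = 0,
    RemoveEmptyEntries = 1,
    HonorEscapes = 2,
    HonorQuotes = 4,
    HonorApostrophes = 8
}

// Hypothetical helper class, for illustration only.
public static class OptionDemo
{
    // Returns true when every bit of 'flag' is set in 'options'.
    // This is the same mask-and-compare test that DoRive performs.
    public static bool IsSet(Option options, Option flag)
    {
        return (options & flag) == flag;
    }

    // One plausible combination for CSV-like input: keep quoted values
    // together and allow backslash escapes.
    public static Option CsvOptions()
    {
        return Option.HonorQuotes | Option.HonorEscapes;
    }
}
```

Because each member occupies its own bit, the values can be combined with the | operator, and each feature can be tested independently of the others.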
Rive
The public Rive methods (there are overloads, so the calling code needn't specify every parameter) are just front-ends to the DoRive method.
public static System.Collections.Generic.IList<string>
Rive
(
    this string Subject
,
    int Count
,
    Option Options
,
    params char[] Delimiters
)
{
    if ( Subject == null )
    {
        throw ( new System.ArgumentNullException
            ( "Subject" , "Subject must not be null" ) ) ;
    }

    if ( Count < 0 )
    {
        throw ( new System.ArgumentOutOfRangeException
            ( "Count" , "Count must not be negative" ) ) ;
    }

    return ( DoRive ( Subject , Count , Options , Delimiters ) ) ;
}
DoRive
DoRive behaves much like Split except that it returns an IList<string> rather than a string[], and has additional features.
- The default delimiters are as documented for String.Split.
- If Count is zero (0), then an empty collection is returned.
- If Count is one (1), then the original string is returned unchanged.
- Otherwise, iterate the string, checking for delimiters and other characters as requested.
- If Count-1 substrings have been produced, then the rest of the string becomes the final substring.
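For comparison, the Count rules listed above mirror what String.Split itself does with its count parameter. The following minimal sketch shows that baseline behavior; SplitCountDemo and SplitAtMost are hypothetical names used only for this illustration.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class SplitCountDemo
{
    // Splits on commas with the framework's String.Split, returning
    // at most 'count' substrings.
    public static string[] SplitAtMost(string subject, int count)
    {
        return subject.Split(new[] { ',' }, count);
    }
}
```

With the input "a,b,c", a count of 0 yields an empty array, a count of 1 yields the whole string as the only element, and a count of 2 yields "a" followed by "b,c".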
The additional features are straightforward:
- If HonorEscapes is specified and a backslash (\) is encountered, then the following character is copied intact.
- If HonorQuotes is specified and a quote (") is encountered, then the characters up to the next quote are copied intact.
- If HonorApostrophes is specified and an apostrophe (') is encountered, then the characters up to the next apostrophe are copied intact.
- Backslashes, Quotes, and Apostrophes may be escaped.
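To make these rules concrete, here is a simplified, self-contained sketch of quote-aware and escape-aware splitting. It is not the DoRive method: it always honors double quotes and backslashes, takes a single delimiter, and ignores Count and the other options. QuoteAwareSplitDemo and its Split method are hypothetical names. As the descriptions above suggest, the quotes and backslashes themselves are kept in the output.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class QuoteAwareSplitDemo
{
    // Splits 'subject' on 'delimiter', except inside double quotes or
    // immediately after a backslash. Quotes and backslashes are copied
    // to the output unchanged.
    public static List<string> Split(string subject, char delimiter)
    {
        var result = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < subject.Length; i++)
        {
            char ch = subject[i];

            if (ch == '\\' && i + 1 < subject.Length)
            {
                // Escape: copy the backslash and the next character verbatim.
                current.Append(ch).Append(subject[++i]);
            }
            else if (ch == '"')
            {
                // Entering or leaving a quoted run; the quote is kept.
                inQuotes = !inQuotes;
                current.Append(ch);
            }
            else if (ch == delimiter && !inQuotes)
            {
                // An unquoted, unescaped delimiter ends the current substring.
                result.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(ch);
            }
        }

        // Whatever remains is the final substring.
        result.Add(current.ToString());
        return result;
    }
}
```

Splitting a,"b,c",d on a comma gives three substrings: a, "b,c" (with its quotes), and d. Splitting x\,y,z gives x\,y and z. Apostrophes could be handled the same way with a second state flag.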
private static System.Collections.Generic.IList<string>
DoRive
(
    string Subject
,
    int Count
,
    Option Options
,
    char[] Delimiters
)
{
    System.Collections.Generic.List<string> result =
        new System.Collections.Generic.List<string>() ;

    if ( Count > 1 )
    {
        System.Text.StringBuilder temp =
            new System.Text.StringBuilder() ;

        System.Collections.Generic.HashSet<char> delims =
            new System.Collections.Generic.HashSet<char>() ;

        if ( Delimiters != null )
        {
            delims.UnionWith ( Delimiters ) ;
        }

        // Fall back to the default delimiters when none were supplied
        if ( delims.Count == 0 )
        {
            delims.UnionWith ( defaultdelimiters ) ;
        }

        bool remove = ( Options & Option.RemoveEmptyEntries ) == Option.RemoveEmptyEntries ;
        bool escape = ( Options & Option.HonorEscapes ) == Option.HonorEscapes ;
        bool quote = ( Options & Option.HonorQuotes ) == Option.HonorQuotes ;
        bool apos = ( Options & Option.HonorApostrophes ) == Option.HonorApostrophes ;

        char ch ;
        int pos = 0 ;
        int len = Subject.Length ;

        while ( pos < len )
        {
            ch = Subject [ pos++ ] ;

            if ( delims.Contains ( ch ) )
            {
                if ( ( temp.Length > 0 ) || !remove )
                {
                    result.Add ( temp.ToString() ) ;
                    temp.Length = 0 ;

                    // Count-1 substrings produced: the rest of the string
                    // becomes the final substring
                    if
                    (
                        ( result.Count == Count - 1 )
                    &&
                        ( pos < len )
                    )
                    {
                        temp.Append ( Subject.Substring ( pos ) ) ;
                        pos = len ;
                    }
                }
            }
            else
            {
                if ( escape && ( ch == '\\' ) && ( pos < len ) )
                {
                    // Copy the backslash; the escaped character is appended below
                    temp.Append ( ch ) ;
                    ch = Subject [ pos++ ] ;
                }
                else if ( quote && ( ch == '\"' ) && ( pos < len ) )
                {
                    // Copy everything up to the closing quote intact
                    do
                    {
                        // An escape here needs two more characters (the escaped
                        // one and the one read after it); without this check an
                        // unterminated quote could read past the end of the string
                        if ( escape && ( ch == '\\' ) && ( pos < len - 1 ) )
                        {
                            temp.Append ( ch ) ;
                            ch = Subject [ pos++ ] ;
                        }

                        temp.Append ( ch ) ;
                        ch = Subject [ pos++ ] ;
                    }
                    while ( ( pos < len ) && ( ch != '\"' ) ) ;
                }
                else if ( apos && ( ch == '\'' ) && ( pos < len ) )
                {
                    // Copy everything up to the closing apostrophe intact
                    do
                    {
                        // Same bounds check as in the quote loop above
                        if ( escape && ( ch == '\\' ) && ( pos < len - 1 ) )
                        {
                            temp.Append ( ch ) ;
                            ch = Subject [ pos++ ] ;
                        }

                        temp.Append ( ch ) ;
                        ch = Subject [ pos++ ] ;
                    }
                    while ( ( pos < len ) && ( ch != '\'' ) ) ;
                }

                temp.Append ( ch ) ;
            }
        }

        // Add whatever remains as the last substring
        if ( ( temp.Length > 0 ) || !remove )
        {
            result.Add ( temp.ToString() ) ;
        }
    }
    else if ( Count == 1 )
    {
        result.Add ( Subject ) ;
    }

    return ( result.AsReadOnly() ) ;
}
History
- 2010-03-26: First submitted.