
Rive

28 Mar 2010 | CPOL | 2 min read
An improved string split method.

Introduction

Rive is an extension method that splits a string into substrings much like String.Split does, with one improvement: when requested, it will not split on delimiters that appear within quotes or that have been "escaped" with a backslash.

Background

Often when I have to split a string -- from a CSV file or a command line perhaps -- I can't use Split because the values may contain the delimiter character. Many years ago, I wrote a string splitter function in C and later ported it to C#, but I haven't been very happy with it. This week I decided to begin afresh and write a new version. This version doesn't have all the features of the old one, but it is easier to read and has more flexibility than Split does.
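
To make the problem concrete, here is a small sketch of what plain String.Split does with a quoted value that contains the delimiter. The sample line and the NaiveSplit helper are made up for illustration; they are not part of the article's code.

```csharp
using System;

public static class SplitProblemDemo
{
    // Hypothetical helper: splits a line on commas with plain String.Split.
    public static string[] NaiveSplit(string line)
    {
        return line.Split(',');
    }

    public static void Main()
    {
        // Made-up CSV line: the first field is quoted and contains a comma.
        string line = "\"Smith, John\",42";

        foreach (string part in NaiveSplit(line))
        {
            Console.WriteLine("[" + part + "]");
        }

        // Split knows nothing about quotes, so the name is cut in two
        // and three pieces are printed instead of the intended two.
    }
}
```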

Option enumeration

As with Split, an enumeration controls which features to use during the split operation; however, Rive supports more options -- specifically the ability to ignore delimiters within quotes. I also threw in the ability to escape characters so they won't be treated as delimiters or quotes.

C#
/**
<summary>
    Options for use with Rive.
</summary>
*/
[System.FlagsAttribute()]
public enum Option
{
    /**
    <summary>
        No options.
    </summary>
    */
    None = 0
,
    /**
    <summary>
        Do not include empty substrings.
    </summary>
    */
    RemoveEmptyEntries = 1
,
    /**
    <summary>
        Treat a special character following a backslash (\) as a regular character.
    </summary>
    */
    HonorEscapes = 2
,
    /**
    <summary>
        Do not split on delimiters within quotes (").
    </summary>
    */
    HonorQuotes = 4
,
    /**
    <summary>
        Do not split on delimiters within apostrophes (').
    </summary>
    */
    HonorApostrophes = 8
}
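
Because the enumeration is marked with FlagsAttribute and its values are distinct powers of two, options can be combined with the | operator and tested with a bitwise AND, which is the same test DoRive uses. The following is a minimal sketch: the enumeration is copied here (without its doc comments) only so the example compiles on its own, and the IsSet helper is hypothetical.

```csharp
using System;

public static class OptionFlagsDemo
{
    // Copy of the article's enumeration so this sketch is self-contained.
    [Flags]
    public enum Option
    {
        None = 0,
        RemoveEmptyEntries = 1,
        HonorEscapes = 2,
        HonorQuotes = 4,
        HonorApostrophes = 8
    }

    // Hypothetical helper: true when every bit of 'flag' is set in 'options'.
    public static bool IsSet(Option options, Option flag)
    {
        return (options & flag) == flag;
    }

    public static void Main()
    {
        Option options = Option.RemoveEmptyEntries | Option.HonorQuotes;

        Console.WriteLine(options);                              // RemoveEmptyEntries, HonorQuotes
        Console.WriteLine(IsSet(options, Option.HonorQuotes));   // True
        Console.WriteLine(IsSet(options, Option.HonorEscapes));  // False
    }
}
```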

Rive

The public Rive methods (there are overloads, so the calling code needn't specify every parameter) are just front-ends to the DoRive method.

C#
public static System.Collections.Generic.IList<string>
Rive
(
    this string   Subject
,
    int           Count
,
    Option        Options
,
    params char[] Delimiters
)
{
    if ( Subject == null )
    {
        throw ( new System.ArgumentNullException
            ( "Subject" , "Subject must not be null" ) ) ;
    }

    if ( Count < 0 )
    {
        throw ( new System.ArgumentOutOfRangeException
            ( "Count" , "Count must not be negative" ) ) ;
    }

    return ( DoRive ( Subject , Count , Options , Delimiters ) ) ;
}

DoRive

DoRive behaves much like Split, except that it returns a read-only IList<string> rather than a string[] and has additional features.

  • The default delimiters are as documented for String.Split.
  • If Count is zero (0), then an empty collection is returned.
  • If Count is one (1), then the returned collection contains only the original string, unchanged.
  • Otherwise, iterate the string, checking for delimiters and other characters as requested.
  • If Count-1 substrings have been produced, then the rest of the string becomes the final substring.
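
The Count rules above mirror those of String.Split itself, so the baseline can be checked against the framework method. This is only a sketch of Split's behavior, not of DoRive; the SplitWithCount helper and the sample string are made up for illustration.

```csharp
using System;

public static class SplitCountDemo
{
    // Hypothetical helper: String.Split on commas with a maximum piece count.
    public static string[] SplitWithCount(string subject, int count)
    {
        return subject.Split(new[] { ',' }, count);
    }

    public static void Main()
    {
        // Count of zero: nothing is returned.
        Console.WriteLine(SplitWithCount("a,b,c", 0).Length);  // 0

        // Count of one: the whole string comes back as the only element.
        Console.WriteLine(SplitWithCount("a,b,c", 1)[0]);      // a,b,c

        // Count of two: one split, then the rest becomes the final substring.
        Console.WriteLine(SplitWithCount("a,b,c", 2)[1]);      // b,c
    }
}
```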

The additional features are straightforward:

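  • If HonorEscapes is specified and a backslash (\) is encountered, then the backslash and the character that follows it are both copied intact; the backslash is not removed.
  • If HonorQuotes is specified and a quote (") is encountered, then the characters up to and including the next quote are copied intact; the quotes themselves stay in the substring.
  • If HonorApostrophes is specified and an apostrophe (') is encountered, then the characters up to and including the next apostrophe are copied intact; the apostrophes themselves stay in the substring.
  • When HonorEscapes is specified, backslashes, quotes, and apostrophes may themselves be escaped.
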
C#
private static System.Collections.Generic.IList<string>
DoRive
(
    string Subject
,
    int    Count
,
    Option Options
,
    char[] Delimiters
)
{
    System.Collections.Generic.List<string> result =
        new System.Collections.Generic.List<string>() ;

    if ( Count > 1 )
    {
        System.Text.StringBuilder temp =
            new System.Text.StringBuilder() ;

        System.Collections.Generic.HashSet<char> delims =
            new System.Collections.Generic.HashSet<char>() ;

        if ( Delimiters != null )
        {
            delims.UnionWith ( Delimiters ) ;
        }

        if ( delims.Count == 0 )
        {
            delims.UnionWith ( defaultdelimiters ) ;
        }

        bool remove = ( Options & Option.RemoveEmptyEntries ) == Option.RemoveEmptyEntries ;
        bool escape = ( Options & Option.HonorEscapes       ) == Option.HonorEscapes       ;
        bool quote  = ( Options & Option.HonorQuotes        ) == Option.HonorQuotes        ;
        bool apos   = ( Options & Option.HonorApostrophes   ) == Option.HonorApostrophes   ;

        char ch  ;
        int  pos = 0 ;
        int  len = Subject.Length ;

        while ( pos < len )
        {
            ch = Subject [ pos++ ] ;

            if ( delims.Contains ( ch ) )
            {
                if ( ( temp.Length > 0 ) || !remove )
                {
                    result.Add ( temp.ToString() ) ;

                    temp.Length = 0 ;

                    if
                    (
                        ( result.Count == Count - 1 )
                    &&
                        ( pos < len )
                    )
                    {
                        temp.Append ( Subject.Substring ( pos ) ) ;

                        pos = len ;
                    }
                }
            }
            else
            {
                if ( escape && ( ch == '\\' ) && ( pos < len ) )
                {
                    temp.Append ( ch ) ;

                    ch = Subject [ pos++ ] ;
                }
                else if ( quote && ( ch == '\"' ) && ( pos < len ) )
                {
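                    do
                    {
                        if ( escape && ( ch == '\\' ) )
                        {
                            temp.Append ( ch ) ;

                            ch = Subject [ pos++ ] ;

                            /* The escaped character is the last one in the string; */
                            /* leave the loop so the next read stays in bounds.      */
                            if ( pos >= len )
                            {
                                break ;
                            }
                        }

                        temp.Append ( ch ) ;

                        ch = Subject [ pos++ ] ;
                    }
                    while ( ( pos < len ) && ( ch != '\"' ) ) ;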
                }
                else if ( apos && ( ch == '\'' ) && ( pos < len ) )
                {
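                    do
                    {
                        if ( escape && ( ch == '\\' ) )
                        {
                            temp.Append ( ch ) ;

                            ch = Subject [ pos++ ] ;

                            /* The escaped character is the last one in the string; */
                            /* leave the loop so the next read stays in bounds.      */
                            if ( pos >= len )
                            {
                                break ;
                            }
                        }

                        temp.Append ( ch ) ;

                        ch = Subject [ pos++ ] ;
                    }
                    while ( ( pos < len ) && ( ch != '\'' ) ) ;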
                }

                temp.Append ( ch ) ;
            }
        }

        if ( ( temp.Length > 0 ) || !remove )
        {
            result.Add ( temp.ToString() ) ;
        }
    }
    else if ( Count == 1 )
    {
        result.Add ( Subject ) ;
    }

    return ( result.AsReadOnly() ) ;
}
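
To show the kind of output the quote and escape handling leads to, here is a deliberately simplified stand-in. It is not the DoRive listing above: SplitOutsideQuotes is a hypothetical helper that takes a single delimiter, ignores Count, and always honors quotes and backslashes. Like DoRive, it keeps the quote and backslash characters in the substrings. Reading the listing, DoRive called with HonorQuotes and HonorEscapes should produce the same pieces for these two made-up inputs, but that is an assumption and has not been run against the article's code.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuoteAwareSplitSketch
{
    // Simplified stand-in, NOT the article's DoRive.
    public static List<string> SplitOutsideQuotes(string subject, char delimiter)
    {
        List<string> result = new List<string>();
        StringBuilder current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < subject.Length; i++)
        {
            char ch = subject[i];

            if (ch == '\\' && i + 1 < subject.Length)
            {
                // Copy the backslash and the escaped character untouched.
                current.Append(ch).Append(subject[++i]);
            }
            else if (ch == '"')
            {
                // Entering or leaving a quoted run; the quote itself is kept.
                inQuotes = !inQuotes;
                current.Append(ch);
            }
            else if (ch == delimiter && !inQuotes)
            {
                // An unquoted, unescaped delimiter ends the current substring.
                result.Add(current.ToString());
                current.Length = 0;
            }
            else
            {
                current.Append(ch);
            }
        }

        result.Add(current.ToString());
        return result;
    }

    public static void Main()
    {
        // Made-up inputs: a quoted comma, then an escaped comma.
        foreach (string input in new[] { "\"Smith, John\",42", @"a\,b,c" })
        {
            Console.WriteLine(string.Join(" | ", SplitOutsideQuotes(input, ',')));
        }

        // Prints:
        // "Smith, John" | 42
        // a\,b | c
    }
}
```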

History

  • 2010-03-26: First submitted.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)