A Flexible String Trim Method

6 Apr 2018 | CPOL | 2 min read
A Trim method for strings that provides flexibility without requiring the use of large character arrays

Introduction

This tip presents a technique for implementing a Trim method for strings that provides added flexibility, yet should avoid the performance degradation involved in searching large character arrays.

Background

The basic Trim, TrimStart, and TrimEnd methods of .NET's String class are fine for many uses, but may become unwieldy when an application requires more flexibility.

Without parameters (or when a null or empty array is provided), these methods remove "any leading and[/or] trailing characters that produce a return value of true when they are passed to the Char.IsWhiteSpace method". That should perform pretty well.

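What has caught me out several times is that some characters I tend to lump in with whitespace, most notably NULL, are not considered whitespace characters; they're control characters. (LINEFEED and CARRIAGE RETURN are control characters too, but Char.IsWhiteSpace does return true for them, so the parameterless Trim removes them.)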
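A quick way to see how .NET classifies a particular character is to ask both methods directly. The following is only a minimal sketch; the class name CharClassDemo and the sample string are made up for illustration.

C#
using System;

public static class CharClassDemo
{
    // Describes how .NET classifies a single character.
    public static string Classify(char c)
    {
        return string.Format("U+{0:X4} whitespace={1} control={2}",
            (int)c, char.IsWhiteSpace(c), char.IsControl(c));
    }

    public static void Main()
    {
        foreach (char c in new[] { ' ', '\t', '\n', '\r', '\0' })
        {
            Console.WriteLine(Classify(c));
        }

        // The parameterless Trim removes the trailing space, CR and LF,
        // but leaves the leading NULL (and the space after it) in place.
        string trimmed = "\0 text \r\n".Trim();
        Console.WriteLine(trimmed.Length); // prints 6
    }
}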

With the provided methods, if you want to trim other characters, you have to supply a character array that contains every character you don't want. If your needs are simple, a very small array may do, e.g., new char[] { ' ' , '\t' , '\n' , '\r' , '\0' } .
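For reference, this is roughly what the array-based approach looks like with the built-in method. It is a sketch only; ArrayTrimDemo and TrimUnwanted are hypothetical names, not part of the code presented in this tip.

C#
using System;

public static class ArrayTrimDemo
{
    // The small "unwanted" array from the text above.
    private static readonly char[] Unwanted = { ' ', '\t', '\n', '\r', '\0' };

    public static string TrimUnwanted(string s)
    {
        // String.Trim(char[]) removes any of the listed characters from both ends.
        return s.Trim(Unwanted);
    }

    public static void Main()
    {
        Console.WriteLine("[" + TrimUnwanted("\0\0 text \r\n\0") + "]"); // prints "[text]"
    }
}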

Even if the Trim method has to search this array (as I suspect it does) for each character until it finds a "good" one, this should be pretty quick. But as the array grows, say because you want to include all of the whitespace characters and all of the control characters and who knows what else, I would expect performance to degrade. If the search is linear, it helps to put the more frequently encountered characters at the beginning of the array.
An option may be to use a HashSet if there are many characters to omit.
I have not made a concerted effort to test the performance of any of these options.

The Code

Personally, I also think that having separate methods for trimming the strings differently is pretty silly, so this code includes an alternative to that.

This enumeration is used to allow the caller to specify which ends of the string to trim.

C#
[System.FlagsAttribute]
[System.ComponentModel.DescriptionAttribute("Specifies the ends of a string.")]
public enum StringEnd
{ 
  [System.ComponentModel.DescriptionAttribute("Neither end.")]
  None   = 0
, 
  [System.ComponentModel.DescriptionAttribute("The end with the lower indices.")]
  Little = 1
, 
  [System.ComponentModel.DescriptionAttribute("The end with the higher indices.")]
  Big    = 2 
, 
  [System.ComponentModel.DescriptionAttribute("Both ends.")]
  Both   = 3
}
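The values are chosen so that Both is the bitwise combination of Little and Big, which lets a single mask test answer "should this end be trimmed?". A small sketch of that test follows; the helper name Includes is made up for illustration, and the enum is repeated (without its attributes) only so the snippet compiles on its own.

C#
using System;

public enum StringEnd { None = 0, Little = 1, Big = 2, Both = 3 }

public static class StringEndDemo
{
    // True when every bit of 'flag' is also set in 'value'.
    public static bool Includes(StringEnd value, StringEnd flag)
    {
        return (value & flag) == flag;
    }

    public static void Main()
    {
        Console.WriteLine((StringEnd.Little | StringEnd.Big) == StringEnd.Both); // True
        Console.WriteLine(Includes(StringEnd.Both, StringEnd.Little));           // True
        Console.WriteLine(Includes(StringEnd.Big, StringEnd.Little));            // False
        Console.WriteLine(Includes(StringEnd.None, StringEnd.Big));              // False
    }
}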

And here is the Trim method. It's pretty simple. It requires that the caller pass in a reference to a method that returns true if the provided character is to be omitted, or false otherwise.

C#
public static partial class LibExt
{
  public delegate bool Unwanted ( char C ) ;

  public static string
  Trim
  (
    this string Victim
  ,
    StringEnd   WhichEnd
  ,
    Unwanted    Unwanted
  )
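  {
    if ( Victim == null ) throw new System.ArgumentNullException ( "Victim" ) ;
    if ( Unwanted == null ) throw new System.ArgumentNullException ( "Unwanted" ) ;

    /* Index of the first character to keep */
    int offset = 0 ;

    if ( ( WhichEnd & StringEnd.Little ) == StringEnd.Little )
    {
      while ( ( offset < Victim.Length ) && Unwanted ( Victim [ offset ] ) ) offset++ ;
    }

    /* Index just past the last character to keep */
    int length = Victim.Length ;

    if ( ( WhichEnd & StringEnd.Big ) == StringEnd.Big )
    {
      while ( ( length > offset ) && Unwanted ( Victim [ length - 1 ] ) ) length-- ;
    }

    return ( Victim.Substring ( offset , length - offset ) ) ;
  }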
} 
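To see how the method behaves at each end and in the edge cases, here is a small sketch. It carries a condensed, conventionally formatted copy of the method so that it compiles on its own; the TrimEdgeCases class, the isDash predicate, and the sample strings are assumptions made purely for illustration.

C#
using System;

public enum StringEnd { None = 0, Little = 1, Big = 2, Both = 3 }

public static class LibExt
{
    public delegate bool Unwanted(char c);

    // Condensed copy of the Trim method above.
    public static string Trim(this string victim, StringEnd whichEnd, Unwanted unwanted)
    {
        int offset = 0;
        if ((whichEnd & StringEnd.Little) == StringEnd.Little)
            while (offset < victim.Length && unwanted(victim[offset])) offset++;

        int length = victim.Length;
        if ((whichEnd & StringEnd.Big) == StringEnd.Big)
            while (length > offset && unwanted(victim[length - 1])) length--;

        return victim.Substring(offset, length - offset);
    }
}

public static class TrimEdgeCases
{
    public static void Main()
    {
        LibExt.Unwanted isDash = c => c == '-';

        Console.WriteLine("--ab--".Trim(StringEnd.Little, isDash));    // prints "ab--"
        Console.WriteLine("--ab--".Trim(StringEnd.Big, isDash));       // prints "--ab"
        Console.WriteLine("--ab--".Trim(StringEnd.Both, isDash));      // prints "ab"
        Console.WriteLine("--ab--".Trim(StringEnd.None, isDash));      // prints "--ab--"
        Console.WriteLine("----".Trim(StringEnd.Both, isDash).Length); // prints 0
    }
}

Because the second loop stops as soon as length reaches offset, a string made up entirely of unwanted characters comes back as an empty string rather than causing an out-of-range Substring call.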

Using the Code

One of the classes I'm working on this week needs to trim all whitespace and control characters (primarily SPACES and NULLs) from both ends of several strings, so I chose to do that like this:

C#
result = result.Trim 
( 
  StringEnd.Both /* qualify with the enumeration's namespace if it is not in scope */
, 
  delegate 
  ( 
    char C
  )
  {
    return ( System.Char.IsWhiteSpace ( C ) || System.Char.IsControl ( C ) ) ;
  }
) ;

Again, I put the test for whitespace characters before the test for control characters because the strings are likely to contain more of them, which lets the || operator skip the second test more often.
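The same test can also live in a named method, which can then be passed as a method group wherever a delegate with a bool ( char ) signature is expected, such as the Unwanted delegate above. A minimal sketch, in which PredicateDemo and IsWhiteSpaceOrControl are hypothetical names and Func<char, bool> stands in for the delegate type:

C#
using System;

public static class PredicateDemo
{
    // Whitespace is tested first; || skips the second test when the first succeeds.
    public static bool IsWhiteSpaceOrControl(char c)
    {
        return char.IsWhiteSpace(c) || char.IsControl(c);
    }

    public static void Main()
    {
        // A method group converts to any delegate type with a matching signature.
        Func<char, bool> unwanted = IsWhiteSpaceOrControl;

        Console.WriteLine(unwanted(' '));      // True  (whitespace)
        Console.WriteLine(unwanted('\0'));     // True  (control)
        Console.WriteLine(unwanted('\u00A0')); // True  (no-break space)
        Console.WriteLine(unwanted('a'));      // False
    }
}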

I could also do that with a HashSet:

C#
private static readonly System.Collections.Generic.HashSet<char> unwanted = 
  new System.Collections.Generic.HashSet<char> 
  ( new char[] { ' ' , '\t' , '\n' , '\r' , '\0' } ) ; /* A real set would hold every unwanted character,
                                                          not just these */

C#
result = result.Trim 
( 
  StringEnd.Both /* qualify with the enumeration's namespace if it is not in scope */
, 
  delegate 
  ( 
    char C
  )
  {
    return ( unwanted.Contains ( C ) ) ;
  }
) ;

A small HashSet probably doesn't perform as well as a small array, but as the number of characters grows, it may become a good option.
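For anyone who wants to measure rather than guess, a crude timing harness might look like the sketch below. Everything in it (the LookupTiming class, BuildUnwanted, Time, and the round count) is an assumption made for illustration, and running every possible char through each test is not representative of real trimming, so treat any numbers it prints as a rough indication only.

C#
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class LookupTiming
{
    // Collects every char that is whitespace or a control character.
    public static char[] BuildUnwanted()
    {
        var list = new List<char>();
        for (int i = char.MinValue; i <= char.MaxValue; i++)
        {
            char c = (char)i;
            if (char.IsWhiteSpace(c) || char.IsControl(c)) list.Add(c);
        }
        return list.ToArray();
    }

    // Runs 'test' over every possible char, 'rounds' times, and reports the elapsed time.
    public static TimeSpan Time(Func<char, bool> test, int rounds, out int hits)
    {
        hits = 0;
        Stopwatch watch = Stopwatch.StartNew();
        for (int r = 0; r < rounds; r++)
            for (int i = char.MinValue; i <= char.MaxValue; i++)
                if (test((char)i)) hits++;
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        char[] array = BuildUnwanted();
        var set = new HashSet<char>(array);
        int arrayHits, setHits;

        TimeSpan arrayTime = Time(c => Array.IndexOf(array, c) >= 0, 20, out arrayHits);
        TimeSpan setTime = Time(c => set.Contains(c), 20, out setHits);

        Console.WriteLine("array:   {0} ms ({1} hits)", arrayTime.TotalMilliseconds, arrayHits);
        Console.WriteLine("HashSet: {0} ms ({1} hits)", setTime.TotalMilliseconds, setHits);
    }
}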

History

  • 2018-04-06: First submitted

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).