Introduction
This tip presents a technique for implementing a Trim method for strings that provides added flexibility, yet should avoid the performance degradation involved in searching large character arrays.
Background
The basic Trim, TrimStart, and TrimEnd methods of .NET's String class are fine for many uses, but may become unwieldy when an application requires more flexibility.
Without parameters (or when a null or empty array is provided), these methods remove "any leading and[/or] trailing characters that produce a return value of true when they are passed to the Char.IsWhiteSpace method". That should perform pretty well.
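What has caught me out several times is that some characters I think of as whitespace are not considered whitespace characters; they're control characters. NULL is the usual culprit. (LINEFEED and CARRIAGE RETURN are control characters too, but Char.IsWhiteSpace does return true for them, so the parameterless methods do remove them.)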
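Char.IsWhiteSpace and Char.IsControl can be asked directly how a given character is classified. This small console sketch prints both results for SPACE, TAB, LINEFEED, CARRIAGE RETURN, and NULL; CharClassDemo and Describe are illustrative names, not part of any library.

```csharp
using System;

public static class CharClassDemo
{
    // Formats the code point plus the results of the two classification tests.
    public static string Describe(char c)
    {
        return string.Format("U+{0:X4} whitespace={1} control={2}",
            (int)c, char.IsWhiteSpace(c), char.IsControl(c));
    }

    public static void Main()
    {
        // SPACE, TAB, LINEFEED, CARRIAGE RETURN, NULL
        foreach (char c in new[] { ' ', '\t', '\n', '\r', '\0' })
        {
            Console.WriteLine(Describe(c));
        }
        // U+0020 whitespace=True control=False
        // U+0009 whitespace=True control=True
        // U+000A whitespace=True control=True
        // U+000D whitespace=True control=True
        // U+0000 whitespace=False control=True
    }
}
```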
With the provided methods, if you want to omit other characters, you have to make a character array that contains all the characters you don't want; the array replaces the default whitespace test rather than adding to it. If your needs are simple, a very small array might do, e.g., new char[] { ' ' , '\t' , '\n' , '\r' , '\0' }.
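The difference between the two overloads shows up with a NULL-padded string. In this sketch (TrimArrayDemo is an illustrative name), the parameterless Trim stops at the leading NULL, while Trim(char[]) with the small array removes it, but would leave any whitespace that isn't listed in the array.

```csharp
using System;

public static class TrimArrayDemo
{
    // The small set of unwanted characters from the text above.
    public static readonly char[] Unwanted = { ' ', '\t', '\n', '\r', '\0' };

    public static void Main()
    {
        string padded = "\0 hello \r\n";

        // Trim() uses Char.IsWhiteSpace: it stops at the leading NULL,
        // but still strips the trailing " \r\n".
        string byWhiteSpace = padded.Trim();

        // Trim(char[]) strips only the characters listed in the array.
        string byArray = padded.Trim(Unwanted);

        Console.WriteLine(byWhiteSpace.Length); // 7, i.e. "\0 hello"
        Console.WriteLine(byArray);             // hello
    }
}
```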
Even if the Trim method needs to search this array (as I suspect it does) for each character it examines until it finds a "good" character, this should be pretty quick. But as the array grows (say, if you want to include all of the whitespace characters, all of the control characters, and who knows what else), performance will degrade, because each character examined may be compared against every entry in the array. So it makes sense to put the most frequently encountered characters at the beginning of the array.
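Building that large array by hand would be tedious; one way is to scan every char value and keep those that match a predicate. This is a sketch (LargeTrimSetDemo and Collect are illustrative names). The resulting array is in code-point order, not ordered by frequency, and its length depends on the Unicode data of the runtime, so no count is shown.

```csharp
using System;
using System.Collections.Generic;

public static class LargeTrimSetDemo
{
    // Returns every char value for which the predicate is true.
    public static char[] Collect(Func<char, bool> predicate)
    {
        var chars = new List<char>();
        for (int i = char.MinValue; i <= char.MaxValue; i++)
        {
            char c = (char)i;
            if (predicate(c)) chars.Add(c);
        }
        return chars.ToArray();
    }

    public static void Main()
    {
        char[] unwanted = Collect(c => char.IsWhiteSpace(c) || char.IsControl(c));

        // Varies with the runtime's Unicode tables.
        Console.WriteLine(unwanted.Length);

        // Trim(char[]) may now compare each examined character with every entry.
        Console.WriteLine("\0\u00A0 hello \u0007".Trim(unwanted)); // hello
    }
}
```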
If there are many characters to omit, another option may be to use a HashSet. String.Trim won't accept one, though, so that means writing a trimming method of your own.
I have not made a concerted effort to test the performance of any of these options.
The Code
Personally, I also think that having separate methods for trimming the start, the end, or both ends of a string is pretty silly, so this code includes an alternative to that: a single method with a parameter that says which ends to trim.
This enumeration allows the caller to specify which ends of the string to trim. Its values are bit flags, so Both is simply Little | Big.
[System.FlagsAttribute]
[System.ComponentModel.DescriptionAttribute("Specifies the ends of a string.")]
public enum StringEnd
{
[System.ComponentModel.DescriptionAttribute("Neither end.")]
None = 0
,
[System.ComponentModel.DescriptionAttribute("The end with the lower indices.")]
Little = 1
,
[System.ComponentModel.DescriptionAttribute("The end with the higher indices.")]
Big = 2
,
[System.ComponentModel.DescriptionAttribute("Both ends.")]
Both = 3
}
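Because Both (3) is Little (1) combined with Big (2), each end can be tested with a bitwise AND, which is exactly what the Trim method does. The sketch below shows that test in isolation; it restates the enum without the Description attributes, and StringEndDemo and Includes are illustrative names.

```csharp
using System;

// Same values as StringEnd above.
[Flags]
public enum StringEnd
{
    None = 0,
    Little = 1,
    Big = 2,
    Both = Little | Big
}

public static class StringEndDemo
{
    // True when every bit of 'end' is set in 'which'.
    public static bool Includes(StringEnd which, StringEnd end)
    {
        return (which & end) == end;
    }

    public static void Main()
    {
        Console.WriteLine(Includes(StringEnd.Both, StringEnd.Little)); // True
        Console.WriteLine(Includes(StringEnd.Big, StringEnd.Little));  // False
        Console.WriteLine(StringEnd.Little | StringEnd.Big);           // Both
    }
}
```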
And here is the Trim method. It's pretty simple. It requires that the caller pass in a reference to a method that returns true if the provided character is to be omitted, or false otherwise.
public static partial class LibExt
{
public delegate bool Unwanted ( char C ) ;
public static string
Trim
(
this string Victim
,
StringEnd WhichEnd
,
Unwanted Unwanted
)
{
int offset = 0 ;
if ( ( WhichEnd & StringEnd.Little ) == StringEnd.Little )
{
while ( ( offset < Victim.Length ) && Unwanted ( Victim [ offset ] ) ) offset++ ;
}
int length = Victim.Length ;
if ( ( WhichEnd & StringEnd.Big ) == StringEnd.Big )
{
while ( ( length > offset ) && Unwanted ( Victim [ length - 1 ] ) ) length-- ;
}
return ( Victim.Substring ( offset , length - offset ) ) ;
}
}
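To see what each StringEnd value does, the sketch below exercises the method on a short string. It is a condensed, self-contained copy of the code above in conventional C# layout; the ArgumentNullException checks and lowercase parameter names are additions in this sketch, not part of the original, and the dash-stripping predicate is just an example.

```csharp
using System;

[Flags]
public enum StringEnd { None = 0, Little = 1, Big = 2, Both = Little | Big }

public static class LibExt
{
    public delegate bool Unwanted(char c);

    // Same trimming logic as above; the null checks are added here.
    public static string Trim(this string victim, StringEnd whichEnd, Unwanted unwanted)
    {
        if (victim == null) throw new ArgumentNullException(nameof(victim));
        if (unwanted == null) throw new ArgumentNullException(nameof(unwanted));

        int offset = 0;
        if ((whichEnd & StringEnd.Little) == StringEnd.Little)
            while (offset < victim.Length && unwanted(victim[offset])) offset++;

        int length = victim.Length;
        if ((whichEnd & StringEnd.Big) == StringEnd.Big)
            while (length > offset && unwanted(victim[length - 1])) length--;

        return victim.Substring(offset, length - offset);
    }
}

public static class TrimDemo
{
    public static void Main()
    {
        LibExt.Unwanted isDash = c => c == '-';
        string s = "--abc--";
        foreach (StringEnd end in new[] { StringEnd.None, StringEnd.Little, StringEnd.Big, StringEnd.Both })
            Console.WriteLine(end + ": " + s.Trim(end, isDash));
        // None: --abc--
        // Little: abc--
        // Big: --abc
        // Both: abc
    }
}
```

If every character is unwanted, the first loop consumes the whole string and the second loop never runs, so the result is an empty string rather than an exception.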
Using the Code
One of the classes I'm working on this week needs to trim all whitespace and control characters (primarily SPACEs and NULLs) from both ends of several strings, so I chose to do that like this:
result = result.Trim
(
StringEnd.Both
,
delegate
(
char C
)
{
return ( System.Char.IsWhiteSpace ( C ) || System.Char.IsControl ( C ) ) ;
}
) ;
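Again, I put the test for whitespace characters before the test for control characters because whitespace is likely to be more common; since || short-circuits, Char.IsControl is called only for characters that are not whitespace.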
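On C# 3.0 and later, the anonymous delegate can also be written as a lambda, or a named method can be passed as a method group. The sketch below is again a condensed, self-contained copy of the extension method; UsageDemo and IsWhiteSpaceOrControl are illustrative names. It checks that both forms give the same result on NULL-padded input.

```csharp
using System;

public enum StringEnd { None = 0, Little = 1, Big = 2, Both = 3 }

public static class LibExt
{
    public delegate bool Unwanted(char c);

    // Condensed copy of the Trim extension method above.
    public static string Trim(this string victim, StringEnd whichEnd, Unwanted unwanted)
    {
        int offset = 0;
        if ((whichEnd & StringEnd.Little) == StringEnd.Little)
            while (offset < victim.Length && unwanted(victim[offset])) offset++;
        int length = victim.Length;
        if ((whichEnd & StringEnd.Big) == StringEnd.Big)
            while (length > offset && unwanted(victim[length - 1])) length--;
        return victim.Substring(offset, length - offset);
    }
}

public static class UsageDemo
{
    // Named predicate; converts to LibExt.Unwanted as a method group.
    public static bool IsWhiteSpaceOrControl(char c)
    {
        // IsControl runs only when IsWhiteSpace returned false.
        return char.IsWhiteSpace(c) || char.IsControl(c);
    }

    public static void Main()
    {
        string raw = "\0\0  ABC 123 \0\0";

        // Lambda form of the anonymous delegate.
        string viaLambda = raw.Trim(StringEnd.Both, c => char.IsWhiteSpace(c) || char.IsControl(c));

        // Method-group form.
        string viaMethod = raw.Trim(StringEnd.Both, IsWhiteSpaceOrControl);

        Console.WriteLine("[" + viaLambda + "]");  // [ABC 123]
        Console.WriteLine(viaLambda == viaMethod); // True
    }
}
```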
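I could also use a HashSet of specific characters (unlike the version above, this removes only the listed characters, not every whitespace and control character):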
private static readonly System.Collections.Generic.HashSet<char> unwanted =
new System.Collections.Generic.HashSet<char>
( new char[] { ' ' , '\t' , '\n' , '\r' , '\0' } ) ;
result = result.Trim
(
StringEnd.Both
,
delegate
(
char C
)
{
return ( unwanted.Contains ( C ) ) ;
}
) ;
A small HashSet probably doesn't perform as well as a small array, but as the number of characters grows, it may become a good option.
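To put numbers on the array-versus-HashSet question, a crude Stopwatch loop like the following could be a starting point. It is only a sketch: TrimWithSet is a hypothetical helper doing the same job as the Trim extension with a HashSet predicate, the sample string and iteration count are arbitrary, and a trustworthy comparison would need warm-up and repeated runs (a tool such as BenchmarkDotNet handles that). It prints timings but makes no claim about which approach wins.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class TrimTimingSketch
{
    // Hypothetical helper: trims both ends, removing characters found in the set.
    public static string TrimWithSet(string s, HashSet<char> unwanted)
    {
        int start = 0;
        while (start < s.Length && unwanted.Contains(s[start])) start++;
        int end = s.Length;
        while (end > start && unwanted.Contains(s[end - 1])) end--;
        return s.Substring(start, end - start);
    }

    // Runs the action repeatedly and prints the elapsed wall-clock time.
    static void Time(string label, int iterations, Action action)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) action();
        sw.Stop();
        Console.WriteLine(label + ": " + sw.ElapsedMilliseconds + " ms");
    }

    public static void Main()
    {
        char[] small = { ' ', '\t', '\n', '\r', '\0' };

        // Every whitespace and control character, in code-point order.
        var all = new List<char>();
        for (int i = char.MinValue; i <= char.MaxValue; i++)
        {
            char c = (char)i;
            if (char.IsWhiteSpace(c) || char.IsControl(c)) all.Add(c);
        }
        char[] large = all.ToArray();

        var smallSet = new HashSet<char>(small);
        var largeSet = new HashSet<char>(large);

        string sample = "\0\0  some text \r\n\0";
        const int n = 1000000;

        Time("small array", n, () => sample.Trim(small));
        Time("small set  ", n, () => TrimWithSet(sample, smallSet));
        Time("large array", n, () => sample.Trim(large));
        Time("large set  ", n, () => TrimWithSet(sample, largeSet));
    }
}
```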
History
- 2018-04-06: First submitted