Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / programming / performance

Efficient Algorithms to Trim In-String White Spaces

4.98/5 (23 votes)
14 Aug 2015CPOL6 min read 80.5K  
This is an alternative for Fastest method to trim all whitespace from Strings in .NET

Introduction

The article describes several algorithmic solutions which allow to Trim In-String White Spaces, or reduce the sequence of White Spaces to just a single one. 

White Spaces vs Blank Spaces: Important clarification

The term White Space should not be confused with Blank Space (aka Space) symbol corresponding to ASCII 32 or Unicode U+0020 [1]. White Spaces represent much broader category which also includes Line Feed, Tab, Carriage Return, etc [1].  The article provides universal solutions to trim (or reduce) all in-string White Spaces, and also several algorithms pertinent to Blank Spaces selected as the most common practical examples. The same principles and algorithmic solutions could be applied to other White Spaces as well.

Extension to String.Trim() Method

Standard .NET/C# Library contains Trim(), TrimStart() and TrimEnd() methods allowing to strip leading/trailing, or both White Spaces, but there is no method capable of trimming in-string White Spaces. Proposed solution extends the standard functionality allowing to either trim in-string White Spaces (corresponding to the newly introduced TrimWithin() method) or reduce the sequence of White Spaces to just a single one (another newcomer titled TrimWithin2one() method). Multiple coding techniques have been tested/benchmarked in order to find the most efficient algoritmic solution. So far, two algoritms based on String-to-Char[] conversion [5...7] found to be the most efficient ones (see Listing 1 and 2), recommended for practical use. Other algorithms are listed mostly for didactic purpose.

Safe code

Proposed solution implements safe/managed code base, thus it does NOT include any pointers arithmetic and/or any low level function API calls. It's relevant to mention, that unsafe code may demonstrate some marginal performance improvement (refer to the discussion in [5]), but, by the very definition, it is unsafe and, therefore, not recommended for use.

Performance estimates: sample test results

Sample test performed on the local PC (i3 Core Duo 3220 w/Kingston HyperX 1600MHz DDR3) using following multilingual Unicode test string w/multiple Blank Spaces added intentionally, sample performance benchmarks estimates follow (note: highlighted algos 1 and 2 demonstrate best performance, i.e. are recommended):

TEST STRING

MY  WAY  OR              WIFI  !  VERSTEHEN, АННА ?     ХОРОШО ГРАФ, BAZAARU НЕТ  !  Εντάξει   !


TRIM ALL IN-STRING WHITE SPACES (RESULT STRING)

MYWAYORWIFI!VERSTEHEN,АННА?ХОРОШОГРАФ,BAZAARUНЕТ!Εντάξει! 

REDUCE IN-STRING WHITE SPACES SEQUENCE TO A SINGLE ONE (RESULT STRING)

MY WAY OR WIFI ! VERSTEHEN, АННА ? ХОРОШО ГРАФ, BAZAARU НЕТ ! Εντάξει !


BENCHMARKS

ALGORITHM RUNTIME, μs ELAPSED, ms ITERATIONS NOTES
1). TrimWithin 0.9 942 1,000,000 Trim In-String White Spaces
2). TrimWithin2one 1.2 1,224 1,000,000 Reduce In-String White Spaces to 1
3). String.Replace method: 1.3 1,314 1,000,000 Trim In-String Blank Spaces
4). String Split/Concat algo: 1.4 1,371 1,000,000 Trim In-String Blank Spaces
5). String Split/Join algo: 1.4 1,366 1,000,000 Trim In-String Blank Spaces
6). Trim2one algo: 1.8 1,817 1,000,000 Reduce In-String Blank Spaces to 1

Background

This article [1] inspired by the original article 'Fastest method to trim all whitespace from Strings in .NET' posted by Felipe R. Machado [2]. Some algos are included mostly for the 'academic' interest w/limited practical value demonstrated rather poor performance (namely, the algos based on Regex and Linq); therefore, they are excluded from the current consideration. The following code snippets allows accurate benchmarking of different algorithms based on the following coding techniques:

  • String-to-Char[] and Char[]-to-String technique (the fastest/recommended one)
  • String.Replace() method as suggested by Chris Maunder
  • String.Split() in conjunction with String.Join() included in the original article [2]
  • String.Split() in conjunction with String.Concat() as per [4]

Using the code

Listing 1 includes the code snippet pertinent to the most efficient (so far) in-string White Space TrimWithin() algorithm and corresponding benchmark. It could be used in conjunction with standard Trim() or TrimStart() methods removing the leading White Space.

Listing 1: TrimWithin() method to trim all in-string White Spaces from any Unicode/ASCII String
C#
#region TrimWithin() Algo/Benchmark
/// <summary>
/// Trim out all in-string White Spaces using String-to-Char[] conversion
/// </summary>
/// <param name="InputString">string</param>
/// <returns>string</returns>
public string TrimWithin(string InputString)
{
    char[] _arInput;
    char[] _arBuffer;
    string _result = String.Empty;
    try
    {
        // basic input validation: cannot be null, empty or white space
        if (String.IsNullOrEmpty(InputString) || String.IsNullOrWhiteSpace(InputString) )
        { throw new ArgumentNullException("Empty String"); }

        // get char[] from InputString
        _arInput = InputString.ToCharArray();

        // create buffer char[] array to populate without white spaces
        _arBuffer = new char[_arInput.Length];

        // set initial index of buffer array
        int _idx = 0;

        // iterate through input array _arInput
        // and populate buffer array excluding white spaces
        for (int i = 0; i < _arInput.Length; i++)
        {
            if (!Char.IsWhiteSpace(_arInput[i]))
            {
                _arBuffer[_idx] = (_arInput[i]);
                _idx++;
            }
        }

        // convert buffer Char[] into string and return
        return new String(_arBuffer, 0, _idx);
    }
    catch { throw; }
    finally { _arInput = null; _arBuffer = null; }
}

/// <summary>
/// Trim all in-string White Spaces using String-to-Char[] - Benchmark
/// Re: Ayoola-Bell Toggle Case Algorithm
/// http://www.codeproject.com/Tips/160799/How-to-Toggle-String-Case-in-NET
/// </summary>
/// <param name="InputString">string</param>
/// <returns>string</returns>
public double TrimWithinBenchmark(string InputString,
                                  Int64 LoopCounter)
{
    // result string
    string _result = String.Empty;

    // stopwatch obj (ref: System.Diagnostics)
    Stopwatch _sw = new Stopwatch();

    // start count ticks
    _sw.Start();

    // trim using String-to-Char[] and String (Char[]) conversions
    for (Int64 i = 0; i < LoopCounter; ++i)
    {
        char[] _arInput = InputString.ToCharArray();
        char[] _arBuffer = new char[_arInput.Length];

        // set initial index of buffer array
        int _idx = 0;

        // iterate through input array _arInput
        // and populate buffer array excluding white spaces
        for (int j = 0; j < _arInput.Length; j++)
        {
            // populate buffer array excluding white spaces
            if (!Char.IsWhiteSpace(_arInput[j]))
            {
                _arBuffer[_idx] = (_arInput[j]);
                _idx++;
            }
        }
        _result = (new String(_arBuffer, 0, _idx));
    }
    // stop count
    _sw.Stop();

    // duration measured in ticks
    Int64 _ticks = _sw.ElapsedTicks;

    // duration in msec (rounded)
    return Math.Round(TicksToMillisecond(_ticks), 2);
}
#endregion

 

Listing 2 shown below includes the code snippet pertinent to the most efficient (so far) in-string White Space TrimWithin2one() algorithm and corresponding benchmark. The algoritms reduces any in-string sequence of the same White Spaces into a single one. It could also be used in conjunction with standard Trim() method removing the leading/trailing White Spaces.

Important clarification: there is another algo proposed in comments thread by CP member Alexander Chernosvitov, which implements a different logic, namely: trimming a sequence of any White Spaces to the first one found in sequence, thus pertinent to the sequence of 5 blanks followed by 5 tabs the algo below will produce one blank and one tab char, while the proposed alternative will produce just one blank ignoring tab char

Listing 2: TrimWithin2one() method to reduce in-string sequence of White Spaces to a single one
C#
#region TrimWithin2one
  /// <summary>
  /// Reduce in-string White Spaces sequences to a single one
  /// </summary>
  /// <param name="InputString">string</param>
  /// <returns>string</returns>
  public string TrimWithin2one(string InputString)
  {
      char[] _arInput;
      char[] _arBuffer;
      string _result = String.Empty;
      try
      {
          // basic input validation: cannot be null, empty or white space
          if (String.IsNullOrEmpty(InputString) || String.IsNullOrWhiteSpace(InputString) )
          { throw new ArgumentNullException("Empty String"); }

          // get Char[] from input string
          _arInput = InputString.ToCharArray();

          // create buffer array
          _arBuffer = new char[_arInput.Length];

          // insert first element from  input array into buffer
          _arBuffer[0] = _arInput[0];

          // set initial index of buffer array
          int _idx = 1;

          // iterate through input array starting from 2nd element
          // and populate buffer array reducing amount of white spaces
          for (int i = 1; i < _arInput.Length; i++)
          {
              if (!Char.IsWhiteSpace(_arInput[i]) ||
                  (Char.IsWhiteSpace(_arInput[i]) && _arInput[i] != _arBuffer[_idx - 1]))
              { _arBuffer[_idx] = (_arInput[i]); _idx++; }
          }

          // convert buffer array into string to return
          return new String(_arBuffer, 0, _idx);
      }
      catch { throw; }
      finally { _arInput = null; _arBuffer = null; }
  }

  /// <summary>
  /// Reduce in-string White Spaces sequence to a single one: Benchmark
  /// </summary>
  /// <param name="InputString">string</param>
  /// <returns>string</returns>
  public double TrimWithin2oneBenchmark(string InputString,
                                        Int64 LoopCounter)
  {
      // result string
      string _result = String.Empty;

      // stopwatch obj (ref: System.Diagnostics)
      Stopwatch _sw = new Stopwatch();

      // start count ticks
      _sw.Start();

      // trim using String-to-Char[] conversion
      for (Int64 i = 0; i < LoopCounter; ++i)
      {
          char[] _arChars = InputString.ToCharArray();
          char[] _buffer = new char[_arChars.Length];
          _buffer[0] = _arChars[0];
          int pos = 1;
          for (int j = 1; j < _arChars.Length; j++)
          {
              if (!Char.IsWhiteSpace(_arChars[j]) ||
                 (Char.IsWhiteSpace(_arChars[j]) && _arChars[j] != _buffer[pos-1] ))
                 { _buffer[pos] = (_arChars[j]); pos++; }
          }
          _result = (new String(_buffer, 0, pos));
      }
      // stop count
      _sw.Stop();

      // duration measured in ticks
      Int64 _ticks = _sw.ElapsedTicks;

      // duration in msec (rounded)
      return Math.Round(TicksToMillisecond(_ticks), 2);
  }
  #endregion

Points of Interest

Included mostly for didactic purpose, three Blank Spaces Trimming Algorithms in .NET/C# managed code are included in the following snippet:

Listing 3: Trim out all Blank Spaces algos
C#
//******************************************************************************
// Module           :   TrimString.cs
// Description      :   Trim White Space Algorithm benchcmarks
//******************************************************************************
// Author           :   Alexander Bell
// Date             :   08/01/2015
//******************************************************************************
// DISCLAIMER: This Application is provide on AS IS basis without any warranty
//******************************************************************************
using System;
using System.Diagnostics;

namespace StringOp
{
    public class StringTrim
    {
        #region TrimUsingReplace
        /// <summary>
        /// Trim White Spaces using String.Replace method
        /// </summary>
        /// <param name="InputString">string</param>
        /// <param name="OldString">string</param>
        /// <param name="NewString">string</param>
        /// <param name="LoopCounter">Int64</param>
        /// <returns>double</returns>
        public double TrimUsingReplace(string InputString, 
                                        string OldString, 
                                        string NewString, 
                                        Int64 LoopCounter)
        {
            // result string
            string _result = String.Empty;

            // stopwatch obj (ref: System.Diagnostics)
            Stopwatch _sw = new Stopwatch();

            // start count ticks
            _sw.Start();

            // trim using String.Replace
            for (Int64 i = 0; i < LoopCounter; ++i)
            {
                _result=InputString.Replace(OldString, NewString);
            }

            // stop count
            _sw.Stop();

            // duration measured in ticks
            Int64 _ticks = _sw.ElapsedTicks;

            // duration in msec (rounded)
            return Math.Round(TicksToMillisecond(_ticks), 2);
        }
        #endregion

        #region TrimUsingSplitJoin
        /// <summary>
        /// Trim White Spaces using String Split/Join methods
        /// </summary>
        /// <param name="InputString">string</param>
        /// <param name="Separator">string</param>
        /// <param name="LoopCounter">Int64</param>
        /// <returns>double</returns>
        public double TrimUsingSplitJoin(string InputString, 
                                        string Separator, 
                                        Int64 LoopCounter)
        {
            // result string
            string _result = String.Empty;

            // stopwatch obj (ref: System.Diagnostics)
            Stopwatch _sw = new Stopwatch();

            // start count ticks
            _sw.Start();

            // trim using String Split/Join
            for (Int64 i = 0; i < LoopCounter; ++i)
            {
                _result = String.Join(String.Empty,InputString.Split(new string[]{Separator}, 
                    StringSplitOptions.RemoveEmptyEntries));
            }

            // stop count
            _sw.Stop();

            // duration measured in ticks
            Int64 _ticks = _sw.ElapsedTicks;

            // duration in msec (rounded)
            return Math.Round(TicksToMillisecond(_ticks), 2);
        }
        #endregion

        #region TrimUsingSplitConcat
        /// <summary>
        /// Trim White Spaces using String Split/Concat methods
        /// </summary>
        /// <param name="InputString"></param>
        /// <param name="Separator"></param>
        /// <param name="LoopCounter">Int64</param>
        /// <returns>double</returns>
        public double TrimUsingSplitConcat(string InputString, 
                                           string Separator, 
                                            Int64 LoopCounter)
        {
            // result string
            string _result = String.Empty;

            // stopwatch obj (ref: System.Diagnostics)
            Stopwatch _sw = new Stopwatch();

            // start count ticks
            _sw.Start();

            // trim using String Split/Concat
            for (Int64 i = 0; i < LoopCounter; ++i)
            {
                _result = String.Concat(InputString.Split(new string[] { Separator }, 
                    StringSplitOptions.RemoveEmptyEntries));
            }

            // stop count
            _sw.Stop();

            // duration measured in ticks
            Int64 _ticks = _sw.ElapsedTicks;

            // duration in msec (rounded)
            return Math.Round(TicksToMillisecond(_ticks), 2);
        }
        #endregion

        /// <summary>
        /// Ticks-to-ms
        /// </summary>
        /// <param name="Ticks">Int64</param>
        /// <returns>double</returns>
        internal double TicksToMillisecond(Int64 Ticks)
        {
            // msec per tick (stopwatch frequency in kHz)
            double _msPerTick = (double)1000 / Stopwatch.Frequency;

            // duration in msec
            return Ticks * _msPerTick;
        }
    }
}

The testing website [2] provides numeric performance estimates (benchmarks) pertinent to each algo. Enter the arbitrary text, select the number of iterations from Drop-down list and click on TRIM button to get the benchmarks estimates. Unicode is OK, but plz limit text to nvarchar(100). Thx! :)

COLLAPSE N BLANK SPACES INTO ONE USING STRING.REPLACE

Listing 4: Trim up to 14 consecutive white spaces into 1
C#
#region Trim2one
/// <summary>
/// Benchmark:
/// Algo to Trim up to 14 consequitive white spaces into 1
/// </summary>
/// <param name="InputString">string</param>
/// <param name="LoopCounter">Int64</param>
/// <returns>double</returns>
public double Trim2oneBenchmark(string InputString,
                                Int64 LoopCounter)
{
    const string _whiteSpc1 = " ";
    string _whiteSpc2 = _whiteSpc1 + _whiteSpc1;
    string _whiteSpc3 = _whiteSpc2 + _whiteSpc1;
    string _whiteSpc4 = _whiteSpc3 + _whiteSpc1;
    string _whiteSpc8 = _whiteSpc4 + _whiteSpc4;

    // result string
    string _result = String.Empty;

    // stopwatch obj (ref: System.Diagnostics)
    Stopwatch _sw = new Stopwatch();

    // start count ticks
    _sw.Start();

    // trim using String.Replace
    for (Int64 i = 0; i < LoopCounter; ++i)
    {
        _result= InputString.Replace(_whiteSpc8, _whiteSpc1).
                             Replace(_whiteSpc4, _whiteSpc1).
                             Replace(_whiteSpc3, _whiteSpc1).
                             Replace(_whiteSpc2, _whiteSpc1);
    }

    // stop count
    _sw.Stop();

    // duration measured in ticks
    Int64 _ticks = _sw.ElapsedTicks;

    // duration in msec (rounded)
    return Math.Round(TicksToMillisecond(_ticks), 2);
}

/// <summary>
/// Trim up to 14 consequitive white spaces into 1
/// </summary>
/// <param name="InputString">string</param>
/// <returns>string</returns>
public string Trim2one(string InputString)
{
    const string _whiteSpc1 = " ";
    string _whiteSpc2 = _whiteSpc1 + _whiteSpc1;
    string _whiteSpc3 = _whiteSpc2 + _whiteSpc1;
    string _whiteSpc4 = _whiteSpc3 + _whiteSpc1;
    string _whiteSpc8 = _whiteSpc4 + _whiteSpc4;

    return InputString.Replace(_whiteSpc8, _whiteSpc1).
                       Replace(_whiteSpc4, _whiteSpc1).
                       Replace(_whiteSpc3, _whiteSpc1).
                       Replace(_whiteSpc2, _whiteSpc1);
}
#endregion

Note: the code snippet included in Listing 5 implements a string composition shown below for a didactic purpose (better readability) and also to avoid any possibility of mistyping a long string of blanks.

Listing 4a: Sequence of white spaces composed of vars
C#
const string _whiteSpc1 = " ";
string _whiteSpc2 = _whiteSpc1 + _whiteSpc1;
string _whiteSpc3 = _whiteSpc2 + _whiteSpc1;
string _whiteSpc4 = _whiteSpc3 + _whiteSpc1;
string _whiteSpc8 = _whiteSpc4 + _whiteSpc4;

In a production version some performance improvement could be achieved by replacing the string vars w/const (as mentioned in comment thread):

Listing 4b. string const containing sequences of blank spaces (1, 2, 3, 4, 8)
C#
const string _whiteSpc1 = " ";
const string _whiteSpc2 = "  ";
const string _whiteSpc3 = "   ";
const string _whiteSpc4 = "    ";
// etc.

The algorithm will trim up to 14 consequitive white spaces into one using String.Replace() method. To better understand the logic behind this algo, refer to the following transition diagram, which demonstrates the white spaces reduction process.

White Space Reduction algo using String.Replace() method

1st Replace(8-to-1), white space reduction
2->2
3->3
4->4
5->5
6->6
7->7
8->1 (done)
9->2
10->3
11->4
12->5
13->6
14->7

2nd Replace(4-to-1), white space reduction
2->2
3->3
4->1 (done)
5->2
6->3
7->4 
8->1 (done)
9->2
10->3
11->1 (done)
12->2
13->3
14->4

3rd Replace (3-to-1), white space reduction
2->2
3->1 (done)
4->1 (done)
5->2
6->1 (done)
7->2 
8->1 (done)
9->2
10->1 (done)
11->1 (done)
12->2
13->1 (done)
14->2

4th Replace(2-to-1), white space reduction
2->1 (done)
3->1 (done)
4->1 (done)
5->1 (done)
6->1 (done)
7->1 (done)
8->1 (done)
9->1 (done)
10->1 (done)
11->1 (done)
12->1 (done)
13->1 (done)
14->1 (done)

This algoritmic solution could be further extended to apply to any N consecutive Blank Spaces.

History
  • Original article posted [1]
  • This alternative version posted
  • Trim2one algorithm added
  • Trim ALL White Spaces using String-toChar[] algo added
  • Reduce ALL White Spaces to 1 using String-toChar[] algo added
  • Article restructured, code base/benchmarks updated

 

References

  1. Whitespace character (wiki)
  2. Felipe R. Machado, Fastest method to trim all whitespace from Strings in .NET
  3. String.Trim Method (MSDN)
  4. A.Bell, Efficient string concatenation algorithm implements String.Join() method
  5. Ayoola-Bell, How to Toggle String Case in .NET
  6. String.ToCharArray Method (MSDN)
  7. Char.IsWhiteSpace Method (MSDN)

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)