Introduction
The article describes several algorithmic solutions which allow to Trim In-String White Spaces, or reduce the sequence of White Spaces to just a single one.
White Spaces vs Blank Spaces: Important clarification
The term White Space should not be confused with Blank Space (aka Space) symbol corresponding to ASCII 32 or Unicode U+0020 [1]. White Spaces represent much broader category which also includes Line Feed, Tab, Carriage Return, etc [1]. The article provides universal solutions to trim (or reduce) all in-string White Spaces, and also several algorithms pertinent to Blank Spaces selected as the most common practical examples. The same principles and algorithmic solutions could be applied to other White Spaces as well.
Extension to String.Trim() Method
Standard .NET/C# Library contains Trim()
, TrimStart() and TrimEnd() methods allowing to strip leading/trailing, or both White Spaces, but there is no method capable of trimming in-string White Spaces. Proposed solution extends the standard functionality allowing to either trim in-string White Spaces (corresponding to the newly introduced TrimWithin()
method) or reduce the sequence of White Spaces to just a single one (another newcomer titled TrimWithin2one()
method). Multiple coding techniques have been tested/benchmarked in order to find the most efficient algoritmic solution. So far, two algoritms based on String-to-Char[] conversion [5...7] found to be the most efficient ones (see Listing 1 and 2), recommended for practical use. Other algorithms are listed mostly for didactic purpose.
Safe code
Proposed solution implements safe/managed code base, thus it does NOT include any pointers arithmetic and/or any low level function API calls. It's relevant to mention, that unsafe code may demonstrate some marginal performance improvement (refer to the discussion in [5]), but, by the very definition, it is unsafe and, therefore, not recommended for use.
Performance estimates: sample test results
Sample test performed on the local PC (i3 Core Duo 3220 w/Kingston HyperX 1600MHz DDR3) using following multilingual Unicode test string w/multiple Blank Spaces added intentionally, sample performance benchmarks estimates follow (note: highlighted algos 1 and 2 demonstrate best performance, i.e. are recommended):
TEST STRING
MY WAY OR WIFI ! VERSTEHEN, АННА ? ХОРОШО ГРАФ, BAZAARU НЕТ ! Εντάξει !
TRIM ALL IN-STRING WHITE SPACES (RESULT STRING)
MYWAYORWIFI!VERSTEHEN,АННА?ХОРОШОГРАФ,BAZAARUНЕТ!Εντάξει!
REDUCE IN-STRING WHITE SPACES SEQUENCE TO A SINGLE ONE (RESULT STRING)
MY WAY OR WIFI ! VERSTEHEN, АННА ? ХОРОШО ГРАФ, BAZAARU НЕТ ! Εντάξει !
BENCHMARKS
ALGORITHM | RUNTIME, μs | ELAPSED, ms | ITERATIONS | NOTES |
1). | TrimWithin | 0.9 | 942 | 1,000,000 | Trim In-String White Spaces |
2). | TrimWithin2one | 1.2 | 1,224 | 1,000,000 | Reduce In-String White Spaces to 1 |
3). | String.Replace method: | 1.3 | 1,314 | 1,000,000 | Trim In-String Blank Spaces |
4). | String Split/Concat algo: | 1.4 | 1,371 | 1,000,000 | Trim In-String Blank Spaces |
5). | String Split/Join algo: | 1.4 | 1,366 | 1,000,000 | Trim In-String Blank Spaces |
6). | Trim2one algo: | 1.8 | 1,817 | 1,000,000 | Reduce In-String Blank Spaces to 1 |
Background
This article [1] inspired by the original article 'Fastest method to trim all whitespace from Strings in .NET' posted by Felipe R. Machado [2]. Some algos are included mostly for the 'academic' interest w/limited practical value demonstrated rather poor performance (namely, the algos based on Regex and Linq); therefore, they are excluded from the current consideration. The following code snippets allows accurate benchmarking of different algorithms based on the following coding techniques:
- String-to-Char[] and Char[]-to-String technique (the fastest/recommended one)
- String.Replace() method as suggested by Chris Maunder
- String.Split() in conjunction with String.Join() included in the original article [2]
- String.Split() in conjunction with String.Concat() as per [4]
Using the code
Listing 1 includes the code snippet pertinent to the most efficient (so far) in-string White Space TrimWithin()
algorithm and corresponding benchmark. It could be used in conjunction with standard Trim()
or TrimStart()
methods removing the leading White Space.
Listing 1: TrimWithin() method to trim all in-string White Spaces from any Unicode/ASCII String
#region TrimWithin() Algo/Benchmark
public string TrimWithin(string InputString)
{
char[] _arInput;
char[] _arBuffer;
string _result = String.Empty;
try
{
if (String.IsNullOrEmpty(InputString) || String.IsNullOrWhiteSpace(InputString) )
{ throw new ArgumentNullException("Empty String"); }
_arInput = InputString.ToCharArray();
_arBuffer = new char[_arInput.Length];
int _idx = 0;
for (int i = 0; i < _arInput.Length; i++)
{
if (!Char.IsWhiteSpace(_arInput[i]))
{
_arBuffer[_idx] = (_arInput[i]);
_idx++;
}
}
return new String(_arBuffer, 0, _idx);
}
catch { throw; }
finally { _arInput = null; _arBuffer = null; }
}
public double TrimWithinBenchmark(string InputString,
Int64 LoopCounter)
{
string _result = String.Empty;
Stopwatch _sw = new Stopwatch();
_sw.Start();
for (Int64 i = 0; i < LoopCounter; ++i)
{
char[] _arInput = InputString.ToCharArray();
char[] _arBuffer = new char[_arInput.Length];
int _idx = 0;
for (int j = 0; j < _arInput.Length; j++)
{
if (!Char.IsWhiteSpace(_arInput[j]))
{
_arBuffer[_idx] = (_arInput[j]);
_idx++;
}
}
_result = (new String(_arBuffer, 0, _idx));
}
_sw.Stop();
Int64 _ticks = _sw.ElapsedTicks;
return Math.Round(TicksToMillisecond(_ticks), 2);
}
#endregion
Listing 2 shown below includes the code snippet pertinent to the most efficient (so far) in-string White Space TrimWithin2one()
algorithm and corresponding benchmark. The algoritms reduces any in-string sequence of the same White Spaces into a single one. It could also be used in conjunction with standard Trim()
method removing the leading/trailing White Spaces.
Important clarification: there is another algo proposed in comments thread by CP member Alexander Chernosvitov, which implements a different logic, namely: trimming a sequence of any White Spaces to the first one found in sequence, thus pertinent to the sequence of 5 blanks followed by 5 tabs the algo below will produce one blank and one tab char, while the proposed alternative will produce just one blank ignoring tab char
Listing 2: TrimWithin2one() method to reduce in-string sequence of White Spaces to a single one
#region TrimWithin2one
public string TrimWithin2one(string InputString)
{
char[] _arInput;
char[] _arBuffer;
string _result = String.Empty;
try
{
if (String.IsNullOrEmpty(InputString) || String.IsNullOrWhiteSpace(InputString) )
{ throw new ArgumentNullException("Empty String"); }
_arInput = InputString.ToCharArray();
_arBuffer = new char[_arInput.Length];
_arBuffer[0] = _arInput[0];
int _idx = 1;
for (int i = 1; i < _arInput.Length; i++)
{
if (!Char.IsWhiteSpace(_arInput[i]) ||
(Char.IsWhiteSpace(_arInput[i]) && _arInput[i] != _arBuffer[_idx - 1]))
{ _arBuffer[_idx] = (_arInput[i]); _idx++; }
}
return new String(_arBuffer, 0, _idx);
}
catch { throw; }
finally { _arInput = null; _arBuffer = null; }
}
public double TrimWithin2oneBenchmark(string InputString,
Int64 LoopCounter)
{
string _result = String.Empty;
Stopwatch _sw = new Stopwatch();
_sw.Start();
for (Int64 i = 0; i < LoopCounter; ++i)
{
char[] _arChars = InputString.ToCharArray();
char[] _buffer = new char[_arChars.Length];
_buffer[0] = _arChars[0];
int pos = 1;
for (int j = 1; j < _arChars.Length; j++)
{
if (!Char.IsWhiteSpace(_arChars[j]) ||
(Char.IsWhiteSpace(_arChars[j]) && _arChars[j] != _buffer[pos-1] ))
{ _buffer[pos] = (_arChars[j]); pos++; }
}
_result = (new String(_buffer, 0, pos));
}
_sw.Stop();
Int64 _ticks = _sw.ElapsedTicks;
return Math.Round(TicksToMillisecond(_ticks), 2);
}
#endregion
Points of Interest
Included mostly for didactic purpose, three Blank Spaces Trimming Algorithms in .NET/C# managed code are included in the following snippet:
Listing 3: Trim out all Blank Spaces algos
using System;
using System.Diagnostics;
namespace StringOp
{
public class StringTrim
{
#region TrimUsingReplace
public double TrimUsingReplace(string InputString,
string OldString,
string NewString,
Int64 LoopCounter)
{
string _result = String.Empty;
Stopwatch _sw = new Stopwatch();
_sw.Start();
for (Int64 i = 0; i < LoopCounter; ++i)
{
_result=InputString.Replace(OldString, NewString);
}
_sw.Stop();
Int64 _ticks = _sw.ElapsedTicks;
return Math.Round(TicksToMillisecond(_ticks), 2);
}
#endregion
#region TrimUsingSplitJoin
public double TrimUsingSplitJoin(string InputString,
string Separator,
Int64 LoopCounter)
{
string _result = String.Empty;
Stopwatch _sw = new Stopwatch();
_sw.Start();
for (Int64 i = 0; i < LoopCounter; ++i)
{
_result = String.Join(String.Empty,InputString.Split(new string[]{Separator},
StringSplitOptions.RemoveEmptyEntries));
}
_sw.Stop();
Int64 _ticks = _sw.ElapsedTicks;
return Math.Round(TicksToMillisecond(_ticks), 2);
}
#endregion
#region TrimUsingSplitConcat
public double TrimUsingSplitConcat(string InputString,
string Separator,
Int64 LoopCounter)
{
string _result = String.Empty;
Stopwatch _sw = new Stopwatch();
_sw.Start();
for (Int64 i = 0; i < LoopCounter; ++i)
{
_result = String.Concat(InputString.Split(new string[] { Separator },
StringSplitOptions.RemoveEmptyEntries));
}
_sw.Stop();
Int64 _ticks = _sw.ElapsedTicks;
return Math.Round(TicksToMillisecond(_ticks), 2);
}
#endregion
internal double TicksToMillisecond(Int64 Ticks)
{
double _msPerTick = (double)1000 / Stopwatch.Frequency;
return Ticks * _msPerTick;
}
}
}
The testing website [2] provides numeric performance estimates (benchmarks) pertinent to each algo. Enter the arbitrary text, select the number of iterations from Drop-down list and click on TRIM button to get the benchmarks estimates. Unicode is OK, but plz limit text to nvarchar(100). Thx! :)
COLLAPSE N BLANK SPACES INTO ONE USING STRING.REPLACE
Listing 4: Trim up to 14 consecutive white spaces into 1
#region Trim2one
public double Trim2oneBenchmark(string InputString,
Int64 LoopCounter)
{
const string _whiteSpc1 = " ";
string _whiteSpc2 = _whiteSpc1 + _whiteSpc1;
string _whiteSpc3 = _whiteSpc2 + _whiteSpc1;
string _whiteSpc4 = _whiteSpc3 + _whiteSpc1;
string _whiteSpc8 = _whiteSpc4 + _whiteSpc4;
string _result = String.Empty;
Stopwatch _sw = new Stopwatch();
_sw.Start();
for (Int64 i = 0; i < LoopCounter; ++i)
{
_result= InputString.Replace(_whiteSpc8, _whiteSpc1).
Replace(_whiteSpc4, _whiteSpc1).
Replace(_whiteSpc3, _whiteSpc1).
Replace(_whiteSpc2, _whiteSpc1);
}
_sw.Stop();
Int64 _ticks = _sw.ElapsedTicks;
return Math.Round(TicksToMillisecond(_ticks), 2);
}
public string Trim2one(string InputString)
{
const string _whiteSpc1 = " ";
string _whiteSpc2 = _whiteSpc1 + _whiteSpc1;
string _whiteSpc3 = _whiteSpc2 + _whiteSpc1;
string _whiteSpc4 = _whiteSpc3 + _whiteSpc1;
string _whiteSpc8 = _whiteSpc4 + _whiteSpc4;
return InputString.Replace(_whiteSpc8, _whiteSpc1).
Replace(_whiteSpc4, _whiteSpc1).
Replace(_whiteSpc3, _whiteSpc1).
Replace(_whiteSpc2, _whiteSpc1);
}
#endregion
Note: the code snippet included in Listing 5 implements a string composition shown below for a didactic purpose (better readability) and also to avoid any possibility of mistyping a long string of blanks.
Listing 4a: Sequence of white spaces composed of vars
const string _whiteSpc1 = " ";
string _whiteSpc2 = _whiteSpc1 + _whiteSpc1;
string _whiteSpc3 = _whiteSpc2 + _whiteSpc1;
string _whiteSpc4 = _whiteSpc3 + _whiteSpc1;
string _whiteSpc8 = _whiteSpc4 + _whiteSpc4;
In a production version some performance improvement could be achieved by replacing the string vars
w/const
(as mentioned in comment thread):
Listing 4b. string const containing sequences of blank spaces (1, 2, 3, 4, 8)
const string _whiteSpc1 = " ";
const string _whiteSpc2 = " ";
const string _whiteSpc3 = " ";
const string _whiteSpc4 = " ";
The algorithm will trim up to 14 consequitive white spaces into one using String.Replace() method. To better understand the logic behind this algo, refer to the following transition diagram, which demonstrates the white spaces reduction process.
White Space Reduction algo using String.Replace() method
1st Replace(8-to-1), white space reduction
2->2
3->3
4->4
5->5
6->6
7->7
8->1 (done)
9->2
10->3
11->4
12->5
13->6
14->7
2nd Replace(4-to-1), white space reduction
2->2
3->3
4->1 (done)
5->2
6->3
7->4
8->1 (done)
9->2
10->3
11->1 (done)
12->2
13->3
14->4
3rd Replace (3-to-1), white space reduction
2->2
3->1 (done)
4->1 (done)
5->2
6->1 (done)
7->2
8->1 (done)
9->2
10->1 (done)
11->1 (done)
12->2
13->1 (done)
14->2
4th Replace(2-to-1), white space reduction
2->1 (done)
3->1 (done)
4->1 (done)
5->1 (done)
6->1 (done)
7->1 (done)
8->1 (done)
9->1 (done)
10->1 (done)
11->1 (done)
12->1 (done)
13->1 (done)
14->1 (done)
This algoritmic solution could be further extended to apply to any N consecutive Blank Spaces.
History
- Original article posted [1]
- This alternative version posted
- Trim2one algorithm added
- Trim ALL White Spaces using String-toChar[] algo added
- Reduce ALL White Spaces to 1 using String-toChar[] algo added
- Article restructured, code base/benchmarks updated
References
- Whitespace character (wiki)
- Felipe R. Machado, Fastest method to trim all whitespace from Strings in .NET
- String.Trim Method (MSDN)
- A.Bell, Efficient string concatenation algorithm implements String.Join() method
- Ayoola-Bell, How to Toggle String Case in .NET
- String.ToCharArray Method (MSDN)
- Char.IsWhiteSpace Method (MSDN)