Introduction
This is, simply put, a fast split function that combines the classic Split function, which splits an expression on any of a set of characters, with my new one, which treats a whole sequence of characters as a single separator.
Why New
Have you ever tried to write a file sender program with resume support? You need a simple protocol that carries, for example, the file ID, the packet number and the packet data. Which separator can you use to split these fields? If you use a single character, you risk the packet data happening to contain that same character, and your program breaks. Instead, you can pick a sequence of characters that is unlikely to occur, say '(::)', and split your packet with it. The problem is that the ordinary Split function treats each of those characters as a separator on its own.
Example
Expression: Hi(::)How are you? :)I hope you are fine(::)
Output of ordinary Split:
- Hi
- Empty String
- Empty String
- Empty String
- How are you?
- Empty String
- I hope you are fine
- Empty String
- Empty String
- Empty String
- Empty String
Output of my Split:
- Hi
- How are you? :)I hope you are fine
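The "ordinary Split" output above can be reproduced with the built-in String.Split(char[]) overload. The sketch below (SplitComparison is a hypothetical helper, not part of the article's code) prints each token in brackets so the empty strings are visible. For comparison, it also calls the String.Split(string[], StringSplitOptions) overload available since .NET 2.0, which matches "(::)" as a whole but, unlike the function in this article, keeps a trailing empty string.

```csharp
using System;

// Hypothetical helper class for this illustration only.
public static class SplitComparison
{
    public const string Sample = "Hi(::)How are you? :)I hope you are fine(::)";

    // Ordinary Split: each of '(', ':' and ')' is a separator on its own.
    public static string[] SplitOnAnyChar()
    {
        return Sample.Split("(::)".ToCharArray());
    }

    // String[] overload (.NET 2.0 and later): "(::)" is matched as a whole.
    // It returns a trailing empty string after the final "(::)".
    public static string[] SplitOnSequence()
    {
        return Sample.Split(new string[] { "(::)" }, StringSplitOptions.None);
    }

    // Prints each token in brackets so empty strings are visible.
    public static void Print(string[] tokens)
    {
        foreach (string token in tokens)
            Console.WriteLine("[" + token + "]");
    }
}
```

SplitOnAnyChar() returns the 11 tokens listed above; SplitOnSequence() returns three: "Hi", "How are you? :)I hope you are fine" and an empty string.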
Usage
Function Header
public static string[] Split(string Expression, string Delimiter,
    bool SingleSeparator, int Count, ComparisonMethod Compare)
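Expression: The string to split.
Delimiter: The string to split with.
SingleSeparator: true to treat the Delimiter characters as one single separator; false to run the ordinary Split, where each character of Delimiter is a separator on its own.
Count: The maximum number of tokens to return; the last token holds the rest of the Expression. A negative value means no limit.
Compare: A ComparisonMethod value that indicates whether delimiter matching is case-sensitive (Binary) or case-insensitive (Text). It is only used when SingleSeparator is true.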
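To show how Count helps with the packet protocol from the introduction, here is a small sketch. SplitSequence is a hypothetical stand-in for calling Split with SingleSeparator = true and Compare = ComparisonMethod.Binary; it is built on String.IndexOf with StringComparison.Ordinal instead of the article's recursive matcher, and it does not reproduce the null-token edge case described further on.

```csharp
using System;
using System.Collections.Generic;

public static class PacketSketch
{
    // Hypothetical stand-in for CSplitter.Split(expression, delimiter, true, count, ComparisonMethod.Binary).
    // count < 0 means "no limit"; with count > 1 the last element keeps the rest of the string.
    public static string[] SplitSequence(string expression, string delimiter, int count)
    {
        if (count == 0) return new string[0];
        if (count == 1 || delimiter.Length == 0) return new string[] { expression };

        List<string> tokens = new List<string>();
        int start = 0;
        // Collect at most count - 1 tokens; the final slot is reserved for the remainder.
        while (count < 0 || tokens.Count < count - 1)
        {
            int hit = expression.IndexOf(delimiter, start, StringComparison.Ordinal);
            if (hit < 0) break;
            tokens.Add(expression.Substring(start, hit - start));
            start = hit + delimiter.Length;
        }
        // Remaining characters (or the whole, empty, expression) become the last token.
        if (start < expression.Length || tokens.Count == 0)
            tokens.Add(expression.Substring(start));
        return tokens.ToArray();
    }
}
```

With a packet such as "17(::)42(::)raw(::)data" (file ID, packet number, data), SplitSequence(packet, "(::)", 3) returns "17", "42" and "raw(::)data", so the separator inside the data survives. With a negative count, the same packet splits into four tokens.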
Split Module Code
using System;
using System.Collections;

namespace Infinity
{
    public enum ComparisonMethod
    {
        Binary = 0,   // case-sensitive matching
        Text = 1      // case-insensitive matching
    }

    namespace StringSplitter
    {
        public class CSplitter
        {
            // Shared by the matching helpers. Because they are static,
            // concurrent calls to Split from several threads are not safe.
            private static string m_Expression;
            private static string m_Delimiter;

            public CSplitter()
            {
            }

            // True if m_Delimiter occurs in m_Expression at StringIndex (case-sensitive).
            private static bool isValidDelimiterBinary(int StringIndex, int DelimiterIndex)
            {
                if (DelimiterIndex == m_Delimiter.Length) return true;
                if (StringIndex == m_Expression.Length) return false;
                if (m_Expression[StringIndex] == m_Delimiter[DelimiterIndex])
                    return isValidDelimiterBinary(StringIndex + 1, DelimiterIndex + 1);
                else
                    return false;
            }

            // Same check, but both characters are lowercased first (case-insensitive).
            private static bool isValidDelimiterText(int StringIndex, int DelimiterIndex)
            {
                if (DelimiterIndex == m_Delimiter.Length) return true;
                if (StringIndex == m_Expression.Length) return false;
                if (Char.ToLower(m_Expression[StringIndex]) ==
                    Char.ToLower(m_Delimiter[DelimiterIndex]))
                    return isValidDelimiterText(StringIndex + 1, DelimiterIndex + 1);
                else
                    return false;
            }

            public static string[] Split(string Expression, string Delimiter,
                bool SingleSeparator, int Count, ComparisonMethod Compare)
            {
                m_Expression = Expression;
                m_Delimiter = Delimiter;
                ArrayList Tokens = new ArrayList();

                // Regular behavior: every delimiter character is a separator.
                if (!SingleSeparator)
                {
                    if (Count >= 0)
                        return Expression.Split(Delimiter.ToCharArray(), Count);
                    else
                        return Expression.Split(Delimiter.ToCharArray());
                }

                if (Count == 0)
                    return new string[0];
                else if (Count == 1)
                    return new string[] { Expression };
                else
                    Count--;   // keep the last slot for the rest of the string

                // An empty delimiter would match everywhere and never advance i.
                if (Delimiter.Length == 0)
                    return new string[] { Expression };

                int i;
                int iStart = 0;
                if (Compare == ComparisonMethod.Binary)
                {
                    for (i = 0; i < Expression.Length; i++)
                    {
                        if (isValidDelimiterBinary(i, 0))
                        {
                            Tokens.Add(Expression.Substring(iStart, i - iStart));
                            i += Delimiter.Length - 1;   // jump to the delimiter's last character
                            iStart = i + 1;              // next token starts after the delimiter
                            if (Tokens.Count == Count && Count >= 0) break;
                        }
                    }
                }
                else
                {
                    for (i = 0; i < Expression.Length; i++)
                    {
                        if (isValidDelimiterText(i, 0))
                        {
                            Tokens.Add(Expression.Substring(iStart, i - iStart));
                            i += Delimiter.Length - 1;
                            iStart = i + 1;
                            if (Tokens.Count == Count && Count >= 0) break;
                        }
                    }
                }

                // Add whatever is left after the last delimiter.
                string LastToken = "";
                if (iStart < Expression.Length)
                {
                    LastToken = Expression.Substring(iStart, Expression.Length - iStart);
                    if (LastToken == Delimiter)
                        Tokens.Add(null);
                    else
                        Tokens.Add(LastToken);
                }
                else if (Tokens.Count == 0)
                    Tokens.Add(Expression);

                return (string[])Tokens.ToArray(typeof(string));
            }
        }
    }
}
Code in Detail
Comparison Method Enumeration
public enum ComparisonMethod
{
Binary = 0,
Text = 1
}
Used to specify whether matching is case-sensitive (Binary) or not (Text).
CSplitter members:
private static string m_Expression ;
private static string m_Delimiter ;
I made these two variables class members because the delimiter-matching functions need them, and it isn't sensible to pass them as parameters on every call. So I set them once at class level and pass only the indices, as you can see below. Note that because they are static, two threads calling Split at the same time would overwrite each other's values, so the class is not thread-safe.
isValidDelimiterBinary Function
private static bool isValidDelimiterBinary(int StringIndex, int DelimiterIndex )
{
if (DelimiterIndex == m_Delimiter.Length) return true;
if (StringIndex == m_Expression.Length) return false;
if (m_Expression[StringIndex] == m_Delimiter[DelimiterIndex])
return isValidDelimiterBinary(StringIndex + 1, DelimiterIndex + 1);
else
return false;
}
This recursive function takes a start index into the expression and a start index into the delimiter. In my view it holds the whole trick, so let's go through it line by line:
if (DelimiterIndex == m_Delimiter.Length) return true;
if (StringIndex == m_Expression.Length) return false;
These are the two stop conditions. The first returns true once ALL the delimiter characters have been checked and matched. The second returns false when we reach the end of the expression before the delimiter check is finished.
if (m_Expression[StringIndex] == m_Delimiter[DelimiterIndex])
return isValidDelimiterBinary(StringIndex + 1, DelimiterIndex + 1);
else
return false;
If the current character of the expression matches the current character of the delimiter, the function calls itself with both indices incremented by 1; otherwise the delimiter does not start at this position and it returns false. When you call it from the main function, pass the expression index where matching should start, and 0 as DelimiterIndex so that matching starts from the first delimiter character:
bool res = isValidDelimiterBinary(i, 0);
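The recursion only ever goes as deep as the delimiter is long, so it is cheap, but the same check can also be written as a plain loop. The sketch below is an iterative equivalent that takes the expression and delimiter as parameters instead of reading static fields (MatchesAt and its ignoreCase flag are hypothetical names, not part of CSplitter).

```csharp
using System;

public static class DelimiterMatch
{
    // Iterative equivalent of isValidDelimiterBinary / isValidDelimiterText:
    // true when `delimiter` occurs in `expression` starting at `start`.
    // No shared state, so concurrent calls cannot interfere.
    public static bool MatchesAt(string expression, int start, string delimiter, bool ignoreCase)
    {
        for (int d = 0; d < delimiter.Length; d++)
        {
            int s = start + d;
            // Same stop condition as the recursive version: ran off the end of the expression.
            if (s >= expression.Length) return false;
            char a = expression[s];
            char b = delimiter[d];
            if (ignoreCase)
            {
                a = Char.ToLower(a);
                b = Char.ToLower(b);
            }
            if (a != b) return false;
        }
        // Every delimiter character matched.
        return true;
    }
}
```

For example, MatchesAt("a(::)b", 1, "(::)", false) is true, while starting at index 0 or 3 gives false.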
isValidDelimiterText Function
It is exactly the same function, except that it matches case-insensitively. I preferred writing two functions over checking, for every character of the expression, whether the user wants case-sensitive matching. The only difference is this comparison:
Char.ToLower(m_Expression[StringIndex]) ==
Char.ToLower(m_Delimiter[DelimiterIndex])
Here, both characters are converted to lowercase before they are compared. Someone might ask: why not convert the whole string to lowercase once, store it in a temporary string, and work with that? It is a reasonable idea, but it costs one pass over the string to convert it and a second pass to match it. Worse, imagine a user passing a long string (30,000 characters, for instance) who only wants two elements back. You would convert the ENTIRE string even though the separator you need might be within the first 100 characters. Lowercasing only the characters that are actually compared avoids that wasted work. :)
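Another way to get the same laziness, assuming .NET 2.0 or later, is the String.Compare(String, Int32, String, Int32, Int32, StringComparison) overload, which compares only a window of each string. This is a swapped-in alternative, not what the article's code does, and MatchesAtIgnoreCase is a hypothetical name. Note that OrdinalIgnoreCase folds case with invariant rules, which can differ from Char.ToLower (current culture) for a few characters.

```csharp
using System;

public static class CompareSketch
{
    // Case-insensitive delimiter test at one position, without lowercasing
    // the whole expression up front: only delimiter.Length characters are compared.
    public static bool MatchesAtIgnoreCase(string expression, int start, string delimiter)
    {
        // Not enough characters left for a full match.
        if (start + delimiter.Length > expression.Length) return false;
        return String.Compare(expression, start, delimiter, 0, delimiter.Length,
                              StringComparison.OrdinalIgnoreCase) == 0;
    }
}
```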
Split Function
Now we come to the main function that does it all. First, we update the m_Expression and m_Delimiter member variables with the input data.
m_Expression = Expression;
m_Delimiter = Delimiter;
System.Collections.ArrayList Tokens = new System.Collections.ArrayList();
This ArrayList holds the tokens. I use it because the number of tokens isn't known in advance, so we need a collection that grows dynamically and can be converted to a string array at the end.
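For comparison, the sketch below collects the same tokens with ArrayList and with the generic List&lt;string&gt; that .NET 2.0 and later provide (TokenListSketch is a hypothetical name). ArrayList stores object, so converting it back needs the element type and a cast; List&lt;string&gt; is typed and returns a string[] directly.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class TokenListSketch
{
    // ArrayList stores object, so ToArray needs the element type and a cast.
    public static string[] WithArrayList(string[] parts)
    {
        ArrayList tokens = new ArrayList();
        foreach (string p in parts) tokens.Add(p);
        return (string[])tokens.ToArray(typeof(string));
    }

    // List<string> is typed, so ToArray returns string[] directly.
    public static string[] WithList(string[] parts)
    {
        List<string> tokens = new List<string>();
        foreach (string p in parts) tokens.Add(p);
        return tokens.ToArray();
    }
}
```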
SingleSeparator Parameter Handling
if (!SingleSeparator)
if (Count >=0)
return Expression.Split(Delimiter.ToCharArray(), Count);
else
return Expression.Split(Delimiter.ToCharArray());
This part checks whether the user wants the regular Split behavior. If so, it hands the work to String.Split with the delimiter's characters, passing Count along only when it is non-negative (a negative Count means no limit).
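The sketch below isolates that branch (OrdinarySplitCount.Run is a hypothetical wrapper) to show how String.Split(char[], int) treats Count: it returns at most Count substrings, and the last one is simply the rest of the string after the first Count - 1 separators, even if that rest starts with leftover delimiter characters.

```csharp
using System;

public static class OrdinarySplitCount
{
    // The regular branch: each character of the delimiter is a separator,
    // and a non-negative count caps the number of returned substrings.
    public static string[] Run(string expression, string delimiter, int count)
    {
        char[] separators = delimiter.ToCharArray();
        return count >= 0 ? expression.Split(separators, count)
                          : expression.Split(separators);
    }
}
```

For example, Run("a(::)b(::)c", "(::)", 2) returns "a" and "::)b(::)c", which is why the ordinary Split is a poor fit for multi-character separators.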
Count Parameter Handling
if (Count ==0)
return new string [0];
else
if (Count == 1)
return new string [] {Expression};
else
Count--;
This part handles the special cases of the Count parameter as follows:
- Count = 0: return an empty array.
- Count = 1: return an array containing only the original string.
- Otherwise, decrement Count by one; the reason is explained later.
The Main Loop
int i ;
int iStart = 0 ;
if (Compare == ComparisonMethod.Binary)
{
for (i = 0 ; i < Expression.Length ; i++)
{
if (isValidDelimiterBinary(i, 0))
{
Tokens.Add (Expression.Substring(iStart, i - iStart));
i += Delimiter.Length - 1;
iStart = i + 1;
if (Tokens.Count == Count && Count >= 0) break;
}
}
}
else
{
for (i = 0 ; i < Expression.Length ; i++)
{
if (isValidDelimiterText(i, 0))
{
Tokens.Add (Expression.Substring(iStart, i - iStart));
i += Delimiter.Length - 1;
iStart = i + 1;
if (Tokens.Count == Count && Count >= 0) break;
}
}
}
Both branches of the if statement are the same; the only difference is that one calls isValidDelimiterText and the other calls isValidDelimiterBinary. I will explain the first branch (binary matching):
for (i = 0 ; i < Expression.Length ; i++)
{
if (isValidDelimiterBinary(i, 0))
{
Tokens.Add (Expression.Substring(iStart, i - iStart));
i += Delimiter.Length - 1;
iStart = i + 1;
if (Tokens.Count == Count && Count >= 0) break;
}
}
This is the loop itself. I used a for loop rather than an enumerator because I need an index: the loop extracts substrings by position and jumps i forward past each delimiter it finds, which foreach does not allow. I could use an enumerator and increment an index by hand, but why do extra work? :) Before we start, consider the string from the demo project, a(::)b(::)c()(::) (::)(::), which we will split on (::). At each position, we check whether the current Expression character is the start of the Delimiter sequence.
if (isValidDelimiterBinary(i, 0))
If it is, we do the following:
Tokens.Add (Expression.Substring(iStart, i - iStart));
Add the characters from iStart up to, but not including, the current character. For example, for the first delimiter found, i = 1 and iStart = 0, so the token added is "a".
i += Delimiter.Length - 1;
Move the index i to the last character of the delimiter, so that the loop's i++ steps past it.
iStart = i + 1;
Point the next token's start index iStart at the first character after the delimiter ("b" in our example).
if (Tokens.Count == Count && Count >= 0) break;
This checks whether the user asked for a limited number of tokens (Count >= 0). If so, we stop as soon as we have collected one token fewer than requested, which is why Count was decremented above: the rest of the string, delimiters included, must go into the last element of the returned array.
Remaining Characters Check
The loop is finished; now let's see whether any characters remain. If they do and they are exactly the delimiter (which can only happen when the loop stopped early because of Count), we add a null token; otherwise we add the remaining characters as the last token. If nothing remains and no token has been collected at all (which can only happen when the expression is empty), we add the whole expression as a single token.
string LastToken = "";
if (iStart < Expression.Length){
LastToken = Expression.Substring(iStart, Expression.Length - iStart);
if(LastToken == Delimiter)
Tokens.Add (null);
else
Tokens.Add (LastToken);
}
else
if (Tokens.Count == 0) Tokens.Add (Expression);
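Putting the loop and the remaining-characters check together, the sketch below re-implements the Binary branch in one place and can be run against the demo string. It is not the article's code: LoopSketch is a hypothetical name, the recursive helper is replaced by String.CompareOrdinal, and tokens are collected in a List&lt;string&gt;.

```csharp
using System;
using System.Collections.Generic;

public static class LoopSketch
{
    // Sketch of the SingleSeparator path with ordinal (Binary) matching.
    // count < 0 means no limit, as in the article.
    public static string[] Split(string expression, string delimiter, int count)
    {
        if (count == 0) return new string[0];
        if (count == 1 || delimiter.Length == 0) return new string[] { expression };
        count--; // leave the last slot for the unsplit remainder

        List<string> tokens = new List<string>();
        int iStart = 0;
        for (int i = 0; i < expression.Length; i++)
        {
            bool isDelimiter = i + delimiter.Length <= expression.Length &&
                String.CompareOrdinal(expression, i, delimiter, 0, delimiter.Length) == 0;
            if (!isDelimiter) continue;

            tokens.Add(expression.Substring(iStart, i - iStart));
            i += delimiter.Length - 1;   // the loop's i++ then steps past the delimiter
            iStart = i + 1;
            if (count >= 0 && tokens.Count == count) break;
        }

        if (iStart < expression.Length)
        {
            string last = expression.Substring(iStart);
            tokens.Add(last == delimiter ? null : last); // mirrors the article's null token
        }
        else if (tokens.Count == 0)
        {
            tokens.Add(expression); // only reached for an empty expression
        }
        return tokens.ToArray();
    }
}
```

On the demo string a(::)b(::)c()(::) (::)(::) with no limit, this yields "a", "b", "c()", " " and an empty string; with Count = 3 it yields "a", "b" and "c()(::) (::)(::)".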
Return Array Of strings
Finally, return the tokens to the user as an array of strings.
return (string[])Tokens.ToArray(typeof(string));
Disclaimer
This code is free for personal use. However, if you are going to use it for commercial purposes, you need to purchase a license.