Introduction
This short article discusses the right way to compare strings in a C# application. We will look at the various ways in which strings can be compared and which of them should or should not be used.
Background
Usually, when we want to compare two strings in an application, we use the equality operator. In most scenarios this works properly, but we should still know what other ways of comparing strings exist, since they can give better performance and more correct results. Let's say I have a variable str and I want to check whether its value is equal to "Yes" or not.
if(str == "Yes")
{
}
else
{
}
The == operator performs a case-sensitive comparison and does not consider the current culture. When a case-insensitive comparison is required, I have seen most developers take one of the two approaches below.
Either we do this:
if (str.ToLower() == "yes")
{
}
else
{
}
or we do something like:
if (str.ToUpper() == "YES")
{
}
else
{
}
This works in most cases and, because strings are immutable, it does not modify the original string. It does, however, involve an extra function call and the creation of an extra temporary string (the result of ToLower() or ToUpper()). It can also give wrong results when the code runs in a culture-sensitive application and str contains non-English characters, because ToLower() and ToUpper() apply the casing rules of the current culture.
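As a minimal sketch of the culture problem described above, the snippet below lower-cases a string under Turkish casing rules, where an uppercase I maps to a dotless ı rather than to i. The class and method names are hypothetical, and the sketch assumes the runtime has culture data for tr-TR available (results differ in invariant-globalization mode).

```csharp
using System;
using System.Globalization;

public static class ToLowerPitfall
{
    // Mimics `str.ToLower() == expected`, but with an explicit culture so the
    // outcome does not depend on the machine the code happens to run on.
    public static bool EqualsViaToLower(string str, string expected, CultureInfo culture)
    {
        return str.ToLower(culture) == expected;
    }

    public static void Main()
    {
        CultureInfo english = CultureInfo.GetCultureInfo("en-US");
        CultureInfo turkish = CultureInfo.GetCultureInfo("tr-TR");

        // Same input, same code, different culture.
        Console.WriteLine(EqualsViaToLower("TITLE", "title", english));
        Console.WriteLine(EqualsViaToLower("TITLE", "title", turkish));

        // An ordinal, case-insensitive comparison ignores the culture entirely.
        Console.WriteLine(String.Equals("TITLE", "title", StringComparison.OrdinalIgnoreCase));
    }
}
```

With full culture data, the Turkish call is expected to report a mismatch while the other two report a match, which is exactly the kind of surprise the ToLower()/ToUpper() approach can produce.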
So how do we compare strings in a way that avoids all these problems? The .NET Framework String class already takes care of these scenarios and provides functions that let us perform correct and efficient string comparisons. We will now look at these functions.
Note: We will talk about equality comparison, but all these points are equally valid for other comparisons, e.g., finding the order of strings.
Using the Code
The first thing to understand before jumping to the functions is the type of comparison I need. First, I might need a culture-sensitive comparison or a culture-insensitive (ordinal) comparison. Second, I might want a case-sensitive or a case-insensitive comparison.
Now let us look at what .NET provides. .NET provides three modes:
InvariantCulture
CurrentCulture
Ordinal
InvariantCulture
The InvariantCulture mode compares strings linguistically using a fixed set of rules that does not change with the user's settings. The invariant culture is associated with the English language but not with any particular country or region, so its results are close to, though not formally the same as, those of en-US. In this mode, characters are ordered with reference to an alphabet rather than by their numeric values. It can be visualized as using a reference string such as "aAbBcC..." to find the order of strings.
So in this mode, the strings "CAT" and "bat" will be ordered as: "bat", "CAT".
CurrentCulture
The second mode, CurrentCulture, orders strings alphabetically just as the invariant culture does, except that the ordering rules are those of the current thread's culture.
Also, in this mode accented characters are compared according to the linguistic rules of that culture rather than by their code points. For example, under German or English rules the German Ä sorts alongside A instead of after z. It is still a distinct character, so "Ä" and "A" do not compare as equal.
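The difference between linguistic and code-point ordering can be sketched as follows. The class and method names are hypothetical, and the snippet assumes culture data for de-DE is available at run time. It uses the escape \u00C4 for Ä so the source file encoding does not matter.

```csharp
using System;
using System.Globalization;

public static class CultureOrderDemo
{
    // Returns -1, 0 or 1 depending on whether a sorts before, with or after b
    // under the rules of the given culture.
    public static int OrderUnder(CultureInfo culture, string a, string b)
    {
        return Math.Sign(String.Compare(a, b, culture, CompareOptions.None));
    }

    // Same idea, but comparing raw UTF-16 code unit values.
    public static int OrderOrdinal(string a, string b)
    {
        return Math.Sign(String.Compare(a, b, StringComparison.Ordinal));
    }

    public static void Main()
    {
        CultureInfo german = CultureInfo.GetCultureInfo("de-DE");
        string aUmlaut = "\u00C4"; // the character Ä

        Console.WriteLine(OrderUnder(german, aUmlaut, "z"));  // linguistic order
        Console.WriteLine(OrderOrdinal(aUmlaut, "z"));        // code-point order
        Console.WriteLine(String.Equals(aUmlaut, "A", StringComparison.CurrentCulture));
    }
}
```

Under German rules Ä is expected to sort before z, whereas the ordinal comparison places it after z because U+00C4 is numerically larger than U+007A. The final line is expected to report that Ä and A are not equal.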
Ordinal
The third mode, Ordinal, simply compares the strings based on the numeric values of their characters. In other words, it uses the Unicode value of each character to find the order. It can be visualized as using a reference string in which all letters are ordered by their Unicode/ASCII values: "ABC...abc...". So in this mode, the strings "CAT" and "bat" will be ordered as: "CAT", "bat".
Now with this information at hand, let us see what .NET provides. The String.Equals and String.Compare functions have overloads that take a StringComparison enum value as an argument. This argument specifies the mode we want to use for the comparison.
public static bool Equals (string a, string b, StringComparison comparisonType);
This enum has the following values:
CurrentCulture
CurrentCultureIgnoreCase
InvariantCulture
InvariantCultureIgnoreCase
Ordinal
OrdinalIgnoreCase
Looking at each enum value, it is fairly self-explanatory which mode is for which scenario. Still, let us draw a small matrix for the same.
| | Case sensitive | Case insensitive |
| Culture sensitive | CurrentCulture | CurrentCultureIgnoreCase |
| Culture insensitive (invariant culture) | InvariantCulture | InvariantCultureIgnoreCase |
| Ordinal (Unicode value) | Ordinal | OrdinalIgnoreCase |
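To make the matrix concrete, here is a small sketch that runs one equality check and one ordering check under each of the six modes. The class and helper names are hypothetical, and the culture-sensitive rows assume a runtime with normal globalization support.

```csharp
using System;

public static class ModeMatrix
{
    // True when a equals b under the given comparison mode.
    public static bool AreEqual(string a, string b, StringComparison mode)
    {
        return String.Equals(a, b, mode);
    }

    // -1, 0 or 1 depending on whether a sorts before, with or after b.
    public static int Order(string a, string b, StringComparison mode)
    {
        return Math.Sign(String.Compare(a, b, mode));
    }

    public static void Main()
    {
        // Walk over every value of the StringComparison enum.
        foreach (StringComparison mode in Enum.GetValues(typeof(StringComparison)))
        {
            Console.WriteLine("{0,-28} equal: {1,-5} order: {2}",
                mode, AreEqual("yes", "Yes", mode), Order("CAT", "bat", mode));
        }
    }
}
```

The IgnoreCase modes are expected to treat "yes" and "Yes" as equal and the other three as different. Only plain Ordinal is expected to place "CAT" before "bat". OrdinalIgnoreCase compares as if both strings were uppercased, so it places "BAT" first.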
Now let us redo the comparisons we saw above using these modes.
Comparing the strings character by character in a case-sensitive manner:
if (String.Equals(str, "Yes", StringComparison.Ordinal))
{
}
else
{
}
Comparing the strings in a case-insensitive manner:
if (String.Equals(str, "Yes", StringComparison.OrdinalIgnoreCase))
{
}
else
{
}
These code snippets also give us the desired results, and because they avoid creating a temporary string, they can be somewhat more efficient than the earlier approaches.
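One further difference worth sketching is null handling. The static String.Equals accepts null for either argument, whereas str.ToLower() throws a NullReferenceException when str is null. The helper name IsYes below is hypothetical.

```csharp
using System;

public static class YesCheck
{
    // Case-insensitive check against "Yes" that allocates no temporary string
    // and is safe to call with null.
    public static bool IsYes(string str)
    {
        return String.Equals(str, "Yes", StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(IsYes("YES"));  // matches regardless of case
        Console.WriteLine(IsYes("no"));   // different text
        Console.WriteLine(IsYes(null));   // null is simply "not equal"
    }
}
```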
Note: The == operator is equivalent to StringComparison.Ordinal. So if this is the mode we need, we can simply use the == operator.
Now let us summarize and see which one should be used when:
CurrentCulture - Culture-specific, case-sensitive comparison
CurrentCultureIgnoreCase - Culture-specific, case-insensitive comparison
InvariantCulture - Invariant-culture (linguistic but culture-independent), case-sensitive comparison
InvariantCultureIgnoreCase - Invariant-culture, case-insensitive comparison
Ordinal - ASCII/Unicode value based, case-sensitive comparison
OrdinalIgnoreCase - ASCII/Unicode value based, case-insensitive comparison
A Note on StringComparer and StringComparison
A common point of confusion is that the StringComparer class can be used for the same kinds of string comparisons. This class also offers all six ways of comparing strings. The important thing to note is that this class also implements the comparison interfaces, i.e., IComparer, IEqualityComparer, IComparer<String> and IEqualityComparer<String>.
The StringComparison that we have discussed so far in this article is an enum that we should use while comparing two strings. So when should we set this approach aside and go for the StringComparer class?
The rule of thumb is that if only a string comparison is needed, we should use the String class's methods, such as String.Equals, which use the StringComparison enum to determine which mode is used for the actual comparison. We should use the StringComparer class only when a method takes one of IComparer, IEqualityComparer or IComparer<String> as a parameter and we need to pass in a comparer for our strings.
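As an illustration of that rule of thumb, the sketch below passes StringComparer instances to two framework APIs that expect comparer interfaces: the Dictionary constructor (IEqualityComparer<string>) and Array.Sort (IComparer<string>). The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ComparerDemo
{
    // Counts words while treating "Yes", "YES" and "yes" as the same key.
    public static Dictionary<string, int> CountIgnoringCase(IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string word in words)
        {
            int seen;
            counts.TryGetValue(word, out seen); // seen stays 0 for a new key
            counts[word] = seen + 1;
        }
        return counts;
    }

    // Returns a copy of the array sorted by Unicode value.
    public static string[] SortOrdinal(string[] words)
    {
        string[] copy = (string[])words.Clone();
        Array.Sort(copy, StringComparer.Ordinal);
        return copy;
    }

    public static void Main()
    {
        Dictionary<string, int> counts = CountIgnoringCase(new[] { "Yes", "YES", "no" });
        Console.WriteLine(counts["yes"]);

        Console.WriteLine(String.Join(", ", SortOrdinal(new[] { "bat", "CAT" })));
    }
}
```

Neither API accepts a StringComparison value directly, which is why StringComparer is the natural fit in these places.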
Internally, the String class's methods may well share their comparison logic with the StringComparer class, but from a developer's perspective, following the above guideline should suffice.
Point of Interest
This small article is written for developers who are at the start of their careers and are manipulating strings in various ways just to achieve the desired comparison results. We have discussed only the equality operation, but ordering comparisons follow the same rules.
History
- 24th August, 2012: First version