Introduction
One of the problems I encountered a while ago was that when searching text, "A" will not match "À", "Á", "Ä", "Â" or indeed any other character that includes a diacritic (the printer's term for the little accent marks that sit above or below some characters in many languages). This means that the same text entered by a native German speaker may not match the text entered by a native English speaker. This can be a pain, and it limits the usefulness of the search.
I was reminded of this when I answered a QA question on writing a regex to cope with international names...
Background
I did not write this code; it is taken (as described in the code comments) from Michael Kaplan's blog. All I did was respace it and convert it to an extension method. However, I felt it needed a wider audience than it was getting, and should be somewhere it can be found more easily.
I am not going to try to describe how it works, as the original blog does that in more detail than I'd want to go into! (And probably with a lot more accuracy...)
Using the Code
Include the code in a static class of your own, or download the source and add it to your project.
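    // Requires: using System; using System.Globalization; using System.Text;
    public static String RemoveDiacritics(this String s)
    {
        // Decompose each accented character into its base character plus combining marks.
        String normalizedString = s.Normalize(NormalizationForm.FormD);
        StringBuilder stringBuilder = new StringBuilder();
        for (int i = 0; i < normalizedString.Length; i++)
        {
            Char c = normalizedString[i];
            // Keep everything except the combining (non-spacing) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                stringBuilder.Append(c);
            }
        }
        // Recompose whatever is left into the usual precomposed form.
        return stringBuilder.ToString().Normalize(NormalizationForm.FormC);
    }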
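For anyone curious about what the FormD step actually produces, here is a minimal sketch that lists the code units of a decomposed string together with their Unicode categories. The class name DecompositionDemo and the method Describe are my own, made up for illustration; they are not part of the original code.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper, for illustration only.
public static class DecompositionDemo
{
    // Normalizes s to FormD and returns each resulting char
    // as "U+XXXX(Category)", separated by single spaces.
    public static string Describe(string s)
    {
        string decomposed = s.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder();
        foreach (char c in decomposed)
        {
            if (sb.Length > 0)
            {
                sb.Append(' ');
            }
            sb.AppendFormat(CultureInfo.InvariantCulture, "U+{0:X4}({1})",
                (int)c, CharUnicodeInfo.GetUnicodeCategory(c));
        }
        return sb.ToString();
    }
}
```

Calling DecompositionDemo.Describe("\u00C0") (that is, "À") should give "U+0041(UppercaseLetter) U+0300(NonSpacingMark)": the plain "A" followed by a combining grave accent, which is exactly the kind of character the loop in RemoveDiacritics skips.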
You can then use the method as you would any string extension method:
    string match = tbUserInput.Text.ToLower().RemoveDiacritics();
    if (string.IsNullOrWhiteSpace(match))
    {
        ...
    }
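To tie this back to the search problem in the introduction, here is a rough sketch of an accent-insensitive "contains" check built on the same method. ContainsIgnoringDiacritics and the DiacriticSearch class are names I have invented for the example, and RemoveDiacritics is repeated only so that the block compiles on its own.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticSearch
{
    // Same technique as the method shown earlier, repeated to keep this sketch self-contained.
    public static string RemoveDiacritics(this string s)
    {
        string normalized = s.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder();
        foreach (char c in normalized)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Hypothetical helper: true if text contains term once diacritics
    // and letter case are ignored on both sides.
    public static bool ContainsIgnoringDiacritics(this string text, string term)
    {
        string haystack = text.RemoveDiacritics();
        string needle = term.RemoveDiacritics();
        return haystack.IndexOf(needle, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

With this in place, "Müller".ContainsIgnoringDiacritics("muller") is true while "Müller".ContainsIgnoringDiacritics("miller") is false. Bear in mind that this only strips combining marks: characters such as "ø" or "ß" have no base-plus-accent decomposition, so they pass through unchanged and will still not match "o" or "ss".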
History