This contribution comes from this forum question and my inefficient answer to it.
So we want to remove some diacritical marks from a Unicode string, for instance change occurrences of àáảãạăằắẳẵặâầấẩẫậ to a plain a, with the help of C++0x as implemented in VC2010.
For that, let's define a C array of const wchar_t* strings, each with the first character being the replacement character and the following ones being the characters to replace:
// Each string: replacement character first, then the characters it replaces.
const wchar_t* pchangers[] =
{
    L"aàáảãạăằắẳẵặâầấẩẫậ",
    L"AÀÁẢÃẠĂẰẮẲẴẶÂẦẤẨẪẬ",
    L"OÒÓỎÕỌÔỒỐỔỖỘƠỜỚỞỠỢ",
    L"EÈÉẺẼẸÊỀẾỂỄỆ",
    L"UÙÚỦŨỤƯỪỨỬỮỰ",
    L"IÌÍỈĨỊ",
    L"YỲÝỶỸỴ",
    L"DĐ",
    L"oòóỏõọôồốổỗộơờớởỡợ",
    L"eèéẻẽẹêềếểễệ",
    L"uùúủũụưừứửữự",
    L"iìíỉĩị",
    L"yỳýỷỹỵ",
    L"dđ"
};
The following CharMap class is constructed from a std::vector<std::wstring> of such strings and uses it to populate its std::map<wchar_t, wchar_t> charmap member, with the keys being the characters after the first and the values being the first character:
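#include <algorithm>
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>
class CharMap
{
    // Maps each character to replace to its replacement character.
    std::map<wchar_t, wchar_t> charmap;
public:
    // Changers is any container of std::wstring, such as std::vector<std::wstring>.
    template <typename Changers>
    explicit CharMap(const Changers& changers)
    {
        std::for_each(changers.begin(), changers.end(), [&](const std::wstring& changer) {
            if (changer.empty()) return; // nothing to map, and begin() + 1 would be invalid
            const wchar_t replacement = changer[0];
            // Insert one (character to replace, replacement) pair per remaining character.
            std::transform(changer.begin() + 1, changer.end(),
                           std::inserter(charmap, charmap.end()),
                           [replacement](wchar_t wc) { return std::make_pair(wc, replacement); });
        });
    }
    // Returns a copy of in with every mapped character replaced.
    std::wstring operator()(const std::wstring& in) const
    {
        std::wstring out(in.length(), L'\0');
        std::transform(in.begin(), in.end(), out.begin(), [&](wchar_t wc) -> wchar_t {
            auto it = charmap.find(wc);
            return it == charmap.end() ? wc : it->second; // unmapped characters pass through
        });
        return out;
    }
};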
CharMap::operator() constructs a std::wstring out from its argument in, replacing each character of in that has an entry in charmap with its replacement character and copying all other characters unchanged, then returns out.
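To make the two steps explicit, here is a minimal sketch of the same idea written with plain loops instead of std::for_each and std::transform. It assumes a C++11 compiler with range-based for (which VC2010 does not have). The names build_charmap, remove_marks and demo_remove_marks are hypothetical and used only for illustration, and the table is shortened to two entries written with \u escapes to sidestep source-encoding problems.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: builds the lookup table from strings whose first
// character is the replacement and whose remaining characters get replaced.
std::map<wchar_t, wchar_t> build_charmap(const std::vector<std::wstring>& changers)
{
    std::map<wchar_t, wchar_t> charmap;
    for (const std::wstring& changer : changers) {
        // Starting at index 1 also skips empty strings safely.
        for (std::size_t i = 1; i < changer.size(); ++i) {
            charmap.insert(std::make_pair(changer[i], changer[0]));
        }
    }
    return charmap;
}

// Hypothetical helper: copies in, substituting every mapped character.
std::wstring remove_marks(const std::map<wchar_t, wchar_t>& charmap, const std::wstring& in)
{
    std::wstring out;
    out.reserve(in.size());
    for (wchar_t wc : in) {
        auto it = charmap.find(wc);
        out += (it == charmap.end()) ? wc : it->second;
    }
    return out;
}

// Usage sketch with a shortened, two-entry table.
void demo_remove_marks()
{
    std::vector<std::wstring> changers;
    changers.push_back(L"a\u00E0\u00E1\u00E2"); // a, then a-grave, a-acute, a-circumflex
    changers.push_back(L"u\u00F9\u00FA\u01B0"); // u, then u-grave, u-acute, u-horn
    const std::map<wchar_t, wchar_t> charmap = build_charmap(changers);
    // a-grave and u-horn are replaced; b and c pass through unchanged.
    assert(remove_marks(charmap, L"\u00E0b\u01B0c") == L"abuc");
}
```

The lookup table is the only state, so the replacement pass is a single std::map::find per input character.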
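Now let's put it to work:
#include <iostream>
// Build the vector from the C array; sizeof pchangers / sizeof pchangers[0] is the element count.
std::vector<std::wstring> changers(pchangers, pchangers + sizeof pchangers / sizeof pchangers[0]);
int main()
{
    // Expected output: " nguoi minh.mp3 "
    std::wcout << CharMap(changers)(L" người mình.mp3 ") << std::endl;
    return 0;
}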
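One limitation worth keeping in mind: the table lists precomposed characters only. If the input arrives in decomposed form, where for example à is stored as a plain a followed by the combining grave accent U+0300, no key matches and the mark survives. Below is a minimal sketch of one possible complement, assuming it is acceptable to drop every code point in the Combining Diacritical Marks block (U+0300 to U+036F). The names is_combining_mark, strip_combining_marks and demo_strip_combining_marks are hypothetical and not part of the original code.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: true for code points in the Unicode block
// "Combining Diacritical Marks" (U+0300..U+036F).
inline bool is_combining_mark(wchar_t wc)
{
    return wc >= 0x0300 && wc <= 0x036F;
}

// Hypothetical helper: returns a copy of in without combining marks.
inline std::wstring strip_combining_marks(std::wstring in)
{
    in.erase(std::remove_if(in.begin(), in.end(), is_combining_mark), in.end());
    return in;
}

// Usage sketch: "a" + U+0300 (combining grave accent) + "b" becomes "ab".
inline void demo_strip_combining_marks()
{
    assert(strip_combining_marks(L"a\u0300b") == L"ab");
}
```

This would complement the table rather than replace it: đ, for instance, has no decomposition into d plus a combining mark, so it still needs its explicit entry.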
Quite a demonstration of the power of C++0x, isn't it?
If you have pasting problems with Unicode strings, download the full code CharMap.zip (1 KB).
cheers,
AR