|
OK, following on from the char -> wchar_t question earlier, does anyone know of a neat STL-friendly way to perform UTF8 encoding/decoding? Currently I use the MS character encoding macros, e.g.:
wstring str = CA2W(utf8_string, CP_UTF8).m_psz;
But I'd like to use something that is a bit more platform independent!
The Rob Blog Google Talk: robert.caldecott
|
|
|
|
|
AFAIK STL doesn't have UTF-8 support because the elements in a basic_string all have to be the same size, and the UTF-8 encodings of characters have varying sizes.
--Mike--
Visual C++ MVP
|
|
|
|
|
Never mind. You can store a UTF-8 string as a sequence of bytes in std::string; you just need to know which functions operate correctly on such a sequence and which do not.
As for the conversion, I have been writing an article on platform-independent, STL-friendly UTF-8 string operations for months, but I just can't make myself finish it.
My programming blahblahblah blog. If you ever find anything useful here, please let me know to remove it.
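To illustrate the point about UTF-8 bytes in std::string, here is a minimal sketch, assuming a C++11 compiler (for std::u32string). The names utf8_code_points and utf8_decode are made up for this example and are not from the article mentioned above. It shows that size() counts bytes rather than characters, and that decoding can be done by hand without any platform API. Overlong forms and surrogate code points are not rejected here, so treat it as a starting point only.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Count code points by counting every byte that is not a continuation
// byte (continuation bytes have the bit pattern 10xxxxxx).
std::size_t utf8_code_points(const std::string& s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

// Decode UTF-8 into 32-bit code points. Throws on a bad lead byte, a bad
// continuation byte, or a truncated sequence.
std::u32string utf8_decode(const std::string& s)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra > s.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Functions such as size(), substr() and find() keep working on the byte sequence, but they count and cut in bytes, so an index has to fall on a lead byte to avoid splitting a character.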
|
|
|
|
|
Nemanja Trifunovic wrote: As for the conversion, I have been writing an article on platform-independent STL friendly utf-8 string operations for months, but I just can't make myself finish it
Man, that would be sweet...
|
|
|
|
|
In the Microsoft STL implementation, localization is built on the C interface: the locale class calls the setlocale function, and that cannot work with UTF-8.
From the Remarks section on the setlocale function reference:
"The set of available languages, country/region codes, and code pages includes all those supported by the Win32 NLS API (except code pages that require more than two bytes per character, like UTF-8)."
|
|
|
|
|
Yeah once again about it!
I know there is a way to do it using the CRT function mbstowcs. The first thing I do not like about it is that I have a single-byte character string (in the cp1251 encoding, for example). The second is that it is not STL.
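For reference, a minimal sketch of the mbstowcs route, wrapped so that it at least returns a std::wstring. The name widen_crt is made up for this example. The conversion follows whatever LC_CTYPE locale is current, so for cp1251 input the caller would first have to select a matching locale with setlocale, and the locale name to pass is platform-specific (an assumption you need to check for your CRT).

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Convert a multibyte string to wide characters using the current C locale
// (the LC_CTYPE category selected through setlocale).
std::wstring widen_crt(const std::string& s)
{
    // Each multibyte character produces at most one wchar_t, so
    // s.size() + 1 elements always leave room for the terminator.
    std::vector<wchar_t> buf(s.size() + 1);
    std::size_t n = std::mbstowcs(&buf[0], s.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence for this locale");
    return std::wstring(&buf[0], n);
}
```

Note that mbstowcs stops at the first null byte, so a std::string with embedded nulls is cut short.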
|
|
|
|
|
alabax wrote: Yeah once again about it!
huu ???
how many CP accounts do you have ?
|
|
|
|
|
that's the only one
|
|
|
|
|
yeah, that's what i see, so, why did you say "once again about it" ???
|
|
|
|
|
I mean questions like this appear every month on the newsgroups and message boards. =)
|
|
|
|
|
|
like this:
wstring s2w(const string &s)
{
    size_t l=s.size()+1;
    const char *pc=s.c_str();
    wchar_t *pw=new wchar_t[l];
    locale loc("");
    use_facet<ctype<wchar_t> >(loc).widen(pc,pc+l-1,pw);
    return wstring(pw);
}
-- modified at 14:03 Wednesday 12th April, 2006
|
|
|
|
|
Looks about right - proof of the pudding's in the testing, of course!
|
|
|
|
|
OMG! It's such a shame I did new without delete
GC is deep in my mind!
wstring s2w(const string &s)
{
    size_t l=s.size()+1;
    const char *pc=s.c_str();
    // new[] throws std::bad_alloc on failure, so no null check is needed
    wchar_t *pw=new wchar_t[l];
    locale loc("");
    use_facet<ctype<wchar_t> >(loc).widen(pc,pc+l-1,pw);
    pw[l-1]=L'\0'; // widen does not write a terminator
    wstring ws(pw);
    delete [] pw;
    return ws;
}
|
|
|
|
|
You could always use a vector instead.
vector<wchar_t> pw(l);
...
use_facet<ctype<wchar_t> >(loc).widen(pc,pc+l-1,&pw[0]);
wstring ws(&pw[0]);
|
|
|
|
|
|
You're not freeing the memory used by pw, unless I've missed something?
|
|
|
|
|
locale global_locale;
wstring s2w(const string &s,const locale &loc=global_locale)
{
    size_t l=s.length()+1;
    const char *pc=s.c_str();
    // new[] throws std::bad_alloc on failure, so no null check is needed
    wchar_t *pw=new wchar_t[l];
    use_facet<ctype<wchar_t> >(loc).widen(pc,pc+l-1,pw);
    pw[l-1]=L'\0';
    wstring ws(pw);
    delete [] pw;
    return ws;
}
|
|
|
|
|
Not exception safe
As others suggested, use std::vector or boost::scoped_array instead of new[]-delete[]
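One way to get the exception safety asked for here, as a sketch only: let the result string own the buffer, so there is nothing to delete on any path. The name widen_string is made up for this example, and writing through &ws[0] assumes contiguous string storage, which is guaranteed from C++11 on. Like the versions in this thread, it widens one char at a time with the locale's ctype facet, so it does not decode UTF-8 or any other multibyte encoding.

```cpp
#include <locale>
#include <string>

// Widen each char with the ctype<wchar_t> facet of the given locale.
// The default argument is a copy of the current global locale.
std::wstring widen_string(const std::string& s,
                          const std::locale& loc = std::locale())
{
    std::wstring ws(s.size(), L'\0');  // owns its buffer, freed automatically
    if (!s.empty()) {
        std::use_facet<std::ctype<wchar_t> >(loc)
            .widen(s.data(), s.data() + s.size(), &ws[0]);
    }
    return ws;
}
```

Because the length comes from s.size() rather than from a terminator, embedded null characters survive and no trailing L'\0' has to be written by hand.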
|
|
|
|
|
OK, here comes a cute solution:
wstring s2w(const string &s,const locale &loc=global_locale)
{
    size_t l=s.length();
    vector<wchar_t> pw(l+1);
    use_facet<ctype<wchar_t> >(loc).widen(s.data(),s.data()+l,&pw[0]);
    wstring ws(&pw[0]);
    return ws;
}
widen does not append trailing L'\0'!
The question is: does vector always initialize values to 0?
|
|
|
|
|
alabax wrote: The question is: does vector always initialize values to 0?
Yes. With this vector ctor, elements of a simple type are value-initialized to 0:
explicit vector(size_type count);
For classes, the default ctor is run instead. To provide an explicit initial value, use this vector ctor:
vector(size_type count, const T& value);
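A small check of that behaviour, as a sketch: the helper names fresh_vector_is_zeroed and from_buffer are made up for this example. The second function shows why it matters for the conversion code in this thread, where the extra element is never written and has to act as the terminator.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns true when every element of a freshly sized vector is zero.
bool fresh_vector_is_zeroed(std::size_t n)
{
    std::vector<wchar_t> v(n);  // elements are value-initialized
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] != L'\0')
            return false;
    return true;
}

// Only the first text.size() elements are overwritten; the extra element
// keeps its initial L'\0', so the buffer is a valid C string.
std::wstring from_buffer(const std::wstring& text)
{
    std::vector<wchar_t> buf(text.size() + 1);
    for (std::size_t i = 0; i < text.size(); ++i)
        buf[i] = text[i];
    return std::wstring(&buf[0]);
}
```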
--
The Blog: Bits and Pieces
|
|
|
|
|
hi all,
How do I access messages from MSMQ events when a message arrives from the source at the destination, in the OnArrived() function?
thanks,
uday.
uday kiran
|
|
|
|
|
Since you are not having any luck with this question, you might try the microsoft.public.msmq.programming newsgroup.
"Let us be thankful for the fools. But for them the rest of us could not succeed." - Mark Twain
"There is no death, only a change of worlds." - Native American Proverb
|
|
|
|
|
hi all,
I have two interfaces. The first is named FirstInterface and has one method, one(). Suppose I make another interface named SecondInterface; how can I use the FirstInterface methods in it? That is, I have to use the one() method of FirstInterface in SecondInterface.
Please let me know if there is any solution other than containment/aggregation.
thanks
uday.
uday kiran
|
|
|
|
|
It depends on how you've implemented your COM object. If you implement the interfaces using multiple inheritance, as ATL does, it's just a matter of calling the method as you would any other member function. If you use the nested class approach, as MFC does, it's a little more complicated (but not much). Another approach would be to do a QueryInterface, call the method, then do a Release, but this is a bit heavy-handed. More detail is needed to properly answer this question.
Steve
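The multiple inheritance case might look roughly like the sketch below. It uses plain C++ abstract classes as stand-ins, not real COM: actual COM interfaces would derive from IUnknown and return HRESULT. The class name Widget and the method two() are made up for this example.

```cpp
// Plain C++ stand-ins for the two interfaces from the question.
struct FirstInterface {
    virtual ~FirstInterface() {}
    virtual int one() = 0;
};

struct SecondInterface {
    virtual ~SecondInterface() {}
    virtual int two() = 0;
};

// One object implements both interfaces through multiple inheritance.
class Widget : public FirstInterface, public SecondInterface {
public:
    virtual int one() { return 1; }
    // The SecondInterface method reuses one() with an ordinary member call.
    virtual int two() { return one() + 1; }
};
```

Since both implementations live in the same class, no QueryInterface round trip is needed for the object to reach its own method.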
|
|
|
|