Re: get wide character and multibyte character value - C / C++ / MFC Discussion Boards

Cedric Moonen, 23-Jan-08 22:16

Yes, that's right. I didn't think about that (sometimes you are just too focused on one part of your problem and fail to see other paths) ;-P

One little annoying thing is that when I change the year or month, I'll need to "set back" the day, hour, minute and second parts. But that's a small price to pay for a working solution :-D

Thanks !

Cédric Moonen
Software developer

Charting control [v1.2]

Re: Manipulating COleDateTime objects

CPallini, 23-Jan-08 22:35

You know, you're welcome. :)

If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler.
-- Alfonso the Wise, 13th Century King of Castile.

[my articles]

Re: Manipulating COleDateTime objects

David Crow, 24-Jan-08 3:12

Cedric Moonen wrote:
Suppose my date is 31 November 2007 and I would like to add two months to it..

The problem is that a month does not have a constant value.

Also, suppose you wanted to add three months to the date. Since February does not have 31 days, do you stop at February 28, or go 3 days into March? If the latter, it looks odd since March would then be four months away.

"Normal is getting dressed in clothes that you buy for work and driving through traffic in a car that you are still paying for, in order to get to the job you need to pay for the clothes and the car and the house you leave vacant all day so you can afford to live in it." - Ellen Goodman

"To have a respect for ourselves guides our morals; to have deference for others governs our manners." - Laurence Sterne

Re: Manipulating COleDateTime objects

Cedric Moonen, 24-Jan-08 3:29

DavidCrow wrote:
The problem is that a month does not have a constant value.

Yes, I know, and that's what complicates my problem :)

DavidCrow wrote:
Also, suppose you wanted to add three months to the date. Since February does not have 31 days, do you stop at February 28, or go 3 days into March? If the latter, it looks odd since March would then be four months away.

Well, I see now that my example was very badly chosen. In fact, when I increment a date by a certain amount of months, it will always start at the 1st of the month (so, November 1st + 3 months = February 1st).

But CPallini's answer is the way to go I suppose (didn't implement it yet).


Re: Manipulating COleDateTime objects

David Crow, 24-Jan-08 5:24

Cedric Moonen wrote:
But CPallini's answer is the way to go I suppose (didn't implement it yet).

It looks to be a viable solution:

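// Adds nMonthsToAdd months to dt. If the target month is too short for
// dt's day (e.g. 31 January + 1 month), the day is stepped back by up to
// three days until the date is valid.
COleDateTime AddMonths( const COleDateTime& dt, const int nMonthsToAdd )
{
    COleDateTime    dateTemp;
    int             nMonth = (dt.GetMonth() - 1) + nMonthsToAdd,   // zero-based
                    nYear  = dt.GetYear() + (nMonth / 12),
                    x = 0;

    nMonth %= 12;
    if (nMonth < 0)     // % truncates toward zero, so fix up negative offsets
    {
        nMonth += 12;
        nYear--;
    }

    dateTemp.SetStatus(COleDateTime::invalid);

    // Try day, day-1, day-2, day-3 (31 -> 28 covers a non-leap February).
    while (dateTemp.GetStatus() == COleDateTime::invalid && x < 4)
    {
        dateTemp.SetDateTime(nYear,
                             nMonth + 1,
                             dt.GetDay() - x,
                             dt.GetHour(),
                             dt.GetMinute(),
                             dt.GetSecond());

        x++;
    }

    return dateTemp;
}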
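
For anyone who wants to try the month arithmetic without MFC, here is a minimal portable sketch of the same idea. It is not the code above: `SimpleDate`, `DaysInMonth` and `AddMonthsClamped` are hypothetical names made up for illustration, and instead of retrying with a smaller day it clamps the day to the length of the target month using a lookup table.

```cpp
#include <cassert>

// Hypothetical plain date type standing in for COleDateTime (date part only).
struct SimpleDate {
    int year;
    int month; // 1..12
    int day;   // 1..31
};

bool IsLeapYear(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

int DaysInMonth(int year, int month)
{
    assert(month >= 1 && month <= 12);
    static const int kDays[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    return (month == 2 && IsLeapYear(year)) ? 29 : kDays[month - 1];
}

// Adds monthsToAdd (which may be negative) to d and clamps the day to the
// last valid day of the target month.
SimpleDate AddMonthsClamped(const SimpleDate& d, int monthsToAdd)
{
    int total = d.year * 12 + (d.month - 1) + monthsToAdd; // months since year 0
    int year  = total / 12;
    int month = total % 12;          // zero-based month
    if (month < 0) {                 // % truncates toward zero in C++
        month += 12;
        --year;
    }

    SimpleDate result;
    result.year  = year;
    result.month = month + 1;
    int last     = DaysInMonth(result.year, result.month);
    result.day   = (d.day < last) ? d.day : last;
    return result;
}
```

With this policy, 1 November 2007 plus three months gives 1 February 2008, and 31 January 2007 plus one month stops at 28 February 2007 rather than running into March.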

get wide character and multibyte character value

George_George, 23-Jan-08 20:10

Hello everyone,

I need to know the wide character (Unicode) and multibyte (UTF-8) values of a Czech character string. I personally know nothing about Czech. Is the following approach correct?

1. I put the L prefix on the string literal and watch memory to see the wide character representation of the string in little-endian form;

2. I change the computer region/language to Czech, then call WideCharToMultiByte with CP_ACP as the code page and the L string as input, and read the multibyte result from the lpMultiByteStr output parameter.

Are (1) and (2) correct? Are there more efficient or smarter ways?

thanks in advance,
George

Re: get wide character and multibyte character value

Nitheesh George, 23-Jan-08 22:49

Try the wcstombs function.

I think this will help you.

Re: get wide character and multibyte character value

George_George, 23-Jan-08 23:28

Hi Nitheesh,

Does this function relate to my question? I think I am using almost the same approach. Could you describe the steps to get the wide character and multibyte values for a given Czech character string, please? :)

regards,
George

Re: get wide character and multibyte character value

CPallini, 23-Jan-08 23:34

Who gives you a Czech string? :-D


Re: get wide character and multibyte character value

George_George, 24-Jan-08 1:31

Hi CPallini,

I am parsing some multi-language information. Could we come back to the original question, please? :)

Any ideas?

regards,
George

Re: get wide character and multibyte character value

CPallini, 24-Jan-08 1:47

Well, IMHO if you have the string then you already have either the Unicode or the multibyte format of it, hence you need only one conversion.
BTW, what's difficult about examining a character of a string?


Re: get wide character and multibyte character value

George_George, 24-Jan-08 2:00

Thanks CPallini,

Sorry that I may not have made myself clear enough. I only have the literal text. The strings are something like MÍST, and I want to get the wide character binary value and the multibyte binary value:

1. To get the wide character binary value, I use L"MÍST" and use debug mode to watch its internal buffer in Visual Studio.

2. To get the multibyte (UTF-8) binary value, I use WideCharToMultiByte API to convert L"MÍST" to multibyte value;

Are (1) and (2) the correct solution?

regards,
George

Re: get wide character and multibyte character value

CPallini, 24-Jan-08 2:19

George_George wrote:
1. To get the wide character binary value, I use L"MÍST" and use debug mode to watch its internal buffer in Visual Studio.

Fine. Since wide chars are unsigned shorts, I'm sure you're also able to figure out how to programmatically find the encoded values.
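
As a rough sketch of doing that programmatically, the snippet below prints each 16-bit code unit of a string in hex. It uses `std::u16string` and a `\u00CD` escape rather than `L"MÍST"` so that it does not depend on the source file encoding or on the size of `wchar_t` (16 bits on Windows, but not everywhere); `CodeUnitsHex` is a hypothetical helper name.

```cpp
#include <cstdio>
#include <string>

// Returns the UTF-16 code units of s as space-separated 4-digit hex values.
std::string CodeUnitsHex(const std::u16string& s)
{
    std::string out;
    char buf[8];
    for (char16_t unit : s) {
        std::snprintf(buf, sizeof buf, "%04X", static_cast<unsigned>(unit));
        if (!out.empty())
            out += ' ';
        out += buf;
    }
    return out;
}
```

For `u"M\u00CDST"` this yields `004D 00CD 0053 0054`. In a little-endian memory view, such as the debugger's memory window on x86, the same units show up byte-swapped as `4D 00 CD 00 53 00 54 00`.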

George_George wrote:
To get the multibyte (UTF-8) binary value, I use WideCharToMultiByte API to convert L"MÍST" to multibyte value;

When you use WideCharToMultiByte you must be aware that (of course) not all wide string characters can be mapped to the codepage you're specifying (I'm not an expert, but I think CP_UTF8 gives you have unmapped chars than Czech codepage (1250?)) :)


Re: get wide character and multibyte character value

George_George, 24-Jan-08 14:19

Thanks CPallini,

1.

CPallini wrote:
Fine. Since wide chars are unsigned shorts, I'm sure you're also able to figure out how to programmatically find the encoded values.

I just use the unsigned short value itself (in hex format). Do you mean we need an additional conversion?

2.

CPallini wrote:
but I think CP_UTF8 gives you have unmapped chars than Czech codepage (1250?))

Good point! I missed it. Do you mean using 1250 as the code page value is better?

How do you find the magic number 1250? Is it in some Windows header (.h) file?

regards,
George

Re: get wide character and multibyte character value

CPallini, 24-Jan-08 21:03

George_George wrote:
How do you find the magic number 1250? Is it in some Windows header (.h) file?

Here, for instance: http://www.microsoft.com/globaldev/reference/WinCP.mspx :)


Re: get wide character and multibyte character value

George_George, 24-Jan-08 21:23

Thanks CPallini,

Good link!

When using WideCharToMultiByte, or using the debugger to see the hex value of an L string, there is no need to change the language setting in the control panel, right?

regards,
George

Re: get wide character and multibyte character value

CPallini, 24-Jan-08 21:27

As I already told you, I'm not an expert. Anyway, I think you need to change the codepage in the control panel only to render text (using multibyte encoding), hence no need for you. :)


Re: get wide character and multibyte character value

George_George, 24-Jan-08 21:47

Hi CPallini,

1.

CPallini wrote:
hence no need for you.

Why no need for me? If I need to change language settings to get the correct value, I have to and should change. :)

2.

http://www.microsoft.com/globaldev/reference/WinCP.mspx

For the page you recommended before, there are two terms, SBCS and DBCS. When we map our term multibyte and wide character to them, is multibyte the same as SBCS and wide character the same as DBCS?

regards,
George

Re: get wide character and multibyte character value

CPallini, 24-Jan-08 21:54

George_George wrote:
Why no need for me? If I need to change language settings to get the correct value, I have to and should change.

I don't think so. I think the WideCharToMultiByte function keeps doing its honest work regardless of your control panel settings (provided you pass it the correct codepage as argument). Hence you can check the WideCharToMultiByte result to find out the character codes; however, you cannot correctly render the latter without setting the proper codepage in your control panel.

George_George wrote:
For the page you recommended before, there are two terms, SBCS and DBCS. When we map our term multibyte and wide character to them, is multibyte the same as SBCS and wide character the same as DBCS?

No. Wide character encoding corresponds to UNICODE, while DBCS (and SBCS) correspond to MultiByte.
(Maybe I'm wrong, though. I should tell you again, I'm not an expert on this; common sense is what's guiding me here.)

:)


Re: get wide character and multibyte character value

George_George, 24-Jan-08 22:27

Thanks CPallini!

My question is answered. Thanks for your patience again! :)

regards,
George

Re: get wide character and multibyte character value

Nemanja Trifunovic, 24-Jan-08 15:46

CPallini wrote:
When you use WideCharToMultiByte you must be aware that (of course) not all wide string characters can be mapped to the codepage you're specifying (I'm not an expert, but I think CP_UTF8 gives you have unmapped chars than Czech codepage (1250?))

There is a 1-1 mapping between UTF-16 (wide chars on Windows) and UTF-8 (both are simply different encoding forms for the Unicode character set). On the other hand, there is no 1-1 mapping between Unicode and CP1250.

Conclusion? If you need UTF-8, convert directly from wide char to UTF-8, and don't play with legacy multibyte encodings such as CP1250.
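
To make the direct conversion concrete, here is a small hand-rolled UTF-16 to UTF-8 encoder. This is only an illustrative sketch: on Windows you would normally call WideCharToMultiByte with CP_UTF8 instead, and `Utf16ToUtf8` is a hypothetical name. Substituting U+FFFD for an unpaired surrogate is my assumption about how to handle malformed input, not something taken from the discussion.

```cpp
#include <cstddef>
#include <string>

// Encodes a UTF-16 string as UTF-8. Every valid code point has a UTF-8
// form, so nothing is lost for well-formed input.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Valid surrogate pair: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; // unpaired surrogate (assumed policy: substitute)
        }

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For MÍST (`u"M\u00CDST"`) the result is the five bytes `4D C3 8D 53 54`: only Í (U+00CD) needs two bytes, the ASCII letters stay as they are.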

Programming Blog

utf8-cpp

Re: get wide character and multibyte character value

George_George, 24-Jan-08 16:36

Thanks Nemanja,

Your reply is clear. For the same wide character string in Czech, I am not sure whether converting to the Czech codepage (1250) or CP_UTF8 will lose information, i.e. result in ? (0x3F) characters. Any experiences?

regards,
George

Re: get wide character and multibyte character value

Nemanja Trifunovic, 25-Jan-08 3:15

George_George wrote:
For the same wide character string in Czech, I am not sure whether converting to the Czech codepage (1250) or CP_UTF8 will lose information, i.e. result in ? (0x3F) characters.

If you know for sure the wide string contains only Czech (or other Central European) characters, you can convert it to either CP1250 or UTF-8 without any loss. In any case, you don't need to mess with the system language settings - just use the correct codepage parameter in WideCharToMultiByte.

If you don't know for sure the wide string contains only Central European characters, you can still safely convert to UTF-8, but not to CP1250.


Re: get wide character and multibyte character value

George_George, 25-Jan-08 3:19

Great Nemanja!

But why not safe with CP1250?

Nemanja Trifunovic wrote:
If you don't know for sure the wide string contains only Central European characters, you can still safely convert to UTF-8, but not to CP1250.

regards,
George

Re: get wide character and multibyte character value

Nemanja Trifunovic, 25-Jan-08 3:26

George_George wrote:
But why not safe with CP1250?

If a wide string contains a non-Central European character, say U+03A8 (Greek capital Psi), it has no representation in CP1250 and will appear as a replacement character (probably a question mark) after the conversion.
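
The effect can be imitated with a toy converter. The sketch below is not the real Windows-1250 table: it knows only ASCII plus a handful of entries (Á, É, Í, Č, č) picked for illustration, so real CP1250 characters outside that subset would wrongly be reported as lossy. `LookupCp1250Subset` and `ToCp1250Sketch` are hypothetical names, and the `usedDefaultChar` flag is loosely modelled on the lpUsedDefaultChar output of WideCharToMultiByte.

```cpp
#include <string>

// Looks up one UTF-16 code unit in a deliberately tiny, illustrative subset
// of the Windows-1250 table. Returns false if the unit is not in the subset.
bool LookupCp1250Subset(char16_t unit, unsigned char* byteOut)
{
    if (unit < 0x80) {               // ASCII maps to itself
        *byteOut = static_cast<unsigned char>(unit);
        return true;
    }
    switch (unit) {
        case 0x00C1: *byteOut = 0xC1; return true; // Á
        case 0x00C9: *byteOut = 0xC9; return true; // É
        case 0x00CD: *byteOut = 0xCD; return true; // Í
        case 0x010C: *byteOut = 0xC8; return true; // Č
        case 0x010D: *byteOut = 0xE8; return true; // č
        default:     return false;
    }
}

// Converts using the subset above; anything unmapped becomes '?' (0x3F)
// and sets *usedDefaultChar so the caller can detect the loss.
std::string ToCp1250Sketch(const std::u16string& in, bool* usedDefaultChar)
{
    std::string out;
    *usedDefaultChar = false;
    for (char16_t unit : in) {
        unsigned char b = 0;
        if (LookupCp1250Subset(unit, &b)) {
            out += static_cast<char>(b);
        } else {
            out += '?';
            *usedDefaultChar = true;
        }
    }
    return out;
}
```

MÍST converts to the four bytes `4D CD 53 54` with no loss, while a string containing U+03A8 comes back with a `?` in its place and the flag set.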
