C#
public char UnicodeToCharacter()
{
    // Requires: using System.Text; ("decimal" is a reserved keyword, so renamed)
    int code = 44;
    byte[] bytes = new byte[2];
    bytes[0] = (byte)code;    // low byte of the UTF-16LE code unit
    bytes[1] = 0;             // high byte fixed at zero: this is why it stops at 255
    char[] chars = Encoding.Unicode.GetChars(bytes);

    return chars[0];
}


What I have tried:

The above code works only for characters with a decimal value up to 255, but I want to get the output for decimal value 3516.
Comments
BillWoodruff 27-Mar-16 0:31am    
A C# Unicode Char is a wrapper over a ushort integer (two bytes), which means you have only 65,536 possible values. Transformations from a Char to an integer, and from an integer to a Char, are straightforward.
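For example (a minimal sketch of those transformations):

char c = (char)3516;   // int to char: works for any value up to 0xFFFF
int i = c;             // char to int: implicit widening conversion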

What is your goal in dealing with a single byte of a Unicode Char ?
Sergey Alexandrovich Kryukov 27-Mar-16 1:31am    
It depends on what that decimal is and how you want to interpret it. Unicode is defined for only 1,112,064 different characters, but decimal has a range of roughly -7.9 × 10^28 to 7.9 × 10^28. So, in the general case, it's wrong. :-)

Your primary mistake is that you probably think Unicode is some kind of encoding. It is not.

—SA

1 solution

If by "Unicode" you mean a character code point, you can use Char.ConvertFromUtf32 Method (Int32) (System)[^].

But the result cannot always be a single character; that's why the method returns a string. A single .NET character is, strictly speaking, not a Unicode character: it can be part of a surrogate pair encoding a code point beyond the BMP. I recently tried to explain it here:
How to change one font to other font in the term of key maping[^].

But surrogates occupy the reserved code points in the range U+D800 to U+DFFF, so if you want to work in the range up to 3516 = 0x0DBC, you will always get a 1-character string, which you can interpret as a "character"; problem solved.
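For example (a minimal sketch; the variable names are mine):

int codePoint = 3516;                        // 0x0DBC, inside the BMP
string s = char.ConvertFromUtf32(codePoint); // a 1-character string here
char c = s[0];                               // safe for non-surrogate BMP code points
Console.WriteLine(c);                        // prints the character at U+0DBC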

—SA
 
Comments
[no name] 27-Mar-16 11:36am    
A 5. Please allow me a question about something I'm still not clear on:

I read in MSDN, among other places, that the BMP can represent 2^16 code points. Do we not need to subtract the 1,024 code points of the low-surrogate range? I still get a headache trying to understand these Unicode things completely :(
Thank you in advance.
Bruno
Sergey Alexandrovich Kryukov 27-Mar-16 12:10pm    
Here is the delicate point: yes, 2^16 code points, but not 2^16 characters. Can you see the difference?

This is exactly what was standardized by Unicode: the range U+D800 to U+DFFF was reserved, never to be used for characters. It can be used for surrogate pairs only. When the Unicode committee needs to introduce a new character, it considers putting it into some unused region of code points not yet assigned to characters. The choice depends on the semantic/cultural value of the new character. If, for example, it is a new character related to an already standardized alphabet, an attempt can be made to add it to the range previously reserved for that alphabet.

In all cases, a new character can be added either to the BMP or beyond it. Adding to the BMP means adding it to the 0 to U+FFFF domain, but excluding the already reserved subset of code points, notably U+D800 to U+DFFF (so at most 65,536 − 2,048 = 63,488 BMP code points can ever be characters).

Alternatively, the character can be added beyond the BMP. It then involves three code points: one normal code point greater than U+FFFF, and two code points (a pair) in the surrogate range U+D800 to U+DFFF to allow a UTF-16 representation. But taking up the pair does not need any decision or reservation; it happens "automatically", according to the UTF-16 algorithm: Description; see also the table "UTF-16 decoder".
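For a sketch of that encoding step (the helper name is mine; the constants come from the UTF-16 specification):

static (char high, char low) ToSurrogatePair(int codePoint)
{
    int v = codePoint - 0x10000;             // leaves a 20-bit value
    char high = (char)(0xD800 + (v >> 10));  // top 10 bits make the high surrogate
    char low = (char)(0xDC00 + (v & 0x3FF)); // bottom 10 bits make the low surrogate
    return (high, low);
}
// ToSurrogatePair(0x1F600) yields (0xD83D, 0xDE00), matching char.ConvertFromUtf32(0x1F600).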

—SA
[no name] 27-Mar-16 12:29pm    
First of all, thank you very much again for your answer to a "comment question".

"Can you see the difference?": Seems I can't yet, sorry...

I will now read the rest of your answer in detail and try to ask a proper question at CP, also so that I can award some points.

Meanwhile I feel really stupid for not understanding all this Unicode stuff :(

Thank you again
Bruno

How can the range for surrogates be counted as code points?
Sergey Alexandrovich Kryukov 27-Mar-16 12:48pm    
The difference is simple. First of all, code points are pure mathematical entities: abstract integer numbers, abstracted from their computer representation. They are integer numbers as they are understood in mathematics. Some of these numbers are reserved by the Unicode standard to point to some cultural entities. Nobody says that every code point is reserved to point to a character. Actually, the objects (as cultural entities) pointed to by code points are classified into 1) characters, 2) low surrogates, 3) high surrogates. Is it clearer now?

These surrogates (not pairs of them) can be considered non-characters, or "technical characters" used to represent other characters in UTF-16. One interesting consequence: it is possible to have invalid UTF-16 text (call it "non-text"), text which cannot be interpreted as Unicode at all. Surrogates, if present, should only appear in proper pairs; if you have a single surrogate word without its counterpart, it makes the whole text invalid. By the way, the UTF-8 algorithm is more robust in this respect; it's a really cunning algorithm, a stroke of genius maybe...
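To see it in action (a small sketch; the replacement shown is .NET's default encoder fallback):

string broken = "\uD800";                       // a high surrogate with no counterpart
byte[] utf8 = Encoding.UTF8.GetBytes(broken);   // the lone surrogate is replaced by U+FFFD
Console.WriteLine(BitConverter.ToString(utf8)); // EF-BF-BD, the replacement character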

Note that a .NET character is not, strictly speaking, a Unicode character; it is not even a character in the sense of UTF-16 (the internal .NET representation of a string in memory is UTF-16LE). This is because a low or high surrogate is technically treated as a character. In particular, string.Length is not, in general, a correct length in characters; it is only a length in 16-bit words. Even in .NET, Unicode should be handled with care. Say, if you write text search in an editor and use string.Length to compute the position of the caret, the caret will be placed incorrectly if there is a non-BMP character in between...
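A quick illustration of that pitfall (sketch):

string s = char.ConvertFromUtf32(0x1F600);      // one character beyond the BMP
Console.WriteLine(s.Length);                    // 2: the length in 16-bit words, not characters
Console.WriteLine(char.IsHighSurrogate(s[0]));  // True
var info = new System.Globalization.StringInfo(s);
Console.WriteLine(info.LengthInTextElements);   // 1: the actual character count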

—SA
[no name] 28-Mar-16 10:12am    
I'm reading this in detail at the moment and trying to understand it. It looks like it resolves a lot of my doubts. I was always afraid to ask questions about something like this... idiot me ;)


Regarding "If you have a single surrogate word without its counterpart, it makes the whole text invalid":

That was a question for me from the very beginning (when I tried to understand Unicode vs. UTF-16). But a statement like yours cannot be found (or I have not found it until now) in MSDN.

++++5. Thank you again.
Bruno

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
