|
Hey, that actually works! Thanks
|
|
|
|
|
Every day you learn something new is a good day. Thank you!
|
|
|
|
|
Excellent, thanks!
"Real men drive manual transmission" - Rajesh.
|
|
|
|
|
Hi all,
I have a Unicode file and I want to read it character by character, that is, one character at a time.
Please tell me how I can do this.
Thanks in advance.
|
|
|
|
|
|
|
What kind of Unicode text file are you dealing with (e.g. UTF-8, etc.)?
What do you mean by 'character' (e.g. 'Unicode character' or 'single byte')?
If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler.
-- Alfonso the Wise, 13th Century King of Castile.
This is going on my arrogant assumptions. You may have a superb reason why I'm completely wrong.
-- Iain Clarke
[My articles]
|
|
|
|
|
CPallini wrote: What kind of Unicode text file are you dealing with (e.g. UTF-8, etc.)? What do you mean by 'character' (e.g. 'Unicode character' or 'single byte')?
The file uses the Unicode encoding, and by one character I mean a single byte.
|
|
|
|
|
If you want to read a byte at a time, then use fgetc, as already suggested.
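A minimal sketch of reading a file one byte at a time with fgetc, assuming the file was opened in binary mode ("rb") so the C runtime does no newline translation. The helper name count_bytes is hypothetical; the important detail is that the result of fgetc goes into an int, not a char, so that EOF can be told apart from a genuine 0xFF byte.

```c
#include <stdio.h>

/* Reads the stream one byte at a time with fgetc and returns the number
   of bytes read. fgetc returns an int so that EOF (a negative value)
   can be distinguished from the byte value 0xFF. */
long count_bytes(FILE *stream)
{
    long count = 0;
    int ch; /* must be int, not char, to hold EOF */

    while ((ch = fgetc(stream)) != EOF) {
        /* ch holds one byte of the file, in the range 0..255 */
        count++;
    }
    return count;
}
```

Usage: open the file with fopen(path, "rb"), pass the FILE * to count_bytes (or put your own per-byte processing inside the loop), then fclose it.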
|
|
|
|
|
I hope you do know that Unicode is two bytes and ASCII is one byte...
|
|
|
|
|
UNICODE is actually a set of code points whose range requires 21 bits.
When it is encoded as a sequence of single bytes it is called UTF-8, and when it is encoded as a sequence of two-byte units it is called UTF-16.
In UTF-8 a code point takes 1 to 4 bytes (and is identical to ASCII for code points 0 to 127).
In UTF-16 a code point takes 2 or 4 bytes (two for most Latin, Cyrillic and Greek characters, as well as many simplified Chinese ones).
"UNICODE == 2 bytes" is a misconception that originated when Windows introduced its Unicode APIs using 16 bits because, at that time, the Unicode specification was not as wide.
Actually, reading 2 bytes does not necessarily mean "reading a character".
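To make the variable length concrete, here is a small sketch of decoding one UTF-8 sequence in C. The function names utf8_seq_len and utf8_decode are hypothetical, and for brevity the sketch does not reject overlong encodings or encoded surrogates, which a strict decoder must do.

```c
#include <stddef.h>

/* Returns the length (1-4) of the UTF-8 sequence that starts with lead
   byte b, or 0 if b cannot start a sequence. */
int utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;            /* 0xxxxxxx: plain ASCII */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                          /* continuation byte or invalid */
}

/* Decodes one code point from s (n bytes available), stores it in *cp
   and returns the number of bytes consumed, or 0 if the sequence is
   malformed or truncated. Overlong forms are NOT rejected here. */
int utf8_decode(const unsigned char *s, size_t n, unsigned long *cp)
{
    int len, i;
    unsigned long value;

    if (n == 0) return 0;
    len = utf8_seq_len(s[0]);
    if (len == 0 || (size_t)len > n) return 0;

    /* payload bits of the lead byte: 7, 5, 4 or 3 bits */
    value = (len == 1) ? s[0] : (unsigned long)(s[0] & (0x7F >> len));
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;  /* must be 10xxxxxx */
        value = (value << 6) | (s[i] & 0x3F); /* append 6 bits */
    }
    *cp = value;
    return len;
}
```

For example, the bytes C3 A9 decode to U+00E9 ("é") in two bytes, while F0 9F 98 80 decode to U+1F600 in four, so a single read of a fixed number of bytes cannot be relied on to yield exactly one character.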
2 bugs found.
> recompile ...
65534 bugs found.
|
|
|
|
|
Unfortunately, I think it depends on which C/C++ standard and which OS. I'm pretty sure Windows defines Unicode as 16 bits...
From their website:
"Unicode: A fixed-width, 16-bit worldwide character encoding that was developed and is maintained and promoted by the Unicode Consortium, a nonprofit computer industry organization."
http://msdn.microsoft.com/en-us/library/cc194793.aspx
|
|
|
|
|
I'm sorry for you and for Microsoft, but the one and only authority entitled to say what Unicode was, is and will be is www.unicode.org
The page you linked is a real shame for Microsoft. A technical document like that should not be written without stating on the page itself the date it was written (hey ... they talk about their amazing new Windows NT 3.5 ...), and for this fault alone M$ should be disqualified from any authority in the field.
|
|
|
|
|
So angry! ...Similar articles can be found in the MS VS2010 area of MSDN... I don't do much with Unicode, so I haven't needed to worry about it...
|
|
|
|
|
This is actually a misconception...
see here.
|
|
|
|
|
|
Nope.
1) Unicode is not a Microsoft product. What Unicode is, is defined by www.unicode.org
2) Microsoft encodes Unicode in 16-bit units. That technique is defined by the Unicode standard itself and is known as UTF-16. Essentially, every code point up to 0xFFFF that is not in the range 0xD800-0xDFFF is encoded as itself.
For every code point greater than 0xFFFF, 0x10000 is subtracted first; the remaining 20 bits are split into two 10-bit chunks, OR-ed with 0xD800 and 0xDC00 respectively.
The range 0xD800-0xDFFF is called the Unicode "surrogate" range and does not contain valid characters.
So where wchar_t is 16 bits (as on Windows), a single Unicode character with a code greater than 0xFFFF needs a sequence of two wchar_t to be represented (as happens with many less common CJK, i.e. Chinese, Japanese and Korean, characters).
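The surrogate arithmetic described above can be sketched in a few lines of C. The names utf16_encode and utf16_combine are hypothetical, and the sketch assumes its input is a valid code point (0..0x10FFFF, not itself a surrogate) and a valid high/low pair, without checking.

```c
#include <stdint.h>

/* Encodes code point cp (assumed 0..0x10FFFF and not a surrogate) as
   UTF-16. Writes 1 or 2 code units to out and returns how many. */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {                            /* BMP: stored as itself */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                                 /* leaves a 20-bit value */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));      /* high 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));    /* low 10 bits */
    return 2;
}

/* Combines a high surrogate (0xD800-0xDBFF) and a low surrogate
   (0xDC00-0xDFFF) back into the original code point. */
uint32_t utf16_combine(uint16_t hi, uint16_t lo)
{
    return 0x10000 + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
}
```

For instance, U+1F600 becomes the pair 0xD83D 0xDE00, and U+20000 (a CJK Extension B ideograph) becomes 0xD840 0xDC00: one character, two wchar_t.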
|
|
|
|
|
I certainly believe your point about the Unicode Consortium being the authority... no argument there!
|
|
|
|
|
Use fgetc()/fgetwc() or wcin. It's exactly the same process as reading non-Unicode text.
I must get a clever new signature for 2011.
|
|
|
|
|
I'm using this now, but it doesn't work:
FILE *stream;
char buffer[2];
int kk, ch;
fopen_s( &stream, OpenFile, "r" );
if( stream == NULL )
{
    return ;
}
ch = fgetc( stream );
for( kk=0; (kk < 1 ) && ( feof( stream ) == 0 ); kk++ )
{
    buffer[kk] = (char)ch;
    ch = fgetc( stream );
}
buffer[kk] = '\0';
printf( "%s\n", buffer );
fclose( stream );
Here it is unable to open the file: stream is always NULL, which is why it returns.
|
|
|
|
|
You are ignoring the return code from fopen_s, so it's impossible to diagnose your problem. Use something like the following and look up the error code you receive.
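errno_t errNum = fopen_s( &stream, OpenFile, "r" );
if (errNum != 0)
{
    // errNum is an errno value, e.g. ENOENT (no such file) or EACCES (access denied)
    char msg[256];
    strerror_s( msg, sizeof msg, errNum );
    printf( "fopen_s failed (%d): %s\n", errNum, msg );
    return;
}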
I would suggest you look at the documentation here for further guidance.
Incidentally, the rest of your code does not seem to be set up to process Unicode data, which was the subject of your original question.
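For comparison, here is a portable sketch of the same diagnosis using standard fopen and errno, since fopen_s is a Microsoft/C11 Annex K function that is not available everywhere. The helper name open_for_reading is hypothetical. Note that ISO C itself does not require fopen to set errno, although POSIX systems and the Microsoft CRT do.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Opens path for binary reading. On failure it prints the reason
   (e.g. "No such file or directory"), stores the errno value in *err
   and returns NULL; on success *err is set to 0. */
FILE *open_for_reading(const char *path, int *err)
{
    FILE *stream;

    errno = 0;                      /* clear any stale value */
    stream = fopen(path, "rb");
    *err = (stream == NULL) ? errno : 0;
    if (stream == NULL)
        fprintf(stderr, "cannot open '%s': %s\n", path, strerror(*err));
    return stream;
}
```

A common cause of this kind of failure is a relative path being resolved against a different working directory than expected; the printed message makes that easy to spot.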
|
|
|
|
|
If your program is built with Unicode enabled, you need to use _wfopen_s to open the file, and the filename (OpenFile) needs to be a wide string, something like
wchar_t Myfile[] = L"my_file.ext";
Then you should be able to read the characters with ch = fgetwc( stream ); where ch should be declared as wint_t (fgetwc returns wint_t so that WEOF can be told apart from a valid character).
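Where _wfopen_s is not available, a portable alternative is to read the code units yourself. The sketch below assumes the file is UTF-16LE (what Windows tools have traditionally saved as "Unicode") and was opened in binary mode; it reads two bytes at a time with plain fgetc and joins surrogate pairs into one code point. The function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Reads one UTF-16LE code unit (low byte first). Returns 0 and stores
   it in *unit, or -1 at end of file (a trailing odd byte counts as EOF). */
static int read_unit_le(FILE *in, uint16_t *unit)
{
    int lo = fgetc(in);
    int hi;

    if (lo == EOF) return -1;
    hi = fgetc(in);
    if (hi == EOF) return -1;
    *unit = (uint16_t)(lo | (hi << 8));
    return 0;
}

/* Returns the next character (code point) of a UTF-16LE stream, or -1 at
   end of file or on a malformed surrogate. A leading BOM (U+FEFF) is
   returned like any other character; the caller may skip it. */
long read_utf16le_char(FILE *in)
{
    uint16_t u1, u2;

    if (read_unit_le(in, &u1) != 0) return -1;
    if (u1 < 0xD800 || u1 > 0xDFFF) return u1;   /* ordinary BMP character */
    if (u1 > 0xDBFF) return -1;                  /* stray low surrogate */
    if (read_unit_le(in, &u2) != 0) return -1;   /* truncated pair */
    if (u2 < 0xDC00 || u2 > 0xDFFF) return -1;   /* not a low surrogate */
    return 0x10000L + (((long)(u1 - 0xD800) << 10) | (long)(u2 - 0xDC00));
}
```

Calling read_utf16le_char in a loop until it returns -1 gives one Unicode character per call, which is what "read one character at a time" means for such a file.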
|
|
|
|
|
Hi,
How can I check whether a DLL exists in C++ before I try to load it with dllHandle = LoadLibraryW(m_sFileName)? Sometimes the LoadLibraryW call seems to hang the calling thread if the DLL doesn't exist on disk.
Regards
Olof
|
|
|
|
|
|