This post shows how to convert between NSString and C strings in an iOS project.
I recently had to use a C library in an iPhone project, which is mostly in Objective-C. Things were going smoothly until I ran into some C functions that return C strings (wchar_t*, char*) and require conversions in order to work with Objective-C NSString* types.
There are three ways to declare a string in an iOS project in Xcode:
NSString* str = @"hello world";
wchar_t* str = L"hello world";
char* str = "hello world";
If you declare a Unicode string via the L"" syntax, the compiler stores it as UTF-32 on iOS. The function wcslen(), which returns the length (i.e., the number of wchar_t characters) of a string, may not return the length you expect when the string contains non-ASCII characters. For example, try the following code:
wchar_t* str1 = L"Giới thiệu về Google";
wchar_t* str2 = L"Gioi thieu ve Google";
printf("str1 length: %zu\n", wcslen(str1)); // wcslen returns size_t, so use %zu
printf("str2 length: %zu\n", wcslen(str2));
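The code outputs the wrong length for str1 and the correct length for str2, even though they have the same number of characters. I think wcslen is confused by the UTF-32 characters in str1 and counts some characters more than once (strictly speaking, wcslen only counts wchar_t elements up to the terminating null, so the extra count may come from how the literal was encoded in the source file). However, if I try the following code: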
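char* str3 = "Giới thiệu về Google";
setlocale(LC_ALL, "en_US.UTF-8"); // interpret the bytes of str3 as UTF-8
size_t buflen = strlen(str3)+1; // upper bound: one wchar_t per byte, plus the null
wchar_t* buffer = malloc(buflen * sizeof(wchar_t));
mbstowcs(buffer, str3, buflen);
printf("str3 length: %zu\n", wcslen(buffer)); // measure the converted wide string, not str3
free(buffer);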
to declare a narrow (char*) string and convert it to a wide string, using setlocale so that its bytes are interpreted as UTF-8, wcslen returns the correct string length. Not knowing exactly what the problem is, I make sure that all C strings in my project are UTF-8 encoded.
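One possible explanation for a surprising wcslen result, offered as a guess rather than a diagnosis of my project, is that the same visible character can be stored as one code point or as several. The sketch below uses escape sequences so that it does not depend on the source file encoding; the names precomposed, decomposed and wide_units are made up for illustration.

```c
#include <wchar.h>

/* Precomposed form: a single code point, U+1EDB (o with horn and acute). */
const wchar_t precomposed[] = L"\u1edb";

/* Decomposed form of the same visible character:
   'o' + U+031B (combining horn) + U+0301 (combining acute accent). */
const wchar_t decomposed[] = L"o\u031b\u0301";

/* wcslen counts wchar_t elements before the terminating null,
   not the characters a reader would see on screen. */
size_t wide_units(const wchar_t *s)
{
    return wcslen(s);
}
```

Both literals render as the same letter, yet wide_units reports 1 for the first and 3 for the second, so a literal saved in decomposed form would look "too long" to wcslen.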
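Conversion from NSString* to a narrow C string (char*) is easy using the built-in cStringUsingEncoding: method with the NSUTF8StringEncoding constant. The returned pointer is managed by the framework, so there is no need to release or free it, but it is only guaranteed to stay valid for a limited time (no longer than the string object itself), so copy it if you need to keep it around. The following method (taken from my custom NSString category) shows how to achieve this: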
- (const char*)getMultiByteString
{
    return [self cStringUsingEncoding:NSUTF8StringEncoding];
}
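If the C string has to outlive the NSString it came from, one option is to copy it into memory that the caller owns. A minimal sketch in plain C, where copy_c_string is a hypothetical helper name and not part of any Apple API:

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of `borrowed` that the caller owns and must free(),
   or NULL if `borrowed` is NULL or the allocation fails. */
char *copy_c_string(const char *borrowed)
{
    if (borrowed == NULL) return NULL;

    size_t size = strlen(borrowed) + 1;   /* include the terminating null */
    char *owned = malloc(size);
    if (owned != NULL) memcpy(owned, borrowed, size);
    return owned;
}
```

The copy can then be handed to the C library without worrying about when the original string goes away, at the cost of one extra free() call.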
Converting from NSString* to a wide string (wchar_t*) is a bit more complicated and uses the C function mbstowcs:
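- (wchar_t*)getWideString
{
    // Assumes setlocale() has already selected a UTF-8 locale, as shown earlier.
    const char* temp = [self cStringUsingEncoding:NSUTF8StringEncoding];
    if (temp == NULL) return NULL;
    size_t buflen = strlen(temp)+1; // one wchar_t per byte is always enough
    wchar_t* buffer = malloc(buflen * sizeof(wchar_t));
    if (buffer == NULL) return NULL;
    if (mbstowcs(buffer, temp, buflen) == (size_t)-1) { free(buffer); return NULL; } // invalid sequence
    return buffer; // the caller must free() this
}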
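It is the responsibility of the caller to free the returned buffer. As an improvement, the buffer could instead be owned by an object that frees it in its dealloc method; note that a category should not override dealloc on NSString itself, so this would need a separate owner such as a wrapper class. The return type should then be changed to const wchar_t* to indicate that the returned value is read-only.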
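For the pure C side of this conversion, here is a sketch of an alternative that measures the string first instead of assuming one wchar_t per byte. It swaps mbstowcs for the standard restartable function mbsrtowcs, which only counts when given a NULL destination; multibyte_to_wide is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Converts a multibyte string (interpreted in the current LC_CTYPE locale)
   to a newly allocated wide string. Returns NULL on invalid input or
   allocation failure. The caller must free() the result. */
wchar_t *multibyte_to_wide(const char *mbs)
{
    mbstate_t state;
    const char *src = mbs;

    /* First pass: with a NULL destination, only count the wide characters. */
    memset(&state, 0, sizeof state);
    size_t count = mbsrtowcs(NULL, &src, 0, &state);
    if (count == (size_t)-1) return NULL;   /* invalid multibyte sequence */

    wchar_t *wide = calloc(count + 1, sizeof(wchar_t));
    if (wide == NULL) return NULL;

    /* Second pass: convert, including the terminating null. */
    src = mbs;
    memset(&state, 0, sizeof state);
    mbsrtowcs(wide, &src, count + 1, &state);
    return wide;
}
```

As with mbstowcs, non-ASCII input is only decoded correctly after a call such as setlocale(LC_ALL, "en_US.UTF-8"); whether that locale is installed on a given system is an assumption.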
Take note that wchar_t is 2 bytes on Windows but typically 4 bytes on Unix/Linux (including iOS). The above function uses sizeof to determine the size of wchar_t for the sake of generality.
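The same idea can be wrapped in a small helper so that no caller ever hard-codes 2 or 4. This is only a sketch; alloc_wide_buffer and wide_buffer_bytes are illustrative names.

```c
#include <stdlib.h>
#include <wchar.h>

/* Allocates a zero-filled buffer for `length` wide characters plus the
   terminating null. The element size comes from sizeof(wchar_t), so the
   same code works where wchar_t is 2 bytes and where it is 4.
   The caller must free() the result. */
wchar_t *alloc_wide_buffer(size_t length)
{
    if (length == (size_t)-1) return NULL;   /* length + 1 would wrap around */
    return calloc(length + 1, sizeof(wchar_t));
}

/* Size in bytes of such a buffer on the current platform. */
size_t wide_buffer_bytes(size_t length)
{
    return (length + 1) * sizeof(wchar_t);
}
```

For a 5-character string, wide_buffer_bytes returns 6 * sizeof(wchar_t), which works out to 12 bytes with a 2-byte wchar_t and 24 bytes with a 4-byte one.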
Using stringWithUTF8String: and wcstombs, we can do the reverse and convert a wide C string into an NSString:
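+ (NSString*)stringWithWideString:(const wchar_t*)ws
{
    // Generous upper bound: UTF-8 needs at most 4 bytes per character.
    size_t bufflen = 8*wcslen(ws)+1;
    char* temp = malloc(bufflen);
    if (temp == NULL) return nil;
    NSString* retVal = nil;
    // wcstombs returns (size_t)-1 if a character cannot be converted in the current locale.
    if (wcstombs(temp, ws, bufflen) != (size_t)-1) {
        retVal = [self stringWithUTF8String:temp];
    }
    free(temp);
    return retVal;
}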
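The C half of this reverse conversion can also size its buffer exactly instead of using the 8x estimate. The sketch below swaps wcstombs for the standard function wcsrtombs, which reports the required byte count when given a NULL destination; wide_to_multibyte is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Converts a wide string to a newly allocated multibyte string in the
   current LC_CTYPE locale. Returns NULL if a character cannot be
   represented or the allocation fails. The caller must free() the result. */
char *wide_to_multibyte(const wchar_t *ws)
{
    mbstate_t state;
    const wchar_t *src = ws;

    /* First pass: with a NULL destination, only count the bytes needed. */
    memset(&state, 0, sizeof state);
    size_t bytes = wcsrtombs(NULL, &src, 0, &state);
    if (bytes == (size_t)-1) return NULL;   /* unconvertible character */

    char *out = malloc(bytes + 1);
    if (out == NULL) return NULL;

    /* Second pass: convert, including the terminating null. */
    src = ws;
    memset(&state, 0, sizeof state);
    wcsrtombs(out, &src, bytes + 1, &state);
    return out;
}
```

The result is only UTF-8 if a UTF-8 locale is active, so on the Objective-C side it should be passed to stringWithUTF8String: under that same assumption.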
I hope this will help others with similar problems.