Introduction
For file storage and data communication to work interoperably, the width of each data type must stay invariant across platforms. This tip discusses the pitfalls of platform-dependent data widths and their solutions. Endianness, which deserves a tip of its own, is not covered here.
time_t
time_t stores the number of seconds since 1st January, 1970 (UTC). On 32-bit Linux it has traditionally been a 32-bit integer, which runs out in the year 2038, a Y2K-style crisis; on 64-bit Linux it is 64-bit. On modern Visual C++, time_t is 64-bit by default on both x86 and x64. Because its width is not guaranteed to match between platforms, time_t is not interoperable, so it is best to store time as text and convert it to time_t on loading.
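As a rough illustration of storing time as text, the sketch below writes a time_t as ISO 8601 UTC text and parses it back. The function names (time_to_text, text_to_time, days_from_civil) and the choice of ISO 8601 are assumptions made for this example, not part of any standard API. The date arithmetic is done by hand because the standard library has no portable inverse of gmtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Days between 1970-01-01 and the given Gregorian date (hypothetical helper).
std::int64_t days_from_civil(std::int64_t y, unsigned m, unsigned d)
{
    assert(m >= 1 && m <= 12);
    y -= (m <= 2) ? 1 : 0;  // treat January and February as part of the previous year
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Formats t as text such as "2019-08-15T00:00:00Z"; empty string on failure.
std::string time_to_text(std::time_t t)
{
    const std::tm* utc = std::gmtime(&t);
    if (!utc) return std::string();
    char buf[32];
    const std::size_t n = std::strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%SZ", utc);
    return std::string(buf, n);
}

// Parses text written by time_to_text. Fails if the text is malformed or if
// the value does not fit in this platform's time_t (for example, a date
// after 2038 on a build with a 32-bit time_t).
bool text_to_time(const std::string& text, std::time_t& out)
{
    std::tm tm = {};
    std::istringstream in(text);
    in >> std::get_time(&tm, "%Y-%m-%dT%H:%M:%SZ");
    if (in.fail()) return false;

    const std::int64_t days = days_from_civil(tm.tm_year + 1900,
                                              static_cast<unsigned>(tm.tm_mon + 1),
                                              static_cast<unsigned>(tm.tm_mday));
    const std::int64_t secs = days * 86400 + tm.tm_hour * 3600 + tm.tm_min * 60 + tm.tm_sec;
    if (secs < std::numeric_limits<std::time_t>::min() ||
        secs > std::numeric_limits<std::time_t>::max())
        return false;

    out = static_cast<std::time_t>(secs);
    return true;
}
```

The text form has the same length on every platform, and the range check on loading makes a too-narrow time_t fail visibly instead of silently wrapping.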
wchar_t
The wchar_t type used to hold Unicode characters is 16 bits wide and holds UTF-16 on Windows, while it is 32 bits wide and holds UTF-32 on Linux/macOS, so the two are incompatible with each other. A UTF-16 character takes 2 or 4 bytes depending on its code point (characters outside the Basic Multilingual Plane need a surrogate pair), while a UTF-32 character always takes 4 bytes, which wastes memory since most commonly used characters can be expressed in 2 bytes. UTF-8 uses 1 byte for ASCII and 2 to 4 bytes for all other characters. For interoperability between Windows and other OSes, the solution is to store the text in UTF-8 and convert to wchar_t upon loading. Another solution is to use the fixed-width character types char16_t and char32_t introduced in C++11.
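The conversion on loading can be sketched as follows. This is a minimal hand-written decoder, assuming the stored text is UTF-8; the function names utf8_to_utf32 and utf32_to_wide are made up for this example. It replaces malformed sequences with U+FFFD and does not reject overlong encodings, so treat it as an illustration rather than a validating converter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const char32_t kReplacement = 0xFFFD;  // U+FFFD REPLACEMENT CHARACTER

// Decodes UTF-8 bytes into code points (hypothetical helper).
std::u32string utf8_to_utf32(const std::string& in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char b = static_cast<unsigned char>(in[i++]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b < 0x80)              { cp = b;        extra = 0; }
        else if ((b >> 5) == 0x06) { cp = b & 0x1F; extra = 1; }
        else if ((b >> 4) == 0x0E) { cp = b & 0x0F; extra = 2; }
        else if ((b >> 3) == 0x1E) { cp = b & 0x07; extra = 3; }
        else { out.push_back(kReplacement); continue; }  // stray or invalid lead byte

        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
            ++i;
        }
        out.push_back((ok && cp <= 0x10FFFF) ? cp : kReplacement);
    }
    return out;
}

// Converts code points to this platform's wchar_t encoding:
// surrogate pairs where wchar_t is 16-bit, one unit per code point otherwise.
std::wstring utf32_to_wide(const std::u32string& in)
{
    std::wstring out;
    for (char32_t cp : in) {
        assert(cp <= 0x10FFFF);
        if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
            out.push_back(static_cast<wchar_t>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The same UTF-8 bytes thus load into a 1-unit or 2-unit wide string for a character outside the Basic Multilingual Plane, depending on the platform, while the stored form stays identical.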
Integer Types
size_t and its signed counterpart, ptrdiff_t, have widths that differ between x86 and x64, so they should always be avoided in storage and communication packets. Types of platform-dependent width, such as long (32-bit on 64-bit Windows but 64-bit on 64-bit Linux), should be avoided as well. Use the fixed-width integer types introduced in C++11, such as uint32_t and int32_t.
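As a small sketch of this advice, the code below stores a string with a 4-byte length prefix instead of writing a raw size_t. The names StoredLength, to_stored_length, write_string and read_length are hypothetical, and the length-prefixed layout is only an assumption for illustration. The prefix is copied in host byte order, since byte order is outside the scope of this tip.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <string>
#include <vector>

// Hypothetical stored length field: 4 bytes on every platform, unlike size_t.
using StoredLength = std::uint32_t;

// Narrows an in-memory size to the stored width; fails if it does not fit.
bool to_stored_length(std::size_t n, StoredLength& out)
{
    if (n > std::numeric_limits<StoredLength>::max()) return false;
    out = static_cast<StoredLength>(n);
    return true;
}

// Appends a length-prefixed string to buf.
bool write_string(std::vector<unsigned char>& buf, const std::string& s)
{
    StoredLength len = 0;
    if (!to_stored_length(s.size(), len)) return false;

    unsigned char raw[sizeof len];
    std::memcpy(raw, &len, sizeof len);  // host byte order, see note above
    buf.insert(buf.end(), raw, raw + sizeof len);
    buf.insert(buf.end(), s.begin(), s.end());
    return true;
}

// Reads the length prefix stored at the given offset.
StoredLength read_length(const std::vector<unsigned char>& buf, std::size_t offset)
{
    assert(offset + sizeof(StoredLength) <= buf.size());
    StoredLength len = 0;
    std::memcpy(&len, buf.data() + offset, sizeof len);
    return len;
}
```

The explicit range check is the price of the fixed width: a size that fits in a 64-bit size_t may not fit in the stored field, and that case should fail loudly rather than be truncated.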
Pointer Types
Pointer width varies between x86 and x64 modes. Pointers are sometimes used as an opaque index or identity; the Windows SDK's DWORD_PTR is one such example. Because memory addresses are distinct, a pointer-derived identity can be temporarily stored in a database, file storage or network packets. This poses a problem when the code is recompiled for x64 from the original x86: the 64-bit value is truncated when it is stored in a 32-bit field, such as a database column. If it has to be done, use the largest pointer width as the data width. Otherwise, it is best to derive the identity through other means, such as a GUID or truly random number generation.
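If a pointer-derived identity must be stored, using the largest pointer width might look like the sketch below. StoredId, id_from_pointer and pointer_from_id are hypothetical names for this example, and 64 bits is assumed to be the widest pointer among the targeted platforms.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stored identity: always 64 bits, whether built for x86 or x64.
using StoredId = std::uint64_t;
static_assert(sizeof(void*) <= sizeof(StoredId),
              "StoredId must be at least as wide as a pointer");

// Widens the address to the stored width; on x86 the upper 32 bits are zero.
StoredId id_from_pointer(const void* p)
{
    return static_cast<StoredId>(reinterpret_cast<std::uintptr_t>(p));
}

// Recovers the pointer. Only meaningful inside the process that produced the
// id, and only while the pointed-to object is still alive.
const void* pointer_from_id(StoredId id)
{
    assert(id <= std::numeric_limits<std::uintptr_t>::max());
    return reinterpret_cast<const void*>(static_cast<std::uintptr_t>(id));
}
```

The stored column or packet field is then declared as 64-bit once, so recompiling from x86 to x64 does not change the data layout.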
History
- 15th August, 2019: Initial version