1. The Purpose
The purpose of this article is to clarify the essential points about the Windows API, the C Runtime Library (CRT), and the Standard C++ Library (STL). Even experienced developers are often confused by, and hold misconceptions about, the relationship between these parts. If you have ever wondered what is implemented on top of what and never had the time to figure it out, keep reading.
2. Basics
The following diagram represents the relationship between WinAPI, CRT, and STL.
Diagram #1: The relationship between Windows API, CRT, and the C++ Standard Library
Adjacent blocks can communicate with each other. What does it mean? Let's go from the bottom to the top.
2.2 Hardware
Each hardware part exposes its own set of commands that enables the Operating System to control and communicate with it. The number and complexity of the commands vary from part to part. Often, different vendors of the same part provide additional commands beyond the requirements of a common standard. Communicating directly with countless hardware devices, each with its own variety of commands, would be an enormous toil for software writers. Here, the Operating System comes to the rescue.
2.3 Operating System
The purpose of the OS is to encapsulate all the intricacies of the underlying hardware and provide a unified access interface to the computer's parts. No application can access the hardware directly. Only the OS can access the hardware. The part of the OS that accesses the hardware is said to run in kernel mode.
Older OSs like MS-DOS, for example, allowed programs to access hardware resources directly. Though it enabled software writers to make certain performance gains, in the long run, this technique often made the software very brittle, and incompatible with newer hardware parts.
2.4 Application Programming Interface
The OS exposes the underlying machine resources by means of an Application Programming Interface (API). An API is a uniform set of functions that enables software developers to abstract from hardware peculiarities and focus on their own goals. An application cannot bypass the OS and access hardware resources directly. It is commonly said that applications run in user mode. MS Windows provides an API as a set of C functions. The C language is chosen as the lowest common denominator for software development under the Windows platform.
2.4.1 Platform Software Development Kit
MS distributes a free Platform Software Development Kit (Platform SDK or PSDK), which enables software developers to write Windows programs. The PSDK contains:
- Header files with API function declarations
- Import lib files to link with (where calls to API functions are redirected to the relevant DLLs)
- Documentation
- Various binary helper tools
For example, to open or create a file, we call the CreateFile function, which is declared in the "WinBase.h" header file and requires the "Kernel32.lib" library to link with.
The names of Windows API functions follow the Pascal case (upper camel case) naming convention and are usually easy to recognize by it. Names of macros and constants are conventionally in uppercase. Each function has a "Requirements" section on its documentation page where the necessary headers, import libraries, and supported OS versions are specified.
A Windows application can call any API function, provided the application follows the function's signature and links with the appropriate import library (or gets the function's address directly from the implementing DLL with the GetProcAddress call).
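The second route, resolving a function at run time, can be sketched roughly as follows. This is a minimal illustration rather than a recommended pattern: the wrapper name ProcessIdViaGetProcAddress is made up for this example, GetCurrentProcessId was picked only because its signature is trivial, and the non-Windows branch is a stub that exists solely so the snippet compiles elsewhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: calls GetCurrentProcessId through a pointer obtained
// from kernel32.dll at run time instead of through the import library.
// Returns 0 if the function cannot be resolved (or on non-Windows builds).
unsigned long ProcessIdViaGetProcAddress() {
#ifdef _WIN32
    // kernel32.dll is already mapped into every Win32 process,
    // so a module handle is enough; no LoadLibrary call is needed.
    HMODULE kernel32 = GetModuleHandleW(L"kernel32.dll");
    if (!kernel32) {
        return 0;
    }

    // The pointer type must match the documented signature exactly.
    typedef DWORD (WINAPI *GetCurrentProcessIdFn)(void);
    GetCurrentProcessIdFn fn = reinterpret_cast<GetCurrentProcessIdFn>(
        GetProcAddress(kernel32, "GetCurrentProcessId"));

    return fn ? fn() : 0;
#else
    return 0;  // stub: there is no Windows API to call here
#endif
}
```

Run-time resolution of this kind is typically used for functions that may be missing on older OS versions, where linking against the import library would prevent the program from loading at all.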
2.5 C Runtime Library
On top of the OS API functions, software vendors implement the C Runtime Library (CRT). The CRT is a standardized set of header files and C functions that implement common tasks such as string operations, some math functions, basic input/output, etc. Usually, the same vendor that makes the C compiler also provides the CRT implementation. The International Organization for Standardization is responsible for the C language standard and its runtime library.
2.5.1 Standards and Extensions
Theoretically, by using only standard C functions, the developer can ensure that the same code can be used to build and run a program on any platform where a decent C compiler and CRT implementation exist. In practice, however, software vendors include many useful extensions to the standard library functions, which make developers' lives easier, but at the price of portability.
The names of CRT functions are in lower case. The names of macros and constants are in uppercase. The names of extensions begin with the underscore character; for example, the _mkdir function. Each function has a "Requirements" section on its documentation page where its header is specified.
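One common way to keep an extension such as _mkdir from leaking into the rest of a code base is to hide it behind a single wrapper. The sketch below assumes only two kinds of targets, the Microsoft CRT and a POSIX system; the name MakeDirectory and the 0755 permission mask are choices made for this illustration.

```cpp
#ifdef _WIN32
#include <direct.h>     // _mkdir (Microsoft CRT extension)
#else
#include <sys/stat.h>   // mkdir (POSIX)
#include <sys/types.h>
#endif

// Hypothetical wrapper: the only place that knows which vendor-specific
// function creates a directory. Returns 0 on success and -1 on failure,
// which is what both underlying functions return.
int MakeDirectory(const char* path) {
#ifdef _WIN32
    return _mkdir(path);
#else
    return mkdir(path, 0755);
#endif
}
```

The rest of the program calls MakeDirectory and never mentions _mkdir or mkdir directly, so porting to another platform means touching one function.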
2.6 Unicode Awareness
2.6.1 Platform SDK is Already Unicode Aware
Actually, the above-mentioned Win32 API names are not the real names. They are mere macros defined in the PSDK header files. So, when the PSDK documentation mentions a function, for example CreateFile, a developer should be aware that CreateFile is a macro. The true names of the CreateFile function are CreateFileA and CreateFileW. Yes, there are two versions, rather than one, of many Win32 API functions. The version that ends with 'A' accepts ANSI character strings, i.e., strings of regular chars. The other version ends with 'W' (the so-called "wide" version) and accepts Unicode character strings, i.e., strings of wchar_ts. Both versions are implemented within the kernel32.dll module. The CreateFile macro expands into the CreateFileW name if the UNICODE symbol is defined for a project, and into the CreateFileA name otherwise.
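The mechanism itself is plain preprocessor work and can be imitated without any Windows header. The following sketch uses invented names (TextLengthA, TextLengthW, and the TextLength macro) to show the pattern; it is not taken from the PSDK.

```cpp
#include <cstddef>
#include <string>

// Hypothetical "ANSI" version: works on strings of char.
std::size_t TextLengthA(const char* text) {
    return std::char_traits<char>::length(text);
}

// Hypothetical "wide" version: works on strings of wchar_t.
std::size_t TextLengthW(const wchar_t* text) {
    return std::char_traits<wchar_t>::length(text);
}

// The documented name is only a macro that selects one of the two
// real functions, depending on whether UNICODE is defined for the build.
#ifdef UNICODE
#define TextLength TextLengthW
#else
#define TextLength TextLengthA
#endif
```

Because the choice is made by the preprocessor, a debugger, a linker error message, or a DLL export table will always show the suffixed name, never the macro.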
There are three families of Windows OS: MS-DOS/9x-based, Windows CE, and Windows NT.
- The MS-DOS/9x-based family, which includes Windows 1.0-3.11, 95, 98, and Windows ME, is based on the MS-DOS OS. Earlier versions of Windows (1.0-2.0) are true 16-bit OSs. Newer versions (3.0, 95, 98, and ME) are the so-called hybrid 16/32-bit OSs. They are 16-bit at the low level, but capable of running 32-bit programs with certain limitations. One of these limitations is that only the ANSI versions of the Win32 API functions exist on this platform. Currently, the MS-DOS/9x-based family is extinct and unsupported by Microsoft.
- The Windows NT family started with Windows NT 3.1 in the early 1990s and includes Windows NT 4, Windows 2000, Windows XP, Windows Vista, and the Server flavors of these OSs. The Windows NT family is true 32-bit. It supports both the ANSI and Unicode versions of the Win32 API. The Windows NT family operates with Unicode strings internally. The ANSI version of a Win32 API function is a mere wrapper around the real worker, the Unicode version of the function.
- The Windows CE family is intended for mobile and embedded devices. It is true 32-bit. Windows CE supports only the Unicode version of the Win32 API.
2.6.2 PSDK Solution: TCHARs
In order to avoid multiple PSDKs for different Windows families, Microsoft implemented generic text characters, or TCHARs. TCHAR and other relevant macros are defined in the WinNT.h header file. The main idea is that the developer never uses the char or wchar_t types explicitly, but uses the TCHAR macro instead. The TCHAR macro expands into the appropriate character type depending on whether the UNICODE symbol is defined for a build. In the same manner, instead of calling the 'A' or 'W' version of a Win32 API function, the developer calls a generic macro version, which accommodates the actual character type at compile time.
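// Generic-text code; builds as either ANSI or Unicode:
LPCTSTR psz = TEXT("Hello World!");
TCHAR szDir[MAX_PATH] = { 0 };
GetCurrentDirectory(MAX_PATH, szDir);

// What it expands to when UNICODE is not defined (ANSI build):
const char* psz = "Hello World!";
char szDir[MAX_PATH] = { 0 };
GetCurrentDirectoryA(MAX_PATH, szDir);

// What it expands to when UNICODE is defined (Unicode build):
const wchar_t* psz = L"Hello World!";
wchar_t szDir[MAX_PATH] = { 0 };
GetCurrentDirectoryW(MAX_PATH, szDir);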
Using TCHARs allows a developer to maintain a single code base for both ANSI and Unicode builds. Nowadays, if you do not intend to target the old Windows 9x/Me platforms, you can safely forget about TCHARs, use Unicode strings everywhere, and make Unicode-only builds. As an added bonus, Unicode applications can forget about the code page hassle and use the same logic for all strings.
The easy way to remember PSDK string declarations is to read them out letter by letter:
LPCTSTR = const TCHAR*
- L: Long
- P: Pointer to
- C: Constant
- T: TCHAR
- STR: STRing
Sometimes the L ("Long") is omitted, since long and short pointers are obsolete on the Win32 platform. So, a typedef can look like PTSTR = "pointer to TCHAR string", which is just TCHAR*.
Here are two screenshots of the same program. The first screenshot is taken when the program is built as ANSI. The second screenshot demonstrates the Unicode build of the program.
Naive ANSI program from the 20th century. All non-English characters are converted into illegible '?' symbols.
A modern Unicode program is aware of other languages.
2.6.3 CRT Solution: _TCHARs
Following the Platform SDK logic, Microsoft introduced generic text mapping into its C runtime library. The CRT uses an additional header file to define generic character macros: "tchar.h". In order to comply with the requirements of the C language standard, all non-standard names start with an underscore. Also, the CRT uses the shorter _T() macro for literal strings instead of the longer TEXT() macro, which is defined in "WinNT.h". The CRT authors decided to take the generic text notion even further, and as a result, the CRT now distinguishes three modes for text characters:
- SBCS - The Single Byte Character Set. The classic char is used for strings. One ASCII character fits within one char element. No symbol has to be defined for a project. This is the traditional C language approach that has survived from the 1970s to the present day. English characters are represented with values 0x00 - 0x7F; non-English characters are represented with values 0x80 - 0xFF. The actual meaning of non-English characters is interpreted according to the currently active code page.
- _MBCS - The Multi-Byte Character Set. The classic char is used for strings. One multi-byte symbol may require one or two char elements. The _MBCS symbol has to be defined for a project. _MBCS is backward compatible with the SBCS mode, and was the default choice for new projects in MS Visual C++ until version 8.0 (2005). _MBCS was commonly used for East Asian languages like Japanese, Korean, and Chinese, and it was the only feasible option for handling these languages on Windows 9x/Me platforms. Now, _MBCS has mostly been ousted by Unicode.
- _UNICODE - The Unicode Character Set. The wchar_t type is used for strings. A wchar_t element is 16-bit on the Windows platform and can hold 65,536 different values, so one Unicode symbol from the Basic Multilingual Plane occupies one wchar_t element, while symbols outside that range take two elements (a surrogate pair). This is the default mode for new projects starting from version 8.0 (2005) of MS Visual C++.
The CRT uses the definitions of the _MBCS and _UNICODE symbols to distinguish between multi-byte and Unicode builds.
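One caveat about the 16-bit wchar_t: Windows stores Unicode text as UTF-16, and code points above U+FFFF do not fit into a single 16-bit unit, so they are written as two units called a surrogate pair. The sketch below shows the arithmetic. It uses the standard char16_t type instead of wchar_t so that it behaves the same on platforms where wchar_t is wider than 16 bits, and the function name ToUtf16 is invented for this example.

```cpp
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-16 code units.
// Assumes codePoint is a valid scalar value (at most 0x10FFFF and not
// itself in the surrogate range 0xD800 - 0xDFFF); no validation is done.
std::u16string ToUtf16(char32_t codePoint) {
    std::u16string units;
    if (codePoint < 0x10000) {
        // Basic Multilingual Plane: one 16-bit unit is enough.
        units.push_back(static_cast<char16_t>(codePoint));
    } else {
        // Supplementary planes: split the remaining 20 bits into two halves.
        codePoint -= 0x10000;
        units.push_back(static_cast<char16_t>(0xD800 + (codePoint >> 10)));
        units.push_back(static_cast<char16_t>(0xDC00 + (codePoint & 0x3FF)));
    }
    return units;
}
```

For example, the euro sign U+20AC yields one unit, while U+1F600 yields the pair 0xD83D, 0xDE00. Code that counts wchar_t elements is therefore counting code units, not characters.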
Diagram #2: The Generic Text Mapping in CRT
Generic-text data type or name | SBCS (_UNICODE, _MBCS not defined) | _MBCS defined | _UNICODE defined |
---|---|---|---|
_TCHAR | char | char | wchar_t |
_T("Hello, World!") | "Hello, World!" | "Hello, World!" | L"Hello, World!" |
Function name prefix: _tcs (examples: _tcscat, _tcsicmp) | str, _str (strcat, _stricmp) | _mbs (_mbscat, _mbsicmp) | wcs, _wcs (wcscat, _wcsicmp) |
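// Generic-text code (tm is assumed to be an already filled-in struct tm):
_TCHAR message[128] = _T("The time is: ");
_TCHAR* now = _tasctime(&tm);
_tcscat(message, now);
_putts(message);

// SBCS build (neither _MBCS nor _UNICODE defined):
char message[128] = "The time is: ";
char* now = asctime(&tm);
strcat(message, now);
puts(message);

// _MBCS build (the _mbs functions take unsigned char pointers):
char message[128] = "The time is: ";
char* now = asctime(&tm);
_mbscat((unsigned char*)message, (const unsigned char*)now);
puts(message);

// _UNICODE build:
wchar_t message[128] = L"The time is: ";
wchar_t* now = _wasctime(&tm);
wcscat(message, now);
_putws(message);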
2.7 C++ Standard Library
The C++ programming language defines its own standard library. The C++ Standard Library specifies a set of classes and functions that facilitate common programming tasks.
Often, the C++ Standard Library is referred to as the STL. This abbreviation dates back to pre-standard times and stands for Standard Template Library. When C++ was standardized, the STL became a subset of the C++ Standard Library. However, the term STL is still ubiquitous and used as a synonym for the C++ Standard Library.
The International Organization for Standardization is responsible for the C++ language standard and its library.
2.7.1 Contents of the C++ Standard Library
The C++ Standard Library may be divided into the following major parts:
- Containers, where common data structures are defined, such as vector, set, list, map, etc.
- Iterators, which provide a uniform way to operate over standard containers.
- Algorithms, which implement common useful algorithms. Algorithms use iterators instead of working directly with containers. That's why the same implementation of an algorithm can be used with different standard containers.
- Allocators, which handle memory storage allocation/deallocation for elements in containers.
- Function Objects and Utilities, which are helpers to algorithms and containers.
- Streams, which provide a uniform object oriented way of input/output.
- C Runtime Library. Due to the backward compatibility of C++ with the C language, CRT is incorporated into the Standard C++ Library.
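Several of these parts usually appear together even in very small programs. The sketch below combines a container, an algorithm, a function object, iterators, and a stream in one function; the function name SortedDescending is invented for this example.

```cpp
#include <algorithm>   // std::sort (algorithm)
#include <functional>  // std::greater (function object)
#include <sstream>     // std::ostringstream (stream)
#include <string>
#include <vector>      // std::vector (container)

// Hypothetical helper: sorts the values from largest to smallest and
// returns them as text separated by single spaces.
std::string SortedDescending(std::vector<int> values) {
    // The algorithm sees only a pair of iterators, not the container,
    // and takes its ordering rule from a function object.
    std::sort(values.begin(), values.end(), std::greater<int>());

    std::ostringstream out;
    for (std::vector<int>::const_iterator it = values.begin();
         it != values.end(); ++it) {
        if (it != values.begin()) {
            out << ' ';
        }
        out << *it;
    }
    return out.str();
}
```

Replacing std::vector with another container that offers random access iterators, such as std::deque, would leave the std::sort call unchanged, which is the point of routing algorithms through iterators.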
2.8 Cross-platform Development
Sometimes there is a requirement that a software program will run on several computer platforms. The developer may choose to develop as many separate code bases of software as there are target platforms. However, this approach is tedious and error prone. It is also wasteful and ineffective considering development resources since the same functionality must be implemented and maintained over and over again.
The common approach is to develop a single code base for all platforms and restrict the usage of platform-dependent API functions and vendor-specific standard library extensions. It makes development harder; however, in the long run, all platforms benefit from new features and bug fixes.
3. Code Reuse
There are two ways to incorporate the CRT and/or the C++ Library code into a program: static linking and dynamic linking. In the following discussion, I will use solely the CRT term to save typing; however, these concepts are relevant both to CRT and the C++ Standard Library.
3.1 Linking Statically
When the CRT/C++ Library is linked statically, then all its code is embedded into the resulting executable image. This technique has both advantages and disadvantages.
Advantages:
- Simple deployment. It is enough to copy a program to the destination computer to make it run. No need to worry about complicated scenarios of CRT/C++ Library deployment.
- No additional files. It can be very convenient for a small utility application to consist of just one executable file. Such self-contained applications can be easily downloaded and redistributed without the risk of breaking their integrity.
Disadvantages:
- Not serviceable. New versions of a library and fixes to old versions do not reach statically linked programs until those programs are rebuilt and redistributed.
- Domino effect of static linking. In the modern world, a program can rarely pull it off all by itself. Nowadays, software programs are complex and rely heavily on third-party components and libraries. Also, a software program itself is often divided into several loosely coupled modules. Using static linking to the CRT in one of them greatly reduces interoperability between modules and forces developers to fall back on the lowest common denominator, i.e., a C interface with explicit functions for the acquisition and release of resources. The following section discusses the issue in more detail.
3.1.1 CRT as a Black Box
The problem is that internal CRT objects cannot be shared with other CRT instances. Memory allocated in one instance of the CRT must be freed in the same instance, a file opened in one instance of the CRT must be operated on and closed by functions from the same instance, etc. This is because the CRT tracks the acquired resources internally. Any attempt to free a memory chunk or read from a file via a FILE* that came from another CRT instance will corrupt the internal CRT state and most likely lead to a crash.
That's why linking the CRT statically obligates the developer of a module to provide additional functions that release allocated resources, and the user of the module to remember to call these functions in order to prevent resource leaks. No STL containers or C++ objects that use allocations internally can be shared across modules that link to the CRT statically. The following diagram illustrates the usage of a memory buffer allocated via a call to malloc.
Diagram #3: Using memory allocated by malloc from different modules
In the above diagram, Module 1 is linked to the CRT statically, while Modules 2 and 3 are linked to the CRT dynamically. Modules 2 and 3 can pass CRT-owned objects between them freely. For example, a memory chunk allocated with malloc in Module 3 can be freed in Module 2 with free. This is because both the malloc and free calls end up in the same instance of the CRT.
On the other hand, Module 1 cannot let other modules free its resources. Everything allocated in Module 1 must be freed by Module 1, because only Module 1 has access to its statically linked instance of the CRT. In the above sample, Module 2 must remember to call a function from Module 1 in order to properly release the acquired memory.
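In code, the rule "whoever allocates also frees" usually takes the shape of a pair of exported functions. The sketch below shows what the interface of a module like Module 1 might look like. The names Module1_CreateGreeting and Module1_FreeGreeting are invented for this example, and the DLL export decoration is left out to keep it short.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical function exported by Module 1. The buffer comes from the
// malloc of Module 1's own CRT instance. Returns a null pointer on failure.
extern "C" char* Module1_CreateGreeting() {
    const char text[] = "Hello from Module 1";
    char* buffer = static_cast<char*>(std::malloc(sizeof(text)));
    if (buffer) {
        std::memcpy(buffer, text, sizeof(text));  // includes the terminator
    }
    return buffer;
}

// The matching release function. Callers hand the buffer back instead of
// calling free themselves, so the free call lands in the same CRT instance
// that performed the allocation.
extern "C" void Module1_FreeGreeting(char* buffer) {
    std::free(buffer);
}
```

A caller in another module would use the buffer and then call Module1_FreeGreeting on it. Calling free directly would compile, but it would pass the pointer to a CRT instance that never allocated it.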
3.2 Linking Dynamically
When the CRT/C++ Library is linked dynamically, only small import libraries are linked with the resulting executable image. Import libraries contain instructions for where to find the actual implementation of the CRT/C++ Library functions. On a program's start, the system loader reads these instructions and loads the appropriate DLLs into the process' address space.
Advantages:
- Improved Modularity. As described in previous sections, the overall modularity of a program can benefit from dynamic linking. A program can be divided into several modules while being able to pass relatively high-level objects between them.
- Faster start. The CRT DLLs are often already loaded by the system or by other running processes. When a program then needs a CRT module, its code pages can be shared instead of being read from disk again, which lets the system save physical memory and reduce page swapping.
Disadvantages:
- Complicated deployment. CRT libraries must be redistributed and properly installed in order for a program to work. It requires writing an additional setup program and thinking out a deployment strategy.
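With the Microsoft compiler, the choice between the two models is made by a compiler option: /MT (or /MTd) links the CRT statically, and /MD (or /MDd) links it dynamically. The compiler defines the _DLL macro when one of the dynamic options is in effect, which allows code to check how it was built. The sketch below is a small illustration of that check; the function name CrtLinkage is invented, and on other compilers it simply reports that the question does not apply.

```cpp
#include <string>

// Hypothetical helper: reports which CRT linking model this translation
// unit was compiled for, based on macros predefined by the compiler.
std::string CrtLinkage() {
#if defined(_MSC_VER) && defined(_DLL)
    return "dynamic (/MD or /MDd)";
#elif defined(_MSC_VER)
    return "static (/MT or /MTd)";
#else
    return "not applicable (not the Microsoft compiler)";
#endif
}
```

A check like this can be useful in a diagnostic log when modules built with different settings end up in the same process.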
4. Summary
The article described the relationships and dependencies between the Windows API, the C Runtime Library, and the Standard C++ Library. The Windows API is the lowest operational level for user mode programs. On top of the Windows API, there is the C Runtime Library, which encapsulates and hides the Operating System differences. The Standard C++ Library provides much more functionality and also includes the CRT as an integral part. Using only standardized functions and classes makes it possible to write cross-platform applications. Ideally, such applications only need to be rebuilt in order to run on a new platform, with no code changes required.
Both the C Runtime Library and the Standard C++ Library can be linked to statically or dynamically, depending on the application's needs. Each method has its own advantages and drawbacks.