
Tiny C Runtime Library

25 Mar 2007
Reduce code bloat for those simple utility programs by using a streamlined C runtime - now with Unicode support!

Introduction

Updated 2007-03-25: See end of article for details.

Ever designed a simple utility program, such as a hex-dump program, only to find that it weighs in at a full 64K, even optimized for size, when all it does is read a file and print to stdout? Ever wonder what happened to those good ol' DOS days where programs had to be small? Where a COM file was limited to 64K? Or when you could write a bare-bones DOS-style protected mode operating system kernel in about 64K?

Well look no further. Here I will examine what causes this code bloat, and what can be done to fix it.

Background

Matt Pietrek wrote an excellent article in the January 2001 MSDN Magazine titled Under the Hood: Reduce EXE and DLL Size with LIBCTINY.LIB. While most of this information remains valid today, I have updated some of his code to work better with Visual Studio 2005. I have also added support for functions that were not included in his article.

Intended Audience

This article is aimed at programmers who like to have control over every little detail. It is also geared towards small portable utility-like programs, where a DLL CRT is undesirable because of the need for a second file and installation program, and where the overhead of a statically linked CRT is much greater than the core program code.

Of course, by replacing the CRT, programs that rely on specifics of the Microsoft CRT will fail. For instance, if you go digging into the FILE structure, expect a certain header on your memory allocations, rely on the buffering features of stdio, or use locales, runtime checks, or C++ exception handling, you can't use this library. It is intended for small, simple programs, such as a hex-dump command line program or the many UNIX-style tools like cat or grep.

Many C/C++ purists will take offence at my suggestions, because the C runtime is, to them, something that shouldn't be tampered with. But bear with me, because although you might never use any of this article's information, it should at least give you an insight into how your program works.

Where's Bloat-o?

(really bad pun, I know...)

The source of this 'code bloat' is very easy to find by looking at a linker-generated map file. Here is a snippet from the demo program's map file:

0001:00000000       ?DumpFile@@YAXPAD@Z        00401000 f   hd.obj
0001:00000152       _main                      00401152 f   hd.obj
0001:0000021b       _feof                      0040121b f   LIBCMT:feoferr.obj
0001:0000024a       _fgetc                     0040124a f   LIBCMT:fgetc.obj
0001:00000381       _printf                    00401381 f   LIBCMT:printf.obj
0001:00000430       __get_printf_count_output  00401430 f   LIBCMT:printf.obj
0001:00000446       __fsopen                   00401446 f   LIBCMT:fopen.obj
0001:0000050a       _fopen                     0040150a f   LIBCMT:fopen.obj
0001:00000520       _memset                    00401520 f   LIBCMT:memset.obj
0001:0000059a       __fclose_nolock            0040159a f   LIBCMT:fclose.obj
0001:0000060d       _fclose                    0040160d f   LIBCMT:fclose.obj
0001:00000689       __amsg_exit                00401689 f   LIBCMT:crt0dat.obj
0001:000006ad       ___crtCorExitProcess       004016ad f   LIBCMT:crt0dat.obj
0001:000006d3       ___crtExitProcess          004016d3 f   LIBCMT:crt0dat.obj
...
0001:0000a590       __allmul                   0040b590 f   LIBCMT:llmul.obj
0001:0000a5e0       _strchr                    0040b5e0 f   LIBCMT:strchr.obj
0001:0000a5e6       ___from_strstr_to_strchr   0040b5e6     LIBCMT:strchr.obj

As you can see, the map file lists two functions from my program and over two hundred functions from the C Runtime (CRT).

Notice that one of them is ___crtCorExitProcess, a function that exists to support managed (C++/CLI) programs! Other gems include multithreading support, locales, and exception handling - none of which my program uses!

And this is with Eliminate Unreferenced Data and COMDAT Folding on!

Where do I begin?

I will first highlight the various tasks performed by the C Runtime to give the reader a better understanding of the 'magic' that happens in C and C++.

Let's start by turning on the linker option Ignore Default Libraries and rebuilding. I was greeted with this:

hd.obj : error LNK2001: unresolved external symbol _feof
hd.obj : error LNK2001: unresolved external symbol _fgetc
hd.obj : error LNK2001: unresolved external symbol _printf
hd.obj : error LNK2001: unresolved external symbol _fopen
hd.obj : error LNK2001: unresolved external symbol _memset
hd.obj : error LNK2001: unresolved external symbol _stricmp
hd.obj : error LNK2001: unresolved external symbol _fclose
hd.obj : error LNK2001: unresolved external symbol _exit
LINK : error LNK2001: unresolved external symbol _mainCRTStartup

Not good. Not good at all.

mainCRTStartup

Where does your console program start? Did I hear you say main? If you did, you said what I would have said before journeying into the inner workings of the C Runtime.

Windows isn't nice enough to provide your app with a ready-made argc and argv. All it does is call a void function() specified in the EXE header. And by default, that function is called mainCRTStartup. Here is a simple example:

C++
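extern "C" void __cdecl mainCRTStartup()
{
    int argc = _init_args();           // Split the command line into _argv
    _init_atexit();                    // Create the atexit() table
    _initterm(__xc_a, __xc_z);         // Call C++ constructors

    int ret = main(argc, _argv, 0);    // Don't handle environment strings

    _doexit();                         // Run atexit() callbacks, incl. destructors
    ExitProcess(ret);                  // End the process with main's exit code
}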

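We start by building argc and argv, which are later passed to main. Before main can run, though, we also have to set up the atexit table and call the constructors for static C++ objects. Once main returns, _doexit runs the registered exit callbacks and ExitProcess ends the program with main's return value.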
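
Windows hands the process its command line as a single string (GetCommandLine), so building argv means splitting that string. The sketch below is not the library's _init_args; it is a portable illustration under simplified rules: whitespace separates arguments and double quotes group text containing spaces, but the backslash-escaping rules of the real Microsoft parser are ignored. The function name split_command_line and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: split a writable command-line buffer in place.
// Spaces and tabs separate arguments; double quotes group text containing
// spaces and are removed. Backslash escapes, part of the real Microsoft
// rules, are deliberately not handled. argv must have room for max_args
// pointers (max_args >= 1); argv[argc] is set to nullptr. Returns argc.
int split_command_line(char *cmd, char **argv, int max_args)
{
    int argc = 0;
    char *src = cmd;
    while (*src)
    {
        while (*src == ' ' || *src == '\t')    // skip separators
            ++src;
        if (!*src || argc == max_args - 1)     // keep a slot for nullptr
            break;

        argv[argc++] = src;                    // argument starts here...
        char *dst = src;                       // ...and is compacted in place
        bool in_quotes = false;
        while (*src && (in_quotes || (*src != ' ' && *src != '\t')))
        {
            if (*src == '"')
                in_quotes = !in_quotes;        // toggle, but don't copy the quote
            else
                *dst++ = *src;
            ++src;
        }
        if (*src)                              // step past the separator
            ++src;
        *dst = '\0';                           // terminate this argument
    }
    argv[argc] = nullptr;
    return argc;
}
```

Tokenizing in place avoids a second allocation, which suits a library whose whole point is to stay small.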

GUI programs work the same way, except that the function is called WinMainCRTStartup. For DLLs, the true entry point is _DllMainCRTStartup. Unicode programs look for wmainCRTStartup and wWinMainCRTStartup, respectively, while the DLL entry point appears to stay the same.

C++ Magic

The constructors of static objects don't just call themselves. And Windows is certainly not going to call them for us. So we have to do it ourselves. What do I mean?

C++
#include <stdio.h>

class StaticClass
{
public:
    StaticClass()  { printf("StaticClass constructor\n"); }
    ~StaticClass() { printf("StaticClass destructor\n"); }
};
StaticClass staticClass;    // static object: constructed before main runs

int main()
{
    printf("main\n");
    return 0;
}

C++ programmers should automatically expect the output of this program to be:

StaticClass constructor
main
StaticClass destructor

Matt Pietrek has a great explanation in his article, mentioned earlier, under the heading "The Dark Underbelly of Constructors", so I will not go into that level of detail here. Suffice it to say that the compiler emits pointers to the constructor functions (actually, to thunks that call the constructors) into a special ".CRT" section of the object file, which is later merged with the ".data" section. By declaring one variable in a section that sorts just before those pointers (.CRT$XCA) and another in a section that sorts just after them (.CRT$XCZ), we obtain pointers to the start and the end of the table, and the _initterm function can iterate over it, calling each constructor in turn.
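
The iteration itself is tiny. Here is a portable sketch of what _initterm does; initterm_sketch and init_fn are hypothetical names, and an ordinary array stands in for the linker-built table between __xc_a and __xc_z. The commented-out MSVC declarations show roughly how those two bounds are placed, assuming MSVC's convention that the .CRT$XC* sections are sorted by name, so the compiler's entries in .CRT$XCU end up between .CRT$XCA and .CRT$XCZ.

```cpp
#include <cassert>

// Hypothetical name; the CRT's own typedef for this is _PVFV.
typedef void (*init_fn)();

// Call every non-null function pointer in [begin, end), in order. Null
// entries are skipped: the bounding entry itself is zero, and the linker
// may pad the section with zeros.
void initterm_sketch(init_fn *begin, init_fn *end)
{
    for (init_fn *p = begin; p < end; ++p)
    {
        if (*p)
            (*p)();
    }
}

// MSVC-only placement of the table bounds (not compiled here), roughly:
//   #pragma data_seg(".CRT$XCA")
//   _PVFV __xc_a[] = { 0 };    // sorts before the .CRT$XCU entries
//   #pragma data_seg(".CRT$XCZ")
//   _PVFV __xc_z[] = { 0 };    // sorts after them
//   #pragma data_seg()
//   ...
//   initterm_sketch(__xc_a, __xc_z);
```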

The constructor thunk also registers an atexit callback that calls the object's destructor. That is why the mainCRTStartup function above goes to the trouble of creating an atexit table. The _doexit function then calls these callbacks in reverse order of registration, so static objects are destroyed in the opposite order to their construction.
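
A minimal sketch of such an atexit table, assuming a growable array is good enough. The names atexit_sketch and doexit_sketch are hypothetical, and std::realloc stands in for whatever allocator the library really uses. The important detail is the last-in, first-out order in doexit_sketch.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical names; in the library, _init_atexit() and _doexit() play
// these roles.
typedef void (*exit_fn)();

static exit_fn *exit_table = nullptr;   // registered callbacks
static int exit_count = 0;
static int exit_capacity = 0;

// Register a callback; returns 0 on success, like atexit().
int atexit_sketch(exit_fn fn)
{
    if (exit_count == exit_capacity)
    {
        int new_cap = exit_capacity ? exit_capacity * 2 : 32;
        exit_fn *grown = static_cast<exit_fn *>(
            std::realloc(exit_table, new_cap * sizeof(exit_fn)));
        if (!grown)
            return -1;
        exit_table = grown;
        exit_capacity = new_cap;
    }
    exit_table[exit_count++] = fn;
    return 0;
}

// Run the callbacks last-registered-first, then release the table.
void doexit_sketch()
{
    while (exit_count > 0)
        exit_table[--exit_count]();
    std::free(exit_table);
    exit_table = nullptr;
    exit_capacity = 0;
}
```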

Standard Functions

So now we have taken care of the program's entry point. What about the other functions?

printf and Family

One of the more complex tasks performed by the C Runtime is parsing the printf format string. (I'll admit it's not terribly complex; it's just non-trivial compared to strcmp.) To save space, we can offload this processing to the Windows function wvsprintf. No, that's not a wide-character version; the w probably stands for Windows.

C++
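#include <windows.h>
#include <stdarg.h>
#include <stdio.h>

extern "C" int __cdecl printf(const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int ret = vprintf(fmt, args);
    va_end(args);

    return ret;
}

extern "C" int __cdecl vprintf(const char *fmt, va_list args)
{
    char bfr[2048];                // ugly... but this whole idea of replacing
                                   // the CRT could be called ugly too!
    // Call the ANSI version explicitly: in a UNICODE build, wvsprintf maps to
    // wvsprintfW, which expects wide strings. Note that wvsprintfA is limited
    // to a 1,024-byte output buffer and has no floating-point support.
    int ret = wvsprintfA(bfr, fmt, args);

    fwrite(bfr, ret, 1, stdout);   // fwrite takes care of CRLF translation
    return ret;
}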
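
wvsprintf hides a fair amount of parsing. To show what is being saved, here is a deliberately minimal formatter; it is hypothetical, not part of the library, and handles only %s, %c, %d, %u, %x and %% with no flags, field widths or precision. It writes into a caller-supplied buffer and truncates rather than overflowing, which matters with a fixed-size buffer like bfr in vprintf.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstring>

// Hypothetical helper: append one character, silently dropping it once the
// buffer is full (one byte is always kept for the terminating NUL).
static void put_char(char *buf, int size, int &len, char c)
{
    if (len < size - 1)
        buf[len++] = c;
}

static void put_unsigned(char *buf, int size, int &len, unsigned v, unsigned base)
{
    char digits[32];                 // plenty for an unsigned int in base 10 or 16
    int n = 0;
    do {
        digits[n++] = "0123456789abcdef"[v % base];
        v /= base;
    } while (v);
    while (n > 0)                    // digits were produced least significant first
        put_char(buf, size, len, digits[--n]);
}

// size must be at least 1. Returns the number of characters stored.
int tiny_vformat(char *buf, int size, const char *fmt, va_list args)
{
    int len = 0;
    for (; *fmt; ++fmt)
    {
        if (*fmt != '%')
        {
            put_char(buf, size, len, *fmt);
            continue;
        }
        switch (*++fmt)
        {
        case 's': {
            const char *s = va_arg(args, const char *);
            for (s = s ? s : "(null)"; *s; ++s)
                put_char(buf, size, len, *s);
            break;
        }
        case 'c':
            put_char(buf, size, len, static_cast<char>(va_arg(args, int)));
            break;
        case 'd': {
            int v = va_arg(args, int);
            unsigned u = static_cast<unsigned>(v);
            if (v < 0)
            {
                put_char(buf, size, len, '-');
                u = 0u - u;          // magnitude, correct even for INT_MIN
            }
            put_unsigned(buf, size, len, u, 10);
            break;
        }
        case 'u': put_unsigned(buf, size, len, va_arg(args, unsigned), 10); break;
        case 'x': put_unsigned(buf, size, len, va_arg(args, unsigned), 16); break;
        case '%': put_char(buf, size, len, '%'); break;
        case '\0': --fmt; break;     // lone '%' at the end: stop cleanly
        default:                     // unknown conversion: copy it through
            put_char(buf, size, len, '%');
            put_char(buf, size, len, *fmt);
            break;
        }
    }
    buf[len] = '\0';
    return len;
}

int tiny_format(char *buf, int size, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int ret = tiny_vformat(buf, size, fmt, args);
    va_end(args);
    return ret;
}
```

As with printf and vprintf in the library, the variadic tiny_format is only a thin wrapper that hands its va_list to tiny_vformat.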

File I/O

Originally I had planned to eschew the FILE structure altogether and instead just cast a HANDLE to a FILE*. But that would have left only two spare bits for extra information (a kernel handle value is always a multiple of four). As I added functionality to the library, this ideal solution became less ideal: I needed to store an end-of-file flag, a text-mode flag, and possibly other data. Besides, not using the FILE structure means that the stdin, stdout, and stderr identifiers don't work! So now I (ab)use the FILE structure.

Because I cannot change the FILE structure itself (it is defined in stdio.h), I have to store my data in its fields. A very ugly solution, but this library isn't intended to be pretty. Note, however, that this means code that relies on the internal fields of the FILE structure will crash. But then again, you shouldn't be messing with internal data structures anyway, right?

Thus, for illustration, here is fopen:

C++
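extern "C" FILE *fopen(const char *path, const char *attrs)
{
    // Only modes containing 'w' open for writing (and then write-only);
    // any other mode, including "a" and "r+", opens the file read-only.
    DWORD access, disp;
    if (strchr(attrs, 'w'))
    {
        access = GENERIC_WRITE;
        disp = CREATE_ALWAYS;      // create, or truncate an existing file
    }
    else
    {
        access = GENERIC_READ;
        disp = OPEN_EXISTING;      // fail if the file does not exist
    }

    // Share mode 0: other opens of the file fail until this handle is closed
    HANDLE hFile = CreateFileA(path, access, 0, 0, disp, 0, 0);
    if (hFile == INVALID_HANDLE_VALUE)
        return 0;

    _FILE *file = new _FILE;       // _FILE is defined in the library source
    memset(file, 0, sizeof(_FILE));
    file->set_handle(hFile);

    if (strchr(attrs, 't'))        // text mode only when "t" is given explicitly
        file->_flag |= _FILE_TEXT;

    return file;
}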

fread and fwrite are substantially more complicated than this, because in text mode they must translate between the '\r\n' line endings stored in the file and the bare '\n' the program sees. For brevity, I will not discuss the algorithm - see the source code if you are interested.
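
Leaving the buffering aside, the core of the translation can be sketched as two small loops. These are portable illustrations with hypothetical names (crlf_to_lf, lf_to_crlf), not the library's code; in particular, crlf_to_lf keeps a '\r' that ends the buffer, whereas a real fread must look at the next byte of the file before deciding whether to drop it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Read direction: collapse "\r\n" to "\n" in place and return the new length.
// Simplification: a '\r' at the very end of the buffer is kept as-is.
std::size_t crlf_to_lf(char *buf, std::size_t len)
{
    std::size_t out = 0;
    for (std::size_t i = 0; i < len; ++i)
    {
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
            continue;                    // drop the '\r'; the '\n' is copied next
        buf[out++] = buf[i];
    }
    return out;
}

// Write direction: expand each '\n' to "\r\n" into dst, which must hold up
// to 2 * len bytes. Returns the number of bytes stored in dst.
std::size_t lf_to_crlf(const char *src, std::size_t len, char *dst)
{
    std::size_t out = 0;
    for (std::size_t i = 0; i < len; ++i)
    {
        if (src[i] == '\n')
            dst[out++] = '\r';
        dst[out++] = src[i];
    }
    return out;
}
```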

String functions

Replacing the CRT means no more strlen, strcmp, or even memset. These must be implemented from scratch. Thankfully, they are not difficult to implement - just tedious. Care should be taken to handle NULL pointers and other special cases described in the MSDN documentation.
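
To give an idea of how little code is involved, here are sketches of four of them. The tiny_ prefix is hypothetical, chosen so the examples can be compiled next to a normal CRT; a replacement library would use the real names. Two details worth copying: strcmp compares bytes as unsigned char, and strncpy pads with '\0' but does not terminate a string that fills the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

std::size_t tiny_strlen(const char *s)
{
    const char *p = s;
    while (*p)
        ++p;
    return static_cast<std::size_t>(p - s);
}

// Compare as unsigned char, as the C standard requires, so bytes >= 0x80
// sort after ASCII whether or not plain char is signed.
int tiny_strcmp(const char *a, const char *b)
{
    const unsigned char *p = reinterpret_cast<const unsigned char *>(a);
    const unsigned char *q = reinterpret_cast<const unsigned char *>(b);
    while (*p && *p == *q)
    {
        ++p;
        ++q;
    }
    return (*p > *q) - (*p < *q);   // -1, 0 or 1
}

void *tiny_memset(void *dst, int value, std::size_t count)
{
    unsigned char *p = static_cast<unsigned char *>(dst);
    while (count--)
        *p++ = static_cast<unsigned char>(value);
    return dst;
}

// Copies at most n chars and pads the rest of dst with '\0'. If src is n
// chars or longer, dst is NOT terminated: a classic source of bugs.
char *tiny_strncpy(char *dst, const char *src, std::size_t n)
{
    std::size_t i = 0;
    for (; i < n && src[i]; ++i)
        dst[i] = src[i];
    for (; i < n; ++i)
        dst[i] = '\0';
    return dst;
}
```

When a replacement library defines memset itself, MSVC may also need #pragma function(memset) so that the compiler does not treat memset as an intrinsic in that file.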

Wide Character (Unicode) Support

This is the major new feature in this library. It is still under development and hasn't undergone extensive testing yet.

As suggested by Hans Dietrich, I have started to add wide-character support to the library. Basically, that means implementing wide-character versions of various functions.

Uppercase and lowercase

When dealing with ASCII, functions like isalpha, toupper, and strlwr are trivial to implement. But as soon as Unicode enters the picture, they become much more complicated. There are different rules for uppercase versus lowercase and alphabetic versus numeric, so some operating system help is in order. To fix this problem, the function GetStringTypeW is used to implement the isXYZ family of functions, and the functions CharUpper and CharLower are used to implement toupper and tolower, respectively.
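
For ASCII, the trivial versions look like this (hypothetical ascii_ names; a sketch, not the library's code). They rely on the letters being contiguous and on upper and lower case differing by a fixed offset, neither of which holds across Unicode; for example, the German ß uppercases to the two-letter "SS". That is why the wide-character versions are better left to GetStringTypeW, CharUpper and CharLower.

```cpp
#include <cassert>
#include <cstring>

// ASCII-only sketches. Arguments follow the <ctype.h> convention:
// an unsigned char value or EOF.
int ascii_isalpha(int c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }
int ascii_isdigit(int c) { return c >= '0' && c <= '9'; }
int ascii_toupper(int c) { return (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c; }
int ascii_tolower(int c) { return (c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c; }

// Lowercase a whole string in place, in the style of _strlwr.
char *ascii_strlwr(char *s)
{
    for (char *p = s; *p; ++p)
        *p = static_cast<char>(ascii_tolower(static_cast<unsigned char>(*p)));
    return s;
}
```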

File encoding

Up until VS2005, even the wide-character file functions in the CRT could not write Unicode to a file: output from wprintf, fwprintf, and fputws in text mode is translated from Unicode to narrow characters before it is written.

Because adding support for UTF-8, UTF-16, and other forms of file encoding would just add bloat to this library, I have made the decision to not include it. The behavior will remain compatible with the pre-VS2005 CRT. If you need to deal with file encodings, you probably need the full CRT anyway.
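
For illustration only, here is the narrowing step reduced to its simplest possible form. narrow_for_output is a hypothetical name, and the ASCII-only rule is an assumption made to keep the sketch portable: a real conversion goes through the ANSI code page (for example with WideCharToMultiByte), which can represent more characters, such as é in code page 1252, and substitutes a default character, typically '?', for the rest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, ASCII-only stand-in for the Unicode-to-narrow conversion
// applied to text-mode wide output. Characters outside 7-bit ASCII become
// '?'. dst_size must be at least 1; the result is always NUL-terminated.
// Returns the number of characters stored.
std::size_t narrow_for_output(const wchar_t *src, char *dst, std::size_t dst_size)
{
    std::size_t n = 0;
    for (; *src && n + 1 < dst_size; ++src)
    {
        unsigned long code = static_cast<unsigned long>(*src);
        dst[n++] = code < 0x80 ? static_cast<char>(code) : '?';
    }
    dst[n] = '\0';
    return n;
}
```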

Why are you adding all this stuff? Why not keep it simple!

Simple: Only the stuff that you call is included in a release build!

But then why is Microsoft's CRT so bloated if you don't call much stuff? Again - because you do, but don't know it. The CRT startup code itself calls lots of functions that in turn call other functions - and a lot of it is garbage that isn't needed 90% of the time. Locales, exception handling, etc. have their place, but not in all programs. If your program doesn't use it, why should it have to pay the price of Microsoft's startup code using it?

The startup code and various functions in this CRT library are designed to rely on as little functionality as possible. Thus only the essentials are included.

Using the code

Add the tlibc (Tiny Libc) project to your project's solution, and add it as a referenced project. Alternatively, compile the library and add it to your project's linker options.

Because we are replacing the default CRT, C++ exception handling and SEH will not be handled properly, so don't use them! You will also need to turn off Buffer Security Check (/GS-), set Basic Runtime Checks to Default (no /RTC option), and disable Runtime Type Information (/GR-).

Make sure to link with Ignore Default Libraries (/NODEFAULTLIB) turned on! To generate the smallest code, also compile with link-time code generation (/GL and /LTCG), optimize for size (/O1), turn on string pooling (/GF), and enable COMDAT folding (/OPT:ICF) and elimination of unreferenced data (/OPT:REF).

Results

After recompiling the program with libctiny and the settings above, the EXE shrank from a giant 64K to a much more reasonable 4K (4096 bytes, to be exact)! For comparison, the entire code section of the linker map file is reproduced below:

0001:00000000       ?DumpFile@@YAXPAD@Z        00401000 f   hd.obj
0001:0000013d       _main                      0040113d f   hd.obj
0001:0000021a       _fopen                     0040121a f   libct:file.obj
0001:000002a7       _fread                     004012a7 f   libct:file.obj
0001:000003c2       _fwrite                    004013c2 f   libct:file.obj
0001:0000048b       _fgetc                     0040148b f   libct:file.obj
0001:000004b6       _printf                    004014b6 f   libct:printf.obj
0001:000004ef       _memset                    004014ef f   libct:memory.obj
0001:0000050e       __doexit                   0040150e f   libct:initterm.obj
0001:0000053a       _mainCRTStartup            0040153a f   libct:crt0tcon.obj
0001:000005f5       _malloc                    004015f5 f   libct:alloc.obj
0001:00000607       __init_args                00401607 f   libct:argcargv.obj
0001:00000705       __ismbcspace               00401705 f   libct:isctype.obj

History

2007-03-25

  • Fixed strnicmp, pointed out by mpj

2006-08-19

  • Wide-character bugfixes (_fgetws)
  • Non-Unicode builds now set to SBCS rather than MBCS
  • Fixed typo bug in stderr, pointed out by Hans

2006-08-13

  • Preliminary wide-character support
  • Fixed memory leak in command-line parsing (existed in original)
  • Fixed memory leak in fread
  • Fixed behavior of feof
  • Fixed a rather embarrassing problem with strncpy
  • Added _DllMainCRTStartup that I accidentally omitted

2006-08-12

  • Submission to CodeProject

Comments, complaints, questions, etc. are welcome. Please let me know if you actually use this for something. If you need a function that is not included in this library, let me know and I will update the code. Comments on my 'comments' are also welcome.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
