What a jolly good question.
The preprocessor takes a look at your source code just before it goes off to the compiler, does a little formatting, and carries out any instructions you have given it.
Like what?
Well, preprocessor instructions are called preprocessor directives, and they all start with a #.
Like #include?
Exactly.
Each # command that the preprocessor encounters results in a modification to the source code in some way. Let’s take a look at them briefly in turn, and then we’ll see what goes on behind the scenes.
#include
Includes header files for other libraries, classes, interfaces, etc. The preprocessor actually copies the entire header into your source file* (yes, that’s why inclusion guards are such a good thing).
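To see why inclusion guards matter, here is a minimal sketch that fakes a double inclusion inside a single file. The header name point.h, the guard macro POINT_H and the helper sum_coordinates are all made up for illustration; in real code the guarded part would live in its own header file.

```cpp
// The contents of a hypothetical point.h, pasted in twice to mimic what the
// preprocessor produces when two #include "point.h" lines reach one source file.

// ---- first inclusion ----
#ifndef POINT_H
#define POINT_H
struct Point {
    int x;
    int y;
};
#endif // POINT_H

// ---- second inclusion: POINT_H is already defined, so this copy is skipped ----
#ifndef POINT_H
#define POINT_H
struct Point {   // without the guard this would be a redefinition error
    int x;
    int y;
};
#endif // POINT_H

// Helper used only to show that Point is defined exactly once.
int sum_coordinates(const Point& p) {
    return p.x + p.y;
}
```

Remove the #ifndef/#define/#endif lines from both copies and the compiler should complain that Point is defined twice.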
#define
Who doesn’t love macros! The preprocessor replaces every instance of the macro name with the code it is defined as. The definition holds until an #undef directive is found for that name.
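Here is a small sketch of a definition holding until it is undefined. The macro and function names (BUFFER_SIZE, SQUARE, first_size and friends) are invented for this example.

```cpp
// Object-like macro: every later BUFFER_SIZE token is replaced with 16.
#define BUFFER_SIZE 16

int first_size() {
    return BUFFER_SIZE;   // becomes: return 16;
}

// Drop the old definition, then give the name a new meaning.
#undef BUFFER_SIZE
#define BUFFER_SIZE 64

int second_size() {
    return BUFFER_SIZE;   // becomes: return 64;
}

// Function-like macro. The parentheses keep the substituted text grouped correctly.
#define SQUARE(x) ((x) * (x))

int square_of_sum(int a, int b) {
    return SQUARE(a + b);   // becomes: return ((a + b) * (a + b));
}
```

Because the replacement is purely textual, leaving the parentheses out of SQUARE would turn SQUARE(a + b) into a + b * a + b, which is probably not what you wanted.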
#ifdef
Conditional behaviour that tells the preprocessor to include the code within the conditional block IF the condition is met. You can use these just like if-else statements, choosing from: #ifdef, #ifndef, #if, #else, and #elif, and you always need to finish with an #endif.
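A quick sketch of the conditional directives in action. LOG_LEVEL and ENABLE_TRACING are hypothetical configuration macros, the sort of thing you might set in a header or on the command line.

```cpp
#include <string>

// Hypothetical configuration macro for this sketch.
#define LOG_LEVEL 2

std::string log_level_name() {
    // Only one of these return statements survives preprocessing.
#if LOG_LEVEL >= 3
    return "verbose";
#elif LOG_LEVEL == 2
    return "normal";
#else
    return "quiet";
#endif
}

bool tracing_enabled() {
    // ENABLE_TRACING is deliberately not defined anywhere in this sketch.
#ifdef ENABLE_TRACING
    return true;
#else
    return false;
#endif
}
```

The branches that are not selected never reach the compiler at all, which is the big difference from an ordinary if-else.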
#error #warning
Used for sending messages to the user. The preprocessor stops on #error, but not on #warning. In both cases, it sends any string it finds after the directive (in quotes please) to the screen as output, so they are handy ways to ensure everything is set up correctly for your platform.
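As a rough sketch of a platform sanity check, the block below refuses to build unless its assumptions hold. DEMO_SHOW_WARNING is a made-up switch: #warning has long been a compiler extension rather than standard C++, so it is tucked inside a conditional that is off by default.

```cpp
#include <climits>

// Stop the build early if a basic assumption about the platform is wrong.
#if CHAR_BIT != 8
#error "This sketch assumes 8-bit bytes"
#endif

#ifndef __cplusplus
#error "Please compile this file as C++"
#endif

// Hypothetical switch: define DEMO_SHOW_WARNING to see a message
// that does not stop the build.
#ifdef DEMO_SHOW_WARNING
#warning "DEMO_SHOW_WARNING is set: building the demo configuration"
#endif

int bits_per_byte() {
    return CHAR_BIT;
}
```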
#line
Used to alter the line number and filename displayed when you encounter compilation errors. This is handy if, for example, you are compiling an intermediate file (possibly auto-generated) and want errors to refer back to the original source file.
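The effect is easy to observe through __LINE__ and __FILE__. In this sketch, calc_grammar.y is a hypothetical file name standing in for whatever the code was generated from.

```cpp
#include <string>

int line_before() { return __LINE__; }   // the real line number in this file

// Pretend everything below was generated from a hypothetical grammar file.
#line 100 "calc_grammar.y"
int line_after() { return __LINE__; }            // reported as line 100
std::string file_after() { return __FILE__; }    // reported as "calc_grammar.y"
```

Any compilation error below the #line directive would be reported against calc_grammar.y too, which is exactly what code generators rely on.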
#pragma
Other, compiler-specific directives. Your compiler documentation will tell you which pragmas are available, and you should never assume that a given pragma is supported by every compiler.
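As one illustration, here is a sketch using #pragma pack, which GCC, Clang and MSVC understand. That support is an assumption about your toolchain, so check your own compiler documentation. The struct and function names are invented for the example.

```cpp
#include <cstddef>

// Default layout: the compiler is free to add padding after 'tag'.
struct PlainRecord {
    char tag;
    int value;
};

// Ask the compiler (assumed here to support #pragma pack) for 1-byte alignment.
#pragma pack(push, 1)
struct PackedRecord {
    char tag;
    int value;
};
#pragma pack(pop)   // restore the previous packing setting

std::size_t plain_size()  { return sizeof(PlainRecord); }
std::size_t packed_size() { return sizeof(PackedRecord); }
```

On a compiler that does not recognise the pragma, it is ignored (possibly with a warning) and both structs end up the same size, which is why portable code should not depend on it.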
#assert #unassert
These were eternally popular in older programs (well, the ones I’ve worked on at least), but they are now considered obsolete. Their use is strongly discouraged, which means don’t put them in new code.
Predefined Macros
There are a number of predefined macros available for use:
__FILE__
Gives the filename as a string
__LINE__
Gives the current line number (as an integer)
__DATE__
The compile date as a string
__TIME__
The compile time as a string
__STDC__
Compiler dependent, but usually defined as 1 to indicate compliance with the ISO C standard.
__cplusplus
Always defined when compiling a C++ program
The first two in particular are really useful in debugging. Just pop them in and magically you get informative output without having to write your own file and line processing class.
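A minimal sketch of that debugging trick follows. The helper make_location and the macros WHERE and build_stamp are names made up for this example, not anything standard.

```cpp
#include <string>

// Hypothetical helper: formats a source location as "file:line".
std::string make_location(const char* file, int line) {
    return std::string(file) + ":" + std::to_string(line);
}

// A macro expands at the call site, so __FILE__ and __LINE__ describe the caller.
#define WHERE() make_location(__FILE__, __LINE__)

// __DATE__ and __TIME__ are string literals fixed when this file is compiled.
std::string build_stamp() {
    return std::string(__DATE__) + " " + __TIME__;
}
```

Wrapping the location in a macro matters: if make_location read __LINE__ itself, you would always get the line of the helper rather than the line you care about.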
Your compiler may support other macros. For example, the full list for GCC can be found in the GCC documentation on predefined macros.
So What Actually Happens When You Run the Preprocessor?
- Replace all trigraphs. I’ll actually talk about this in a future post, because although it’s effectively a historical feature (and you have to switch it on in GCC), it’s still quite interesting.
- Concatenate source code split over multiple lines.
- Remove each comment and replace with a space.
- Deal with preprocessor directives (those we talked about above). For #include, it recursively carries out the first three steps on the newly included file.
- Process any escape sequences.
- Pass the file to the compiler.
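Two of those steps, joining split lines and swapping comments for spaces, can be seen in a tiny sketch. The names ADD and answer are invented for the example.

```cpp
// The backslash below splices the two physical lines together,
// so the macro definition is treated as one logical line.
#define ADD(a, b) \
    ((a) + (b))

// The comment between "int" and "answer" is replaced by a single space,
// so the two words remain separate tokens.
int/* this comment becomes a space */answer() {
    return ADD(40, 2);
}
```

If comments were simply deleted rather than replaced with a space, the declaration above would collapse into intanswer() and fail to compile.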
If you want to see what your file looks like after preprocessing (and who doesn’t?), you can pass gcc the -E option. This will send the preprocessed source code to stdout and then stop without compiling or linking.
e.g.
g++ -E myfile.cpp
Or, you can use the compile flag:
-save-temps
to compile as usual but keep a copy of the temporary files.
For example, let’s take a simple program:
#include <stdio.h>
#define ONE 1
#define TWO 2
int main()
{
    printf("%d, %d\n", ONE, TWO);
    return 0;
}
And then compile it with:
g++ hello.cpp -save-temps
When compilation is finished, you’ll have additional files in your directory, including hello.s and hello.ii (the exact file names can vary with your GCC version).
hello.s contains assembler instructions and hello.ii contains your source with the preprocessing completed.
If you look at hello.ii in a text editor, you’ll see that it has a LOT of code in it. That’s because you used an #include directive to pull in the stdio header.
Even better, if you scroll right to the bottom, you can also see that the preprocessor has replaced the ONE and TWO macros in the printf statement with the actual definitions, 1 and 2.
Awesome!
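If you would rather see the replacement from inside the program, one possible trick is the # stringification operator, which turns a macro argument into a string literal. This is only a sketch: STR, XSTR and the two functions are names made up for the example, and the operator itself is a preprocessor feature not covered above.

```cpp
#include <string>

#define ONE 1
#define TWO 2

// STR turns its argument into a string literal exactly as written.
#define STR(x) #x
// XSTR adds one level of indirection so the argument is expanded first.
#define XSTR(x) STR(x)

std::string expanded_one()   { return XSTR(ONE); }   // ONE is replaced before stringifying
std::string unexpanded_one() { return STR(ONE); }    // the name itself is stringified
```

Comparing the two results shows the substitution the preprocessor performs on ONE before the compiler ever sees it.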
*Actually, it makes a temporary copy of your source file and expands out all the directives that it finds into that copy. The file is deleted after use, so ordinarily you would never know it existed.