Introduction
Ever wanted to have your own C/C++ preprocessor? Or maybe you are curious
about how this invisible everyday helper in your toolbox works? If so, read on.
If not, consider learning something new before hitting your browser's 'back'
button, and read on anyway :-).
The C++ preprocessor is a macro processor that under normal circumstances is
used automatically by your C++ compiler to transform your program before the
actual compilation. It is called a macro processor because it allows you to
define macros, which are brief abbreviations for longer constructs. The C++
preprocessor provides four separate facilities that you can use as you see fit:
- Inclusion of header files
- Macro expansion
- Conditional compilation
- Line control
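As a quick reminder of what these four facilities look like in practice, here is a minimal sketch that any conforming compiler should accept. The names SQUARE, BUILD_KIND and the three helper functions are made up for this illustration and have nothing to do with Wave itself.

```cpp
#include <cassert>
#include <string>               // 1. inclusion of header files

#define SQUARE(x) ((x) * (x))   // 2. macro expansion

#ifdef NDEBUG                   // 3. conditional compilation
#define BUILD_KIND "release"
#else
#define BUILD_KIND "debug"
#endif

// The preprocessor rewrites SQUARE(7) to ((7) * (7)) before compilation.
inline int square_of_seven() { return SQUARE(7); }

// Exactly one of the two BUILD_KIND definitions survives preprocessing.
inline std::string build_kind() { return BUILD_KIND; }

// 4. line control: the source line following #line gets the given number.
inline int line_after_directive() {
#line 1000
    return __LINE__;
}
```

Calling square_of_seven() yields 49 and line_after_directive() yields 1000, while build_kind() depends on whether NDEBUG was defined when the file was compiled.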
These features are greatly underestimated today. Worse, the preprocessor has
been frowned upon for so long that its use was not seriously explored until the
Boost preprocessor library [1] came into being a few years ago. Only now are we
beginning to understand that preprocessor generative metaprogramming, combined
with template metaprogramming, gives C++ one of the most powerful compile-time
reflection/metaprogramming facilities that any language has ever supported.
The C++ Standard [2] was adopted back in 1998, but I still know of no C++
compiler with a bug-free implementation of the rather simple preprocessor
requirements mandated therein. This may be a result of the underestimation
mentioned above, or even of the banning of the preprocessor from good
programming style during the last few years, or it may stem from the somewhat
awkward standardized dialect of English used to describe it.
The Wave preprocessor library is therefore an attempt to:
- Provide a free, fully Standard-conformant and (hopefully) bug-free
implementation of the mandated preprocessor functionality
- Make maximal use of the C++ STL and/or Boost [3] libraries (for compactness
and maintainability)
- Achieve straightforward extensibility for the implementation of additional
features
- Build a flexible library for different C++ lexing and preprocessing needs.
To simplify the parsing of the input stream (which is most of the time, but not
necessarily, a file), the Spirit parser construction library [4] is used.
Background
The Wave C++ preprocessor is not a monolithic application; rather, it is a
modular library that mainly exposes a context object and an iterator interface.
The context object is used to configure the actual preprocessing process
(search paths, predefined macros, etc.). The exposed iterators are generated by
this context object too. Iterating over the sequence defined by these two
iterators returns the preprocessed tokens, which are built on the fly from the
given input stream.
The C++ preprocessor iterator itself is fed by a C++ lexer iterator, which
implements a unified interface. Incidentally, the C++ lexers contained within
the Wave library may be used standalone too and are not tied to the C++
preprocessor iterator at all. By a lexer I mean a piece of code that combines
consecutive characters of the input stream into a stream of objects (called
tokens) more suitable for subsequent parsing. These tokens carry not only the
matched character sequence but also the position in the input stream where the
token was found. In other words, the lexer removes everything that only human
readers need, such as spaces and newlines (i.e. it performs a lexical
transformation), leaving the structural transformation to the parser.
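To make the idea of a lexer more tangible, here is a deliberately tiny sketch. It is not Wave's lexer and does not use Wave's token type: toy_token and toy_lex are hypothetical names invented for this illustration, and the only token kinds it knows are "word" (letters, digits, underscore) and "single punctuation character".

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type for this sketch (not wave::cpplexer::lex_token<>).
struct toy_token {
    std::string value;   // the matched character sequence
    std::size_t line;    // 1-based line where the token starts
    std::size_t column;  // 1-based column where the token starts
};

// Groups consecutive characters into tokens. Whitespace produces no token,
// but it still advances the position bookkeeping.
inline std::vector<toy_token> toy_lex(const std::string& input) {
    std::vector<toy_token> tokens;
    std::size_t line = 1, column = 1, i = 0;
    while (i < input.size()) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (c == '\n') { ++line; column = 1; ++i; continue; }
        if (std::isspace(c)) { ++column; ++i; continue; }
        std::size_t start = i;
        if (std::isalnum(c) || c == '_') {
            // A word: consume letters, digits and underscores.
            while (i < input.size() &&
                   (std::isalnum(static_cast<unsigned char>(input[i])) ||
                    input[i] == '_'))
                ++i;
        } else {
            ++i;  // Anything else is a one-character punctuation token.
        }
        tokens.push_back(toy_token{input.substr(start, i - start), line, column});
        column += i - start;
    }
    return tokens;
}
```

For the input "int x;\n  y = 42" this produces six tokens, and the token "y" reports line 2, column 3. A real C++ lexer has to recognize many more token kinds (string literals, comments, multi-character operators), but the shape of the result is the same: a value plus a position.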
To make the Wave C++ preprocessing library modular, the C++ lexer is kept
completely separate from and independent of the preprocessor. As a proof of
this concept, the library currently contains two different C++ lexer
implementations, which are functionally identical. Both expose the unified
interface mentioned above, so the C++ preprocessor iterator may be used with
either of them. The C++ lexer was abstracted from the C++ preprocessor iterator
so that other C++ lexers can be plugged in without re-implementing the
preprocessor. This allows benchmarking and specific fine-tuning of the
preprocessing process itself.
During the last few weeks Wave gained another field of application: testing
the usability and applicability of different Standards proposals. A new C++0x
mode was implemented, which makes it possible to try out, and help to
establish, some ideas designed to overcome known limitations of the C++
preprocessor.
Using the code
The actual preprocessing is a highly configurable process, so you have to
define a couple of parameters to control it, such as:
- Include search paths, which define where to search for files to be included
with #include <...> and #include "..." directives
- Which macros to predefine and which of the predefined macros to undefine
- Several other options, for instance whether to enable some extensions to the
C++ Standard (such as variadics and placemarkers).
You can access all these processing parameters through the wave::context
object, so you have to instantiate at least one object of this type to use the
Wave library. For more information about the context template please refer to
the class reference included in the downloadable file or found here.
The context object is a template class, for which you have to supply at least
two template parameters: the iterator type of the underlying input stream and
the type of the token to be returned from the preprocessing engine. You choose
the type of the input stream, and you may choose the token type as well, but as
a starting point I recommend the token type predefined as the default inside
the Wave library: the wave::cpplexer::lex_token<> template class. A full
reference of this class can be found inside the downloadable file or here.
The main preprocessing iterators are not to be instantiated directly, but
should be generated through this context object too. The following code snippet
preprocesses a given input file and writes the generated text to std::cout.
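// Standard headers used below: <fstream>, <iostream>, <iterator>, <string>
// Read the whole input file into a string.
std::ifstream instream("input.cpp");
std::string input(
    std::istreambuf_iterator<char>(instream.rdbuf()),
    std::istreambuf_iterator<char>());
// The context is parameterized on the input iterator type and the token type.
typedef wave::context<std::string::iterator,
    wave::cpplexer::lex_token<> >
        context_t;
// The third argument is the file name reported in the token positions.
context_t ctx(input.begin(), input.end(), "input.cpp");
// Both preprocessing iterators are obtained from the context.
context_t::iterator_t first = ctx.begin();
context_t::iterator_t last = ctx.end();
// Dereferencing yields the preprocessed tokens, generated on the fly.
while (first != last) {
    std::cout << (*first).get_value();
    ++first;
}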
This sample shows how the input may be read into a string variable, from
where it is fed into the preprocessor. But the parameters to the constructor
of the wave::context<> object are not restricted to this type of input stream.
The constructor takes a pair of iterators of arbitrary type (conceptually at
least forward_iterator type iterators) into the input stream from which the
data to be preprocessed is read. The third parameter supplies a filename, which
is subsequently accessible from inside the preprocessed tokens to indicate the
token position inside the underlying input stream. Note, though, that this
filename is used only as long as no #include or #line directives are
encountered, which in turn alter the current filename.
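The forward iterator requirement is also the reason for the detour through a std::string in the sample: std::istreambuf_iterator is only a single-pass input iterator, whereas std::string::iterator satisfies the forward iterator requirements. The sketch below checks both facts at compile time and wraps the reading step in a helper. The name read_stream is hypothetical and not part of Wave.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <type_traits>

// A stream buffer iterator is single-pass: it is tagged as an input iterator.
static_assert(
    std::is_same<
        std::iterator_traits<std::istreambuf_iterator<char> >::iterator_category,
        std::input_iterator_tag>::value,
    "istreambuf_iterator is an input iterator");

// A string iterator is multi-pass: its category derives from forward_iterator_tag.
static_assert(
    std::is_base_of<
        std::forward_iterator_tag,
        std::iterator_traits<std::string::iterator>::iterator_category>::value,
    "std::string::iterator is at least a forward iterator");

// Hypothetical helper: buffers everything the stream delivers, so that the
// result can be traversed more than once through begin()/end().
inline std::string read_stream(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

The string returned by read_stream() can then be handed to the context constructor through begin() and end(), exactly as in the sample above.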
Iterating over the preprocessed tokens is relatively straightforward. Just get
the starting and ending iterators from the context object (maybe after
initializing some include search paths) and you are done! Dereferencing the
iterator returns the preprocessed tokens, which are generated on the fly from
the input stream.
As you may have seen, the complete library resides in the C++ namespace wave,
so you have to specify it explicitly when using the different classes.
Alternatively, you can place a using namespace wave; somewhere at the beginning
of your source files.
The Wave tracing facility
If you have ever needed to debug a macro expansion, you will have discovered
that your tools provide little or no support for this task. For this reason the
Wave library got a tracing facility, which lets you selectively obtain
information about the expansion of a certain macro or of several macros.
Tracing macro expansions generates a possibly huge amount of information, so it
is recommended that you explicitly enable and disable the tracing for the macro
in question only. This may be done with the help of a special #pragma:
#pragma wave trace(enable)
#pragma wave trace(disable)
To see what the Wave driver generates while expanding a simple macro, I
suggest that you try to preprocess the following with 'wave -t test.trace
test.cpp':
#define X(x) x
#define Y() 2
#define CONCAT_(x, y) x ## y
#define CONCAT(x, y) CONCAT_(x, y)
#pragma wave trace(enable)
CONCAT(X(1), Y())
#pragma wave trace(disable)
After executing this command the file test.trace will contain the generated
trace output. The generated output is relatively straightforward to understand,
but you can find a thorough description of the trace output format in the
documentation included with the downloadable file.
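When reading the trace it helps to know the final result in advance. The following sketch does not use Wave at all; it lets an ordinary compiler confirm what the four macros above produce. STR_, STR, concat_value and concat_text are helper names added for this illustration.

```cpp
#include <cassert>
#include <string>

#define X(x) x
#define Y() 2
#define CONCAT_(x, y) x ## y
#define CONCAT(x, y) CONCAT_(x, y)

// Two-level stringizing, so the argument is fully expanded before '#' applies.
#define STR_(x) #x
#define STR(x) STR_(x)

// CONCAT's parameters are not operands of '##', so X(1) and Y() are expanded
// to 1 and 2 first; CONCAT_(1, 2) then pastes them into the single token 12.
inline int concat_value() { return CONCAT(X(1), Y()); }

// The same expansion, captured as text.
inline std::string concat_text() { return STR(CONCAT(X(1), Y())); }
```

Here concat_value() returns 12 and concat_text() returns "12". Calling CONCAT_ directly with the same arguments would instead try to paste the unexpanded tokens ')' and 'Y', which is exactly the kind of difference a trace makes visible.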
The experimental C++0x mode
In order to prepare and support a proposal for the C++ Standards committee,
which will describe certain new and enhanced preprocessor facilities, the
Wave preprocessor library has implemented experimental support for the
following features:
- Variadic macros and placemarker tokens in C++
- Well defined token-pasting
- A macro scoping mechanism
- New alternative preprocessor tokens
Variadic macros and placemarker tokens are already known from the C99
Standard. Their addition to the C++ Standard would help to make C99 and C++
less different.
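For readers who have not met these two C99 features, here is a small sketch. It assumes a compiler that accepts them in C++ (they later became part of C++11); FORMAT_INTO, CAT and the two functions are names invented for this illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Variadic macro: __VA_ARGS__ stands for all arguments after 'buf'.
#define FORMAT_INTO(buf, ...) std::snprintf(buf, sizeof(buf), __VA_ARGS__)

// With an empty argument, the corresponding operand of '##' becomes a
// placemarker token; pasting with a placemarker yields the other operand.
#define CAT(a, b) a ## b

inline std::string format_pair(int a, int b) {
    char buf[32];
    FORMAT_INTO(buf, "%d-%d", a, b);  // std::snprintf(buf, sizeof(buf), "%d-%d", a, b)
    return buf;
}

inline int placemarker_demo() {
    int value = 7;
    return CAT(value, );  // 'value' ## placemarker -> 'value'
}
```

Here format_pair(3, 4) returns "3-4" and placemarker_demo() returns 7.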
Token-pasting of unrelated tokens (i.e. token-pasting resulting in multiple
preprocessing tokens) is currently undefined behaviour for no substantial
reason. It is not dependent on architecture nor is it difficult for an
implementation to diagnose. Furthermore, retokenization is what most, if not
all, preprocessors already do and what most programmers already expect the
preprocessor to do. Well-defined behavior is simply standardizing existing
practice and removing an arbitrary and unnecessary undefined behavior from the
Standard.
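The distinction can be sketched with a single pasting macro. The two uses inside the function are well defined today because each result is one valid preprocessing token; the case in the trailing comment is the undefined one. PASTE and pasting_demo are hypothetical names for this illustration.

```cpp
#include <cassert>

#define PASTE(a, b) a ## b

inline int pasting_demo() {
    int PASTE(count, er) = 40;  // 'count' ## 'er' -> the identifier 'counter'
    counter PASTE(+, =) 2;      // '+' ## '=' -> the single token '+='
    return counter;
}

// Undefined behaviour under the current rules, because "+-" is not a single
// preprocessing token:
//     PASTE(+, -)
// With well-defined token-pasting the result would simply be retokenized
// into the two tokens '+' and '-'.
```

Calling pasting_demo() returns 42. The commented-out case is left out of the compiled code on purpose, since compilers are free to reject it today.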
One of the major problems of the preprocessor is that macro definitions do
not respect any of the scoping mechanisms of the core language. As history has
shown, this is a major inconvenience and drastically increases the likelihood of
name clashes within a translation unit. The solution is to add both a named and
an unnamed scoping mechanism to the C++ preprocessor. This limits the scope of
macro definitions without limiting their accessibility.
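The problem is easy to reproduce with today's preprocessor. In the sketch below (all names are made up for this illustration) a macro is defined textually inside a namespace, yet it is visible everywhere afterwards, and the only way to limit it is an explicit #undef.

```cpp
#include <cassert>

namespace lib {
    const int buffer_size = 64;
    // Written inside namespace lib, but namespaces mean nothing to the
    // preprocessor: the macro is visible in the rest of the translation unit.
    #define BUFFER_SIZE 16
}

namespace app {
    // No qualification needed (or possible): the macro leaks into app.
    inline int visible_macro() { return BUFFER_SIZE; }
    // The ordinary constant, in contrast, respects the namespace.
    inline int scoped_constant() { return lib::buffer_size; }
}

// Today's only scoping tool: remove the definition by hand.
#undef BUFFER_SIZE

inline bool macro_still_defined() {
#ifdef BUFFER_SIZE
    return true;
#else
    return false;
#endif
}
```

Here app::visible_macro() returns 16 and app::scoped_constant() returns 64, while macro_still_defined() returns false because of the #undef.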
The proposed scoping mechanism is implemented with the help of three new
preprocessor directives: #region, #endregion and #import (note that the actual
names for the directives may change during the standardization process).
Additionally, it changes minor details of some of the existing preprocessor
directives: #ifdef, #ifndef and the operator defined().
To avoid overly detailed descriptions of the new features in this article, a
simple example is provided here (taken from the experimental version of the
preprocessor library written by Paul Mensonides), which demonstrates the
proposed extensions:
# ifndef ::CHAOS_PREPROCESSOR::chaos::WSTRINGIZE_HPP
# region ::CHAOS_PREPROCESSOR::chaos
#
# define WSTRINGIZE_HPP
#
# include <chaos/experimental/cat.hpp>
#
#
#
# define wstringize(...) \
    chaos::primitive_wstringize(__VA_ARGS__) \
    /**/
#
#
#
# define primitive_wstringize(...) \
    chaos::primitive_cat(L, #__VA_ARGS__) \
    /**/
#
# endregion
# endif
# import ::CHAOS_PREPROCESSOR
chaos::wstringize(a,b,c)
The macro scope syntax is modeled on the namespace scoping already known from
the core C++ language. There is a significant difference, though. The #region
and #endregion directives are opaque to any macro definition from outside or
inside the spanned region, respectively. This way, macros defined inside a
specific region are visible from outside this region only if they are imported
(by the #import directive) or if they are qualified (as, for instance, the
argument to the #ifndef directive above).
For more details about the new experimental features please refer to the
documentation included with the downloadable file.
The described features are enabled by the --c++0x command line option of the
Wave driver. Alternatively, you can enable these features by calling the
wave::context<>::set_language() function with the wave::support_cpp0x value.
The command line preprocessor driver
To see how you may write a full-blown preprocessor, refer to the Wave driver
sample included in the downloadable file. This Wave driver program fully
utilizes the capabilities of the library. It is usable as a preprocessor
executable on top of any other C++ compiler. It outputs the textual
representation of the preprocessed tokens generated from a given input file.
This driver program has the following command line syntax:
Usage: wave [options] [@config-file(s)] file:
Options allowed on the command line only:
  -h [--help]:                       print out program usage (this message)
  -v [--version]:                    print the version number
  -c [--copyright]:                  print out the copyright statement
  --config-file filepath:            specify a config file (alternatively: @filepath)
Options allowed additionally in a config file:
  -o [--output] path:                specify a file to use for output instead of stdout
  -I [--include] path:               specify an additional include directory
  -S [--sysinclude] syspath:         specify an additional system include directory
  -F [--forceinclude] file:          force inclusion of the given file
  -D [--define] macro[=[value]]:     specify a macro to define
  -P [--predefine] macro[=[value]]:  specify a macro to predefine
  -U [--undefine] macro:             specify a macro to undefine
  -n [--nesting] depth:              specify a new maximal include nesting depth
Extended options (allowed everywhere)
  -t [--traceto] path:               output trace info to a file [path] or to stderr [-]
  --timer:                           output overall elapsed computing time to stderr
  --variadics:                       enable variadics and placemarkers in C++ mode
  --c99:                             enable C99 mode (implies variadics)
  --c++0x:                           enable experimental C++0x support (implies variadics)
To enable the trace output, the Wave driver now has a special command line
option -t (--traceto), which specifies the file to which the generated trace
information is written. If you use a single dash ('-') as the file name, the
output goes to the std::cerr stream.
There is one caveat left to mention. To use the Wave library or to compile the
Wave driver yourself you will need at least the VC7.1 compiler (the C++
compiler included in the VS.NET 2003 release). Alternatively, you may compile
it with a recent version of the gcc compiler (GNU Compiler Collection) or the
Intel V7.0 C++ compiler. Sorry, for now no VC6 and no VC7 - these are too far
away from C++ Standard conformance. But I may eventually try to alter parts of
the Wave library to make it compilable with these compilers too - it depends on
your response.
Wave depends on the Boost library (at least V1.30.2) and the Program Options library from Vladimir Prus (at least rev. 160, recently accepted into Boost, but not included yet), so please be sure to install these libraries before trying to recompile Wave.
Conclusion
Although the Wave library is quite complex and makes heavy use of advanced C++
idioms, such as templates and template-based metaprogramming, it is fairly
simple to use in a broad spectrum of applications. It fits nicely into the
well-known paradigms that the C++ Standard Template Library (STL) has used for
years.
The Wave driver program is the only C++ preprocessor known to me which
- allows you to enable variadics and placemarkers for C++ programs
- exposes facilities to support the debugging of the macro expansion process
- implements experimental C++0x support, such as macro scoping, which will be
proposed as a C++ Standards addition
It may therefore be an invaluable tool for the development of modern C++
programs.
As recent developments like the Boost Preprocessor Library [1] show, we will
see a lot of applications for advanced preprocessor techniques in the future.
But these need a solid base - a Standard-conformant preprocessor. As long as
the widely available compilers do not meet this need, the Wave library may fill
the gap.
References
[1] The Boost Library Preprocessor Subset for C/C++
[2] Programming languages - C++ (INCITS/ISO/IEC 14882:1998)
[3] The Boost Libraries Documentation
[4] The Spirit parser construction framework
[5] The Wave C++ preprocessor library
History
03/25/2003 (Wave V0.9.1)
- Initial Version of this article
03/26/2003
- Fixed a broken link in the references section
04/07/2003 (Wave V0.9.2)
- Fixed several typos in the article text
- Added the tracing facility to trace the macro expansion process
- Added the predefined macro __INCLUDE_LEVEL__
- Added support for the operator _Pragma() (C99 and --variadics mode only)
- Added new command line options to the Wave driver program
- Fixed a couple of bugs (see the ChangeLog file inside the downloadable file)
- Updated the documentation (inside the downloadable file)
05/16/2003 (Wave V0.9.3)
- Added the _Pragma wave system()
- Added the possibility to pre-include files
- Added the experimental C++0x mode with
- macro scoping support
- well defined token-pasting
- variadics and placemarkers
- __comma__, __lparen__ and __rparen__ alternative pp-tokens
- Fixed a lot of bugs (see the ChangeLog file inside the downloadable file)
05/22/2003
06/04/2003
- Fixed a couple of macro expansion bugs
- Updated the attached source and demo files (see the ChangeLog file inside
the downloadable file)
01/05/2004 (Wave V1.0)
- Added support for #pragma once directive
- Added support for #pragma wave timer() and the --timer command line switch (see the documentation inside the downloadable file)
- Included a finite state machine, which suppresses unneeded whitespace. This makes the generated output much more dense.
- Added an optional IDL mode, which besides not recognizing C++ specific tokens doesn't recognize any keywords (except true and false), but only identifiers.
- Incorporated a couple of changes, which improved the overall performance
- Fixed a lot of bugs (see the ChangeLog file inside the downloadable archive)
- Switched licensing to use the Boost Software License, Version 1.0.