
Wave: a Standard conformant C++ preprocessor library

Hartmut Kaiser


10 Jan 2004

Describes a free and fully Standard conformant C++ preprocessor library


Introduction

Ever wanted to have your own C/C++ preprocessor? Or maybe you are curious about how this invisible everyday helper in your toolbox works? If yes, you may want to read further. If not, before hitting the 'back' button of your browser, consider learning something new and reading further anyway :-).

The C++ preprocessor is a macro processor that under normal circumstances is used automatically by your C++ compiler to transform your program before the actual compilation. It is called a macro processor because it allows you to define macros, which are brief abbreviations for longer constructs. The C++ preprocessor provides four separate facilities that you can use as you see fit:

Inclusion of header files
Macro expansion
Conditional compilation
Line control
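As a quick illustration, the following standard C++ fragment uses each of the four facilities once. The macro names SQUARE and BUILD_KIND are invented for this example. Any conforming compiler should accept it.

```cpp
// 1. Inclusion of header files: the contents of <string> are inserted here.
#include <string>

// 2. Macro expansion: every use of SQUARE(x) is replaced by its definition.
#define SQUARE(x) ((x) * (x))

// 3. Conditional compilation: only one of the two branches survives.
#ifdef NDEBUG
#  define BUILD_KIND "release"
#else
#  define BUILD_KIND "debug"
#endif

// SQUARE(1 + 2) becomes ((1 + 2) * (1 + 2)), i.e. 9.
inline int square_of_sum() { return SQUARE(1 + 2); }
inline std::string build_kind() { return BUILD_KIND; }

// 4. Line control: the line following this directive is reported as line 100.
#line 100
inline int reported_line() { return __LINE__; }
```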

These features are greatly underestimated today. Moreover, the preprocessor has been frowned upon for so long that its use was never effectively promoted until the Boost Preprocessor library [1] came into being a few years ago. Only now are we beginning to understand that preprocessor generative metaprogramming, combined with template metaprogramming in C++, is by far one of the most powerful compile-time reflection/metaprogramming facilities that any language has ever supported.

The C++ Standard [2] was adopted back in 1998, but I still know of no C++ compiler with a bug-free implementation of the rather simple preprocessor requirements mandated therein. This may be a result of the underestimation mentioned above. It may also stem from the banning of the preprocessor from good programming style during the last few years, or from the somewhat awkward standardized dialect of English used to describe it.

So the Wave preprocessor library is an attempt to:

Provide a free, fully Standard conformant and (hopefully) bug-free implementation of the mandated preprocessor functionality
Make maximal use of the C++ STL and/or Boost [3] libraries (for compactness and maintainability)
Achieve straightforward extensibility for the implementation of additional features
Build a flexible library for different C++ lexing and preprocessing needs

The Spirit parser construction library [4] is used to simplify parsing of the input stream, which is usually (but not necessarily) a file.

Background

The Wave C++ preprocessor is not a monolithic application. Rather, it is a modular library that mainly exposes a context object and an iterator interface. The context object helps to configure the actual preprocessing (search paths, predefined macros, etc.). The exposed iterators are generated by this context object too. Iterating over the sequence defined by these two iterators returns the preprocessed tokens, which are built on the fly from the given input stream.
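The following sketch makes the context/iterator split a bit more concrete, using the same pattern in standard C++. It is not Wave's API. The class word_context and its iterator are hypothetical, and the "tokens" are just whitespace-separated words rather than C++ preprocessing tokens. What it shares with the design described above is this: the context owns the input and hands out a begin/end pair, and each token is produced only when the iterator is advanced.

```cpp
#include <cctype>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>

// Hypothetical sketch (not Wave's API): a context owning the input that
// generates an iterator pair; tokens are computed lazily on each increment.
class word_context {
public:
    class iterator {
    public:
        using iterator_category = std::input_iterator_tag;
        using value_type = std::string;
        using difference_type = std::ptrdiff_t;
        using pointer = const std::string*;
        using reference = const std::string&;

        iterator() = default;  // the end-of-sequence iterator
        explicit iterator(const std::string* in) : input_(in), at_end_(false) { advance(); }

        reference operator*() const { return current_; }
        iterator& operator++() { advance(); return *this; }
        bool operator==(const iterator& o) const {
            return at_end_ == o.at_end_ && (at_end_ || pos_ == o.pos_);
        }
        bool operator!=(const iterator& o) const { return !(*this == o); }

    private:
        // Skip whitespace, then collect the next run of non-space characters.
        void advance() {
            const std::string& s = *input_;
            while (pos_ < s.size() && std::isspace(static_cast<unsigned char>(s[pos_]))) ++pos_;
            if (pos_ == s.size()) { at_end_ = true; current_.clear(); return; }
            std::size_t start = pos_;
            while (pos_ < s.size() && !std::isspace(static_cast<unsigned char>(s[pos_]))) ++pos_;
            current_.assign(s, start, pos_ - start);
        }

        const std::string* input_ = nullptr;
        std::size_t pos_ = 0;
        std::string current_;
        bool at_end_ = true;
    };

    explicit word_context(std::string input) : input_(std::move(input)) {}
    iterator begin() const { return iterator(&input_); }
    iterator end() const { return iterator(); }

private:
    std::string input_;
};
```

The iterator reads directly from the string owned by the context, so the context must outlive every iterator obtained from it.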

The C++ preprocessor iterator itself is fed by a C++ lexer iterator, which implements a unified interface. By the way, the C++ lexers contained within the Wave library may be used standalone too and are not tied to the C++ preprocessor iterator at all. By a lexer I mean a piece of code that combines several consecutive characters of the input stream into a stream of objects (called tokens) more suitable for subsequent parsing. These tokens carry not only the matched character sequence but also the position in the input stream where a particular token was found. In other words, the lexer removes everything that is needed only by humans, such as spaces and newlines (i.e. it performs a lexical transformation), and leaves the structural transformation to the parser.
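The following deliberately tiny lexer illustrates the definition above. It is a hypothetical sketch, far simpler than the C++ lexers shipped with Wave. It groups consecutive characters into identifier, number and punctuator tokens, drops blanks and newlines, and records the line and column at which each token started. A real C++ lexer also handles multi-character operators, literals, comments and much more. The names lex, token and token_kind are made up for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal lexer: not Wave's lexer, only an illustration.
enum class token_kind { identifier, number, punctuator };

struct token {
    token_kind kind;
    std::string value;  // the matched character sequence
    int line;           // 1-based position of the token's first character
    int column;
};

std::vector<token> lex(const std::string& input)
{
    std::vector<token> tokens;
    int line = 1, column = 1;
    std::size_t i = 0;
    auto is_ident = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    };
    while (i < input.size()) {
        char c = input[i];
        if (c == '\n') { ++line; column = 1; ++i; continue; }  // drop newlines
        if (std::isspace(static_cast<unsigned char>(c))) { ++column; ++i; continue; }  // drop blanks
        std::size_t start = i;
        token_kind kind = token_kind::punctuator;
        if (std::isdigit(static_cast<unsigned char>(c))) {
            kind = token_kind::number;
            while (i < input.size() && std::isdigit(static_cast<unsigned char>(input[i]))) ++i;
        } else if (is_ident(c)) {
            kind = token_kind::identifier;
            while (i < input.size() && is_ident(input[i])) ++i;
        } else {
            ++i;  // single-character punctuators only
        }
        tokens.push_back({kind, input.substr(start, i - start), line, column});
        column += static_cast<int>(i - start);
    }
    return tokens;
}
```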

To make the Wave C++ preprocessing library modular, the C++ lexer is kept completely separate from and independent of the preprocessor. To prove this concept, the library currently contains two different C++ lexers, which are functionally identical. Both expose the unified interface mentioned above, so the C++ preprocessor iterator may be used with either of them. Abstracting the C++ lexer from the C++ preprocessor iterator allows other C++ lexers to be plugged in as well, without the need to re-implement the preprocessor. This allows for benchmarking and specific fine-tuning of the preprocessing process itself.
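This decoupling can be sketched with an abstract interface. In the hypothetical example below, lexer_interface is the only thing the consumer function rejoin sees. Two functionally identical implementations can be plugged in interchangeably: one based on std::istringstream and one hand-written. Wave's real lexer interface differs in its details. For simplicity, the "tokens" here are whitespace-separated words.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical unified interface: a token source only has to provide next().
struct lexer_interface {
    virtual ~lexer_interface() = default;
    virtual bool next(std::string& token) = 0;  // false once the input is exhausted
};

// Implementation 1: delegates the splitting to std::istringstream.
class stream_lexer : public lexer_interface {
public:
    explicit stream_lexer(const std::string& input) : in_(input) {}
    bool next(std::string& token) override { return static_cast<bool>(in_ >> token); }
private:
    std::istringstream in_;
};

// Implementation 2: a hand-written scanner with the same observable behaviour.
class manual_lexer : public lexer_interface {
public:
    explicit manual_lexer(std::string input) : input_(std::move(input)) {}
    bool next(std::string& token) override {
        auto space = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
        while (pos_ < input_.size() && space(input_[pos_])) ++pos_;
        if (pos_ == input_.size()) return false;
        std::size_t start = pos_;
        while (pos_ < input_.size() && !space(input_[pos_])) ++pos_;
        token = input_.substr(start, pos_ - start);
        return true;
    }
private:
    std::string input_;
    std::size_t pos_ = 0;
};

// The consumer (standing in for the preprocessor) depends only on the
// interface, so either lexer can be plugged in without changing this code.
std::string rejoin(lexer_interface& lexer)
{
    std::string out, token;
    while (lexer.next(token)) {
        if (!out.empty()) out += ' ';
        out += token;
    }
    return out;
}
```

Swapping one lexer for the other changes nothing in rejoin. That property is what makes it possible to benchmark alternative lexers without touching the consumer.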

During the last few weeks, Wave has found another field of application: testing the usability and applicability of different proposals for the C++ Standard. A new C++0x mode was implemented. It allows you to try out, and help to establish, ideas designed to overcome some of the known limitations of the C++ preprocessor.

Using the code

The actual preprocessing is a highly configurable process, so obviously you have to define a couple of parameters to control this process, such as:

Include search paths, which define where to search for files to be included with #include <...> and #include "..." directives
Which macros to predefine and which of the predefined macros to undefine
Several other options, for instance to control whether some extensions to the C++ Standard (such as variadics and placemarkers) are enabled

You can access all these processing parameters through the wave::context object, so you have to instantiate at least one object of this type to use the Wave library. For more information about the context template, please refer to the class reference included in the downloadable file. The context object is a class template, for which you have to supply at least two template arguments: the iterator type of the underlying input stream and the type of the token to be returned from the preprocessing engine. You define the type of the input stream, and you may define the token type as well. As a starting point, however, I recommend using the token type predefined as the default inside the Wave library: the wave::cpplexer::lex_token<> class template. A full reference of this class can be found inside the downloadable file.

The main preprocessing iterators are not to be instantiated directly, but should be generated through this context object too. The following code snippet preprocesses a given input file and outputs the generated text into std::cout.

    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <string>
    // ... plus the Wave headers (see the documentation in the downloadable file)

    int main()
    {
        // Open the file and read it into a string variable
        std::ifstream instream("input.cpp");
        std::string input(
            std::istreambuf_iterator<char>(instream.rdbuf()),
            std::istreambuf_iterator<char>());

        // The template wave::cpplexer::lex_token<> is the default
        // token type to be used by the Wave library.
        // This token type is one of the central types throughout
        // the library, because it is a template parameter to many
        // of the public classes and templates and it is returned
        // from the iterators themselves.
        typedef wave::context<std::string::iterator,
                    wave::cpplexer::lex_token<> >
                context_t;

        // The C++ preprocessor iterators shouldn't be constructed
        // directly. They are to be generated through a
        // wave::context<> object. Additionally, this wave::context<>
        // object is used to initialize and define different
        // parameters of the actual preprocessing.
        context_t ctx(input.begin(), input.end(), "input.cpp");
        context_t::iterator_t first = ctx.begin();
        context_t::iterator_t last = ctx.end();

        // The preprocessing of the input stream is done on the fly
        // behind the scenes during the iteration over the
        // context_t::iterator_t based stream.
        while (first != last) {
            std::cout << (*first).get_value();
            ++first;
        }
        return 0;
    }

This sample shows how the input may be read into a string variable, from where it is fed into the preprocessor. However, the parameters to the constructor of the wave::context<> object are not restricted to this type of input stream. It can take a pair of arbitrary iterators (conceptually, at least forward iterators) into the input stream from which the data to be preprocessed is read. The third parameter supplies a filename. This filename is subsequently accessible from the preprocessed tokens, where it indicates the token position inside the underlying input stream. Note, though, that this filename is used only as long as no #include or #line directive is encountered, since these alter the current filename.
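The filename rule above can be illustrated without Wave. In the sketch below, file_position and track_position are hypothetical helpers, not part of the library. track_position reads characters from any forward-iterator range and tracks the file name and line number that would be attached to the next token. The name passed by the caller is used until a #line directive supplies a new one. The sketch only recognizes the plain #line N "name" spelling and ignores #include entirely, so it is a simplification of what a real preprocessor does.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical illustration (not part of Wave): a file position as a
// preprocessor would attach it to tokens.
struct file_position {
    std::string file;
    int line;
};

// Reads any forward-iterator range of chars line by line and returns the
// position of the line that would follow the input. The caller's filename is
// used until a `#line N "name"` directive replaces it.
template <typename ForwardIterator>
file_position track_position(ForwardIterator first, ForwardIterator last,
                             const std::string& filename)
{
    file_position pos{filename, 1};
    std::string text;
    for (ForwardIterator it = first; ; ++it) {
        const bool at_end = (it == last);
        if (!at_end && *it != '\n') { text += *it; continue; }

        // A complete line (or the unterminated last line) is now in `text`.
        std::istringstream in(text);
        std::string directive;
        int number = 0;
        if (in >> directive && directive == "#line" && in >> number) {
            pos.line = number;               // numbers the *following* line
            std::string name;
            if (in >> std::quoted(name)) pos.file = name;
        } else if (!at_end) {
            ++pos.line;                      // an ordinary line was consumed
        }
        text.clear();
        if (at_end) break;
    }
    return pos;
}
```

Because the function is a template over the iterator type, a std::string range and a plain const char* range both work. This mirrors the way the context accepts arbitrary iterator pairs.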

Iterating over the preprocessed tokens is straightforward. Just get the starting and ending iterators from the context object (perhaps after initializing some include search paths), and you are done! Dereferencing the iterator returns the preprocessed tokens, which are generated on the fly from the input stream.

As you may have noticed, the complete library resides in the C++ namespace wave, so you have to qualify the names of its classes explicitly. Alternatively, you can place a using namespace wave; directive near the beginning of your source files.

The Wave tracing facility

If you have ever needed to debug a macro expansion, you probably discovered that your tools provide little or no support for this task. For this reason, the Wave library has a tracing facility, which allows you to selectively obtain information about the expansion of a certain macro or several macros.

Tracing macro expansions can generate a huge amount of information. It is therefore recommended that you explicitly enable tracing only for the macro in question and disable it again afterwards. This may be done with the help of a special #pragma:

#pragma wave trace(enable)    // enable the tracing
// the macro expansions here will be traced
// ...
#pragma wave trace(disable)   // disable the tracing

To see what the Wave driver generates while expanding a simple macro, try preprocessing the following file with 'wave -t test.trace test.cpp':

// test.cpp
#define X(x)          x
#define Y()           2
#define CONCAT_(x, y) x ## y
#define CONCAT(x, y)  CONCAT_(x, y)
#pragma wave trace(enable)
// this macro expansion is to be traced
CONCAT(X(1), Y())     // should expand to 12
#pragma wave trace(disable)
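The macros in test.cpp can also be checked with any conforming compiler. The sketch below reuses them unchanged and adds a helper pair, STRINGIZE and STRINGIZE_ (hypothetical names introduced here), so that the result of the expansion can be inspected as a string. It also shows why CONCAT needs the extra CONCAT_ level, which is the kind of detail the trace output makes visible.

```cpp
#include <string>

// The macros from test.cpp, unchanged.
#define X(x)          x
#define Y()           2
#define CONCAT_(x, y) x ## y
#define CONCAT(x, y)  CONCAT_(x, y)

// Helper pair used only to turn an expansion result into a string literal.
#define STRINGIZE_(x) #x
#define STRINGIZE(x)  STRINGIZE_(x)

// Inside CONCAT the parameters are not operands of ##, so X(1) and Y() are
// fully expanded (to 1 and 2) before CONCAT_ pastes them into the token 12.
constexpr int concatenated = CONCAT(X(1), Y());

inline std::string expansion_text() { return STRINGIZE(CONCAT(X(1), Y())); }

// Calling CONCAT_(X(1), Y()) directly would instead try to paste ")" and "Y",
// which does not form a valid preprocessing token.
```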

After executing this command the file test.trace will contain the generated trace output. The generated output is relatively straightforward to understand, but you can find a thorough description of the trace output format in the documentation included with the downloadable file.

The experimental C++0x mode

In order to prepare and support a proposal for the C++ Standards committee, which will describe certain new and enhanced preprocessor facilities, the Wave preprocessor library has implemented experimental support for the following features:

Variadic macros and placemarker tokens in C++
Well defined token-pasting
A macro scoping mechanism
New alternative preprocessor tokens

Variadic macros and placemarker tokens are already known from the C99 Standard. Their addition to the C++ Standard would help to make C99 and C++ less different.
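For readers who have not used them yet, here is a small sketch of both features. It compiles with any compiler that supports C99-style variadic macros and empty macro arguments, which C++ adopted with C++11. The macro names STRINGIZE_ALL and PASTE are made up for this example.

```cpp
#include <string>

// Variadic macro: __VA_ARGS__ stands for all arguments, commas included.
#define STRINGIZE_ALL(...) #__VA_ARGS__

// Placemarkers: an empty macro argument is allowed, and pasting it with a
// real token simply yields that token.
#define PASTE(a, b) a ## b

inline std::string all_args() { return STRINGIZE_ALL(a, b, c); }

inline int placemarker_demo()
{
    int PASTE(, value) = 3;     // expands to: int value = 3;
    int PASTE(my, _value) = 4;  // expands to: int my_value = 4;
    return value + my_value;
}
```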

Token-pasting of unrelated tokens (i.e. token-pasting resulting in multiple preprocessing tokens) is currently undefined behavior for no substantial reason. It is neither architecture dependent nor difficult for an implementation to diagnose. Furthermore, retokenization is what most, if not all, preprocessors already do and what most programmers already expect the preprocessor to do. Making this behavior well-defined simply standardizes existing practice and removes an arbitrary and unnecessary undefined behavior from the Standard.

One of the major problems of the preprocessor is that macro definitions do not respect any of the scoping mechanisms of the core language. As history has shown, this is a major inconvenience and drastically increases the likelihood of name clashes within a translation unit. The solution is to add both a named and an unnamed scoping mechanism to the C++ preprocessor. This limits the scope of macro definitions without limiting their accessibility.
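The following sketch shows the problem in standard C++. The namespaces lib_a and lib_b and the macro LIMIT are hypothetical. A macro defined inside a namespace is not scoped by it, so it leaks into everything that follows in the translation unit.

```cpp
namespace lib_a {
#define LIMIT 10  // looks "local" to lib_a, but it is not
inline int limit() { return LIMIT; }
}

namespace lib_b {
// Without this #undef, the next line would be an invalid redefinition.
#undef LIMIT
#define LIMIT 20
inline int limit() { return LIMIT; }
}

// LIMIT is still visible here, and it has lib_b's value.
inline int global_limit() { return LIMIT; }
```

lib_a::limit() still returns 10 only because its body was expanded before the #undef. Any later use of LIMIT, in any namespace, sees 20.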

The proposed scoping mechanism is implemented with the help of three new preprocessor directives: #region, #endregion and #import (note that the actual names for the directives may change during the standardization process). Additionally it changes minor details of some of the existing preprocessor directives: #ifdef, #ifndef and the operator defined().

To avoid overly detailed descriptions of the new features in this article, a simple example is provided here (taken from the experimental version of the preprocessor library written by Paul Mensonides), which demonstrates the proposed extensions:

    # ifndef ::CHAOS_PREPROCESSOR::chaos::WSTRINGIZE_HPP
    # region ::CHAOS_PREPROCESSOR::chaos
    #
    # define WSTRINGIZE_HPP
    #
    # include <chaos/experimental/cat.hpp>
    #
    # // wstringize
    #
    # define wstringize(...) \
        chaos::primitive_wstringize(__VA_ARGS__) \
        /**/
    #
    # // primitive_wstringize
    #
    # define primitive_wstringize(...) \
        chaos::primitive_cat(L, #__VA_ARGS__) \
        /**/
    #
    # endregion
    # endif

    # import ::CHAOS_PREPROCESSOR
 
    chaos::wstringize(a,b,c) // expands to: L"a,b,c"

The macro scope syntax is modeled after the namespace scoping already known from the core C++ language. There is a significant difference, though: the #region and #endregion directives are opaque to any macro definition from outside or inside the spanned region, respectively. This way, macros defined inside a specific region are visible outside of it only if they are imported (by the #import directive) or qualified (as, for instance, the argument to the #ifndef directive above).
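Without the proposed #region support, the usual workaround in standard C++ is a naming convention: every macro gets a library-specific prefix. The sketch below expresses the wstringize example that way. The MYLIB_ prefix is hypothetical. The two-level structure (wstringize forwarding to primitive_wstringize) is kept so that macro arguments are expanded before they are stringized.

```cpp
#include <string>

// Hypothetical MYLIB_ prefix: a naming convention is the only "scope" a
// macro gets in standard C++.
#define MYLIB_PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__
#define MYLIB_PRIMITIVE_WSTRINGIZE(...) MYLIB_PRIMITIVE_CAT(L, #__VA_ARGS__)
#define MYLIB_WSTRINGIZE(...) MYLIB_PRIMITIVE_WSTRINGIZE(__VA_ARGS__)

// Pasting L with the string literal "a,b,c" yields the wide literal L"a,b,c".
inline std::wstring wide_args() { return MYLIB_WSTRINGIZE(a,b,c); }
```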

For more details about the new experimental features please refer to the documentation included with the downloadable file.

The described features are enabled by the --c++0x command line option of the Wave driver. Alternatively, you can enable them by calling the wave::context<>::set_language() function with the wave::support_cpp0x value.

The command line preprocessor driver

To see how to write a full-blown preprocessor, refer to the Wave driver sample included in the downloadable file. This Wave driver program fully utilizes the capabilities of the library. It is usable as a preprocessor executable on top of any other C++ compiler. It outputs the textual representation of the preprocessed tokens generated from a given input file. The driver program has the following command line syntax:

Usage: wave [options] [@config-file(s)] file:
 
  Options allowed on the command line only:
    -h [--help]:            print out program usage (this message)
    -v [--version]:         print the version number
    -c [--copyright]:       print out the copyright statement
    --config-file filepath: specify a config file (alternatively: @filepath)
 
  Options allowed additionally in a config file:
    -o [--output] path:          specify a file to use for output instead of 
                                 stdout
    -I [--include] path:         specify an additional include directory
    -S [--sysinclude] syspath:   specify an additional system include directory
    -F [--forceinclude] file:    force inclusion of the given file
    -D [--define] macro[=[value]]:    specify a macro to define
    -P [--predefine] macro[=[value]]: specify a macro to predefine
    -U [--undefine] macro:       specify a macro to undefine
    -n [--nesting] depth:        specify a new maximal include nesting depth
    
  Extended options (allowed everywhere)
    -t [--traceto] path:    output trace info to a file [path] or to stderr [-]
    --timer:                output overall elapsed computing time to stderr 
    --variadics:            enable variadics and placemarkers in C++ mode
    --c99:                  enable C99 mode (implies variadics)
    --c++0x:                enable experimental C++0x support (implies 
                            variadics)

To allow tracing output, the Wave driver now has a special command line option, -t (--traceto), which specifies the file to which the generated trace information is written. If you use a single dash ('-') as the file name, the output goes to the std::cerr stream.

There is one caveat left to mention. To use the Wave library or to compile the Wave driver yourself, you will need at least the VC7.1 compiler (the C++ compiler included in the VS.NET 2003 release). Alternatively, you may compile it with a recent version of the gcc compiler (GNU Compiler Collection) or the Intel V7.0 C++ compiler. Sorry, VC6 and VC7 are not supported for now, because they are too far away from C++ Standard conformance. I may eventually try to alter parts of the Wave library to make it compilable with these compilers too; this depends on your response.

Wave depends on the Boost libraries (at least V1.30.2) and on the Program Options library by Vladimir Prus (at least rev. 160; recently accepted into Boost, but not yet included in a release). Please be sure to install these libraries before trying to recompile Wave.

Conclusion

The Wave library is quite complex and makes heavy use of advanced C++ idioms such as templates and template-based metaprogramming. Despite this, it is fairly simple to use in a broad spectrum of applications. It fits nicely into the well-known paradigms used for years by the C++ Standard Template Library (STL).

To my knowledge, the Wave driver program is the only C++ preprocessor that

allows enabling variadics and placemarkers for C++ programs
exposes facilities to support the debugging of the macro expansion process
implements experimental C++0x support, such as macro scoping, which will be proposed as an addition to the C++ Standard

It may therefore be an invaluable tool for the development of modern C++ programs.

As recent developments like the Boost Preprocessor library [1] show, we will see many applications of advanced preprocessor techniques in the future. But these need a solid base: a Standard conformant preprocessor. As long as the widely available compilers do not meet these needs, the Wave library may fill this gap.

References

History

03/25/2003 (Wave V0.9.1)

Initial version of this article

03/26/2003

Fixed a broken link in the references section

04/07/2003 (Wave V0.9.2)

Fixed several typos in the article text
Added the tracing facility to trace the macro expansion process
Added the predefined macro __INCLUDE_LEVEL__
Added support for the operator _Pragma() (C99 and --variadics mode only)
Added new command line options to the Wave driver program
Fixed a couple of bugs (see the ChangeLog file inside the downloadable file)
Updated the documentation (inside the downloadable file)

05/16/2003 (Wave V0.9.3)

Added the _Pragma wave system()
Added the possibility to pre-include files
Added the experimental C++0x mode with
- macro scoping support
- well defined token-pasting
- variadics and placemarkers
- __comma__, __lparen__ and __rparen__ alternative pp-tokens
Fixed a lot of bugs (see the ChangeLog file inside the downloadable file)

05/22/2003

Corrected several typos

06/04/2003

Fixed a couple of macro expansion bugs
Updated the attached source and demo files (see the ChangeLog file inside the downloadable file)

01/05/2004 (Wave V1.0)

Added support for #pragma once directive
Added support for #pragma wave timer() and the --timer command line switch (see the documentation inside the downloadable file)
Included a finite state machine, which suppresses unneeded whitespace. This makes the generated output much denser.
Added an optional IDL mode, which besides not recognizing C++ specific tokens doesn't recognize any keywords (except true and false), but only identifiers.
Incorporated a couple of changes, which improved the overall performance
Fixed a lot of bugs (see the ChangeLog file inside the downloadable archive)
Switched licensing to use the Boost Software License, Version 1.0.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
