Extremely Efficient Type-safe printf Library

17 Aug 2011, CPOL, 17 min read
Introduces a fast and type-safe parameter rendering library (type-safe printf)

Introduction

The idea of a type-safe string rendering and parameter substitution library for C++ has a long history. The C language has always provided the flexible printf function which, combined with the ability to pass a variable number of parameters to a function, gives a neat solution to the problem. It comes at a cost to the developer, though: neither the number nor the types of the passed parameters are checked and, in the case of the sprintf function, the output buffer bounds are not checked either.

C++ has introduced a stream input/output library, which has solved type safety and parameter count problems, but introduced at least two new ones:

  1. printf's clean separation of presentation and data is gone. You now have to mix static text, formatting directives, and the data itself. This also complicates localization, where the order of a string's static parts and parameters changes between languages.
  2. Almost every implementation of the stream input/output library performs very poorly, even compared to the C printf.

Several type-safe printf libraries exist for C++ today. Most of them solve the first of these problems, but do nothing about the second, because they are implemented on top of the stream input/output library.

This library was designed with the following in mind:

  1. Introduce as little overhead as possible
  2. Provide rich rendering capabilities
  3. Be completely type-safe
  4. Provide output container boundary checking, when required

The current library’s feature list is:

  1. The library has a modular design – the user can implement and replace the formatting module and output modules.
  2. Compatibility with the printf syntax was not in the list of requirements, so another, extended format is supported by default. If required, the standard printf syntax may be implemented as a pluggable module.
  3. The library is template based and does not currently perform any virtual dispatch.
  4. The library is free of memory allocations. The only memory allocation is performed during format string "compilation".
  5. The library has its own very efficient implementations of integer and floating-point renderers. The only exception (currently) is date/time rendering, which is performed by calling a CRT function.

Known restrictions and limitations:

  1. Only very limited support for locales is provided. Locale-specific data is used to print the decimal point and thousand separator. A system default locale is used and is cached on the first use.
  2. Floating point rendering does not support scientific notation.

Requirements

The library requires the following features from your compiler:

  1. Support for auto keyword
  2. Support for decltype keyword
  3. Full support for r-value references and perfect forwarding
  4. Full support for TR1 STL extensions

The library has been developed and tested on Microsoft Visual C++ 2010, but should be portable. The only part of the library that uses types declared in windows.h (FILETIME) is guarded by the following check:

C++
#if defined(_WIN32) && defined(_FILETIME_)
...
#endif

The C++0x standard had not been finalized at the time of development, so the code may break with future compilers, or with compilers that implement a later revision of the draft than Visual C++ 2010 SP1, on which the library was developed.

Extended Format Description

The library's format string syntax is not compatible with the standard printf syntax; it uses its own, extended syntax.

The format string consists of blocks of plain text, which are copied directly to the output, and parameter placeholders. Each placeholder has the following syntax:

{<param-index>[width-decl][alignment-decl][plus-decl]
  [precision-decl][base-decl][padding-decl][ellipsis-decl]
  [char-decl][locale-decl][time-decl]}

The placeholder must be enclosed in curly braces. If you need an opening curly brace in the text, double it to distinguish it from the beginning of a placeholder. There is no need to escape the closing brace; it is always parsed correctly.

Parameter declaration starts with a parameter’s number. This is the only mandatory field. Parameters are ordered starting from zero. All subsequent declarations are optional. If several declarations are used, their order is not significant and there must be no space or any other separator between them.
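The placeholder rules above can be illustrated with a small stand-alone sketch. It is not the library's parser: expand_placeholders is a hypothetical helper that takes already rendered argument strings, honors the zero-based index and the doubled opening brace, and simply skips any declarations that follow the index instead of applying them.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of the library): expands "{N...}" placeholders
// using pre-rendered argument strings. "{{" produces a literal '{'.
// Declarations after the index (width, base, ...) are skipped, not applied.
std::string expand_placeholders(const std::string &fmt,
                                const std::vector<std::string> &args)
{
    std::string out;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] != '{') {                 // plain text, including '}'
            out += fmt[i];
            continue;
        }
        if (i + 1 < fmt.size() && fmt[i + 1] == '{') {   // escaped brace
            out += '{';
            ++i;
            continue;
        }
        std::size_t pos = i + 1;
        std::size_t index = 0;
        bool has_index = false;
        while (pos < fmt.size() && fmt[pos] >= '0' && fmt[pos] <= '9') {
            index = index * 10 + static_cast<std::size_t>(fmt[pos] - '0');
            has_index = true;
            ++pos;
        }
        const std::size_t close = fmt.find('}', pos);
        if (!has_index || close == std::string::npos || index >= args.size())
            throw std::invalid_argument("malformed placeholder");
        out += args[index];
        i = close;                           // continue after the closing brace
    }
    return out;
}
```

Because parameters are addressed by index, a translated format string can reorder them freely, for example "{1} of {0}".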

width-decl

Use this declaration to limit the minimum and/or maximum length of a rendered parameter, in characters. The syntax of a declaration is one of:

  • w<min-width>,<max-width>
  • w<min-width>
  • w,<max-width>

Both min-width and max-width must be decimal integers and if specified, max-width must be larger than min-width.

alignment-decl

Use this declaration to set parameter alignment. It is ignored unless width-decl is also used. Use one of:

  • al – align left (default)
  • ar – align right
  • ac – align center

plus-decl

Forces the plus sign to be rendered for positive numbers. Syntax:

  • +

precision-decl

Use this declaration to specify the number of digits displayed after the decimal point. It is used only for floating-point types. If not specified, the default (BELT_TSP_DEFAULT_FP_PRECISION = 6) is used. You may override it by defining BELT_TSP_DEFAULT_FP_PRECISION before including the library’s headers.

  • p<number>

base-decl

Specify a base for an integer. If any base besides 10 is used with the floating-point type, only the integer part is rendered. Only bases of 2, 8, 10, and 16 are supported. Lowercase or uppercase hexadecimal may be specified:

  • b2 – binary
  • b8 – octal
  • b10 – decimal (default)
  • b[0]16[x] – lowercase hexadecimal. If prefix "0" is used, library adds "0x" before the number
  • b[0]16X – uppercase hexadecimal. If prefix "0" is used, library adds "0X" before the number
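As a rough illustration of what the base declarations ask for, here is a minimal sketch of a digit-by-digit integer renderer. The function render_unsigned and its parameters are hypothetical and are not taken from the library's sources; the library's own renderer is likely to be considerably more optimized.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical sketch: renders an unsigned value in base 2, 8, 10 or 16.
// `upper` selects uppercase hex digits, `prefix` adds "0x"/"0X" for base 16.
std::string render_unsigned(unsigned long long value, unsigned base,
                            bool upper = false, bool prefix = false)
{
    assert(base == 2 || base == 8 || base == 10 || base == 16);
    const char *digits = upper ? "0123456789ABCDEF" : "0123456789abcdef";
    std::string out;
    do {                                 // digits come out least significant first
        out += digits[value % base];
        value /= base;
    } while (value != 0);
    if (prefix && base == 16) {
        out += upper ? 'X' : 'x';        // appended reversed; fixed up below
        out += '0';
    }
    std::reverse(out.begin(), out.end());
    return out;
}
```

Under these assumptions, render_unsigned(255, 16) corresponds to a placeholder such as {0b16}, and render_unsigned(255, 16, true, true) to {0b016X}.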

padding-decl

Set the character to fill the space when min-width is set (see the width-decl above). The default one is space.

  • f<character>
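The interaction of width-decl, alignment-decl, and padding-decl can be sketched as follows. This is only an illustration: fit_to_width and the Align enumeration are hypothetical names, treating a max_width of 0 as "no maximum" is a convention of this sketch, and placing the odd fill character on the right when centering is an assumption about behavior the article does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class Align { Left, Right, Center };

// Hypothetical sketch: truncates `text` to max_width, then pads it to min_width.
// max_width == 0 means "no maximum" here; that convention is an assumption.
std::string fit_to_width(std::string text, std::size_t min_width,
                         std::size_t max_width, Align align = Align::Left,
                         char fill = ' ')
{
    if (max_width != 0 && text.size() > max_width)
        text.resize(max_width);                   // keep the leading characters
    if (text.size() >= min_width)
        return text;
    const std::size_t pad = min_width - text.size();
    std::size_t left = 0;                         // fill placed before the text
    if (align == Align::Right)  left = pad;
    if (align == Align::Center) left = pad / 2;   // extra fill goes to the right
    return std::string(left, fill) + text + std::string(pad - left, fill);
}
```

With these assumptions, a placeholder like {0w5arf0} would map to fit_to_width(text, 5, 0, Align::Right, '0').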

ellipsis-decl

Add an ellipsis when truncating output. It is not compatible with center alignment (it will act as left alignment). Note that a single Unicode character (U+2026) is used for the ellipsis.

  • e

char-decl

Treat the passed char or wchar_t parameter as a character and not an integer.

  • c

locale-decl

Separate thousands with the default user locale's thousand separator. Will work only for base 10.

  • l
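Thousands grouping itself is a simple transformation on the rendered decimal digits. The sketch below assumes a fixed separator character and groups of three digits; the library takes the separator from the user's default locale, and group_thousands is a hypothetical name used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: inserts `sep` between groups of three decimal digits.
// `digits` must contain only the digits of a non-negative base-10 integer.
std::string group_thousands(const std::string &digits, char sep = ',')
{
    std::string out;
    const std::size_t n = digits.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (i != 0 && (n - i) % 3 == 0)   // a whole number of groups remains
            out += sep;
        out += digits[i];
    }
    return out;
}
```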

time-decl

Interpret the passed parameter as a date, time, or date+time, and display according to format-string. The format is the same as in the CRT strftime function.

  • t(format-string)
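Since the format string inside t(...) follows strftime, the rendering step can be pictured as a thin wrapper around that CRT function. The helpers render_time and make_tm below are hypothetical and exist only for this sketch; the fixed 128-character buffer is likewise an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical sketch: renders a std::tm with a strftime-style format string,
// the way the text inside t(...) would be interpreted.
std::string render_time(const std::tm &when, const char *format)
{
    char buffer[128];
    const std::size_t written =
        std::strftime(buffer, sizeof buffer, format, &when);
    return std::string(buffer, written);   // empty if the result did not fit
}

// Builds a std::tm for a calendar date and time of day (no time zone handling).
std::tm make_tm(int year, int month, int day, int hour, int minute, int second)
{
    std::tm t = {};
    t.tm_year = year - 1900;   // years since 1900
    t.tm_mon  = month - 1;     // months since January
    t.tm_mday = day;
    t.tm_hour = hour;
    t.tm_min  = minute;
    t.tm_sec  = second;
    return t;
}
```

For example, a placeholder such as {0t(%Y-%m-%d)} would hand "%Y-%m-%d" to the equivalent of render_time.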

Using the Library

This library is header only; you don’t have to link to any binary in order to use it. First, include the main library’s header:

C++
#include <printf/printf.h> 

All the library’s identifiers are declared in the namespace ts_printf. All examples later in this document assume the following using namespace directive:

C++
using namespace ts_printf;

The next step for you is to include the file corresponding to the output adaptor you are going to use. Currently, the library supports three output adaptors. You may use any number of adaptors:

C++
// std::basic_string adaptor
#include <printf/basic_string_adaptor.h>
// Output iterator adaptor
#include <printf/output_iterator_adaptor.h>
// C-style character array and std::array<char_type> adaptor
#include <printf/array_adaptor.h>

We will discuss all the available adaptors and their options later.

The following pseudo-code shows how to use the library to render a format string and a number of parameters into a character sequence and send it to an output adaptor:

(ret-type | void) printf(format-object,
            parameter-object[, adaptor-specific-parameters]);

ret-type and adaptor-specific-parameters depend on the output adaptor of your choice and will be described below.

Format Object

The library introduces the format object which stores the “compiled” format string. The reason for this to be separate from rendering is that once compiled, the object may be used multiple times with different sets of parameters.

Note: It is safe to use the same format object from different threads. No synchronization is required or performed by the library. The printf function always takes the format object by constant reference.

You construct the format object with a library’s format function. It accepts different kinds of format strings:

Compile-time Constant Character Array

C++
format(L"File size: {0}");
const char fmt[] = "File size: {0}";
format(fmt);

For the compile-time constant character array, the library saves a call to strlen, as it is able to know the size of the format string at compile time. Note that in this case, the format string must fill the whole array and must not contain any symbols you would not want in the output, like ‘\0’.

If you still have your format string in a compile-time constant character array, but want the library to call strlen for it, cast the format string to (const char_type *) before passing it to the format function.
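The difference between the two cases comes down to how the length is obtained. A minimal sketch, using hypothetical function names rather than the library's internals: when the argument is an array, its size is a template parameter and no run-time scan is needed; when it is a pointer, strlen has to walk the string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch: the array size N is a template parameter, so the length
// is known at compile time. The last element is assumed to be the terminator.
template<class Char, std::size_t N>
constexpr std::size_t array_length(const Char (&)[N])
{
    return N - 1;
}

// For a plain pointer, the length has to be measured at run time.
inline std::size_t pointer_length(const char *text)
{
    return std::strlen(text);
}
```

This also shows why the string must fill the whole array: for char buf[16] = "abc", array_length(buf) yields 15 while pointer_length(buf) yields 3, which is the situation where the cast to a pointer is needed.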

Pointer to Constant null-terminated String

C++
const wchar_t *fmt = GetFormatString();
format(fmt);

The library calls the strlen function to get the length of the string and stores pointers to the beginning and end of the passed format string in the returned format object.

You must make sure the memory pointed to outlives the returned format object.

Note: Passing a pointer to a non-const format string is prohibited.

Constant Reference to std::basic_string

C++
std::wstring fmt(L"File size: {0}");
format(fmt);

The library internally stores a reference to the passed std::basic_string. You must make sure the format string lives longer than the returned format object.

Note: Passing a non-const l-value reference is prohibited.

Temporary std::basic_string

C++
std::wstring get_fmt()
{
    return L"File size: {0}";
}
format(get_fmt());

The library will “move” or “take ownership” of the passed object and internally store it in the format object. In short, it is safe to create a format object from a temporary std::basic_string.
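The borrow-versus-own distinction between the last two cases can be sketched with two overloads selected by value category. The types borrowed_format and owned_format and the function make_format are hypothetical stand-ins, not the library's classes. Unlike the library, this sketch also accepts non-const l-values.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical holder types, not the library's: one refers to a caller-owned
// string, the other owns a string moved out of a temporary.
struct borrowed_format
{
    const std::string &text;     // caller must keep the string alive
};

struct owned_format
{
    std::string text;            // safe to outlive the original temporary
};

// L-value strings are borrowed ...
inline borrowed_format make_format(const std::string &fmt)
{
    return borrowed_format{fmt};
}

// ... while temporaries are moved into the returned object.
inline owned_format make_format(std::string &&fmt)
{
    return owned_format{std::move(fmt)};
}
```

Note that the two overloads return different types, so code that stores the result has to name the type with auto or decltype.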

Constant Reference to Boost Range of Characters

C++
std::wstring fmt(L"##File size: {0}##");
auto subrange = boost::make_iterator_range(fmt, 2, -2);
format(subrange);

The same considerations as in “constant reference to std::basic_string” apply.

Temporary Boost Range of Characters

C++
std::wstring fmt(L"##File size: {0}##");
format(boost::make_iterator_range(fmt, 2, -2));

The same considerations as in “temporary std::basic_string” apply.

Important note: The type of the returned format object will be different for each type of format string parameter. If you are going to store the returned object for a long time, use the auto or decltype keywords.

C++
class A
{
    decltype(format(L"")) MyFormat;
    decltype(format(std::wstring())) MySecondFormat;

public:
    A(std::wstring &&fmt)
    {
        MyFormat = format(L"{0}");
        MySecondFormat = format(std::move(fmt));
    }
};

The format string is always passed as a first parameter to the printf function. As you may see, the format function, which is used to construct the format object, always accepts a single parameter. So, for simplicity, you can omit calling the format function when you are only constructing the temporary format object to be passed to printf. The following two lines are equivalent:

C++
printf(format(L"File size: {0}"), ...);
printf(L"File size: {0}", ...);

If you omit the call to the format function, printf will call it for you. There is no performance penalty in this. There is a penalty, however, if you are reconstructing the same format object several times. So, instead of writing:

C++
printf(L"size = {0}", params(100));
printf(L"size = {0}", params(101));
printf(L"size = {0}", params(102));
printf(L"size = {0}", params(103));

use:

C++
auto fmt = format(L"size = {0}");

printf(fmt, params(100));
printf(fmt, params(101));
printf(fmt, params(102));
printf(fmt, params(103));
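To see why caching pays off, consider a simplified model of format "compilation". The compiled_format class below is a hypothetical sketch and not the library's format object: it handles only single-digit {N} placeholders and takes pre-rendered strings, but it shows the split between parsing once in the constructor and rendering many times.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical "compiled" format: literal text pieces interleaved with
// parameter indexes. Only the simple "{N}" form with a single digit is handled.
struct compiled_format
{
    struct segment { std::string text; int index; };  // index < 0: literal text
    std::vector<segment> segments;

    explicit compiled_format(const std::string &fmt)
    {
        std::string literal;
        for (std::size_t i = 0; i < fmt.size(); ++i) {
            const bool placeholder = fmt[i] == '{' && i + 2 < fmt.size()
                && fmt[i + 1] >= '0' && fmt[i + 1] <= '9' && fmt[i + 2] == '}';
            if (!placeholder) { literal += fmt[i]; continue; }
            if (!literal.empty()) segments.push_back(segment{literal, -1});
            literal.clear();
            segments.push_back(segment{std::string(), fmt[i + 1] - '0'});
            i += 2;                          // skip the digit and '}'
        }
        if (!literal.empty()) segments.push_back(segment{literal, -1});
    }

    // Rendering walks the prepared segments; the format string is not re-parsed.
    std::string render(const std::vector<std::string> &args) const
    {
        std::string out;
        for (const segment &s : segments)
            out += s.index < 0 ? s.text
                               : args.at(static_cast<std::size_t>(s.index));
        return out;
    }
};
```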

If an incorrect format string is specified, the library throws the bad_extended_format_string exception.

Parameter Object

The second parameter to the printf function is a parameter object. You construct a parameter object with a call to the helper function params. The helper function receives a variable number of parameters and internally stores a constant reference to each of them. You can still pass immediate values; the library is smart enough to take copies of those.

In all other cases, remember that the parameter object stores a constant reference to the parameter. Make sure the parameter lives longer than the parameter object.

The maximum number of parameters is controlled using the BELT_TSP_MAX_PARAMS preprocessor constant, which defaults to 10.

The parameter object has an operator(), which you can use to add more parameters to it. It has two overloads. The first overload allows you to add a single parameter to an existing parameter object:

C++
auto p1 = params(100);
auto p2 = p1(200);

printf(fmt, p2);

The second overload allows you to join two parameter objects together:

C++
auto p1 = params(100);
auto p2 = params(200, 300);
auto p3 = p1(p2);

printf(L"{0}", p3);
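The two operator() overloads can be modeled with a tuple that grows by one type per call. The sketch below is an assumption-laden illustration: it uses C++11 variadic templates (which Visual C++ 2010 did not provide), stores copies instead of constant references, and does not handle string literals. The names param_pack and make_params are hypothetical.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical sketch using variadic templates and std::tuple. It stores
// copies of the values, whereas the library stores constant references.
template<class... Ts>
struct param_pack
{
    std::tuple<Ts...> values;

    // First overload: append a single value, producing a larger pack.
    template<class U>
    param_pack<Ts..., U> operator()(const U &extra) const
    {
        return param_pack<Ts..., U>{
            std::tuple_cat(values, std::make_tuple(extra))};
    }

    // Second overload: join with another pack.
    template<class... Us>
    param_pack<Ts..., Us...> operator()(const param_pack<Us...> &other) const
    {
        return param_pack<Ts..., Us...>{
            std::tuple_cat(values, other.values)};
    }
};

// Builds a pack from arithmetic values (string literals are not handled here).
template<class... Ts>
param_pack<Ts...> make_params(const Ts &...values)
{
    return param_pack<Ts...>{std::tuple<Ts...>(values...)};
}
```

Each call returns a new object of a different type, which is why the examples above store the results with auto.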

Supported Parameter Types

  • All integer, character, and floating-point values or constant references to them.
    C++
    int p1 = 10;
    ... params(100, 4.5, 't', p1) ...
  • Compile-time character arrays. For compile-time character arrays, the library does not call the strlen function, and takes the length of the string from the array’s size at compile time. If your string occupies only part of the array and is terminated with a zero character, cast the array to const char_type *.
    C++
    ... params(L"first string", L"second string")...
  • Pointers to constant zero-terminated strings. The library calls the strlen function to determine the length of the string.
    C++
    const char *val = "first string";
    ... params(val)...
  • Constant references to std::basic_string objects. If you are passing temporaries, make sure they live long enough!
    C++
    std::wstring str1(L"first string");
    ... params(str1, std::wstring(L"second string")) ...
  • Constant references to boost character ranges.
    C++
    std::wstring str1(L"first string");
    ... params(boost::make_iterator_range(str1, 2, 0)) ...
  • Constant reference to the std::tm structure.
  • ts_printf::time_t value or constant reference to it (constructed from std::time_t using the helper function time).
  • Constant reference to a FILETIME structure (Windows only).
  • ts_printf::no_value constant. Passing this constant tells the library to ignore this parameter. If the corresponding placeholder is found in the format string, nothing is rendered.

Other parameter types are not supported.

Output Adaptors

The library supports the concept of output adaptors. An output adaptor is a component that receives rendered output and forwards it to the output. Currently, the library has three different adaptors, which we describe below.

Output Iterator Adaptor

This adaptor is the most generic one. Use this adaptor if other library supplied adaptors do not suit your needs and you don’t want to create your own. This adaptor does not provide any boundary checking. In order to use this adaptor, add the following include to your project:

C++
#include <printf/output_iterator_adaptor.h> 

The library provides the following overload of the printf function:

C++
template<class OutputIterator,other_unspecified_arguments>
unspecified_return_type printf<policy>
	(format_object, params_object, OutputIterator begin);

where:

  • format_object
  • Constant reference to a format object. You can also directly pass the value to construct a format object, which will be constructed automatically.

  • params_object
  • Parameter object. Use the helper function params to construct it.

  • begin
  • Output iterator pointing to the beginning of the output sequence.

  • policy
  • Optional return policy. If omitted (along with the angle brackets), the function returns the updated output iterator, pointing to the location just after the last written character. If you pass the return_range policy, the function returns the boost range of the sequence it created.

The function returns either the output iterator or the boost iterator range, depending on the policy. This function does not perform any boundary checking. You must make sure you have enough storage available before calling it.

Basic String Adaptor

This adaptor allows you to receive the result of the printf function as the standard string object, or append the rendered character stream to the existing string object. In order to use this adaptor, add the following include to your project:

C++
#include <printf/basic_string_adaptor.h>

It provides two overloads of the printf function. The first overload renders the result stream, creates a string object, and returns it:

C++
template<unspecified_arguments>
std::basic_string<unspecified_arguments>
           printf(format_object, params_object);

Where:

  • format_object
  • Constant reference to a format object. You can also directly pass the value to construct a format object, which will be constructed automatically.

  • params_object
  • Parameter object. Use the helper function params to construct it.

The returned string has the same character type as in the format_object and default allocator.

The second overload lets you append data to the existing string:

C++
template<unspecified_arguments>
std::basic_string<unspecified_arguments> &printf(format_object,
    params_object, std::basic_string<unspecified_arguments> &result);

where:

  • format_object
  • Constant reference to a format object. You can also directly pass the value to construct a format object, which will be constructed automatically.

  • params_object
  • Parameter object. Use the helper function params to construct it.

  • result
  • A reference to a string to which output should be appended.

The function returns the reference to the same string passed in the last parameter.

Character Array Adaptor

Using this adaptor instructs the library to write the output to the given character array with optional boundary checking. It works with standard C-style character arrays as well as objects of class std::array<char_type,N>. In order to use this adaptor, add the following include to your project:

C++
#include <printf/array_adaptor.h>

Policies

Each overload of the printf function accepts an optional policy. You use the macro BELT_TSP_CAA_POL to construct the policy. This macro accepts two parameters, the first for overflow control policy and the second for return policy. Combining these two parameters, you can adjust the behavior of the function.

Use one of the following keywords for the overflow policy:

  • truncate
  • If output reaches the end of the array, it is silently discarded. The output will be truncated in this case.

  • throw
  • If output reaches the end of the array, an exception of type index_out_of_bounds_exception is thrown.

  • ignore
  • No boundary checking is performed (no code is even generated for that). Use only if you are absolutely sure that the output will not reach the end of the array.

  • debug
  • If output reaches the end of the array, a debug assertion fires. This policy is available only if the _DEBUG preprocessor constant is defined.

Use one of the following keywords to specify the return policy:

  • iterator
  • printf will return the iterator pointing right after the last written character.

  • range
  • printf will return the boost range that stores the beginning and end of the written sequence.

If you do not specify a policy, a default one is used. You can override the default policy by defining the BELT_TSP_DEFAULT_CAA_POLICY preprocessor constant before including the adaptor's header. If it is not overridden, debug builds use debug for overflow and iterator for return, and release builds use ignore for overflow and iterator for return.
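The idea of selecting overflow behavior at compile time can be sketched with a policy template parameter. The types truncate_policy and throw_policy and the function write_checked are hypothetical; in particular, std::out_of_range stands in for the library's index_out_of_bounds_exception, and the real adaptor builds its policy through the BELT_TSP_CAA_POL macro rather than through these classes.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical policy types; the names mirror the keywords above but the
// classes themselves are not the library's.
struct truncate_policy
{
    static void on_overflow() {}                 // silently drop the rest
};

struct throw_policy
{
    static void on_overflow() { throw std::out_of_range("output array is full"); }
};

// Copies `text` into `out`, consulting the policy when the array is full.
// Returns a pointer just past the last character written.
template<class Policy, std::size_t N>
char *write_checked(char (&out)[N], const std::string &text)
{
    char *pos = out;
    for (char ch : text) {
        if (pos == out + N) {
            Policy::on_overflow();
            break;                               // reached only when truncating
        }
        *pos++ = ch;
    }
    return pos;
}
```

Because the policy is a template argument, a policy that performs no check at all can compile down to a plain copy loop, which matches the "no code is even generated" remark for ignore.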

The following pseudo-code shows the prototype of all printf overloads offered by this adaptor:

C++
template<class policy,unspecified_arguments>
unspecified_return_type printf(format_object,
       params_object, array_reference, begin_iterator);

Where:

  • policy
  • Optional policy object (see above). If you are using the default policy, omit this parameter (with angle brackets).

  • format_object
  • Constant reference to a format object. You can also directly pass the value to construct a format object, which will be constructed automatically.

  • params_object
  • Parameter object. Use the helper function params to construct it.

  • array_reference
  • A reference to the (mutable) C-style character array or std::array object.

  • begin_iterator
  • An optional iterator pointing to the location (inside the passed array) from which to start writing output. You may skip this parameter, in which case the rendered output will be written starting from the beginning of the array.

The function returns either the iterator pointing to the location right after the last written character or the boost iterator range, depending on the requested return policy.

Extending the Library: Writing Your Own Output Adaptor

The library makes it very easy to create your own output adaptor and use it during rendering. First, you need to add the following directive into your code:

C++
#include <printf/details/basic_adaptor.h>

You then need to put your output adaptor into the ts_printf namespace and derive your adaptor class from the ts_printf::_details::basic_adaptor class.

Your output adaptor class has to implement two member functions:

C++
template<class Range>
void write(Range &&range);

template<class char_type>
void write(char_type ch,size_t count);

The library passes the boost character range to the first overload. You need to copy the range to your output. The second overload receives a single character and count. You need to write the character count times into your output.
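A minimal sketch of an adaptor with this shape is shown below. The class string_sink is hypothetical. To keep the sketch compilable on its own, it does not derive from ts_printf::_details::basic_adaptor as a real adaptor would, and it accepts any type with begin() and end() in place of a boost range.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical adaptor that appends to a std::string. A real adaptor would
// also derive from ts_printf::_details::basic_adaptor, which is omitted here
// so the sketch compiles without the library.
class string_sink
{
    std::string &target_;

public:
    explicit string_sink(std::string &target) : target_(target) {}

    // Receives a character range (anything with begin()/end()) and copies it.
    template<class Range>
    void write(Range &&range)
    {
        for (auto ch : range)
            target_ += ch;
    }

    // Receives one character and writes it `count` times.
    template<class char_type>
    void write(char_type ch, std::size_t count)
    {
        target_.append(count, ch);
    }
};
```

The second overload is what makes padding cheap: the renderer can ask for a run of fill characters without building a temporary string first.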

The last step is to create a free function in the ts_printf namespace. It should receive a format object, a parameter object, and whatever you need to instantiate your output adaptor. It is recommended not to name your free function printf, otherwise you may get compilation errors. You may still name it printf if you do not plan to use other output adaptors; as a bonus, you will get automatic format object construction.

In the body of this free function, construct an instance of your output adaptor and pass it to the render function exposed by the format object.

See the following implementation example:

C++
template<class Format, class ParamsHolder, class OutputIterator>
void printf(const Format &format, const ParamsHolder &params, OutputIterator it)
{
    _details::output_iterator_adaptor<OutputIterator> adaptor(it);
    format.render(params, adaptor);
}

User-Defined Parameter Object

Starting from version 3, the library supports user-defined parameter objects. The standard parameter object, described above, is compile-time static: it "knows" the type of each parameter at compile time. This is the common case, but sometimes you have a list of values to display and do not know the type of each value at compile time. Such a value may be a COM VARIANT, boost::variant, boost::any, or any user-defined variant type.

To use such values with the library, pass a special object as the second parameter to any of the printf overloads (instead of the standard parameter object). By default, the library expects its operator () to be defined in the following way:

C++
struct UserDefinedParameterObject
{
    // ...
    template<class Helper>
    void operator ()(size_t parameter_index, const Helper &h) const
    {
        // determine the type of the object 
        // (need to convert runtime type to compile-time type):
        // for example, using switch:
        switch (get_parameter_type(parameter_index))
        {
            case tpInteger:
                h(convert_parameter_to_integer(parameter_index));	// calls h(int)
                break;
            case tpFloatingPoint:
                h(convert_parameter_to_floating_point(parameter_index));	// calls h(double)
                break;
        }
    }
};

Once the user-defined parameter object has determined the type of the specified parameter, it should call operator () of the passed helper object with the value converted to one of the library-supported types (see above).
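The run-time to compile-time dispatch can be tried out in isolation with a stand-in helper. Everything in the sketch below is hypothetical: dynamic_value plays the role of a variant type, dynamic_params is the user-defined parameter object, and recording_helper merely records which overload was selected, where the library's helper would render the value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical run-time typed value standing in for VARIANT or boost::variant.
struct dynamic_value
{
    enum kind_t { integer, floating } kind;
    long long i;
    double    d;
};

// Parameter object: maps the run-time type tag to a statically typed call.
struct dynamic_params
{
    std::vector<dynamic_value> values;

    template<class Helper>
    void operator()(std::size_t parameter_index, const Helper &h) const
    {
        const dynamic_value &v = values.at(parameter_index);
        switch (v.kind) {
            case dynamic_value::integer:  h(v.i); break;   // calls h(long long)
            case dynamic_value::floating: h(v.d); break;   // calls h(double)
        }
    }
};

// Stand-in for the helper the library would pass in; it only records which
// overload was selected.
struct recording_helper
{
    std::string *log;
    void operator()(long long) const { *log += "int;"; }
    void operator()(double)    const { *log += "fp;"; }
};
```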

The library also allows you to customize the way it uses your parameter object. If defining operator () this way is inconvenient, you may override the ts_printf::_details::render_user_parameter_helper function:

C++
namespace ts_printf { namespace _details {
template<class Params, class Helper>
inline void render_user_parameter_helper
	(const Params &params, size_t index, const Helper &helper)
{
	// implement custom dispatching
}
} }

Where:

  • params - constant reference to your parameter object
  • index - index of the parameter to render
  • helper - constant reference to helper object (as described above)

Performance Tests

I was asked in the comments to provide some performance comparisons. So, the sources were updated to include a perftest project that renders one integer, one double, and one string using:

  1. sprintf to character array on stack
  2. ts_printf to the same array
  3. ts_printf to basic_string, constructed on each iteration
  4. ts_printf to the character array (as in case 2), but this time format object is not cached and re-created on each iteration
  5. using ostringstream to construct a string

Each test is run 1 million times and the average iteration time is computed. The test application is compiled as native 64-bit with the full optimization settings provided by Visual C++ 2010 SP1 and run on two computers (both under Windows 7 64-bit). Note: the test is a single-threaded application, so multiple cores are not used and raw processor speed matters more. Two methods of execution time measurement are implemented; the results below were obtained using performance counters. The other method gives similar numbers.
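The measurement scheme described above (run the body many times, divide the total by the iteration count) can be sketched portably. This is not the perftest project's code: average_microseconds and sprintf_case are hypothetical names, and std::chrono::steady_clock is used here in place of the Windows performance counters.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>

// Hypothetical harness: runs `body` the given number of times and returns the
// average time per iteration in microseconds, using std::chrono::steady_clock.
template<class Body>
double average_microseconds(std::size_t iterations, Body body)
{
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
        body(i);
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> total = stop - start;
    return total.count() / static_cast<double>(iterations);
}

// Example workload, similar in spirit to the sprintf case: one integer, one
// double and one string rendered into a stack buffer.
inline int sprintf_case(std::size_t i)
{
    char buffer[128];
    return std::snprintf(buffer, sizeof buffer, "%d %f %s",
                         static_cast<int>(i), 3.25, "text");
}
```

A call such as average_microseconds(1000000, sprintf_case) mirrors the 1 million iterations used above. Keep in mind that an optimizing compiler may remove work whose result is never used, so real measurements should consume the output in some way.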

Core2 Quad Q9400 2.66GHz

sprintf: 0.8865 microseconds / iteration
ts_printf to array: 0.1777 microseconds / iteration
ts_printf to std::string: 0.4293 microseconds / iteration
ts_printf to array (no fmt cache): 0.3912 microseconds / iteration
std::ostringstream: 3.3936 microseconds / iteration

Core i5 750 2.66GHz

sprintf: 0.7017 microseconds / iteration
ts_printf to array: 0.1283 microseconds / iteration
ts_printf to std::string: 0.3371 microseconds / iteration
ts_printf to array (no fmt cache): 0.2710 microseconds / iteration
std::ostringstream: 2.2100 microseconds / iteration

Change History

08/16/2011 - Version 4
  • Fixed non-compile bug in timedata_renderer.h (::time_t instead of std::time_t).
08/15/2011 - Version 3
  • Several bug fixes, including possible incorrect code generation with full optimization
  • Does not generate level-4 warnings
  • User-defined parameter object support added
  • Automatic "0x" or "0X" prefix for hexadecimal numbers support added
02/24/2011 - Version 2
  • Several bug fixes

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)