Deferred Logging Using Variadic Templates

Philipp Sch

5.00/5 (8 votes)

22 Sep 2015CPOL8 min read

13.6K

Introduction

Recently I read the excellent article by valdok introducing a logging engine which only logs information when its really needed [http://www.codeproject.com/Articles/64573/Superior-logging-design-Fast-but-comprehensive]. To summarize the main idea shortly, a mechanism is provided where you can add Checkpoints including string formatting(!) and both, string formatting and writing to the log will only be done if it is needed. If no error occurs, the log information will neither be formatted nor be written. This is a great advantage because string formatting and writing is very expensive. Additionally, the log file will be inflated when writing all log entries with far too much information which is not needed at all for the debug process. Therefore a fast logging system is provided but you have to pay the price that many features are missing which would require complex operations. Please refer to valdok's article when you are interested in all the details and for a discussion of the idea.

However, I tried to implement the code and ran into the problem that a cast from int* to va_list does not compile using unix x64; gcc/4.9.2. I have to admit that I did not tried hard to solve that problem directly because I rather wanted to use Variadic Templates which were introduced in C++11. The code I present here uses no casts at all and compiles using gcc/4.9.2 on a unix x64 machine and using VC++ 2015 on Windows. Additionally, the code has one advantage because any object (class / struct) can be used for logging in the default C++ manner by providing a suitable operator<<.

Implementation

All delayed logging information is contained in so called Checkpoints. Each thread has its own Stack of checkpoints entirely build on the stack (no allocation or deallocation needed at all). This Checkpoint stack (CS) is build by defining the interface ICheckpoint which which will automatically register and unregister Checkpoints from the CS.

C++

struct ICheckpoint
{
    inline ICheckpoint()
    {
        m_pNext = s_pTop;
        s_pTop = this;
    }

    inline ~ICheckpoint()
    {
        // As all Checkpoints reside on the stack, stack unwinding guarantees that all elements are deleted in the correct order!
        // Consequently, the assert should only fire when something was done wrong.
        assert(this == s_pTop);
        s_pTop = m_pNext;
    }

    // First stack entry
    // the keyword thread_local ensures that each thread has its own stack automatically as the pointer is stored in the TLS (thread local storage)
    thread_local static ICheckpoint* s_pTop;

    // Next Checkpoint after this one in the Checkpoint Stack
    ICheckpoint* m_pNext;

    // Formats and writes the Checkpoint to file
    // NOT thread safe
    virtual void print(std::ostream& out) = 0;
};

In C++11 the keyword thread_local was introduced so that one is not longer depending on platform specific implementations like __declspec(thread) (MSVC)
The purely virtual method print instructs the Checkpoint to format the given expression and write everything to the ostream out. For me this it is sufficient to just use an ostream but you can easily use your own streams.
The next step is to implement the Checkpoints defining the CCheckpoint class. It safes all variables of the unformatted expression as references. They will only be used if the user decides to write the checkpoints to file.

C++

template<typename... Arguments>
class CCheckpoint : public ICheckpoint
{
public:
    // Just copy all arguments to the m_args tuple.
    inline CCheckpoint(const Arguments&... args)
        : m_args(std::forward<const Arguments&>(args)...)
    {
    }

    // NOT Thread safe!
    virtual void print(std::ostream& out) override
    {
        out << "\t";
        // Recursive expansion of the std::tuple
        detail<sizeof...(Arguments) == 0, sizeof...(Arguments), 0, std::tuple<const Arguments&...> >::recursivePrint(out, m_args);
    }

private:
    // Here the references to all Arguments are safed
    std::tuple<const Arguments&...> m_args;
};

// helper function to create CCheckpoint which deduces the template parameters automatically
template<typename... Arguments>
inline CCheckpoint<Arguments...> Checkpoint(const Arguments&... args)
{
    return CCheckpoint<Arguments...>(std::forward<const Arguments&>(args)...);
}

To illustrate how the variadic templates work, here is an example, of how the constructor of CCheckpoint works:

C++

// First example, unfortunately classes do not automatically deduce template parameters, therefore here all template parameters have to be specified explicitly
CCheckpoint< ... > ("Text, number, decimals ", __LINE__, " ", 0.56f, " ", 0.123);

// Second example, here the template parameters are deduced automatically because Checkpoint is a function
auto chkp = Checkpoint("Text, number, decimals ", __LINE__, " ", 0.56f, " ", 0.123);

The constructor just copies references of all given arguments to the std::tuple. As it is derived from ICheckpoint the templated CCheckpoint is automatically registered on the CS. The print function formats the string and writes everything to file on request.

It is very important to understand, that CCheckpoint just saves references and no deep copies. Therefore all objects handed over the CCheckpoint have to be alive as long as the CCheckpoint is destroyed, i.e. something like

C++

{
    int* p = new int(10);
    auto chkp = Checkpoint(*p);
    delete p; p = nullptr;
    PrintCheckpoints();        // Function will be explained below.
}

will crash, because the integer value is not copied and the pointer is already deleted when the Checkpoints shall be written. However, as long as you only use stack objects you don't have to bother at all.

To format the string m_args is expanded recursively and each element is written to the stream individually. Let's take a look on the recursivePrint method:

C++

// NOT Thread safe!
template<bool Last, size_t NTotal, size_t N, typename Tuple>
struct detail
{
    static inline void recursivePrint(std::ostream& out, Tuple& t)
    {
        out << std::get<N>(t);
        detail<N == NTotal - 1, NTotal, N + 1, Tuple>::recursivePrint(out, t);
    }
};
// Specialization which stops the recursion
// NOT Thread safe!
template<size_t NTotal, size_t N, typename Tuple>
struct detail<true, NTotal, N, Tuple>
{
    static inline void recursivePrint(std::ostream& out, Tuple& t)
    {
        out << std::endl;
    }
};

recursivePrint resides in the detail structure which takes four template parameters: Last (true, if all values in m_args have been written, false otherwise), NTotal (Total number of elements in the tuple), N (element index of the element which shall be written) and Tuple. The Tuple parameter is for convinience because we don't want to take care of all types of all arguments on our one but rather let the compiler do that job.
The second definition of recursivePrint explicitly requests that bool Last is true. It just stops the recursion and it adds a newline to the ostream. Note, that template specialization is only allowed for classes and structures and not for functions therefore it is necessary to define static member functions and not just global recursivePrint functions. The code will also not compile when no specialized version of recursivePrint is provided because the compiler knows that for N = NTotal an out of range situation is created when calling std::get<T>(t);
At this point I will also point out another advantage of this implementation: You can pass any type to Checkpoint, the code will compile as long as you provide a matching operator<< , so the code is not limited to strings, ints and decimals.
recursivePrint is first called from the CCheckpoint::print method with the command

C++

detail<sizeof...(Arguments) == 0, sizeof...(Arguments), 0, std::tuple<const Arguments&...> >::recursivePrint(out, m_args);

I will shortly discuss what is happening here. The sizeof...(Arguments) operator returns the number of template parameters specified in Arguments. If no template arguments are given at all, the operator returns 0 and the specialized version of recursivePrint will be called immediately. NTotal is also determined by the sizeof... operator. Importantly, the array will be expanded going from 0 to NTotal to reproduce correct ordering in the output log file.
Within recursivePrint, the next recursion is called by

C++

detail<N == NTotal - 1, NTotal, N + 1, Tuple>::recursivePrint(out, t);

and just incrementing N.

How are all Checkpoints how written to file? This job is done by the PrintCheckpoints method:

C++

void PrintCheckpoints()
{
    for(ICheckpoint* pCP = ICheckpoint::s_pTop; pCP != nullptr; pCP = pCP->m_pNext)
        pCP->print(*g_pErrorOut.get());
}

Here one very important aspect has to be mentioned: ICheckpoint::s_pTop is declared to be thread_local. This is nice because each thread automatically has its own CS. However, to write checkpoints to the log the method must be called by the same thread whose checkpoints shall be written and not by any other thread!

Finally, I show you the CHECKPOINT Macro used to create Checkpoints in your code. Why do we need a Macro? Because the Checkpoint has to live on the stack meaning that you have to create a variable name on your own for each checkpoint. The macro, however, takes care of that. The reason I want to show you the definition is that the macro uses the C++11 feature of a variadic define corresponding to variadic templates

C++

#define CHECKPOINT(...) \
    auto RAND_IDENTIFIER(chkpt) = logging::Checkpoint(__VA_ARGS__);

where __VA_ARGS__ just forwards all arguments given to the macro to the Checkpoint. RAND_IDENTIFIER creates an unique name for the new checkpoint. I took this Macro directly from valdok's article.

Additionally, I define two more macros PUTLOG(...) and PUTLOG_AND_PRINT_CHECKPOINTS(...) which immediately format and write the given expression. Additionally PUTLOG_AND_PRINT_CHECKPOINTS will also format and write all Checkpoints in the CS of the calling thread.

Remarks

Multithreading
In principle all functions presented here are thread safe and therefore no mutexes are needed at all. However, when one single std::ofstream is used to write all checkpoints from all threads, the calls to ofstream are of course not thread safe. But this is of course an implementation detail. I decided for my purposes, that is it sufficient to write all information to one file and therefore implemented thread synchronization in all my macros to guarantee that no messages are mixed up with other threads. If you prefer to use a different target for your logging info (e.g. a different stream per thread) then no thread synchronization is needed at all.
Wide chars
Using wchar_t instead of char to format error output is easily possible because everything just depends on templates and all you need is a proper operator<<. However, in the above implementation all ocurrances of std::ostream have to be replaced with std::wostream to handle wchar_t properly. Therefore the logging machinery does not care if you hand over a wchar_t or a char. What matters is if the provided target stream can handle it.
Exceptions
As already mentioned, the Checkpoint stack resides on the stack and not on the heap. Consequently a call to PrintCheckpoints() will not work inside the catch block because after an exeption has been thrown the stack is already destroyed in the catch handler. Instead PrintCheckpoints() has to be called before stack unwinding occurs. On Windows you may use the SEH exception model (please refer again to the article of valdok for an introduction to SEH exceptions). What can be done when just the STL shall be used? First, the STL provides signal handlers which are called for Segmentation Faults, Illegal instructions and Floating point exceptions before stack unwinding occurs. Within the handlers PrintCheckpoints() can be called, the Checkpoints are written and the program exits. For user-defined exceptions PrintCheckpoints() must be called in the constructor of the exception and the Checkpoints will be written.
Unfortunately this does not work for all not user-defined exceptions thrown. E.g. if std::vector::at() throws out_of_range stack unwinding will take place. However, using try and catch cleverly can circumvent this problem.

Using the code

Using the log is very simple: You just have to know the three macros PUTLOG, PUTLOG_AND_PRINT_CHECKPOINTS and CHECKPOINT. The file test.cpp contains a small example which maybe gives you some ideas of how to use the three macros in your own exceptions, in c++ signal handlers and own debug objects. I would suggest to think about if std::ostream is suitable as a target destination for your log output in regarding your special problem (c.f. Remarks).

Conclusion

I hope I could explain to you how deferred logging can be implemented using C++11 and variadic templates. I think that the basic idea by valdok is very useful because no unnecessary information is written to the log reducing the work of the programmers a lot because you can be sure that each information written into the log really has a meaning. Additionally it keeps the CPU load at a minimum because no allocations are needed and no formatting or writing is done until its necessary or explicitly requested.
Let me also add one comment to the templates: The whole machinery looks quite complicated and therefore not really fast but note that all templates are evaluated at compile time and not at runtime, i.e. the compiler has to deal with it.

Questions, improvements or comments are highly appreciated.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)