Introduction
Recently I read the excellent article by valdok introducing a logging engine which only logs information when its really needed [http://www.codeproject.com/Articles/64573/Superior-logging-design-Fast-but-comprehensive]. To summarize the main idea shortly, a mechanism is provided where you can add Checkpoints including string formatting(!) and both, string formatting and writing to the log will only be done if it is needed. If no error occurs, the log information will neither be formatted nor be written. This is a great advantage because string formatting and writing is very expensive. Additionally, the log file will be inflated when writing all log entries with far too much information which is not needed at all for the debug process. Therefore a fast logging system is provided but you have to pay the price that many features are missing which would require complex operations. Please refer to valdok's article when you are interested in all the details and for a discussion of the idea.
However, I tried to implement the code and ran into the problem that a cast from int*
to va_list
does not compile using unix x64; gcc/4.9.2. I have to admit that I did not tried hard to solve that problem directly because I rather wanted to use Variadic Templates which were introduced in C++11. The code I present here uses no casts at all and compiles using gcc/4.9.2 on a unix x64 machine and using VC++ 2015 on Windows. Additionally, the code has one advantage because any object (class / struct) can be used for logging in the default C++ manner by providing a suitable operator<<.
Implementation
All delayed logging information is contained in so called Checkpoints. Each thread has its own Stack of checkpoints entirely build on the stack (no allocation or deallocation needed at all). This Checkpoint stack (CS) is build by defining the interface ICheckpoint which which will automatically register and unregister Checkpoints from the CS.
struct ICheckpoint
{
inline ICheckpoint()
{
m_pNext = s_pTop;
s_pTop = this;
}
inline ~ICheckpoint()
{
assert(this == s_pTop);
s_pTop = m_pNext;
}
thread_local static ICheckpoint* s_pTop;
ICheckpoint* m_pNext;
virtual void print(std::ostream& out) = 0;
};
In C++11 the keyword thread_local
was introduced so that one is not longer depending on platform specific implementations like __declspec(thread)
(MSVC)
The purely virtual method print instructs the Checkpoint to format the given expression and write everything to the ostream out. For me this it is sufficient to just use an ostream but you can easily use your own streams.
The next step is to implement the Checkpoints defining the CCheckpoint
class. It safes all variables of the unformatted expression as references. They will only be used if the user decides to write the checkpoints to file.
template<typename... Arguments>
class CCheckpoint : public ICheckpoint
{
public:
inline CCheckpoint(const Arguments&... args)
: m_args(std::forward<const Arguments&>(args)...)
{
}
virtual void print(std::ostream& out) override
{
out << "\t";
detail<sizeof...(Arguments) == 0, sizeof...(Arguments), 0, std::tuple<const Arguments&...> >::recursivePrint(out, m_args);
}
private:
std::tuple<const Arguments&...> m_args;
};
template<typename... Arguments>
inline CCheckpoint<Arguments...> Checkpoint(const Arguments&... args)
{
return CCheckpoint<Arguments...>(std::forward<const Arguments&>(args)...);
}
To illustrate how the variadic templates work, here is an example, of how the constructor of CCheckpoint
works:
CCheckpoint< ... > ("Text, number, decimals ", __LINE__, " ", 0.56f, " ", 0.123);
auto chkp = Checkpoint("Text, number, decimals ", __LINE__, " ", 0.56f, " ", 0.123);
The constructor just copies references of all given arguments to the std::tuple
. As it is derived from ICheckpoint
the templated CCheckpoint
is automatically registered on the CS. The print function formats the string and writes everything to file on request.
It is very important to understand, that CCheckpoint just saves references and no deep copies. Therefore all objects handed over the CCheckpoint
have to be alive as long as the CCheckpoint
is destroyed, i.e. something like
{
int* p = new int(10);
auto chkp = Checkpoint(*p);
delete p; p = nullptr;
PrintCheckpoints(); }
will crash, because the integer value is not copied and the pointer is already deleted when the Checkpoints shall be written. However, as long as you only use stack objects you don't have to bother at all.
To format the string m_args
is expanded recursively and each element is written to the stream individually. Let's take a look on the recursivePrint
method:
template<bool Last, size_t NTotal, size_t N, typename Tuple>
struct detail
{
static inline void recursivePrint(std::ostream& out, Tuple& t)
{
out << std::get<N>(t);
detail<N == NTotal - 1, NTotal, N + 1, Tuple>::recursivePrint(out, t);
}
};
template<size_t NTotal, size_t N, typename Tuple>
struct detail<true, NTotal, N, Tuple>
{
static inline void recursivePrint(std::ostream& out, Tuple& t)
{
out << std::endl;
}
};
recursivePrint
resides in the detail structure which takes four template parameters: Last
(true, if all values in m_args
have been written, false otherwise), NTotal
(Total number of elements in the tuple), N
(element index of the element which shall be written) and Tuple
. The Tuple
parameter
is for convinience because we don't want to take care of all types of all arguments on our one but rather let the compiler do that job.
The second definition of recursivePrint explicitly requests that bool Last
is true. It just stops the recursion and it adds a newline to the ostream. Note, that template specialization is only allowed for classes and structures and not for functions therefore it is necessary to define static member functions and not just global recursivePrint functions. The code will also not compile when no specialized version of recursivePrint is provided because the compiler knows that for N = NTotal
an out of range situation is created when calling std::get<T>(t);
At this point I will also point out another advantage of this implementation: You can pass any type to Checkpoint
, the code will compile as long as you provide a matching operator<<
, so the code is not limited to strings, ints and decimals.
recursivePrint
is first called from the CCheckpoint::print
method with the command
detail<sizeof...(Arguments) == 0, sizeof...(Arguments), 0, std::tuple<const Arguments&...> >::recursivePrint(out, m_args);
I will shortly discuss what is happening here. The sizeof...(Arguments)
operator returns the number of template parameters specified in Arguments. If no template arguments are given at all, the operator returns 0 and the specialized version of recursivePrint
will be called immediately. NTotal
is also determined by the sizeof...
operator. Importantly, the array will be expanded going from 0 to NTotal
to reproduce correct ordering in the output log file.
Within recursivePrint
, the next recursion is called by
detail<N == NTotal - 1, NTotal, N + 1, Tuple>::recursivePrint(out, t);
and just incrementing N
.
How are all Checkpoints how written to file? This job is done by the PrintCheckpoints
method:
void PrintCheckpoints()
{
for(ICheckpoint* pCP = ICheckpoint::s_pTop; pCP != nullptr; pCP = pCP->m_pNext)
pCP->print(*g_pErrorOut.get());
}
Here one very important aspect has to be mentioned: ICheckpoint::s_pTop
is declared to be thread_local. This is nice because each thread automatically has its own CS. However, to write checkpoints to the log the method must be called by the same thread whose checkpoints shall be written and not by any other thread!
Finally, I show you the CHECKPOINT Macro used to create Checkpoints in your code. Why do we need a Macro? Because the Checkpoint has to live on the stack meaning that you have to create a variable name on your own for each checkpoint. The macro, however, takes care of that. The reason I want to show you the definition is that the macro uses the C++11 feature of a variadic define corresponding to variadic templates
#define CHECKPOINT(...) \
auto RAND_IDENTIFIER(chkpt) = logging::Checkpoint(__VA_ARGS__);
where __VA_ARGS__
just forwards all arguments given to the macro to the Checkpoint. RAND_IDENTIFIER
creates an unique name for the new checkpoint. I took this Macro directly from valdok's article.
Additionally, I define two more macros PUTLOG(...)
and PUTLOG_AND_PRINT_CHECKPOINTS(...)
which immediately format and write the given expression. Additionally PUTLOG_AND_PRINT_CHECKPOINTS
will also format and write all Checkpoints in the CS of the calling thread.
Remarks
- Multithreading
In principle all functions presented here are thread safe and therefore no mutexes are needed at all. However, when one single std::ofstream
is used to write all checkpoints from all threads, the calls to ofstream
are of course not thread safe. But this is of course an implementation detail. I decided for my purposes, that is it sufficient to write all information to one file and therefore implemented thread synchronization in all my macros to guarantee that no messages are mixed up with other threads. If you prefer to use a different target for your logging info (e.g. a different stream per thread) then no thread synchronization is needed at all. - Wide chars
Using wchar_t
instead of char
to format error output is easily possible because everything just depends on templates and all you need is a proper operator<<
. However, in the above implementation all ocurrances of std::ostream
have to be replaced with std::wostream to handle wchar_t
properly. Therefore the logging machinery does not care if you hand over a wchar_t
or a char.
What matters is if the provided target stream can handle it. -
Exceptions
As already mentioned, the Checkpoint stack resides on the stack and not on the heap. Consequently a call to PrintCheckpoints()
will not work inside the catch block because after an exeption has been thrown the stack is already destroyed in the catch handler. Instead PrintCheckpoints()
has to be called before stack unwinding occurs. On Windows you may use the SEH exception model (please refer again to the article of valdok for an introduction to SEH exceptions). What can be done when just the STL shall be used? First, the STL provides signal handlers which are called for Segmentation Faults, Illegal instructions and Floating point exceptions before stack unwinding occurs. Within the handlers PrintCheckpoints()
can be called, the Checkpoints are written and the program exits. For user-defined exceptions PrintCheckpoints()
must be called in the constructor of the exception and the Checkpoints will be written.
Unfortunately this does not work for all not user-defined exceptions thrown. E.g. if std::vector::at()
throws out_of_range
stack unwinding will take place. However, using try and catch cleverly can circumvent this problem.
Using the code
Using the log is very simple: You just have to know the three macros PUTLOG
, PUTLOG_AND_PRINT_CHECKPOINTS
and CHECKPOINT
. The file test.cpp contains a small example which maybe gives you some ideas of how to use the three macros in your own exceptions, in c++ signal handlers and own debug objects. I would suggest to think about if std::ostream is suitable as a target destination for your log output in regarding your special problem (c.f. Remarks).
Conclusion
I hope I could explain to you how deferred logging can be implemented using C++11 and variadic templates. I think that the basic idea by valdok is very useful because no unnecessary information is written to the log reducing the work of the programmers a lot because you can be sure that each information written into the log really has a meaning. Additionally it keeps the CPU load at a minimum because no allocations are needed and no formatting or writing is done until its necessary or explicitly requested.
Let me also add one comment to the templates: The whole machinery looks quite complicated and therefore not really fast but note that all templates are evaluated at compile time and not at runtime, i.e. the compiler has to deal with it.
Questions, improvements or comments are highly appreciated.