Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

A service to manage and create logfiles

0.00/5 (No votes)
6 Jan 2004 2  
A service, and client code, to manage and create logfiles

Introduction

In a previous life I worked with a team of people developing server software. Our products consisted of a set of frameworks and a bunch of plugin DLL's. It was never easy, during the QA debugging loop, to know if the thread of control was in the framework exe file or if it had passed to a DLL. It was important to know where it was because that would tell us who to go and talk to.

Our processing took a job in some form and applied one of many possible transforms to it. Each transform was the responsibility of one developer, and the decisions on exactly which transform to apply was the responsibility of another. It made sense to try and log the progress of a job through the entire process, capturing, along the way, the reasons why a particular path was taken. That way we would be able to isolate a transformation problem to a specific module.

I took upon myself the task of writing a logging system that would achieve the following goals.

  1. Survive a crash of the application(s) being logged.
  2. Minimise performance impacts on the application(s) being logged. This was especially important because the logging system wasn't going to be a 'debug only' deal. Release versions of our software would also do logging.
  3. Log, on a per module basis, things that each developer thought was relevant.
  4. Provide the ability to filter the log file so that the stuff you want to see stands out from the crowd.
  5. Be able to 'age' a logfile.
  6. Be an easy retrofit to existing code.
It didn't take long to realise that a service would be the easiest way to achieve goals 1 and 2. The problem was that guaranteeing a message would be logged without impacting the performance of the application being logged meant that using guaranteed 'flush to disk' was out of the question if the logging code was part of the application. You either let the file system cache buffer your output, in which case you risk losing output if your application crashes, or you wait around while the operating system flushes your output to the log file. Running the logging side of the application as a separate process neatly sidesteps that conflict. After all, you don't really care if it takes a couple of seconds for a log record to get to the logfile, as long as it gets there even if your application has crashed.

Goals 3 and 4 were achieved by standardising the log file format. It's amazing how log file formats vary. I chose a CSV formatted text file. Each entry in the log file looks like this:

timestamp,sequence,numericcode,threadID,modulename,servername,messagestring
Commas in the messagestring portion don't separate fields. I think it's important to stress that the logfile format is a function of the code doing the logging. The service itself merely buffers log strings and writes them to permanent storage. It places no interpretation upon the data it handles.

You can open the logfile using notepad and expect to be able to read it. Part 2 of this series, however, will present a viewer capable of displaying and filtering, in real time, the logfile. The viewer is smart enough to cope with embedded linefeeds/newlines in the output. This makes it easy to include error messages from, for example, ODBC drivers, without having to do any massaging of the data, an important consideration if you don't want to slow down your application.

Goal 5 was achieved by adding a thread to the service that performs automatic rollover of logfile names and cleanup of older log files.

Goal 6? I think I achieved it. You be the judge.

Why not use the Windows Event Log?

I decided against using the Windows Event Log primarily because (as pointed out by John M. Drescher below) a debugging session using the log system can potentially generate hundreds of megabytes of log data, far in excess of the amounts of data the Event Log was designed to handle. Additionally, it's important to remember that the Windows Event Log is a shared resource, shared by all software including the Operating System itself. I think it's important to segregate system log data from application log data, particularly when a large part of that data may be debugging data.

System structure

The logging system consists of three parts. The first part is the logging service itself, covered in this article.

The second part is the client side code that gets compiled into the application(s) being logged, also covered in this article.

The third part is the log file viewer which is covered in part 2 of this series.

The logging service

When the logger service starts it immediately spawns a listener thread which opens a named pipe called \\.\pipe\logger. A client connects to the pipe and sends a command string. createlog,myproduct. The service responds by checking to see if it already has a logfile of this name under it's control; if so it creates a new named pipe instance called \\.\pipe\myproduct.log and connects the pipe to the logfile. If, however, the service doesn't yet have a logfile of this name it creates a CLogInstance object, creates a named pipe and connects the two. The ClogInstance object in turn creates a disk file called myproduct.log and copies anything written to the named pipe to the file.

The listener thread manages all the connections and log files through the CPipeData object. Newly created CLogInstance objects are maintained in a map keyed by the logfile name and newly created CServerPipeInstance objects are stored in an array. Every CServerPipeInstance object is connected to an existing CLogInstance object. There is only ever one CPipeData instance within the service.

The CPipeData object

class CPipeData : public CObject
{
    DECLARE_DYNAMIC(CPipeData);
public:
                CPipeData();
    virtual     ~CPipeData();

    void        AddPipe(CServerPipeInstance *pPipe);
    void        RemovePipe(DWORD wIndex);
    void        RotatePipes(DWORD wIndex);

    CServerPipeInstance *GetPipe(DWORD wIndex)
        { return (CServerPipeInstance *) m_pipeArray.GetAt(wIndex); }

    DWORD        GetCount() const
        { return DWORD(m_handleArray.GetSize()); }

    DWORD        Wait()
        { return DWORD(WaitForMultipleObjects(m_handleArray.GetSize(),
                           m_handleArray.GetData(), FALSE, INFINITE));
        }

private:
    CObArray        m_pipeArray;
    CArray<HANDLE, HANDLE>m_handleArray;
};
The important things here are the m_pipeArray, m_handleArray member variables and the Wait() function. The thread starts out by creating a named pipe, using overlapped I/O, that listens for connection requests. As connection requests arrive they are parsed into the request verb (currently only createlog is supported) and the logfile name. The thread then searches the m_logMap looking for a matching logfile name. If it doesn't find a match it creates a new CLogInstance object and adds it to the log map. Then it creates a matching CServerPipeInstance object and connects it to the CLogInstance object (either preexisting or newly created).

As each CServerPipeInstance object is created the event handle associated with the named pipe (m_op.hEvent in the CServerPipeInstance object) is added to an array of event handles which is used in a WaitForMultipleObjects() call. The CPipeData object also maintains a parallel array containing a pointer to each CPipeInstance object. There's a 1 to 1 correspondence between the handle array and the pipe array.

When the listener thread enters the WaitForMultipleObjects() call the thread waits until data is written to one of the pipes it's monitoring. When that happens the event handle is signalled by the system and the WaitForMultipleObjects() call returns a DWORD that specifies the array index for the event handle that was signalled. We know which handle was signalled therefore we know which pipe to operate on. The listener thread reads the data from that pipe and writes it to the head of the log text message queue managed by the CLogInstance object pointed at by the pipe.

There are actually two more handles in the handle array than there are log pipes. The handle at index 0 is an event handle used to signal the thread when the service wants to shut down. See my article at One use for overlapped I/O[^] for a discussion of this. The handle at index 1 is an event handle used by the command named pipe \\.\pipe\logger. Handles 2 to n correspond to named pipes opened in response to a createlog command sent down the command pipe.

Thus the handle array being monitored by the listening thread will always contain at least two handles, up to a limit of MAXIMUM_WAIT_OBJECTS which, at least according to my copy of the Platform SDK, is defined as 64. This means that up to 62 modules can be logging through the same server instance.

The ClogInstance Object

class CLogInstance : public CObject
{
public:
                CLogInstance();
                ~CLogInstance();

    BOOL        Create(LPCTSTR szBaseName);
    void        QueueMessage(LPCTSTR szLogEntry);

private:
    static void LogProcessorThread(LPVOID data);
    static void LogAgingThread(LPVOID data);

    CStringList m_messageQueue;
    CMutex      m_mutex;
};
Each CLogInstance object runs a thread to monitor the log text queue and write the contents to the logfile on permanent storage.

There's one CLogInstance object per log file and zero to possibly many CServerPipeInstance objects pointing to the one CLogInstance object.

The logfile is opened in a share mode that allows any other process to read or write to the file. For maximum performance we want to keep the file open all the time but we need to provide read access to other processes so that we can actually view the contents of the log file while the service is running. Write access? Sometimes it's useful to be able to delete the contents of the file without stopping the service.

Be aware that, because of log file aging, once we create a CLogInstance object we never delete it even if all pipes connected to the object are closed. This has just one implication. As long as the service is running you won't be able to delete the log file until log file aging has kicked in.

The CServerPipeInstance object

class CServerPipeInstance : public CObject
{
public:
                CServerPipeInstance();
                ~CServerPipeInstance();

protected:
    CLogInstance *m_pOwner;

private:
    HANDLE      m_hPipe;
    OVERLAPPED  m_op;
    DWORD       m_dwRead;
    CString     m_csBaseName;
};
The important stuff here is the pointer to a CLogInstance object which is the owner of this pipe, and the OVERLAPPED structure. Each OVERLAPPED instance contains an event handle which is used to monitor activity on that named pipe. The event handle is added to the m_handleArray member of the CPipeData object.

Each named pipe runs in message mode, which guarantees that a read won't return data until an entire string of data corresponding to a single write has been returned. If the pipes were running in byte mode the results written to the log file would be a string of single bytes chosen seemingly at random from all the pipes being monitored, which is not the desired result.

Once the data has been added to the queue the queue monitoring thread is signalled. It takes messages off the head of the queue and writes them to permanent storage. Permanent storage has the same basename, myproduct.log, stored in whichever directory you want. The source code presented here for the service places them in a directory called logger which lives under your %systemroot% directory.

When the client application closes it's end of the pipe the same pipe event handle is signalled, causing the wait to terminate. The code checks the result for the signalled handle using the GetOverlappedResult() and if the operation has failed it uses GetLastError() to determine what action to take. If the pipe was broken it is removed from the pipe array, it's handle is removed from the handle array and the pipe itself is deleted.

The application goes it's merry way doing whatever it has to do and shoving log messages down the pipe. The other end of the pipe takes messages off as fast as it can and queues them up so that another thread can take care of actually writing them to disk whenever it gets the chance. You want to clear the pipe as fast as possible otherwise the client will stall until there's space on the pipe for a new message. You also want to give the application priority to the disk whereever possible. To an extent the use of file flushing on each write militates against this goal but we're balancing performance impacts on the application being logged against ensuring that messages do get logged!

A gotcha in WaitForMultipleObjects()

There's a hidden gotcha in the way WaitForMultipleObjects() works. As part of my testing I wrote a quick and dirty client which opened four pipes to the same logfile and then wrote 1000 strings to each pipe in turn. Ie, string to pipe1, string to pipe2, etc and wrap around to pipe1 when pipe 4 has been written. I expected to see the strings in the logfile in the same order but no, I found a block of dozens of strings for pipe1 followed by a block of dozens of strings to pipe2 etc. Upon investigation I found that MSDN does indeed document that if multiple event handles are signalled simultaneously the WaitForMultipleObjects() call returns the index for the lowest numbered signalled event handle. Actually that makes perfect sense. So I added a function, RotatePipes(), to take a handle which has just been serviced out of it's place in the handle array and move it to the end, with the same change in ordering applying to the CServerPipeInstance array, thus making pipe servicing act in a round-robin fashion. Only pipes created by the command pipe (pipes 2 to n) are shuffled in this fashion.

In a real world scenario the round robin code might not be necessary (after all, the test program was doing nothing but write messages as fast as possible, but that's the point of stress testing - to flush out situations like these).

The CLogMap class

class CLogMap : public CMapStringToOb  
{
public:
                    CLogMap();
    virtual         ~CLogMap();

    CLogInstance    *GetLog(LPCTSTR szLogFileName);

private:
    static void     LogAgerThreadProc(LPVOID data);
    void	        AgeLogFiles();

    HANDLE          m_agingThreadHandle;
};
This class is used to maintain a map of CLogInstance objects keyed on the logfile name. It also contains the thread (LogAgerThreadProc()) used to age logfiles. The GetLog() method returns a CLogInstance object pointer which either corresponds to a pre-existing object or a new one if we didn't already have this logfile under our control.

Log file aging

When the aging thread starts it calculates how many seconds remain until local midnight and then sleeps for that length of time. When it awakens it blocks the queue monitor thread, closes the logfile, renames it to myproduct.yyyymmdd.log and creates a new file for continued logging. It also enumerates existing aged log files and deletes files older than 7 days.

The myproduct.yyyymmdd.log format was chosen so that aged logfiles will sort in directory listings in age order.

I hardcoded 7 days into the source code presented here. In a real world use of the logging system you would probably want to be able to specify how long to keep log files but I think the presentation of how you do that is outside the scope of this article.

Log file aging gotchas

I ran into two gotchas. One is a direct result of my decision to calculate the time from now to local midnight and wait that period of time. The code looks something like this.
while (bStop == FALSE)
{
    switch (WaitForSingleObject(hStopEvent, CalculateTimeTillMidnight())
    {
    case WAIT_OBJECT_0:
        bStop = TRUE;
        break;

    case WAIT_TIMEOUT:
        //	Do logfile aging...

        break;
    }
}
This is the most efficient way to code the waiting period. The thread simply doesn't run unless it's signalled to stop or until the timeout occurs. This is a good thing. Code that doesn't run consumes no CPU cycles and indeed might even be paged out of physical memory into the pagefile after a reasonable period of time. But there's a small problem here. If you want to test log file aging by moving the system clock forward to just before midnight you're not going to see aged log files suddenly appear. The timeout is calculated at the start of the wait and, once started, makes no reference to the system time. If you want to test log file aging this way it's necessary to change the system time, then stop and restart the service.

The other gotcha is in the way Windows handles renaming a file and creating a new file with the name the renamed file had. Ie, renaming myproduct.log to myproduct.yymmdd.log and immediately creating a new file called myproduct.log. Windows will pull file attribute information INCLUDING creation date from the file cache. Thus, the newly created log file will have the creation date of the original file even though it was created up to 24 hours later. I have to admit this behaviour puzzles me but there it is... This is documented in MSDN as one of the notes to the CreateFile() API.

Thus we do some trickery to set the creation date/time for the newly created logfile.

Server side filtering

Doesn't happen. Period. The service will log anything written to the log pipe. It copies verbatim anything written to a pipe it created to the logfile attached to that pipe.

The client side code

This consists of a single class, CLogger. You instantiate an instance of the class as a global object in each application or module which is to perform logging. Then, somewhere early in your code, you initialise the object with the module name, the name of the logfile and possibly the server name (the name of the machine on which the log service is running). Then each time you want to log something you call the Log() method. The Log() method takes a printf() style formatting string and a variable number of arguments.
//  At global scope

CLogger gblLogger;

//  Early in your program, maybe in DLLMain or InitInstance

gblLogger.Initialise(3, _T("MyProductModule1"), _T("MyProduct.log"));

//  And then, wherever in your code you want to log something...

gblLog.Log(_T("Format string", arg1, arg2, argn);
If you prefer you can write a bunch of operator overrides on the CLogger object to allow
gblLog.Log << data1 << data2;
but personally I don't like that way of writing to objects so I don't do it that way.

The Clogger object instance needs to be initialised. My design envisages (but does not require) the object being global so the constructor takes no parameters and merely initialises member variables to safe values.

Initalising your CLogger instance

void Initialise(
        DWORD dwLogLevel,
        LPCTSTR szModuleName,
        LPCTSTR szLogFileName,
        LPCTSTR szServerName = _T(".")
);

The first parameter tells the object the logging level desired. Typically you'd pull this out of the registry but to do that you need a way to put the value in the registry in the first place. As with setting the retention period for aged log files I consider this to be outside the scope of this article.

The second parameter identifies the module that's producing messages using this particular CLogger object instance. If your application consists of an exe file plus a bunch of DLL's it's really helpful to log exactly which module created the log message.

The third parameter specifies the name of the logfile. It's a matter of personal taste whether each module logs to it's own file or if all parts of the application log to the one file. Personal experience is that it's better to log to the one file provided you have a way to filter messages. The LogFileViewer I present in the second part of this series provides that functionality.

An optional fourth parameter specifies a machine name to which log messages should be directed. This allows you to log multiple instances of your application(s) running on different machines to a central logging server. If not specified this fourth parameter defaults to . (dot) which is an alias for the same machine the application is running on.

Each message logged includes the name of the machine on which the module was running and the threadID for the thread calling the Log() function, which makes it easy to distinguish otherwise apparently identical messages.

If you examine the code for the Initialise() function you'll find a 20 ms sleep (two timeslices on a single processor NT machine and at least 1 timeslice on a multi cpu machine). Don't be tempted to remove it. Each time you call Initialise you're writing to a named pipe in another process. That other process needs time to do the things you've asked it to do. Thus, your process needs to suspend itself to give the service a chance at running before you attempt to write data onto the log. If your application doesn't suspend itself you run the risk that you'll attempt to write to a log named pipe instance that hasn't yet been created.

Why use a named pipe?

I chose to use named pipes as the communication mechanism between client code and the service because I wanted to be able to write to the same logfile from more than one module in my applications. But I also didn't want to have to change the API between modules to include a log method. Would you want to extend the parameter list of every function exposed from a DLL so that you could pass a pointer to a logging object? I thought not. By using named pipes I could add logging functionality to any module without changing the external interfaces to that module.

Log levels

As noted earlier, the CLogger class expects a log level parameter. This provides the application with a way of pre-filtering which messages will be logged. Opinions vary about this. On the one hand, it's good to log everything, particularly if there's a filtering method available when reviewing the logfile. On the other hand, it's easy for developers to get carried away and log everything, which will impact both application/system performance and create awesomely large log files. (We developers at the cutting edge have hundreds of gigabytes of storage available but in my experience production systems lag behind developer systems by orders of magnitude).

I don't want to have to create special builds with extra logging. Hence the logging level.

  • Level 0 - error codes in the range of 0 to 999
  • Level 1 - warning codes in the range of 1000 to 1999
  • Level 2 - success codes in the range of 2000 to 2999
  • Level 3 - information codes in the range of 3000 to 3999
  • and so on with ever rising levels of information codes
I chose this set of definitions due to a philosophy about what should be logged. I always want to be able to see error codes, so the CLogger class will always report level 0 messages. I probably want to see warnings unless I'm really tight for disk space. Success codes? Some applications need them (for example a system which creates auditable documents would probably want to log successes as well as failures). Information? That's there for we developers, Q/A people.and those occasions when product support need more detailed information. Your customers may not like being asked to turn on level 17 logging and send you a CD containing a 500 meg log file but if it gets their support issue resolved in days instead of weeks they'll appreciate it.

So the philosophy is to write code to log everything and then decide, at run time, not to log certain things. Compiling the calls (and the format strings) into your end product makes the binary deliverables larger which is probably forgivable. What won't be forgiven is if all of this halves the performance of your application. The Log() function is written to do the minimal amount of work necessary to determine if the message is even going to be logged.

void CLogger::Log(DWORD dwErrorCode, LPCTSTR szFormatString, ...)
{
    if (dwErrorCode / 1000 <= m_dwLogLevel)
    {
        ASSERT(szFormatString);
        ASSERT(AfxIsValidString(szFormatString));

        va_list args;

        va_start(args, szFormatString);
        Log(dwErrorCode, szFormatString, args);
        va_end(args);
    }
}
That's a pretty minimal amount of work for cases where the message level is greater than the log level. Ie, when the message is going to be discarded.

What happens if the service isn't running?

The client side code waits 100 ms (defined as the constant PIPE_WAIT_TIME) for a pipe to become available. If a pipe doesn't become available in time the the code retries up to 10 times (defined as the constant PIPE_RETRY_COUNT) and eventually gives up. This means that you will incur an overhead of up to 1 second per module being logged, incurred on the first Log() call. Doesn't sound like much but if you're using 62 modules that's a minute elapsed. I considered adding code to start the service if it wasn't running but decided not to. Such code would only be useful if the application was running under an account with administration privileges.

These numbers, 100 ms and 10 times respectively, are based on gut feel. If you're logging to a very heavily loaded machine or across a heavily loaded network you may need to increase the numbers.

We try 10 times with a timeout of 100ms per try rather than try once with a timeout of 1 second because each retry tries to establish if the pipe even exists. If the pipe doesn't exist the code sends a pipe creation request to the command pipe and tries again. If our service is being overwhelmed with messages from various parts of our application this gives it a chance to catch up and create a new communications conduit.

If you don't intend to include the logging service as part of your product deliverable but do want to use it for development be sure to include some mechanism (such as #ifdef _DEBUG) to exclude the logging code from release builds.

Other classes

There are three utility classes included in this project. The first is the CSleeper class. It's sole purpose in life is to create an event handle that will never be signalled and wait a specified amount of time on that handle. I wrote this class to avoid possible deadlocks. I can't anticipate the kinds of applications that the CLogger class will find itself in. If it's compiled into an application that uses DDE the Sleep() API becomes potentially unsafe. See the MSDN documentation on Sleep().

The second utility class is the CSecurityToken class. This simply creates a SECURITY_ATTRIBUTES structure initialised so that any system object created using it grants access to anyone. Dangerous? Maybe, if your code does not otherwise check permissions or if your code can do dangerous things.

The final utility class is the CClientPipeInstance class which represents the client side of the named pipe.

Tested environments

I wrote this code on a 2 CPU Windows 2000 machine (dual Athlon 2000+) and tested it (obviously) on the development machine, and a single CPU Windows 2000 machine. Since it's multi-threaded it's important it be tested on a machine with 2 physical CPUs. I've also tested logging from one machine to another.

Using the code in your applications

The service itself was written using the VC6 service wizard. Thus it supports self registration if you launch it once using
logger /service
This registers it as a service in the Windows Service Control manager.

Client code needs to include the following files.

sleeper.h
sleeper.cpp
SecurityToken.h
SecurityToken.cpp
clientpipeinstance.h
clientpipeinstance.cpp
logger.h
logger.cpp
These files appear in the loggerclient_src.zip download.

Installing the service

If you build the service on the same machine you're running it on there's no need to install the service, the build process automatically registers it with the service control manager. On other machines you need to do this from a command prompt
logger /service
which will register the service as a manual start service. Since the client code doesn't attempt to start the service if it isn't running you need to either open the services control panel applet and make the service automatic (and reboot or start it) or modify this line
    SC_HANDLE hService = ::CreateService(
        hSCM, m_szServiceName, m_szServiceName,
        SERVICE_ALL_ACCESS, SERVICE_WIN32_OWN_PROCESS,
        SERVICE_DEMAND_START, SERVICE_ERROR_NORMAL,
        szFilePath, NULL, NULL, _T("RPCSS\0"), NULL, NULL);
in the CServiceModule::Install() function to this
    SC_HANDLE hService = ::CreateService(
        hSCM, m_szServiceName, m_szServiceName,
        SERVICE_ALL_ACCESS, SERVICE_WIN32_OWN_PROCESS,
        SERVICE_AUTO_START, SERVICE_ERROR_NORMAL,
        szFilePath, NULL, NULL, _T("RPCSS\0"), NULL, NULL);

Uninstalling the service

To uninstall the service do this from a command prompt.
logger /unregserver

Revision History

  • November 16, 2003, Initial version.

  • November 18, 2003. Incorporated some suggestions by Forogar[^] (see topic thread below).

    Added some missing code in the logfile aging method. As initially presented the creation date for each log file was not being set, thus log file aging would not in fact delete old log files.

    Refactored some of the code and created a new class, CLogMap to host the CLogInstance objects and perform log file aging.

  • November 24, 2003. Found and fixed a bug in the CLogInstance::LogProcessorThread() function. As previously presented it didn't correctly handle cases where some other process (ie, the Log File Viewer in progress) deleted the contents of the file. In the new version, just prior to writing a message to the log file, we do a seek to the current end of the log file.

    I changed the way I generated a timestamp to use the GetLocalTime() API rather than CTime. I have no idea how I managed to miss this API. Fortunately CodeProjects Feature Article showed this[^] article at just the right time. Thanks to RK_2000[^]. The new timestamp at least pretends to present millisecond timing.

  • November 29, 2003. Added some explanatory remarks about why I used named pipes as the communications mechanism between the service and client code. Also added an explanation of why the client code tries 10 times to establish a connection before giving up in disgust.

  • December 9, 2004. Updated the downloads.

  • January 7, 2004. Added sections describing how to make the service auto start, and how to uninistall it.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here