Introduction
In part 1 of this 2 part series, I presented a service to perform logging on behalf of a multi-module application. In this part, I present a Log File Viewer which provides near real-time display of the log messages processed by the logging system.
The viewer is an MFC MDI application which provides per-view filtering of the logged data. Filtering can be done on the message logging level, the originating module, the originating machine, the originating thread, or any combination of the above.
Basics
Each view in the app corresponds to a logfile. I used the doc/view architecture, but the CDocument
derived class only performs placeholder duty. All the work is done in the CLogFileInstance
class which is a member variable of the view.
When the application starts, it presents a dialog box where you select which logfile you want to view. More on this dialog box anon.
After you choose a log file (or more than one), the MFC Framework creates one CDocument
and one CListView
for each log file. The view in turn creates a CLogFileInstance
instance.
Each CLogFileInstance
creates two threads for current log files and one thread for old log files. An aged log file is a file whose name is of the pattern myproduct.yyyymmdd.log. A current log files name is of the pattern myproduct.log. The first thread, LogMonitorThread
monitors the log file and, when it sees a change, reads the changes and creates one or more CLogData
objects, where each object corresponds to a new log entry. This thread is started for all log files but it terminates itself once it's done reading the log file if the log file is an aged one. The assumption is that aged log files don't change.
Each new CLogData
instance is added to an array of CLogData
instances and a second thread, LogUpdateThread()
, is signalled. That thread notices the new objects added to the CLogData
array and, subject to filtering, adds them to the ClistCtrl
embedded in the view.
The LogUpdateThread
runs for all log files, not just current log files.
The flow of control looks like this:
The Log File
This is created by the logging service. A specific format is assumed, as documented in my previous article.
The LogMonitorThread
This thread runs once per second as specified for the timeout in the WaitForSingleObject()
call. The object it waits on is an event handle used to tell the thread to exit. Most of the time, that object won't be signalled and the significant work is done in the WAIT_TIMEOUT
code. The first time through the loop, the timeout is actually set to zero and the WaitForSingleObject()
call returns immediately. The code in the WAIT_TIMEOUT
handler sets the wait time to 1000 milliseconds for all subsequent passes through the loop.
When the wait timeout occurs, the thread first checks if the file it's monitoring has changed. Changes are simple to detect. Either the logging service has appended something to file or it hasn't. So the first check is, has the length of the file changed? If not, we have nothing to do so we just go back to the top of the loop and wait another second. If the file length changed, we then check if the current file length is less than the length we last knew about. (Remember that the logging service rolls files over at local midnight). If the current length of file is less than the length we last knew about, we clear the CLogData
array and tell our view that it should delete what it's currently displaying and reread the file from the beginning.
If the file length has changed and it's greater than the length we last knew about, then obviously a new log entry has been added. The code goes into a loop reading lines (terminated by an endline (but see below)) and creates new CLogData
objects which are added to the array. When it hits end of file, it goes back to the top of the loop and waits another second, after signalling the m_eSignal
event object, which causes the LogUpdateThread()
to run.
void CLogFileInstance::LogMonitorThread(LPVOID data)
{
{
CLogDataArray *me = (CLogDataArray *) data;
BOOL bStop = FALSE;
CLogData *pLog = (CLogData *) NULL;
int iWait = 0;
ASSERT(me);
ASSERT_KINDOF(CLogDataArray, me);
CString csData;
while (bStop == FALSE)
{
switch (WaitForSingleObject(me->m_eStop, iWait))
{
case WAIT_OBJECT_0:
bStop = TRUE;
break;
case WAIT_TIMEOUT:
iWait = 1000;
if (me->m_dwFilePosition == me->m_updateFile.GetLength())
break;
if (me->m_dwFilePosition > me->m_updateFile.GetLength())
{
me->m_dwFilePosition = 0;
me->m_view->PostMessage(guiAdvise, CLogViewerView::adviseClearList);
break;
}
me->m_updateFile.Seek(me->m_dwFilePosition, CFile::begin);
while (me->m_bContinue && me->m_updateFile.ReadString(csData))
{
if (isdigit(csData[0]))
{
pLog = new CLogData(csData, me->m_logFilterData);
ASSERT(pLog);
ASSERT_KINDOF(CLogData, pLog);
me->Add(pLog);
}
else if (pLog != (CLogData *) NULL)
pLog->AddText(csData);
else
;
me->m_dwFilePosition = me->m_updateFile.GetPosition();
}
SetEvent(me->m_eSignal);
}
if (
bStop == FALSE
&&
CLoadLogFilesDlg::IsAged(me->m_updateFile.GetFilePath())
)
bStop = TRUE;
}
}
_endthread();
}
There are a couple of things of note in this thread procedure. The first is that the inner loop (the one inside the WAIT_TIMEOUT
case) is controlled by a boolean variable, m_bContinue
. This is necessary so that the thread can be told to stop when it's in the midst of reading very large log files. If the view wants to close, it sets m_bContinue
to FALSE
and then waits on the thread handle. The boolean ensures the loop exits as soon as the work for the current line of the log file is done.
The second thing to note is that I wanted the logfile format to be flexible. It's not enough to assume that an endline represents the end of this log record and the next line is the start of a new record. I wanted to be able to log error messages that contained embedded linefeeds (ODBC messages for example) without having to modify the messages in the logging service or the log client code. So we make a simplifying assumption. If a line commences with a digit, we'll assume it's a new record. If not, this line is part of the message text for the preceding record.
Why Not Use FindFirstChangeNotification() or ReadDirectoryChangesW()?
I considered using both these functions to detect changes in the log files. The problem with them is that they only watch for changes in a directory or a tree of directories. If I could call a function that watched just one file, I'd use it. As it is, the NT watch functions notice a change in a directory (admittedly filtered) and leave it up to the user to determine if the change notification is relevant. It's just simpler to open a file and monitor its length periodically. I've run this software for periods of 48 hours plus with no updates and Task Manager shows 0 seconds time accumulated against it. That's enough justification in my mind to opt for the simpler method presented here.
After writing this article, I found myself wondering if the reasons I'd given above were sufficient and then thinking through the consequences of attempting to use either API.
One could use those APIs if one uses a single thread to update all instances of logfiles being monitored. The downside to doing this is that your thread code will have to cope with multiple log files. Well, it does anyway, but spreading the load across multiple threads means that each log file gets a crack at the CPU based on thread scheduling - collapsing this into a single thread means that either you 'round robin' each log file, handling one new record per log file, or you handle each log file serially, which means that one log file will dominate at the expense of the second, which in turn dominates at the expense of the third and so on. Spreading the load across multiple threads means that you get to use the thread scheduler for free.
The alternative to a single thread is pathological. If each log file is monitored on a separate thread and you try to use the FindFirstChangeNotification()
or ReadDirectoryChangesW()
APIs, you have two choices. Do it on a per thread basis, or collapse it into one thread.
Doing it on a per thread basis buys you nothing. One might just as well do it the way I chose, monitoring the length of the log file. Or you could use another thread to monitor all folders containing log files being monitored. When that thread notices a change, it signals the thread that handles that log file. Again, you have two alternatives. Maintain a mapping between a logfile and a thread so that you can signal just that thread, or signal all threads and let them sort it out.
Signalling just the one thread means that you have to maintain a mapping from thread to directory via an event object. Extra code to be debugged and maintained, and a more subtle bug. If two threads are monitoring different log files in the same directory and you're using the FindFirstChangeNotification()
or ReadDirectoryChangesW()
APIs, there's a very real possibility that only one thread will be notified of a change! The other thread might possibly wait for hours until it receives an update notification.
Signalling all threads means that you have to use a manual reset event object to ensure that all threads will see the notification. This leads to the question of just who resets that manual reset object? The only way I can think of is to make the thread that runs the FindFirstChangeNotification()
or ReadDirectoryChangesW()
APIs signal the manual reset event object and then wait on an array of objects, one per monitor thread and when each of those threads have completed their work, they signal their object in the array that the monitor thread waits on, which means code to add and remove event objects on the monitor thread. You see where this is going...
In the end, it's just a hell of a lot simpler to monitor the file length.
The Log Update Thread
Now the Log Update Thread takes over. This thread waits on two objects. The first is the same event handle that the monitor thread waits on, the one that tells the thread to exit. The second object is the m_eSignal
event object which tells this thread to check the CLogData
array for new entries. As new entries are found, they are added to the CListView
using PostMessage
s to the main UI thread.
void CLogFileInstance::LogUpdateThread(LPVOID data)
{
{
CLogDataArray *me = (CLogDataArray *) data;
BOOL bStop = FALSE;
CLogData *pLog;
HANDLE handleArray[2];
ASSERT(me);
ASSERT_KINDOF(CLogDataArray, me);
handleArray[0] = me->m_eStop;
handleArray[1] = me->m_eSignal;
while (bStop == FALSE)
{
switch (WaitForMultipleObjects(2, handleArray, FALSE, INFINITE))
{
case WAIT_OBJECT_0:
bStop = TRUE;
break;
case WAIT_OBJECT_0 + 1:
while (me->m_bContinue && me->m_currentEntry < me->GetSize())
{
pLog = (CLogData *) me->GetAt(me->m_currentEntry++);
ASSERT(pLog);
ASSERT_KINDOF(CLogData, pLog);
pLog->AddToView(me->m_view);
}
if (me->m_bContinue)
me->m_view->PostMessage(guiAdvise, CLogViewerView::adviseEndUpdate);
break;
}
}
}
_endthread();
}
The main thing of note in this thread procedure is the PostMessage
to the view. As pointed out by Neville Franks and Mike Dimmick in replies to a post I made on the Visual C++ forum, it's necessary to be very careful when attempting to touch Windows objects created in a different thread. My first draft of this thread procedure passed a pointer to the CListView
to the thread and the thread itself directly manipulated the underlying ListView
Control. It worked most of the time but would randomly hang the application when attempting to shut it down. Much headscratching later, I followed recommendations and started using PostMessage
. Voila! Problem solved. Thanks Neville and Mike.
When the application starts, it registers a custom windows message using the RegisterWindowMessage()
API. The registered message value is stored in a global UINT
called guiAdvise
. This is the message used to communicate between the LogUpdateThread()
and the view.
The View Class
The List Control is created in report mode (of course). The view contains the handler for the guiAdvise
message which looks like this:
LRESULT CLogViewerView::OnAdvise(WPARAM wp, LPARAM lp)
{
CListCtrl& ctl = GetListCtrl();
CLogText *pLog;
switch (wp)
{
case adviseAddItem:
pLog = (CLogText *) lp;
ASSERT(pLog);
ASSERT_KINDOF(CLogText, pLog);
if (pLog->IncludeThis(m_logFilter))
ctl.InsertItem(
LVIF_IMAGE | LVIF_PARAM,
INT_MAX,
LPSTR_TEXTCALLBACK,
0,
0,
pLog->Image(),
LPARAM(pLog)
);
break;
case adviseEndUpdate:
if (ctl.GetSelectedCount() == 0)
ctl.EnsureVisible(ctl.GetItemCount() - 1, FALSE);
ctl.SetColumnWidth(LOGVIEWER::COLUMN_DATE, LVSCW_AUTOSIZE_USEHEADER);
ctl.SetColumnWidth(LOGVIEWER::COLUMN_SEQUENCE, LVSCW_AUTOSIZE_USEHEADER);
ctl.SetColumnWidth(LOGVIEWER::COLUMN_CODE, LVSCW_AUTOSIZE_USEHEADER);
ctl.SetColumnWidth(LOGVIEWER::COLUMN_MODULE, LVSCW_AUTOSIZE_USEHEADER);
ctl.SetColumnWidth(LOGVIEWER::COLUMN_THREAD, LVSCW_AUTOSIZE_USEHEADER);
ctl.SetColumnWidth(LOGVIEWER::COLUMN_MACHINE, LVSCW_AUTOSIZE_USEHEADER);
ctl.SetColumnWidth(LOGVIEWER::COLUMN_MESSAGE, LVSCW_AUTOSIZE_USEHEADER);
if (m_bIsActiveWindow)
MAINFRAME->UpdateDlgBar(m_logFileInstance->m_logFilterData);
break;
case adviseClearList:
m_logFileInstance->RemoveAll();
ctl.DeleteAllItems();
break;
}
return 0;
}
The adviseAddItem
sub-message adds an item to the List Control. Each item has an associated image and the text is set to LPSTR_TEXTCALLBACK
which means the text to be displayed isn't stored in the List Control at all. Instead, when the List Control needs to draw the text, it calls back into our view asking for the text. Finally, we set the itemdata
for this item as the pointer to the CLogText
object corresponding to this item.
When the LogUpdateThread()
finishes noticing new records in the CLogData
array, it sends the view a message, adviseEndUpdate
. When the view receives this message:
- If no line is selected in the List Control, we scroll to the end of the display to ensure the last item added is visible. If a line is selected, we assume it's a line of interest that ought not to scroll out of sight.
- We also do an automatic column resize once the update has finished.
Why CLogText
? As alluded to earlier in this article, the code presented here doesn't assume one line in the text file corresponds to one record. We want to allow a log message string to contain embedded newlines so that we don't have to pre-process each line before writing it to the log file. (What would we replace an endline with anyway?).
So our CLogData
object (which represents a log record) contains all the data necessary for a complete record (timestamp, sequence, code, module, thread and the first line of the message text. If the message text doesn't contain an endline (the next line of the file starts with a digit) that's it. If, however, the next line of the file doesn't start with a digit, we assume we've seen an embedded newline and this line is a continuation of the preceding record. In this case, we don't need the overhead of a complete record. Instead, we create a CLogText
object which contains only the text of this line and a pointer to the preceding complete record (a CLogData
object and add the new object to an array of CLogText
objects in the parent CLogData
object.
The CLogText
is the base class for the CLogData
class. The class looks like this:
class CLogText : public CObject
{
friend class CLogData;
DECLARE_DYNAMIC(CLogText);
CLogText();
public:
CLogText(const CLogData *pOwner, LPCTSTR szText);
virtual int Image() const;
virtual BOOL IncludeThis(CLogFilter& pFilter) const;
void RenderText(LVITEM& rItem, CLogFilterData& rFilter);
virtual void FormatForTip(
LPTSTR szBuffer, int iBuffLen,
CLogFilterData& rFilter
)const;
protected:
virtual int Code() const;
virtual int Sequence() const;
virtual LPCTSTR Thread(CLogFilterData& rFilter) const;
virtual LPCTSTR Machine(CLogFilterData& rFilter) const;
virtual LPCTSTR Module(CLogFilterData& rFilter) const;
virtual LPCTSTR TimeStamp() const;
virtual LPCTSTR Text() const;
CLogData *Owner() const;
private:
CLogData *m_pOwner;
CString m_csText;
};
Notice there are a bunch of protected virtual
functions. Those functions are duplicated in the derived class (CLogData
) and are only called via the RenderText()
function (which is why they're protected). The code in the view that requests display data treats all objects as though they were CLogText
's and lets the object itself sort it out. The base class functions either return data in the base class (the m_csText
data), or they vector through the m_pOwner
pointer to return data stored in the owner object. (Actually, this isn't entirely true. For aesthetic reasons, the code displays blanks for all data but the message string if the object is a CLogText
. I prefer to see the timestamp, etc. only on the main record and blanks for continuation records).
The view handles requests from the List Control for data through the OnGetDispInfo
function. The code is thus:
void CLogViewerView::OnGetDispInfo(NMHDR* pNMHDR, LRESULT* pResult)
{
LV_DISPINFO *pDispInfo = (LV_DISPINFO *) pNMHDR;
LV_ITEM *pItem = &(pDispInfo)->item;
if (pItem->mask & LVIF_TEXT)
{
CLogText *pLog = (CLogText *) pItem->lParam;
ASSERT(pLog);
ASSERT_KINDOF(CLogText, pLog);
pLog->RenderText(*pItem, m_logFileInstance->m_logFilterData);
}
*pResult = 0;
}
Pretty simple. The function checks that the list control is asking for text and if so, passes the request on to the underlying object. A pointer to the object was set as the itemdata
for the item when the item was created.
Filtering
All of this is well and good. We now have a program that reads log files, displays them, draws pretty icons and notices almost immediately when a new record has been appended. Heck, it's even smart enough to automatically scroll when new records are added unless you have a record selected (in which case it assumes that you want to read that record and doesn't scroll).
So you're interested in seeing error records but the logging level is set to log everything. The signal to noise ratio might be 100 information messages to 1 error message (or much higher). This is where filtering comes in.
As each new log record is read, it's parsed into its various parts. I can't imagine wanting to filter on the timestamp or the message text (or the sequence for that matter) but I can well imagine needing to filter on code, module, machine or thread. Filtering on the code is easy. If the code is less than 1000 it's an error. If less than 2000 but more than 999 it's a warning, and so on.
For the other filterable criteria, we need to record what we've seen. I don't want to have to guess at thread IDs. Nor do you. Likewise for modules.
Filtering Details
The CLogFilterData
object contains three CStringArray
s. Each array contains each unique string
corresponding to a module name, a machine name, or a threadID
. The corresponding data stored (per record) in the CLogData
objects are actually offsets into the arrays in the CLogFilterInstance
objects. In other words, if we see 'ModuleName
' in the Module
field of the log record, we search the Module
array of the CLogFilterData
object for a match. If we find a match, we store the appropriate index as part of the CLogdata
object; if not, we add 'ModuleName
' to the Module
array returning the new index. Either way, an entry in ClogData
will resolve to a string
stored in the array.
In addition to providing lookup for the CLogData
objects, the filter data is also used to populate three combo boxes on the mainframe dialog bar. As the user switches between views, the dialog bar is updated with filter data derived from that view and the users' last selection is reselected.
Unfortunately, changing the filter criteria means that all currently displayed data must be dropped and the List Control repopulated. If there's a way to set an item in a List Control and not have it display, I haven't found it.
Thus, changing the filter criteria causes the List Control to be emptied, the m_currentEntry
variable in the CLogFileInstance
object to be set to zero and the Log Update thread to be signalled. Aha, so that's why it's a separate thread!
The offsets to string
s in the string
arrays are zero based. The combo boxes are populated with the contents of the string
array and, if the string
array contains more than one entry (i.e., more than one module name, machine name or thread id) an extra item (Show All) is included. The code in the filter dialog bar must account for this extra item. I used -1
as the signal to the filtering code to say 'ignore this item when filtering'.
Anatomy of a Bug
As I found to my cost today (December 9th, 2003) it's a right royal mistake to store the combobox
current selection as the data being used in the filter. The problem was the extra phantom entry, (Show All). It threw off the indices by 1
. Now that wouldn't have been a problem except that something tells me that a combo box containing two items
is just plain wrong. If there's only one choice, it seems to me that the combobox
should only contain that one choice and that it should also be automatically selected and the combobox
disabled.
If you believe that, then you find yourself writing code to add the (Show All) entry when it's needed, and not otherwise. But then it's necessary at some later time to know if the (Show All) entry has been added. You see where this is going...
Anyway, my code to handle the combobox
es was just plain wrong because I was trying to conflate the users selection with the index into the filter arrays and trying to handle the case where all array entries should match. Doing it this way degenerated into a game of trying to find all cases where the choice was the sentinel value (-1) and trying to do something sensible with that value when selecting an entry in the combobox. That way madness (and MFC ASSERT's in debug builds) lies.
It was much simpler in the end to maintain two variables per combobox
. The first is the users selection - the value returned by CComboBox::GetCurSel()
. Given the way this application maintains filter data (the number of entries in a combobox
never decreases unless the log file is cleared), that's a reasonable design choice.
The second variable is the item data associated with the chosen combobox
item. That value is always the offset into the string
array.
The Select Log Files Dialog Box
I showed you this earlier in the article and hinted there'd be more anon. It's now anon. :) Here it is again to refresh your memory.
The dialog class (CLoadLogFilesDlg
) is derived from the CFileDialog
class and provides a custom dialog template which is added to the standard common controls file open dialog. The extra controls are two buttons, 'Load all log files' and 'Load all of today's log files'. Should be simple. The first button loads any file in the directory with a .log extension, the second button applies a filter to ensure the files are of the form 'anytext.log' rather than 'anytext.yyyymmdd.log'.
I knew that it would be necessary to have my own array of string
s containing either the list of log files selected using the normal file open dialog methods (clicking on files and holding down the control key or shift key to select more than one), or a list of log files fitting the criteria used by the custom buttons.
My first problem was closing the dialog if one of the custom buttons was clicked. The dialog didn't want to close. It seems that the dialog won't close using the OK button if the File name: field is empty. (Thanks to Neville Franks again for telling me this). I tried the obvious things such as sending a WM_COMMAND
message to the dialog specifying IDOK
. Didn't close. Try again with IDCANCEL
. Still didn't close. Tried using EndDialog(IDOK);
. Still didn't close. This was starting to get silly so I used spy++ and discovered that the custom dialog template is created as a child dialog of the CFileDialog
class. If you want to send messages to the dialog from your custom message handlers, you need to send them to the parent dialog. Aha!
I still had the problem of closing the dialog. You can't send IDOK
to the parent dialog and expect it to close if the File name: field is empty. But you can't easily make that field be not empty. You can close the dialog by sending IDCANCEL
but then how do you know if the user actually selected a file? My solution was to override the DoModal()
function and, in code internal to the class, do one of two things.
- If the user hits either of the two custom buttons, populate my own
string
array with files that match (either aged or not aged). - If the user selects files using the file listing combo box and then presses the 'Open' button, populate my own
string
array with the files using the standard CFileDialog
functions.
The class lies to the caller! If the user hits either of the two custom buttons, the dialog is told to cancel itself after populating an array with files that match the criteria. If the user hits the Open button and no file is selected, the dialog doesn't close. If the user selects a file and hits the Open button, the dialog returns after populating the string
array.
The overridden DoModal
returns either IDOK
or IDCANCEL
to the caller depending on whether the string
array is empty or not. This is the desired result regardless of whether we're lying about the exact path we followed.
Futures
As it stands, this program does everything I want. But you might want it to do more. I can imagine extending it to email notifications to someone if a particular error code is encountered. (thanks to Paul Perkins for the idea).
History
- Version 1 - 5th December, 2003
- Version 2 - 9th December, 2003. Fixed a problem brought about by inadequate testing :-( where the
LogUpdateThread
was exiting after the initial file load for current log files. Reworked the way the filter state is handled (separated the filter data from the combo box selection). - Version 3 - 10th December, 2003. As part of writing a mea culpa for bugs in version 2, I realized there was another hole in the filtering schema. Specifically, when a log file rolled over, the filter data wasn't cleared and thus filter criteria that were relevant to yesterday's file would appear in today's viewer if the viewer was left running over the rollover moment. Now fixed.
- Version 4 - 12th December 2003. Added the extra check for whether the thread is being terminated as suggested by c-sharp below.
- Version 5 - 29th January 2004. Expanded the explanation of why I didn't use the
FindFirstChangeNotification()
or ReadDirectoryChangesW()
APIs. - Version 5 - 25th February 2004. Included a suggested fix for the duplicate lines problem by rnoda (see below). My fault entirely, I tested this under high load conditions which masked the problem. It appeared under low load conditions. Bummer.