Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C#

EtwDataViewer - Analyze, visualize, and make sense of your ETW logs

5.00/5 (5 votes)
6 Aug 2013Ms-PL13 min read 42.8K   1.3K  
Event tracing for Windows is a super efficient log technology, alas logs are still flat raw logs. EtwDataViewer is a prototype for adding browsability, analysis, and a visual tree representation.

EtwDataViewer in action

Introduction

This is a follow-up article to Application Analysis with Event Tracing for Windows (ETW).

In this article I will experiment with different features that a custom ETW log viewer can have.

  • Support for manifests
  • Analysis through a plug-in
  • Build tree structure
  • Drill-down capabilities
  • WMI information

Background

The intent here is to improve on raw logs, which are normally viewed in some simple text editor with limited possibilities for searching and restructuring.

Raw text logs are usually compact and a bit cryptic in nature, since very verbose logs may slow down an application.

Writing a lot of data might hit performance. Firstly, if data has to be formatted/calculated/retrieved before being output. This is a serious performance hit. Secondly, memory is copied several times in internal buffers in the system. When you normally write to disk, you never write to disk directly, it is copied to an internal buffer, which is then flushed to disk by a background thread in the OS when the disk is ready. In some cases one prefers to open and close the log file every time it is written to, and flush the data directly or use non-buffered log files. This is slow, but you are guaranteed not to miss any logs, which may otherwise be lost if the application crashes.

With today's 8 core machines with hyper fast SSD disks. The performance penalty might be negligible. But with older machines and spinning drives, this was a problem, and it is still a problem on embedded hardware where the hardware resources are low.

What normally is done to make faster logs is writing more compact logs, by using abbreviations, using less formatting, omitting logging some values, We pay for this speed-up in readability. If you are a new to a job, these logs can be very hard to read and understand. Even for the ones that made the logs, the logs may be hard to read if the logs lacks structure or indentation.

Below is an extract from what could be a text log.

*** Generating document (objs: 12304)
*** Generated document [ok]
*** Removing empty objects
*** Optimizing structure
*** Compatiblity check
*** Refresh

Structure can be added by using indentation. The deeper the call depth, the more we indent.

*** Generate document [start]
   Filename: c:\temp\xyz.doc)
   Objects: 12304
   *** Preparing document [start]
      *** Removing empty objects
      *** Optimizing structure
      *** Compatiblity check
   *** Prepare document [end]
*** Generate document [end]
*** GUI [start]
   *** Refresh
*** GUI [end]

Better. We can see that the Refresh log entry is unrelated to the generation of the document. This is because we have introduced a context, by adding indents.

Logs can be added for very different reasons. By a developer verifying correct behavior during development, where just the initial parameters and the result is logged. For example a file path, and a boolean saying that the file was loaded.

*** C:\temp\mydoc.txt [false]

On the other hand, if logs are added for making the trouble-shooting easier, one would of course consider to log what type of file path it is, and what we were trying to do. Network paths are notoriously troublesome due to restricted security permissions. If something goes wrong we would log an error and possibly the exception.

* Trying to create file for editing *
* Creating in temporary folder      *
***** Error *****
   Cannot create file
   Folder C:\temp doesn't seem to exist
*****************

The second log is much better for trouble-shooting, but what we are really doing is anticipating an error and handling the error. There are cases were the program simply takes the wrong turn, and do somthing else than we intended or obviates something important. This might be due to a dangling else, some other programmatic error, or a faulty requirement.

Logging where debugging won't help

Let's say that we have an online ordering system, and in some cases the final amount end up wrong. In those cases we want logs in order to check the paths it takes through the system. Ideally comparing values against values that are correct. In an isolated system, where we can press the button once, we can simply debug it. If the problem is rarely occurring we need logs, but if you need to read a 200MB log file, and you don't have good built-in tools for searching and removing irrelevant logs, the risk is that you miss it because the data logs look similar.

What we need is very verbose logging and a log-viewer on steroids!!

Imagine a log where can we follow the steps the application takes, and see if and where something goes wrong, and also why. Also imagine a log-viewer that let us find relevant information quickly, letting us show/hide relevant information, and combine the event timestamps with CPU, disk, and network performance counters.

A custom ETW log viewer

ETW logs are fast

I like ETW logs because the underlying technology is fast, and that data doesn't need to be formatted or repeated. The manifest defines the data structures, and the type of data. There is no need to explain what type of data it is, as is normally done in text logs. This opens up new possibilities, we will be able to log a lot more data, even in tight loops.

The problem is that there is no good tool that can interpret your logs. Xperfviewer.exe can certainly show them, but only in a flat format. Browsability and searchability are limited. The data can be exported to Excel, but the searchability is still limited. What about writing a logviewer that is custom-made for browsing and searching?

Wishlist for a logviewer

Below I list a few things I would like to see in a log viewer app.

  • Correlation between what happens in the app and what is in the log
  • Tree view - hierarchical visualization
  • Drill-down capabilities
  • Possibilities to switch on and off irrelevant log points
  • Filtering based on log value content
  • Merging logs from other runs (simultaneous processes)
  • Add complementary ETW logs from Windows, like cpu and disk activity

Searchability is one of the most important things. Let's say that an Order record gets updated with an incorrect value. In a 200 MB log, it would be very powerful, showing just the UPDATE SQL statements concerning the Order table. When we have found it, we show the rest of the log, and see what it was actually doing. The other way around would be horrible to do, reading all logs until we found the UPDATE statement among other 20000 SQL statements.

A generic viewer is not possible

Building a generic logviewer for ETW is not a viable approach. The ETW manifest which describes the log points are very generic, you can build as simple or complex data structures as you want, and define several providers, log-levels, keywords, opcodes, etc. What you cannot specify in the manifest is how the data points are related. As the creator of these logs, you know how they are related. Provider A, is a high-level webservice log, Provider B, is for logging SQL statements, Event Id 1 and 2, are entry and exit events, etc. Unfortunately there is no convention how one should build these logs, so one manifest may be totally different from another manifest.

A customizable generic viewer is possible

Regardless of the the differences of the manifests, many things in the wishlist is still possible to implement. The beauty of the manifest is that it is a simple XML file, which can be parsed. The information in the manifest can be very useful indeed. It will let us do the following:

  • Extract the exe-filename
  • Enumerate event providers
  • Enumerate events for each provider

Having this information we will be able to:

  • Dynamically search for active processes
  • Enable/Disable event providers
  • Show/hide specific events in the log

What we still don't know is the relation between the events. If one event is at a higher-level than another event, so that the second event should be a sub node to the first. At this point, the viewer need some help. We need to supply a plugin, that builds a tree based on the event IDs. In my EtDataViewer app, I have created an interface for plugins.

C#
public interface IEtwDataAnalyzer
{
 // Decode data and add objects for later retrieval
 // I parsed sql statements and added a parsed object tree
 List<EtwDataEvent> PreprocessEvents(List<EtwDataEvent> newEvents);
 
 // Process events and arrange them in a tree structure
 List<EtwDataEventItem> BuildTree(List<EtwDataEvent> list);

 // Returns a list of referenced sql tables
 List<string> ReferencedTableNames { get; }
 
 // Info to display when an event is selected
 string GetEtwDataDetails(EtwDataEventItem etwData);

 // Remove objects previously added in the preprocess phase
 // Objects that are not marked with the serializable attribute
 // cannot be saved. Call PreprocessEvents to restore.
 void PrepareForSave(List<EtwDataEvent> allEvents);
}

The only non-generic thing is the function ReferencedTableNames. If your providers doesn't provide an sql log, it can be ignored. The viewer itself was built to be able to filter on SQL statements.

Of course, the viewer can be extended to filter on something else, like IP addresses, file names, process Ids, or customer data. But in that case this functionality also need to be added to the GUI.

EtwDataViewer

Load Manifest

When we start the EtwDataViewer it starts off as blank application

The first thing we need to do is load the manifest describing the event data points. Without the manifest the application doesn't work at all.

Blank EtwDataViewer

Initializing

Loaded Manifest in EtwDataViewer

The first thing the app does, is extracting the providers from the manifest. The provider name and its GUID. After that it extracts all the events by name and Id, and populates a tree where each node contain a checkbox, which allow you to show or hide it from the log.

At this point the app should load a corresponding decoder plugin. Perhaps looking for a DLL with the same name as the manifest in a plugin folder or load them all in and query them if the support a certain provider GUID.

I will probably add this functionality in the future. At the time of writing, I am focusing on the search capabilities. It is easy to replace the hard coded plugin, by replacing the first line in the MainWindow.xaml.cs file.

C#
private IEtwDataAnalyzer _demoAnalyzer = new DemoPlugin.EtwDataAnalyzerDemo();

Start Logging

When you are ready to start logging, click Start in the File menu, then the events will start appearing.

Explanation of the windows in EtwDataViewer

This is how it looks like when the logging is in progress (Except for the orange text that is not visible). The events end up on the left-hand side, the process information is listed at the top, and one has filtering possibilities to the right.

WMI and Process information

When you choose the menu option Start logging. The app extracts the name of the exe file from the manifest and does a wmi query by filename to get all running processes with that name. It retrieves information like CPU usage, memory consumption, etc.

That part is quite easy with the Management Strongly Typed Class Generator. It is a tool (Mgmtclassgen.exe) which generates classes for accessing the WMI information. I generated one class for accessing process information, but also one for disk accesses, and finally one for tcp/ip activity. The code below finds all process with a specific filename.

C#
_providerFilenames.ForEach(filename => {
var file = new System.IO.FileInfo(filename);
var shortFilename = filename.Replace(file.Extension, string.Empty).ToLower();

// appname appears as 'appname', 'appname#1', 'appname#2', etc
// WQL (WMI Query Language) is similar to SQL
var condition = string.Format("Name LIKE '{0}%'", shortFilename);

var vbusInstances = ROOT.CIMV2.Win32.PerfFormattedData_PerfProc_Process.GetInstances(condition)

Collecting Events

I didn't write the event trace watchers myself. I found an implementation that I could use. The EventTraceWatcher class for ETW Events by Daniel Vasquez Lopez at MSDN (released under the same license as I release). The implementation of ReleaseHandle/CloseTrace is unfortunately broken.

C#
protected override bool ReleaseHandle() {
  return NativeMethods.CloseTrace(this.traceHandle) != 0;
}

NativeMethods.CloseTrace is a PInvoke declared function, it just forwards the call to the Win32 implementation. The first wrong is that CloseTrace returns 0 on success. It is a minor flaw (who checks return-codes anyway ;)

The real problem is that the call to CloseTrace always fails. The code is correct but there seem to be something wrong with how data is marshaled that makes it fail (ref/out/in declaration).

The problem manifests itself in that the sessions remain open and they have to be manually closed.

Performance Monitor

Go into Performance monitor, right-click on stop, followed by delete.

Workarounds and changes to the EventTraceWatcher

I made a work-around for the CloseTrace in C++/CLI code, making almost the same call, which worked. Actually, I made a couple of more C++/CLI modifications for controlling the traces.

In addition. According to the MSDN documentation. A trace should be opened by its provider ID and be given a unique session ID. The original implementation has skipped the session ID, which I added in my implementation.

All those modifications make it hard for me to release it here in CodeProject. At this stage it is a patch-work, and far from finished. I think one ought to rewrite some part of that library in C++/CLI.

In the downloadable file I left out the work-around, added the attribute assembly: InternalsVisibleTo("EtwConsumer")] in order to access etw event data, which was declared with the internal attribute. I also added an interface to the original implementation of the EventTraceWatcher class, making it inherit from IEventTraceWatcher. This allowed me to make unit tests for my EtwConsumer library, faking events but being able to test the inner workings of my classes.

Filtering and building the tree

Every time new event data is collected, the data passes through a filter, so that only the remaining list is actually showed in the visual tree.

C#
void BuildTree(List<EtwDataEvent> etwEvents, bool forceUpdate = false)
{
  // Remove events from processes that are unselected
  var filteredList0 = ApplyProcessIdFilter(etwEvents);

  // Remove events according to the event Id list
  var filteredList1 = ApplyEventFilter(filteredList0);

  // Remove events (SELECT/UPDATE/INSERT...)
  var filteredList2 = ApplySqlFilter(filteredList1);

  // Remove events based on table name
  var filteredList3 = ApplySqlTableFilter(filteredList2);

  // Build a hierarchial tree depending on context
  var visualTree = PostProcessEvents(filteredList3);

  Dispatcher.Invoke(() =>
  {
    if (forceUpdate) _filteredEvents.Clear();
    visualTree.ForEach(n => _filteredEvents.Add(n));
  });
}

The calls are wrappers that calls the same methods in the plugin. The code modifies an internal list, which is of type ObservableCollection, which automatically notifies the GUI of changes. However, this list can only be modified from the GUI thread, hence the call via the Dispatcher.Invoke.

Irony .NET Language Implementation Kit

My implementation uses Irony, a library for scanning and parsing. It is supplied with a sample grammar file for parsing SQL statements. I use that grammar file, to extract the table names from the sql log. The sample file, cannot handle the whole SQL standard, but the grammar is very easy to extend. The grammar is described in C# code, but with expressions very similar to BNF notation. If you can read BNF, it will be ridiculously simple to modify it to your needs.

Running the demo

The ETW demo manifest needs to be installed/registered in the system. Under a visual studio command prompt with admin privileges, run the Wevtutil.exe command to install it.

// Install
Wevtutil.exe im EtwProducerSampleAppTracing.man

// Uninstall
Wevtutil.exe im EtwProducerSampleAppTracing.man

In the File menu of the EtwDataViewer.exe there is an option called Demo. Click on that one. It will load a manifest embedded in its own binary. Then launch EtwProducerSampleApp.exe, and click on any of the three buttons to generate events. The alternative way is to load the manifest manually. You will find it in the demo folder, and then click Start in the File menu.

EtwProducerSampleApp.exe

Then EtwDataViewer should look similar to the picture below. If you make changes to the filters, press Rebuild to process the collected events through the filters in order to reflect the changes.

Populated view of EtwDataViewer

Conclusion

The tool still needs some polishing regarding starting and stopping the traces. For my needs the tool is already usable, it lets me find information quickly by removing irrelevant log points. In the future I would like to see these logs combined with network activity. Logging to different providers, at least in C# under a virtual as I am running, seem to have a flaw. The timestamp is not always in order. An event in one provider may have an earlier timestamp than an event in another provider. This might be that the providers are processed by different background threads. It is on a microsecond level, but it makes a difference if you want to sort events in order to build call hierarchies. Consider using one provider for the whole app to avoid this problem.

The Windows operating system is full of ETW logs, from drivers, network, CPU, disk, graphics interfaces. You can freely combine these log-points and relate them to your own data points.

My intent was not to create the ultimate log viewer, but to show you the potential these ETW logs have. Plant a seed, and let your imagination take you further. You can use my code to create your own log-viewer, and add graphs/report for your specific needs.

Points of Interest

History

  • 5 Aug. 2013 - Initial release.

License

This article, along with any associated source code and files, is licensed under The Microsoft Public License (Ms-PL)