If you’ve ever spent time debugging .NET memory dumps in WinDBG, you will be familiar with its commands, which aren’t always the most straightforward to work with!
However, back in May 2013, Microsoft released the CLR MD library, describing it as:
… a set of advanced APIs for programmatically inspecting a crash dump of a .NET program much in the same way as the SOS Debugging Extensions (SOS). It allows you to write automated crash analysis for your applications and automate many common debugger tasks.
This post explores some of the things you can achieve by using CLR MD instead, a C# library which is now available as a NuGet Package. If you’re interested, the full source code for all the examples is available.
Getting Started with CLR MD
This post isn’t meant to serve as a Getting Started guide; there’s already a great set of tutorials linked from the project README that serve that purpose.
However, we will be looking at what else CLR MD allows you to achieve.
I’ve previously written about the Garbage Collectors, so the first thing that we’ll do is see what GC related information we can obtain. The .NET GC creates one or more Heaps, depending on the number of CPU cores available and the mode it is running in (Server/Workstation). These heaps are in turn made up of several Segments, for the different Generations (Gen0/Ephemeral, Gen1, Gen2 and Large). Finally, it’s worth pointing out that the GC initially Reserves the memory it wants, but only Commits it when it actually needs to. So with CLR MD we can iterate through the different GC Heaps, printing out the information about their individual Segments as we go.
Analysing String Usage
But knowing what’s inside those heaps is more useful. As David Fowler nicely summed up in a tweet, strings often significantly contribute to memory usage.
Now, we could analyze the memory dump to produce a list of the most frequently occurring strings, as Nick Craver did with a memory dump from the App Pool of a Stack Overflow server.
However, we’re going to look more closely at the actual contents of each string and, in particular, analyze what the underlying encoding is, i.e., ASCII, ISO-8859-1 (Latin-1) or Unicode.
By default, the .NET string Encoder, instead of giving an error, replaces any characters it can’t convert with ‘ ’ (which is known as the Unicode Replacement Character). So we will need to force it to throw an exception. This means we can detect the most compact encoding possible, by trying to convert to the raw string
data to ASCII
, ISO-8859-1 (Latin-1)
and then Unicode
(sequence of UTF-16 code units) in turn. To see this in action, below is the code from the IsASCII(..)
function:
// Requires: using System.Text;

// Strict ASCII encoding: throws instead of silently substituting
// characters it can't represent.
private static readonly Encoding asciiEncoder = Encoding.GetEncoding(
    Encoding.ASCII.WebName,
    EncoderFallback.ExceptionFallback,
    DecoderFallback.ExceptionFallback);

private static bool IsASCII(string text, out byte[] textAsBytes)
{
    var unicodeBytes = Encoding.Unicode.GetBytes(text);
    try
    {
        textAsBytes = Encoding.Convert(Encoding.Unicode, asciiEncoder, unicodeBytes);
        return true;
    }
    catch (EncoderFallbackException)
    {
        // At least one character has no ASCII representation.
        textAsBytes = null;
        return false;
    }
}
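The same idea extends to the full ASCII, then Latin-1, then Unicode sequence described above. What follows is only a minimal sketch of that fallback chain, not the code from HeapStringAnalyser: the `StringEncodingClassifier` class, the `CompactEncoding` enum and the `Classify` method are hypothetical names chosen for illustration. It relies only on `Encoding.GetEncoding` with `EncoderFallback.ExceptionFallback`, as in `IsASCII(..)`.

```csharp
using System;
using System.Text;

// Hypothetical helper (not from the original tool): finds the most compact
// of the three encodings that can represent a string without loss.
public static class StringEncodingClassifier
{
    public enum CompactEncoding { Ascii, Latin1, Unicode }

    // Strict variants: throw EncoderFallbackException rather than substituting.
    private static readonly Encoding StrictAscii = Encoding.GetEncoding(
        "us-ascii",
        EncoderFallback.ExceptionFallback,
        DecoderFallback.ExceptionFallback);

    private static readonly Encoding StrictLatin1 = Encoding.GetEncoding(
        "iso-8859-1",
        EncoderFallback.ExceptionFallback,
        DecoderFallback.ExceptionFallback);

    // True if every character in the text is representable in the encoding.
    private static bool CanEncode(Encoding strictEncoding, string text)
    {
        try
        {
            strictEncoding.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    // Try the narrowest encoding first and widen only when it fails.
    public static CompactEncoding Classify(string text)
    {
        if (CanEncode(StrictAscii, text)) return CompactEncoding.Ascii;
        if (CanEncode(StrictLatin1, text)) return CompactEncoding.Latin1;
        return CompactEncoding.Unicode;
    }
}
```

Calling `StringEncodingClassifier.Classify(..)` on each string read from the heap and tallying the results by category would give a breakdown along the lines of the one reported later in this post. Using exceptions for control flow is slow for strings that fail the check, which is acceptable for an offline analysis tool but would be a poor fit for a hot path.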
Next, we run this on a memory dump of Visual Studio with the HeapStringAnalyser source code solution loaded. The most interesting part of the output is reproduced below:
Overall 145,872 "System.String" objects take up 12,391,286 bytes (11.82 MB)
Of this underlying byte arrays (as Unicode) take up 10,349,078 bytes (9.87 MB)
Remaining data (object headers, other fields, etc) are 2,042,208 bytes (1.95 MB), at 14 bytes per object
Actual Encoding that the "System.String" could be stored as (with corresponding data size)
10,339,638 bytes ( 145,505 strings) as ASCII
3,370 bytes ( 65 strings) as ISO-8859-1 (Latin-1)
6,070 bytes ( 302 strings) as Unicode
Total: 10,349,078 bytes
So in this case, we can see that out of the 145,872 string objects in memory, 145,505 of them could actually be stored as ASCII, a further 65 as ISO-8859-1 (Latin-1) and only 302 need the full Unicode encoding.
Additional Resources
Hopefully, this post has demonstrated that CLR MD is a powerful tool. If you want to find out more, please refer to the links below:
The post Analysing .NET Memory Dumps with CLR MD appeared first on my blog Performance is a Feature!