Having a previously stable web application
suddenly fall over, throwing OutOfMemory
exceptions, is – obviously – not a good
thing. Unfortunately, an application – web or desktop – can perform perfectly
fine through development and QA, then hit production and fall over
spectacularly under heavy load, multiple users, or just gradually over time.
There are plenty of ways for this to
happen, and one of the most common and trickiest to diagnose is through memory
leaks. This article gives a little background to how unexpected memory issues
can creep into .NET code. It then walks through a simple troubleshooting
example using an ASP.NET application and ANTS
Memory Profiler.
Managed memory, unmanaged memory, and where errors creep in
Working in .NET certainly does simplify
memory management, but it doesn’t remove the problem entirely. At a minimum, an
understanding of garbage collection and the object heaps helps you avoid nasty
performance overheads from managing memory. But you’re also likely to encounter
issues with unmanaged memory, which you may not realise you’re using.
For example, under the hood, the standard
.NET framework imaging libraries often use large amounts of unmanaged memory,
even though you interact with a .NET wrapper. These can leak, and under heavy
use, they can slow down or crash an application in a non-intuitive way – it’s
not always obvious to go looking for unmanaged memory problems when you’re
writing .NET code.
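For example, System.Drawing.Bitmap is a thin managed wrapper over an unmanaged GDI+ buffer, so the object looks tiny to the garbage collector even when it holds megabytes. Here’s a minimal sketch of the failure mode and its fix (the helper class and file path are ours, invented for illustration):

using System.Drawing;

public static class ImageHelper
{
    // Leaky: the Bitmap wraps a large unmanaged GDI+ buffer, so the GC
    // sees only a small managed object and may collect it far too late.
    public static Size GetImageSizeLeaky(string path)
    {
        var image = new Bitmap(path); // unmanaged memory allocated here
        return image.Size;            // never disposed; the memory lingers
    }

    // Fixed: Dispose releases the unmanaged memory deterministically.
    public static Size GetImageSize(string path)
    {
        using (var image = new Bitmap(path))
        {
            return image.Size;
        }
    }
}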
Similarly, in a complex codebase, it’s easy
to forget to unregister event handlers. These can then hang on to memory, and
lead to memory usage rising over time, which will gradually degrade
performance, and can lead to crashes.
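As a simple illustration (the class names are invented for the example), an object that subscribes to a long-lived static event stays reachable through that event’s invocation list until it unsubscribes:

using System;

public static class PriceFeed
{
    // A static event lives for the lifetime of the application, and it
    // keeps a reference to every subscriber in its invocation list.
    public static event EventHandler PricesUpdated;

    public static void Publish() => PricesUpdated?.Invoke(null, EventArgs.Empty);
}

public class DashboardControl
{
    public DashboardControl()
    {
        PriceFeed.PricesUpdated += OnPricesUpdated;
    }

    private void OnPricesUpdated(object sender, EventArgs e) { /* refresh UI */ }

    // Without this, every DashboardControl ever created remains reachable
    // through PriceFeed, and its memory is never reclaimed.
    public void Shutdown()
    {
        PriceFeed.PricesUpdated -= OnPricesUpdated;
    }
}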
Regularly profiling an application not only
helps you fix the obvious issues like OutOfMemory
exceptions, but it can also
alert you to problems before you have to see that nasty crash in production. As
a simple example, seeing a high proportion of memory in the Generation 2 heap
is an indicator that memory is being held onto for a long time, and that you
may have a leak somewhere.
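If you want a rough in-code sanity check between profiling sessions, the GC class exposes the relevant numbers. A small sketch (the helper is ours, not part of the profiler):

using System;

public static class GcCheck
{
    public static void Report(object candidate)
    {
        // Which generation the object currently lives in (0, 1, or 2);
        // objects that survive collections migrate towards Gen 2.
        Console.WriteLine("Generation: " + GC.GetGeneration(candidate));

        // Frequent Gen 2 collections that reclaim little memory are a
        // hint that objects are being held longer than they should be.
        Console.WriteLine("Gen 2 collections: " + GC.CollectionCount(2));

        // Managed memory only: unmanaged allocations are invisible here,
        // which is exactly why they are so easy to miss.
        Console.WriteLine("Managed bytes: " + GC.GetTotalMemory(false));
    }
}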
Memory profiling – comparing before and after
Profiling with ANTS
Memory Profiler is based on taking memory snapshots. The profiler attaches
to an application, and when you take a snapshot, it examines the state of the
memory being used.
What you look at when you use ANTS Memory
Profiler is the difference between the snapshots. The profiler shows you
a timeline with ongoing performance counters as an overview of the
application’s general behaviour, and as a guide to when best to take a
snapshot.
A good approach is to start with a baseline
snapshot when the application is idle, then apply load or go through the
reproduction steps for the error you’re troubleshooting.
If there’s an issue, memory usage will
climb on the timeline, and either stay high or fall at a lower than expected
rate. Taking a second snapshot at this point lets you look at what’s changed
and see which objects are surviving in memory for longer than they should.
We’ll walk through this in a bit more
detail using a simple example web application.
Example case: the leaky web application
For this example, we’ve taken NerdDinner
(an ASP.NET MVC demo application) and modified it to show a reasonably common
problem.
NerdDinner displays locations on a map, and
we’ve included the ability to output that map to a PDF, using a third-party PDF
library:
But when our version of NerdDinner has multiple simultaneous users, it reportedly slows down drastically, and it has even crashed with OutOfMemory exceptions.
This is not ideal. Because it was stable
before we added the new functionality, and remains stable under light usage, we’ve
got a fair idea of where to start investigating – we’ll throw load at the new
PDF export functionality, and see what the graphs look like.
Here’s what we’ll do:
- Open NerdDinner
- Take a baseline snapshot while it’s idle
- Generate some load on the PDF functionality
- Take a second snapshot to compare
- Examine the profiler data to see if we’re leaking memory and where
Setup is simple. We just start the profiler
and click New profiling session.
If you’ve used a previous version, you’ll
probably notice that version 8 looks a bit different. In particular, it’s
quicker to get started and re-run profiling sessions, and it lets you profile
using any web browser.
On the left of the screen, we choose IIS
– ASP.NET:
We enter the location of the web application,
ensure we’ve selected the option to profile unmanaged code, and click Start
profiling.
NerdDinner launches in the browser, and the
profiler begins collecting data. We start to see memory usage on the timeline.
At this point, we take our baseline snapshot.
The summary screen shows us some basic
information about memory usage, but it doesn’t really get interesting until we
take another snapshot.
Here’s the baseline:
To simulate load and trigger the issue, we’ll
use TinyGet to make multiple requests to the PDF export function.
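TinyGet ships with the IIS Resource Kit; if you don’t have it to hand, a few lines of C# apply similar pressure. This is just a sketch, and the export URL is a placeholder rather than NerdDinner’s real route:

using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class LoadGenerator
{
    public static async Task HammerPdfExportAsync()
    {
        using (var client = new HttpClient())
        {
            // Fire 200 concurrent requests at the PDF export endpoint
            var requests = Enumerable.Range(0, 200)
                .Select(_ => client.GetAsync("http://localhost:5000/Dinners/ExportPdf"));

            await Task.WhenAll(requests);
        }
    }
}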
The memory usage starts to climb sharply on
the timeline, and we take another snapshot.
The summary screen now shows us what’s
changed between our baseline and applying load. In this case, it’s actually
pretty clear cut.
The pie chart shows us that a massive
amount of the memory is being held by unmanaged code.
To see where this memory is going, we can
use the Unmanaged breakdown by module. This shows us 855MB being used by
MuPDFlib, the module we know to be our new PDF component. The small grey bar
next to the other modules is the size in the baseline snapshot. Our PDF module
doesn’t have one, so quite apart from being massively larger than anything
else, we know that it’s newly allocated memory.
So the classes associated with this module look like the right place to start investigating our issue.
But what’s causing the leak?
To find out, we go to the Class list
and sort by unmanaged size.
We see that while the MuPDF .NET class is
using a huge amount of unmanaged memory, its .NET memory consumption is
relatively small. So much so that it would probably have gone unnoticed if we
hadn’t selected ‘unmanaged profiling’.
Next, we look at the instance list, where
we see several instances of MuPDF in memory, using plenty of unmanaged space.
This confirms that this class is a likely
culprit, so we can go ahead and draw an instance retention graph and find out
why the memory is being held onto.
In this particular case, the graph is
almost comically simple – MuPDF is being held on the finalizer queue.
That’s a little bit odd, and at this point
we need to actually dig into our code and find out why.
Fixing the leak
Our example is relatively simple to
navigate. We go to the finalizer for our implementation of MuPDF.
~MuPDF()
{
    if (this.m_pNativeObject != IntPtr.Zero)
    {
        // Release the native PDF object and the pinned image buffer
        this._Api.DisposeMuPDFClass(this.m_pNativeObject);
        this.m_pNativeObject = IntPtr.Zero;

        if (this._ImagePin.IsAllocated)
        {
            this._ImagePin.Free();
        }
    }

    // This runs on the finalizer thread, and it talks to a database
    Logger.Logging.logMessage("Finalized");
}
The application is logging each time the
finalizer is run.
.NET has only a single finalizer thread, and the logging system we’re using takes a long time to talk to the database. That blocks the thread, preventing it from cleaning up objects and causing them to remain in memory for longer than they should.
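To see why this matters, here’s a minimal, self-contained sketch (the class is invented for the demonstration) of how one slow finalizer backs up the whole queue:

using System;
using System.Threading;

public class SlowFinalizer
{
    // Stands in for the slow logging call: every instance's cleanup
    // queues up behind this on the one finalizer thread.
    ~SlowFinalizer()
    {
        Thread.Sleep(1000);
    }
}

public static class Demo
{
    public static void Main()
    {
        for (int i = 0; i < 10000; i++)
        {
            new SlowFinalizer();
        }

        // The GC moves dead instances onto the finalization queue far
        // faster than the blocked thread can drain it, so memory stays
        // high long after allocation stops.
        GC.Collect();
        Console.WriteLine("Finalizers are still draining, one second each...");
    }
}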
If we look back to the timeline, we can
also see something interesting now we’ve stopped generating load.
Rather than staying high and constant, the
application’s memory usage is actually declining very slowly. So the memory is
being freed after the logging finishes, but much more slowly than it gets
allocated under load. This is why we didn’t notice the issue until the application
was deployed in the wild.
In this case there are some easy fixes
available to us. We could either remove the finalizer logging or troubleshoot
the database query to reduce the latency. Because the logging was probably part
of some debugging instrumentation in development, and this isn’t really a very
sensible thing to be doing, we’ll just take it out. Alternatively, a much better solution would be to implement IDisposable.
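As a sketch of that approach, reusing the field names from the finalizer above (MuPDFApi stands in for whatever type the native wrapper actually is), the cleanup moves into a Dispose method, with the finalizer kept only as a safety net:

using System;
using System.Runtime.InteropServices;

public class MuPDF : IDisposable
{
    private IntPtr m_pNativeObject;
    private GCHandle _ImagePin;
    private MuPDFApi _Api; // hypothetical wrapper around the native library

    public void Dispose()
    {
        ReleaseNativeResources();
        // Cleanup is done, so keep this instance off the finalizer queue
        GC.SuppressFinalize(this);
    }

    // Safety net only, for callers who forget to Dispose
    ~MuPDF()
    {
        ReleaseNativeResources();
    }

    private void ReleaseNativeResources()
    {
        if (this.m_pNativeObject != IntPtr.Zero)
        {
            this._Api.DisposeMuPDFClass(this.m_pNativeObject);
            this.m_pNativeObject = IntPtr.Zero;

            if (this._ImagePin.IsAllocated)
            {
                this._ImagePin.Free();
            }
        }
    }
}

Callers can then wrap each use in a using block, so the unmanaged memory is released at a predictable point rather than waiting for the finalizer thread to get around to it.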
At this point, we rebuild the application,
and re-run the profiling session (using the same technique) to check that the
problem is fixed.
We can see here that memory usage returns
to acceptable levels much more rapidly than before.
Conclusions
Our walkthrough has shown a fairly simple
troubleshooting example: debugging code that made its way into production. But
the effects are real enough – a non-obvious memory leak whose consequences only
manifest when the application is under heavy usage. The actual .NET memory
usage does not look suspicious, and it’s only when we inspect the unmanaged
memory consumed by the .NET code that the source of the problem emerges.
Note: Red Gate Software offers a free
trial of ANTS Memory Profiler for you to try it out on your own
application.