Having a previously stable web application
suddenly fall over, throwing OutOfMemory
exceptions, is – obviously – not a good
thing. Unfortunately, an application – web or desktop – can perform perfectly
fine through development and QA, then hit production and fall over
spectacularly under heavy load, multiple users, or just gradually over time.
There are plenty of ways for this to
happen, and one of the most common and trickiest to diagnose is through memory
leaks. This article gives a little background to how unexpected memory issues
can creep into .NET code. It then walks through a simple troubleshooting
example using an ASP.NET application and ANTS
Memory Profiler.
Managed memory, unmanaged memory, and where errors creep in
Working in .NET certainly does simplify
memory management, but it doesn’t remove the problem entirely. At a minimum, an
understanding of garbage collection and the object heaps helps you avoid nasty
performance overheads from managing memory. But you’re also likely to encounter
issues with unmanaged memory, which you may not realise you’re using.
For example, under the hood, the standard
.NET framework imaging libraries often use large amounts of unmanaged memory,
even though you interact with a .NET wrapper. These can leak, and under heavy
use, they can slow down or crash an application in a non-intuitive way – it’s
not always obvious to go looking for unmanaged memory problems when you’re
writing .NET code.
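For example, System.Drawing.Bitmap is a thin managed wrapper over an unmanaged GDI+ buffer, so the object looks tiny to the garbage collector even when it holds megabytes. Here’s a minimal sketch of the failure mode and its fix (the helper class and file path are ours, invented for illustration):

using System.Drawing;

public static class ImageHelper
{
    // Leaky: the Bitmap wraps a large unmanaged GDI+ buffer, so the GC
    // sees only a small managed object and may collect it far too late.
    public static Size GetImageSizeLeaky(string path)
    {
        var image = new Bitmap(path); // unmanaged memory allocated here
        return image.Size;            // never disposed; the memory lingers
    }

    // Fixed: Dispose releases the unmanaged memory deterministically.
    public static Size GetImageSize(string path)
    {
        using (var image = new Bitmap(path))
        {
            return image.Size;
        }
    }
}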
Similarly, in a complex codebase, it’s easy
to forget to unregister event handlers. These can then hang on to memory, and
lead to memory usage rising over time, which will gradually degrade
performance, and can lead to crashes.
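As a simple illustration (the class names are invented for the example), an object that subscribes to a long-lived static event stays reachable through that event’s invocation list until it unsubscribes:

using System;

public static class PriceFeed
{
    // A static event lives for the lifetime of the application, and it
    // keeps a reference to every subscriber in its invocation list.
    public static event EventHandler PricesUpdated;

    public static void Publish() => PricesUpdated?.Invoke(null, EventArgs.Empty);
}

public class DashboardControl
{
    public DashboardControl()
    {
        PriceFeed.PricesUpdated += OnPricesUpdated;
    }

    private void OnPricesUpdated(object sender, EventArgs e) { /* refresh UI */ }

    // Without this, every DashboardControl ever created remains reachable
    // through PriceFeed, and its memory is never reclaimed.
    public void Shutdown()
    {
        PriceFeed.PricesUpdated -= OnPricesUpdated;
    }
}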
Regularly profiling an application not only
helps you fix the obvious issues like OutOfMemory
exceptions, but it can also
alert you to problems before you have to see that nasty crash in production. As
a simple example, seeing a high proportion of memory in the Generation 2 heap
is an indicator that memory is being held onto for a long time, and that you
may have a leak somewhere.
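If you want a rough in-code sanity check between profiling sessions, the GC class exposes the relevant numbers. A small sketch (the helper is ours, not part of the profiler):

using System;

public static class GcCheck
{
    public static void Report(object candidate)
    {
        // Which generation the object currently lives in (0, 1, or 2);
        // objects that survive collections migrate towards Gen 2.
        Console.WriteLine("Generation: " + GC.GetGeneration(candidate));

        // Frequent Gen 2 collections that reclaim little memory are a
        // hint that objects are being held longer than they should be.
        Console.WriteLine("Gen 2 collections: " + GC.CollectionCount(2));

        // Managed memory only: unmanaged allocations are invisible here,
        // which is exactly why they are so easy to miss.
        Console.WriteLine("Managed bytes: " + GC.GetTotalMemory(false));
    }
}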
Memory profiling – comparing before and after
Profiling with ANTS
Memory Profiler is based on taking memory snapshots. The profiler attaches
to an application, and when you take a snapshot, it examines the state of the
memory being used.
What you look at when you use ANTS Memory
Profiler is the difference between the snapshots. The profiler shows you
a timeline with ongoing performance counters as an overview of the
application’s general behaviour, and as a guide to when best to take a
snapshot.
A good approach is to start with a baseline
snapshot when the application is idle, then apply load or go through the
reproduction steps for the error you’re troubleshooting.
If there’s an issue, memory usage will
climb on the timeline, and either stay high or fall at a lower than expected
rate. Taking a second snapshot at this point lets you look at what’s changed
and see which objects are surviving in memory for longer than they should.
We’ll walk through this in a bit more
detail using a simple example web application.
Example case: the leaky web application
For this example, we’ve taken NerdDinner
(an ASP.NET MVC demo application) and modified it to show a reasonably common
problem.
NerdDinner displays locations on a map, and
we’ve included the ability to output that map to a PDF, using a third-party PDF
library:
But when our version of NerdDinner has multiple simultaneous users, it reportedly slows down drastically, and it has even crashed with OutOfMemory exceptions.
This is not ideal. Because it was stable
before we added the new functionality, and remains stable under light usage, we’ve
got a fair idea of where to start investigating – we’ll throw load at the new
PDF export functionality, and see what the graphs look like.
Here’s what we’ll do:
- Open NerdDinner
- Take a baseline snapshot while it’s idle
- Generate some load on the PDF functionality
- Take a second snapshot to compare
- Examine the profiler data to see if we’re leaking memory and where
Setup is simple. We just start the profiler
and click New profiling session.
If you’ve used a previous version, you’ll
probably notice that version 8 looks a bit different. In particular, it’s
quicker to get started and re-run profiling sessions, and it lets you profile
using any web browser.
On the left of the screen, we choose IIS
– ASP.NET:
We enter the location of the web application,
ensure we’ve selected the option to profile unmanaged code, and click Start
profiling.
NerdDinner launches in the browser, and the
profiler begins collecting data. We start to see memory usage on the timeline.
At this point, we take our baseline snapshot.
The summary screen shows us some basic
information about memory usage, but it doesn’t really get interesting until we
take another snapshot.
Here’s the baseline:
To simulate load and trigger the issue, we’ll
use TinyGet to make multiple requests to the PDF export function.
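TinyGet ships with the IIS Resource Kit; if you don’t have it to hand, a few lines of C# apply similar pressure. This is just a sketch, and the export URL is a placeholder rather than NerdDinner’s real route:

using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class LoadGenerator
{
    public static async Task HammerPdfExportAsync()
    {
        using (var client = new HttpClient())
        {
            // Fire 200 concurrent requests at the PDF export endpoint
            var requests = Enumerable.Range(0, 200)
                .Select(_ => client.GetAsync("http://localhost:5000/Dinners/ExportPdf"));

            await Task.WhenAll(requests);
        }
    }
}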
The memory usage starts to climb sharply on
the timeline, and we take another snapshot.
The summary screen now shows us what’s
changed between our baseline and applying load. In this case, it’s actually
pretty clear cut.
The pie chart shows us that a massive
amount of the memory is being held by unmanaged code.
To see where this memory is going, we can
use the Unmanaged breakdown by module. This shows us 855MB being used by
MuPDFlib, the module we know to be our new PDF component. The small grey bar
next to the other modules is the size in the baseline snapshot. Our PDF module
doesn’t have one, so quite apart from being massively larger than anything
else, we know that it’s newly allocated memory.
So the classes associated with this module look like the right place to start investigating our issue.
But what’s causing the leak?
To find out, we go to the Class list
and sort by unmanaged size.
We see that while the MuPDF .NET class is
using a huge amount of unmanaged memory, its .NET memory consumption is
relatively small. So much so that it would probably have gone unnoticed if we
hadn’t selected ‘unmanaged profiling’.
Next, we look at the instance list, where
we see several instances of MuPDF in memory, using plenty of unmanaged space.
This confirms that this class is a likely
culprit, so we can go ahead and draw an instance retention graph and find out
why the memory is being held onto.
In this particular case, the graph is
almost comically simple – MuPDF is being held on the finalizer queue.
That’s a little bit odd, and at this point
we need to actually dig into our code and find out why.
Fixing the leak
Our example is relatively simple to
navigate. We go to the finalizer for our implementation of MuPDF.
~MuPDF()
{
    if (this.m_pNativeObject != IntPtr.Zero)
    {
        // Release the native PDF object and the pinned image buffer
        this._Api.DisposeMuPDFClass(this.m_pNativeObject);
        this.m_pNativeObject = IntPtr.Zero;

        if (this._ImagePin.IsAllocated)
        {
            this._ImagePin.Free();
        }
    }

    // This runs on the finalizer thread, and it talks to a database
    Logger.Logging.logMessage("Finalized");
}
The application is logging each time the
finalizer is run.
.NET has only a single finalizer thread, and the logging system we’re using takes a long time to talk to the database. That blocks the thread, preventing it from cleaning up objects and causing them to remain in memory for longer than they should.
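To see why this matters, here’s a minimal, self-contained sketch (the class is invented for the demonstration) of how one slow finalizer backs up the whole queue:

using System;
using System.Threading;

public class SlowFinalizer
{
    // Stands in for the slow logging call: every instance's cleanup
    // queues up behind this on the one finalizer thread.
    ~SlowFinalizer()
    {
        Thread.Sleep(1000);
    }
}

public static class Demo
{
    public static void Main()
    {
        for (int i = 0; i < 10000; i++)
        {
            new SlowFinalizer();
        }

        // The GC moves dead instances onto the finalization queue far
        // faster than the blocked thread can drain it, so memory stays
        // high long after allocation stops.
        GC.Collect();
        Console.WriteLine("Finalizers are still draining, one second each...");
    }
}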
If we look back to the timeline, we can
also see something interesting now we’ve stopped generating load.
Rather than staying high and constant, the
application’s memory usage is actually declining very slowly. So the memory is
being freed after the logging finishes, but much more slowly than it gets
allocated under load. This is why we didn’t notice the issue until the application
was deployed in the wild.
In this case there are some easy fixes
available to us. We could either remove the finalizer logging or troubleshoot
the database query to reduce the latency. Because the logging was probably part
of some debugging instrumentation in development, and this isn’t really a very
sensible thing to be doing, we’ll just take it out. Alternatively, a much better solution would be to implement IDisposable.
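As a sketch of that approach, reusing the field names from the finalizer above (MuPDFApi stands in for whatever type the native wrapper actually is), the cleanup moves into a Dispose method, with the finalizer kept only as a safety net:

using System;
using System.Runtime.InteropServices;

public class MuPDF : IDisposable
{
    private IntPtr m_pNativeObject;
    private GCHandle _ImagePin;
    private MuPDFApi _Api; // hypothetical wrapper around the native library

    public void Dispose()
    {
        ReleaseNativeResources();
        // Cleanup is done, so keep this instance off the finalizer queue
        GC.SuppressFinalize(this);
    }

    // Safety net only, for callers who forget to Dispose
    ~MuPDF()
    {
        ReleaseNativeResources();
    }

    private void ReleaseNativeResources()
    {
        if (this.m_pNativeObject != IntPtr.Zero)
        {
            this._Api.DisposeMuPDFClass(this.m_pNativeObject);
            this.m_pNativeObject = IntPtr.Zero;

            if (this._ImagePin.IsAllocated)
            {
                this._ImagePin.Free();
            }
        }
    }
}

Callers can then wrap each use in a using block, so the unmanaged memory is released at a predictable point rather than waiting for the finalizer thread to get around to it.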
At this point, we rebuild the application,
and re-run the profiling session (using the same technique) to check that the
problem is fixed.
We can see here that memory usage returns
to acceptable levels much more rapidly than before.
Conclusions
Our walkthrough has shown a fairly simple
troubleshooting example: debugging code that made its way into production. But
the effects are real enough – a non-obvious memory leak whose consequences only
manifest when the application is under heavy usage. The actual .NET memory
usage does not look suspicious, and it’s only when we inspect the unmanaged
memory consumed by the .NET code that the source of the problem emerges.
Note: Red Gate Software offers a free
trial of ANTS Memory Profiler for you to try it out on your own
application.