
How Understanding Assembly Language Helps Debug .NET Applications

15 Feb 2012, CPOL, 14 min read
This article shows several examples of situations where understanding assembly language helps debug seemingly impossible problems with .NET applications.

Introduction

During the last few years, I've been asked many times why I bother exercising my x86 and x64 assembly language skills, and especially why I find assembly language important enough to teach at courses, conferences, and one-off sessions. After all, .NET developers are light years away from the actual assembly code generated for their applications, and surely they will never need to write assembly code by hand.

I agree entirely that you won't often have to write assembly code by hand unless you are working on a very low-level optimization; besides, there is no way to invoke hand-written assembly code directly from a .NET program (it would have to live in a native DLL called through P/Invoke). However, I believe that all .NET developers should be able to read assembly code, mostly for debugging purposes but also for profiling and performance optimization.

In this article, I will show you some examples where understanding assembly code and general stack structure (details usually shielded from .NET developers) helps debug otherwise impossible problems, without any "advanced" tools and even without Visual Studio. However, I have to make some assumptions. This article presupposes basic familiarity with x86 assembly language, stack structure, and calling conventions, as well as some understanding of WinDbg and SOS commands. There are some excellent resources on the web that you can use to catch up on these topics:

Analyze a Corrupted or Incomplete Call Stack

It doesn't happen often, but even managed applications experience stack corruption from time to time. Here are some possible causes of stack corruption:

  • Stack overflow -- because of an infinite recursion or a large repeated stack allocation
  • P/Invoke stack imbalance -- mismatch of the managed and unmanaged function signatures
  • Random memory corruption -- usually caused by an unmanaged component in the process

When a stack corruption occurs, it's often very difficult to determine the culprit because... the stack is corrupted! Any trace of what the application was doing at the time of the corruption might have been overwritten with garbage on the stack. In fact, even debugger commands -- such as !CLRStack -- might not work properly. What can you do when a stack corruption occurs? Naturally, the only thing remaining is to walk the stack manually.

First, let's assume that the stack pointer (ESP) has not been corrupted, even though the EBP register may have been. In that case, we know where the live portion of the stack starts, and we can scan from ESP toward higher addresses (toward the caller frames) looking for execution residue. Namely, most frames on the stack save the caller's EBP register right below their return address, which makes it possible to retrace execution by finding a {saved EBP, return address} pair and then following the linked list of frames starting from that saved EBP. Below is an analysis that follows these steps to reconstruct the stack:

0:000> !CLRStack
OS Thread Id: 0x3318 (0)
Child SP       IP Call Site
00233000 00450818 FileExplorer.MainForm.RecursivelyFillTreeview
(System.Windows.Forms.TreeNode, System.String)    

!CLRStack reports just one frame, and even though it looks valid, the stack clearly did not begin at that method, so we are missing frames. It's time to reconstruct the stack manually from ESP using the dds command, which dumps memory and tries to resolve each value to a symbol. Unfortunately, because the code is managed, dds cannot resolve return addresses that point into JIT-compiled code; for those we need help from an SOS command such as !U.

0:000> dds esp
00233000  00000000
00233004  00000000
00233008  00000000
0023300c  00000000
00233010  00000000
00233014  00000000
00233018  00000000
0023301c  00000000
00233020  0220e1ec
00233024  021e364c
00233028  00000000
0023302c  00000000
00233030  021e364c
00233034  00233058
00233038  0023307c
0023303c  00450826
00233040  021e513c
00233044  00000000
00233048  00000000
0023304c  00000000
00233050  00000000
00233054  00000000
00233058  00000000
0023305c  00000000
00233060  00000000
00233064  0220e1ec
00233068  021e364c
0023306c  00000000
00233070  00000000
00233074  021e364c
00233078  0023309c
0023307c  002330c0    

The words at 00233038 and 0023303c (values 0023307c and 00450826) look like an {EBP, return address} pair. Why do I say this? Because the first value is sufficiently close to the value of ESP, which makes me confident that it points into the stack, as a saved EBP should, and the second value is sufficiently far away from the stack, as an executable code address should be. To verify that it's a code address, let's use the !U command:

0:000> !u 00450826
Normal JIT generated code
FileExplorer.MainForm.RecursivelyFillTreeview(System.Windows.Forms.TreeNode, System.String)
Begin 004507d0, size f6
...snipped...    

Indeed, this looks like a valid method, and we can continue guessing. If our guess for EBP was right, it should point to another saved EBP, which should be followed by another return address, enabling us to retrace the stack in full:

0:000> dds 0023307c L2
0023307c  002330c0
00233080  00450826    

Sure enough, the first value again looks like a valid saved EBP, and the second value is the exact same return address as before, which suggests a recursive function gone wild. We can repeat this procedure until we reach the base of the stack to obtain the entire call stack, which in this case would span hundreds of screens.
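The manual procedure above (scan upward from ESP for a {saved EBP, return address} pair, then follow the chain) can be sketched in C#. This is a minimal illustration that works on a memory snapshot copied from dds output rather than on a live process. The 64 KB `StackWindow` threshold used to decide whether a value "points into the stack" is my own assumption, not something WinDbg or SOS provides, and the names `StackWalkSketch`, `FindFramePairs`, and `WalkEbpChain` are hypothetical.

```csharp
using System.Collections.Generic;

// Sketch of a manual x86 stack walk over a memory snapshot (address -> DWORD),
// e.g. values copied from dds output. It does not read a live process.
public static class StackWalkSketch
{
    // Assumed heuristic: a value within 64 KB above ESP "points into the stack".
    public const uint StackWindow = 0x10000;

    static bool LooksLikeStackPointer(uint value, uint esp) =>
        value >= esp && value - esp < StackWindow;

    // Scans consecutive DWORD pairs within [esp, endAddress) for
    // {saved EBP, return address} candidates: the first value points into the
    // stack above its own slot, the second is nonzero and lies outside the
    // stack window, as a code address would.
    public static List<(uint Slot, uint SavedEbp, uint ReturnAddress)> FindFramePairs(
        IReadOnlyDictionary<uint, uint> memory, uint esp, uint endAddress)
    {
        var pairs = new List<(uint Slot, uint SavedEbp, uint ReturnAddress)>();
        for (uint slot = esp; slot + 4 < endAddress; slot += 4)
        {
            if (!memory.TryGetValue(slot, out uint savedEbp) ||
                !memory.TryGetValue(slot + 4, out uint ret))
                continue;

            if (LooksLikeStackPointer(savedEbp, esp) && savedEbp > slot &&
                ret != 0 && !LooksLikeStackPointer(ret, esp))
                pairs.Add((slot, savedEbp, ret));
        }
        return pairs;
    }

    // Follows the EBP chain: [ebp] holds the caller's saved EBP and [ebp+4]
    // the return address. Stops when memory is missing or the chain does not
    // move toward higher addresses (caller frames live above callee frames).
    public static List<(uint Ebp, uint ReturnAddress)> WalkEbpChain(
        IReadOnlyDictionary<uint, uint> memory, uint ebp, int maxFrames = 1000)
    {
        var frames = new List<(uint Ebp, uint ReturnAddress)>();
        while (frames.Count < maxFrames &&
               memory.TryGetValue(ebp, out uint nextEbp) &&
               memory.TryGetValue(ebp + 4, out uint ret))
        {
            frames.Add((ebp, ret));
            if (nextEbp <= ebp)
                break;
            ebp = nextEbp;
        }
        return frames;
    }
}
```

Fed with the 32 DWORDs from the dds output above plus the value at 00233080 from the second dump, the scan flags the slots 00233038 and 0023307c, both followed by the return address 00450826, which matches the recursion spotted by hand. A real walk would use the thread's actual stack bounds instead of a fixed window.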

Another variation of stack corruption worth mentioning is the situation where the ESP register is corrupt as well, so we can't trust it to point to the actual stack. This is less frequent in simple stack overflow scenarios, but might happen due to a buffer overflow, a random memory corruption, or a wild stack imbalance. In that case, we have to find the stack by other means. Fortunately, every Windows thread has a data structure called the Thread Environment Block (TEB), which records the range of its stack: the TEB begins with an NT_TIB structure whose StackBase and StackLimit fields bound the stack, and the !teb debugger command conveniently dumps the current thread's TEB. Armed with this information, we can start walking the stack looking for {EBP, return address} pairs.

0:000> dt ntdll!_NT_TIB
   +0x000 ExceptionList    : Ptr32 _EXCEPTION_REGISTRATION_RECORD
   +0x004 StackBase        : Ptr32 Void
   +0x008 StackLimit       : Ptr32 Void
   +0x00c SubSystemTib     : Ptr32 Void
   +0x010 FiberData        : Ptr32 Void
   +0x010 Version          : Uint4B
   +0x014 ArbitraryUserPointer : Ptr32 Void
   +0x018 Self             : Ptr32 _NT_TIB
0:000> !teb
...snipped...
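Since the stack bounds come from the NT_TIB at the start of the TEB, it can help to view that layout as a data structure. The following C# struct is a sketch that mirrors the 32-bit offsets from the dt output above, with pointers modeled as uint. The struct name `NtTib32` and the `IsOnStack` helper are hypothetical; on x64 the pointer fields are 8 bytes wide, so the offsets differ.

```csharp
using System.Runtime.InteropServices;

// 32-bit _NT_TIB layout, matching the offsets printed by "dt ntdll!_NT_TIB".
// Pointers are modeled as uint; on x64 they are 8 bytes and offsets differ.
[StructLayout(LayoutKind.Explicit, Size = 0x1c)]
public struct NtTib32
{
    [FieldOffset(0x000)] public uint ExceptionList;
    [FieldOffset(0x004)] public uint StackBase;    // one past the highest stack address
    [FieldOffset(0x008)] public uint StackLimit;   // lowest currently committed stack address
    [FieldOffset(0x00c)] public uint SubSystemTib;
    [FieldOffset(0x010)] public uint FiberData;    // shares its offset with Version (a union)
    [FieldOffset(0x010)] public uint Version;
    [FieldOffset(0x014)] public uint ArbitraryUserPointer;
    [FieldOffset(0x018)] public uint Self;         // the TIB's own address

    // The stack occupies [StackLimit, StackBase) and grows toward StackLimit.
    public bool IsOnStack(uint address) => address >= StackLimit && address < StackBase;
}
```

In a real session you would take StackBase and StackLimit from the !teb output and use them as the scan range in place of a possibly corrupt ESP.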

Correlate Crash Location to Source Code Line

Oftentimes you face a crash dump with a relatively simple exception in it and want to trace the root cause to a specific line of code. Commands such as !CLRStack are notorious for not reporting source line information accurately, and if your method has hundreds of lines, finding the line of code that crashed might be akin to finding the proverbial needle in a haystack.

In cases like these, reading a little disassembly might be just the right thing to do. With help from the SOS !U command, you will have hints in the generated disassembly pointing you to various .NET methods or CLR helpers your code is using. Isolating the offending instruction and correlating it to a specific line of code will usually be quite simple. Let's tackle an example -- we have the following exception call stack:

0:005> !PrintException
Exception object: 02c0fff0
Exception type:   System.NullReferenceException
Message:          Object reference not set to an instance of an object.
InnerException:   <none>
StackTrace (generated):
    SP       IP       Function
    0530F370 00380A8A fileexplorer!FileExplorer.MainForm+<>c__DisplayClass1.
                      <treeView1_AfterSelect>b__0(System.Object)+0x4a
    0530F3AC 67A3C958 mscorlib_ni!System.Threading.QueueUserWorkItemCallback.WaitCallback_Context
                      (System.Object)+0x3c
    0530F3B4 67A20846 mscorlib_ni!System.Threading.ExecutionContext.Run
                      (System.Threading.ExecutionContext, System.Threading.ContextCallback, 
                      System.Object, Boolean)+0xe6
    0530F3D8 67A3D872 mscorlib_ni!System.Threading.QueueUserWorkItemCallback.
                      System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()+0x5a
    0530F3EC 67A3D0A7 mscorlib_ni!System.Threading.ThreadPoolWorkQueue.Dispatch()+0x13f
StackTraceString: <none>
HResult: 80004003    

The exception occurred in a strangely-named function, <>c__DisplayClass1.<treeView1_AfterSelect>b__0. If you've had some experience with ILDASM, you might know that this is the kind of name the C# compiler gives anonymous methods (or lambdas): <>c__DisplayClass1 is the compiler-generated class that holds the lambda's captured variables, and <treeView1_AfterSelect>b__0 is the lambda body, where treeView1_AfterSelect is the method that contains the lambda. But where inside the lambda did we crash? Source information is not available (perhaps we don't even have symbols for that frame), but we can inspect the disassembly at the faulting address:

0:005> !u 00380A8A 
Normal JIT generated code
FileExplorer.MainForm+<>c__DisplayClass1.<treeView1_AfterSelect>b__0(System.Object)
Begin 00380a40, size e1
...snipped...
00380a6c 33d2            xor     edx,edx
00380a6e 8955f0          mov     dword ptr [ebp-10h],edx
00380a71 33d2            xor     edx,edx
00380a73 8955dc          mov     dword ptr [ebp-24h],edx
00380a76 33d2            xor     edx,edx
00380a78 8955e0          mov     dword ptr [ebp-20h],edx
00380a7b c745ec00000000  mov     dword ptr [ebp-14h],0
00380a82 90              nop
00380a83 90              nop
00380a84 8b45e4          mov     eax,dword ptr [ebp-1Ch]
00380a87 8b4804          mov     ecx,dword ptr [eax+4]
>>> 00380a8a 3909        cmp     dword ptr [ecx],ecx
00380a8c e82fa8637a      call    System_Windows_Forms_ni+0x15b2c0 (7a9bb2c0) 
                         (System.Windows.Forms.TreeNode.get_Name(), mdToken: 06004a49)
00380a91 8945d8          mov     dword ptr [ebp-28h],eax
00380a94 8b4dd8          mov     ecx,dword ptr [ebp-28h]
00380a97 e8acee6767      call    mscorlib_ni+0x28f948 (679ff948) 
                         (System.IO.Directory.GetFiles(System.String), mdToken: 06004245)
00380a9c 8945d4          mov     dword ptr [ebp-2Ch],eax
00380a9f 8b45d4          mov     eax,dword ptr [ebp-2Ch]
00380aa2 8945dc          mov     dword ptr [ebp-24h],eax
00380aa5 33d2            xor     edx,edx
00380aa7 8955f0          mov     dword ptr [ebp-10h],edx
00380aaa 90              nop
...snipped...    

Looking at the disassembled code, we can now determine exactly what caused the null reference and where we are in the function's code. Specifically, we crashed right before calling the TreeNode.get_Name() method, which is the getter for the TreeNode.Name property. The only thing that could have gone wrong immediately before the call is that the TreeNode object was null. ECX was just loaded from [eax+4], most likely a field of the closure object, and the cmp dword ptr [ecx],ecx instruction is there for the sole purpose of touching the memory the receiver points to, so that a null receiver triggers an access violation, which the CLR reports as a NullReferenceException. Furthermore, the result of the TreeNode.get_Name() call is returned in EAX, stored at [ebp-28h], then loaded into ECX and passed to the Directory.GetFiles method. This should be enough to identify the offending line of code in the source file:

C#
ThreadPool.QueueUserWorkItem(_ =>
{
    foreach (string file in Directory.GetFiles(node.Name))
    {
        listBox1.Items.Add(Path.GetFileName(file));
    }
});    
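To connect the mangled frame name with this source, here is a hand-written approximation of what the C# compiler produces for such a lambda. It is a sketch, not the compiler's actual output: the real identifiers (<>c__DisplayClass1, <treeView1_AfterSelect>b__0) are not legal C#, so `DisplayClass1` and `B__0` stand in for them, `FakeTreeNode` is a hypothetical substitute for TreeNode, and the real closure would also capture the form in order to reach listBox1.

```csharp
using System;

// Hypothetical stand-in for System.Windows.Forms.TreeNode.
public sealed class FakeTreeNode
{
    public string Name { get; set; } = "";

    // Touches no instance state, yet calling it through a null reference
    // still throws: C# emits callvirt for instance calls, and the JIT must
    // null-check the receiver (on x86, with an instruction such as
    // cmp dword ptr [ecx],ecx).
    public string Kind() => "tree node";
}

// Approximation of the compiler-generated closure (<>c__DisplayClass1).
public sealed class DisplayClass1
{
    // The captured local "node" becomes a field of the closure object.
    // On x86, an object's fields start at offset 4, after the method table
    // pointer, so a load such as mov ecx,[eax+4] reads one of its fields.
    public FakeTreeNode node;

    // The lambda body (<treeView1_AfterSelect>b__0) becomes an instance
    // method. The real one passes node.Name to Directory.GetFiles; this
    // sketch just returns it.
    public string B__0(object _)
    {
        return node.Name; // throws NullReferenceException when node is null
    }
}
```

Calling `B__0` with `node` set to null throws NullReferenceException, just like the crash above. So does calling `Kind()` through a null reference, even though `Kind` never touches instance state, because the receiver is checked before the call is made.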

Determine Function Arguments

Another common situation is that you have a crash dump or a live debuggee but are unable to retrieve function arguments from the stack. There are several commands that attempt to do so: !CLRStack -p is the managed option, kb attempts to do the job for unmanaged frames, and the excellent SOSEX extension offers the !mk command. Nonetheless, because of the variety of x86 calling conventions, and especially because the JIT uses a custom fastcall-like calling convention that passes the first two arguments in the ECX and EDX registers, at times none of these commands will actually work.

For example, consider the following call stack, in which your thread is clearly waiting for a .NET monitor, in the Monitor.Enter call:

0:000> !CLRStack
OS Thread Id: 0x2a88 (0)
ESP       EIP     
0037e8a8 76f2013d [GCFrame: 0037e8a8] 
0037e978 76f2013d [HelperMethodFrame_1OBJ: 0037e978] System.Threading.Monitor.Enter(System.Object)
0037e9d0 003f0b68 FileExplorer.MainForm.listBox1_DoubleClick(System.Object, System.EventArgs)
0037ea34 5933407c System.Windows.Forms.Control.OnDoubleClick(System.EventArgs)
0037ea4c 59666146 System.Windows.Forms.ListBox.WndProc(System.Windows.Forms.Message ByRef)
0037eaf8 58e086a0 System.Windows.Forms.Control+ControlNativeWindow.OnMessage
(System.Windows.Forms.Message ByRef)
0037eb00 58e08621 System.Windows.Forms.Control+ControlNativeWindow.WndProc
(System.Windows.Forms.Message ByRef)
0037eb14 58e084fa System.Windows.Forms.NativeWindow.Callback(IntPtr, Int32, IntPtr, IntPtr)
0037ecb8 007c09e4 [NDirectMethodFrameStandalone: 0037ecb8] 
System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG ByRef)
0037ecc8 58e18cee System.Windows.Forms.Application+ComponentManager.
System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(Int32, Int32, Int32)
0037ed64 58e18957 System.Windows.Forms.Application+ThreadContext.
RunMessageLoopInner(Int32, System.Windows.Forms.ApplicationContext)
0037edb8 58e187a1 System.Windows.Forms.Application+ThreadContext.
RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
0037ede8 58dd5911 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
0037edfc 003f00ae FileExplorer.Program.Main()
0037f020 727b1b4c [GCFrame: 0037f020]    

Well, one obvious thing to find out is which synchronization object your thread is trying to lock, i.e., what argument was passed to the Monitor.Enter method. Trying !CLRStack -a does not help:

0:000> !clrstack -a
OS Thread Id: 0x2a88 (0)
ESP       EIP     
0037e8a8 76f2013d [GCFrame: 0037e8a8] 
0037e978 76f2013d [HelperMethodFrame_1OBJ: 0037e978] System.Threading.Monitor.Enter(System.Object)
0037e9d0 003f0b68 FileExplorer.MainForm.listBox1_DoubleClick(System.Object, System.EventArgs)
    PARAMETERS:
        this = 0x02708308
        sender = 0x0271c4d4
        e = 0x02c8f400
    LOCALS:
        0x0037e9f4 = 0x02c8f470
        0x0037e9f0 = 0x02c8f4e8
        0x0037ea00 = 0x00000001
        0x0037e9ec = 0x02708990
        0x0037e9e8 = 0x027089b4
...snipped...    

As you see, SOS was not able to report the argument to Monitor.Enter. Perhaps the unmanaged call stack will help?

0:000> kb
ChildEBP RetAddr  Args to Child              
0037e4ac 76600bdd 00000002 0037e4fc 00000001 ntdll!ZwWaitForMultipleObjects+0x15
0037e548 75541a2c 0037e4fc 0037e570 00000000 KERNELBASE!WaitForMultipleObjectsEx+0x100
0037e590 7545086a 00000002 7efde000 00000000 KERNEL32!WaitForMultipleObjectsExImplementation+0xe0
0037e5e4 764b2bf1 00000054 004d61e8 ffffffff USER32!RealMsgWaitForMultipleObjectsEx+0x14d
0037e610 764a202d 004d61e8 ffffffff 0037e638 ole32!CCliModalLoop::BlockFn+0xa1
0037e690 7285d245 00000002 ffffffff 00000001 ole32!CoWaitForMultipleHandles+0xcd
0037e6b0 7285d1a6 00000000 ffffffff 00000001 mscorwks!NT5WaitRoutine+0x51
0037e71c 7285d10a 00000001 004d61e8 00000000 mscorwks!MsgWaitHelper+0xa5
0037e73c 729142c8 00000001 004d61e8 00000000 mscorwks!Thread::DoAppropriateAptStateWait+0x28
0037e7c0 7291435d 00000001 004d61e8 00000000 mscorwks!Thread::DoAppropriateWaitWorker+0x13c
0037e810 729144e1 00000001 004d61e8 00000000 mscorwks!Thread::DoAppropriateWait+0x40
0037e86c 727b5422 ffffffff 00000001 00000000 mscorwks!CLREvent::WaitEx+0xf7
0037e880 728e98e2 ffffffff 00000001 00000000 mscorwks!CLREvent::Wait+0x17
0037e90c 729136e0 00497728 ffffffff 00497728 mscorwks!AwareLock::EnterEpilog+0x8c
0037e928 72913664 e6620e7e 0037ea18 02708308 mscorwks!AwareLock::Enter+0x61
0037e9c8 003f0b68 02c8f4e8 02c8f4c8 02c8f470 mscorwks!JIT_MonEnterWorker_Portable+0xb3
WARNING: Frame IP not in any known module. Following frames may be wrong.
0037ea28 5933407c 02c8f400 0271c6bc 0271c4d4 0x3f0b68
0037ea44 59666146 00000000 00000000 0037ea84 System_Windows_Forms_ni+0x72407c
...snipped...    

Notice that the JIT_MonEnterWorker_Portable frame corresponds to the Monitor.Enter method call. How do I know this? By inspecting the return address: the unmanaged frame's return address is 003f0b68, which is also the EIP value for the listBox1_DoubleClick method in the managed stack trace.

Now we might expect the unmanaged stack trace to display the first three arguments passed to Monitor.Enter. Unfortunately, kb simply prints the three DWORDs that follow the return address on the stack, so it reports correct argument information only when the arguments are passed on the stack, as in the standard C and Win32 calling conventions; it does not account for the register-based custom calling convention used by the CLR JIT. In fact, in this case, if we were to continue down that path, we might have diagnosed the problem incorrectly!
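To see why kb is misleading here, it helps to model where the JIT puts each argument. The sketch below assumes a fastcall-like rule: the first two arguments that fit in a 32-bit integer register (including `this`) go in ECX and EDX, and everything else is passed on the stack. It deliberately ignores ABI details such as stack push order and small-struct handling, and the `X86ManagedArgs` class and `ArgKind` enum are hypothetical names.

```csharp
using System.Collections.Generic;

// Simplified model of argument placement under the x86 JIT's fastcall-like
// convention (an assumption for illustration, not a complete ABI):
// the first two arguments that fit in a 32-bit integer register, "this"
// included, go in ECX and EDX; all other arguments go on the stack.
public static class X86ManagedArgs
{
    public enum ArgKind
    {
        Int32OrReference, // object references, int, bool, char, ...
        Other             // assumed stack-passed: long, double, structs, ...
    }

    // Returns a location per argument: "ECX", "EDX", or "stack[n]", where n
    // counts stack-passed arguments in declaration order (push order ignored).
    public static List<string> Locate(params ArgKind[] args)
    {
        string[] registers = { "ECX", "EDX" };
        var locations = new List<string>();
        int nextRegister = 0, stackCount = 0;
        foreach (ArgKind kind in args)
        {
            if (kind == ArgKind.Int32OrReference && nextRegister < registers.Length)
                locations.Add(registers[nextRegister++]);
            else
                locations.Add($"stack[{stackCount++}]");
        }
        return locations;
    }
}
```

For Monitor.Enter(object), `Locate` returns only ECX, so none of the three "Args to Child" columns that kb prints for JIT_MonEnterWorker_Portable can hold the lock object. For an instance method such as listBox1_DoubleClick(sender, e), `this` goes in ECX, `sender` in EDX, and only `e` is on the stack.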

Where do we find the argument, then? There's little left to do but inspect the disassembly of the calling method and try to determine how the argument is passed to Monitor.Enter:

0:000> !u 0x3f0b68
Normal JIT generated code
FileExplorer.MainForm.listBox1_DoubleClick(System.Object, System.EventArgs)
Begin 003f0a10, size 1ca
...snipped...
003f0b3b 8b55cc          mov     edx,dword ptr [ebp-34h]
003f0b3e 8b4dc8          mov     ecx,dword ptr [ebp-38h]
003f0b41 3909            cmp     dword ptr [ecx],ecx
003f0b43 e8f8cff471      call    mscorlib_ni+0x68db40 (7233db40) 
(System.Threading.Thread.Start(System.Object), mdToken: 060012b3)
003f0b48 90              nop
003f0b49 b9c8000000      mov     ecx,0C8h
003f0b4e e82d87a971      call    mscorlib_ni+0x1d9280 (71e89280) 
(System.Threading.Thread.Sleep(Int32), mdToken: 060012d6)
003f0b53 90              nop
003f0b54 8b45d0          mov     eax,dword ptr [ebp-30h]
003f0b57 8b8050010000    mov     eax,dword ptr [eax+150h]
003f0b5d 8945c0          mov     dword ptr [ebp-40h],eax
003f0b60 8b4dc0          mov     ecx,dword ptr [ebp-40h]
003f0b63 e83d203c72      call    mscorwks!JIT_MonEnterWorker (727b2ba5)
>>> 003f0b68 90          nop
003f0b69 90              nop
003f0b6a 8b4dc8          mov     ecx,dword ptr [ebp-38h]
003f0b6d 3909            cmp     dword ptr [ecx],ecx
003f0b6f e8dccdf471      call    mscorlib_ni+0x68d950 (7233d950) 
(System.Threading.Thread.Join(), mdToken: 060012d1)
...snipped...    

Somewhere in the five instructions from 003f0b54 to 003f0b63 lies the argument passing process, and it does not go through the stack. Note that only two registers are used, EAX and ECX, and both end up holding the same value (the one stored at EBP-40h). Excellent: all that's left is to obtain the value of either of these registers, and we're done!

...Not so fast, though. x86 registers are scarce and very likely to be reused across function calls. It stands to reason that both registers have since been overwritten with other values, making it impossible to find what they contained previously. Indeed, their current values don't make sense:

0:000> r eax
eax=00000054
0:000> r ecx
ecx=00000000    

Fortunately, we have EBP to the rescue! Recall that to reconstruct the stack earlier, we had access to the entire EBP chain that connects all the frames on the stack. This means we always have the EBP value for any frame, and the k command conveniently reports it for us:

0:000> k
ChildEBP RetAddr  
0037e4ac 76600bdd ntdll!ZwWaitForMultipleObjects+0x15
0037e548 75541a2c KERNELBASE!WaitForMultipleObjectsEx+0x100
0037e590 7545086a KERNEL32!WaitForMultipleObjectsExImplementation+0xe0
0037e5e4 764b2bf1 USER32!RealMsgWaitForMultipleObjectsEx+0x14d
0037e610 764a202d ole32!CCliModalLoop::BlockFn+0xa1
0037e690 7285d245 ole32!CoWaitForMultipleHandles+0xcd
0037e6b0 7285d1a6 mscorwks!NT5WaitRoutine+0x51
0037e71c 7285d10a mscorwks!MsgWaitHelper+0xa5
0037e73c 729142c8 mscorwks!Thread::DoAppropriateAptStateWait+0x28
0037e7c0 7291435d mscorwks!Thread::DoAppropriateWaitWorker+0x13c
0037e810 729144e1 mscorwks!Thread::DoAppropriateWait+0x40
0037e86c 727b5422 mscorwks!CLREvent::WaitEx+0xf7
0037e880 728e98e2 mscorwks!CLREvent::Wait+0x17
0037e90c 729136e0 mscorwks!AwareLock::EnterEpilog+0x8c
0037e928 72913664 mscorwks!AwareLock::Enter+0x61
0037e9c8 003f0b68 mscorwks!JIT_MonEnterWorker_Portable+0xb3
WARNING: Frame IP not in any known module. Following frames may be wrong.
0037ea28 5933407c 0x3f0b68
0037ea44 59666146 System_Windows_Forms_ni+0x72407c
0037eaf0 58e086a0 System_Windows_Forms_ni+0xa56146
0037eaf8 58e08621 System_Windows_Forms_ni+0x1f86a0    

Life is easy now. The ChildEBP of the frame at 0x3f0b68 (listBox1_DoubleClick) is 0037ea28, so all we need to do is subtract 0x40 from this value and find the argument passed to Monitor.Enter at that address:

0:000> dd 0037ea28-40 L1
0037e9e8  027089b4
0:000> !do -nofields 027089b4
Name: System.String
MethodTable: 71f20b70
EEClass: 71cdd66c
Size: 44(0x2c) bytes
 (C:\Windows\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: SecondaryLock    

Now we have something that definitely looks like an object, and we can verify that this object is indeed used for synchronization by inspecting the process's sync blocks with the !SyncBlk command:

0:000> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info  SyncBlock Owner
   16 004d61a4            3         1 00497728  2a88   0   02708990 System.String
   17 004d61d4            3         1 004eb4d8  2504   5   027089b4 System.String
...snipped...    

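And there it is: not only do we find the object our thread is waiting for (027089b4, sync block 17), but we also find its owning thread (OS thread 2504, managed thread 5), which allows further reconstruction of the application's wait chain. Note that our own thread (2a88) holds another lock, 02708990, at the same time.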
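The wait chain can be followed mechanically once you have two maps: lock object to owning thread (from !syncblk) and thread to the lock it is waiting for (from stack analysis like the one above). The sketch below walks that chain and reports a deadlock when it revisits a thread. The `WaitChain` class is hypothetical, and the dumps in this article do not show what thread 2504 is waiting for, so the deadlock case in the usage note is an assumed scenario.

```csharp
using System.Collections.Generic;

// Sketch: follow thread -> awaited lock -> owning thread -> ... using data
// gathered from !syncblk and from the threads' stacks. IDs are OS thread IDs.
public static class WaitChain
{
    public static (List<string> Chain, bool IsDeadlock) Follow(
        uint startThread,
        IReadOnlyDictionary<ulong, uint> lockOwners,
        IReadOnlyDictionary<uint, ulong> waitingFor)
    {
        var chain = new List<string>();
        var visited = new HashSet<uint>();
        uint thread = startThread;
        while (true)
        {
            chain.Add($"thread {thread:x}");
            if (!visited.Add(thread))
                return (chain, true);   // came back to a thread: a cycle
            if (!waitingFor.TryGetValue(thread, out ulong lockObject))
                return (chain, false);  // thread is not known to be blocked
            chain.Add($"lock {lockObject:x8}");
            if (!lockOwners.TryGetValue(lockObject, out thread))
                return (chain, false);  // lock has no known owner
        }
    }
}
```

With the data above (2a88 owns 02708990 and waits for 027089b4, which 2504 owns), following from 2a88 stops at thread 2504. If 2504 were in turn waiting for 02708990, a hypothetical case, the chain would return to 2a88 and `IsDeadlock` would be true.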

Notice that the approach above worked because the method calling Monitor.Enter passed it a local variable, which was available on the stack. A similar approach would work for a method argument, but in more complex cases the argument might not be readily available in the calling method's stack frame. Still, armed with the knowledge that Monitor.Enter receives its argument in the ECX register, we can inspect the disassembly of its implementation, JIT_MonEnterWorker_Portable:

0:000> u mscorwks!JIT_MonEnterWorker_Portable
mscorwks!JIT_MonEnterWorker_Portable:
729135bc 6a7c            push    7Ch
729135be b8cc13ca72      mov     eax,offset mscorwks! ?? ::FNODOBFM::`string'+0x1ccc4 (72ca13cc)
729135c3 e8c1eae9ff      call    mscorwks!_EH_prolog3_catch (727b2089)
729135c8 894dec          mov     dword ptr [ebp-14h],ecx
729135cb 33db            xor     ebx,ebx
729135cd 8d8d78ffffff    lea     ecx,[ebp-88h]
...snipped...    

Very early on, the function stores its parameter on the stack (the mov dword ptr [ebp-14h],ecx instruction; this is often called "parameter spilling"), so we can expect to retrieve it from there. Indeed, the EBP value for the JIT_MonEnterWorker_Portable frame was 0037e9c8, and the argument is stored at offset -0x14 from that location:

0:000> dd 0037e9c8-14 L1
0037e9b4  027089b4    
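Both lookups in this section boil down to the same arithmetic: take the frame's EBP from the ChildEBP column and apply the displacement from the disassembly, so 0037ea28 - 0x40 = 0037e9e8 and 0037e9c8 - 0x14 = 0037e9b4. A small C# helper can do this from the operand text. The `FrameSlot` name and its regular expression are my own sketch; it only handles [ebp+disp] and [ebp-disp] operands written with WinDbg's h-suffixed hex.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Sketch: turn an EBP-relative operand from !u output, such as
// "dword ptr [ebp-40h]", into an absolute address given the frame's EBP
// (the ChildEBP column of k).
public static class FrameSlot
{
    static readonly Regex EbpOperand =
        new Regex(@"\[ebp(?<sign>[+-])(?<disp>[0-9a-f]+)h?\]", RegexOptions.IgnoreCase);

    public static uint Resolve(string operand, uint frameEbp)
    {
        Match m = EbpOperand.Match(operand);
        if (!m.Success)
            throw new ArgumentException($"Not an EBP-relative operand: {operand}");

        // The displacement is hexadecimal; the trailing "h" is not captured.
        uint disp = uint.Parse(m.Groups["disp"].Value, NumberStyles.HexNumber);
        return m.Groups["sign"].Value == "-" ? frameEbp - disp : frameEbp + disp;
    }
}
```

The resulting address is what you would pass to dd ... L1 in WinDbg to read the spilled or local value.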

Find the Static Root that References Your Object

A typical memory leak analysis session conducted using SOS involves identifying a bunch of objects that are being leaked (not freed) and then identifying the chain of references from some GC root that points to them. This is a fairly tedious process (profilers are much better at this), and it gets even worse when the actual root information is not available. One such case is when the root is a static variable.

For a managed object retained by a static GC root, the root reference chain typically shows a pinned object array as the rooted object. Below is such a reference chain. (Note that I am using an x64 example here: it makes the memory search stage more interesting, and also adds some variety to the examples.)

0:010> !gcroot 0000000002bcaf58 
...snipped...
DOMAIN(0000000000C1C5F0):HANDLE(Pinned):5017f8:Root:0000000012761018(System.Object[])-> 
00000000039b3c30(System.EventHandler)-> 
0000000002bcab38(System.Object[])-> 
0000000002bcf8d8(System.EventHandler)-> 
0000000002bcaf58(FileExplorer.MainForm+FileInformation)    

This object array is ubiquitous; it would seem that all static root references stem from it. Indeed (and this is a CLR implementation detail), static fields are stored in this array, and the GC considers them reachable through it. This also makes it difficult to determine which static field of which class is responsible for the static reference. For example, in the reference chain above, it is apparent that there is a static EventHandler-typed field (which is likely an event) that retains the FileInformation instance, but we would very much like to find the details of that static field.

More than six years ago, Doug Stewart wrote a short blog post outlining the general process in cases like these. This process generally works, but requires some adaptation in the 64-bit era, so here goes. First, let's take a look at that rooted array:

0:010> !do 0000000012761018 
Name: System.Object[] 
MethodTable: 000007fef68858f8 
EEClass: 000007fef649eb78 
Size: 8192(0x2000) bytes 
Array: Rank 1, Number of elements 1020, Type CLASS 
Element Type: System.Object 
Fields: 
None    

OK, so it's an array with 1020 elements, and one of these elements must be our event handler. Is that really the case? Let's search its memory to make sure:

0:010> s -q 0000000012761018 L2000 00000000039b3c30 
00000000`12762e10  00000000`039b3c30 00000000`0278b380    

Sure enough, our event handler is one of the array elements, at the address 00000000`12762e10. Now there are two key observations:

  1. The EventHandler instance ended up in the array somehow. Maybe if we can find other references to this array address, we can find who put it there and then determine whose static field it is.
  2. There is a reference from that EventHandler instance to one of our application's objects (eventually). Whatever code stores into or reads from that static field has to know the address of its slot in the array, so there should be additional references to this address, most likely embedded in JIT-compiled code.

Frankly, both of these are long shots, because the address might be calculated dynamically, but let's give it a spin. Doug's original guidance at this point is to launch a memory search for any references to the array location, which completes in a few seconds for a 32-bit address space; not so much for a 64-bit address space!

However, we are looking for references in managed code only, so there is no need to traverse the entire address space. It suffices to look at the address ranges of the modules in the current AppDomain:

0:010> !dumpdomain 
...snipped... 
-------------------------------------- 
Domain 1: 0000000000c1c5f0 
LowFrequencyHeap: 0000000000c1c638 
HighFrequencyHeap: 0000000000c1c6c8 
StubHeap: 0000000000c1c758 
Stage: OPEN 
SecurityDescriptor: 0000000000c1de90 
Name: FileExplorer.exe 
Assembly: 0000000000c3cd80 [C:\Windows\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll]
ClassLoader: 0000000000c3ce40 
SecurityDescriptor: 0000000000c3cc40 
  Module Name 
000007fef6461000 C:\Windows\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll
000007ff000f2568 C:\Windows\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\sortkey.nlp
000007ff000f2020 C:\Windows\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\sorttbls.nlp
Assembly: 0000000000c57480 [D:\courses\NET Debugging\Exercises\4_MemoryLeak\Binaries\FileExplorer.exe] 
ClassLoader: 0000000000c57540 
SecurityDescriptor: 0000000000c57390 
  Module Name 
000007ff000433d0 D:\courses\NET Debugging\Exercises\4_MemoryLeak\Binaries\FileExplorer.exe 
...many more of these guys...    

Now we have a couple of module addresses and can constrain our memory search. It seems safe to start at 7ff`00000000 and search the next gigabyte (0x40000000 bytes) for our address. Generally speaking, the proper WinDbg command here would be:

0:010> s -q 000007ff`00000000 L?00000000`40000000 00000000`12762e10    

(...recall that we are looking for a full QWORD.) The problem is that this might miss unaligned references to that address, which may occur if it is hard-coded into some instruction (e.g., as the immediate operand of a MOV). So instead, we should look for the individual byte sequence, remembering that we are on a little-endian architecture:

0:010> s -b 000007ff`00000000 L?00000000`40000000 10 2e 76 12
000007ff`001913d3  10 2e 76 12 00 00 00 00-48 8b 00 48 89 44 24 60  ..v.....H..H.D$` 
000007ff`00191440  10 2e 76 12 00 00 00 00-48 8b d0 e8 60 c1 87 f7  ..v.....H...`...    

Voila! There are two references to the array slot. Let's take a look at them with the !U command to see whether they are code:

0:010> !u 000007ff`001913d3 
Normal JIT generated code 
FileExplorer.MainForm+FileInformation..ctor(System.String) 
Begin 000007ff001912d0, size 18d 
...snipped...
000007ff`001913d0 90              nop 
000007ff`001913d1 48b8102e761200000000 mov rax,12762E10h 
...snipped...
000007ff`0019143e 48b9102e761200000000 mov rcx,12762E10h 
000007ff`00191448 488bd0          mov     rdx,rax 
...snipped...    

Both references fall inside FileInformation's constructor, which gives us an excellent clue where to look. Indeed, here's the source code showing the event registration:

C#
public FileInformation(string fullPath)
{
    Path = fullPath;
    Name = System.IO.Path.GetFileName(Path);
    FirstFewLines = File.ReadAllLines(Path).Take(100).ToArray();
    FileInformationNeedsRefresh += FileInformation_FileInformationNeedsRefresh;
}    
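Why does this subscription keep FileInformation instances alive? A static event is backed by a static delegate field (stored, as discussed above, in the pinned object array), and every += adds a delegate whose Target is the subscribing instance. The sketch below is hypothetical code that mirrors the pattern, not the original FileExplorer source, and it adds an unsubscribe-on-Dispose step to show what releases the reference.

```csharp
using System;

// Hypothetical class mirroring the FileInformation pattern: each instance
// subscribes to a static event, so the static delegate references it.
public sealed class FileInformationSketch : IDisposable
{
    public static event EventHandler FileInformationNeedsRefresh;

    public string Path { get; }

    public FileInformationSketch(string fullPath)
    {
        Path = fullPath;
        // The delegate created here has Target == this, and it is stored in
        // the static event's backing field, which is reachable from a GC root.
        FileInformationNeedsRefresh += OnNeedsRefresh;
    }

    private void OnNeedsRefresh(object sender, EventArgs e)
    {
        // Reload the file's contents here.
    }

    // Unsubscribing removes the delegate (and thus the reference to this
    // instance) from the static event's invocation list.
    public void Dispose() => FileInformationNeedsRefresh -= OnNeedsRefresh;

    public static int SubscriberCount =>
        FileInformationNeedsRefresh?.GetInvocationList().Length ?? 0;

    // True if some delegate in the static event targets the given instance.
    public static bool IsReferencedByStaticEvent(object instance)
    {
        Delegate handlers = FileInformationNeedsRefresh;
        if (handlers == null) return false;
        foreach (Delegate d in handlers.GetInvocationList())
        {
            if (ReferenceEquals(d.Target, instance)) return true;
        }
        return false;
    }
}
```

Each new instance adds one entry to the static invocation list, and only an explicit -= removes it; until then, the chain from the static root to the instance is exactly the one !gcroot reported.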

Conclusion

Hopefully, you are now more convinced that basic assembly reading skills, understanding of calling conventions, and familiarity with the stack structure can provide actual benefits when debugging your .NET applications or analyzing crash dumps.

Assembly reading skills do not come automatically; you must practice them frequently. The best approach would be to compile a set of examples similar to the above and go through them periodically. If the agile guys are advocating code katas to practice TDD, why can't we have disassembly katas to practice our assembly reading skills?

Further Reading

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)