Introduction
During the last few years, I've been asked many times why I bother exercising my x86 and x64 assembly language skills, and especially why I find assembly language important to teach at courses, conferences, and one-off sessions. After all, .NET developers are light years away from the actual assembly code generated for their applications, and surely the need to write assembly code by hand can never arise.
I agree entirely with the sentiment that you won't often have to write assembly code by hand, unless you are working on a very low-level optimization; also, there would be no way to invoke your assembly code directly from a .NET program. However, I believe that all .NET developers should be able to read assembly code, mostly for debugging purposes but also for profiling and performance optimization.
In this article, I will show you some examples of where understanding assembly code and general stack structure -- details usually shielded from .NET developers -- will help debug otherwise impossible problems, without any "advanced" tools and even without Visual Studio. However, I will have to make some assumptions. This article presupposes basic familiarity with x86 assembly language, stack structure, calling conventions, and also some understanding of WinDbg and SOS commands. There are some excellent resources on the web that you can use to catch up on these topics:
Analyze a Corrupted or Incomplete Call Stack
It doesn't happen often, but even managed applications experience a stack corruption from time to time. Here are some possible causes for a stack corruption:
- Stack overflow -- because of an infinite recursion or a large repeated stack allocation
- P/Invoke stack imbalance -- mismatch of the managed and unmanaged function signatures
- Random memory corruption -- usually caused by an unmanaged component in the process
When a stack corruption occurs, it's often very difficult to determine the culprit because... the stack is corrupted! Any trace of what the application was doing at the time of the corruption might have been overwritten with garbage on the stack. In fact, even debugger commands -- such as !CLRStack -- might not work properly. What can you do when a stack corruption occurs? Naturally, the only thing remaining is to walk the stack manually.
First, let's assume that the stack pointer (ESP) has not been corrupted (whereas the EBP register may have been corrupted). In that case, we know where the stack begins, and can start scanning backwards for execution residue. Namely, most frames on the stack preserve the EBP register, making it possible to retrace execution by finding a pair {EBP, return address} and following the linked list of frames starting from EBP. Below is an analysis that follows these steps to reconstruct the stack:
0:000> !CLRStack
OS Thread Id: 0x3318 (0)
Child SP IP Call Site
00233000 00450818 FileExplorer.MainForm.RecursivelyFillTreeview
(System.Windows.Forms.TreeNode, System.String)
There's just one frame on the stack, and even though it looks valid, clearly the stack did not begin at that method and we are missing more frames. It's time to try reconstructing the stack manually from ESP using the dds command, which dumps memory and tries to resolve symbols. Unfortunately, because the code is managed, we will not have any valid symbols on the stack without help from an SOS command, such as !U.
0:000> dds esp
00233000 00000000
00233004 00000000
00233008 00000000
0023300c 00000000
00233010 00000000
00233014 00000000
00233018 00000000
0023301c 00000000
00233020 0220e1ec
00233024 021e364c
00233028 00000000
0023302c 00000000
00233030 021e364c
00233034 00233058
00233038 0023307c
0023303c 00450826
00233040 021e513c
00233044 00000000
00233048 00000000
0023304c 00000000
00233050 00000000
00233054 00000000
00233058 00000000
0023305c 00000000
00233060 00000000
00233064 0220e1ec
00233068 021e364c
0023306c 00000000
00233070 00000000
00233074 021e364c
00233078 0023309c
0023307c 002330c0
The words at 00233038 and 0023303c look like an {EBP, return address} pair. Why am I saying this? Because the first value (0023307c) is sufficiently close to the value of ESP, which makes me confident that it points to the stack -- as EBP should -- and the second value (00450826) is sufficiently far away from the stack -- indeed, it should be an executable code address. To verify that it's a code address, let's use the !U command:
0:000> !u 00450826
Normal JIT generated code
FileExplorer.MainForm.RecursivelyFillTreeview(System.Windows.Forms.TreeNode, System.String)
Begin 004507d0, size f6
...snipped...
Indeed, this looks like a valid method, and we can continue guessing. If our guess for EBP was right, it should point to another saved EBP, which should be followed by another return address, enabling us to retrace the stack in full:
0:000> dds 0023307c L2
0023307c 002330c0
00233080 00450826
Sure enough, the first value again looks like a valid saved EBP, and the second value is the exact same address as earlier, making it seem like a recursive function gone wild. We can repeat this procedure until we reach the top of the stack to obtain the entire call stack, which in this case would span hundreds of screens.
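The manual procedure above can be approximated offline with a short script. The following is only a sketch in Python, not a debugger feature: the memory words are copied from the dds output, the code range is taken from the !U output (Begin 004507d0, size f6), and the stack range is an assumed window around ESP. The helper names find_first_frame and walk_frames are made up for this illustration.

```python
# Words copied from the dds output above (address -> 32-bit value).
memory = {
    0x00233030: 0x021E364C,
    0x00233034: 0x00233058,
    0x00233038: 0x0023307C,
    0x0023303C: 0x00450826,
    0x00233040: 0x021E513C,
    0x00233078: 0x0023309C,
    0x0023307C: 0x002330C0,
    0x00233080: 0x00450826,
}

# The code range is the JIT-compiled method reported by !U.
# Assumption: the stack range is a guessed window starting at ESP.
CODE = range(0x004507D0, 0x004507D0 + 0xF6)
STACK = range(0x00233000, 0x00240000)


def find_first_frame(memory, esp):
    """Scan upward from ESP for the first {saved EBP, return address} pair."""
    for addr in sorted(a for a in memory if a >= esp):
        saved_ebp = memory[addr]
        ret = memory.get(addr + 4)
        # A saved EBP points further up the stack; a return address points into code.
        if saved_ebp in STACK and saved_ebp > addr and ret in CODE:
            return addr
    return None


def walk_frames(memory, frame):
    """Follow the saved-EBP linked list, yielding (frame address, return address)."""
    while frame in memory and memory.get(frame + 4) in CODE:
        yield frame, memory[frame + 4]
        next_frame = memory[frame]
        if next_frame <= frame:  # the chain must move toward higher addresses
            break
        frame = next_frame


first = find_first_frame(memory, 0x00233000)
frames = list(walk_frames(memory, first))
for frame, ret in frames:
    print(f"frame {frame:08x} returns to {ret:08x}")
```

With only the words shown in the dump, the walk stops after two frames because the next saved EBP (002330c0) lies outside the captured memory. Note that the word at 00233034 is rejected even though it holds a stack address: the word that follows it is another stack address rather than a code address.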
Another variation of stack corruptions worth mentioning is the situation where the ESP register is corrupt as well, and we can't trust it to point to the actual stack. This is less frequent in simple stack overflow scenarios, but might happen due to a buffer overflow, a random memory corruption, or a wild stack imbalance. In that case, we have to obtain the top of the stack by other means. Fortunately, every Windows thread has a data structure called Thread Environment Block (TEB) which contains the range of its stack, and the !teb debugger command can dump the current thread's TEB conveniently. Armed with this information, we can start walking the stack looking for {EBP, return address} pairs.
0:000> dt ntdll!_NT_TIB
+0x000 ExceptionList : Ptr32 _EXCEPTION_REGISTRATION_RECORD
+0x004 StackBase : Ptr32 Void
+0x008 StackLimit : Ptr32 Void
+0x00c SubSystemTib : Ptr32 Void
+0x010 FiberData : Ptr32 Void
+0x010 Version : Uint4B
+0x014 ArbitraryUserPointer : Ptr32 Void
+0x018 Self : Ptr32 _NT_TIB
0:000> !teb
...snipped...
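As a rough illustration of how the stack range in the TIB can be used to filter candidate pointers, here is a sketch in Python. The field offsets come from the dt output above; the TIB contents in example_tib are hypothetical values invented for the example, not values from the dump discussed here, and the helper names are likewise made up.

```python
import struct

# Offsets taken from the dt ntdll!_NT_TIB layout above (32-bit process).
STACK_BASE_OFFSET = 0x004
STACK_LIMIT_OFFSET = 0x008


def stack_range(tib_bytes):
    """Return (StackLimit, StackBase) read from the raw bytes of an _NT_TIB."""
    (base,) = struct.unpack_from("<I", tib_bytes, STACK_BASE_OFFSET)
    (limit,) = struct.unpack_from("<I", tib_bytes, STACK_LIMIT_OFFSET)
    return limit, base


def on_stack(address, tib_bytes):
    """True if the address lies between StackLimit (low) and StackBase (high)."""
    limit, base = stack_range(tib_bytes)
    return limit <= address < base


# Hypothetical TIB: ExceptionList, StackBase, StackLimit, SubSystemTib,
# FiberData/Version, ArbitraryUserPointer, Self.
example_tib = struct.pack(
    "<7I", 0x0023F000, 0x00240000, 0x00232000, 0, 0, 0, 0x7EFDD000
)

print(on_stack(0x0023307C, example_tib))  # a saved-EBP candidate
print(on_stack(0x00450826, example_tib))  # a code address
```

Because the stack grows toward lower addresses, StackBase is the higher of the two values. A saved EBP candidate should pass this check, while a return address should fail it.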
Correlate Crash Location to Source Code Line
Often, you are facing a crash dump with a relatively simple exception in it, and want to resolve the root cause to a specific line of code. Commands such as !CLRStack are renowned for not reporting source line information accurately, and if your method has hundreds of lines, finding the line of code that crashed might be akin to the famous needle in a haystack.
In cases like these, reading a little disassembly might be just the right thing to do. With help from the SOS !U command, you will have hints in the generated disassembly pointing you to various .NET methods or CLR helpers your code is using. Isolating the offending instruction and correlating it to a specific line of code will usually be quite simple. Let's tackle an example -- we have the following exception call stack:
0:005> !PrintException
Exception object: 02c0fff0
Exception type: System.NullReferenceException
Message: Object reference not set to an instance of an object.
InnerException: <none>
StackTrace (generated):
SP IP Function
0530F370 00380A8A fileexplorer!FileExplorer.MainForm+<>c__DisplayClass1.
<treeView1_AfterSelect>b__0(System.Object)+0x4a
0530F3AC 67A3C958 mscorlib_ni!System.Threading.QueueUserWorkItemCallback.WaitCallback_Context
(System.Object)+0x3c
0530F3B4 67A20846 mscorlib_ni!System.Threading.ExecutionContext.Run
(System.Threading.ExecutionContext, System.Threading.ContextCallback,
System.Object, Boolean)+0xe6
0530F3D8 67A3D872 mscorlib_ni!System.Threading.QueueUserWorkItemCallback.
System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()+0x5a
0530F3EC 67A3D0A7 mscorlib_ni!System.Threading.ThreadPoolWorkQueue.Dispatch()+0x13f
StackTraceString: <none>
HResult: 80004003
The exception occurred in a strangely-named function, <>c__DisplayClass1.<treeView1_AfterSelect>b__0. If you've had some experience with ILDASM, you might know that this is the kind of name the C# compiler gives anonymous methods (or lambdas). Specifically, treeView1_AfterSelect is the method that contains the lambda we are looking at. But where inside the lambda did we crash? Source information is not available (perhaps we don't even have symbols for that frame), but we can inspect the disassembly at the faulting address:
0:005> !u 00380A8A
Normal JIT generated code
FileExplorer.MainForm+<>c__DisplayClass1.<treeView1_AfterSelect>b__0(System.Object)
Begin 00380a40, size e1
...snipped...
00380a6c 33d2 xor edx,edx
00380a6e 8955f0 mov dword ptr [ebp-10h],edx
00380a71 33d2 xor edx,edx
00380a73 8955dc mov dword ptr [ebp-24h],edx
00380a76 33d2 xor edx,edx
00380a78 8955e0 mov dword ptr [ebp-20h],edx
00380a7b c745ec00000000 mov dword ptr [ebp-14h],0
00380a82 90 nop
00380a83 90 nop
00380a84 8b45e4 mov eax,dword ptr [ebp-1Ch]
00380a87 8b4804 mov ecx,dword ptr [eax+4]
>>> 00380a8a 3909 cmp dword ptr [ecx],ecx
00380a8c e82fa8637a call System_Windows_Forms_ni+0x15b2c0 (7a9bb2c0)
(System.Windows.Forms.TreeNode.get_Name(), mdToken: 06004a49)
00380a91 8945d8 mov dword ptr [ebp-28h],eax
00380a94 8b4dd8 mov ecx,dword ptr [ebp-28h]
00380a97 e8acee6767 call mscorlib_ni+0x28f948 (679ff948)
(System.IO.Directory.GetFiles(System.String), mdToken: 06004245)
00380a9c 8945d4 mov dword ptr [ebp-2Ch],eax
00380a9f 8b45d4 mov eax,dword ptr [ebp-2Ch]
00380aa2 8945dc mov dword ptr [ebp-24h],eax
00380aa5 33d2 xor edx,edx
00380aa7 8955f0 mov dword ptr [ebp-10h],edx
00380aaa 90 nop
...snipped...
Looking at the disassembled code, we are now able to conclude what exactly caused the null reference, and where we are in the function's code. Specifically, we crashed right before calling the TreeNode.get_Name() method, which is the getter for the TreeNode.Name property. The only thing that could have gone wrong immediately before the call is that the TreeNode object was null (indeed, the cmp instruction we see is there for the sole reason of making sure the receiver of the call is not null). Furthermore, we know that the result of the TreeNode.get_Name() method call is then transferred into the ECX register and passed to the Directory.GetFiles method. This should be enough to identify the offending line of code in the source file:
ThreadPool.QueueUserWorkItem(_ =>
{
    foreach (string file in Directory.GetFiles(node.Name))
    {
        listBox1.Items.Add(Path.GetFileName(file));
    }
});
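As a small sanity check on the analysis above, the faulting IP, the method start reported by !U, and the +0x4a offset in the exception stack trace should all agree. The arithmetic can be sketched in Python as follows; the variable names are mine, and the values are copied from the debugger output.

```python
method_start = 0x00380A40  # "Begin 00380a40" from the !U output
method_size = 0xE1         # "size e1" from the !U output
fault_ip = 0x00380A8A      # IP of the top frame in !PrintException

# The offset of the faulting instruction within the JIT-compiled method.
offset = fault_ip - method_start

# The faulting IP must fall inside the method body for the match to be meaningful.
inside_method = 0 <= offset < method_size

print(hex(offset), inside_method)
```

The computed offset matches the +0x4a suffix shown next to the lambda's frame, which confirms that the marked cmp instruction is the one that faulted.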
Determine Function Arguments
Another thing that might happen to you often is that you have a crash dump or a live debuggee, but are unable to retrieve function arguments from the stack. There are many commands that attempt to do so -- !CLRStack -p is the managed option, kb attempts to do the job for unmanaged frames, and the excellent SOSEX extension offers the !mk command. Nonetheless, because of the variety of x86 calling conventions, and especially because the JIT uses a custom fastcall-resembling calling convention, at times none of these commands will actually work.
For example, consider the following call stack, in which your thread is clearly waiting for a .NET monitor, in the Monitor.Enter call:
0:000> !CLRStack
OS Thread Id: 0x2a88 (0)
ESP EIP
0037e8a8 76f2013d [GCFrame: 0037e8a8]
0037e978 76f2013d [HelperMethodFrame_1OBJ: 0037e978] System.Threading.Monitor.Enter(System.Object)
0037e9d0 003f0b68 FileExplorer.MainForm.listBox1_DoubleClick(System.Object, System.EventArgs)
0037ea34 5933407c System.Windows.Forms.Control.OnDoubleClick(System.EventArgs)
0037ea4c 59666146 System.Windows.Forms.ListBox.WndProc(System.Windows.Forms.Message ByRef)
0037eaf8 58e086a0 System.Windows.Forms.Control+ControlNativeWindow.OnMessage
(System.Windows.Forms.Message ByRef)
0037eb00 58e08621 System.Windows.Forms.Control+ControlNativeWindow.WndProc
(System.Windows.Forms.Message ByRef)
0037eb14 58e084fa System.Windows.Forms.NativeWindow.Callback(IntPtr, Int32, IntPtr, IntPtr)
0037ecb8 007c09e4 [NDirectMethodFrameStandalone: 0037ecb8]
System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG ByRef)
0037ecc8 58e18cee System.Windows.Forms.Application+ComponentManager.
System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(Int32, Int32, Int32)
0037ed64 58e18957 System.Windows.Forms.Application+ThreadContext.
RunMessageLoopInner(Int32, System.Windows.Forms.ApplicationContext)
0037edb8 58e187a1 System.Windows.Forms.Application+ThreadContext.
RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
0037ede8 58dd5911 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
0037edfc 003f00ae FileExplorer.Program.Main()
0037f020 727b1b4c [GCFrame: 0037f020]
Well, one obvious thing to find out is which synchronization object your thread is locking, i.e., what was the argument passed to the Monitor.Enter method. Trying !CLRStack -a does not help:
0:000> !clrstack -a
OS Thread Id: 0x2a88 (0)
ESP EIP
0037e8a8 76f2013d [GCFrame: 0037e8a8]
0037e978 76f2013d [HelperMethodFrame_1OBJ: 0037e978] System.Threading.Monitor.Enter(System.Object)
0037e9d0 003f0b68 FileExplorer.MainForm.listBox1_DoubleClick(System.Object, System.EventArgs)
PARAMETERS:
this = 0x02708308
sender = 0x0271c4d4
e = 0x02c8f400
LOCALS:
0x0037e9f4 = 0x02c8f470
0x0037e9f0 = 0x02c8f4e8
0x0037ea00 = 0x00000001
0x0037e9ec = 0x02708990
0x0037e9e8 = 0x027089b4
...snipped...
As you see, SOS was not able to report the argument to Monitor.Enter. Perhaps the unmanaged call stack will help?
0:000> kb
ChildEBP RetAddr Args to Child
0037e4ac 76600bdd 00000002 0037e4fc 00000001 ntdll!ZwWaitForMultipleObjects+0x15
0037e548 75541a2c 0037e4fc 0037e570 00000000 KERNELBASE!WaitForMultipleObjectsEx+0x100
0037e590 7545086a 00000002 7efde000 00000000 KERNEL32!WaitForMultipleObjectsExImplementation+0xe0
0037e5e4 764b2bf1 00000054 004d61e8 ffffffff USER32!RealMsgWaitForMultipleObjectsEx+0x14d
0037e610 764a202d 004d61e8 ffffffff 0037e638 ole32!CCliModalLoop::BlockFn+0xa1
0037e690 7285d245 00000002 ffffffff 00000001 ole32!CoWaitForMultipleHandles+0xcd
0037e6b0 7285d1a6 00000000 ffffffff 00000001 mscorwks!NT5WaitRoutine+0x51
0037e71c 7285d10a 00000001 004d61e8 00000000 mscorwks!MsgWaitHelper+0xa5
0037e73c 729142c8 00000001 004d61e8 00000000 mscorwks!Thread::DoAppropriateAptStateWait+0x28
0037e7c0 7291435d 00000001 004d61e8 00000000 mscorwks!Thread::DoAppropriateWaitWorker+0x13c
0037e810 729144e1 00000001 004d61e8 00000000 mscorwks!Thread::DoAppropriateWait+0x40
0037e86c 727b5422 ffffffff 00000001 00000000 mscorwks!CLREvent::WaitEx+0xf7
0037e880 728e98e2 ffffffff 00000001 00000000 mscorwks!CLREvent::Wait+0x17
0037e90c 729136e0 00497728 ffffffff 00497728 mscorwks!AwareLock::EnterEpilog+0x8c
0037e928 72913664 e6620e7e 0037ea18 02708308 mscorwks!AwareLock::Enter+0x61
0037e9c8 003f0b68 02c8f4e8 02c8f4c8 02c8f470 mscorwks!JIT_MonEnterWorker_Portable+0xb3
WARNING: Frame IP not in any known module. Following frames may be wrong.
0037ea28 5933407c 02c8f400 0271c6bc 0271c4d4 0x3f0b68
0037ea44 59666146 00000000 00000000 0037ea84 System_Windows_Forms_ni+0x72407c
...snipped...
Notice that the JIT_MonEnterWorker_Portable frame corresponds to the Monitor.Enter method call. How do I know this? By inspecting the return address: the unmanaged frame's return address is 003f0b68, which is also the EIP value for the listBox1_DoubleClick method in the managed stack trace.
Now we might expect to find the argument passed to Monitor.Enter among the first three arguments displayed in the unmanaged stack trace. Unfortunately, kb reports correct argument information only when the arguments are passed through the stack -- it does not distinguish between the standard C and Win32 calling conventions, and the custom calling conventions used by the CLR JIT. In fact, in this case, if we were to continue down that path, we might have diagnosed the problem incorrectly!
Where do we find the argument, then? There's hardly anything left but to inspect the disassembly of the calling method and try to determine how the argument is passed to Monitor.Enter:
0:000> !u 0x3f0b68
Normal JIT generated code
FileExplorer.MainForm.listBox1_DoubleClick(System.Object, System.EventArgs)
Begin 003f0a10, size 1ca
...snipped...
003f0b3b 8b55cc mov edx,dword ptr [ebp-34h]
003f0b3e 8b4dc8 mov ecx,dword ptr [ebp-38h]
003f0b41 3909 cmp dword ptr [ecx],ecx
003f0b43 e8f8cff471 call mscorlib_ni+0x68db40 (7233db40)
(System.Threading.Thread.Start(System.Object), mdToken: 060012b3)
003f0b48 90 nop
003f0b49 b9c8000000 mov ecx,0C8h
003f0b4e e82d87a971 call mscorlib_ni+0x1d9280 (71e89280)
(System.Threading.Thread.Sleep(Int32), mdToken: 060012d6)
003f0b53 90 nop
003f0b54 8b45d0 mov eax,dword ptr [ebp-30h]
003f0b57 8b8050010000 mov eax,dword ptr [eax+150h]
003f0b5d 8945c0 mov dword ptr [ebp-40h],eax
003f0b60 8b4dc0 mov ecx,dword ptr [ebp-40h]
003f0b63 e83d203c72 call mscorwks!JIT_MonEnterWorker (727b2ba5)
>>> 003f0b68 90 nop
003f0b69 90 nop
003f0b6a 8b4dc8 mov ecx,dword ptr [ebp-38h]
003f0b6d 3909 cmp dword ptr [ecx],ecx
003f0b6f e8dccdf471 call mscorlib_ni+0x68d950 (7233d950)
(System.Threading.Thread.Join(), mdToken: 060012d1)
...snipped...
Somewhere in the five instructions leading up to the marked return address (003f0b54 through 003f0b63), we have the argument passing process, but it does not go through the stack. Note that there are only two registers used -- EAX and ECX -- and they are both initialized to the same value (found at the address EBP-40h). Excellent -- all that's left is to obtain the value of either of these registers, and we're done!
...Not so fast, though. x86 registers are scarce, and are very likely to be reused across function calls. It stands to reason that both registers have been overwritten with other values, making it impossible to find what they contained previously. Indeed, their current values don't make sense:
0:000> r eax
eax=00000054
0:000> r ecx
ecx=00000000
Fortunately, we have EBP to the rescue! Recall that to reconstruct the stack earlier, we had access to the entire EBP chain that connects all the frames on the stack. This means we always have the EBP value for any frame, and the k command conveniently reports it for us:
0:000> k
ChildEBP RetAddr
0037e4ac 76600bdd ntdll!ZwWaitForMultipleObjects+0x15
0037e548 75541a2c KERNELBASE!WaitForMultipleObjectsEx+0x100
0037e590 7545086a KERNEL32!WaitForMultipleObjectsExImplementation+0xe0
0037e5e4 764b2bf1 USER32!RealMsgWaitForMultipleObjectsEx+0x14d
0037e610 764a202d ole32!CCliModalLoop::BlockFn+0xa1
0037e690 7285d245 ole32!CoWaitForMultipleHandles+0xcd
0037e6b0 7285d1a6 mscorwks!NT5WaitRoutine+0x51
0037e71c 7285d10a mscorwks!MsgWaitHelper+0xa5
0037e73c 729142c8 mscorwks!Thread::DoAppropriateAptStateWait+0x28
0037e7c0 7291435d mscorwks!Thread::DoAppropriateWaitWorker+0x13c
0037e810 729144e1 mscorwks!Thread::DoAppropriateWait+0x40
0037e86c 727b5422 mscorwks!CLREvent::WaitEx+0xf7
0037e880 728e98e2 mscorwks!CLREvent::Wait+0x17
0037e90c 729136e0 mscorwks!AwareLock::EnterEpilog+0x8c
0037e928 72913664 mscorwks!AwareLock::Enter+0x61
0037e9c8 003f0b68 mscorwks!JIT_MonEnterWorker_Portable+0xb3
WARNING: Frame IP not in any known module. Following frames may be wrong.
0037ea28 5933407c 0x3f0b68
0037ea44 59666146 System_Windows_Forms_ni+0x72407c
0037eaf0 58e086a0 System_Windows_Forms_ni+0xa56146
0037eaf8 58e08621 System_Windows_Forms_ni+0x1f86a0
Life is easy now. All we need to do is subtract 0x40 from the EBP value of the calling frame (0037ea28) and find the argument passed to Monitor.Enter at that address:
0:000> dd 0037ea28-40 L1
0037e9e8 027089b4
0:000> !do -nofields 027089b4
Name: System.String
MethodTable: 71f20b70
EEClass: 71cdd66c
Size: 44(0x2c) bytes
(C:\Windows\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: SecondaryLock
Now we have something that definitely looks like an object, and we can verify that this object is indeed used for synchronization by inspecting the process' sync blocks with the !SyncBlk command:
0:000> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info SyncBlock Owner
16 004d61a4 3 1 00497728 2a88 0 02708990 System.String
17 004d61d4 3 1 004eb4d8 2504 5 027089b4 System.String
...snipped...
And there it is: not only do we find the object our thread is waiting for, but we also have its owning thread, which allows further reconstruction of the application's wait chain.
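To make the idea of a wait chain concrete, here is a minimal sketch in Python. The owner_of table is copied from the !SyncBlk output above; the waiting_on table contains only the single fact established so far, namely that thread 2a88 is blocked on object 027089b4. The function name wait_chain and the table names are invented for this illustration.

```python
# Rows copied from the !SyncBlk output: lock object -> owning OS thread id.
owner_of = {
    0x02708990: 0x2A88,
    0x027089B4: 0x2504,
}

# What each thread is known to be blocked on. Only this one entry was
# established above; more entries would come from repeating the analysis
# on the other threads.
waiting_on = {
    0x2A88: 0x027089B4,
}


def wait_chain(thread, waiting_on, owner_of):
    """Follow thread -> awaited lock -> owning thread until the trail ends or loops."""
    chain = [thread]
    seen = {thread}
    while thread in waiting_on:
        thread = owner_of.get(waiting_on[thread])
        if thread is None:  # the awaited lock has no known owner
            break
        chain.append(thread)
        if thread in seen:  # a repeated thread means a deadlock cycle
            break
        seen.add(thread)
    return chain


chain = wait_chain(0x2A88, waiting_on, owner_of)
print(" -> ".join(f"{t:x}" for t in chain))
```

If a similar analysis of thread 2504 showed it blocked on object 02708990 (which has not been established here), adding that entry to waiting_on would make the chain loop back to 2a88, which is the signature of a deadlock.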
Notice that the approach above worked because the method calling Monitor.Enter passed to it a local variable, which was available on the stack. A similar approach would work for a method argument, but there might be more complex cases in which the argument would not be readily available on the calling method's stack frame. Still, armed with the knowledge we have about Monitor.Enter, namely the fact it receives its argument through the ECX register, we can inspect the disassembly of Monitor.Enter:
0:000> u mscorwks!JIT_MonEnterWorker_Portable
mscorwks!JIT_MonEnterWorker_Portable:
729135bc 6a7c push 7Ch
729135be b8cc13ca72 mov eax,offset mscorwks! ?? ::FNODOBFM::`string'+0x1ccc4 (72ca13cc)
729135c3 e8c1eae9ff call mscorwks!_EH_prolog3_catch (727b2089)
729135c8 894dec mov dword ptr [ebp-14h],ecx
729135cb 33db xor ebx,ebx
729135cd 8d8d78ffffff lea ecx,[ebp-88h]
...snipped...
Very early on, Monitor.Enter stores the parameter on the stack (this is often called "parameter spilling"), and we can expect to be able to retrieve it from there. Indeed, the EBP value for the JIT_MonEnterWorker_Portable frame was 0037e9c8, and the argument address is at offset -0x14 from that location:
0:000> dd 0037e9c8-14 L1
0037e9b4 027089b4
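Both lookups in this section boil down to the same arithmetic: take a frame's EBP from the k output and apply the offset seen in the disassembly. A small Python sketch of that calculation follows; slot_address is a name made up for this example.

```python
def slot_address(frame_ebp, offset):
    """Address of a stack slot given a frame's EBP and a signed offset (32-bit wraparound)."""
    return (frame_ebp + offset) & 0xFFFFFFFF


# Caller's local at [ebp-40h], using the EBP of the listBox1_DoubleClick frame.
caller_local = slot_address(0x0037EA28, -0x40)

# Spilled ECX at [ebp-14h], using the EBP of the JIT_MonEnterWorker_Portable frame.
spilled_arg = slot_address(0x0037E9C8, -0x14)

print(f"{caller_local:08x} {spilled_arg:08x}")
```

The first address, 0037e9e8, is also one of the locals that !clrstack -a listed for listBox1_DoubleClick with the value 0x027089b4, which is consistent with both dd results.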
Find the Static Root that References Your Object
A typical memory leak analysis session conducted using SOS involves identifying a bunch of objects that are being leaked (not freed) and then identifying the chain of references from some GC root that points to them. This is a fairly tedious process (profilers are much better at this), and it's even worse because at times the actual root information would not be available. One such case is when the root is a static variable.
A typical root reference chain for a managed object that is retained by a static GC root would have a pinned object array appear as the rooted object. Below is a typical reference chain. (Note that I am using an x64 example here -- it makes the memory search stage more interesting, and also gives some heterogeneity to the examples.)
0:010> !gcroot 0000000002bcaf58
...snipped...
DOMAIN(0000000000C1C5F0):HANDLE(Pinned):5017f8:Root:0000000012761018(System.Object[])->
00000000039b3c30(System.EventHandler)->
0000000002bcab38(System.Object[])->
0000000002bcf8d8(System.EventHandler)->
0000000002bcaf58(FileExplorer.MainForm+FileInformation)
This object array is ubiquitous; it would seem that all static root references stem from it. Indeed (and this is a CLR implementation detail), static fields are stored in this array and their retention as far as the GC is concerned is through it. This also makes it difficult to determine which static field of which class is responsible for the static reference. For example, in the reference chain above, it is apparent that there is a static EventHandler-typed field (which is likely an event) that retains the FileInformation instance -- but it's very desirable to find the details of that static field.
More than six years ago, Doug Stewart wrote a short blog post outlining the general process in cases like these. This process generally works, but requires some adaptation in the 64-bit era, so here goes. First, let's take a look at that rooted array:
0:010> !do 0000000012761018
Name: System.Object[]
MethodTable: 000007fef68858f8
EEClass: 000007fef649eb78
Size: 8192(0x2000) bytes
Array: Rank 1, Number of elements 1020, Type CLASS
Element Type: System.Object
Fields:
None
OK, so it's an array with 1020 elements, and one of these elements must be our event handler. Is it the case? Let's search its memory and make sure:
0:010> s -q 0000000012761018 L2000 00000000039b3c30
00000000`12762e10 00000000`039b3c30 00000000`0278b380
Sure enough, our event handler is one of the array elements, at the address 00000000`12762e10. Now there are two key observations:
- The EventHandler instance ended up in the array somehow. Maybe if we can find other references to this array address, we can find who put it there and then determine whose static field it is.
- There is a reference from that EventHandler instance to one of our application's objects (eventually). Then there should be additional references to this array address, which shape the chain of references to our application's object.
Frankly, both of these are long shots, because it might be the case that the address is calculated dynamically, but let's give it a spin. Doug's original guidance at this point is to launch a memory search for any references to the array location, which would complete in a few seconds for a 32-bit address space; not so much for a 64-bit address space!
However, we are looking for references in managed code only, so no need to traverse the entire address space. It suffices to look at the address ranges of modules in the current AppDomain:
0:010> !dumpdomain
...snipped...
--------------------------------------
Domain 1: 0000000000c1c5f0
LowFrequencyHeap: 0000000000c1c638
HighFrequencyHeap: 0000000000c1c6c8
StubHeap: 0000000000c1c758
Stage: OPEN
SecurityDescriptor: 0000000000c1de90
Name: FileExplorer.exe
Assembly: 0000000000c3cd80 [C:\Windows\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll]
ClassLoader: 0000000000c3ce40
SecurityDescriptor: 0000000000c3cc40
Module Name
000007fef6461000 C:\Windows\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll
000007ff000f2568 C:\Windows\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\sortkey.nlp
000007ff000f2020 C:\Windows\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\sorttbls.nlp
Assembly: 0000000000c57480 [D:\courses\NET Debugging\Exercises\4_MemoryLeak\Binaries\FileExplorer.exe]
ClassLoader: 0000000000c57540
SecurityDescriptor: 0000000000c57390
Module Name
000007ff000433d0 D:\courses\NET Debugging\Exercises\4_MemoryLeak\Binaries\FileExplorer.exe
...many more of these guys...
Now we have a couple of module addresses and can constrain our memory search. It seems safe to start at 7ff`00000000 and go through the following gigabyte of address space looking for our address. Generally speaking, the proper WinDbg command here would be:
0:010> s -q 000007ff`00000000 L?00000000`40000000 00000000`12762e10
(...recall that we are looking for a full QWORD.) The problem is that we might miss unaligned references to that address, which may occur if it is hard-coded into some instruction (e.g. a MOV). So instead, we should be looking for the individual byte sequence, and remember that we are on a little-endian architecture:
0:010> s -b 000007ff`00000000 L?00000000`40000000 10 2e 76 12
000007ff`001913d3 10 2e 76 12 00 00 00 00-48 8b 00 48 89 44 24 60 ..v.....H..H.D$`
000007ff`00191440 10 2e 76 12 00 00 00 00-48 8b d0 e8 60 c1 87 f7 ..v.....H...`...
Voila! Two references to the array location, and now let's take a look at them with the !U command to see if they are code:
0:010> !u 000007ff`001913d3
Normal JIT generated code
FileExplorer.MainForm+FileInformation..ctor(System.String)
Begin 000007ff001912d0, size 18d
...snipped...
000007ff`001913d0 90 nop
000007ff`001913d1 48b8102e761200000000 mov rax,12762E10h
...snipped...
000007ff`0019143e 48b9102e761200000000 mov rcx,12762E10h
000007ff`00191448 488bd0 mov rdx,rax
...snipped...
They are both a match inside FileInformation's constructor, which gives us an excellent clue where to look. Indeed, here's the source code showing the event registration sequence:
public FileInformation(string fullPath)
{
    Path = fullPath;
    Name = System.IO.Path.GetFileName(Path);
    FirstFewLines = File.ReadAllLines(Path).Take(100).ToArray();
    FileInformationNeedsRefresh += FileInformation_FileInformationNeedsRefresh;
}
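The reason the byte-level search was needed can be reproduced offline. The sketch below, in Python, encodes the array slot address in little-endian order and looks for it at every byte offset of a buffer. The buffer holds the nop and mov rax instruction bytes copied from the !U output; find_unaligned is a helper name invented for this example.

```python
def find_unaligned(haystack, base_address, target, width=8):
    """Yield each address in haystack where target appears in little-endian form, at any alignment."""
    needle = target.to_bytes(width, "little")
    start = haystack.find(needle)
    while start != -1:
        yield base_address + start
        start = haystack.find(needle, start + 1)


# Instruction bytes copied from the !U output: nop, then mov rax, imm64.
code = bytes.fromhex("90" "48b8102e761200000000")
base = 0x000007FF001913D0  # address of the nop instruction

hits = list(find_unaligned(code, base, 0x12762E10))
for hit in hits:
    print(f"{hit:016x} aligned={hit % 8 == 0}")
```

The single hit lands at 000007ff`001913d3, the same address the s -b search reported. It sits three bytes past an 8-byte boundary, right after the two opcode bytes of the mov, which is why a QWORD-aligned search could skip over it.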
Conclusion
Hopefully, you are now more convinced that basic assembly reading skills, understanding of calling conventions, and familiarity with the stack structure can provide actual benefits when debugging your .NET applications or analyzing crash dumps.
Assembly reading skills do not come automatically; you must practice them frequently. The best approach would be to compile a set of examples similar to the above and go through them periodically. If the agile guys are advocating code katas to practice TDD, why can't we have disassembly katas to practice our assembly reading skills?
Further Reading