Introduction
While understanding garbage collection fundamentals is vital to working with .NET, it is also important to understand how object allocation works. Seeing it in action shows you just how simple and performant it is, especially compared to the potentially blocking nature of native heap allocations. In a large, native, multi-threaded application, heap allocations can be a major performance bottleneck, forcing you to adopt all sorts of custom heap management techniques. It's also harder to tell when this is happening because many of those details are hidden behind the OS's allocation APIs. More importantly, understanding allocation will give you clues to how you can mess up and make it far less efficient.
In this article, I want to go through an example taken from Chapter 2 of Writing High-Performance .NET Code and then take it further with some additional examples that weren't covered in the book.
Viewing Object Allocation in a Debugger
Let's start with a simple object definition: completely empty.
class MyObject
{
}
static void Main(string[] args)
{
var x = new MyObject();
}
In order to examine what happens during allocation, we need to use a "real" debugger, like Windbg. Don't be afraid of this. If you need a quick primer on how to get started, look at the free sample chapter on this page, which will get you up and running in no time. It's not nearly as bad as you think.
Build the above program in Release mode for x86 (you can do x64 if you'd like, but the samples below are x86).
In Windbg, follow these steps to start and debug the program:
- Ctrl+E to execute a program. Navigate to and open the built executable file.
- Run the command: sxe ld clrjit (this tells the debugger to break on loading any module with clrjit in the name, which you need loaded before the next steps)
- Run the command: g (continues execution)
- When it breaks, run the command: .loadby sos clr (loads the SOS .NET debugging extension)
- Run the command: !bpmd ObjectAllocationFundamentals Program.Main (sets a breakpoint at the beginning of a method; the first argument is the name of the assembly, the second is the name of the method, including the class it is in)
- Run the command: g
Execution will break at the beginning of the Main method, right before new() is called. Open the Disassembly window to see the code.
Here is the Main method's code, annotated for clarity:
mov ecx,006f3864h ; load the address of MyObject's method table into ecx
call 006e2100     ; call the allocation helper
mov edi,eax       ; eax holds the address of the new object
Note that the actual addresses will be different each time you execute the program. Step over (F10, or toolbar) a few times until call 006e2100 (or your equivalent) is highlighted. Then Step Into that (F11). Now you will see the primary allocation mechanism in .NET. It's extremely simple. Essentially, at the end of the current gen0 segment, there is a reserved bit of space which I will call the allocation buffer. If the allocation we're attempting can fit in there, we can update a couple of values and return immediately without more complicated work.
If I were to outline this in pseudocode, it would look like this:
if (object fits in current allocation buffer)
{
Increment a pointer, return address;
}
else
{
call JIT_New to do more complicated work in CLR
}
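The pseudocode above can be turned into a runnable sketch. This is purely illustrative: the real allocator bumps a pointer within the actual gen0 segment (the allocation pointer/limit pair kept in the thread's allocation context), not an index into a managed array, and the names below are my own.

```csharp
using System;

var alloc = new BumpAllocator(24);
Console.WriteLine(alloc.Allocate(12)); // 0: first object at the start of the buffer
Console.WriteLine(alloc.Allocate(12)); // 12: pointer was bumped past the first object
Console.WriteLine(alloc.Allocate(12)); // -1: buffer exhausted, slow path needed

// Toy model of the bump-pointer fast path.
class BumpAllocator
{
    private readonly int _allocLimit; // end of the allocation buffer
    private int _allocPtr;            // next free offset

    public BumpAllocator(int size) => _allocLimit = size;

    // Returns the offset of the new "object", or -1 to signal the slow
    // path (the real code jumps to clr!JIT_New instead of returning -1).
    public int Allocate(int objectSize)
    {
        int newPtr = _allocPtr + objectSize;
        if (newPtr > _allocLimit)
            return -1;                // doesn't fit: take the slow path
        int objectStart = _allocPtr; // object lives at the old pointer
        _allocPtr = newPtr;          // bump the pointer past it
        return objectStart;
    }
}
```

The fast path really is just the comparison and the pointer bump; everything else is the slow path.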
The actual assembly looks like this:
006e2100 8b4104 mov eax,dword ptr [ecx+4] ds:002b:006f3868=0000000c
006e2103 648b15300e0000 mov edx,dword ptr fs:[0E30h]
006e210a 034240 add eax,dword ptr [edx+40h]
006e210d 3b4244 cmp eax,dword ptr [edx+44h]
006e2110 7709 ja 006e211b
006e2112 894240 mov dword ptr [edx+40h],eax
006e2115 2b4104 sub eax,dword ptr [ecx+4]
006e2118 8908 mov dword ptr [eax],ecx
006e211a c3 ret
006e211b e914145f71 jmp clr!JIT_New (71cd3534)
In the fast path, there are only 9 instructions, including the return. That's incredibly efficient, especially compared to something like malloc. Yes, that complexity is traded for time at the end of the object's lifetime, but so far, this is looking pretty good!
What happens in the slow path? The short answer is a lot. The following could all happen:
- A free slot somewhere in gen0 needs to be located
- A gen0 GC is triggered
- A full GC is triggered
- A new memory segment needs to be allocated from the operating system and assigned to the GC heap
- Objects with finalizers need extra bookkeeping
- Possibly more...
Another thing to notice is the size of the object: 0x0c (12 decimal) bytes. As covered elsewhere, this is the minimum size for an object in a 32-bit process, even if it has no fields.
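You can also get a rough feel for per-object cost from C# itself, without a debugger. This is only an estimate: GC.GetTotalMemory is approximate, and the numbers differ by bitness (the minimum is 12 bytes on 32-bit, 24 bytes on 64-bit).

```csharp
using System;

const int Count = 100_000;
var keep = new object[Count];             // keep references alive during the measurement

long before = GC.GetTotalMemory(true);    // force a full collection to get a clean baseline
for (int i = 0; i < Count; i++)
    keep[i] = new Empty();
long after = GC.GetTotalMemory(false);    // don't collect: we want the new objects counted
GC.KeepAlive(keep);

Console.WriteLine($"~{(after - before) / Count} bytes per object");

class Empty { }  // no fields, like MyObject above
```

Allocating a large batch and dividing amortizes away any unrelated allocations that sneak into the measurement.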
Now let's do the same experiment with an object that has a single int field.
class MyObjectWithInt { int x; }
Follow the same steps as above to get into the allocation code.
The first line of the allocator on my run is:
00882100 8b4104 mov eax,dword ptr [ecx+4] ds:002b:00893874=0000000c
The only interesting thing is that the size of the object (0x0c) is exactly the same as before. The new int field fits into the minimum size. You can see this by examining the object with the !DumpObject command (or its abbreviated version, !do). To get the address of the object after it has been allocated, step over instructions until you reach the ret instruction. The address of the object is now in the eax register, so open up the Registers view and see the value. On my computer, it has a value of 2372770. Now execute the command: !do 2372770
You should see similar output to this:
0:000> !do 2372770
Name: ConsoleApplication1.MyObjectWithInt
MethodTable: 00893870
EEClass: 008913dc
Size: 12(0xc) bytes
File: D:\Ben\My Documents\Visual Studio 2013\Projects\ConsoleApplication1\ConsoleApplication1\bin\Release\ConsoleApplication1.exe
Fields:
MT Field Offset Type VT Attr Value Name
70f63b04 4000001 4 System.Int32 1 instance 0 x
This is curious. The field is at offset 4 (and an int has a length of 4), so that only accounts for 8 bytes (the range 0-7). Offset 0 (i.e., the object's address) contains the method table pointer, so where are the other 4 bytes? They are the sync block, which actually sits at offset -4, just before the object's address. Together, these account for the 12 bytes.
Try it with a long.
class MyObjectWithLong { long x; }
The first line of the allocator is now:
00f22100 8b4104 mov eax,dword ptr [ecx+4] ds:002b:00f33874=00000010
This shows a size of 0x10 (16 decimal) bytes, which is what we would expect: the 12-byte minimum object size already includes 4 bytes of field space, so the 8-byte long needs only 4 additional bytes. An examination of the allocated object shows an object size of 16 bytes as well.
0:000> !do 2932770
Name: ConsoleApplication1.MyObjectWithLong
MethodTable: 00f33870
EEClass: 00f313dc
Size: 16(0x10) bytes
File: D:\Ben\My Documents\Visual Studio 2013\Projects\ConsoleApplication1\ConsoleApplication1\bin\Release\ConsoleApplication1.exe
Fields:
MT Field Offset Type VT Attr Value Name
70f5b524 4000002 4 System.Int64 1 instance 0 x
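The sizes in these dumps follow a simple rule on 32-bit, which can be sketched as a small helper. This mirrors the observed numbers only; the CLR's real layout rules are more involved (field packing, alignment of 8-byte fields, and so on), and the helper name is my own.

```csharp
using System;

Console.WriteLine(ObjectSize32.Estimate(0));  // 12: empty object hits the minimum size
Console.WriteLine(ObjectSize32.Estimate(4));  // 12: a single int fits in the minimum
Console.WriteLine(ObjectSize32.Estimate(8));  // 16: a long needs 4 bytes beyond the minimum

// 32-bit size rule suggested by the dumps above: sync block (4 bytes) +
// method table pointer (4 bytes) + fields padded to a 4-byte boundary,
// with a 12-byte floor.
static class ObjectSize32
{
    public static int Estimate(int fieldBytes)
    {
        const int syncBlock = 4, methodTablePtr = 4, minSize = 12;
        int paddedFields = (fieldBytes + 3) / 4 * 4;  // round up to 4 bytes
        return Math.Max(minSize, syncBlock + methodTablePtr + paddedFields);
    }
}
```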
If you put an object reference into the test class, you'll see the same thing as you did with the int.
Finalizers
Now let's make it more interesting. What happens if the object has a finalizer? You may have heard that objects with finalizers have more overhead during GC. This is true--they will survive longer, require more CPU cycles, and generally cause things to be less efficient. But do finalizers also affect object allocation?
Recall that our Main method above looked like this:
mov ecx,006f3864h
call 006e2100
mov edi,eax
If the object has a finalizer, however, it looks like this:
mov ecx,119386Ch
call clr!JIT_New (71cd3534)
mov esi,eax
We've lost our nifty allocation helper! Now we have to jump directly to JIT_New. Allocating an object that has a finalizer is a LOT slower than allocating a normal object: more internal CLR structures need to be modified to track this object's lifetime, so the cost isn't just at the end of that lifetime.
How much slower is it? In my own testing, it appears to be about 8-10x worse than the fast path of allocating a normal object. If you allocate a lot of objects, this difference is considerable. For this, and other reasons, just don't add a finalizer unless it really is required.
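You can reproduce a comparison like this yourself with a rough micro-benchmark. This is a sketch, not a rigorous benchmark: Stopwatch timings are noisy, and the exact ratio will vary by machine and runtime version.

```csharp
using System;
using System.Diagnostics;

const int N = 1_000_000;

Time(() => new Plain());                         // warm up the JIT first
long plainMs = Time(() => new Plain());
long finalizableMs = Time(() => new Finalizable());
Console.WriteLine($"plain: {plainMs} ms, finalizable: {finalizableMs} ms");

long Time(Func<object> make)
{
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < N; i++)
        make();                                  // allocate and immediately drop
    sw.Stop();
    return sw.ElapsedMilliseconds;
}

class Plain { }
class Finalizable { ~Finalizable() { } }         // even an empty finalizer forces registration
```

Note that the finalizable version also creates extra GC pressure (every object must survive to be finalized), which compounds the slower allocation path.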
Calling the Constructor
If you are particularly eagle-eyed, you may have noticed that there was no call to a constructor to initialize the object once allocated. The allocator changes some pointers and returns you the object, and there is no further function call on that object. This is because memory belonging to class fields is always pre-initialized to 0 for you, and these objects had no further initialization requirements. Let's see what happens if we change to the following definition:
class MyObjectWithInt { int x = 13; }
Now the Main function looks like this:
mov ecx,0A43834h
call 00a32100
mov esi,eax
mov dword ptr [esi+4],0Dh
The field initialization was inlined into the caller!
Note that this code is exactly equivalent:
class MyObjectWithInt { int x; public MyObjectWithInt() { this.x = 13; } }
But what if we do this?
class MyObjectWithInt
{
int x;
[MethodImpl(MethodImplOptions.NoInlining)] // requires using System.Runtime.CompilerServices;
public MyObjectWithInt()
{
this.x = 13;
}
}
This explicitly disables inlining for the object constructor. There are other ways of preventing inlining, but this is the most direct.
Now we can see the call to the constructor happening after the memory allocation:
mov ecx,0F43834h
call 00f32100
mov esi,eax
mov ecx,esi
call dword ptr ds:[0F43854h]
Exercise for the Reader
Can you get the allocator shown above to jump to the slow path? How big does the allocation request have to be to trigger this? (Hint: Try allocating arrays of various sizes.) Can you figure this out by examining the registers and other values from the running code?
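One way to check your answer to the array part of this exercise from C# itself: arrays of 85,000 bytes or more are allocated on the large object heap, which GC.GetGeneration reports as the oldest generation. (The 85,000-byte threshold is a documented CLR implementation detail and could change in future runtimes.)

```csharp
using System;

var small = new byte[1_000];    // well under the threshold: normal gen0 allocation
var large = new byte[100_000];  // over 85,000 bytes: allocated on the large object heap

Console.WriteLine(GC.GetGeneration(small));  // 0: freshly allocated in gen0
Console.WriteLine(GC.GetGeneration(large));  // 2: the LOH reports as the oldest generation
```

You can confirm the same split in the debugger by watching which array sizes take the jump to the slow path.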
Summary
You can see that in most cases, allocation of objects in .NET is extremely fast and efficient, requiring no calls into the CLR and no complicated algorithms in the simple case. Avoid finalizers unless absolutely needed. Not only are they less efficient during cleanup in a garbage collection, but they are slower to allocate as well.
Play around with the sample code in the debugger to get a feel for this yourself.