Introduction
I spent a while trying to work out how to calculate the heap size of a managed object, and I have finally figured it out.
After trying methods from here, I came across this code snippet from here, which proved promising:
Marshal.ReadInt32(typeof(T).TypeHandle.Value, 4)
After further research, I discovered that TypeHandle.Value is actually a pointer to the type's MethodTable, and that the snippet reads a DWORD from it at offset 4. (This is elaborated on later.)
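To make the snippet easier to experiment with, here is a minimal sketch that wraps it in a helper. The names BaseSizeReader and ReadBaseSize are made up for illustration, and the code assumes the runtime in use keeps the value at offset 4 of the structure that TypeHandle.Value points to, which is an undocumented implementation detail.

```csharp
using System;
using System.Runtime.InteropServices;

public static class BaseSizeReader
{
    // Reads the 32-bit value stored 4 bytes into the structure that
    // TypeHandle.Value points to. Assumption: the runtime lays that
    // structure out as described in this article.
    public static int ReadBaseSize(Type type)
    {
        return Marshal.ReadInt32(type.TypeHandle.Value, 4);
    }
}
```

Calling BaseSizeReader.ReadBaseSize(typeof(object)) or BaseSizeReader.ReadBaseSize(typeof(string)) returns the value that the Background section goes on to explain.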
Background
As you may already know, the layout of a managed object in the heap is as follows (64-bit):
Offset | Size | Type |
-8 | 8 | Object Header |
0 | 8 | MethodTable* |
8 | ... | Fields |
A MethodTable contains the type information the CLR needs. Its first two fields are used to calculate the heap size:
Offset | Size | Type | Name |
0 | 4 | DWORD | m_dwFlags |
4 | 4 | DWORD | m_BaseSize |
As you may have noticed, the snippet from earlier reads the second DWORD, m_BaseSize. However, the first DWORD is just as important in calculating the size.
The engineers of the CLR were very creative in minimizing the size of objects. The lowest WORD in m_dwFlags is the component size of a type. If the type is an "array type" such as int[] or string, the lowest WORD holds the size of one component (read: element). For example, for a string the component size is 2 (sizeof(char)), and for an int[] it is 4 (sizeof(int)). The other WORD is used as flags.
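The split described above is plain bit arithmetic. The following sketch shows it on a raw 32-bit value; FlagsSplitter and its method names are hypothetical, and the sample value 0x80000002 mentioned afterwards is invented for illustration rather than read from a real MethodTable.

```csharp
using System;

public static class FlagsSplitter
{
    // Lowest 16 bits of m_dwFlags: the component size, per the article.
    public static ushort ComponentSize(uint dwFlags)
    {
        return (ushort)(dwFlags & 0xFFFF);
    }

    // Highest 16 bits of m_dwFlags: the remaining flag bits.
    public static ushort Flags(uint dwFlags)
    {
        return (ushort)(dwFlags >> 16);
    }
}
```

For the made-up value 0x80000002, ComponentSize yields 2 and Flags yields 0x8000.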
Going back to the snippet above, the second DWORD, m_BaseSize, is the base instance size of the object when allocated on the heap. Its smallest possible value is 24 (64-bit) or 12 (32-bit), because that is the minimum size of an object:
#define MIN_OBJECT_SIZE (2*sizeof(uint8_t*) + sizeof(ObjHeader))
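As a quick check of that arithmetic, the sketch below recomputes the macro for a given pointer size. MinObjectSize is a hypothetical helper, and it assumes the object header occupies one pointer-sized slot, which matches the 8-byte header in the 64-bit layout table above.

```csharp
public static class MinObjectSize
{
    // Mirrors MIN_OBJECT_SIZE: two pointer-sized slots plus the object header.
    // Assumption: the header takes up exactly one pointer-sized slot.
    public static int Compute(int pointerSize)
    {
        int headerSize = pointerSize;
        return 2 * pointerSize + headerSize;
    }
}
```

MinObjectSize.Compute(8) gives 24 and MinObjectSize.Compute(4) gives 12, the two figures quoted above.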
m_BaseSize alone is typically enough to calculate the heap size of an object, but two special kinds of type in the CLR have dynamic sizes; that is, their sizes vary per instance. Those are strings and arrays. Therefore, the runtime uses this formula to calculate the size of objects in the heap:
MT->GetBaseSize() + (OBJECTTYPEREF->GetSizeField() * MT->GetComponentSize())
In other words:
Base instance size + (length * component size)
For instance, for an instance of object, whose component size is 0, the formula evaluates to this (64-bit):
24 + (1 * 0) == 24
Using this formula, we can calculate the heap size of any object.
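Written as code, the formula is a one-liner. In this sketch HeapSizeFormula is a hypothetical name, and the array figures in the usage note are illustrative inputs, not values read from a running process.

```csharp
public static class HeapSizeFormula
{
    // Base instance size + (length * component size).
    public static int Compute(int baseSize, int length, int componentSize)
    {
        return baseSize + length * componentSize;
    }
}
```

HeapSizeFormula.Compute(24, 1, 0) reproduces the object example above and gives 24. For a hypothetical array type with a base size of 24 and a component size of 4, ten elements give HeapSizeFormula.Compute(24, 10, 4), which is 64.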
Implementation
Disclaimer: This may be considered very evil.
Note: I aliased UInt32 as DWORD and UInt16 as WORD.
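One way to set up such aliases is with using directives at the top of the file. This is a sketch of that approach, not necessarily how the original project declares them; AliasDemo is a hypothetical class that only confirms the aliases have the expected widths.

```csharp
using DWORD = System.UInt32;
using WORD = System.UInt16;

public static class AliasDemo
{
    // DWORD is an alias for a 32-bit unsigned integer.
    public static int SizeOfDword() { return sizeof(DWORD); }

    // WORD is an alias for a 16-bit unsigned integer.
    public static int SizeOfWord() { return sizeof(WORD); }
}
```

Using aliases are scoped to the file that declares them, so each file that mentions DWORD or WORD needs the same two directives.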
Thankfully, replicating the MethodTable is easy thanks to the StructLayout and FieldOffset attributes:
[StructLayout(LayoutKind.Explicit)]
public unsafe struct MethodTable
{
    [FieldOffset(0)] private DWFlags m_dwFlags;
    [FieldOffset(4)] private DWORD m_BaseSize;
    ...
Because only one WORD is used for the component size, I made a separate struct that splits the two WORDs for convenience:
[StructLayout(LayoutKind.Explicit)]
internal struct DWFlags
{
    [FieldOffset(0)] internal WORD m_componentSize;
    [FieldOffset(2)] internal WORD m_flags;
    ...
Now that we have a representation of a MethodTable, it's just a matter of acquiring one. Going back to TypeHandle.Value, we know that it is already a pointer to a MethodTable, so all that remains is to cast it!
var methodTable = (MethodTable*) typeof(T).TypeHandle.Value;
Now we can calculate the heap size of any object at runtime. You can write your own methods for this; here is an example from my code:
public static unsafe int HeapSize<T>(ref T t) where T : class
{
    // Use the runtime type, so the result is still right when T is a base type.
    var methodTable = (MethodTable*) t.GetType().TypeHandle.Value;

    // Arrays and strings vary in size per instance: add length * component size.
    if (t is Array arr) {
        return (int) methodTable->BaseSize + arr.Length * methodTable->ComponentSize;
    }
    if (t is string str) {
        return (int) methodTable->BaseSize + str.Length * methodTable->ComponentSize;
    }

    // Every other type has a fixed size.
    return (int) methodTable->BaseSize;
}
Note: I only applied the formula to array-type objects (arrays and strings), because for every other type it would evaluate to the base size anyway.
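For readers who cannot enable unsafe code, the same calculation can be sketched with Marshal reads instead of a pointer cast. This is a swapped-in variant, not the method used in this article: SafeHeapSize is a hypothetical name, and the code assumes a little-endian layout in which the lowest WORD of m_dwFlags sits at offset 0 and m_BaseSize at offset 4.

```csharp
using System;
using System.Runtime.InteropServices;

public static class SafeHeapSize
{
    public static int Compute(object obj)
    {
        if (obj == null) throw new ArgumentNullException(nameof(obj));

        // Pointer to the MethodTable of the object's runtime type.
        IntPtr methodTable = obj.GetType().TypeHandle.Value;

        // m_BaseSize: the DWORD at offset 4.
        int baseSize = Marshal.ReadInt32(methodTable, 4);

        // Component size: the lowest WORD of m_dwFlags, at offset 0
        // (assumes a little-endian machine).
        int componentSize = (ushort) Marshal.ReadInt16(methodTable, 0);

        // Only arrays and strings carry a per-instance length.
        if (obj is Array arr) return baseSize + arr.Length * componentSize;
        if (obj is string str) return baseSize + str.Length * componentSize;
        return baseSize;
    }
}
```

Like the pointer-based version, this depends on undocumented runtime internals, so treat the result as a diagnostic aid rather than a guaranteed figure.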
Now all that's left to do is to verify it works.
string s = "foo";
HeapSize gives us:
HeapSize(ref s) == 32
WinDbg gives us:
!DumpObj /d 000001f98001bc08
Name: System.String
MethodTable: 00007fff1c1a6830
EEClass: 00007fff1ba86cb8
Size: 32(0x20) bytes
File: C:\WINDOWS\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String: foo
And that is how the GC calculates the heap size of objects!