Introduction
This weekend, I wrote a tiny generic method, DirectCast, with a dynamic assembly (AssemblyBuilder) and ILGenerator, which opened a small hole into the mysterious internals of the CLR.
This article has the following sections. All criticism is welcome.
- The Tiny, Wicked DirectCast Method
- Not-so-true and Not-so-false Booleans
- Mission Possible: Modifying private Fields
- How This Works?
- Casting Collections for Contravariance
- Points of Interest
- History
The Tiny, Wicked DirectCast Method
The tiny method DirectCast<TX, TY> discussed in this article can be built with a dynamic assembly (the AssemblyBuilder class) and fewer than 20 lines of code. Here it is:
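// Requires: using System; using System.Reflection; using System.Reflection.Emit;
// Targets .NET Framework: AppDomain.DefineDynamicAssembly, AssemblyBuilderAccess.RunAndSave
// and AssemblyBuilder.Save are not available on .NET Core.
static void Main(string[] args) {
    const string FileName = "ClrHacker.dll";
    var a = AppDomain.CurrentDomain
        .DefineDynamicAssembly(new AssemblyName("ClrHacker"), AssemblyBuilderAccess.RunAndSave);
    var mod = a.DefineDynamicModule(FileName);
    // Sealed + Abstract is how a static class is represented in metadata.
    var type = mod.DefineType("ClrHacker",
        TypeAttributes.Public | TypeAttributes.Sealed |
        TypeAttributes.Abstract | TypeAttributes.Class);
    CreateDirectCastMethod(type);
    type.CreateType();
    a.Save(FileName);
    Console.WriteLine(FileName + " was saved. Now you can reference it in other projects.");
}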
The CreateDirectCastMethod method is listed below:
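static void CreateDirectCastMethod(TypeBuilder type) {
    var m = type.DefineMethod("DirectCast", MethodAttributes.Public | MethodAttributes.Static);
    // Two unconstrained generic parameters: TX is the input type, TY the output type.
    var g = m.DefineGenericParameters("TX", "TY");
    m.SetParameters(g[0]);
    m.SetReturnType(g[1]);
    var il = m.GetILGenerator();
    // The whole body: load the argument and return it unchanged, with no conversion.
    il.Emit(OpCodes.Ldarg_0);
    il.Emit(OpCodes.Ret);
}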
Build and run the program. You will get a small assembly, ClrHacker.dll, with a single generic method named DirectCast in the class ClrHacker, which looks like the following in ILSpy. This little method, DirectCast, is the star of this article. It takes an instance of TX and returns it as TY.
public static TY DirectCast<TX, TY>(TX P_0)
{
return (TY)P_0;
}
You can never build such a method with the C# compiler, since it will not let you cast P_0, which is of an unknown type TX, directly to another unknown type TY when the two type parameters have no constraint or relationship at all.
Nor can you build a working DynamicMethod like this: if you used the code in CreateDirectCastMethod to build a similar DynamicMethod, a verification exception would be thrown when you ran that method.
However, when I built a dynamic assembly, saved it to disk, and referenced it from another project, the above method managed to run and produced some fancy results.
Not-so-true and Not-so-false Booleans
Many C# programmers know that Boolean values can only be either true or false. How is it possible for one to be neither true nor false at the same time?
Here is how it happens.
The Underlying Values of True and False
With DirectCast<TX, TY>, we can cast Boolean values to integer values. Run the following code and the console will show t(rue) = 1 and f(alse) = 0.
var t = ClrHacker.DirectCast<bool, byte>(true);
var f = ClrHacker.DirectCast<bool, byte>(false);
Console.WriteLine("t(rue) = " + t.ToString());
Console.WriteLine("f(alse) = " + f.ToString());
Under the hood, a Boolean in the CLR is a single byte: the C# compiler emits 1 for true and 0 for false.
More bool Values and the Comparison Among Them
Conversely, we can use DirectCast<TX, TY> to make some other bool values by casting integer values to bool!
var TRUE = true;
var FALSE = false;
bool b8 = ClrHacker.DirectCast<byte, bool>(8);
Console.WriteLine("b8 = " + b8.ToString());
Console.WriteLine("b8 == true: " + (b8 == true).ToString());
Console.WriteLine("b8 == false: " + (b8 == false).ToString());
Console.WriteLine("b8 == TRUE: " + (b8 == TRUE).ToString());
Console.WriteLine("b8 == FALSE: " + (b8 == FALSE).ToString());
In the above code, we first assign the corresponding bool values to the local variables TRUE and FALSE, then directly cast the number 8 to a bool value and store it in a local variable b8. The result is shown below:
b8 = True
b8 == true: True
b8 == false: False
b8 == TRUE: False
b8 == FALSE: False
Here is how to read the results:
- Executing the Boolean.ToString method against b8 (remember that it is actually 8) prints True. That's expected.
- (b8 == true).ToString will be compiled to do the same thing as b8.ToString. (You can verify this and the items below with ILSpy.)
- (b8 == false).ToString will be compiled to load b8 and the number 0 onto the evaluation stack and perform a ceq (check equality) operation.
- (b8 == TRUE) will be compiled to load b8 and the Boolean variable TRUE onto the evaluation stack and perform a ceq operation.
- Similar things happen for (b8 == FALSE).
Examine the last two lines and we will see that there is a bool value that equals neither an existing true value nor an existing false value.
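For readers who do not have the ClrHacker assembly at hand, a similar reinterpretation can be sketched with Unsafe.As from System.Runtime.CompilerServices (part of modern .NET, or available through the System.Runtime.CompilerServices.Unsafe NuGet package). The class BoolBits and its method names are hypothetical and exist only for this illustration. The sketch reads the raw byte behind a bool and shows one possible way to compare two bool values by truthiness instead of by raw bits. How a plain == behaves on such non-canonical values may depend on the compiler and the JIT, so no output is claimed for that case.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper for illustration; not part of the article's ClrHacker assembly.
static class BoolBits
{
    // Reinterprets a byte as a bool without any conversion.
    public static bool FromByte(byte value) => Unsafe.As<byte, bool>(ref value);

    // Reads back the raw byte stored in a bool.
    public static byte ToByte(bool value) => Unsafe.As<bool, byte>(ref value);

    // Compares by truthiness: each side is first reduced to "zero" or "non-zero".
    public static bool SafeEquals(bool x, bool y) => (ToByte(x) != 0) == (ToByte(y) != 0);
}
```

With this helper, BoolBits.SafeEquals(BoolBits.FromByte(8), BoolBits.FromByte(7)) treats both values as true, because the raw bytes are normalized before they are compared.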
Now let's make another numeric Boolean value, this time from the number 7.
bool b7 = ClrHacker.DirectCast<byte, bool>(7);
Console.WriteLine("b7 = " + b7.ToString());
Console.WriteLine("b7 == true: " + (b7 == true).ToString());
Console.WriteLine("b7 == false: " + (b7 == false).ToString());
Console.WriteLine("b7 == TRUE: " + (b7 == TRUE).ToString());
Console.WriteLine("b7 == FALSE: " + (b7 == FALSE).ToString());
Guess what you will see and compare with the output.
b7 = True
b7 == true: True
b7 == false: False
b7 == TRUE: False
b7 == FALSE: False
Nothing new here, if you have already got used to the not-so-true and not-so-false Boolean values. So, how about comparing b8 with b7?
Console.WriteLine("b8 = " + b8.ToString());
Console.WriteLine("b7 = " + b7.ToString());
Console.WriteLine("b8 == b7: " + (b8 == b7).ToString());
The output shows that both b8 and b7 print True to the console, but they are not equal to each other. The reason is straightforward: they are actually different numbers.
b8 = True
b7 = True
b8 == b7: False
Proving bool Value is Single Byte
sizeof(bool) will tell you that a bool value occupies a single byte.
With DirectCast, we can write the following code:
bool b15 = ClrHacker.DirectCast<int, bool>(15);
bool b255 = ClrHacker.DirectCast<int, bool>(255);
bool b1023 = ClrHacker.DirectCast<int, bool>(1023);
bool b1024 = ClrHacker.DirectCast<int, bool>(1024);
Console.WriteLine("b255 == b15: " + (b255 == b15).ToString());
Console.WriteLine("b255 == b1023: " + (b255 == b1023).ToString());
Console.WriteLine("b255 == b1024: " + (b255 == b1024).ToString());
Here, we have four bool values, made from the numbers 15, 255, 1023 and 1024 respectively. These four numbers are chosen because the first two fit within a byte, while the latter two require two bytes. If the bool type is a single byte, the latter two will be truncated to a single byte when performing DirectCast<int, bool>.
Here is the result:
b255 == b15: False
b255 == b1023: True
b255 == b1024: False
Since 1023 (binary 11 1111 1111) is truncated to 255 (binary 1111 1111), the comparison returns True. And since 1024 (binary 100 0000 0000) is truncated to 0, the comparison returns False.
To prove this, we can execute the following code to change those bool values back to int:
Console.WriteLine("(int)b1023 = " + ClrHacker.DirectCast<bool, int>(b1023).ToString());
Console.WriteLine("(int)b1024 = " + ClrHacker.DirectCast<bool, int>(b1024).ToString());
We will see that 1023 has been truncated to 255:
(int)b1023 = 255
(int)b1024 = 0
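The truncation arithmetic above can be checked in ordinary, safe C# without DirectCast. The following is only a sketch of the "keep the lowest 8 bits" step; the class Truncation and the method LowByte are hypothetical names introduced for this illustration.

```csharp
using System;

// Hypothetical helper for illustration only.
static class Truncation
{
    // Keeps only the lowest 8 bits of an int, discarding everything above them.
    public static byte LowByte(int value) => unchecked((byte)value);
}
```

Truncation.LowByte(1023) yields 255 and Truncation.LowByte(1024) yields 0, matching the values printed above, while 15 and 255 pass through unchanged.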
More stories about the boolean internals can be read here: What is the size of a boolean in C#.
Mission Possible: Modifying private Fields
Almost all C# programmers know that private fields are inaccessible outside the class where they are defined, but that they can still be modified via reflection, dynamic methods or P/Invoke marshalling.
With DirectCast, there is another way. No reflection, no dynamic methods, no marshalling.
WARNING:
We are going to explore a dangerous part hidden in the CLR!
The operations we do here can completely crash an application.
I don't recommend employing this trick in production environments.
The Good Citizen--Class Me
We begin with a good citizen, an ordinary class Me, which contains two public properties (with two corresponding backing fields generated by the compiler) and a public method.
sealed class Me
{
public DateTime Date { get; set; }
public string Word { get; set; }
public void Tell() {
Console.Write("Me: ");
Console.WriteLine(Date.ToShortDateString() + "(" + Date.Ticks.ToString("X16")
+ "," + Date.Ticks.ToString() + "): " + Word);
}
}
The Miserably Shy Class A
The "shy class" A is a "twin sister" of Me. A has the same number and types of fields as Me, and also a public method. But all fields in A are private, and so is its constructor.
sealed class A
{
DateTime value;
string text;
private A() { }
public void Print() {
Console.Write("A: ");
Console.WriteLine(value.ToString() + ", " + text);
}
}
Direct Casting Me to A
Usually, A is so private that it can be neither instantiated nor altered from outside. But the wicked DirectCast method opens a backdoor for us to do so.
Initially, we create an instance of Me and assign it to a local variable me.
Me me = new Me { Date = new DateTime(1997, 7, 1), Word = "Hello world!" };
me.Tell();
Afterwards, we use DirectCast to cast variable me to another variable a, typed A.
A a = ClrHacker.DirectCast<Me, A>(me);
a.Print();
The result on the console reads:
Me: 1997/7/1(08BE53C8DA54C000,630033120000000000): Hello world!
A: 1997/7/1 0:00:00, Hello world!
We have never assigned any field value to a. However, it prints the same values as me.
Question:
Can we say that we have ever instantiated an instance of class A? (The answer is below.)
Changing the Private Field
Now, we will make the Print method in class A print something else, without touching variable a.
We change the value of the good citizen me.
me.Date = new DateTime(1999, 12, 31);
And we optionally call the Tell method of me and the Print method of a to verify the changes.
me.Tell();
a.Print();
Here is the result:
Me: 1999/12/31(08C121391D7A8000,630821952000000000): Hello world!
A: 1999/12/31 0:00:00, Hello world!
Question:
Is the good citizen Me being exploited to do bad things, namely manipulating the untouchable A?
How This Works?
In this case, both class Me and class A are reference types. In the CLR, a variable of a reference type holds a reference (essentially a pointer) to an object on the managed heap.
C programmers know very well that if two C structs have the same memory layout, it is possible to cast pointers to them back and forth.
It seems that this is true for CLR classes (reference types) too. Variable me and variable a actually point to the same memory slots. When we change the Date property of the object that variable me points to, we change the corresponding field that variable a points to as well.
The Undisguised View
The debugger in Visual Studio reveals that the object behind a is still an instance of class Me.
Why does the debugger know that variable a is of class Me in disguise, while the CLR keeps working as if variable a were of class A?
It is because the memory layout of objects in the .NET CLR is not the same as in native C programs. Each .NET object carries some overhead, called the Object Header and the Method Table Pointer (details in Managed object internals, Part 1. The layout). Although we have successfully fooled the CLR by making two differently typed variables point to the same data slots, the debugger can still discover the underlying type of variable a by examining the method table pointer. Meanwhile, if we run the following code, we will also read "ClrFun2.Me" on the console.
Console.WriteLine(a.GetType().FullName);
Therefore, in the sections above we were actually working with a disguised instance of Me, which only appeared to be another class.
Is it possible to instantiate an instance of A and change its field values? Yes.
Is it possible to instantiate an instance of A
and change its field values? Yes.
Altering Private Fields via Public Wormholes
In the base class library, there is a method, FormatterServices.GetUninitializedObject, which can create an uninitialized object. Thus we can use it to create an instance of A, that is, to allocate the corresponding memory slots. Then we DirectCast A to Me and change the public properties of Me. The fields in the instance of A will change simultaneously.
var u = System.Runtime.Serialization.FormatterServices.GetUninitializedObject(typeof(A)) as A;
me = ClrHacker.DirectCast<A, Me>(u);
Console.WriteLine(u.GetType().FullName);
u.Print();
me.Tell();
me.Date = new DateTime(1999, 12, 25);
me.Word = "Happy X'Mas";
u.Print();
me.Tell();
The console output is listed below:
ClrFun2.A
A: 0001/1/1 0:00:00,
Me: 0001/1/1(0000000000000000,0):
A: 1999/12/25 0:00:00, Happy X'Mas
Me: 1999/12/25(08C11C821F000000,630816768000000000): Happy X'Mas
Now we have managed to create an object that cannot be publicly instantiated and to assign values to it.
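The "uninitialized" part can be observed on its own, without DirectCast. The sketch below assumes a modern .NET runtime, where RuntimeHelpers.GetUninitializedObject offers the same kind of allocation as FormatterServices.GetUninitializedObject. The class Shy and the helper Uninitialized.Create are hypothetical names for this illustration. Because field initializers are compiled into the constructor, skipping the constructor leaves the field at its zero value.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical class for illustration: it can only be constructed privately.
sealed class Shy
{
    // The initializer runs as part of the constructor, so it is skipped
    // whenever the constructor itself is skipped.
    public readonly int Marker = 42;

    private Shy() { }
}

static class Uninitialized
{
    // Allocates a zero-filled Shy without calling its private constructor.
    public static Shy Create() => (Shy)RuntimeHelpers.GetUninitializedObject(typeof(Shy));
}
```

An object obtained from Uninitialized.Create() reports Shy as its type, yet its Marker field is 0 rather than 42, which shows that no constructor code has run.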
The DirectCasted Primitive Value Type
So far, we have seen that the DirectCast method only disguises a reference type. How about value types?
In an earlier section of this article, we DirectCasted an integer to the Boolean type. Will this be unmasked by the GetType function as well?
Let's test with the following code:
bool b = ClrHacker.DirectCast<int, bool>(7);
Console.WriteLine(b.GetType().FullName);
We will read "System.Boolean" on the console, instead of "System.Int32". Why did the GetType function fail to unmask the DirectCasted value?
If we decompile the above code with ildasm, we can read the following IL code, which shows what happens before GetType is called.
ldc.i4.7
call !!1 ClrHacker::DirectCast<int32, bool>(!!0)
box [mscorlib]System.Boolean
call instance class [mscorlib]System.Type [mscorlib]System.Object::GetType()
The first line loads the Int32 number 7 onto the evaluation stack.
The second line consumes that number from the stack and calls the DirectCast method, which pushes the value back onto the evaluation stack.
GetType is a non-virtual method defined on System.Object, so a value type must be boxed into a reference instance before GetType can be called on it. To box a value type, its type must be provided. Therefore, the C# compiler uses the information that variable b is of type Boolean and generates an instruction to box the return value of DirectCast, then calls the GetType function.
If variable b were of a reference type, the box operation would not happen. No type information would be exchanged before calling GetType, which would hence return the original type of the object, as in the previous section.
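The role of the box instruction can be illustrated with a small sketch. It assumes Unsafe.As from System.Runtime.CompilerServices as a stand-in for DirectCast; BoxingDemo and TypeAfterReinterpret are hypothetical names. The point is that the boxed object takes its type from the static type of the variable, not from where the bits came from.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper for illustration only.
static class BoxingDemo
{
    // Reinterprets the lowest-addressed byte of an int as a bool, then boxes it.
    // The compiler emits "box System.Boolean" because the static type of b is bool,
    // so the resulting object reports System.Boolean.
    public static Type TypeAfterReinterpret(int source)
    {
        bool b = Unsafe.As<int, bool>(ref source);
        object boxed = b;
        return boxed.GetType();
    }
}
```

BoxingDemo.TypeAfterReinterpret(7) returns typeof(bool), which mirrors the "System.Boolean" output discussed above.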
Altering Immutable Value Type Instance
Let's try another example, which casts from a struct type to a primitive type:
long b = ClrHacker.DirectCast<DateTime, long>(DateTime.MaxValue);
Console.WriteLine(b);
The above example will crash with the following message.
Unhandled Exception: System.InvalidProgramException: Common Language Runtime detected
an invalid program.
C/C++ programmers may think that since sizeof(DateTime) within an unsafe context returns 8, the same result as sizeof(long), it should be possible to DirectCast between those two types. However, it doesn't work in the CLR world.
If we replace the long type with another custom value type, DateTimeStruct, as in the following code, it works.
struct DateTimeStruct
{
public int T, U;
}
DateTimeStruct b = ClrHacker.DirectCast<DateTime, DateTimeStruct>(DateTime.MaxValue);
Help Wanted:
If someone knows the reason why it doesn't work, please comment.
To DirectCast a DateTime instance to the long type, we have to use another version of DirectCast, which can be made with the following code. The method body is identical to the previous version; the only difference is that both the parameter and the return type are changed to ref types. As a result, this method can open wormholes (references) into value type instances.
// creates: ref TY DirectCast<TX, TY>(ref TX)
static void CreateRefDirectCastMethod(TypeBuilder type) {
var m = type.DefineMethod("DirectCast", MethodAttributes.Public | MethodAttributes.Static);
var g = m.DefineGenericParameters("TX", "TY");
m.SetParameters(g[0].MakeByRefType());
m.SetReturnType(g[1].MakeByRefType());
var il = m.GetILGenerator();
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ret);
}
Then we can use the following code to call the new DirectCast method.
var t = DateTime.MaxValue;
ref long b = ref ClrHacker.DirectCast<DateTime, long>(ref t);
Console.WriteLine($"b = {b}");
We will read the following on the console. It is the same value as DateTime.MaxValue.Ticks, which is approximately the internal value of DateTime.
b = 3155378975999999999
The ref keyword passes the address of the DateTime to the DirectCast method, and ref long b then holds a reference to the data part of the DateTime struct when DirectCast returns.
Therefore, we can change the value of t without touching it, by manipulating the wormhole provided by variable b. The following code snippet demonstrates this.
var t = DateTime.MaxValue;
ref long b = ref ClrHacker.DirectCast<DateTime, long>(ref t);
Console.WriteLine($"b = {b}");
Console.WriteLine($"t = {t}");
Console.WriteLine("Changing b...");
b = TimeSpan.TicksPerHour + TimeSpan.TicksPerMinute + TimeSpan.TicksPerSecond;
Console.WriteLine($"b = {b}");
Console.WriteLine($"t = {t}");
The console shows that the value of t has changed from 9999/12/31 23:59:59 to 0001/1/1 1:01:01.
b = 3155378975999999999
t = 9999/12/31 23:59:59
Changing b...
b = 36610000000
t = 0001/1/1 1:01:01
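A similar by-reference wormhole can be sketched with Unsafe.As<TFrom, TTo>(ref TFrom) from System.Runtime.CompilerServices, used here as a stand-in for the ref version of DirectCast. The names DateTimeBits, RawBits and AddOneDayThroughAlias are hypothetical. The sketch relies on an implementation detail rather than a documented contract: DateTime stores its ticks in a single 64-bit field, with the Kind flags in the top two bits, so the arithmetic below assumes a DateTime whose Kind is Unspecified.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper for illustration only.
static class DateTimeBits
{
    // Returns a writable alias of the 64 bits stored inside the DateTime variable.
    public static ref long RawBits(ref DateTime value) => ref Unsafe.As<DateTime, long>(ref value);

    // Changes the local copy "start" only through the alias, never assigning to it directly.
    // Assumes start.Kind is Unspecified, so the raw bits are exactly the tick count.
    public static DateTime AddOneDayThroughAlias(DateTime start)
    {
        ref long bits = ref RawBits(ref start);
        bits += TimeSpan.TicksPerDay;
        return start;
    }
}
```

Calling DateTimeBits.AddOneDayThroughAlias(new DateTime(2000, 1, 1)) returns 2 January 2000, even though the method never assigns to the DateTime itself.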
Casting Collections for Contravariance
DirectCast can also be used to cast collection types.
Warning:
This is also dangerous! If it crashes your application, it just crashes. You won't have any chance to catch an exception, nor will you see any stack trace or exception message in the Windows Event Viewer.
List<T> Doesn't Support Covariance or Contravariance
In the following example, there is a List<T> of BaseClass, yet all items within it are of type SubClass, which is derived from BaseClass.
var b = new List<BaseClass> {
    new SubClass { N = 1, Prefix = "A", S2 = "START" },
    new SubClass { N = 2, Prefix = "B" },
    //new BaseClass { N = 3 },
    new SubClass { N = 4, Prefix = "D", S1 = "END" },
};
The definitions of BaseClass and SubClass are listed below:
class BaseClass
{
public int N { get; set; }
}
sealed class SubClass : BaseClass
{
public string Prefix { get; set; }
public string S1 { get; set; }
public string S2 { get; set; }
}
DirectCasting to Force Contravariance
Since List<T> does not support covariance or contravariance, it is impossible to implicitly or explicitly cast List<BaseClass> to List<SubClass>. With DirectCast, you can do so, as the following code shows:
foreach (SubClass item in ClrHacker.DirectCast<List<BaseClass>, List<SubClass>>(b)) {
Console.WriteLine(String.Join(",", item.Prefix, item.N, item.S1, item.S2));
}
Uncomment the 4th line of the code snippet where variable b is initialized (//new BaseClass { N = 3 }) and the program will probably crash when it reaches the 3rd item, which is not a SubClass but a BaseClass.
The reason for the crash takes a few steps to explain, but each step is simple.
- The 3rd item in the List<SubClass> is actually a BaseClass. SubClass needs extra bytes for its extra properties (Prefix, S1 and S2), which are really backing fields that do not exist in BaseClass.
- Accessing those fields makes the CLR read beyond the memory slots that a BaseClass instance occupies. This usually does not crash the application, as long as the extra fields of SubClass do not exceed the boundary of the application's memory. The CLR will simply read some "garbage bytes" behind the BaseClass instance as those extra fields.
- Very unfortunately, in this example, the extra fields of SubClass are all Strings, and the memory layout of a .NET String is prefixed by a number indicating the length of the String. So the "garbage bytes" may tell the CLR that the lengths of those three Strings are very large numbers.
- When Console.WriteLine in the above snippet is called, the CLR has to access the contents of item.Prefix, item.S1 or item.S2. The wrong, large lengths of those properties make the CLR read beyond the memory boundary that the operating system assigns to the application, so the execution engine crashes.
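For contrast, here is a sketch of what a checked cast does with the same kind of list. It uses LINQ's Enumerable.Cast<T>, which asks the runtime to verify each item's real type, so a mismatch surfaces as a catchable InvalidCastException instead of a hard crash. The class CheckedCast and the method CountUntilMismatch are hypothetical names, and BaseClass and SubClass are trimmed-down copies of the classes above so that the block is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down copies of the article's classes, repeated to keep the sketch self-contained.
class BaseClass { public int N { get; set; } }
sealed class SubClass : BaseClass { public string Prefix { get; set; } }

// Hypothetical helper for illustration only.
static class CheckedCast
{
    // Counts how many leading items are cast successfully before a mismatch is detected.
    public static int CountUntilMismatch(List<BaseClass> items)
    {
        int seen = 0;
        try
        {
            foreach (SubClass item in items.Cast<SubClass>())
            {
                seen++;
            }
        }
        catch (InvalidCastException)
        {
            // The runtime checked the item's real type and refused the cast.
        }
        return seen;
    }
}
```

For a list holding two SubClass items, then a BaseClass, then another SubClass, CountUntilMismatch returns 2: enumeration stops at the third item with an ordinary exception.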
Weird Behavior of DirectCasted Items in Enumeration
The DirectCasted item produced by the enumerator behaves rather unusually. You can neither test it against null nor test whether it is of the SubClass type.
var b = new List<BaseClass> {
new SubClass { N = 1, Prefix = "A", S2 = "START" },
new SubClass { N = 2, Prefix = "B" },
new BaseClass { N = 3 },
new SubClass { N = 4, Prefix = "D", S1 = "END" },
};
foreach (SubClass item in ClrHacker.DirectCast<List<BaseClass>, List<SubClass>>(b)) {
if (item == null || item is SubClass == false) {
continue;
}
Console.WriteLine(String.Join(",", item.Prefix, item.N, item.S1, item.S2));
}
Run the above snippet and the program will still probably crash without throwing a single exception.
Note: item is SubClass == false does not work in this example because the compiler already knows from the foreach statement that item must be of type SubClass, so it simply reduces the is operation to a comparison with null. Decompile the above code with ILSpy and you will see that.
Preventing the Crash with Type Comparison
As we have seen before, calling the GetType method on a DirectCasted instance can reveal its real type. One way to prevent the crash is to force a type comparison and skip items that are not a SubClass.
foreach (SubClass item in ClrHacker.DirectCast<List<BaseClass>, List<SubClass>>(b)) {
if (item.GetType() != typeof(SubClass)) {
continue;
}
Console.WriteLine(String.Join(",", item.Prefix, item.N, item.S1, item.S2));
}
Of course, this is cumbersome, incomplete (it would also filter out subclasses of SubClass) and inefficient, compared with the version that does not use a DirectCasted List.
foreach (BaseClass item in b) {
var sub = item as SubClass;
if (sub == null) {
continue;
}
Console.WriteLine(String.Join(",", sub.Prefix, sub.N, sub.S1, sub.S2));
}
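The safe loop above can also be written with LINQ. The following sketch uses Enumerable.OfType<T>, which keeps only the items that really are of the requested type (including derived types) and skips nulls. SafeFilter and OnlySubClasses are hypothetical names, and the two classes are trimmed-down copies so that the block stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down copies of the article's classes, repeated to keep the sketch self-contained.
class BaseClass { public int N { get; set; } }
sealed class SubClass : BaseClass { public string Prefix { get; set; } }

// Hypothetical helper for illustration only.
static class SafeFilter
{
    // Returns only the items whose runtime type is SubClass (or derived from it).
    public static List<SubClass> OnlySubClasses(IEnumerable<BaseClass> items)
        => items.OfType<SubClass>().ToList();
}
```

Applied to the four-item list used in this section, OnlySubClasses returns the three SubClass items and silently skips the BaseClass one.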
DirectCast on collections should only be used when you are 100% sure that the types of the items are right. It is really too dangerous. If there are many calls to DirectCast scattered around your code, you will have a hard time finding the source of a crash.
Points of Interest
- Calling GetType() on a disguised DirectCast object will reveal its real type. But for value type objects, a new instance of the target type will be created after DirectCast. So, is it safe to DirectCast small value types to larger value types, such as int to Guid?
- It is very possible that an AccessViolationException will be thrown if we DirectCast a small class to a large class, since the latter requires more memory slots, which do not exist for the former. But what will happen if we DirectCast a large class to a small class?
- It seems that we can quickly obtain the memory address of a reference type instance by DirectCasting the reference object to long, like DirectCast<string, long>(anInstanceOfString).
- Actually, this type of trick has been out there for quite some time. We can find more in the assembly System.Runtime.CompilerServices.Unsafe from the .NET Core libraries. The source code can be found on GitHub, which contains the C# decoy and the corresponding IL implementation. The assembly can be downloaded via NuGet. Its authors used IL and ilasm to make the library, whereas in this article we achieved similar things with a dynamic assembly, and the assembly we got can even be ported back to older .NET platforms, as old as .NET Framework 2.0.
History
- 2018-9-17: Initial publication
- 2018-9-19: Added DirectCasting Collections
- 2019-4-16: Added information about similar implementations and more functions from the .NET Core library
- 2019-10-23: Added information about direct casting primitive value types
- 2019-10-24: Added information about direct casting custom value types to primitive value types via an overload of the DirectCast method