Source code in MSIL:
Download GenericPointers.zip - 770 B
Tests with garbage collector:
Download GCtests.zip - 34.86 KB
Introduction
In this article I'll describe some discoveries I've made in Microsoft's implementation of the CLR, how to take advantage of them, and what risks they involve as far as I know.
Background
I'll split each topic I discuss into two parts: the first is an explanation of the problem, the second is the implementation in MSIL or C#. I'll also point out here that I'm not telling you to use this, because of its risks.
I expect you to know at least what MSIL is and how to compile it with ilasm.exe.
I would also appreciate it if any reader could test this stuff in Mono and share the experience.
Problem and solution
I've run into some limitations on the use of pointers in C#. Most of the things I wanted seemed like crazy ideas until I needed them. The problem was that I had written a large library full of unsafe blocks and it was becoming a problem to maintain.
The way I solved it was based on a discovery I made thanks to CodeProject, and once I had it up and running, I wanted to be able to do all those crazy things I had in mind... and guess what? They can be done! I'm not going to use all that stuff soon, and since it was thanks to CodeProject, I'm giving it back to the community.
To give a rough idea of what we are dealing with: you will be able to create generic pointers, do some arithmetic without caring about the type, and one or two more details you may have forgotten about.
Points of Interest
As I said before, we will handle some weird things about pointers and types; in general we will exploit unsafe code as deeply as we can. About that: some people confuse unsafe with unmanaged, and they are not the same. Unmanaged code runs outside the CLR, as with P/Invoke. Unsafe code, on the other hand, runs inside the CLR. This means that you don't pay the performance hit of leaving for and returning from unmanaged code, but also that it's not as fast as its unmanaged equivalent.
I have to confess that I thought about wearing safety glasses in case the CPU exploded or something, but since I'm writing this you can be sure that the risk is not that big. Still, we will surely do things that may end with the app crashing, like in the days of VarPtr in VB 6.0.
If we go back to that age, we had VarPtr, ObjPtr, StrPtr and a weird hack named ArrayPtr. Let's start doing them in C#. I know there are solutions for marshalling, but that is not what I mean. I mention only for historical reasons that there was a ProcPtr too. Note also that we will not go into ObjPtr directly here, but I can tell you that VarPtr will also do what ObjPtr used to do.
Why the GC can't collect strings?
First off, the GC can collect the objects, I mean the instances of the System.String class that point to the string's content. It is able to move them around the heap and collect them like any other object. Period.
What the GC can't do is collect the memory of the string itself when it is interned, I mean the place where all the characters are stored for an interned string. Does that mean that it can't collect any string? No. It only means that it can't collect interned strings, and those include the string literals of .NET. String literals may also be frozen strings; more on that later.
Now, what's an interned string? Those who had the chance to work with Win32 probably know about something called ATOMs. ATOMs were objects (in the abstract sense of that age) that allowed us to store and handle strings. ATOMs were a good idea for storing commonly used strings, such as the texts of the windows of your application. The good news for developers was that they only needed to store the string once. All the strings with the same text actually pointed to the same area in memory, making them less expensive in memory space. This string memory was also managed by the operating system, and any process could take advantage of the text there.
OK, that's exactly what String.Intern does. I can't tell whether it uses ATOMs, but it behaves the same way, and thus all the strings with the same text point to the same location in memory (it may be memory managed by the OS... we may ask the people from Mono how they did it). Think about what it would take to recover that memory with the GC... you would have to look for all the strings that may point there, including strings that point to partial parts and that overlap... so the GC just doesn't care about interned strings. But it can surely recover the rest of the strings, such as those created with StringBuilder; that's why using StringBuilder for string manipulation is a good practice encouraged by Microsoft. Remember that you can call Intern on any string.
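A minimal sketch of the sharing described above, assuming a current Microsoft CLR. The class and method names are only for illustration and are not part of the attached code. It compares a string literal with an equal string assembled by StringBuilder, before and after calling String.Intern:

```csharp
using System;
using System.Text;

static class InternDemo
{
    // Builds "SOME" at run time so that it is not the literal's instance.
    public static string BuildAtRuntime()
    {
        return new StringBuilder().Append("SO").Append("ME").ToString();
    }

    static void Main()
    {
        string literal = "SOME";
        string built = BuildAtRuntime();

        Console.WriteLine(literal == built);                       // True: same text
        Console.WriteLine(object.ReferenceEquals(literal, built)); // False: two objects

        // Intern returns the pooled instance, which is the one the literal uses.
        string interned = string.Intern(built);
        Console.WriteLine(object.ReferenceEquals(literal, interned)); // True
    }
}
```

The run-time string stays an ordinary heap object until Intern is called; only the pooled instance is shared with the literal.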
Talking about StringBuilder, there are reasons for strings to be immutable. The first reason comes from the need to get sub-strings. If you need to get a sub-string and strings aren't immutable, what would happen if you point to an internal part of another string and that one moves to another place or changes its contents? The answer is that your sub-string will be damaged, and you don't want that.
So languages like the old VB created a copy of the string to give you the sub-string; this behavior makes it easy to do concatenations, replacements and writes. What's the problem with that? Both sub-string and concatenation operations require copying memory, so the designers decided to make strings immutable so the sub-string case would be faster. But that leads to a problem when you write (called C.O.W., copy on write), and concatenation becomes less efficient.
Today we have StringBuilder to solve that part. Among other advantages of immutable strings, they are supposed to be thread safe and they can be interned, saving memory.
With that said, consider what it would involve to use StrPtr with a non-interned string.
About frozen strings: they were added in .NET 2.0 and allow memory to be shared among various .NET apps. Here is a link to an article that talks about the matter: http://weblog.ikvm.net/PermaLink.aspx?guid=b28aa8b7-87e3-49d7-b0aa-3cc2cb5dbac9
1) How to do StrPtr
This one is the easiest and has no discoveries involved... so, here you have it, granted:
unsafe public static int StrPtr(string a)
{
fixed (char* p = a)
{
return (int)p;
}
}
I have not found any undocumented risks in this code. The only failure case I know of is passing null as the argument, in which case you will get 0 as the return value. If you try to read or write through a pointer to address 0 you will get a NullReferenceException. Note also that the cast to int assumes a 32-bit process; in a 64-bit process the address would be truncated.
First, yes, it's dangerous; second, no, the GC will not take the string.
Another thing: if what you need is a byte array with the contents of a string, you can delegate that work to the Encoding types in the System.Text namespace; if you need a char array, use String.ToCharArray().
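For reference, a short sketch of those two safe alternatives. The class and method names are made up for this example, and Encoding.Unicode (UTF-16) is just one of the encodings you could pick:

```csharp
using System;
using System.Text;

static class StringCopies
{
    // Returns the UTF-16 bytes of the text (2 bytes per char for these characters).
    public static byte[] ToUtf16Bytes(string text)
    {
        return Encoding.Unicode.GetBytes(text);
    }

    static void Main()
    {
        string text = "SOME";
        byte[] bytes = ToUtf16Bytes(text);
        char[] chars = text.ToCharArray();

        chars[0] = 'N'; // changes only the copy, never the original string

        Console.WriteLine(bytes.Length);      // 8
        Console.WriteLine(new string(chars)); // NOME
        Console.WriteLine(text);              // SOME
    }
}
```

Both calls hand you a private copy, so nothing you do to the result can damage the string.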
I have had complaints about StrPtr, so I'll talk about those risks I thought you already knew:
using System;
using System.Runtime;
namespace Test
{
class Program
{
static void Main(string[] args)
{
unsafe
{
char* p = (char*)Test();
int g = GC.MaxGeneration;
GCSettings.LatencyMode = GCLatencyMode.Batch;
GC.Collect(g, GCCollectionMode.Forced);
GC.WaitForPendingFinalizers();
Console.WriteLine(p[0]);
Console.WriteLine(p[1]);
Console.WriteLine(p[2]);
Console.WriteLine(p[3]);
}
Console.ReadKey();
}
unsafe public static int StrPtr(string a)
{
fixed (char* p = a)
{
return (int)p;
}
}
static private int Test()
{
string a = "SOME";
return StrPtr(a);
}
}
}
In the above code, Main calls the function Test, which declares and initializes a string named a and then retrieves its pointer with the StrPtr function shown above. The next step is to call the garbage collector so that the string that just went out of scope gets collected (something that actually will not happen). Then the code reads the memory the pointer refers to, and the output is the following:
S
O
M
E
Which means that the string is still available. OK, that looks fine. The next step was to repeat the experiment with a larger string (its content was a copy of the code above), and the output was:
u
s
i
n
By the way, this is how StrPtr looks in MSIL if you are curious:
.method public hidebysig static int32 StrPtr(string a) cil managed
{
.maxstack 2
.locals init ([0] char* p,
[1] int32 CS$1$0000,
[2] string pinned CS$519$0001)
IL_0000: ldarg.0
IL_0001: stloc.2
IL_0002: ldloc.2
IL_0003: conv.i
IL_0004: dup
IL_0005: brfalse.s IL_000d
IL_0007: call int32 [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::get_OffsetToStringData()
IL_000c: add
IL_000d: stloc.0
IL_000e: ldloc.0
IL_000f: conv.i4
IL_0010: stloc.1
IL_0011: leave.s IL_0013
IL_0013: ldloc.1
IL_0014: ret
}
Do you notice that call and those local variables? The only thing I will say about it is that, if you thought you had avoided .NET Framework functions by using fixed to get the data of a string instead of ToCharArray(), you are wrong. Also note that this implementation may change in the future. See System.Runtime.CompilerServices.RuntimeHelpers.OffsetToStringData in MSDN.
The string survives the collection because it is interned, and that has some consequences. Consider the following code:
using System;
namespace Test
{
class Program
{
static void Main(string[] args)
{
unsafe
{
char* p = (char*)Test();
string b = "SOME";
p[0] = 'N';
p[1] = 'U';
p[2] = 'L';
p[3] = 'L';
Console.WriteLine(b);
}
Console.ReadKey();
}
unsafe public static int StrPtr(string a)
{
fixed (char* p = a)
{
return (int)p;
}
}
static private int Test()
{
string a = "SOME";
return StrPtr(a);
}
}
}
What do you think the output will be? If you said "SOME", you are wrong. Because strings are immutable, .NET takes the liberty of sharing the same pointer among all the string literals that are equal. So the output is "NULL". To avoid this, use the StringBuilder class from the System.Text namespace, which will prevent your string from being interned.
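A sketch of that advice, under the assumption that StringBuilder.ToString() hands back a fresh, non-interned instance (which is how the Microsoft implementation behaves for non-empty text). The names CopyBeforeWrite and PrivateCopy are hypothetical. It needs the /unsafe option, and writing into a string is still something to do at your own risk:

```csharp
using System;
using System.Text;

static class CopyBeforeWrite
{
    // Returns a new string with the same text that is not the interned literal.
    public static string PrivateCopy(string source)
    {
        return new StringBuilder(source).ToString();
    }

    static unsafe void Main()
    {
        string literal = "SOME";
        string copy = PrivateCopy(literal);

        // Overwrite the characters of the private copy only.
        fixed (char* p = copy)
        {
            p[0] = 'N';
            p[1] = 'U';
            p[2] = 'L';
            p[3] = 'L';
        }

        Console.WriteLine(copy);    // NULL
        Console.WriteLine(literal); // SOME: the shared literal is untouched
    }
}
```

The difference from the previous listing is only where the pointer comes from: here it points into a string nobody else shares.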
2) How to do VarPtr (and walkthrough)
This is actually a documented one. You can use & to get your pointer, just like in C++ (in syntax at least); just remember to be in an unsafe block and to have the /unsafe option on. But can you write a VarPtr function?
Before writing the function, let me repeat that we are doing something dangerous. I don't want to be blamed because you forgot that.
In order to create this function, keep in mind that you need to take the parameter by reference, because if you take it by value, the whole idea is nonsense. That said, the problem is simple: if you write a function that takes a value in order to retrieve its address, what type will you declare the parameter as?
- If you said object, you may think you will have to deal with boxing and unboxing, but actually the compiler will just say: "can't convert to ref object".
- If you said the same type as the value, then you will have to write the function for each possible data type. That isn't the idea.
- If you said a generic type, then you have got the idea. Now, how do you create a generic pointer?
For a long time I thought it was impossible. The reason is that the following code ends with two CS0208 errors, "Cannot take the address of, get the size of, or declare a pointer to a managed type ('type')":
unsafe public static int VarPtr<T>(T a)
{
    fixed (T* p = &a)
    {
        return (int)p;
    }
}
After trying it, of course, I looked for a VarPtr alternative on the web, and found a solution:
static int Ptr(object o)
{
System.Runtime.InteropServices.GCHandle GC =
System.Runtime.InteropServices.GCHandle.Alloc(o,
System.Runtime.InteropServices.GCHandleType.Pinned);
int ret = GC.AddrOfPinnedObject().ToInt32();
return ret;
}
I needed to test it, so I designed the next scenario:
int a = 9;
unsafe
{
int b = (int)&a;
int c = Ptr(a);
Console.WriteLine(b);
Console.WriteLine(c);
int* p = (int*)b;
int* q = (int*)c;
*p = 1;
*q = 2;
}
Console.WriteLine(a);
This is one of the results I got (the first two numbers change from test to test, since they are memory addresses):
1242220
20191136
1
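The number 1 at the end tells me that the second pointer is the wrong one: Ptr takes an object, so the int is boxed and the address returned is that of the boxed copy, not of a.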
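The copy made by boxing can be seen without any pointers. This small sketch (names are only illustrative) boxes an int, changes the original, and reads the box back:

```csharp
using System;

static class BoxingDemo
{
    // Boxes a value, then changes the original variable; the box keeps the old value.
    public static int ReadBoxedAfterChange()
    {
        int a = 9;
        object boxed = a; // boxing copies the value to a new heap object
        a = 1;            // only the local changes
        return (int)boxed;
    }

    static void Main()
    {
        Console.WriteLine(ReadBoxedAfterChange()); // 9
    }
}
```

Any address obtained through the object refers to that separate heap copy, which is why writing through it never reaches the original variable.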
You may think it's the end of the story, but it's just the beginning. The solution for VarPtr came in the shape of MSIL; this is the way to write it:
.method public hidebysig static !!T* AddressOf<T>(!!T& var) cil managed
{
.maxstack 1
ldarg.0
conv.i
ret
}
It's included in a library I called "GenericPointers", in a class named "PointerBuilder". The source code in MSIL is attached to this article.
Now, using this library compiled from MSIL, I need to change:
int c = Ptr(a);
to:
int c = (int)PointerBuilder.AddressOf(ref a);
The result was the following:
70642928
70642928
2
So this is our VarPtr, but in order to create it we discovered that Microsoft's implementation of the CLR supports generic pointers... what else could we do with that?
About the risk: you will have the same ones you have with any pointer, so spectacular crashes are up to you.
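If you cannot compile the MSIL, a comparable experiment can be sketched with System.Runtime.CompilerServices.Unsafe.AsPointer. This class is not part of the original article or of the attached library; the sketch assumes .NET Core 3.0 or later (or the NuGet package of the same name on older frameworks) and the /unsafe option:

```csharp
using System;
using System.Runtime.CompilerServices;

static class AsPointerDemo
{
    // Compares the address from the & operator with the one from the generic helper.
    public static unsafe bool SameAddress()
    {
        int a = 9;
        int* viaOperator = &a;
        int* viaGeneric = (int*)Unsafe.AsPointer(ref a);
        return viaOperator == viaGeneric;
    }

    // Writes through the generic pointer and returns the variable's new value.
    public static unsafe int WriteThrough()
    {
        int a = 9;
        int* p = (int*)Unsafe.AsPointer(ref a);
        *p = 2;
        return a;
    }

    static void Main()
    {
        Console.WriteLine(SameAddress());  // True
        Console.WriteLine(WriteThrough()); // 2
    }
}
```

This mirrors the test scenario above: both pointers agree, and the write through the generic one reaches the variable. The same pointer risks apply.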
3) How to do ArrayPtr?
There was a problem in VB 6.0: arrays didn't store their elements where the pointer you get with VarPtr points, so people managed to get the address of the elements using a "hack" they called ArrayPtr. The same thing happens in .NET with arrays. I coded a solution in MSIL, which I'm not including here, because after testing it I discovered that the procedure Ptr I showed in section 2 gives the address for strings and arrays of any type.
There is news, though: arrays didn't pass all the tests, so if you are accessing an array through a pointer, make sure to keep a reference to the array too, because otherwise the GC may take the array.
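A minimal sketch of using a pinned GCHandle on an array while keeping the array reference alive and releasing the handle afterwards. The helper name PokeAt is hypothetical; unlike the Ptr function above, this version frees the handle, so the address must not be used after the method returns:

```csharp
using System;
using System.Runtime.InteropServices;

static class PinnedArrayDemo
{
    // Pins the array, writes one byte through its address, then unpins it.
    public static void PokeAt(byte[] data, int index, byte value)
    {
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            // For arrays this is the address of the first element.
            IntPtr address = handle.AddrOfPinnedObject();
            Marshal.WriteByte(address, index, value);
        }
        finally
        {
            handle.Free(); // the GC may move or collect the array again after this
        }
    }

    static void Main()
    {
        byte[] data = { 10, 20, 30 };
        PokeAt(data, 1, 99);
        Console.WriteLine(data[1]); // 99
    }
}
```

The caller holds the array in a variable for the whole operation, which is the "keep a reference" part of the advice.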
4) How to do a "StructPtr"?
A lot of people have asked how to get a pointer to a struct they declare in C# and handle it byte by byte. I solved this problem too; the implementation in MSIL is the following:
.method public hidebysig static !!T* Convert<T>(native int ptr) cil managed
{
.maxstack 1
ldarga.s ptr
call instance void* [mscorlib]System.IntPtr::ToPointer()
ret
}
This routine allows us to convert a pointer into a generic pointer. With this power we can convert a pointer to a byte array into a pointer to the struct we want, and then assign our struct to the value it points to. Sample C# code that takes advantage of this is the following:
public unsafe byte[] ToByteArray<T>(T a) where T : struct
{
int length = PointerBuilder.SizeOf<T>();
byte[] buffer = new byte[length];
fixed (byte* p = &buffer[0])
*PointerBuilder.Convert<T>(new IntPtr((void*)p)) = a;
return buffer;
}
This code lets you handle a copy of your struct as a byte array. Of course, you will need to convert that array back to a struct; to do so, the following code comes in handy:
public static unsafe T ToStruct<T>(byte[] a) where T : struct
{
fixed (byte* p = &a[0])
return *(PointerBuilder.Convert<T>(new IntPtr((void*)p)));
}
OK, now beware of passing a byte array whose length is less than the size of the struct. If you need a sizeof for generic types, there is one in the MSIL code I'm including with the article.
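To illustrate the length check, here is a non-generic sketch of the same round trip for one concrete struct. Point3 and StructBytes are made-up names, and plain sizeof is used because the type is known; the generic version would rely on PointerBuilder instead. It needs the /unsafe option:

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Point3
{
    public int X;
    public int Y;
    public int Z;
}

static class StructBytes
{
    // Copies the struct into a new byte array of exactly its size.
    public static unsafe byte[] ToByteArray(Point3 value)
    {
        byte[] buffer = new byte[sizeof(Point3)];
        fixed (byte* p = buffer)
            *(Point3*)p = value;
        return buffer;
    }

    // Reads the struct back, refusing arrays that are too short.
    public static unsafe Point3 ToStruct(byte[] bytes)
    {
        if (bytes == null)
            throw new ArgumentNullException("bytes");
        if (bytes.Length < sizeof(Point3))
            throw new ArgumentException("Array is smaller than the struct.", "bytes");
        fixed (byte* p = bytes)
            return *(Point3*)p;
    }

    static void Main()
    {
        Point3 original = new Point3 { X = 1, Y = 2, Z = 3 };
        byte[] raw = ToByteArray(original);
        Point3 copy = ToStruct(raw);
        Console.WriteLine(raw.Length); // 12
        Console.WriteLine(copy.Z);     // 3
    }
}
```

Without the check, reading a short array would run past its end and return whatever happens to follow it in memory.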
The reason you may need a generic sizeof is that if you use sizeof on a generic type the compiler reports error CS0233, "'identifier' does not have a predefined size, therefore sizeof can only be used in an unsafe context (consider using System.Runtime.InteropServices.Marshal.SizeOf)". I've reported this to Microsoft, and there is a chance that sizeof will work on generics with a struct constraint in the future.
Removing the struct constraint and doing this with reference types is up to you; do it at your own risk. I'm not interested in such situations myself, since anything you may accomplish with that may not work in another CLR implementation, from Microsoft or a third party.
I want to note that I've done tests with garbage collection and I haven't found any problems. It looks like the GC is smart enough to know that there is more than one thing allocated in the given memory space, just as happens in the C# version of a union. I'll post it here in case you forgot about it:
using System.Runtime.InteropServices;
[StructLayout(LayoutKind.Explicit)]
public struct union
{
    [FieldOffset(0)]
    public int word;
    [FieldOffset(0)]
    public short loword;
    // The high word starts after the 2 bytes of the low word.
    [FieldOffset(2)]
    public short hiword;
}
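A quick usage sketch of such a union, assuming the high word is meant to sit at byte offset 2 and a little-endian machine. The struct is redeclared here under the hypothetical name WordUnion so the example stands on its own:

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
struct WordUnion
{
    [FieldOffset(0)] public int Word;
    [FieldOffset(0)] public short LoWord; // bytes 0 and 1
    [FieldOffset(2)] public short HiWord; // bytes 2 and 3
}

static class UnionDemo
{
    static void Main()
    {
        WordUnion u = new WordUnion();
        u.Word = 0x12345678;

        // On a little-endian machine the low half is stored first.
        Console.WriteLine(u.LoWord.ToString("X")); // 5678
        Console.WriteLine(u.HiWord.ToString("X")); // 1234

        // Writing one half changes the full word as well.
        u.HiWord = 0;
        Console.WriteLine(u.Word.ToString("X"));   // 5678
    }
}
```

All three fields are views over the same four bytes, so no copying or pointers are needed.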
Talking about unions, if you don't need generic types, there is another approach you can take:
using System;
using System.Runtime.InteropServices;
namespace Test
{
[StructLayout(LayoutKind.Explicit)]
public unsafe struct tester
{
[FieldOffset(0)]
public fixed byte data[16];
[FieldOffset(0)]
public Decimal structure;
}
class Program
{
static void Main(string[] args)
{
tester X = new tester();
X.structure = -1234567890.01234567890123456789M;
unsafe
{
for (int I = 0; I < 16; I++)
{
Console.WriteLine(X.data[I]);
}
}
Console.ReadKey();
}
}
}
The code above shows how to use unsafe with unions to get the bytes of a given struct. This code will not work in a generic version, because the data field would need a variable size, and the compiler will respond with error CS0133, "The expression being assigned to 'variable' must be constant" (and you would also need that sizeof).
5) How to do Array to Array copy?
We all know that we can use CopyTo to copy data into another array of the same type, but what do we do if we need to copy the binary data to an array of another type? (I don't mean converting the items.) So far we have seen how to handle some weird pointers; now I'll show an example of a function that makes memory copies much faster (no need to walk the whole array with a pointer). The code is the following:
public static unsafe T[] CopyToArray<T>(this byte[] a, int telements) where T : struct
{
T[] R = new T[telements];
int Z = PointerBuilder.SizeOf<T>() * telements;
var p = PointerBuilder.AddressOf<T>(R, 0);
System.Runtime.InteropServices.Marshal.Copy(a, 0, new IntPtr((void*)p), Z);
return R;
}
public static unsafe byte[] CopyToByteArray<T>(this T[] a) where T : struct
{
int length = a.Length * PointerBuilder.SizeOf<T>();
byte[] buffer = new byte[length];
var p = PointerBuilder.AddressOf<T>(a, 0);
System.Runtime.InteropServices.Marshal.Copy(new IntPtr((void*)p), buffer, 0, length);
return buffer;
}
The star of this code is Marshal.Copy, which does the work we did with RtlMoveMemory (CopyMemory) in those "hack" days of VB 6.0. If you are looking forward to graphics work, Marshal.Copy and LockBits make a good team.
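When the element type is a primitive such as int, a similar binary copy can be sketched without pointers using Buffer.BlockCopy. This is an alternative to the code above rather than part of the attached library, and the class name is only illustrative:

```csharp
using System;

static class BlockCopyDemo
{
    // Copies the raw bytes of an int array into a new byte array.
    public static byte[] ToBytes(int[] source)
    {
        byte[] result = new byte[source.Length * sizeof(int)];
        Buffer.BlockCopy(source, 0, result, 0, result.Length);
        return result;
    }

    // Rebuilds an int array from raw bytes; trailing bytes that do not
    // fill a whole int are ignored.
    public static int[] ToInts(byte[] source)
    {
        int[] result = new int[source.Length / sizeof(int)];
        Buffer.BlockCopy(source, 0, result, 0, result.Length * sizeof(int));
        return result;
    }

    static void Main()
    {
        byte[] raw = ToBytes(new[] { 1, 2 });
        int[] back = ToInts(raw);
        Console.WriteLine(raw.Length); // 8
        Console.WriteLine(back[1]);    // 2
    }
}
```

Buffer.BlockCopy only accepts arrays of primitive types, so it does not replace the generic struct version.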
This code is written to work as extension methods. If you are building for .NET 2.0, add the following code in order to make it run without changes (again, in case you forgot about it):
namespace System.Runtime.CompilerServices
{
[AttributeUsage(AttributeTargets.Method)]
public sealed class ExtensionAttribute : Attribute
{
public ExtensionAttribute() { }
}
}
6) How to do generic arithmetic?
If you have written a vector type, a list type or a matrix type, you may have come across the problem of generic arithmetic. First off, what's generic arithmetic? Generic arithmetic is a term for the feature you need in order to add, multiply, divide, subtract, negate, shift or get the remainder of a division (or any other arithmetic operation derived from those) on numeric values without caring whether they are int, short, long or any other numeric type.
Now, that granted, when do you need it? It's needed in such rare but useful situations that we have learned to think they are impossible. For example, consider writing a function that calculates the average of the numbers in a generic enumerator. Do you want to write it once for each numeric type? Of course not; you want a single generic function to handle the problem. This is the scenario for which I have the solution (except for decimal and any other non-primitive numeric type).
The idea is quite simple: now that we have seen how to use MSIL to handle generics, what happens if I tell it to add two values of a generic type? For instance:
.method public hidebysig static !!T Add<T>(!!T a, !!T b) cil managed
{
.maxstack 2
ldarg.0
ldarg.1
add
ret
}
The code above uses add to sum the values a and b. Note that add does not check for overflow; to get an overflow-checked version, replace add with add.ovf. The other operations are done under the same principle.
Guess what? It works! But it's pretty, pretty dangerous. If you happen to pass objects, strings, or even decimals, it will end with an AccessViolationException or with a more spectacular FatalExecutionEngineError, which is a fatal error; in other words, it crashes your code no matter how many try-catch statements you have.
OK, to avoid this danger, wrap this code in the following test:
if (typeof(T).IsPrimitive)
{
    // T is a primitive type here (decimal, strings and objects are excluded)
}
I would have included it in the MSIL, but checking this again and again may become a bottleneck.
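One possible shape for that guard, as a sketch only. GuardedAdd is a hypothetical wrapper, and the rawAdd delegate stands in for the MSIL Add<T> routine so that the example compiles without the attached library:

```csharp
using System;

static class PrimitiveGuard
{
    // Calls rawAdd only when T is a primitive type; rawAdd stands in
    // for the unchecked MSIL routine.
    public static T GuardedAdd<T>(T a, T b, Func<T, T, T> rawAdd)
    {
        if (!typeof(T).IsPrimitive)
            throw new NotSupportedException(
                "Raw arithmetic is only safe for primitive types, got " + typeof(T));
        return rawAdd(a, b);
    }

    static void Main()
    {
        Console.WriteLine(typeof(int).IsPrimitive);     // True
        Console.WriteLine(typeof(decimal).IsPrimitive); // False
        Console.WriteLine(typeof(string).IsPrimitive);  // False
        Console.WriteLine(GuardedAdd(12, 15, (x, y) => x + y)); // 27
    }
}
```

Throwing a normal exception here is much friendlier than the fatal error you get once the unchecked code runs on the wrong type.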
Again for curious readers, here is the same thing done from C# (it ends with a CS0019 error, "Operator 'operator' cannot be applied to operands of type 'type' and 'type'"):
public static T Ptr<T>(T a, T b)
{
return a + b;
}
To close the matter of generic arithmetic, I want to add that there is another way to accomplish it. It's slower but safer, and it requires the expression trees that came with LINQ (System.Linq.Expressions); the following code is a working example of this approach:
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
namespace Test
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(add(12, 15));
            Console.ReadKey();
        }
        private static T add<T>(T a, T b)
        {
            Type t = typeof(T);
            ParameterExpression _a = Expression.Parameter(t, "a");
            ParameterExpression _b = Expression.Parameter(t, "b");
            // Build and compile the lambda (a, b) => a + b for the given T.
            var R = Expression.Lambda<Func<T, T, T>>(
                Expression.Add(_a, _b),
                _a, _b).Compile();
            return R(a, b);
        }
    }
}
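Most of the cost of this approach is in Compile(), which the example above pays on every call. A common way around it, sketched here with a hypothetical Adder<T> class, is to compile once per type and keep the delegate in a static field of a generic class:

```csharp
using System;
using System.Linq.Expressions;

static class Adder<T>
{
    // Compiled once per closed type (Adder<int>, Adder<double>, ...).
    public static readonly Func<T, T, T> Add = Build();

    private static Func<T, T, T> Build()
    {
        ParameterExpression a = Expression.Parameter(typeof(T), "a");
        ParameterExpression b = Expression.Parameter(typeof(T), "b");
        return Expression.Lambda<Func<T, T, T>>(Expression.Add(a, b), a, b).Compile();
    }
}

static class AdderDemo
{
    static void Main()
    {
        Console.WriteLine(Adder<int>.Add(12, 15));                // 27
        Console.WriteLine(Adder<double>.Add(1.5, 2.25));          // 3.75
        Console.WriteLine(Adder<decimal>.Add(1.5m, 2.5m) == 4m);  // True
    }
}
```

Unlike the MSIL version, this also works for decimal and for any type that defines operator+, and a type without addition fails with a normal exception when the delegate is built.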
Disclaimer
There must be a question around: why do I say that Microsoft didn't want us to know? Well, it happens that these things have been requested from Microsoft as additions to .NET or Visual Studio, and Microsoft responded with a "won't fix - by design". Maybe even they don't know they did this, but if they do know, they didn't want us to know. Let's try to think why. It wouldn't actually be unsafe to do generic arithmetic if there were support for a "numeric" generic constraint, about which I want to quote this:
"
Bruce Eckel: Will I be able to do a template function, in other words, a function where the argument is the unknown type? You are adding stronger type checking to the containers, but can I also get a kind of weaker typing as I can get with C++ templates? For example, will I be able to write a function that takes parameters A a and B b, and then in the code say, a + b? Can I say that I don't care what A and B so long as there's an operator+ for them, because that's kind of a weak typing.
Anders Hejlsberg: What you are really asking is, what can you say in terms of constraints? Constraints, like any other feature, can become arbitrarily complex if taken to their ultimate extreme. When you think about it, constraints are a pattern matching mechanism. You want to be able to say, "This type parameter must have a constructor that takes two arguments, implement operator+, have this static method, has these two instance methods, etc." The question is, how complicated do you want this pattern matching mechanism to be?
There's a whole continuum from nothing to grand pattern matching. We think it's too little to say nothing, and the grand pattern matching becomes very complicated, so we're in-between. We allow you to specify a constraint that can be one class, zero or more interfaces, and something called a constructor constraint. You can say, for example, "This type must implement IFoo and IBar" or "This type must inherit from base class X." Once you do that, we type check everywhere, at both compile and run time, that the constraint is true. Any methods implied by that constraint are directly available on values of that type parameter type.
Now, in C#, operators are static members. So, an operator can never be a member of an interface, and therefore an interface constraint could never endow you with an operator+. The only way you can endow yourself with an operator+ is by having a class constraint that says you must inherit from, say, Number, and Number has an operator+ of two Numbers. But you could not in the abstract say, "Must have an operator+," and then we polymorphically resolve what that means.
"
Sorry for the long quote, but the context is important here. In short, it implies that we don't have support for numeric constraints because of the complexity involved in such a specific constraint. Now think about the code I show here... it does it! But it carries strong risks, and I'm afraid I may not have found them all. I'm not saying "Go hack .NET"; I'm asking Microsoft to support this in a safe way, and developers to use this carefully. I want to show Microsoft that we may need these features in the language and the CLR. Another reason why I'm not saying to hack .NET is that I like the idea of having code running on third-party implementations, and if we want that, then hacks are nonsense.
After that, another question may have arisen: what is all this useful for? I can't go into the specific kinds of work that may need it (and remember that I said this wasn't for marshalling). The first use I may make of this is all the work we do reading stuff from streams; the second would be working on graphics or sound for real-time applications, and, derived from that one, another use is video games. OK, you will tell me two things on the matter:
1) it's not secure
Like any other way of doing things, there are good practices and mistakes to avoid. I suggest working with this only inside a component, and test, test, test.
2) there are other ways to do it
I know, "don't reinvent the wheel", but remember that sometimes it's good to know how the wheel was made, and that wheel was made by humans, so it's not perfect and there is room to improve. So I have two things to say. First, I will not go to C++ and accept a less secure scenario just because I want to handle some dangerous things; nothing is further from my goal. I want to keep security up wherever I can while getting enough speed and power to meet my requirements. Second, even though I know that there are third-party products that solve my problems, some developer worked on them and did the dirty work so you could keep your hands clean. What if I tell you that I'm heading toward being one of those? Or what if you need some dirty work done for the first time and there is no third-party product? So at least think about it.
Lastly, if you still think that this doesn't help at all, remember that you know it's possible thanks to me, so you may take this article as a curiosity expo, and skip sending me messages saying: "what is this for?".
Legal note
This article is published under the Creative Commons Attribution-Share Alike license. Since this license says: "Any of the above conditions can be waived if you get permission from the copyright holder.", I'll allow you to forgo the "Share Alike" part, releasing this license from its viral nature. I only ask those who create derivative products that use this code for commercial purposes to include my name in the "special thanks".