Hardcore Microsoft .Net

Theraot

22 Feb 2009, CC (ASA 2.5), 20 min read

What Microsoft didn't want you to know

Source code in MSIL: Download GenericPointers.zip - 770 B

Tests with garbage collector: Download GCtests.zip - 34.86 KB

Introduction

In this article I'll present some discoveries I've made in Microsoft's implementation of the CLR, how to take advantage of them, and what risks they involve as far as I know.

Background

I'll split each topic into two parts: first the explanation of the problem, then the implementation in MSIL or C#. I'll also point out here that I don't mean for you to use this, because of its risks.

So I expect you to know at least what MSIL is and how to compile it with ilasm.exe.

I would also like it if any reader could test this stuff in Mono and share the experience.

Problem and solution

I've run into some limitations on the use of pointers in C#. Most of the things I wanted to do I thought were crazy, until I needed them. The problem was that I had written a large library full of unsafe blocks and it was becoming a problem to maintain.

The way I solved it was based on a discovery I made thanks to CodeProject, and once I had it up and running, I wanted to be able to do all those crazy things I had in mind... and guess what? They can be done! I'm not going to use all that stuff soon, and since it was thanks to CodeProject, I'm giving it back to the community.

To give a rough idea of what we are dealing with: you will be able to create generic pointers, do some arithmetic without caring about the type, and one or two more details you may have forgotten about.

Points of Interest

As I said before, we will handle some weird things about pointers and types; in general we will exploit unsafe code as deep as we can. About that: there are people who confuse unsafe with unmanaged, and they are not the same. Unmanaged code runs outside of the CLR, as with P/Invoke. Unsafe code, on the other hand, runs inside the CLR. This means that you don't take the performance hit of calling into unmanaged code and returning, but also that it's not as fast as its unmanaged equivalent.

I have to confess that I thought about wearing protective glasses in case the CPU explodes or something, but since I'm writing this you can be sure that the risk is not that big. Still, we will surely do things that may end with the app crashing, like in the days of VarPtr in VB 6.0.

If we go back to that age... we had VarPtr, ObjPtr, StrPtr and a weird hack named ArrayPtr. Let's start doing them in C#. I know there are solutions for marshalling, but that's not what I mean. I mention only for historical reasons that there was a ProcPtr too. Note also that we will not go into ObjPtr directly here, but I can tell you that VarPtr will also do what ObjPtr used to do.

Why can't the GC collect strings?

First off, the GC can collect the objects, I mean the instance of the System.String class that points to the string's content. It is able to move it around the heap and collect it like any other object. Period.

What the GC can't do is collect the memory of the string itself when it is interned, I mean the place where all the characters of an interned string are stored. Does that mean that it can't collect any string? No. It only means that it can't collect interned strings, and those include the string literals of .NET. String literals may also be frozen strings; more on that later.

Now, what's an interned string? For those who had the chance to work with Win32, you probably know about something called ATOMs. ATOMs were objects (in the abstract sense of the age) that allowed us to store and handle strings. They were a good way to store commonly used strings, such as the texts of the windows of your application. The good news for developers was that they only needed to store the string once. All the strings with the same text actually pointed to the same area in memory, making them less expensive in memory space. This string memory was also managed by the operating system, and any process could take advantage of the text there.

Ok, that's exactly what String.Intern does. I can't tell if it uses ATOMs, but it behaves the same way, and thus all the strings with the same text point to the same location in memory (maybe memory managed by the OS (...) we may ask the people from Mono to see how they did it). Think about what it would take to recover that memory with the GC (...) you would have to look for all the strings that may point there, and that includes strings that point to partial parts, and that overlap (...) so the GC just doesn't care about interned strings. But surely it can recover the rest of the strings, such as those created with StringBuilder; that's why using StringBuilder for string manipulation is a good practice encouraged by Microsoft. Remember that you can call Intern on any string.
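The sharing described above can be observed from safe code by comparing references instead of contents. This is only a minimal sketch; the class name InternDemo and the helper Build are made up for the illustration and are not part of the attached library.

```csharp
using System;
using System.Text;

public static class InternDemo
{
    // Builds the text at run time, so the result is a fresh string object
    // rather than a literal taken from the intern pool.
    public static string Build(string head, string tail)
    {
        return new StringBuilder(head).Append(tail).ToString();
    }

    // True when both variables refer to the very same string object.
    public static bool SameObject(string x, string y)
    {
        return object.ReferenceEquals(x, y);
    }
}
```

Two strings built with Build("SO", "ME") compare equal by text but are different objects; after passing both through string.Intern, SameObject reports that they are one and the same instance.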

Speaking of StringBuilder, there are reasons for strings to be immutable. The first reason comes from the need to get sub-strings. If you need a sub-string and strings aren't immutable, what would happen if you point to an internal part of another string and that one moves to another place or changes its contents? The answer is that your sub-string will be damaged, and you don't want that.

So languages like the old VB created a copy of the string to give you the sub-string; this behavior makes it easy to do concatenation, replace and write. What's the problem with that? Both sub-string and concatenation operations require copying memory, so the designers decided to make strings immutable so that the sub-string case would be faster. But that leads to a problem when you write (called C.O.W., copy on write), and concatenation will be less efficient.

Today we have StringBuilder to solve that part. Among other advantages of immutable strings, they are supposed to be thread safe and they can be interned, saving memory.

With that said, consider what it would involve to use StrPtr with a non-interned string.

About frozen strings: they were added in .NET 2.0 and allow memory to be shared among various .NET apps. Here is a link to an article that talks about the matter: http://weblog.ikvm.net/PermaLink.aspx?guid=b28aa8b7-87e3-49d7-b0aa-3cc2cb5dbac9

1) How to do StrPtr

This one is the easiest and involves no discoveries... so, here you have it:

unsafe public static int StrPtr(string a)
{
    fixed (char* p = a)
    {
        return (int)p;
    }
}

I have not found any undocumented risks in this code. The only failure case I know of is passing null as the argument, in which case you will get 0 as the return value. If you try to read or write through a pointer to 0, you will get a NullReferenceException.

First, yes, it's dangerous; second, no, the GC will not take the string.

Another thing... if what you need is a byte array with the contents of a string, you can delegate that work to the Encoding types in the System.Text namespace; if you need a char array, use String.ToCharArray().
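For completeness, here is a small sketch of those two safe routes. The class name StringBytes is an assumption made for the example; Encoding.Unicode (UTF-16, little-endian) is chosen only because it matches how the characters are laid out in memory, and any other Encoding could be used instead.

```csharp
using System;
using System.Text;

public static class StringBytes
{
    // Copies the text into a new byte array, two bytes per UTF-16 code unit.
    public static byte[] ToUtf16Bytes(string text)
    {
        return Encoding.Unicode.GetBytes(text);
    }

    // Copies the text into a new char array; changing the copy
    // never touches the original string.
    public static char[] ToChars(string text)
    {
        return text.ToCharArray();
    }
}
```

Both methods return copies, so none of the interning side effects discussed in this section apply to the result.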

I have had complaints about StrPtr, so I'll talk about those risks I thought you already knew:

using System;
using System.Runtime;
namespace Test
{
    class Program
    {
        static void Main(string[] args)
        {
            unsafe 
            {
                char* p = (char*)Test();
                int g = GC.MaxGeneration;
                GCSettings.LatencyMode = GCLatencyMode.Batch;
                GC.Collect(g, GCCollectionMode.Forced);
                GC.WaitForPendingFinalizers();
                Console.WriteLine(p[0]);
                Console.WriteLine(p[1]);
                Console.WriteLine(p[2]);
                Console.WriteLine(p[3]);
            }
            Console.ReadKey();
        }
        unsafe public static int StrPtr(string a)
        {
            fixed (char* p = a)
            {
                return (int)p;
            }
        } 
        static private int Test()
        {
            string a = "SOME";
            return StrPtr(a);
        }
    }
}

In the above code, Main calls the function Test, which declares and initializes a string named a and then retrieves the pointer with the StrPtr function shown above. The next step is to call the garbage collector so that the string that just went out of scope gets collected (something that actually will not happen). The last step is to read the area the pointer was pointing to, and the output is the following:

S
O
M
E

Which means that the string is still available. That looks fine, so the next step was to repeat the experiment with a larger string (it was a copy of the code above), and the output was:

u
s
i
n

By the way this is how StrPtr looks in MSIL if you are curious:

MSIL

.method public hidebysig static int32  StrPtr(string a) cil managed
{
  // Code size       21 (0x15)
  .maxstack  2
  .locals init ([0] char* p,
           [1] int32 CS$1$0000,
           [2] string pinned CS$519$0001)
  IL_0000:  ldarg.0
  IL_0001:  stloc.2
  IL_0002:  ldloc.2
  IL_0003:  conv.i
  IL_0004:  dup
  IL_0005:  brfalse.s  IL_000d
  IL_0007:  call       int32 [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::get_OffsetToStringData()
  IL_000c:  add
  IL_000d:  stloc.0
  IL_000e:  ldloc.0
  IL_000f:  conv.i4
  IL_0010:  stloc.1
  IL_0011:  leave.s    IL_0013
  IL_0013:  ldloc.1
  IL_0014:  ret
} // end of method Program::StrPtr

Do you notice that call and those local variables? The only thing I will say about it is that if you thought you had avoided .NET Framework functions by using fixed to get the data of a string instead of ToCharArray(), you are wrong. Also note that this implementation may change in the future.

See the System.Runtime.CompilerServices.RuntimeHelpers.OffsetToStringData property in MSDN.

The string survives because it is being interned, and that has some consequences. Consider the following code:

using System;
namespace Test
{
    class Program
    {
        static void Main(string[] args)
        {
            unsafe 
            {
                char* p = (char*)Test();
                string b = "SOME";
                p[0] = 'N';
                p[1] = 'U';
                p[2] = 'L';
                p[3] = 'L';
                Console.WriteLine(b);
            }
            Console.ReadKey();
        }
        unsafe public static int StrPtr(string a)
        {
            fixed (char* p = a)
            {
                return (int)p;
            }
        } 
        static private int Test()
        {
            string a = "SOME";
            return StrPtr(a);
        }
    }
}

What do you think the output will be? If you said "SOME", you are wrong. Because strings are immutable, .NET takes the freedom to share the same pointer among all the string literals that are equal. So the output is "NULL". To avoid this, use the StringBuilder class from the System.Text namespace, which will prevent your string from being interned.
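The samples in this section let the address outlive the fixed block, and that is where most of the risk comes from. A more conservative sketch, assuming you only need to peek at the characters, keeps the string pinned with a GCHandle for exactly as long as the address is in use and stores the address in an IntPtr so it is not truncated in a 64-bit process. The names PinnedString and ReadChar are made up for the example.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinnedString
{
    // Reads one UTF-16 code unit through a raw address while the string is pinned.
    public static char ReadChar(string text, int index)
    {
        if (text == null) throw new ArgumentNullException("text");
        if (index < 0 || index >= text.Length) throw new ArgumentOutOfRangeException("index");

        GCHandle handle = GCHandle.Alloc(text, GCHandleType.Pinned);
        try
        {
            // For a pinned string this is the address of its first character.
            IntPtr first = handle.AddrOfPinnedObject();
            return unchecked((char)Marshal.ReadInt16(first, index * sizeof(char)));
        }
        finally
        {
            // Unpin; the address must not be used after this point.
            handle.Free();
        }
    }
}
```

PinnedString.ReadChar("SOME", 2) reads the third character through the address without any unsafe block, and nothing is written to the shared literal.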

2) How to do VarPtr (and walkthrough)

This is actually a documented one. You can use the & operator to get your pointer, just like in C++ (in syntax at least); just remember to be in an unsafe block and to have the /unsafe option on. But can you write a VarPtr function?

Before going on to write the function, let me repeat that we are doing something dangerous; I don't want to be blamed because you forgot that.

In order to create this function, keep in mind that you need to take the parameter by reference, because if you take it by value, the whole idea is nonsense. Having said that, the problem is simple: if you write a function that takes a value to retrieve its address, what type will you declare the parameter as?

- If you said object, you may think you will have to deal with boxing and unboxing, but actually the compiler will just say: "can't convert to ref object".

- If you said the same type as the value, then you will have to write the function for each possible data type. That isn't the idea.

- If you said to use a generic type, then you have got the idea. Now, how do you create a generic pointer?

For a long time I thought it was impossible. The reason is that the following code ends with two CS0208 errors, "Cannot take the address of, get the size of, or declare a pointer to a managed type ('type')":

unsafe public static int VarPtr<T>(T a)
{
    fixed (T* p = &a)
    {
        return (int)p;
    }
}

After trying it, of course, I looked for a VarPtr alternative on the web, and found a solution:

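static int Ptr(object o)
{
    // The handle is never freed here, so the object stays pinned.
    System.Runtime.InteropServices.GCHandle handle =
        System.Runtime.InteropServices.GCHandle.Alloc(o,
        System.Runtime.InteropServices.GCHandleType.Pinned);
    // ToInt32 throws OverflowException if the address does not fit in 32 bits.
    int ret = handle.AddrOfPinnedObject().ToInt32();
    return ret;
}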

I needed to test it, so I designed the next scenario:

int a = 9;
unsafe
{
    int b = (int)&a;
    int c = Ptr(a);
    Console.WriteLine(b);
    Console.WriteLine(c);
    int* p = (int*)b;
    int* q = (int*)c;
    *p = 1;
    *q = 2;
}
Console.WriteLine(a);

This is one of the results I got (the first two numbers change from test to test, since they are memory addresses):

1242220
20191136
1

The number 1 at the end tells me that the second pointer is the wrong one.
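One likely explanation is boxing: Ptr takes an object, so the int is first copied into a new object on the heap, and the address that comes back belongs to that copy rather than to the variable a. The copy can be shown without any pointers; the class name BoxingDemo below is just for illustration.

```csharp
using System;

public static class BoxingDemo
{
    // Returns the value seen through the box after the original variable changed.
    public static int ValueInBoxAfterChange()
    {
        int a = 9;
        object boxed = a; // boxing copies the value into a new heap object
        a = 1;            // changes the local variable only
        return (int)boxed;
    }
}
```

The method returns 9, not 1: the box and the variable live in different places, which is why writing through the second pointer never reaches a.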

You may think it's the end of the story, but it's just the beginning. The solution for VarPtr came in the shape of MSIL; this is the way to write it:

MSIL

.method public hidebysig static !!T* AddressOf<T>(!!T& var) cil managed
{
  .maxstack  1
  ldarg.0
  conv.i
  ret
}

It's included in a library I called "GenericPointers" and in a class named "PointerBuilder". The source code in MSIL is attached to this article.

Now, using this library compiled from MSIL, I need to replace:

int c = Ptr(a);

with:

int c = (int)PointerBuilder.AddressOf(ref a);

The result was the following:

70642928
70642928
2

So this is our VarPtr, but in order to create it we discovered that Microsoft's implementation of the CLR supports generic pointers... what else could we do with that?

About the risk: you will have the same risks that you have with any pointer, so spectacular crashes are up to you.
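As a side note for readers on newer compilers: C# 7.3 and later accept pointers to a generic type parameter when it carries the unmanaged constraint, so a limited form of this can be written without MSIL. The sketch below assumes such a compiler and the /unsafe option; the names UnmanagedPointers, Overwrite and SizeOf are made up and are not the PointerBuilder API from the attached library.

```csharp
using System;

public static class UnmanagedPointers
{
    // Stores 'value' into 'target' through a typed pointer.
    // The fixed statement pins the variable only for the duration of the block.
    public static unsafe void Overwrite<T>(ref T target, T value) where T : unmanaged
    {
        fixed (T* p = &target)
        {
            *p = value;
        }
    }

    // sizeof(T) is accepted for an unmanaged T inside an unsafe context.
    public static unsafe int SizeOf<T>() where T : unmanaged
    {
        return sizeof(T);
    }
}
```

Unlike the MSIL version, the pointer never leaves the fixed block here, which trades flexibility for not having a dangling address around.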

3) How to do ArrayPtr?

There was a problem in VB 6.0: arrays didn't allocate their elements where the pointer you get with VarPtr points... so people managed to get the address of the elements using a "hack" they called ArrayPtr. The same thing happens with arrays in .NET. I coded a solution in MSIL, which I'm not including here, because after testing it I discovered that the Ptr procedure I showed in section 2 gives the address for strings and arrays of any type.

There is news, though: arrays didn't pass all the tests, so if you are accessing an array through a pointer, make sure to hold on to the array as well, because otherwise the GC may take it.
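A way to keep the array both alive and in place while its address is in use is to pin it and free the handle afterwards. The following is a sketch for int arrays only, using the documented Marshal helpers; PinnedArray and WriteElement are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinnedArray
{
    // Overwrites one element of an int array through its raw address.
    public static void WriteElement(int[] values, int index, int newValue)
    {
        if (values == null) throw new ArgumentNullException("values");
        if (index < 0 || index >= values.Length) throw new ArgumentOutOfRangeException("index");

        // Pinning keeps the array from being moved or collected while we hold the address.
        GCHandle handle = GCHandle.Alloc(values, GCHandleType.Pinned);
        try
        {
            IntPtr element = Marshal.UnsafeAddrOfPinnedArrayElement(values, index);
            Marshal.WriteInt32(element, newValue);
        }
        finally
        {
            handle.Free();
        }
    }
}
```

The address is only valid between Alloc and Free; keeping it around after the handle is released brings back the problem described above.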

4) How to do a "StructPtr"?

A lot of people have asked how to get a pointer to a struct they declare in C# and handle it byte by byte. I solved this problem too; the implementation in MSIL is the following:

MSIL

.method public hidebysig static !!T* Convert<T>(native int ptr) cil managed
 {
  .maxstack  1
  ldarga.s   ptr
  call       instance void* [mscorlib]System.IntPtr::ToPointer()
  ret
 }

This routine allows us to convert a pointer into a generic pointer. With this power we can convert a pointer to a byte array into a pointer to the struct we want, and then assign our struct to the value it points to. Sample code to take advantage of this in C# is the following:

public unsafe byte[] ToByteArray<T>(T a) where T : struct
{
    int length = PointerBuilder.SizeOf<T>();
    byte[] buffer = new byte[length];
    fixed (byte* p = &buffer[0])
        *PointerBuilder.Convert<T>(new IntPtr((void*)p)) = a;
    return buffer;
}

This code lets you handle a copy of your struct as a byte array. Of course you will need to convert that array back to a struct; to do so, the following code comes in handy:

public static unsafe T ToStruct<T>(byte[] a) where T : struct
{
    fixed (byte* p = &a[0])
        return *(PointerBuilder.Convert<T>(new IntPtr((void*)p)));
}

Ok, now beware of passing a byte array whose length is less than the size of the struct. If you need a sizeof for generic types, there is one in the MSIL code I'm including with the article.
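For comparison, this is roughly how the same round trip looks with the documented marshalling API, including the length check mentioned above. It is a sketch, not the approach of this article: Marshal.SizeOf reports the marshalled size, which can differ from the managed size for types such as bool or char, and the names StructBytes and Sample are invented for the example.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical sample type used only to exercise the helpers.
[StructLayout(LayoutKind.Sequential)]
public struct Sample
{
    public int Id;
    public short Flags;
}

public static class StructBytes
{
    public static byte[] ToByteArray<T>(T value) where T : struct
    {
        byte[] buffer = new byte[Marshal.SizeOf(typeof(T))];
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            // Copies the struct into the pinned buffer.
            Marshal.StructureToPtr(value, handle.AddrOfPinnedObject(), false);
        }
        finally
        {
            handle.Free();
        }
        return buffer;
    }

    public static T ToStruct<T>(byte[] bytes) where T : struct
    {
        if (bytes == null) throw new ArgumentNullException("bytes");
        if (bytes.Length < Marshal.SizeOf(typeof(T)))
            throw new ArgumentException("Buffer is smaller than the struct.", "bytes");

        GCHandle handle = GCHandle.Alloc(bytes, GCHandleType.Pinned);
        try
        {
            return (T)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(T));
        }
        finally
        {
            handle.Free();
        }
    }
}
```

This version goes through the marshaller and boxes the struct, so expect it to be slower than the pointer cast; what you get in exchange is that a short buffer is rejected instead of being read past its end.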

The reason you may need a generic sizeof is that if you use sizeof on a generic type you get error CS0233, "'identifier' does not have a predefined size, therefore sizeof can only be used in an unsafe context (consider using System.Runtime.InteropServices.Marshal.SizeOf)". I've told Microsoft about this, and there is a chance that sizeof will work on generics with a struct constraint in the future.

As for taking out the struct constraint and doing this with reference types: that's up to you, at your own risk. I'm not interested in such situations myself, since anything you may accomplish with that may not work in another CLR implementation from Microsoft or a third party.

I want to note that I've done tests with garbage collection and I haven't found any problems. It looks like the GC is smart enough to know that there is more than one thing allocated in the given memory space, just as happens in the C# version of a union. I'll post one here in case you forgot about it:

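using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
public struct union
{
    // All three fields share the same four bytes.
    [FieldOffset(0)]
    public int word;
    [FieldOffset(0)]
    public short loword;
    // The high word starts two bytes in, right after the low word.
    [FieldOffset(2)]
    public short hiword;
}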
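To see the overlap at work, here is a short usage sketch. It assumes the high word is meant to sit at byte offset 2 and that the machine is little-endian (which BitConverter.IsLittleEndian reports); WordParts and WordPartsDemo are hypothetical names that mirror the union above.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical name; same idea as the union above, with unsigned halves.
[StructLayout(LayoutKind.Explicit)]
public struct WordParts
{
    [FieldOffset(0)] public int Word;
    [FieldOffset(0)] public ushort Low;   // bytes 0 and 1
    [FieldOffset(2)] public ushort High;  // bytes 2 and 3
}

public static class WordPartsDemo
{
    // Writes the whole word once; the halves are then read from the same bytes.
    public static WordParts Split(int value)
    {
        WordParts parts = new WordParts();
        parts.Word = value;
        return parts;
    }
}
```

On a little-endian machine, WordPartsDemo.Split(0x12345678) gives Low = 0x5678 and High = 0x1234 without any shifting or masking.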

Speaking of unions, if you don't need generic types, there is another approach you can take:

using System;
using System.Runtime.InteropServices;
namespace Test
{
    [StructLayout(LayoutKind.Explicit)]
    public unsafe struct tester
    {
        [FieldOffset(0)]
        public fixed byte data[16];
        [FieldOffset(0)]
        public Decimal structure;
    }
    class Program
    {
        static void Main(string[] args)
        {
            tester X = new tester();
            X.structure = -1234567890.01234567890123456789M;
            unsafe
            {
                for (int I = 0; I < 16; I++)
                {
                    Console.WriteLine(X.data[I]);
                }
            }
            Console.ReadKey();
        }        
    }
}

The code above shows how to use unsafe with unions to get the bytes of a given struct. This code will not work in a generic version, because the data field would need a variable size, and the compiler responds with error CS0133, "The expression being assigned to 'variable' must be constant" (and you would also need that sizeof).

5) How to do Array to Array copy?

We all know that we can use CopyTo to copy data into another array of the same type, but what do we do if we need to copy the binary data to an array of another type? (I don't mean converting the items.) So far we have seen how to handle some weird pointers; now I'll show an example of a function that makes memory copy much faster (no need to walk the whole array with a pointer). The code is the following:

public static unsafe T[] CopyToArray<T>(this byte[] a, int telements) where T : struct
{
    T[] R = new T[telements];
    int Z = PointerBuilder.SizeOf<T>() * telements;
    var p = PointerBuilder.AddressOf<T>(R, 0);
    System.Runtime.InteropServices.Marshal.Copy(a, 0, new IntPtr((void*)p), Z);
    return R;
}

public static unsafe byte[] CopyToByteArray<T>(this T[] a) where T : struct
{
    int length = a.Length * PointerBuilder.SizeOf<T>();
    byte[] buffer = new byte[length];
    var p = PointerBuilder.AddressOf<T>(a, 0);
    System.Runtime.InteropServices.Marshal.Copy(new IntPtr((void*)p), buffer, 0, length);
    return buffer;
}

The star of this code is Marshal.Copy, which does the work we did with RtlMoveMemory (CopyMemory) in those "hack" days of VB 6.0. If you are looking forward to graphics work, Marshal.Copy and LockBits make a good team.
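When both arrays hold primitive types, the framework's Buffer.BlockCopy does the same kind of raw byte copy without pointers or generics. A small sketch for int and byte arrays follows; the class name RawCopy is made up, and the byte order of the result is whatever the platform uses.

```csharp
using System;

public static class RawCopy
{
    // Reinterprets the bytes as 32-bit integers; trailing bytes that do not
    // fill a whole int are ignored.
    public static int[] ToInt32Array(byte[] bytes)
    {
        if (bytes == null) throw new ArgumentNullException("bytes");
        int[] result = new int[bytes.Length / sizeof(int)];
        Buffer.BlockCopy(bytes, 0, result, 0, result.Length * sizeof(int));
        return result;
    }

    // Copies the raw memory of the ints into a new byte array.
    public static byte[] ToByteArray(int[] values)
    {
        if (values == null) throw new ArgumentNullException("values");
        byte[] result = new byte[values.Length * sizeof(int)];
        Buffer.BlockCopy(values, 0, result, 0, result.Length);
        return result;
    }
}
```

Buffer.BlockCopy only accepts arrays of primitive types, so it does not replace the generic struct version above; it is just the safer choice when it happens to fit.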

This code is written to work as extension methods. If you are building for .NET 2.0, add the following code in order to make it run without changes (again, in case you forgot about it):

namespace System.Runtime.CompilerServices
{
    [AttributeUsage(AttributeTargets.Method)]
    public sealed class ExtensionAttribute : Attribute
    {
        public ExtensionAttribute() { }
    }
}

6) How to do generic arithmetic?

If you have written a vector type, a list type or a matrix type, you may have come across the problem of generic arithmetic. First off, what's generic arithmetic? It is the ability to add, multiply, divide, subtract, negate, shift, get the remainder of a division, or perform any other arithmetic operation derived from those on numeric values, without caring whether they are int, short, long or any other numeric type.

With that granted, when do you need it? It's needed in such rare but useful situations that we have learned to think they are impossible. For example, consider writing a function that calculates the average of the numbers in a generic enumerator. Do you want to write it once for each numeric type? Of course not; you want a single generic function to handle the problem. This is the scenario for which I have the solution (except for decimal and any other non-primitive numeric type).

The idea is quite simple. Now that we have seen how to use MSIL to handle generics, what happens if I tell it to add two values of a generic type? For instance:

MSIL

.method public hidebysig static !!T Add<T>(!!T a, !!T b) cil managed
 {
  .maxstack 2
  ldarg.0
  ldarg.1
  add
  ret
 }

The code above uses add to sum the values a and b. Note that add does not check for overflow; to get an overflow-checked version, replace add with add.ovf. The other operations are done under the same principle.

Guess what? It works! But it's pretty, pretty dangerous. If you happen to pass objects, strings, or even decimals, it will end with an AccessViolationException or with a more spectacular FatalExecutionEngineError, which is a fatal error; in other words, it crashes your code no matter how many try-catch statements you have.

Ok, to avoid this danger, put the following test around the call:

if (typeof(T).IsPrimitive)
{
 /*...*/
}

I would have included it in the MSIL, but checking this again and again may become a bottleneck.

Again, for curious readers, here is the same thing done from C# (it ends with a CS0019 error, "Operator 'operator' cannot be applied to operands of type 'type' and 'type'"):

public static T Ptr<T>(T a, T b)
{
    return a + b;
}

To close the matter of generic arithmetic, I want to add that there is another way to accomplish generic arithmetic. It's slower but more secure, and it requires LINQ expression trees. The following code is a working example of this approach:

using System;
using System.Collections.Generic;
using System.Linq.Expressions;
namespace Test
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(add(12, 15));
            Console.ReadKey(); 
        }
        private static T add<T>(T a, T b)
        {
            Type t = typeof(T);
            ParameterExpression _a = Expression.Parameter(t, "a");
            ParameterExpression _b = Expression.Parameter(t, "b");
            var R = Expression.Lambda<Func<T, T, T>>(
                Expression.Add(
                _a, _b
                )
                , _a, _b).Compile();
            return R(a, b);
        }
    }
}
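Compiling the expression on every call is where most of the cost of this approach goes. A common refinement, sketched here under the assumption that one delegate per type is enough, is to compile once inside a generic static class and reuse the result. The names GenericMath and GenericSum are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class GenericMath<T>
{
    // Built once per closed type T and reused afterwards. For a type without
    // an addition operator, the first access fails with an exception instead
    // of crashing the process.
    public static readonly Func<T, T, T> Add = BuildAdd();

    private static Func<T, T, T> BuildAdd()
    {
        ParameterExpression a = Expression.Parameter(typeof(T), "a");
        ParameterExpression b = Expression.Parameter(typeof(T), "b");
        return Expression.Lambda<Func<T, T, T>>(Expression.Add(a, b), a, b).Compile();
    }
}

public static class GenericSum
{
    // Adds up a sequence of any type that GenericMath<T> can handle.
    public static T Sum<T>(IEnumerable<T> values)
    {
        T total = default(T);
        foreach (T value in values)
        {
            total = GenericMath<T>.Add(total, value);
        }
        return total;
    }
}
```

Because Expression.Add also picks up user-defined operators, this version works for decimal too, which the raw MSIL add cannot handle.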

Disclaimer

There must be a question around: why do I say that Microsoft didn't want us to know? Well, it happens that these things have been requested from Microsoft as additions to .NET or Visual Studio, and Microsoft responded with a "won't fix - by design". Maybe even they don't know they did this, but if they know, they didn't want us to know. Let's try to think why. It wouldn't actually be unsafe to do generic arithmetic if they had support for a "numeric" generic constraint, about which I want to quote this:

Bruce Eckel: Will I be able to do a template function, in other words, a function where the argument is the unknown type? You are adding stronger type checking to the containers, but can I also get a kind of weaker typing as I can get with C++ templates? For example, will I be able to write a function that takes parameters A a and B b, and then in the code say, a + b? Can I say that I don't care what A and B so long as there's an operator+ for them, because that's kind of a weak typing.

Anders Hejlsberg: What you are really asking is, what can you say in terms of constraints? Constraints, like any other feature, can become arbitrarily complex if taken to their ultimate extreme. When you think about it, constraints are a pattern matching mechanism. You want to be able to say, "This type parameter must have a constructor that takes two arguments, implement operator+, have this static method, has these two instance methods, etc." The question is, how complicated do you want this pattern matching mechanism to be?

There's a whole continuum from nothing to grand pattern matching. We think it's too little to say nothing, and the grand pattern matching becomes very complicated, so we're in-between. We allow you to specify a constraint that can be one class, zero or more interfaces, and something called a constructor constraint. You can say, for example, "This type must implement IFoo and IBar" or "This type must inherit from base class [...]." Once you do that, we type check everywhere, at both compile and run time, that the constraint is true. Any methods implied by that constraint are directly available on values of that type parameter type.

Now, in C#, operators are static members. So, an operator can never be a member of an interface, and therefore an interface constraint could never endow you with an operator+. The only way you can endow yourself with an operator+ is by having a class constraint that says you must inherit from, say, Number, and Number has an operator+ of two Numbers. But you could not in the abstract say, "Must have an operator+," and then we polymorphically resolve what that means.

Sorry for the long quote, but the context is important here. In short, it implies that we don't have support for numeric constraints because of the complexity involved in such a specific constraint. Now think about the code I show here... it does it! But it has strong risks, and I'm afraid I may not have found them all. I'm not saying "Go hack .NET"; I'm asking Microsoft to support this in a safe way, and developers to use this carefully. I want to show Microsoft that we may need these features in the language and the CLR. Another reason why I'm not saying to hack .NET is that I like the idea of having code running on third-party implementations, and if we want that, then hacks are nonsense.

After that, another question may have been raised: what is all this useful for? I can't go into the specific kinds of work we may need it for (and remember that I said that this wasn't for marshalling). The first use I may make of this is all that work we do reading stuff from streams; the second would be working on graphics or sound for realtime applications; and derived from that, another use is video games. Ok, you will tell me two things on the matter:

1) it's not secure

Like any other way of doing things, there are good practices and mistakes to avoid. I suggest working with this only inside a component, and test, test, test.

2) there are other ways to do it

I know, "don't reinvent the wheel", but remember that sometimes it's good to know how the wheel was made, and that the wheel was made by humans, so it's not perfect and there is room to improve. So I have two things to say. First, I will not go to C++ and a more insecure scenario just because I want to handle some dangerous things; nothing could be further from my goals. I want to keep the security up wherever I can while getting enough speed and power to accomplish my requirements. Second, even if I know that there are third-party products that solve my problems, some developer worked on them and did the dirty work so that you could keep your hands clean. What if I tell you that I'm heading to be one of those? Or what if you need some dirty work done for the first time, and there is no third-party product? So at least think about it.

Lastly, if you still think that this doesn't help at all, remember that you know that it's possible thanks to me, so you may take this article as a curiosity expo, and skip sending me messages saying: "what is this for?".

Legal note

This article is published under the Creative Commons Attribution-Share Alike license. Since this license says "Any of the above conditions can be waived if you get permission from the copyright holder.", I allow you to disregard the "Share Alike" part, releasing this license from its viral nature. I only ask that those who create derivative products that use this code commercially include my name in the "special thanks".

License

This article, along with any associated source code and files, is licensed under The Creative Commons Attribution-ShareAlike 2.5 License