This is a chapter excerpt from C++/CLI in Action authored by Nishant Sivakumar
and published by Manning Publications. The content has been reformatted for
CodeProject and may differ in layout from the printed book and the e-book.
4.1 Using interior and pinning pointers
You can't use native pointers with CLI objects on the managed heap. That is
like trying to write Hindi text using the English alphabet—they're two different
languages with entirely different alphabets. Native pointers are essentially
variables that hold memory address locations. They point to a memory location
rather than to a specific object. When we say a pointer points to an object, we
essentially mean that a specific object is at that particular memory location.
This approach won't work with CLI objects because managed objects in the CLR
heap don't remain at the same location for the entire period of their lifetime.
Figure 4.1 shows a diagrammatic view of this problem. The Garbage Collector (GC)
moves objects around during garbage-collection and heap-compaction cycles. A
native pointer that points to a CLI object becomes garbage once the object has
been relocated. By then, it's pointing to random memory. If an attempt is made
to write to that memory, and that memory is now used by some other object, you
end up corrupting the heap and possibly crashing your application.
C++/CLI provides two kinds of pointers that work around this problem. The
first kind is called an interior pointer, which is updated by the runtime to
reflect the new location of the object that's pointed to every time the object
is relocated. The physical address held by the interior pointer may change over
time, but it always points to the same object. The other kind is
called a pinning pointer, which prevents the GC from relocating the object; in
other words, it pins the object to a specific physical location in the CLR heap.
With some restrictions, conversions are possible between interior, pinning, and
native pointers.
Pointers by nature aren't safe, because they allow you to directly manipulate
memory. For that reason, using pointers affects the type-safety and
verifiability of your code. I strongly urge you to refrain from using CLI
pointers in pure-managed applications (those compiled with /clr:safe or
/clr:pure) and to use them strictly to make interop calls more convenient.
4.1.1 Interior pointers
An interior pointer is a pointer to a managed object or a member of a
managed object that is updated automatically to account for
garbage-collection cycles that may result in the pointed-to object being
relocated on the CLR heap. You may wonder how that's different from a managed
handle or a tracking reference; the difference is that the interior pointer
exhibits pointer semantics, and you can perform pointer operations such as
pointer arithmetic on it. Although this isn't an exact analogy, think of it like
a cell phone. People can call you on your cell phone (which is analogous to an
interior pointer) wherever you are, because your number goes with you—the mobile
network is constantly updated so that your location is always known. They
wouldn't be able to do that with a landline (which is analogous to a native
pointer), because a landline's physical location is fixed.
Interior pointer declarations use the same template-like syntax that is used
for CLI arrays, as shown here:
interior_ptr< type > var = [address];
Listing 4.1 shows how an interior pointer gets updated when the object it
points to is relocated.
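#include <stdio.h>

ref struct CData
{
    int age;
};

int main()
{
    for(int i=0; i<100000; i++)          // (1) fill the heap with orphans
        gcnew CData();
    CData^ d = gcnew CData();
    d->age = 100;
    interior_ptr<int> pint = &d->age;    // (2) interior pointer to a member
    printf("%p %d\r\n",pint,*pint);
    for(int i=0; i<100000; i++)          // (3) more orphans, so a GC cycle occurs
        gcnew CData();
    printf("%p %d\r\n",pint,*pint);      // (4) print again after the GC cycle
    return 0;
}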
Listing 4.1 Code that shows how an interior pointer is updated by the CLR
In the sample code, you create 100,000 orphan CData objects (1) so that you can
fill up a good portion of the CLR heap. You then create a CData object that's
stored in a variable and (2) an interior pointer to the int member age of this
CData object. You then print out the pointer address as well as the int value
that is pointed to. Now, (3) you create another 100,000 orphan CData objects;
somewhere along the line, a garbage-collection cycle occurs (the orphan objects
created earlier (1) get collected because they aren't referenced anywhere). Note
that you don't use a GC::Collect call because that's not guaranteed to force a
garbage-collection cycle. As you've already seen in the discussion of the
garbage-collection algorithm in the previous chapter, the GC frees up space by
removing the orphan objects so that it can do further allocations. At the end of
the code (by which time a garbage collection has occurred), you again (4) print
out the pointer address and the value of age. This is the output I got on my
machine (note that the addresses will vary from machine to machine, so your
output values won't be the same):
012CB4C8 100
012A13D0 100
As you can see, the address pointed to by the interior pointer has changed.
Had this been a native pointer, it would have continued to point to the old
address, which may now belong to some other data variable or may contain random
data. Thus, using a native pointer to point to a managed object is a disastrous
thing to attempt. The compiler won't let you do that: You can't assign the
address of a CLI object to a native pointer, and you also can't convert from an
interior pointer to a native pointer.
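The difference between a pointer that the runtime keeps up to date and one that
it doesn't can be modeled outside the CLR. The following is a minimal sketch in
plain ISO C++ (not C++/CLI), assuming a toy heap that compacts by sliding live
cells down. ToyHeap, Track, Compact, and RelocationDemo are hypothetical names
made up for this illustration, and the real GC is far more involved.

```cpp
#include <cstddef>
#include <vector>

// Toy compacting heap: a model of the idea only, not the CLR's GC.
class ToyHeap
{
public:
    // Allocates one int cell and returns its current position.
    std::size_t Alloc(int value, bool live)
    {
        cells_.push_back(Cell{value, live});
        return cells_.size() - 1;
    }

    // Registers a position variable that the heap keeps up to date,
    // roughly what the runtime does for an interior pointer.
    void Track(std::size_t* position) { tracked_.push_back(position); }

    // Drops dead cells, slides live ones down, and fixes up tracked positions.
    void Compact()
    {
        std::vector<Cell> survivors;
        for (std::size_t i = 0; i < cells_.size(); ++i)
        {
            if (!cells_[i].live) continue;
            for (std::size_t* t : tracked_)
                if (*t == i) *t = survivors.size();
            survivors.push_back(cells_[i]);
        }
        cells_.swap(survivors);
    }

    int At(std::size_t position) const { return cells_.at(position).value; }
    std::size_t Size() const { return cells_.size(); }

private:
    struct Cell { int value; bool live; };
    std::vector<Cell> cells_;
    std::vector<std::size_t*> tracked_;
};

// Returns true if the tracked position still finds the value after compaction
// while the untracked copy of the old position no longer refers to it.
bool RelocationDemo()
{
    ToyHeap heap;
    for (int i = 0; i < 5; ++i) heap.Alloc(0, false);  // orphans
    std::size_t tracked = heap.Alloc(100, true);       // plays the age member
    std::size_t stale = tracked;                       // plays a native pointer
    heap.Track(&tracked);
    heap.Compact();                                    // object slides from 5 to 0
    return heap.At(tracked) == 100 && tracked == 0
        && stale == 5 && stale >= heap.Size();
}
```

In this model the tracked position follows the object to its new location,
whereas the saved copy still holds the old position, which is no longer even
inside the heap.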
Passing by reference
Assume that you need to write a function that accepts an integer (by
reference) and changes that integer using some predefined rule. Here's what such
a function looks like when you use an interior pointer as the pass-by-reference
argument:
void ChangeNumber(interior_ptr<int> num, int constant)
{
    *num += constant * *num;
}
And here's how you call the function:
CData^ d = gcnew CData();
d->age = 7;
interior_ptr<int> pint = &d->age;
ChangeNumber(pint, 3);
Console::WriteLine(d->age);
Because you pass an interior pointer, the original variable (the age member
of the CData object) gets changed. Of course, for this specific
scenario, you may as well have used a tracking reference as the first argument
of the ChangeNumber function; but one advantage of using an
interior pointer is that you can also pass a native pointer to the function,
because a native pointer implicitly converts to an interior pointer (although
the reverse isn't allowed). The following code works:
int number = 8;
ChangeNumber(&number, 3);    // (1) native pointer converts to interior pointer
Console::WriteLine(number);
It's imperative that you remember this. You can pass a native pointer to a
function that expects an interior pointer as you do here (1), because
there is an implicit conversion from the native pointer to the interior pointer.
But you can't pass an interior pointer where a native pointer is expected; if
you try that, you'll get a compiler error. Because native pointers convert to
interior pointers, you should be aware that an interior pointer need not
necessarily always point to the CLR heap: If it contains a converted native
pointer, it's then pointing to native memory (the native C++ heap or, as with
number here, the stack). Next, you'll see how interior pointers can
be used in pointer arithmetic (something that can't be done with a tracking
reference).
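As a quick check of the arithmetic that ChangeNumber performs, here is a sketch
of the same rule in plain ISO C++ with a native int* in place of
interior_ptr<int>. ChangeNumberNative and ApplyChange are hypothetical names
used only for this illustration.

```cpp
// Native-pointer analogue of ChangeNumber: n becomes n + constant * n.
void ChangeNumberNative(int* num, int constant)
{
    *num += constant * *num;
}

// Convenience wrapper for trying values: returns the updated number.
int ApplyChange(int start, int constant)
{
    int number = start;                     // lives on the native stack
    ChangeNumberNative(&number, constant);  // pass by address
    return number;
}
```

With the values used in the text, ApplyChange(7, 3) gives 28 and
ApplyChange(8, 3) gives 32.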
Pointer arithmetic
Interior pointers (like native pointers) support pointer arithmetic; thus,
you may want to optimize a performance-sensitive piece of code by using direct
pointer arithmetic on some data. Here's an example of a function that uses
pointer arithmetic on an interior pointer to quickly sum the contents of an
array of ints:
int SumArray(array<int>^% intarr)
{
    int sum = 0;
    interior_ptr<int> p = &intarr[0];           // (1)
    while(p != &intarr[0] + intarr->Length)
        sum += *p++;                            // (2)
    return sum;
}
In this code, p is an interior pointer to the first element of the array (1),
which is where the array's element data starts. You don't need to worry about
the GC relocating the array in the CLR heap, because the runtime updates p if
that happens. You iterate through the array by using the ++ operator on the
interior pointer (2), and you add each element to the variable sum as you do
so. This way, you avoid the overhead of going through the System::Array
interface to access each array element.
It's not just arrays that can be manipulated using an interior pointer. Here's
another example of using an interior pointer to manipulate the contents of a
System::String object:
String^ str = "Nish wrote this book for Manning Publishing";
interior_ptr<Char> ptxt = const_cast< interior_ptr<Char> >(
    PtrToStringChars(str));                  // (1)
interior_ptr<Char> ptxtorig = ptxt;          // (2)
while(*ptxt) (*ptxt++)++;                    // (3) leaves the terminator alone
Console::WriteLine(str);                     // (4)
while(*ptxtorig) (*ptxtorig++)--;            // (5)
Console::WriteLine(str);                     // (6)
You use the PtrToStringChars helper function (1) to get an interior pointer to
the underlying string buffer of a System::String object. PtrToStringChars is
declared in <vcclr.h> and returns a const interior pointer to the first
character of a System::String. Because it returns a const interior pointer, you
have to use const_cast to convert it to a non-const pointer. You go through the
string using a while loop (3) that increments the pointer as well as each
character until a null character is encountered, because the underlying buffer
of a String object is always null-terminated. Next, when you use
Console::WriteLine on the String object (4), you can see that the string has
changed to:
Ojti!xspuf!uijt!cppl!gps!Nboojoh!Qvcmjtijoh
You've achieved encryption! (Just kidding.) Because you saved the original
pointer in ptxtorig (2), you can use it to convert the string back to its
original form using another while loop. The second while loop (5) increments
the pointer but decrements each character until it reaches the end of the
string (determined by the null character). Now, (6) when you do a
Console::WriteLine, you get the original string:
Nish wrote this book for Manning Publishing
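The character-shifting loops can also be tried outside the CLR. The following
is a small sketch in plain ISO C++ (not C++/CLI) that applies the same
transformation to a std::wstring through a native wchar_t*; ShiftChars and
Shifted are hypothetical helper names introduced for this illustration.

```cpp
#include <string>

// Shifts every character of a null-terminated wide string by delta,
// stopping at the terminator without modifying it.
void ShiftChars(wchar_t* p, int delta)
{
    for (; *p != L'\0'; ++p)
        *p = static_cast<wchar_t>(*p + delta);
}

// Returns a shifted copy of text; the caller's string is left untouched.
std::wstring Shifted(std::wstring text, int delta)
{
    ShiftChars(&text[0], delta);  // C++11 guarantees a terminating L'\0'
    return text;
}
```

Calling Shifted with a delta of 1 on the sample sentence yields the scrambled
text shown earlier, and a second call with a delta of -1 restores it. Unlike the
String example, this version works on a copy, so nothing else that refers to the
original text is affected.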
A dangerous side-effect of using interior pointers to manipulate String objects
The CLR performs something called string interning on managed
strings, so that multiple variables or literal occurrences of the same
textual string always refer to a single instance of the System::String
object. This is possible because System::String is immutable: the moment you
change one of those variables, you change the reference, which now refers to a
new String object (quite possibly another interned string). All this
is fine as long as the strings are immutable. But when you use an
interior or pinning pointer to directly access and change the underlying
character array, you break the immutability of String
objects. Here's some code that demonstrates what can go wrong:
String^ s1 = "Nishant Sivakumar";
String^ s2 = "Nishant Sivakumar";
interior_ptr<Char> p1 = const_cast<interior_ptr<Char> >(
    PtrToStringChars(s1));
while(*p1)
    (*p1++) = 'X';
Console::WriteLine("s1 = {0}\r\ns2 = {1}",s1,s2);
The output of that is as follows:
s1 = XXXXXXXXXXXXXXXXX
s2 = XXXXXXXXXXXXXXXXX
You only changed one string, but both strings are changed. If you
don't understand what's happening, this can be incredibly puzzling. You
have two String handle variables, s1 and s2, both containing the same string
literal. You get an interior pointer p1 to the string s1 and change each
character in s1 to X (basically blanking out the string with the character X).
Common logic would say that you have changed the string s1, and that's that. But
because of string interning, s1 and s2 were both handles to the same String
object on the CLR heap. When you change the underlying buffer of the string s1
through the interior pointer, you change the interned string. This means
any string handle to that String object now points to an
entirely different string (the X-string in this case). The output of the
Console::WriteLine should now make sense to you.
In this case, figuring out the problem was easy, because both string
handles were in the same block of code, but the CLR performs string
interning across application domains. This means changing an interned
string can result in extremely hard-to-debug errors in totally
disconnected parts of your application. My recommendation is to try to
avoid directly changing a string through a pointer, except when you're
sure you won't cause havoc in other parts of the code. Note that it's
safe to read a string through a pointer; it's only dangerous when you
change it, because you break the "strings are immutable" rule of the CLR.
Alternatively, you can use the String::IsInterned function
to determine if a specific string is interned, and change it only if it
isn't an interned string.
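The sharing that makes this mutation dangerous can be modeled in a few lines of
plain ISO C++. This is only a toy sketch of the idea of an intern pool, not a
description of how the CLR implements interning; InternPool, Intern, and
SecondHandleSeesChange are hypothetical names.

```cpp
#include <map>
#include <memory>
#include <string>

// Toy intern pool: equal text always maps to one shared string object.
class InternPool
{
public:
    // Returns the single shared object for this text, creating it on first use.
    std::shared_ptr<std::wstring> Intern(const std::wstring& text)
    {
        std::shared_ptr<std::wstring>& slot = pool_[text];
        if (!slot)
            slot = std::make_shared<std::wstring>(text);
        return slot;
    }

private:
    std::map<std::wstring, std::shared_ptr<std::wstring>> pool_;
};

// Mirrors the sidebar: two handles to the same literal, one overwritten in
// place. Returns true if the change is visible through the second handle.
bool SecondHandleSeesChange()
{
    InternPool pool;
    std::shared_ptr<std::wstring> s1 = pool.Intern(L"Nishant Sivakumar");
    std::shared_ptr<std::wstring> s2 = pool.Intern(L"Nishant Sivakumar");
    for (wchar_t& ch : *s1)
        ch = L'X';                           // in-place mutation through s1
    return s1 == s2 && *s2 == std::wstring(s2->size(), L'X');
}
```

Because both handles refer to one object, overwriting the characters through s1
is visible through s2 as well, which is the effect the sidebar's output shows.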
Whenever you use an interior pointer, it's represented as a managed pointer
in the generated MSIL. To distinguish it from a reference (which is also
represented as a managed pointer in IL), a modopt of type
IsExplicitlyDereferenced is emitted by the compiler. A modopt is an optional
modifier that can be applied to a type's signature. Another interesting point in
connection with interior pointers is that the this pointer of an instance of a
value type is a non-const interior pointer to the type. Look at the value class
shown here, which obtains an interior pointer to the class by assigning the this
pointer to it:
value class V
{
    void Func()
    {
        interior_ptr<V> pV1 = this;
    }
};
As is obvious, in a value class, if you need to get a pointer to this, you
should use an interior pointer, because the compiler won't allow you to use a
native pointer. If you specifically need a native pointer to a value object
that's on the managed heap, you have to pin the object using a pinning pointer
and then assign it to the native pointer. We haven't discussed pinning pointers
yet, but that's what we'll talk about in the next section.
4.1.2 Pinning pointers
As we discussed in the previous section, the GC moves CLI objects around the
CLR heap during garbage-collection cycles and during heap-compaction operations.
Native pointers don't work with CLI objects, for reasons previously mentioned.
This is why we have interior pointers, which are self-adjusting pointers that
update themselves to always refer to the same object, irrespective of where the
object is located in the CLR heap. Although this is convenient when you need
pointer access to CLI objects, it only works from managed code. If you need to
pass a pointer to a CLI object to a native function (which runs outside the CLR),
you can't pass an interior pointer, because the native function doesn't know
what an interior pointer is, and an interior pointer can't convert to a native
pointer. That's where pinning pointers come into play.
A pinning pointer pins a CLI object on the CLR heap; as long as the pinning
pointer is alive (meaning it hasn't gone out of scope), the object remains
pinned. The GC knows about pinned objects and won't relocate them. To
continue the phone analogy, imagine a pinning pointer as being similar to your
being forced to remain stationary (analogous to being pinned). Although you have
a cell phone, your location is fixed; it's almost as if you had a fixed
landline.
Because pinned objects don't move around, it's legal to convert a pinning
pointer to a native pointer that can be passed to native code that's
running outside the control of the CLR. The word pinning or pinned is a good
choice; try to visualize an object that's pinned to a memory address, just like
you pin a sticky note to your cubicle's side-board.
The syntax used for a pinning pointer is similar to that used for an interior
pointer:
pin_ptr< type > var = [address];
The duration of pinning is the lifetime of the pinning pointer. As long as
the pinning pointer is in scope and pointing to an object, that object remains
pinned. If the pinning pointer is set to nullptr, then the object isn't pinned
any longer; or if the pinning pointer is set to another object, the new object
becomes pinned and the previous object isn't pinned any more.
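These lifetime rules resemble a scoped guard in native C++. The sketch below is
a toy model in plain ISO C++ that only counts pins on a stand-in object; it
doesn't interact with the real GC, and FakeObject, FakePin, and PinLifetimeDemo
are hypothetical names chosen for the illustration.

```cpp
// Stand-in for a managed object: pinCount > 0 means "pinned".
struct FakeObject
{
    int pinCount = 0;
    bool IsPinned() const { return pinCount > 0; }
};

// Stand-in for a pinning pointer: pins its target for as long as it lives.
class FakePin
{
public:
    explicit FakePin(FakeObject* target = nullptr) { Reset(target); }
    ~FakePin() { Reset(nullptr); }            // leaving scope unpins
    FakePin(const FakePin&) = delete;
    FakePin& operator=(const FakePin&) = delete;

    // Point at another object (or nullptr): the new target becomes pinned
    // and the old one is released.
    void Reset(FakeObject* target)
    {
        if (target) ++target->pinCount;
        if (target_) --target_->pinCount;
        target_ = target;
    }

private:
    FakeObject* target_ = nullptr;
};

// Walks through the cases in the text; returns true if each behaves as stated.
bool PinLifetimeDemo()
{
    FakeObject a, b;
    bool ok = true;
    {
        FakePin pin(&a);
        ok = ok && a.IsPinned();                   // pinned while pointed at
        pin.Reset(&b);
        ok = ok && !a.IsPinned() && b.IsPinned();  // retargeting moves the pin
        pin.Reset(nullptr);
        ok = ok && !b.IsPinned();                  // nullptr unpins
        pin.Reset(&a);
    }                                              // scope exit unpins a
    return ok && !a.IsPinned();
}
```

The guard form makes the cost visible: the object stays pinned for exactly as
long as the guard is alive and pointing at it, so a narrower scope means a
shorter pin.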
Listing 4.2 demonstrates the difference between interior and pinning
pointers. To simulate a real-world scenario within a short code snippet, I used
for loops to create a large number of objects to bring the GC into play.
for(int i=0; i<100000; i++)
    gcnew CData();
CData^ d1 = gcnew CData();
for(int i=0; i<1000; i++)                // (1) gap between d1 and d2
    gcnew CData();
CData^ d2 = gcnew CData();
interior_ptr<int> intptr = &d1->age;     // (2)
pin_ptr<int> pinptr = &d2->age;          // (3)
printf("intptr=%p pinptr=%p\r\n", intptr, pinptr);
for(int i=0; i<100000; i++)              // (4) bring the GC into play
    gcnew CData();
printf("intptr=%p pinptr=%p\r\n", intptr, pinptr);
Listing 4.2 Code that compares an interior pointer with a pinning pointer
In the code, you create two CData objects with a gap in between them (1) and
associate one of them with an interior pointer to the age member of the first
object (2). The other is associated with a pinning pointer to the age member of
the second object (3). By creating a large number of orphan objects, you force a
garbage-collection cycle (4) (again, note that calling GC::Collect may not
always force a garbage-collection cycle; you need to fill up a generation before
a garbage-collection cycle will occur). The output I got was
intptr=012CB4C8 pinptr=012CE3B4
intptr=012A13D0 pinptr=012CE3B4
Your pointer addresses will be different, but after the garbage-collection
cycle, you'll find that the address held by the pinning pointer (pinptr) has not
changed, although the interior pointer (intptr) has changed. This is because the
CLR and the GC see that the object is pinned and leave it alone (meaning it
doesn't get relocated on the CLR heap). This is why you can pass a pinning
pointer to native code (because you know that the object it points to won't be
moved around).
Passing to native code
The fact that a pinning pointer always points to the same object (because the
object is in a pinned state) allows the compiler to provide an implicit
conversion from a pinning pointer to a native pointer. Thus, you can pass a
pinning pointer to any native function that expects a native pointer, provided
the pointers are of the same type. Obviously, you can't pass a pinning pointer
to a float to a function expecting a native pointer to a char.
Look at the following native function that accepts a wchar_t* and
returns the number of vowels in the string pointed to by the wchar_t*:
#include <wchar.h>

#pragma unmanaged
int NativeCountVowels(wchar_t* pString)
{
    int count = 0;
    const wchar_t* vowarr = L"aeiouAEIOU";
    while(*pString)
        if(wcschr(vowarr,*pString++))
            count++;
    return count;
}
#pragma managed
#pragma managed/unmanaged
These are #pragma compiler directives that give you
function-level control for compiling functions as managed or unmanaged.
If you specify that a function is to be compiled as unmanaged, native
code is generated, and the code is executed outside the CLR. If you
specify a function as managed (which is the default), MSIL is generated,
and the code executes within the CLR. Note that if you've marked a
function as unmanaged, you should remember to
re-enable managed compilation at the end of the function.
Here's how you pass a pointer to a CLI object, after first pinning it, to the
native function just defined:
String^ s = "Most people don't know that the CLR is written in C++";
pin_ptr<Char> p = const_cast< interior_ptr<Char> >(
PtrToStringChars(s));
Console::WriteLine(NativeCountVowels(p));
PtrToStringChars returns a const interior pointer, which you cast to a
non-const interior pointer; this is implicitly converted to a pinning pointer.
You pass this pinning pointer, which implicitly converts to a native pointer, to
the NativeCountVowels function.
The ability to pass a pinning pointer to a function that expects a native
pointer is extremely handy in mixed-mode programming, because it gives you an
easy mechanism to pass pointers to objects on the CLR heap to native functions.
Figure 4.2 illustrates the various pointer conversions that are available.
As you can see in the figure, the only pointer conversion that is illegal is
that from an interior pointer to a native pointer; every other conversion is
allowed and implicitly done. You have seen how pinning pointers make it
convenient for you to pass pointers to CLI objects to unmanaged code. I now have
to warn you that pinning pointers should be used only when they're necessary,
because tactless usage of pinning pointers results in what is called the heap
fragmentation problem.
The heap fragmentation problem
Objects are always allocated sequentially in the CLR heap. Whenever a garbage
collection occurs, orphan objects are removed, and the heap is compacted so it
won't remain in a fragmented condition. (We covered this in the previous chapter
when we discussed the multigenerational garbage-collection algorithm used by the
CLR.) Let's assume that memory is allocated from a simple heap that looks like
figures 4.3 through 4.6. Of course, this is a simplistic representation of the
CLR's GC-based memory model, which involves a more complex algorithm. But the
basic principle behind the heap fragmentation issue remains the same, and thus
this simpler model will suffice for the present discussion. Figure 4.3 depicts
the status of the heap before a garbage-collection cycle occurs.
There are presently three objects in the heap. Assume that Obj2
(with the gray shaded background) is an orphan object, which means it will be
cleaned up during the next garbage-collection cycle. Figure 4.4 shows what the
heap looks like after the garbage-collection cycle.
The orphan object has been removed and a heap compaction has been performed,
so Obj1 and Obj3 are now next to each other. The idea
is to maximize the free space available in the heap and to put that free space
in a single contiguous block of memory. Figure 4.5 shows what the heap would
look like if there was a pinned object during the garbage-collection cycle.
Assume that Obj3 is a pinned object (the circle represents the
pinning). Because the GC won't move pinned objects, Obj3 remains
where it was. This results in fragmentation because the space between Obj1
and Obj3 cannot be added to the large contiguous free block of
memory. In this particular case, it's just a small gap that would have contained
only a single object, and thus isn't a major issue. Now, assume that several
pinned objects exist on the CLR heap when the garbage-collection cycle occurs.
Figure 4.6 shows what happens in such a situation.
None of those pinned objects can be relocated. This means the compaction
process can't be effectively implemented. When there are several such pinned
objects, the heap is severely fragmented, resulting in slower and less efficient
memory allocation for new objects. This is the case because the GC has to try
that much harder to find a block that's large enough to fit the requested
object. Sometimes, although the total free space is bigger than the requested
memory, the fact that there is no single continuous block of memory large enough
to hold that object results in an unnecessary garbage-collection cycle or a
memory exception. Obviously, this isn't an efficient scenario, and it's why you
have to be extremely cautious when you use pinning pointers.
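The effect of pinned objects on the largest free block can be put into numbers
with a small model. The following sketch in plain ISO C++ follows the simple
heap picture used in this section, not the CLR's actual algorithm: unpinned
blocks slide toward the start of the heap, pinned blocks stay put, and gaps in
front of pinned blocks are never refilled. Block and
LargestFreeRunAfterCompaction are hypothetical names, and the sizes are
arbitrary units.

```cpp
#include <cstddef>
#include <vector>

struct Block
{
    std::size_t start;   // offset in the heap
    std::size_t size;
    bool pinned;         // pinned blocks must keep their start offset
};

// Slides unpinned blocks toward offset 0 without moving pinned ones, then
// returns the largest contiguous free run in a heap of heapSize units.
// Blocks must be sorted by start and must not overlap.
std::size_t LargestFreeRunAfterCompaction(std::vector<Block> blocks,
                                          std::size_t heapSize)
{
    std::size_t cursor = 0;    // next free offset for a movable block
    std::size_t largest = 0;
    for (Block& b : blocks)
    {
        if (b.pinned)
        {
            // The gap in front of a pinned block stays free but isolated.
            if (b.start > cursor && b.start - cursor > largest)
                largest = b.start - cursor;
            cursor = b.start + b.size;
        }
        else
        {
            b.start = cursor;  // movable: slide down
            cursor += b.size;
        }
    }
    // Free space after the last block.
    if (heapSize > cursor && heapSize - cursor > largest)
        largest = heapSize - cursor;
    return largest;
}
```

In a heap of 100 units with Obj1 at offset 0 and Obj3 at offset 20 (10 units
each, Obj2 already collected), the largest free run is 80 when Obj3 can move and
70 when it's pinned. With four pinned 10-unit blocks at offsets 10, 30, 50, and
70, a total of 60 units is free, but the largest single run is only 20.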
Recommendations for using pinning pointers
Now that you've seen where pinning pointers can be handy and where they can
be a little dodgy, I'm going to give you some general tips on effectively using
pinning pointers.
Unless you absolutely have to, don't use a pinning pointer! Whenever you
think you need to use a pinning pointer, see if an interior pointer or a
tracking reference may be a better option. If an interior pointer is
acceptable as an alternative, chances are good that this is an improper
place for using a pinning pointer.
If you need to pin multiple objects, try to allocate those objects
together so that they're in an adjacent area in the CLR heap. That way, when
you pin them, those pinned objects will be in a contiguous area of the heap.
This reduces fragmentation compared to their being spread around the heap.
When making a call into native code, check to see if the CLR marshalling
layer (or the target native code) does any pinning for you. If it does, you
don't need to pin your object before passing it, because you'd be writing
unnecessary (though harmless) code by adding an extra pinning pointer to the
pinned object (which doesn't do anything to the pinned state of the object).
Newly allocated objects are put into Generation-0 of the CLR heap. You
know that garbage-collection cycles happen most frequently in the
Generation-0 heap. Consequently, you should try to avoid pinning recently
allocated objects; chances are that a garbage-collection cycle will occur
while the object is still pinned.
Reduce the lifetime of a pinning pointer. The longer it stays in scope,
the longer the object it points to remains pinned and the greater the
chances of heap fragmentation. For instance, if you need a pinning pointer
inside an if block, declare it inside the if block so the pinning ends when
the if block exits.
Whenever you pass a pinning pointer to a native pointer, you have to
ensure that the native pointer is used only if the pinning pointer is still
alive. If the pinning pointer goes out of scope, the object becomes
unpinned. Now it can be moved around by the GC. Once that happens, the
native pointer is pointing to some random location on the CLR heap. I've
heard the term GC hole used to refer to such a scenario, and it can be a
tough debugging problem. Although it may sound like an unlikely contingency,
think of what may happen if a native function that accepts a native pointer
stores this pointer for later use. The caller code may have passed a pinning
pointer to this function. Once the function has returned, the pinning will
quickly stop, because the original pinning pointer won't be alive much
longer. However, the saved pointer may be used later by some other function
in the native code, which may result in some disastrous conditions (because
the location the pointer points to may contain some other object now or even
be free space). The best you can do is to know what the native code is going
to do with a pointer before you pass a pinning pointer to it. That way, if
you see that there is the risk of a GC hole, you avoid calling that function
and try to find an alternate solution.
Note that these are general guidelines and not hard rules to be blindly
followed at all times. It's good to have some basic strategies and to understand
the exact consequences of what happens when you inappropriately use pinning
pointers. Eventually, you have to evaluate your coding scenario and use your
judgment to decide on the best course.