Introduction
80% of the questions I see about C++/CLI on the internet are bugs or confusion related to the language environment itself (also, 49% of facts on the internet are 62.385% correct in 26.4% of cases - teehee). I hope this article helps some people avoid those issues, and points others in the right direction.
Even though some topics in this article are language-agnostic, this article assumes you understand the basic syntax of C++/CLI and some basics of .NET. Enjoy!
Glossary
- CIL - Common Intermediate Language
- CLR - Common Language Runtime
- CTS - Common Type System
- GC - garbage collector, garbage-collected
/CLR - Mixed, Pure, and Safe
C++/CLI has three compiler options. They are important to understand because they affect not only which external code can consume yours (in the case of a DLL), but also which native code options are available to you during development, as well as some aspects of the environment itself, such as how global values are stored.
/CLR
generates a mixed-mode assembly. A mixed-mode assembly is an assembly which contains both machine code and CIL. Why would you want machine code in the assembly? For native entry-points for functions. In C++/CLI there are two entry-point types - managed and native. By default with /CLR
, functions are managed and both a managed and native entry-point are supplied. The native entry-point uses the __cdecl
calling-convention and forwards the call to the managed entry-point (double thunking occurs). This allows native code to call managed functions in your assembly. Global values are stored per-process since native (machine) code is not domain-aware.
/CLR:pure
generates a CIL-only assembly. Here's where a lot of confusion starts, due to overloaded terms: CIL can represent not only managed code but native code as well. This means /CLR:pure does not prevent the use of native code inside your assembly, as that code is compiled to CIL anyway. What it does is give every entry-point function the __clrcall calling convention, which prevents the creation of native entry-points (you can't have machine code in a CIL-only assembly). This means only managed code can call into your assembly. Global values here are stored per-AppDomain, since all code runs in a managed context and is therefore domain-aware. Also, the C Run-Time library is still available, as a CIL version exists in .NET.
/CLR:safe
generates a verifiable CIL-only assembly. This prevents the use of any native code since native code is unverifiable. This option, while still valid, is considered deprecated and may be removed in the future.
This is a basic overview of the differences, but for more information I recommend reading Pure and Verifiable Code.
Another topic closely related is #pragma managed([push,] on|off)
, #pragma managed(pop)
, and #pragma unmanaged
. I say topic because they are all related to specifying managed or native per-function. These pragmas are ignored if /CLR
is not used. This is because under the other two compilation options the pragmas have no meaning since all entry-points are managed (__clrcall
).
#pragma managed(push, off) // or #pragma unmanaged
void UnmanagedFunction() {
    printf("Unmanaged");
}
#pragma managed(pop) // or #pragma managed
void ManagedFunction() {
    Console::WriteLine("Managed");
}
Specifying a function to be native in this way means it's entirely native - no managed code support. The function is compiled to machine code so when executed the CLR simply passes it off to the native platform.
Memory Management
Memory management in C++/CLI works basically the same as it does in other languages (at least in the general details). You've got a call-stack and you've got a heap. Well, actually, in C++/CLI you've got two heaps - we'll call them the native heap and the garbage-collected heap.
The call-stack functions like any normal call-stack. Stack frames are pushed on function calls with argument data and space for local variables; when stack frames are popped, that data is destroyed. Both managed and native code use the same call-stack. Simple and to-the-point.
The native heap, as you may have guessed, is used for native objects. Reference-types created with new
or value-types boxed with new
are stored here. The native heap allocates and de-allocates memory as requested, with no bells and whistles, so memory leaks and fragmentation can cause out-of-memory issues even if enough total memory technically exists to create an object - just like good ol' C++. Also, implicit boxing is not supported for native value-types, which we can verify with a quick test.
struct Box { };
Box* Test() {
    Box b;
    return b; // error: no implicit boxing for native types
}
The garbage-collected heap is used for managed objects. Reference-types created with gcnew
and value-types either boxed by gcnew
or implicit boxing are stored here. The GC heap in .NET is called a garbage-collected, compacting heap. What this means is while memory is allocated for objects on request, objects are de-allocated (collected) automatically, and allocated memory is compacted periodically to reduce un-allocated memory fragmentation. Think of it like this: If you lay a group of pencils side-by-side and remove a couple, the GC heap will compact the pencils back together, so that extra space is now in one big chunk at the end. This reduces fragmentation and the possibility of out-of-memory issues when available memory still exists for another object - for example an eraser.
You may be wondering now how the garbage-collector knows when to de-allocate memory. The concept is pretty simple. The CLR GC is a reference tracking GC. What this means is when garbage collection occurs all heap objects are marked for deletion (a special bit is set to 0). Then the GC scans all reference-type variables. If the reference is null
the GC moves on. If not, the GC "un-marks" the reference's object by setting the bit to 1 then scans all reference-type variables inside the object by performing the same previous steps. Anytime the GC encounters an already un-marked heap object (bit is 1) it simply moves on. This prevents a circular reference between two objects from causing an infinite loop. This garbage collection process occurs at the GC's convenience which is why the finalizer for objects is non-deterministic.
The last topic I'll cover in this section will be boxing which I mentioned earlier. Explicit boxing is using new
or gcnew
on a value-type object to force boxing to occur. Implicit boxing is when this occurs without specifying the new
or gcnew
memory allocators. As mentioned previously, implicit boxing is not available to native objects.
struct Box { };
value struct MBox { };
void main() {
    Box b;
    MBox mb;
    System::Object^ o = mb; // OK: implicit boxing
    MBox^ mbh = mb;         // OK: implicit boxing
    System::Object^ p = b;  // error: no implicit boxing for native types
    Box* c = b;             // error: no implicit boxing for native types
}
Are the objects truly moved to the heap, though? Well, let's find out.
value struct MBox {
    MBox(int v) : x(v) { }
    int x;
};
MBox^ Boxing() {
    MBox b(1);
    return b; // implicitly boxed: b is copied to the GC heap
}
void StackFrame() { int a; char b; }
void main() {
    MBox^ b = Boxing();
    Console::WriteLine(b->x);
    StackFrame();
    Console::WriteLine(b->x);
}
1
1
So yes, the stack object was moved to the heap when boxed otherwise it would have been destroyed after the stack frame for Boxing()
was popped.
If you're interested in more details about garbage collection in .NET and just how complex it really is, check out this MSDN article.
Reference and Value Types
I felt it was necessary to talk about memory first even though both this section and memory are very intertwined. This is because learning in the reverse order leads to a lot of confusion I see around the internet.
So what are reference-types and value-types? Value-types are assigned by-value. A copy is made so you end up with separate objects. Reference-types are assigned by-reference. The reference to the object is copied. Still one object. Conceptually, that's it! But of course there's more to their implementation details. Let's start with value-types.
Value-types are optimized for copying. This is why it's recommended value-types be small in byte-size and immutable. Also, value-type variables directly contain data which is why they can not be null
(there is an exception to this with nullable types). The main advantage of value-types is that they can be stored on either the stack or the heap. That's right, contrary to a lot of information out there, value-types are not inherently allocated on the stack. You may have picked up on this earlier when I mentioned boxing but value-types are also allocated on the heap when contained within a heap object. This heap object can be a boxed value-type or a reference-type.
value struct Box {
Box(int v) :x(v) { }
int x;
};
value struct Container {
Container(int v) :b(v) { }
Box b;
};
Container^ CreateContainer() { return gcnew Container(5); }
void main() {
Container^ c = CreateContainer();
Console::WriteLine(c->b.x);
}
5
We can simulate what would happen if value-types were not allocated on the heap when contained within a heap object.
struct Box {
Box(int v) :x(v) { }
int x;
};
class Container { public: Box* b; };
void Kaboom(Container* c) {
    Box b(5);
    c->b = &b; // stores the address of a stack object - dangling once Kaboom returns
}
void StackFrame() { int a; char b; }
void main() {
Container* c = new Container();
Kaboom(c);
Console::WriteLine(c->b->x);
StackFrame();
Console::WriteLine(c->b->x);
}
5
0
This also shows why using StackFrame()
is important for these examples. Without overwriting the invalid stack memory pointed to by c->b
, you could still retrieve the old value, which gives the false appearance of the code being safe.
Reference-types are pretty straightforward. Their object is created on the heap while the reference to the object is created on the stack unless contained inside a heap object. They're created using a memory allocator (new
, gcnew
). Not much more to say I haven't already said really.
ref class Test { };
void main() {
    Test^ t = gcnew Test();
}
Class, Struct, Ref, and Value
If you come from a C# background this can get a little confusing because the terms class
and struct
have been overloaded. C++/CLI uses the basic C++ definitions. A class
is an object whose members (direct and inherited) are private by default. A struct
is an object whose members (direct and inherited) are public by default. For native objects, reference-type or value-type is determined by the instantiation semantics used. These are commonly referred to as value semantics and reference semantics.
struct StructFoo { };
class ClassFoo { };
void main() {
    StructFoo a;                    // value semantics
    StructFoo* b = new StructFoo(); // reference semantics
    ClassFoo c;                     // value semantics
    ClassFoo* d = new ClassFoo();   // reference semantics
}
On the managed side of things, you get the addition of the ref
and value
keywords. Both are valid for both class
and struct
. They define reference-type or value-type but have some unique caveats since they both also support reference and value semantics. Stay with me here.
ref class RefFoo { };
value class ValFoo { };
void main() {
    RefFoo a;                   // value semantics with a reference type
    RefFoo^ b = gcnew RefFoo(); // reference semantics
    ValFoo c;                   // value semantics
    ValFoo^ d = gcnew ValFoo(); // reference semantics (boxes the value)
    c = (ValFoo) d;             // unboxing cast
    c = *d;                     // unboxing via dereference
    a = (RefFoo) b;             // error: no default assignment for ref types
    a = *b;                     // error: no default assignment for ref types
}
So nothing out of the ordinary, except one thing: how can a reference-type support value semantics? The quick answer is it can't, sorta. The long answer is that this is a provided convenience - behind the scenes, the RefFoo object is still created on the GC heap. The convenience is that when a goes out of scope, the destructor is called automatically. None of the other value-type benefits are gained - namely, a default copy constructor and assignment operator overload.
ref class Foo { };
void Copy(Foo f) { } // error: no copy constructor
void main() {
    Foo f1;
    Foo f2 = f1;  // error: no copy constructor
    Copy(f1);     // error: no copy constructor
    Foo f3;
    f1 = f3;      // error: no assignment operator
    Foo^ h = %f1; // fine: take a handle to f1
}
Be careful though, this automatic destructor call will happen even if a handle to the object is passed out of the scope. This handle will prevent destruction of the object by the garbage collector but the object could be invalidated by the destructor call if it has a destructor.
ref class Foo {
    ~Foo() { x = 5; }
public:
    int x;
};
Foo^ Test() {
Foo a;
a.x = 10;
return %a;
}
void main() {
Foo^ a = Test();
GC::Collect(); Console::WriteLine(a->x);
}
5
There's one more unique aspect to value
types that meet a specific criteria. If the object does not reference the GC heap (no handles), it can be initialized as a native object.
value class Foo { };
value struct Foo2 { };
value struct Foo3 { Foo^ f; };
void main() {
Foo* f = new Foo();
Foo2* f2 = new Foo2();
    Foo3* f3 = new Foo3(); // error: Foo3 references the GC heap
}
So here's a quick rundown of all the available mixes of class, struct, ref, and value along with the various ways of initializing them.
class NativeClass { };
struct NativeStruct { };
ref class ManagedRefClass { };
ref struct ManagedRefStruct { };
value class ManagedValueClass { };
value struct ManagedValueStruct { };
void main() {
NativeClass a;
NativeClass* b = new NativeClass();
NativeStruct c;
NativeStruct* d = new NativeStruct();
    ManagedRefClass e;
    ManagedRefClass^ f = gcnew ManagedRefClass();
    ManagedRefStruct g;
    ManagedRefStruct^ h = gcnew ManagedRefStruct();
    ManagedValueClass i;
    ManagedValueClass^ j = gcnew ManagedValueClass();   // boxed
    ManagedValueClass* k = new ManagedValueClass();
    ManagedValueStruct l;
    ManagedValueStruct^ m = gcnew ManagedValueStruct(); // boxed
    ManagedValueStruct* n = new ManagedValueStruct();
}
My best-practice advice for avoiding confusion with managed objects would be to use ref class and value struct only, and to instantiate each in the standard manner for its type: this avoids the unique behavior described earlier for ref class with value semantics, and the unnecessary boxing of a value struct used with reference semantics. At the end of the day, it's up to you though! Certainly a lot of options.
Pointers, Handles, and References
If you've gotten this far, you may be wondering why I'd be talking about these. You'd have to know about pointers and handles to get this far, right? I assumed you knew what pointers and handles were used for but now I want to explain why they both exist.
Handles (^
) are essentially pointers to the GC heap. As explained earlier, the GC heap is both a garbage-collecting and compacting heap. This means allocated objects are being moved around and any references to them need updating. This is the special function of handles. They're automatically updated by the CLR when objects are moved in the GC heap. The caveat to this, however, is that handles can only be updated if the CLR is aware of the handle. The CLR is not aware of handles not registered with an AppDomain
. This is why value
types instantiated as a native object can not hold handles.
Pointers (*) are just like in C++. They point to a memory address - native heap or stack. Since the native heap doesn't move objects, pointers never need updating. I'd advise against returning memory addresses to the stack, however, as they will point to invalid memory, as shown in multiple examples earlier. Being able to hold stack addresses means that ^* is a valid combination, though.
ref struct Foo {
Foo(int v) :x(v) { }
int x;
};
struct Foo2 {
    Foo^* g; // pointer to a handle: valid, since the handle itself is not on the GC heap
};
void main() {
    Foo^ f = gcnew Foo(5);
    Foo2 f2;
    f2.g = &f;
    Console::WriteLine((*f2.g)->x);
}
In the last part of this section, I'd like to touch briefly on a topic of much confusion in general - references and tracking references. Both references (&
) and tracking references (%
) can be thought of as aliases. They represent another name for the value referenced.
The question I see the most is "Well why not just use a pointer or handle?" I'll demonstrate with an example.
ref class Foo { };
void main() {
Foo foo;
Foo^ fooHandle = gcnew Foo();
    Foo% fooTrackingRef = foo;
    Foo^% fooHandleTrackingRef = fooHandle;
    Foo% nullReference;                    // error: references must be initialized
    Foo^% nullHandleTrackingRef = nullptr; // error: cannot bind to nullptr
}
Both references and tracking references operate mostly the same, the difference being the same as pointers and handles - tracking references are automatically updated by the CLR if objects move. As you can see, both references and tracking references are not as flexible as pointers and handles. This is important because it can give clarity to your code's intent while preventing some bugs.
ref class Foo {
public: int x = 1;
};
void Test(Foo^ temp) {
temp = gcnew Foo();
temp->x = 10;
}
void Test2(Foo^% temp) {
temp = gcnew Foo();
temp->x = 10;
}
void Test3(Foo% temp) {
temp.x = 5;
}
void main() {
    Foo^ f = gcnew Foo();
    Test(f);        // f is unchanged: temp is only a copy of the handle
    Test2(f);       // f now refers to the new Foo; f->x == 10
    Test(nullptr);  // fine
    Test2(nullptr); // error: a tracking reference cannot bind to nullptr
    Test3(*f);
}
The above happens because when you use a reference, it effectively is the object it was assigned. In Test2
, temp = gcnew Foo()
is equivalent to f = gcnew Foo()
since temp
acts as not just a copy of the f
handle but as the handle itself.
As a final note, a tracking reference to a ref
type object instantiated with value semantics does not behave the same way as the base object. The destructor is not called when leaving the scope of the tracking reference. This is also true for both tracking and normal references to standard value types.
ref struct Foo {
Foo(int v) : x(v) { }
~Foo() { x = 2; }
int x;
};
void InnerTest(Foo% f) { }
Foo^ Test() {
    Foo f(5);
    InnerTest(f); // no destructor call when the tracking reference leaves scope
    Console::WriteLine(f.x);
    return %f;    // destructor runs here, when f itself leaves scope
}
void main() {
Foo^ f = Test();
Console::WriteLine(f->x);
}
5
2
Mixed-mode Objects
This is where it all comes together. A mixed-mode object is a native or managed object which contains managed or native data, respectively. Little did you know, you've already learned a lot about what a valid mixed-mode object can contain and why. Let's dive right into it.
value struct Test { };
ref struct Foo;
class NClass { };
struct NStruct {
    char c;
    char* chr;
    NStruct ns;   // error: incomplete type not allowed
    NStruct* nsp;
    NClass nc;
    NClass* ncp;
    Test t;
    Test* tp;
    Test^ th;     // error: handle in a native type
    Foo f;        // error: ref type in a native type
    Foo* fp;      // error: cannot take a native pointer to a ref type
    Foo^ fh;      // error: handle in a native type
};
ref struct Foo {
    char c;
    char* chr;
    NStruct ns;   // error: native object directly in a managed type
    NStruct* nsp;
    NClass nc;    // error: native object directly in a managed type
    NClass* ncp;
    Test t;
    Test* tp;
    Test^ th;
    Foo f;        // error: ref type members must be handles
    Foo* fp;      // error: cannot take a native pointer to a ref type
    Foo^ fh;
};
So, first off: the only reason both a native class and a native struct were included - since their only difference is default member access - is not just to re-iterate that point (as shown in Foo), but to demonstrate the "incomplete type not allowed" error. That error occurs because creating an object of type NStruct would require creating another object of type NStruct inside it, leading to infinite recursion.
To summarize: both native and managed types can contain simple types such as char and int. Native types cannot contain any managed types - handles, ref types, interface types, or any object which references the GC heap - with the exception of simple value types that do not violate any of the previous rules. Managed types cannot contain native objects directly. And neither can contain objects/types which violate rules from previous sections, such as a pointer to a GC heap object.
Voila. A topic that is so misunderstood and causes so much confusion is pretty straightforward once you understand the underlying reasons why things are valid or invalid, in the context of how managed and native code work.
Destructor and Finalizer
Regardless of whether you come from a C++ or C# background, this topic can cause some head-scratching, because the implementation in C++/CLI is an amalgamation of both languages' methods plus a lot of hidden sorcery in the background.
So if you come from C#, this is basically a C++ syntax implementation (with one new, nifty addition) of the disposable pattern with which you should be intimately familiar. If you come from C++, don't worry, it'll make sense if you've made it this far! So in C#, the disposable pattern looks like this:
public class Foo : IDisposable {
~Foo() {
Dispose(false);
}
private bool _disposed = false;
public void Dispose() {
Dispose(true);
GC.SuppressFinalize(this);
}
    protected virtual void Dispose(bool disposing) {
        if (_disposed) return;
        if (disposing) {
            // clean up managed resources here
        }
        // clean up native resources here
        _disposed = true;
    }
}
This pattern is useful when you want to explicitly clean up expensive resources - especially native resources, to avoid memory leaks. Why wait for the GC to get around to destroying the flagged object and calling its finalizer, when you can clean up everything deterministically? That's why this pattern is awesome.
So with C++/CLI having a GC heap, you need a finalizer for non-deterministic cleanup yet you'd also like the dispose pattern for deterministic clean-up. Well, this is where the C++/CLI destructor and finalizer syntax comes into play.
ref class Foo {
    ~Foo() {
        if (_disposed) return;
        this->!Foo();
        _disposed = true;
    }
    !Foo() {
        // clean up native resources here
    }
    bool _disposed = false;
};
Wow, that is a lot cleaner in my opinion. Since you can directly call a finalizer in C++/CLI (you can't in C#) we no longer need the Dispose(bool)
helper function, plus the GC.SuppressFinalize(this)
and base destructor call are automatically handled for us. You may have guessed this already, but this C++/CLI syntax is converted into the dispose pattern automatically for you - a shorthand of sorts. Why have we not gotten this in C# yet? I cry every time.
As a final note, destructors are protected by default (and this can not be changed) and therefore can not be directly called. There are two cases in which a destructor is called - if the object is destroyed by scope or if delete
is used.
Marshalling
If you search "string to char*" on Google, you'll see pages of questions. Marshalling can be a complex topic, and I won't go into much detail here because it warrants its own article. All I want to do is show you a simple way to marshal string representations in C++/CLI (this also applies to VC++).
You've no doubt seen various ways of marshalling System::String
to a char*
. Using a pinned pointer might look like this:
#include <stdio.h>
#include <stdlib.h>
#include <vcclr.h>
void main() {
String^ mdata = gcnew String("Test");
char* udata;
pin_ptr<const wchar_t> ptr = PtrToStringChars(mdata);
size_t convertedChars = 0;
size_t sizeInBytes = (mdata->Length + 1) * 2;
udata = new char[sizeInBytes];
wcstombs_s(&convertedChars, udata, sizeInBytes, ptr, _TRUNCATE);
    printf("%s", udata);
    delete[] udata;
}
Or StringToHGlobalAnsi()
:
#include <stdio.h>
#include <string.h>
void main() {
String^ mdata = gcnew String("Test");
char* udata;
IntPtr strPtr = Marshal::StringToHGlobalAnsi(mdata);
char* wchPtr = static_cast<char*>(strPtr.ToPointer());
size_t sizeInBytes = mdata->Length + 1;
udata = new char[sizeInBytes];
strncpy_s(udata, sizeInBytes, wchPtr, _TRUNCATE);
    printf("%s", udata);
    delete[] udata;
    wchPtr = nullptr;
Marshal::FreeHGlobal(strPtr);
}
There are a couple of problems with these, in my opinion. First, much of the time you probably won't need the level of control these methods provide. Second, they provide ample opportunity for errors - for example, incorrectly setting sizeInBytes, or using delete on the memory behind the IntPtr created by StringToHGlobalAnsi(), which will cause errors in debug mode yet appear to work completely fine in release mode.
Most of the time we just want a simple conversion. This is where marshal_as<T>
comes to save the day.
#include <stdio.h>
#include <msclr\marshal.h>
using namespace msclr::interop;
void main() {
String^ mdata = gcnew String("Test");
marshal_context context;
char* udata = const_cast<char*>(context.marshal_as<const char*>(mdata));
printf("%s\n", udata);
Console::WriteLine(marshal_as<String^>(udata));
}
Test
Test
That looks a lot better to me. If you're wondering why converting back to String^
didn't require a marshal_context
, I'll just quote the MSDN since I can't word it any better:
Marshaling requires a context only when you marshal from managed to native data types and the native type you are converting to does not have a destructor for automatic clean up. The marshaling context destroys the allocated native data type in its destructor. Therefore, conversions that require a context will be valid only until the context is deleted. To save any marshaled values, you must copy the values to your own variables.
This is one instance where I like using value semantics with a ref
type (marshal_context
). This ensures the context will be cleaned up when it leaves scope. Since there's no interest in the context object itself - only the data - there shouldn't be a situation where any of the side-effects of instantiating in this manner would matter. Instantiating in the normal manner is perfectly valid though.
The list of available conversions covers basically any representation of a string you can imagine, plus a few extras. As someone who prefers to use std::string instead of char* when not interoperating with C code, this is pretty awesome.
Bonus Topic: auto_gcroot
Thanks for reading! Now I'll show you a little "cheat" not covered in the mixed-mode object section that just came to mind. Normally a native object can't contain a reference to a managed object. However, there's a magic tool called auto_gcroot<T>
which uses a System::Runtime::InteropServices::GCHandle
to allow unmanaged memory to access a managed object.
#include <msclr\auto_gcroot.h>
using namespace msclr;
ref struct Box {
Box(int v) :x(v) { }
int x;
};
struct Foo {
    Foo(int v) : box(gcnew Box(v)) { }
    auto_gcroot<Box^> box;
};
void main() {
Foo f(5);
Console::WriteLine(f.box->x);
}
5
GCHandle
works by directly adding a handle to the current AppDomain
. You can then retrieve an IntPtr
to this handle, call ToPointer()
on it, and cast the returned value to the correct pointer type. auto_gcroot<T>
encapsulates all the code required for this while ensuring Free() is called on the GCHandle. If Free() is not called, the result is a memory leak, since an object referenced through a GCHandle only becomes eligible for garbage collection once Free() is called or the AppDomain is destroyed.
Final Thoughts
If you want to learn more about the native side of C++/CLI, check out information regarding Visual C++; the managed side is usually covered in C++/CLI guides. The least-documented portion in my research has been the hazy middle-ground where the two come together and interact. It comes as no surprise that this is the area where people have the most issues, so that is the area this article targets.
If you see any errors or would like to see a topic I missed covered in an updated article, please leave a comment. Any corrections will get full credit. Hope you enjoyed the read since it was definitely fun writing it! Cheers!
History
10/24/2016: Initial release.
10/28/2016:
2/5/2017: Updated Memory Management section. As Zodiacon pointed out in the comments, the CLR GC is a reference tracking, not reference counting, GC.