Introduction
80% of the questions I see about C++/CLI on the internet are bugs or confusion related to the language environment itself (also, 49% of facts on the internet are 62.385% correct in 26.4% of cases - teehee). I hope this article helps some people avoid those issues, and points others in the right direction.
Even though some topics in this article are language-agnostic, this article assumes you understand the basic syntax of C++/CLI and some basics of .NET. Enjoy!
Glossary
- CIL - Common Intermediate Language
- CLR - Common Language Runtime
- CTS - Common Type System
- GC - garbage collector, garbage-collected
/CLR - Mixed, Pure, and Safe
C++/CLI has three compiler options. They are important to understand because they affect not only which external code can consume yours (in the case of a DLL), but also which native code options are available to you during development, as well as some aspects of the environment itself, such as how global values are stored.
/CLR
generates a mixed-mode assembly. A mixed-mode assembly is an assembly which contains both machine code and CIL. Why would you want machine code in the assembly? For native entry-points for functions. In C++/CLI there are two entry-point types - managed and native. By default with /CLR
, functions are managed and both a managed and native entry-point are supplied. The native entry-point uses the __cdecl
calling-convention and forwards the call to the managed entry-point (double thunking occurs). This allows native code to call managed functions in your assembly. Global values are stored per-process since native (machine) code is not domain-aware.
/CLR:pure
generates a CIL-only assembly. Here's where a lot of confusion starts, due to overloaded terms: CIL can represent not only managed code but native code as well. This means /CLR:pure does not prevent the use of native code inside your assembly, as that code is compiled to CIL anyway. What it does is give every entry-point function the __clrcall calling convention, which prevents the creation of native entry-points (you can't have machine code in a CIL-only assembly). This means only managed code can call into your assembly. Global values here are stored per-AppDomain, since all code runs in a managed context and is therefore domain-aware. Also, the C Run-Time library is still available, as a CIL version exists in .NET.
/CLR:safe
generates a verifiable CIL-only assembly. This prevents the use of any native code since native code is unverifiable. This option, while still valid, is considered deprecated and may be removed in the future.
This is a basic overview of the differences, but for more information I recommend reading Pure and Verifiable Code.
Another topic closely related is #pragma managed([push,] on|off)
, #pragma managed(pop)
, and #pragma unmanaged
. I say topic because they are all related to specifying managed or native per-function. These pragmas are ignored if /CLR
is not used. This is because under the other two compilation options the pragmas have no meaning since all entry-points are managed (__clrcall
).
#pragma managed(push, off) // or #pragma unmanaged
void UnmanagedFunction() {
    printf("Unmanaged");
}
#pragma managed(pop) // or #pragma managed
void ManagedFunction() {
    Console::WriteLine("Managed");
}
Specifying a function to be native in this way means it's entirely native - no managed code support. The function is compiled to machine code so when executed the CLR simply passes it off to the native platform.
Memory Management
Memory management in C++/CLI works basically the same as it does in other languages (at least in the general details). You've got a call-stack and you've got a heap. Well, actually, in C++/CLI you've got two heaps - we'll call them the native heap and the garbage-collected heap.
The call-stack functions like any normal call-stack. Stack frames are pushed on function calls with argument data and space for local variables; when stack frames are popped, that data is destroyed. Both managed and native code use the same call-stack. Simple and to-the-point.
The native heap, as you may have guessed, is used for native objects. Reference-types created with new
or value-types boxed with new
are stored here. The native heap allocates and de-allocates memory as requested, with no bells and whistles, so memory leaks and fragmentation can cause out-of-memory issues even if enough total memory technically exists to create an object - just like good ol' C++. Also, implicit boxing is not supported for native value-types, which we can verify with a quick test.
struct Box { };
Box* Test() {
    Box b;
    return b; // error: no implicit boxing for native types
}
The garbage-collected heap is used for managed objects. Reference-types created with gcnew
and value-types either boxed by gcnew
or implicit boxing are stored here. The GC heap in .NET is called a garbage-collected, compacting heap. What this means is while memory is allocated for objects on request, objects are de-allocated (collected) automatically, and allocated memory is compacted periodically to reduce un-allocated memory fragmentation. Think of it like this: If you lay a group of pencils side-by-side and remove a couple, the GC heap will compact the pencils back together, so that extra space is now in one big chunk at the end. This reduces fragmentation and the possibility of out-of-memory issues when available memory still exists for another object - for example an eraser.
You may be wondering now how the garbage-collector knows when to de-allocate memory. The concept is pretty simple. The CLR GC is a reference tracking GC. What this means is when garbage collection occurs all heap objects are marked for deletion (a special bit is set to 0). Then the GC scans all reference-type variables. If the reference is null
the GC moves on. If not, the GC "un-marks" the reference's object by setting the bit to 1 then scans all reference-type variables inside the object by performing the same previous steps. Anytime the GC encounters an already un-marked heap object (bit is 1) it simply moves on. This prevents a circular reference between two objects from causing an infinite loop. This garbage collection process occurs at the GC's convenience which is why the finalizer for objects is non-deterministic.
The last topic I'll cover in this section will be boxing which I mentioned earlier. Explicit boxing is using new
or gcnew
on a value-type object to force boxing to occur. Implicit boxing is when this occurs without specifying the new
or gcnew
memory allocators. As mentioned previously, implicit boxing is not available to native objects.
struct Box { };
value struct MBox { };
void main() {
    Box b;
    MBox mb;
    System::Object^ o = mb; // OK: implicit boxing
    MBox^ mbh = mb;         // OK: implicit boxing
    System::Object^ p = b;  // error: no implicit boxing for native types
    Box* c = b;             // error: no implicit boxing for native types
}
Are the objects truly moved to the heap, though? Well, let's find out.
value struct MBox {
    MBox(int v) : x(v) { }
    int x;
};
MBox^ Boxing() {
    MBox b(1);
    return b; // implicitly boxed: b is copied to the GC heap
}
void StackFrame() { int a; char b; }
void main() {
    MBox^ b = Boxing();
    Console::WriteLine(b->x);
    StackFrame();
    Console::WriteLine(b->x);
}
1
1
So yes, the stack object was moved to the heap when boxed otherwise it would have been destroyed after the stack frame for Boxing()
was popped.
If you're interested in more details about garbage collection in .NET and just how complex it really is, check out this MSDN article.
Reference and Value Types
I felt it was necessary to talk about memory first even though both this section and memory are very intertwined. This is because learning in the reverse order leads to a lot of confusion I see around the internet.
So what are reference-types and value-types? Value-types are assigned by-value. A copy is made so you end up with separate objects. Reference-types are assigned by-reference. The reference to the object is copied. Still one object. Conceptually, that's it! But of course there's more to their implementation details. Let's start with value-types.
Value-types are optimized for copying. This is why it's recommended value-types be small in byte-size and immutable. Also, value-type variables directly contain data which is why they can not be null
(there is an exception to this with nullable types). The main advantage of value-types is that they can be stored on either the stack or the heap. That's right, contrary to a lot of information out there, value-types are not inherently allocated on the stack. You may have picked up on this earlier when I mentioned boxing but value-types are also allocated on the heap when contained within a heap object. This heap object can be a boxed value-type or a reference-type.
value struct Box {
Box(int v) :x(v) { }
int x;
};
value struct Container {
Container(int v) :b(v) { }
Box b;
};
Container^ CreateContainer() { return gcnew Container(5); }
void main() {
Container^ c = CreateContainer();
Console::WriteLine(c->b.x);
}
5
We can simulate what would happen if value-types were not allocated on the heap when contained within a heap object.
struct Box {
Box(int v) :x(v) { }
int x;
};
class Container { public: Box* b; };
void Kaboom(Container* c) {
    Box b(5);
    c->b = &b; // stores the address of a stack object - dangling once Kaboom returns
}
void StackFrame() { int a; char b; }
void main() {
Container* c = new Container();
Kaboom(c);
Console::WriteLine(c->b->x);
StackFrame();
Console::WriteLine(c->b->x);
}
5
0
This also shows why using StackFrame()
is important for these examples. Without overwriting the invalid stack memory pointed to by c->b
, you could still retrieve the old value, which gives the false appearance of the code being safe.
Reference-types are pretty straightforward. Their object is created on the heap while the reference to the object is created on the stack unless contained inside a heap object. They're created using a memory allocator (new
, gcnew
). Not much more to say I haven't already said really.
ref class Test { };
void main() {
    Test^ t = gcnew Test();
}
Class, Struct, Ref, and Value
If you come from a C# background this can get a little confusing because the terms class
and struct
have been overloaded. C++/CLI uses the basic C++ definitions. A class
is an object whose members (direct and inherited) are private by default. A struct
is an object whose members (direct and inherited) are public by default. For native objects, reference-type or value-type is determined by the instantiation semantics used. These are commonly referred to as value semantics and reference semantics.
struct StructFoo { };
class ClassFoo { };
void main() {
    StructFoo a;                    // value semantics
    StructFoo* b = new StructFoo(); // reference semantics
    ClassFoo c;                     // value semantics
    ClassFoo* d = new ClassFoo();   // reference semantics
}
On the managed side of things, you get the addition of the ref
and value
keywords. Both are valid for both class
and struct
. They define reference-type or value-type but have some unique caveats since they both also support reference and value semantics. Stay with me here.
ref class RefFoo { };
value class ValFoo { };
void main() {
    RefFoo a;                   // value semantics with a reference type
    RefFoo^ b = gcnew RefFoo(); // reference semantics
    ValFoo c;                   // value semantics
    ValFoo^ d = gcnew ValFoo(); // reference semantics (boxes the value)
    c = (ValFoo) d;             // unboxing cast
    c = *d;                     // unboxing via dereference
    a = (RefFoo) b;             // error: no default assignment for ref types
    a = *b;                     // error: no default assignment for ref types
}
So nothing out of the ordinary, except one thing: how can a reference-type support value semantics? The quick answer is it can't, sorta. The long answer is that this is a provided convenience - behind the scenes, the RefFoo object is still created on the GC heap. The convenience is that when a goes out of scope, the destructor is called automatically. None of the other value-type benefits are gained - namely, a default copy constructor and assignment operator overload.
ref class Foo { };
void Copy(Foo f) { } // error: no copy constructor
void main() {
    Foo f1;
    Foo f2 = f1;  // error: no copy constructor
    Copy(f1);     // error: no copy constructor
    Foo f3;
    f1 = f3;      // error: no assignment operator
    Foo^ h = %f1; // fine: take a handle to f1
}
Be careful though, this automatic destructor call will happen even if a handle to the object is passed out of the scope. This handle will prevent destruction of the object by the garbage collector but the object could be invalidated by the destructor call if it has a destructor.
ref class Foo {
    ~Foo() { x = 5; }
public:
    int x;
};
Foo^ Test() {
Foo a;
a.x = 10;
return %a;
}
void main() {
Foo^ a = Test();
GC::Collect(); Console::WriteLine(a->x);
}
5
There's one more unique aspect to value
types that meet a specific criteria. If the object does not reference the GC heap (no handles), it can be initialized as a native object.
value class Foo { };
value struct Foo2 { };
value struct Foo3 { Foo^ f; };
void main() {
Foo* f = new Foo();
Foo2* f2 = new Foo2();
    Foo3* f3 = new Foo3(); // error: Foo3 references the GC heap
}
So here's a quick rundown of all the available mixes of class, struct, ref, and value along with the various ways of initializing them.
class NativeClass { };
struct NativeStruct { };
ref class ManagedRefClass { };
ref struct ManagedRefStruct { };
value class ManagedValueClass { };
value struct ManagedValueStruct { };
void main() {
NativeClass a;
NativeClass* b = new NativeClass();
NativeStruct c;
NativeStruct* d = new NativeStruct();
    ManagedRefClass e;
    ManagedRefClass^ f = gcnew ManagedRefClass();
    ManagedRefStruct g;
    ManagedRefStruct^ h = gcnew ManagedRefStruct();
    ManagedValueClass i;
    ManagedValueClass^ j = gcnew ManagedValueClass();   // boxed
    ManagedValueClass* k = new ManagedValueClass();
    ManagedValueStruct l;
    ManagedValueStruct^ m = gcnew ManagedValueStruct(); // boxed
    ManagedValueStruct* n = new ManagedValueStruct();
}
My best-practice advice for avoiding confusion with managed objects would be to use ref class and value struct only, and to instantiate each in the standard manner for its type: this avoids the unique behavior described earlier for ref class with value semantics, and the unnecessary boxing of a value struct used with reference semantics. At the end of the day, it's up to you though! Certainly a lot of options.
Pointers, Handles, and References
If you've gotten this far, you may be wondering why I'd be talking about these. You'd have to know about pointers and handles to get this far, right? I assumed you knew what pointers and handles were used for but now I want to explain why they both exist.
Handles (^
) are essentially pointers to the GC heap. As explained earlier, the GC heap is both a garbage-collecting and compacting heap. This means allocated objects are being moved around and any references to them need updating. This is the special function of handles. They're automatically updated by the CLR when objects are moved in the GC heap. The caveat to this, however, is that handles can only be updated if the CLR is aware of the handle. The CLR is not aware of handles not registered with an AppDomain
. This is why value
types instantiated as a native object can not hold handles.
Pointers (*) are just like in C++. They point to a memory address - native heap or stack. Since the native heap doesn't move objects, pointers never need updating. I'd advise against returning memory addresses to the stack, however, as they will point to invalid memory, as shown in multiple examples earlier. Being able to hold stack addresses means that ^* is a valid combination, though.
ref struct Foo {
Foo(int v) :x(v) { }
int x;
};
struct Foo2 {
    Foo^* g; // pointer to a handle: valid, since the handle itself is not on the GC heap
};
void main() {
    Foo^ f = gcnew Foo(5);
    Foo2 f2;
    f2.g = &f;
    Console::WriteLine((*f2.g)->x);
}
In the last part of this section, I'd like to touch briefly on a topic of much confusion in general - references and tracking references. Both references (&
) and tracking references (%
) can be thought of as aliases. They represent another name for the value referenced.
The question I see the most is "Well why not just use a pointer or handle?" I'll demonstrate with an example.
ref class Foo { };
void main() {
Foo foo;
Foo^ fooHandle = gcnew Foo();
    Foo% fooTrackingRef = foo;
    Foo^% fooHandleTrackingRef = fooHandle;
    Foo% nullReference;                    // error: references must be initialized
    Foo^% nullHandleTrackingRef = nullptr; // error: cannot bind to nullptr
}
Both references and tracking references operate mostly the same, the difference being the same as pointers and handles - tracking references are automatically updated by the CLR if objects move. As you can see, both references and tracking references are not as flexible as pointers and handles. This is important because it can give clarity to your code's intent while preventing some bugs.
ref class Foo {
public: int x = 1;
};
void Test(Foo^ temp) {
temp = gcnew Foo();
temp->x = 10;
}
void Test2(Foo^% temp) {
temp = gcnew Foo();
temp->x = 10;
}
void Test3(Foo% temp) {
temp.x = 5;
}
void main() {
    Foo^ f = gcnew Foo();
    Test(f);        // f is unchanged: temp is only a copy of the handle
    Test2(f);       // f now refers to the new Foo; f->x == 10
    Test(nullptr);  // fine
    Test2(nullptr); // error: a tracking reference cannot bind to nullptr
    Test3(*f);
}
The above happens because when you use a reference, it effectively is the object it was assigned. In Test2
, temp = gcnew Foo()
is equivalent to f = gcnew Foo()
since temp
acts as not just a copy of the f
handle but as the handle itself.
As a final note, a tracking reference to a ref
type object instantiated with value semantics does not behave the same way as the base object. The destructor is not called when leaving the scope of the tracking reference. This is also true for both tracking and normal references to standard value types.
ref struct Foo {
Foo(int v) : x(v) { }
~Foo() { x = 2; }
int x;
};
void InnerTest(Foo% f) { }
Foo^ Test() {
    Foo f(5);
    InnerTest(f); // no destructor call when the tracking reference leaves scope
    Console::WriteLine(f.x);
    return %f;    // destructor runs here, when f itself leaves scope
}
void main() {
Foo^ f = Test();
Console::WriteLine(f->x);
}
5
2
Mixed-mode Objects
This is where it all comes together. A mixed-mode object is a native or managed object which contains managed or native data, respectively. Little did you know, you've already learned a lot about what a valid mixed-mode object can contain and why. Let's dive right into it.
value struct Test { };
ref struct Foo;
class NClass { };
struct NStruct {
    char c;
    char* chr;
    NStruct ns;   // error: incomplete type not allowed
    NStruct* nsp;
    NClass nc;
    NClass* ncp;
    Test t;
    Test* tp;
    Test^ th;     // error: handle in a native type
    Foo f;        // error: ref type in a native type
    Foo* fp;      // error: cannot take a native pointer to a ref type
    Foo^ fh;      // error: handle in a native type
};
ref struct Foo {
    char c;
    char* chr;
    NStruct ns;   // error: native object directly in a managed type
    NStruct* nsp;
    NClass nc;    // error: native object directly in a managed type
    NClass* ncp;
    Test t;
    Test* tp;
    Test^ th;
    Foo f;        // error: ref type members must be handles
    Foo* fp;      // error: cannot take a native pointer to a ref type
    Foo^ fh;
};
So, first off: the only reason both a native class and a native struct were included - since their only difference is default member access - is not just to re-iterate that point (as shown in Foo), but to demonstrate the "incomplete type not allowed" error. That error occurs because creating an object of type NStruct would require creating another object of type NStruct inside it, leading to infinite recursion.
To summarize: both native and managed types can contain simple types such as char and int. Native types cannot contain any managed types - handles, ref types, interface types, or any object which references the GC heap - with the exception of simple value types that do not violate any of the previous rules. Managed types cannot contain native objects directly. And neither can contain objects/types which violate rules from previous sections, such as a pointer to a GC heap object.
Voila. A topic that is so misunderstood and causes so much confusion is pretty straightforward once you understand the underlying reasons why things are valid or invalid, in the context of how managed and native code work.
Destructor and Finalizer
Regardless of whether you come from a C++ or C# background, this topic can cause some head-scratching, because the implementation in C++/CLI is an amalgamation of both languages' methods plus a lot of hidden sorcery in the background.
So if you come from C#, this is basically a C++ syntax implementation (with one new, nifty addition) of the disposable pattern with which you should be intimately familiar. If you come from C++, don't worry, it'll make sense if you've made it this far! So in C#, the disposable pattern looks like this:
public class Foo : IDisposable {
~Foo() {
Dispose(false);
}
private bool _disposed = false;
public void Dispose() {
Dispose(true);
GC.SuppressFinalize(this);
}
    protected virtual void Dispose(bool disposing) {
        if (_disposed) return;
        if (disposing) {
            // clean up managed resources here
        }
        // clean up native resources here
        _disposed = true;
    }
}
This pattern is useful when you want to explicitly clean up expensive resources - especially native resources, to avoid memory leaks. Why wait for the GC to get around to destroying the flagged object and calling its finalizer, when you can clean up everything deterministically? That's why this pattern is awesome.
So with C++/CLI having a GC heap, you need a finalizer for non-deterministic cleanup yet you'd also like the dispose pattern for deterministic clean-up. Well, this is where the C++/CLI destructor and finalizer syntax comes into play.
ref class Foo {
    ~Foo() {
        if (_disposed) return;
        this->!Foo();
        _disposed = true;
    }
    !Foo() {
        // clean up native resources here
    }
    bool _disposed = false;
};
Wow, that is a lot cleaner in my opinion. Since you can directly call a finalizer in C++/CLI (you can't in C#) we no longer need the Dispose(bool)
helper function, plus the GC.SuppressFinalize(this)
and base destructor call are automatically handled for us. You may have guessed this already, but this C++/CLI syntax is converted into the dispose pattern automatically for you - a shorthand of sorts. Why have we not gotten this in C# yet? I cry every time.
As a final note, destructors are protected by default (and this can not be changed) and therefore can not be directly called. There are two cases in which a destructor is called - if the object is destroyed by scope or if delete
is used.
Marshalling
If you search "string to char*" on Google, you'll see pages of questions. Marshalling can be a complex topic, and I won't go into much detail here because it warrants its own article. All I want to do is show you a simple way to marshal string representations in C++/CLI (this also applies to VC++).
You've no doubt seen various ways of marshalling System::String
to a char*
. Using a pinned pointer might look like this:
#include <stdio.h>
#include <stdlib.h>
#include <vcclr.h>
void main() {
String^ mdata = gcnew String("Test");
char* udata;
pin_ptr<const wchar_t> ptr = PtrToStringChars(mdata);
size_t convertedChars = 0;
size_t sizeInBytes = (mdata->Length + 1) * 2;
udata = new char[sizeInBytes];
wcstombs_s(&convertedChars, udata, sizeInBytes, ptr, _TRUNCATE);
    printf("%s", udata);
    delete[] udata;
}
Or StringToHGlobalAnsi()
:
#include <stdio.h>
#include <string.h>
void main() {
String^ mdata = gcnew String("Test");
char* udata;
IntPtr strPtr = Marshal::StringToHGlobalAnsi(mdata);
char* wchPtr = static_cast<char*>(strPtr.ToPointer());
size_t sizeInBytes = mdata->Length + 1;
udata = new char[sizeInBytes];
strncpy_s(udata, sizeInBytes, wchPtr, _TRUNCATE);
    printf("%s", udata);
    delete[] udata;
    wchPtr = nullptr;
Marshal::FreeHGlobal(strPtr);
}
There are a couple of problems with these, in my opinion. First, much of the time you probably won't need the level of control these methods provide. Second, they provide ample opportunity for errors - for example, incorrectly setting sizeInBytes, or using delete on the memory behind the IntPtr created by StringToHGlobalAnsi(), which will cause errors in debug mode yet appear to work completely fine in release mode.
Most of the time we just want a simple conversion. This is where marshal_as<T>
comes to save the day.
#include <stdio.h>
#include <msclr\marshal.h>
using namespace msclr::interop;
void main() {
String^ mdata = gcnew String("Test");
marshal_context context;
char* udata = const_cast<char*>(context.marshal_as<const char*>(mdata));
printf("%s\n", udata);
Console::WriteLine(marshal_as<String^>(udata));
}
Test
Test
That looks a lot better to me. If you're wondering why converting back to String^
didn't require a marshal_context
, I'll just quote the MSDN since I can't word it any better:
Marshaling requires a context only when you marshal from managed to native data types and the native type you are converting to does not have a destructor for automatic clean up. The marshaling context destroys the allocated native data type in its destructor. Therefore, conversions that require a context will be valid only until the context is deleted. To save any marshaled values, you must copy the values to your own variables.
This is one instance where I like using value semantics with a ref
type (marshal_context
). This ensures the context will be cleaned up when it leaves scope. Since there's no interest in the context object itself - only the data - there shouldn't be a situation where any of the side-effects of instantiating in this manner would matter. Instantiating in the normal manner is perfectly valid though.
The list of available conversions covers basically any representation of a string you can imagine, plus a few extras. As someone who prefers to use std::string instead of char* when not interoperating with C code, this is pretty awesome.
Bonus Topic: auto_gcroot
Thanks for reading! Now I'll show you a little "cheat" not covered in the mixed-mode object section that just came to mind. Normally a native object can't contain a reference to a managed object. However, there's a magic tool called auto_gcroot<T>
which uses a System::Runtime::InteropServices::GCHandle
to allow unmanaged memory to access a managed object.
#include <msclr\auto_gcroot.h>
using namespace msclr;
ref struct Box {
Box(int v) :x(v) { }
int x;
};
struct Foo {
    Foo(int v) : box(gcnew Box(v)) { }
    auto_gcroot<Box^> box;
};
void main() {
Foo f(5);
Console::WriteLine(f.box->x);
}
5
GCHandle
works by directly adding a handle to the current AppDomain
. You can then retrieve an IntPtr
to this handle, call ToPointer()
on it, and cast the returned value to the correct pointer type. auto_gcroot<T>
encapsulates all the code required for this while ensuring Free() is called on the GCHandle. If Free() is not called, the result is a memory leak, since an object referenced through a GCHandle only becomes eligible for garbage collection once Free() is called or the AppDomain is destroyed.
Final Thoughts
If you want to learn more about the native side of C++/CLI, check out information regarding Visual C++; the managed side is usually covered in C++/CLI guides. The least-documented portion in my research has been the hazy middle-ground where the two come together and interact. It comes as no surprise that this is the area where people have the most issues, so that is the area this article targets.
If you see any errors or would like to see a topic I missed covered in an updated article, please leave a comment. Any corrections will get full credit. Hope you enjoyed the read since it was definitely fun writing it! Cheers!
History
10/24/2016: Initial release.
10/28/2016:
2/5/2017: Updated Memory Management section. As Zodiacon pointed out in the comments, the CLR GC is a reference tracking, not reference counting, GC.