Introduction
Recently I was assigned the task of tracking down some memory leaks and crashes caused by .NET/COM interop. If you have been through this yourself, you know how tough it can be.
After Googling articles, reading them, and running some small experiments, I have formed my own understanding of the topic. Things that were so daunting to me have now begun to make sense. I want to post them here and share them with those who are still struggling.
Background
The general scenario is like this:
- We create a COM object using ATL (you can skip ATL and do everything yourself, but that is a much tougher job). We define the interfaces in IDL files and implement them in C++. That gives us a COM server.
- We want to use them in .NET. We use tlbimp.exe to generate an interop assembly (a DLL); the runtime uses the metadata in it to create the RCWs (Runtime Callable Wrappers).
- We begin to consume the COM object by creating the COM object instance and invoking its methods.
The problem we are facing and trying to resolve is:
- After the .NET application is done, some COM objects are not released/disposed/destructed as expected; thus we have memory leaks.
And/or:
- Sometimes the .NET Framework throws an exception (an InvalidComObjectException) saying "COM object that has been separated from its underlying RCW cannot be used."
Main Content
Before the exploration starts, I want to stress one thing: always remember that there are three layers in .NET/COM interoperation: your .NET code, the RCW, and the native COM server. You have full control of your .NET code, but the RCW/COM world has many rules that you have to understand and follow. Otherwise you will hit the troubles listed in the section above.
To understand those rules in the COM and RCW world, I am going to use some sample code. In this example I define two COM objects: "CQTTest", representing a test object, and "CQTAction", representing an action object. A test object can contain many action objects, one of which is the active one. Both Test and Action have a method named "Run" (if you have some experience with Quick Test Professional, you know what I'm talking about). The IDL file and the implementation code are shown below:
import "oaidl.idl";
import "ocidl.idl";
[
object,
uuid(425E8992-D4C3-4054-9307-4E3AD0C088F4),
helpstring("IQTAction Interface"),
pointer_default(unique)
]
interface IQTAction : IUnknown{
[helpstring("method Run")] HRESULT Run(void);
};
[
object,
uuid(D796C6FC-4DAE-4222-ACB9-F0DAAF53F4C8),
helpstring("IQTTest Interface"),
pointer_default(unique)
]
interface IQTTest : IUnknown{
[helpstring("method GetActiveAction")] HRESULT GetActiveAction([out] IQTAction** pActionOut);
[helpstring("method Run")] HRESULT Run(void);
[helpstring("Run a specific action")] HRESULT RunAction([in] IQTAction* pActionIn);
};
[
uuid(D56E2107-DCF0-4039-9127-7B5B89EFF4C2),
version(1.0),
helpstring("COMServer 1.0 Type Library")
]
library COMServerLib
{
importlib("stdole2.tlb");
[
uuid(C3FA9039-39F2-411F-8555-8063EF07E17A),
helpstring("QTTest Class")
]
coclass QTTest
{
[default] interface IQTTest;
};
[
uuid(EFF340AC-86DE-4127-B20B-4B96B64C5064),
helpstring("QTAction Class")
]
coclass QTAction
{
[default] interface IQTAction;
};
};
// CQTTest.h
class ATL_NO_VTABLE CQTTest :
public CComObjectRootEx<CComSingleThreadModel>,
public CComCoClass<CQTTest, &CLSID_QTTest>,
public IQTTest
{
public:
CQTTest()
{
m_pActiveAction.CoCreateInstance(CLSID_QTAction); //construct an action;
}
DECLARE_REGISTRY_RESOURCEID(IDR_QTTEST)
DECLARE_NOT_AGGREGATABLE(CQTTest)
BEGIN_COM_MAP(CQTTest)
COM_INTERFACE_ENTRY(IQTTest)
END_COM_MAP()
DECLARE_PROTECT_FINAL_CONSTRUCT()
HRESULT FinalConstruct()
{
return S_OK;
}
void FinalRelease()
{
::MessageBox(NULL,L"#################QTTest is being disposed",L"info",0);
}
public:
STDMETHOD(GetActiveAction)(IQTAction** pActionOut);
STDMETHOD(Run)(void);
STDMETHOD(RunAction)(/*in*/IQTAction* pActionIn);
private:
CComPtr<IQTAction> m_pActiveAction;
public:
};
OBJECT_ENTRY_AUTO(__uuidof(QTTest), CQTTest)
//CQTTest.cpp
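STDMETHODIMP CQTTest::GetActiveAction(IQTAction** pActionOut)
{
if (pActionOut == NULL) return E_POINTER; // reject a null out-parameter
*pActionOut = m_pActiveAction;
if (*pActionOut == NULL) return E_FAIL; // the action was never created
(*pActionOut)->AddRef(); // this line is *very important*
return S_OK;
}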
STDMETHODIMP CQTTest::RunAction(IQTAction * pActionIn)
{
pActionIn->Run();
return S_OK;
}
And here is my action object:
// CQTAction.h
class ATL_NO_VTABLE CQTAction :
public CComObjectRootEx<CComSingleThreadModel>,
public CComCoClass<CQTAction, &CLSID_QTAction>,
public IQTAction
{
public:
CQTAction()
{
}
DECLARE_REGISTRY_RESOURCEID(IDR_QTACTION)
DECLARE_NOT_AGGREGATABLE(CQTAction)
BEGIN_COM_MAP(CQTAction)
COM_INTERFACE_ENTRY(IQTAction)
END_COM_MAP()
DECLARE_PROTECT_FINAL_CONSTRUCT()
HRESULT FinalConstruct()
{
return S_OK;
}
void FinalRelease()
{
::MessageBox(NULL,L"********************************** QTAction is being disposed",L"info",0);
}
public:
STDMETHOD(Run)(void);
};
OBJECT_ENTRY_AUTO(__uuidof(QTAction), CQTAction)
// CQTAction.cpp
STDMETHODIMP CQTAction::Run(void)
{
::MessageBox(NULL,L"Action is running...",L"info",0);
return S_OK;
}
Note that I pop up a message box inside each FinalRelease() method to indicate that the COM object is being disposed. This is how we can confirm that no memory leak happens.
To understand how the RCW interacts with COM, first take a look at how native C++ talks to COM. Here is a small example:
#include "stdAfx.h"
#include "COMServer_i.h"
#include "COMServer_i.c"
void RunActiveAction(IQTTest * pTest)
{
CComPtr<IQTAction> pAction;
pTest->GetActiveAction(&pAction);
pTest->RunAction(pAction);
}
int main()
{
CoInitializeEx( NULL, COINIT_APARTMENTTHREADED );
CComPtr<IQTTest> pTest;
HRESULT hr = ::CoCreateInstance(CLSID_QTTest,NULL,CLSCTX_INPROC_SERVER,IID_IQTTest,(void **)&pTest);
ATLASSERT(SUCCEEDED(hr));
RunActiveAction((IQTTest*)pTest);
}
Let's call it our 'NativeComConsumer'. Run it and we should see message boxes saying:
1. Action is running;
2. Test object is being disposed;
3. Action object is being disposed;
which is exactly what we expected.
Take a look at RunActiveAction. We use a CComPtr instance to hold the reference to the active action. The reference count of the active action is increased inside CQTTest::GetActiveAction, and it is decreased again when the CComPtr is destructed at the end of the method. So after the call to RunActiveAction returns, the underlying reference count of the active action is still one, the same as when the QTAction object was initially created. This shows a basic COM coding rule: you need to increase the reference count of a COM object before you pass it out.
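The counting described above can be traced without any COM machinery. The following is only a portable sketch: FakeAction, ScopedRef and RefsAfterClientCall are hypothetical stand-ins invented for illustration (for the action object, for CComPtr, and for RunActiveAction respectively), and the object sets a flag instead of deleting itself so that its state stays inspectable.

```cpp
#include <cassert>

// Hypothetical stand-in for a COM object: only the reference-counting part.
struct FakeAction {
    long refs = 1;          // a freshly created object starts with one reference
    bool destroyed = false; // set instead of `delete this` so the state can be inspected
    void AddRef() { ++refs; }
    void Release() { if (--refs == 0) destroyed = true; }
};

// Hypothetical stand-in for CComPtr: releases what it holds when it goes out of scope.
struct ScopedRef {
    FakeAction* p = nullptr;
    ~ScopedRef() { if (p) p->Release(); }
};

// Server side: follows the rule and calls AddRef before handing the pointer out.
void GetActiveAction(FakeAction& active, FakeAction** out) {
    *out = &active;
    (*out)->AddRef();
}

// Client side: mirrors RunActiveAction and reports the count left afterwards.
long RefsAfterClientCall(FakeAction& active) {
    {
        ScopedRef action;
        GetActiveAction(active, &action.p); // count goes 1 -> 2
    }                                       // ~ScopedRef releases: count goes 2 -> 1
    return active.refs;
}
```

Calling RefsAfterClientCall on a fresh FakeAction leaves the count at one and the object alive, which matches the walkthrough above.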
If we don't follow this rule, say, we don't have the
(*pActionOut)->AddRef();
inside CQTTest::GetActiveAction, what will happen?
Well, it depends entirely on how the client consumes it. Take the 'NativeComConsumer' as an example. Nobody calls AddRef on the action, neither on the COM server side nor on the COM client side, yet the client still calls Release when RunActiveAction returns (remember that when the method returns, pAction runs ~CComPtr, which releases the COM object it points to). So right after RunActiveAction returns, the reference count drops to zero and the active action held by our test object is destroyed; we can confirm this by the message box saying the action is being disposed. The application then crashes when CQTTest itself is disposed, because destroying m_pActiveAction calls Release on an action object that no longer exists.
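The failure sequence can be modelled in portable C++ without risking real undefined behaviour. This is a sketch under stated assumptions: FakeAction, ScopedRef, BrokenGetActiveAction and OwnerReleasesDeadObject are hypothetical names made up for illustration, and the object records a release-after-destroy in a flag where real code would touch freed memory.

```cpp
#include <cassert>

// Hypothetical stand-in for the action object.
struct FakeAction {
    long refs = 1;                 // the reference owned by the test object
    bool destroyed = false;
    bool releasedAfterDestroy = false;
    void AddRef() { ++refs; }
    void Release() {
        if (destroyed) {           // real code would be touching freed memory here
            releasedAfterDestroy = true;
            return;
        }
        if (--refs == 0) destroyed = true;
    }
};

// Hypothetical stand-in for CComPtr.
struct ScopedRef {
    FakeAction* p = nullptr;
    ~ScopedRef() { if (p) p->Release(); }
};

// Server side with the AddRef call deliberately left out.
void BrokenGetActiveAction(FakeAction& active, FakeAction** out) {
    *out = &active;
}

// Returns true when the owner's final Release hits an already destroyed object.
bool OwnerReleasesDeadObject() {
    FakeAction active;                            // count 1, owned by the "test object"
    {
        ScopedRef client;
        BrokenGetActiveAction(active, &client.p); // count is still 1
    }                                             // client releases: 1 -> 0, destroyed
    bool destroyedEarly = active.destroyed;
    active.Release();                             // what destroying m_pActiveAction would do
    return destroyedEarly && active.releasedAfterDestroy;
}
```

In this model OwnerReleasesDeadObject() returns true: the object dies as soon as the client scope ends, and the owner's own release arrives too late.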
Of course we can explicitly call AddRef on the client side, for example by adding this to RunActiveAction (going through the raw pointer, because CComPtr's operator-> deliberately hides AddRef and Release):
pAction.p->AddRef();
This also fixes the crash, *but* it does not make much sense, because a COM server that relies on it is basically unusable. We can't expect every COM client to know that the server has done something wrong and that it needs to do extra work to compensate. Besides, when interoperating with scripting languages such as Python, VBScript or JavaScript, the COM client has no way to do it at all. That's why it is so important to follow this rule when coding COM.
Doing COM interoperation in .NET is somewhat like doing it in Python, VBScript or JavaScript. You don't have as much control over the real COM object as you do in native C++. The .NET Framework has already wrapped the COM object for you, and your code is supposed to deal with that wrapper. The wrapper is called the RCW.
The RCW acts as a proxy that helps your .NET code communicate with the COM server. As .NET developers, we deal with the RCW instead of the real COM object. Microsoft forces developers to do this to prevent us from making memory mistakes. Even so, we still make mistakes if we don't know exactly how the RCW works. Here are some of the most basic rules:
- For each COM object there is only one corresponding RCW instance. But the RCW can hold more than one reference to the underlying COM object!
- The .NET Framework manages an RCW much as COM manages its objects: both use a reference count to track how many clients are using the object. If the same COM object is handed to .NET code in several places, the RCW reference count is increased, but the COM object's reference count stays the same.
- All references to the COM object that the RCW is holding are released when the RCW reference count hits zero, or when the RCW is garbage collected.
So we control the COM object's reference count by managing the RCW's reference count. If we get it wrong, the underlying COM object may stay alive when we expect it to have been released (a memory leak), or it may be released while we still need it (a crash in some cases, or the .NET exception saying the COM object has been separated from its underlying RCW).
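The bookkeeping in the rules above can be pictured with a toy model. This is emphatically not how the CLR implements RCWs; FakeComObject, FakeRcw, Trace and RunRcwScenario are hypothetical names, and the sketch assumes the simplest case in which the wrapper holds exactly one COM reference.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the native object; only the count is modelled.
struct FakeComObject {
    long refs = 1;                      // held by whoever created it
    void AddRef() { ++refs; }
    void Release() { --refs; }
};

// Toy model of an RCW: one wrapper per COM object, with its own counter.
class FakeRcw {
public:
    explicit FakeRcw(FakeComObject& obj) : obj_(&obj), rcwRefs_(1) { obj_->AddRef(); }
    void OnMarshalledIn() { EnsureAttached(); ++rcwRefs_; } // COM count is untouched
    int ReleaseComObject() {            // models Marshal.ReleaseComObject
        EnsureAttached();
        if (--rcwRefs_ == 0) { obj_->Release(); obj_ = nullptr; }
        return rcwRefs_;
    }
    void Use() const { EnsureAttached(); }
private:
    void EnsureAttached() const {
        if (obj_ == nullptr) throw std::runtime_error("COM object separated from its RCW");
    }
    FakeComObject* obj_;
    int rcwRefs_;
};

// COM-side counts observed after each step of the scenario.
struct Trace {
    long afterCreate = 0, afterSecondMarshal = 0, afterFirstRelease = 0, afterLastRelease = 0;
    bool throwsWhenUsed = false;
};

Trace RunRcwScenario() {
    Trace t;
    FakeComObject com;                  // COM count 1
    FakeRcw rcw(com);                   // COM count 2, RCW count 1
    t.afterCreate = com.refs;
    rcw.OnMarshalledIn();               // RCW count 2, COM count unchanged
    t.afterSecondMarshal = com.refs;
    rcw.ReleaseComObject();             // RCW count 1
    t.afterFirstRelease = com.refs;
    rcw.ReleaseComObject();             // RCW count 0: the COM reference is dropped
    t.afterLastRelease = com.refs;
    try { rcw.Use(); } catch (const std::runtime_error&) { t.throwsWhenUsed = true; }
    return t;
}
```

In this model the COM-side count only moves when the wrapper is created and when the wrapper's own count reaches zero; any use of the wrapper after that point fails, which is the toy equivalent of the "separated" exception.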
1. When is the RCW reference count increased?
The .NET Framework increases the RCW reference count whenever it thinks it is needed; we have no control over this at all. All we need to know is in which situations .NET increases it. Here is a good answer I found on Stack Overflow:
<cite author= "Arthur">
Short answer: Every time the COM object is passed from COM environment to .NET.
Long answer:
- For each COM object there is one RCW object [Test 1] [Ref 4]
- Reference count is incremented each time the object is requested from within COM object (calling property or method on COM object that return COM object, the returned COM object reference count will be incremented by one) [Test 1]
- Reference count is not incremented by casting to other COM interfaces of the object or moving the RCW reference around [Test 2]
- Reference count is incremented each time an object is passed as a parameter in event raised by COM [Ref 1]
</cite>
2. When is the RCW reference count decreased?
We have to decrease the RCW reference count manually by calling Marshal.ReleaseComObject, passing the RCW reference whose count we want to decrease. .NET will NEVER do this for you when a reference goes out of scope; the only automatic cleanup is the garbage collector eventually collecting the RCW. In other words, .NET is responsible for increasing the count, and you are responsible for decreasing it.
Why? Why didn't the designers of the .NET Framework decrease the RCW reference count when, say, an RCW reference goes out of scope, as we do in the COM world? (Remember that in the 'NativeComConsumer' example, each CComPtr releases the underlying COM object when it goes out of scope.) Why does .NET treat the RCW so differently? Compare the native version:
void RunActiveAction(IQTTest * pTest)
{
CComPtr<IQTAction> pAction;
pTest->GetActiveAction(&pAction);
pTest->RunAction(pAction);
}
And the same method in .NET:
static void RunActiveAction(IQTTest pTestRCW)
{
IQTAction pActionRCW;
pTestRCW.GetActiveAction(out pActionRCW);
pTestRCW.RunAction(pActionRCW);
}
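In the .NET version nothing releases pActionRCW when the method returns, so it is important for us to call
Marshal.FinalReleaseComObject(pActionRCW);
at the end of the .NET example above. (Marshal.ReleaseComObject decrements the RCW reference count by one; Marshal.FinalReleaseComObject drops it straight to zero.)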
The underlying reason, as Ian Griffiths put it, is this:
1. COM assumes you won’t be holding onto the object reference when the method returns;
2. The RCW assumes you *will* be holding onto the object reference when the method returns.
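The practical consequence of these two assumptions can be counted out in a small model. This is a sketch only: FakeRcw, ManagedRunActiveAction and CountAfterCalls are hypothetical names, the two release methods merely imitate the documented behaviour of Marshal.ReleaseComObject (decrement by one) and Marshal.FinalReleaseComObject (drop to zero), and one long-lived managed reference is assumed to exist before the calls start.

```cpp
#include <cassert>

// Toy RCW counter; models only the bookkeeping, not the CLR implementation.
struct FakeRcw {
    int rcwRefs = 0;
    bool comReleased = false;           // true once the wrapper lets go of the COM object
    void OnMarshalledIn() { ++rcwRefs; }
    int ReleaseComObject() {            // imitates Marshal.ReleaseComObject
        if (rcwRefs > 0 && --rcwRefs == 0) comReleased = true;
        return rcwRefs;
    }
    int FinalReleaseComObject() {       // imitates Marshal.FinalReleaseComObject
        while (rcwRefs > 0) ReleaseComObject();
        return rcwRefs;
    }
};

// Models the managed RunActiveAction: the action is marshalled in and the local
// variable then simply goes out of scope. Leaving scope decrements nothing.
void ManagedRunActiveAction(FakeRcw& actionRcw, bool releaseExplicitly) {
    actionRcw.OnMarshalledIn();
    if (releaseExplicitly) actionRcw.ReleaseComObject();
}

// RCW count left after `calls` invocations, starting from one existing reference.
int CountAfterCalls(int calls, bool releaseExplicitly) {
    FakeRcw rcw;
    rcw.OnMarshalledIn();               // assumed long-lived managed reference
    for (int i = 0; i < calls; ++i) ManagedRunActiveAction(rcw, releaseExplicitly);
    return rcw.rcwRefs;
}
```

Without the explicit release, every call leaves one more count behind, so the wrapper never lets go of the COM object until it is garbage collected. With one release per call the count stays balanced, and a final release clears whatever has accumulated in a single step.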
Acknowledgement
The following two articles/posts were extremely helpful to me in understanding what goes on behind the scenes: