Introduction
A project that I carried out for a client took an unexpected turn, which left me, a software engineer whose main experience is in Java, facing the challenge of building an application that would revolve around C++, C# and COM Interop. I have some experience in C++ and C#, but COM Interop was uncharted terrain, and at the time I was blissfully unaware of terms such as 'DLL hell' and others with similar connotations. Glancing through the various COM blogs on the Internet didn't prepare me for the issues I was to face only days later. Yes, there were some issues you would have to take into account, but hey, just work with this great bit of code and your Office automation would work like a charm! The example code provided with the stories looked simple enough, so the promise of exploiting the power of Office automation seemed appealing.
I am now three months further and wiser and no, I do not regret the choice to plunge into a project that required a connection between unmanaged and managed code, speckled with COM calls not only to Microsoft Office applications but also to proprietary software. However, I would have liked to know then what I know now. Considering the many forums and newsgroups I had to work through in order to find answers to the many vague errors and exceptions I ran into, I decided to write this article to help others avoid the pits I happily jumped into. I did COM Interop the hard way, and hopefully this article will keep other newbies from following in my footsteps.
The Problem
The project I was hired for aimed to provide simple reporting functionality in Microsoft Word for an existing proprietary application. The application is a mathematical modelling tool (a bit like MATLAB), but my client missed functionality to export the graphs, tables and bitmaps to a word processor. The application supported a plug-in structure that consisted of a subdirectory with a number of DLLs that conformed to a certain structure. My reporting tool would therefore be implemented as an additional plug-in that would add a menu item to the application which, upon clicking, would open a form that allowed the user to prepare the report and export the required graphs and tables to Microsoft Word. As an additional advantage, the application provided a type library that allowed it to work as a COM server. This server was quite extensive, so it would allow me fine-grained control over the information that was contained in the application. Of course COM Interop was also the preferred choice on the side of Office automation, so the global architecture was quickly determined:
Figure 1: Global Architecture of the Reporting Tool
The first setback in this simple plan dawned when I couldn't get the plug-in to work in managed C++. The plug-in required a number of libraries that caused all kinds of alien compiler and linker errors, so it became clear that this would have to be done in unmanaged (i.e. old-fashioned C++) code. As I didn't want to miss out on the neat functionality that is provided in .NET, I decided that I would implement an interface between the unmanaged plug-in and the actual reporting functionality, which would be coded in (managed) .NET, using C#. The interface would be as simple as possible and would consist of a command structure (strings) that would make requests to the reporting tool. This tool would itself be implemented as a COM object that would coordinate the calls to Microsoft Office and the application's COM server. The application would therefore consist of an unmanaged DLL (the plug-in) and a managed DLL (assembly) that provided the actual reporting functionality. The following sections describe the various issues that I had to deal with in order to make this work, including all the vague errors and exceptions that are related to connecting the various parts together. This includes deployment issues on the client's target computer, which ran a different Windows version and used a different version of Microsoft Office.
Headache 1: Unmanaged and Managed Code
The first step of my project would be to implement a very simple interface that would pass the menu event to the managed environment in order to open a slick form. I managed to circumvent the first potential pitfall by buying a good book on COM interoperability that included a good discussion on combining managed and unmanaged code. Andrew Troelsen's "COM and .NET Interoperability" was generally highly recommended and indeed proved to be a valuable aid, especially since the forum articles on the Internet (for instance on The Code Project) were usually targeted at very specific applications. The general tendency of these articles was very optimistic: do this, do that and 'It Just Works'.
The first issue that one needs to grasp is that developing an interface between the managed and unmanaged domains starts in the managed domain. This is easy enough, as it means defining an interface IMyInterface in .NET (using, say, C#) and adding attributes that allow the methods to be registered with COM:
using System;
using System.Runtime.InteropServices;
using System.Windows.Forms;

namespace MyNamespace
{
    // COM-visible interface; the GUID becomes the interface ID (IID).
    [Guid("D4660088-308E-49fb-AB1A-72724F3F8F51")]
    [ComVisible(true)]
    [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    public interface IMyInterface
    {
        void openForm();
    }

    // COM-visible implementation; the GUID becomes the class ID (CLSID).
    [ClassInterface(ClassInterfaceType.None)]
    [Guid("46A951AC-C2D9-48e0-97BE-9F13C9E70B65")]
    [ComVisible(true)]
    public class MyImplementingClass : IMyInterface
    {
        // ManagedForm is a Form subclass defined elsewhere in the project.
        ManagedForm form;

        public MyImplementingClass()
        {
        }

        public void openForm()
        {
            // Only one form at a time.
            if (this.form != null)
                return;
            try
            {
                this.form = new ManagedForm();
                this.form.Disposed += new EventHandler(form_Disposed);
                this.form.Show();
            }
            catch (Exception ex)
            {
                // An unhandled exception would only reach the caller as a failure HRESULT.
                MessageBox.Show(ex.Message + "\n" + ex.StackTrace);
            }
        }

        void form_Disposed(object sender, EventArgs e)
        {
            this.form.Disposed -= new EventHandler(form_Disposed);
            this.form = null;
        }
    }
}
Snippet 1: Example Interface and Implementation
As I am focusing on the pitfalls, I will not explain the code or the various COM attributes, as there are loads of articles on COM Interop on the Internet. The most important attribute is the so-called 'Guid', which has to be a unique ID that is used to register the interface (and its implementations) in the Windows registry. These are usually copied and pasted from the example code you base your implementation on. I usually swap four random digits in order to rule out the unlikely chance that existing DLLs are based on the same sample code I use. Microsoft also provides a tool for this, guidgen.exe, which is shipped with Visual Studio and creates a unique GUID (I originally looked for something called guid.exe, which is why it was so hard to find). Besides this, the swapping strategy is, although not a recommended approach, fairly safe if creating COM libraries is not a regular activity. If the interface is in a C# assembly of the type class library (project properties => application in Visual Studio), the solution should build a DLL that can be accessed by others... in theory.
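Swapping digits by hand is not really necessary, because generating a fresh identifier takes only a few lines. On Windows the usual routes are guidgen.exe, the Win32 function CoCreateGuid or, in .NET, Guid.NewGuid(). Purely as an illustration of what such a tool produces, here is a minimal portable C++ sketch that builds a random (version 4) GUID string. The name makeRandomGuid is a hypothetical helper, and the standard-library random generator is an assumption made for the sake of the example; it is not what the Microsoft tools use internally.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>

// Returns a random (version 4) GUID as text, in the 8-4-4-4-12 layout
// expected by the [Guid("...")] attribute (no surrounding braces).
// Illustration only: std::mt19937_64 is not a cryptographic generator.
std::string makeRandomGuid()
{
    std::random_device seed;
    std::mt19937_64 engine(seed());
    std::uniform_int_distribution<unsigned int> byteDist(0, 255);

    std::array<std::uint8_t, 16> bytes{};
    for (auto& b : bytes)
        b = static_cast<std::uint8_t>(byteDist(engine));

    // Mark the value as "version 4, RFC 4122 variant".
    bytes[6] = static_cast<std::uint8_t>((bytes[6] & 0x0F) | 0x40);
    bytes[8] = static_cast<std::uint8_t>((bytes[8] & 0x3F) | 0x80);

    static const char* const hexDigits = "0123456789ABCDEF";
    std::string text;
    for (std::size_t i = 0; i < bytes.size(); ++i)
    {
        // Hyphens go before bytes 4, 6, 8 and 10.
        if (i == 4 || i == 6 || i == 8 || i == 10)
            text += '-';
        text += hexDigits[bytes[i] >> 4];
        text += hexDigits[bytes[i] & 0x0F];
    }
    return text;
}
```

The returned string can be pasted into a Guid attribute. For real projects, guidgen.exe or Guid.NewGuid() remain the more sensible choice.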
The first problem one runs into is the question of how the DLL will make its existence known to those other DLLs. One option (and the recommended one) is to 'upgrade' your DLL to become a full-fledged COM object itself. As the combination of managed code and GUIDs has already taken most of the necessary steps to achieve this, the main additional step is to add your DLL to the Global Assembly Cache (GAC) of Windows, so that the .NET runtime can find it regardless of which application loads it (the COM registration itself is done with regasm.exe, which is discussed under Headache 2). Yes, the GAC is a folder in Windows, but no, I will not try to explain where it is, for the simple reason that it would give people the wrong idea about how registering should occur. Instead, rely on the gacutil.exe tool that is provided with Visual Studio (e.g. c:\Program Files\Microsoft Visual Studio 8\SDK\bin\gacutil.exe). After setting the correct paths, open a command prompt and enter:
gacutil /i MyLibrary.dll /f
and your assembly is added to the cache. Piece of cake, huh? Well... the problems have now started.
One of the mantras of software development is 'loose coupling', and COM Interop is a nice example of an attempt to create loose coupling between different software libraries. Most of us experienced programmers will be enthusiastic advocates of this design principle, but we often forget that there is an interesting bifurcation point between loose coupling and no coupling whatsoever! Consider yourself blessed when you add your DLL to the GAC and you get strange and unexpected behaviour, for at least you are seeing something happening! The chances are much higher that you will register and see nothing at all. There are a number of pitfalls at this point:
- Do not trust the 'Register for COM interop' option (project properties => build). Although it is good to check this option, I have seen it fail to register assemblies quite often. Running gacutil by hand is more reliable (the /f option in the previous example forces a reinstall, so the new DLL overwrites an assembly with the same name and version that is already in the GAC, which is very useful during the development stage). But even then, an existing DLL may be 'stuck' in the GAC if, for instance, it is being used by another application. Gacutil will not always report a failure in that case, giving you the false impression that all went well.
- Use the ComVisible attribute. Snippet 1 shows the use of ComVisible in both the interface definition and the implementation. There is an awful lot of confusion about this attribute, but the fact of the matter is that newer versions of Visual Studio set this attribute to 'false' by default in the AssemblyInfo.cs file in your project's Properties folder. The result is that your interface is not exposed to the other DLLs if you don't add the ComVisible attribute to your interface.
- Use exception handling extensively in your methods. If an exception occurs when a DLL calls one of your interface's methods, it will not be routed to the default logs, so adding a system trace or, as in Snippet 1, showing a message box will be a great help in pinpointing exceptions in managed code.
- Remember to sign your code (project properties => signing). An assembly needs a strong name before it can be added to the GAC, and signing also prevents some runtime errors that warn you that the DLL cannot be used.
TIP
When developing a class library assembly, it may be a good idea to add a second test project (for instance a Windows or console application project) to your solution that calls the interface you are developing:
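using System;
using System.Security.Permissions;
using System.Windows.Forms;

static class Program
{
    [STAThread]
    [SecurityPermission(SecurityAction.Demand, UnmanagedCode = true)]
    static void Main()
    {
        Application.EnableVisualStyles();
        Application.SetCompatibleTextRenderingDefault(false);
        // TestInterfaceForm is the test project's form that calls the interface.
        Application.Run(new MyTestApplication.TestInterfaceForm());
    }
}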
When running this code in Visual Studio, the following warning is an indication that the newly built assembly has not been added to the GAC (yet):
Run the gacutil command shown earlier after building your project (from the debug / release directory of your project, or any other location where the most recent assembly is located) and the warning message will go away. Ignoring this will usually (not always) leave you running a previous version of the assembly.
The oleview.exe tool that is also provided with Visual Studio (e.g. c:\Program Files\Microsoft Visual Studio 8\Common7\tools\bin\oleview.exe) can help in checking whether the DLL was registered successfully.
Headache 2: Integrating the Managed DLL and the Unmanaged Environment
The assembly is now ready for use, but the unmanaged code still needs to connect to it. In order to achieve this, the unmanaged code needs to know the structure of the interface that has been made. This can be done by creating a so-called "type library" that corresponds with the implemented interface. The tool tlbexp.exe (e.g. C:\Program Files\Microsoft Visual Studio 8\SDK\bin\tlbexp.exe) that is shipped with Visual Studio, or regasm.exe (c:\windows\Microsoft.NET\framework\v...\regasm.exe) that is provided with the .NET framework, can create type libraries from an existing DLL; regasm also registers the assembly for COM in the Windows registry:
regasm MyLibrary.dll /tlb:MyTypeLibrary.tlb
This will create a type library with the name MyTypeLibrary.tlb in the currently active folder. This often gives rise to problems when an older version of your DLL is already registered with Windows, for then the old type library will be used rather than the new one. This can be checked with the 'View TypeLib' option of oleview, which is represented by the button with three red arrows. As the type library describes the structure of the interface that was implemented, this structure will be displayed in oleview. If there is a mismatch, then it is likely that regasm.exe or tlbexp.exe did not use the newly built DLL. In my experience, regasm and tlbexp do not give any useful warning or error messages that indicate a failure to register. The best way to prevent type library problems is to close all the applications that may be connected to your library, and then perform an unregister operation before registering the new DLL:
regasm MyLibrary.dll /u
This obviously only needs to be done when the interface structure has changed, but as by now we are doing quite a bit of typing after every build of our assembly, we may just as well create a batch file (e.g. register.bat) that we call every time the assembly is built:
regasm MyLibrary.dll /u
regasm MyLibrary.dll /tlb:MyTypeLibrary.tlb
gacutil /i MyLibrary.dll /f
Snippet 2: register.bat
This approach is the best way to ensure that the type library always corresponds with every updated DLL we build. The type library can now be imported in the Visual Studio C++ project.
#ifndef MY_INTERFACE_H
#define MY_INTERFACE_H

// Thin unmanaged wrapper around the managed COM interface.
class MyInterface
{
public:
    long OpenForm();
};

#endif
Snippet 3: mylib.h
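#include <windows.h>

// #import generates a *.tlh header from the type library. The generated
// declarations live in a namespace named after the type library, so adjust
// MyNamespace below if your library is named differently.
#import ".\VC8\managed\MyLibrary.tlb" raw_interfaces_only named_guids
#include "resource.h"
#include "mylib.h"

MyNamespace::IMyInterfacePtr pDotNetCOMPtr;

BOOL APIENTRY DllMain(HMODULE hModule, DWORD ul_reason_for_call,
                      LPVOID lpReserved)
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
        MessageBox(NULL, TEXT("Dll is loading!"), TEXT("DllMain() says..."), MB_OK);
        break;
    case DLL_PROCESS_DETACH:
        MessageBox(NULL, TEXT("Dll is UNloading!"), TEXT("DllMain() says..."), MB_OK);
        break;
    }
    return TRUE;
}

long MyInterface::OpenForm(void)
{
    CoInitialize(NULL);
    // named_guids defines CLSID_<coclass name> for the implementing class.
    HRESULT hRes = pDotNetCOMPtr.CreateInstance(
        MyNamespace::CLSID_MyImplementingClass);
    if (SUCCEEDED(hRes))
        hRes = pDotNetCOMPtr->openForm();
    pDotNetCOMPtr = NULL;   // release the object before COM is uninitialized
    CoUninitialize();
    return hRes;
}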
Snippet 4: mylib.cpp
When this project is built in Visual C++, the type library is converted to a *.tlh file which represents the interface. In the example above, the interface is wrapped in a class that closely resembles it. The C++ project needs to be rebuilt every time the structure of the interface changes (in effect, when a new type library is copied to the C++ project). Building this project creates the (unmanaged) DLL that was used as a plug-in by the proprietary application.
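When a call from the unmanaged plug-in into the managed library fails, the HRESULT is often the only clue you get, so it pays to log it in hexadecimal rather than only testing for success. The following is a small sketch of how an HRESULT can be interpreted. To keep it compilable without the Windows SDK it uses its own stand-in type and constants: the sketch namespace, HResult, succeeded and describe are names made up for this example. The numeric values are the ones defined in winerror.h, and succeeded mirrors the SUCCEEDED() macro. In the real plug-in you would use HRESULT and SUCCEEDED() directly.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

namespace sketch {

// Stand-ins for HRESULT and a few well-known values from <winerror.h>.
using HResult = std::int32_t;
constexpr HResult kOk             = 0;                                  // S_OK
constexpr HResult kFalse          = 1;                                  // S_FALSE
constexpr HResult kNoInterface    = static_cast<HResult>(0x80004002u);  // E_NOINTERFACE
constexpr HResult kClassNotReg    = static_cast<HResult>(0x80040154u);  // REGDB_E_CLASSNOTREG
constexpr HResult kNotInitialized = static_cast<HResult>(0x800401F0u);  // CO_E_NOTINITIALIZED

// Same rule as SUCCEEDED(): the sign bit marks failure, so S_FALSE is a success code.
inline bool succeeded(HResult hr) { return hr >= 0; }

// Formats an HRESULT for a trace line or a message box.
std::string describe(HResult hr)
{
    const char* hint = "";
    switch (hr)
    {
    case kNoInterface:    hint = " (interface not supported)"; break;
    case kClassNotReg:    hint = " (class not registered)"; break;
    case kNotInitialized: hint = " (CoInitialize has not been called)"; break;
    default: break;
    }
    char hex[16];
    std::snprintf(hex, sizeof(hex), "0x%08X", static_cast<unsigned int>(hr));
    return std::string(hex) + (succeeded(hr) ? " success" : " failure") + hint;
}

} // namespace sketch
```

A value of 0x80040154 returned by CreateInstance, for example, means that the class is not registered, which usually points back at the regasm step.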
With all the tooling and work structures in place, cross-development between managed and unmanaged environments is quite stable. On one or two occasions it was necessary to completely remove all references to my library from the Windows registry (using regedit.exe provided by Windows), but as long as I kept to a strict routine of closing all applications that might be connected to my DLL (this obviously includes Office applications when developing Office automation applications) and consistently using the batch file after building the library, everything went quite well. Sadly it had taken me three weeks to get to that point.
Headache 3: COM Interop
By now, I had managed to open a .NET form by clicking a menu item in the proprietary application. The next step consisted of developing the reporting functionality, which consisted of COM calls to both Microsoft Office applications and the COM interface of the proprietary application. In all honesty, developing this was rather straightforward. There are lots of good examples on the Internet of automating Microsoft Word, and the proprietary application's COM interface worked quite well too. Testing the functionality is a rather slow process, but luckily most calls to Microsoft Office could be tested from the test project, so I didn't have to continuously open and close applications during testing. The headaches during this phase were therefore minor.
Microsoft's preferred approach to Office automation is to open a new document, worksheet, etc., in which the reporting can be done. Examples of the alternative approach, connecting to an already opened document or worksheet, are therefore hard to find on the Internet. The System.Runtime.InteropServices.Marshal.GetActiveObject method can be used to achieve this, for instance in a 'connect' method that looks for an Office application that is already running, connects to it if it is found, or opens a new application object if not. Conversely, a 'disconnect' method uses Marshal.ReleaseComObject to release the object. This consistent use of 'connect' and 'disconnect' methods with respect to COM Interop improves the development cycle greatly, as the chance becomes much smaller that other applications are still connected to the assembly when a new DLL is added to the GAC. I also decided to implement the classes that contained these methods as singletons (one for every COM library), which also greatly reduced the chance of blocking the assembly in the GAC.
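The connect and disconnect logic described above boils down to: attach to a running instance if there is one, otherwise start a new one, and always release what you hold. In C# the lookup is done with Marshal.GetActiveObject, which throws a COMException when no instance is running, and the release with Marshal.ReleaseComObject. The sketch below shows only the control flow, in portable C++ and without any real COM calls. AppHandle, OfficeConnection and the two injected callbacks are hypothetical names invented for this illustration; they are not part of any Microsoft API and not the code of the actual reporting tool.

```cpp
#include <functional>
#include <memory>
#include <string>

// Hypothetical stand-in for a COM application object such as Word.
struct AppHandle
{
    std::string origin;   // "attached" or "created", for demonstration only
};

// One connection object per COM library, implemented as a singleton.
class OfficeConnection
{
public:
    using Factory = std::function<std::shared_ptr<AppHandle>()>;

    static OfficeConnection& instance()
    {
        static OfficeConnection theInstance;
        return theInstance;
    }

    // findRunning plays the role of Marshal.GetActiveObject and returns an
    // empty pointer when nothing is running; createNew starts a new application.
    bool connect(const Factory& findRunning, const Factory& createNew)
    {
        if (app_)
            return true;              // already connected, nothing to do
        app_ = findRunning();
        if (!app_)
            app_ = createNew();
        return app_ != nullptr;
    }

    // Counterpart of Marshal.ReleaseComObject: drop the reference we hold.
    void disconnect() { app_.reset(); }

    bool connected() const { return app_ != nullptr; }
    const AppHandle* app() const { return app_.get(); }

private:
    OfficeConnection() = default;
    OfficeConnection(const OfficeConnection&) = delete;
    OfficeConnection& operator=(const OfficeConnection&) = delete;

    std::shared_ptr<AppHandle> app_;
};
```

Because every caller goes through the same instance and calls disconnect when it is done, there is a single, predictable place where the reference to the Office application is held and released.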
With the development speed greatly improved, confidence returned that I had made the right decisions. Full of regained confidence I therefore brought my reporting tool to the client for deployment of the application...
Splitting Headaches: Deployment
The term 'DLL Hell' manifests itself in full glory when deploying an application. To give a rough sketch, I had been developing my application on a Windows XP OS, using Visual Studio 2005 and Microsoft Office 2003. The client used Windows 2000 and Microsoft Office 2000. We both used the same proprietary application. I believe that many a reader who has been through the hell is already smirking at this prospect...
Installing the reporting tool consisted of installing .NET, adding the plug-in to the plug-in directory of the proprietary application and finally adding the reporting tool to GAC. It seemed simple and should be simple, but alas, it wasn't.
This section walks through the deployment by way of the various vague error messages I encountered. I have seen many software developers struggle with similar messages on various Internet forums, and most of them get replies telling them what the errors mean, without any mention of the cause of these errors. That is, if they get replies at all!
Vague Exception: "Msvcr80.dll Not Found"
The first vague exception I encountered referred to a mysterious msvcr80.dll. The cause of this error is that a DLL built with Visual C++ 2005 depends on a number of runtime libraries that are installed with Visual C++ 2005. If these are not available on the client's computer, which is normally the case, the application starts begging for these DLLs. Msvcr80.dll is the most likely library it will request (msvcr80d.dll will be requested if the application was built in debug mode).
Although there are a number of remedies that are suggested on the Internet, the most practical one is to download the required libraries from Microsoft (for 32-bit PCs this is vcredist_x86.exe). This executable copies the required library files to their designated locations, but... it will not work immediately on a pre-Windows XP OS such as Windows 2000.
In the old days, DLLs were added to the System32 folder under the Windows root. With XP, this policy has changed. Instead, a WinSxS folder has been introduced that is the root of a tree structure containing a number of application-specific subfolders. Vcredist_x86.exe adheres to this new convention, which results in nothing happening on a Windows 2000 OS after the installation is complete. Obviously Windows 2000 does not recognise WinSxS, and so the libraries are not found. The best ways to circumvent this are either to add paths pointing to the newly added folders in WinSxS (recommended) or to copy the libraries into the System32 folder. A detailed description of these issues can be found here.
Vague System.AccessViolationException
"Attempted to read or write protected memory. This is often an indication that other memory is corrupt."
This error occurred when a COM call was made to Microsoft Office. After extensive Internet research, it appeared that this error was related to the version of Microsoft Office that was used. Contrary to intuition, the Microsoft Office COM libraries are not backward compatible. If one develops an automation project against the COM Interop libraries of an Office version that is newer than the one on the target machine (so the references you add in your .NET project point to these newer COM objects), the system is likely to throw the above exception, or similar ones, when your application is deployed. Office applications are forward compatible with respect to COM objects, so newer versions of Microsoft Office will accept automation libraries of older Office versions (which also means that you are restricted to the functionality available in that older version). It is therefore important that the application you develop uses the COM objects of the oldest version of Office that it should support.
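The compatibility rule above is easy to get backwards, so here it is spelled out as a tiny sketch. It uses the Office major version numbers (8 for Office 97, 9 for Office 2000, 10 for Office XP, 11 for Office 2003); the function names interopUsable and versionToBuildAgainst are hypothetical, and the rule they encode is the rule of thumb from this article rather than an official compatibility guarantee.

```cpp
#include <algorithm>
#include <vector>

// Rule of thumb: an interop library built against Office version
// 'builtAgainst' can only be expected to work when the installed Office
// version is the same or newer.
inline bool interopUsable(int builtAgainst, int installed)
{
    return installed >= builtAgainst;
}

// The version to build against is the oldest one that must be supported.
// Returns 0 when the list is empty.
inline int versionToBuildAgainst(const std::vector<int>& mustSupport)
{
    if (mustSupport.empty())
        return 0;
    return *std::min_element(mustSupport.begin(), mustSupport.end());
}
```

For my situation (developing on Office 2003, deploying on Office 2000) this gives version 9 as the one to build against, which is exactly what I had not done.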
An additional complication is that the Microsoft.Office.Interop DLLs are only shipped since Office 2003. With older versions of Microsoft Office, you will have to generate the DLLs yourself using the type libraries that were included in the installation. These libraries have the extension .olb (e.g. excel8.olb, msword9.olb, etc.) and the corresponding DLLs can be generated with the tlbimp.exe tool that is shipped with Visual Studio (e.g. c:\Program Files\Microsoft Visual Studio 8\SDK\bin\tlbimp.exe):
tlbimp excel8.olb /keyfile:MyApplicationKeyPair.snk /out:Microsoft.Office.Interop.Excel.dll
Note that in this case, a signed DLL is made using the key pair in the MyApplicationKeyPair.snk file. This is because the DLL has to go into the GAC and is referenced by a signed assembly, both of which require a signed (strong-named) library. All the required DLLs can be created this way. Remember that these DLLs all have to be added to the GAC on the target computer. It is also likely that source code needs to be changed after adding these DLLs to the .NET project, as some COM calls may have changed in newer versions.
Vague Exception: "The Type Initializer for ... Threw an Exception"
Yes, this exception is actually raised in the managed environment by an exception that you have programmed to catch somewhere. You may even pinpoint this exception to a line of code in your C# project. If you do, you will probably notice that the application is trying to make a call to one of those libraries we just made from the type libraries.
The exception is actually raised when the libraries have not been added to the GAC on the target computer or, less likely, when they need to be updated. This is hardly likely to occur for the Office DLLs, but the proprietary application I used was a COM server, and so the .NET project created a new Interop.ProprietaryApplication.dll every time the project was rebuilt. I had assumed that installing the proprietary software had also taken care of everything its COM server needed on the target machine, but actually only the type library had been registered in the Windows registry; the generated interop assembly was not in the GAC.
This mistake became apparent when I first started working with the Microsoft Office 2000 DLLs I had just created and forgot to add one of them to the GAC. Suddenly I got the exception that had been nagging me for a few days (but which at the time wasn't high on my priority list) for another library, and this time I immediately knew what was wrong.
It goes to show that sometimes sloppiness pays off, I guess.
The lesson learnt here is to remember to add all the Interop libraries your application needs to the target machine's GAC, and to update them on the rare occasion that this is required. As this does not need to be done often (usually once per target machine), I decided to create a batch file called install.bat that is basically the same as register.bat, but with a number of additional calls to gacutil.exe:
regasm MyLibrary.dll /u
regasm MyLibrary.dll /tlb:MyTypeLibrary.tlb
gacutil /i MyLibrary.dll /f
gacutil /i Interop.ProprietaryApplication.dll /f
gacutil /i Microsoft.Office.Interop.Word.dll /f
gacutil /i Microsoft.Office.Interop.Excel.dll /f
gacutil /i Microsoft.Office.Interop.PowerPoint.dll /f
And with this I finally got everything running the way it should be..., with a four-week delay on my original estimates.
Final Remarks
I usually work in a Java environment, and therefore I can imagine that very experienced .NET and COM programmers may frown at some of the explanations that are given here, or at the solutions that I came up with. I can also imagine that other programmers who faced the same daunting journey through DLL hell may have run into additional problems that have not been described here. I have no pretence or ambition to be a .NET or COM expert. In fact, this article reflects the issues of someone who faced COM interoperability for the first time, with very little experience in that area, and found himself facing an enormous gap between the 'It Just Works' hallelujah on one side and the enormously fragmented forum discussions on DLL hell on the other, especially those related to the exceptions I had to deal with. By focusing on errors and exceptions instead of programming, I hope to make this gap a bit smaller for all those others who have to deal with COM Interop.
For, in all honesty, once everything works it really adds a tremendous range of functionality to your programs.