Introduction
“MS Office Automation using C++” is what I started searching the internet for a few weeks back, when I needed to plot a graph in an Excel sheet from my program. I found a few inputs, actually very few, perhaps because I am a poor searcher. This article is for those who are still searching the internet with the same keywords.
Object Linking and Embedding
In my early days, I wondered how much easier it was to use Visual Basic than any other programming language: create a media player just by including the Media Player component in the project, create a web browser just by including the Web Browser component, and so on. When I tried the same with Visual C++, I realized it was not as easy as in VB, and as a newbie I ran into a lot of linker errors. That story is eight years old. Object Linking and Embedding, popularly known as OLE, is a COM-based architecture which provides flexibility and reusability while developing software applications. As I said, if you need to develop a media player application, there is not much code to write: include the needed components, which have already been developed by experts. With OLE, we link to the component and embed it in our application. OLE is everywhere in Windows: you can copy and paste images, videos, or music files into a Word document, you can open a PDF, Excel, or Word file in Internet Explorer, and so on.
You can find lots of registered components of different applications under HKEY_CLASSES_ROOT\CLSID\{<___CLSID___>}, where {<___CLSID___>} varies (it is the unique class ID of each registered component).
COM and Interfaces
I won't be able to say anything new about COM and interfaces here. A COM object, as its name suggests, is a component which can be easily attached to any application through its interfaces. A COM component may have any number of interfaces, and an application does not need to use all of them. In C++ terms, an interface is nothing but a class made up only of pure virtual functions. It has no implementation code, and is used only for communication between an application and a COM object.
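As a rough illustration of the "pure virtual class" idea, the sketch below mimics the shape of a COM interface in portable C++. The names IUnknownLike, IGreeter, and Greeter are made up for this example, and the string-based QueryInterface() is a simplification: real COM uses IUnknown, GUIDs, and HRESULT return values.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for IUnknown: lifetime management plus interface lookup.
struct IUnknownLike {
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
    // Simplification: real COM identifies interfaces by GUID, not by string.
    virtual bool QueryInterface(const char* name, void** out) = 0;
protected:
    ~IUnknownLike() = default;  // clients destroy objects only through Release()
};

// A second interface: again only pure virtual functions, no data, no code.
struct IGreeter : IUnknownLike {
    virtual std::string Greet(const std::string& who) = 0;
};

// The component: implements the interfaces and counts its own references.
class Greeter final : public IGreeter {
public:
    static IGreeter* Create() { return new Greeter; }  // starts with one reference

    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        unsigned long left = --refs_;
        if (left == 0) delete this;
        return left;
    }
    bool QueryInterface(const char* name, void** out) override {
        assert(out != nullptr);
        if (std::strcmp(name, "IGreeter") == 0 ||
            std::strcmp(name, "IUnknownLike") == 0) {
            *out = static_cast<IGreeter*>(this);
            AddRef();  // every interface pointer handed out owns a reference
            return true;
        }
        *out = nullptr;
        return false;
    }
    std::string Greet(const std::string& who) override { return "Hello, " + who; }

private:
    Greeter() = default;
    ~Greeter() = default;
    unsigned long refs_ = 1;
};
```

The client only ever sees IGreeter pointers; it never names the Greeter class, which is the same separation that lets our program drive MS Word without knowing how Word is implemented.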
MS Office Automation Using C++
Let’s start with what Microsoft has to say about “MS Office Automation using C++”:
“Automation (formerly OLE Automation) is a technology that allows you to take advantage of an existing program's functionality and incorporate it into your own applications.”
- With MFC, use the Visual C++ ClassWizard to generate "wrapper classes" from the Microsoft Office type libraries. These classes, as well as other MFC classes, such as COleVariant, COleSafeArray, and COleException, simplify the tasks of Automation. This method is usually recommended over the others, and most of the Microsoft Knowledge Base examples use MFC.
- #import, a new directive that became available with Visual C++ 5.0, creates VC++ "smart pointers" from a specified type library. It is very powerful, but often not recommended because of reference-counting problems that typically occur when used with the Microsoft Office applications.
- C/C++ Automation is much more difficult, but sometimes necessary to avoid overhead with MFC, or problems with #import. Basically, you work with such APIs as CoCreateInstance(), and COM interfaces such as IDispatch and IUnknown.
The above statements are taken directly from the Microsoft article “Office Automation Using Visual C++”. This article is all about the third point mentioned above, i.e., C/C++ Automation using COM interfaces, and it uses only MS Word to explain the details. Refer to the demo source code for the equivalent MS Excel code.
Initialize an MSWord Application
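// Initialize COM for this thread; pair with CoUninitialize() when done.
HRESULT hr = CoInitialize(NULL);
CLSID clsid;
IDispatch *pWApp = NULL;
if(SUCCEEDED(hr))
{
    // Look up the class ID registered for the version-independent ProgID.
    hr = CLSIDFromProgID(L"Word.Application", &clsid);
}
if(SUCCEEDED(hr))
{
    // Start Word as an out-of-process server and ask for its IDispatch.
    hr = CoCreateInstance(clsid, NULL, CLSCTX_LOCAL_SERVER,
                          IID_IDispatch, (void **)&pWApp);
}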
Call CoInitialize() to initialize the COM library for the current thread. Later, once every interface pointer has been released, call CoUninitialize() on the same thread to close the COM library again. As I mentioned earlier, all registered components can be found under “HKCR\CLSID\” in the Registry. There you can also find the ProgID (program ID) of the component. Use the CLSIDFromProgID() API to get the class ID for the component, since we can create the COM object only by using its CLSID. For MS Word, “Word.Application.xx” is the version-dependent ProgID, where xx is the version of MS Word installed on the system. For our convenience, to write code independent of the version, MS Word provides another ProgID, “Word.Application”, which is listed under “VersionIndependentProgID”. Call CoCreateInstance() with the MS Word CLSID to get an instance of the MS Word application. On success, pWApp receives a valid IDispatch interface pointer to the MS Word component.
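Because every successful CoInitialize() has to be matched by a CoUninitialize() on the same thread, it may help to tie the pair to a scope. Below is a minimal sketch of that idea in portable C++. ScopedPair is a hypothetical helper (it is not part of COM or MFC), and the init/cleanup steps are passed in as callables so that the sketch does not depend on the Windows headers.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper: runs `enter` once, and runs `leave` on scope exit
// only if `enter` reported success.
class ScopedPair {
public:
    ScopedPair(const std::function<bool()>& enter, std::function<void()> leave)
        : leave_(std::move(leave)), ok_(enter()) {}
    ~ScopedPair() { if (ok_ && leave_) leave_(); }

    ScopedPair(const ScopedPair&) = delete;
    ScopedPair& operator=(const ScopedPair&) = delete;

    bool ok() const { return ok_; }

private:
    std::function<void()> leave_;
    bool ok_;
};

// On Windows, the intended use would look like this (assumes <objbase.h>):
//
//   ScopedPair com([] { return SUCCEEDED(CoInitialize(NULL)); },
//                  [] { CoUninitialize(); });
//   if (!com.ok()) return;  // COM is unavailable on this thread
//   // ... create and use Word here, and release every interface
//   // pointer before `com` goes out of scope ...
```

The point of the design is that an early return or an error path cannot skip the cleanup call, and a failed initialization is never "undone".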
IDispatch Interface
IDispatch is an interface derived from IUnknown (the base interface), through which applications expose their methods and properties to other applications (our program) so that those can make use of their features. Simply put, the IDispatch pointer we got from CoCreateInstance() for MS Word is the interface object which lets us use MS Word methods and properties from our program. In addition to the IUnknown members, IDispatch has four more member functions to support OLE Automation:
- GetTypeInfoCount()
- GetTypeInfo()
- GetIDsOfNames()
- Invoke()
The client application (our program) uses the IDispatch::Invoke() method to call the methods and properties of MS Word (or any other component). But IDispatch::Invoke() does not accept the actual method or property names of the MS Word component; it understands only a DISPID. A DISPID is a 32-bit value which identifies a particular method or property of a component. GetIDsOfNames() is the function we use to get the DISPID for a method or property name. For example, the following code sets the “Visible” property of an MS Word object:
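DISPID dispID;
OLECHAR szName[] = L"Visible";   // member name as a wide (OLE) string
LPOLESTR ptName = szName;
hr = pWApp->GetIDsOfNames(IID_NULL, &ptName, 1, LOCALE_USER_DEFAULT, &dispID);
if(SUCCEEDED(hr))
{
    VARIANT x;
    VariantInit(&x);
    x.vt = VT_I4;
    x.lVal = 1;                  // non-zero means "true"
    // A property put must name its single argument DISPID_PROPERTYPUT.
    DISPID prop = DISPID_PROPERTYPUT;
    DISPPARAMS dp = { NULL, NULL, 0, 0 };
    dp.cArgs = 1;
    dp.rgvarg = &x;
    dp.cNamedArgs = 1;
    dp.rgdispidNamedArgs = &prop;
    // A property put returns no value, so the result pointer is NULL.
    hr = pWApp->Invoke(dispID, IID_NULL, LOCALE_SYSTEM_DEFAULT, DISPATCH_PROPERTYPUT,
                       &dp, NULL, NULL, NULL);
}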
First get the DISPID of “Visible”, then use that DISPID with Invoke() to set the “Visible” property to true. ptName holds the actual name of a method or a property, and is passed to GetIDsOfNames() to get the equivalent DISPID. DISPPARAMS carries the arguments for the call (the method parameters or the property value), and is passed to Invoke(), which makes the actual call to the method or property.
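The two-step protocol (resolve a name to an ID, then call by ID) can also be seen in isolation with a small portable mock. Everything in this sketch is a stand-in invented for illustration: MockDispatch is not a COM object, DispId replaces DISPID, and plain bool/long replace HRESULT, DISPPARAMS, and VARIANT.

```cpp
#include <cassert>
#include <map>
#include <string>

using DispId = long;                         // stand-in for DISPID
enum CallKind { PropertyGet, PropertyPut };  // stand-ins for DISPATCH_PROPERTYGET/PUT

const DispId kIdVisible = 1;                 // arbitrary ID chosen by the "component"

// Hypothetical late-bound object exposing a single property, "Visible".
class MockDispatch {
public:
    MockDispatch() { ids_["Visible"] = kIdVisible; }

    // Step 1: translate a member name into its numeric ID (like GetIDsOfNames).
    bool GetIdOfName(const std::string& name, DispId* id) const {
        std::map<std::string, DispId>::const_iterator it = ids_.find(name);
        if (it == ids_.end()) return false;  // unknown member
        *id = it->second;
        return true;
    }

    // Step 2: perform the call using only the ID (like Invoke).
    bool Invoke(DispId id, CallKind kind, long* value) {
        if (id != kIdVisible || value == nullptr) return false;
        if (kind == PropertyPut) visible_ = (*value != 0);
        else                     *value = visible_ ? 1 : 0;
        return true;
    }

private:
    std::map<std::string, DispId> ids_;
    bool visible_ = false;
};
```

Note that Invoke() never sees the string "Visible"; only the caller and the lookup step deal with names, which is exactly why the real code calls GetIDsOfNames() first.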
To make the code easier to use, the following generic function calls an OLE method or sets/gets an OLE property:
HRESULT OLEMethod(int nType, VARIANT *pvResult,
                  IDispatch *pDisp, LPOLESTR ptName, int cArgs, ...)
{
    if(!pDisp) return E_FAIL;

    // Resolve the member name to its DISPID.
    DISPID dispID;
    HRESULT hr = pDisp->GetIDsOfNames(IID_NULL, &ptName, 1,
                                      LOCALE_USER_DEFAULT, &dispID);
    if(FAILED(hr)) {
        return hr;
    }

    // Copy the variable arguments; callers pass them in reverse order.
    VARIANT *pArgs = new VARIANT[cArgs+1];
    va_list marker;
    va_start(marker, cArgs);
    for(int i=0; i<cArgs; i++) {
        pArgs[i] = va_arg(marker, VARIANT);
    }
    va_end(marker);

    DISPPARAMS dp = { NULL, NULL, 0, 0 };
    DISPID dispidNamed = DISPID_PROPERTYPUT;
    dp.cArgs = cArgs;
    dp.rgvarg = pArgs;
    if(nType & DISPATCH_PROPERTYPUT) {
        // A property put must name its value argument DISPID_PROPERTYPUT.
        dp.cNamedArgs = 1;
        dp.rgdispidNamedArgs = &dispidNamed;
    }

    hr = pDisp->Invoke(dispID, IID_NULL, LOCALE_SYSTEM_DEFAULT,
                       nType, &dp, pvResult, NULL, NULL);

    // Free the argument array on both success and failure.
    delete [] pArgs;
    return hr;
}
The above function is named AutoWrap() in a Microsoft support article. There is nothing new to explain about the function, since I have already explained the GetIDsOfNames() and Invoke() calls; they have simply been moved into a function. Additionally, the function uses variable arguments to handle the different number of parameters for different methods/properties. Now, to set the Visible property of the MS Word object, it is simpler to use this generic function:
VARIANT x;
x.vt = VT_I4;
x.lVal = 1;
hr = OLEMethod(DISPATCH_PROPERTYPUT, NULL, pWApp, L"Visible", 1, x);
Note that OLEMethod() receives variable parameters, i.e., you can pass any number of parameters depending on the property/method. Following is a summary of the OLEMethod() parameters:
- nType – Type of call to make, which can be any of the following values:
  - DISPATCH_PROPERTYPUT - Set property value
  - DISPATCH_PROPERTYGET - Get property value
  - DISPATCH_METHOD - Call a method
- pvResult – Return value for the call made; it can be another IDispatch object, an integer value, a boolean, and so on.
- pDisp – IDispatch interface object for which the call is to be made.
- ptName – Property or method name.
- cArgs – Number of arguments that follow this parameter.
- … – Parameters for the call, in reverse order (these can be the values of a property, or the parameters of a method of the IDispatch object).
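The reverse-order rule is easy to get wrong by hand, so one option is to write the arguments in their documented order and reverse them in one place. The helper below is only a sketch of that idea: toDispatchOrder() is a hypothetical name, and std::string stands in for VARIANT so that the example compiles without the Windows headers.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: takes arguments in the order the method documents them
// and returns them in the order DISPPARAMS::rgvarg expects (last argument first).
template <class Arg>
std::vector<Arg> toDispatchOrder(std::vector<Arg> natural) {
    std::reverse(natural.begin(), natural.end());
    return natural;
}

// Example, using the AddPicture parameter names as plain strings:
//
//   std::vector<std::string> args = toDispatchOrder<std::string>(
//       {"FileName", "LinkToFile", "SaveWithDocument"});
//   // args[0] is now "SaveWithDocument", the first value to hand to rgvarg
```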
Methods and Properties
The MS Word application has a large number of properties and methods, and not everything can be explained here. I will explain a couple of functions; refer to the source code for more, because the code for all the method/property calls looks similar. At the end of this section, I will tell you how to find a method name or property name and its parameters or values whenever needed.
To open a Word document:
HRESULT CMSWord::OpenDocument(LPCTSTR szFilename, bool bVisible)
{
    if(m_pWApp==NULL)
    {
        if(FAILED(m_hr=Initialize(bVisible)))
            return m_hr;
    }
    COleVariant vFname(szFilename);
    VARIANT fname=vFname.Detach();   // we now own the BSTR in fname
    {
        // Application.Documents
        VARIANT result;
        VariantInit(&result);
        m_hr=OLEMethod(DISPATCH_PROPERTYGET, &result, m_pWApp,
                       L"Documents", 0);
        if(SUCCEEDED(m_hr))
            m_pDocuments= result.pdispVal;
    }
    if(SUCCEEDED(m_hr))
    {
        // Documents.Open(fname) returns the opened document
        VARIANT result;
        VariantInit(&result);
        m_hr=OLEMethod(DISPATCH_METHOD, &result, m_pDocuments,
                       L"Open", 1, fname);
        if(SUCCEEDED(m_hr))
            m_pActiveDocument = result.pdispVal;
    }
    VariantClear(&fname);            // free the detached BSTR
    return m_hr;
}
To create a new document, replace “Open” with “Add” and change the parameter count to 0. Note that “result” is the output parameter which receives the IDispatch object for “Documents” and for the document returned by “Open” (the active document) in the above code.
To close all the opened Word documents:
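HRESULT CMSWord::CloseDocuments()
{
    if(m_pWApp==NULL || m_pDocuments==NULL) return E_FAIL;
    {
        // Documents.Close()
        VARIANT result;
        VariantInit(&result);
        m_hr=OLEMethod(DISPATCH_METHOD, &result, m_pDocuments,
                       L"Close", 0);
        // Release our references before dropping the pointers.
        if(m_pActiveDocument) m_pActiveDocument->Release();
        m_pDocuments->Release();
        m_pDocuments=NULL;
        m_pActiveDocument=NULL;
    }
    return m_hr;
}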
The following code will set the font for the selected text in the active document:
HRESULT CMSWord::SetFont(LPCTSTR szFontName, int nSize,
                         bool bBold, bool bItalic, COLORREF crColor)
{
    if(!m_pWApp || !m_pActiveDocument) return E_FAIL;
    // ActiveDocument.Application
    IDispatch *pDocApp;
    {
        VARIANT result;
        VariantInit(&result);
        OLEMethod(DISPATCH_PROPERTYGET, &result,
                  m_pActiveDocument, L"Application", 0);
        pDocApp= result.pdispVal;
    }
    // Application.Selection
    IDispatch *pSelection;
    {
        VARIANT result;
        VariantInit(&result);
        OLEMethod(DISPATCH_PROPERTYGET, &result,
                  pDocApp, L"Selection", 0);
        pSelection=result.pdispVal;
    }
    // Selection.Font
    IDispatch *pFont;
    {
        VARIANT result;
        VariantInit(&result);
        OLEMethod(DISPATCH_PROPERTYGET, &result,
                  pSelection, L"Font", 0);
        pFont=result.pdispVal;
    }
    {
        // Font.Name: keep the detached BSTR so it can be freed afterwards.
        COleVariant oleName(szFontName);
        VARIANT vName = oleName.Detach();
        m_hr=OLEMethod(DISPATCH_PROPERTYPUT, NULL, pFont, L"Name", 1, vName);
        VariantClear(&vName);
        // The remaining properties all take a 32-bit integer value.
        VARIANT x;
        x.vt = VT_I4;
        x.lVal = nSize;
        m_hr=OLEMethod(DISPATCH_PROPERTYPUT, NULL, pFont, L"Size", 1, x);
        x.lVal = crColor;
        m_hr=OLEMethod(DISPATCH_PROPERTYPUT, NULL, pFont, L"Color", 1, x);
        x.lVal = bBold?1:0;
        m_hr=OLEMethod(DISPATCH_PROPERTYPUT, NULL, pFont, L"Bold", 1, x);
        x.lVal = bItalic?1:0;
        m_hr=OLEMethod(DISPATCH_PROPERTYPUT, NULL, pFont, L"Italic", 1, x);
    }
    // Release the intermediate objects in reverse order of acquisition.
    pFont->Release();
    pSelection->Release();
    pDocApp->Release();
    return m_hr;
}
To insert a picture into the active document:
HRESULT CMSWord::InsertPicture(LPCTSTR szFilename)
{
    if(!m_pWApp || !m_pActiveDocument) return E_FAIL;
    // ActiveDocument.Application
    IDispatch *pDocApp;
    {
        VARIANT result;
        VariantInit(&result);
        OLEMethod(DISPATCH_PROPERTYGET, &result, m_pActiveDocument,
                  L"Application", 0);
        pDocApp= result.pdispVal;
    }
    // Application.Selection
    IDispatch *pSelection;
    {
        VARIANT result;
        VariantInit(&result);
        OLEMethod(DISPATCH_PROPERTYGET, &result, pDocApp, L"Selection", 0);
        pSelection=result.pdispVal;
    }
    // Selection.InlineShapes
    IDispatch *pInlineShapes;
    {
        VARIANT result;
        VariantInit(&result);
        OLEMethod(DISPATCH_PROPERTYGET, &result, pSelection, L"InlineShapes", 0);
        pInlineShapes=result.pdispVal;
    }
    {
        COleVariant varFile(szFilename);
        COleVariant varLink((BYTE)0);
        COleVariant varSave((BYTE)1);
        VARIANT vFile = varFile.Detach();   // owns the BSTR until VariantClear
        // Arguments go in reverse order: SaveWithDocument, LinkToFile, FileName.
        m_hr=OLEMethod(DISPATCH_METHOD,NULL,pInlineShapes,L"AddPicture",3,
                       varSave.Detach(),varLink.Detach(),vFile);
        VariantClear(&vFile);
    }
    // Release the intermediate objects so Word can shut down cleanly.
    pInlineShapes->Release();
    pSelection->Release();
    pDocApp->Release();
    return m_hr;
}
How can we identify the method or property we need in MS Word? The answer is simple. Look at the above functions; each of them can be written compactly as:
Application.Documents.Open(szFilename)
Documents.Close()
ActiveDocument.Application.Selection.Font.Name=szFontName
ActiveDocument.Application.Selection.Font.Size=nSize
ActiveDocument.Application.Selection.Font.Color=crColor
ActiveDocument.Application.Selection.Font.Bold=bBold
ActiveDocument.Application.Selection.Font.Italic=bItalic
ActiveDocument.Application.Selection.InlineShapes.AddPicture(szFilename,false,true)
Does this look familiar? Yes, we see this often when creating macros in MS Word or MS Excel: these are VBA scripts. So, don’t you think it is now much easier to find the method or property name you need for your application?
What do you do when you need to know how to insert a picture into a Word document?
- Open MS Word
- Open a new document
- Go to Tools->Macro->Record New Macro
- Choose a keyboard macro and assign a key for the macro
- After the macro recording has been started, go to Insert->Picture->From File
- Choose an image file
- Stop macro
- Go to Tools->Macro->Visual Basic Editor (Alt+F11)
- Under “NewMacros”, you can see your recorded macro at the end
- Look at the code:
Selection.InlineShapes.AddPicture FileName:=<your-image>, _
    LinkToFile:=False, SaveWithDocument:=True
Now, compare this with the above InsertPicture() function so that you can understand how it is coded in C++.
So, whatever task you want to do with MS Word through automation, first do it with a sample macro in MS Word itself, and you will get to know the methods and their parameters, the property names, and their values. The job is then easy with the OLEMethod() call.
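To make the mapping from a recorded macro line to OLEMethod() calls concrete, here is a small portable sketch that turns a dotted VBA path into a call plan. Both splitMemberPath() and planCalls() are hypothetical helpers written for this illustration; they only produce text describing the calls and do not talk to Word.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split "Selection.InlineShapes.AddPicture" into the
// names that would be passed, one per call, to OLEMethod().
std::vector<std::string> splitMemberPath(const std::string& path) {
    std::vector<std::string> names;
    std::istringstream in(path);
    std::string name;
    while (std::getline(in, name, '.')) {
        if (!name.empty()) names.push_back(name);
    }
    return names;
}

// Describe the call sequence: every name but the last is a property get that
// yields the next IDispatch object; the last name is the actual operation.
std::vector<std::string> planCalls(const std::string& path,
                                   const std::string& finalKind) {
    std::vector<std::string> plan;
    const std::vector<std::string> names = splitMemberPath(path);
    for (std::size_t i = 0; i < names.size(); ++i) {
        const bool last = (i + 1 == names.size());
        plan.push_back((last ? finalKind : std::string("DISPATCH_PROPERTYGET")) +
                       " " + names[i]);
    }
    return plan;
}
```

For "Selection.InlineShapes.AddPicture" with a final kind of DISPATCH_METHOD, the plan has three steps: get Selection, get InlineShapes, then call AddPicture. That is the same shape as the body of the picture-insertion function shown earlier.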
Points to Note
- VARIANT is a structure containing a union, and literally means “not sure”. Yes, we are not sure about the data type, so it can be anything. Using VARIANT, we can carry a BYTE value, an unsigned long value, an IUnknown object, or whatever is needed in the case. Those who are already familiar with COM should know about this.
- Be careful with the DISPATCH_METHOD, DISPATCH_PROPERTYPUT, and DISPATCH_PROPERTYGET usage in OLEMethod() or in the Invoke() call. We need to decide which one to use depending on the method or property we call with OLEMethod().
- Try to understand GetIDsOfNames() and Invoke(), explained in the "IDispatch Interface" section of this article, which are more important than the other information provided here.
- COleVariant is the class version of the VARIANT structure (union), which makes it easier to initialize a variant with a value.
- The OLEMethod() function receives variable parameters in reverse order. For example, the AddPicture parameters are actually <filename, linktofile, savewithdocument> in order, whereas in the OLEMethod() call they are passed as <savewithdocument, linktofile, filename>.
- If your Word or Excel application remains in memory (check with Task Manager) after closing your client application, make sure you have released all the used IDispatch objects.
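For the last point, forgotten Release() calls are the usual culprit, and a small RAII holder can make them hard to forget. The following is a minimal sketch, not the article's code: ScopedRelease is a hypothetical name, and FakeDispatch is a test double standing in for IDispatch so that the example compiles without the Windows headers. In real projects, ATL's CComPtr plays this role.

```cpp
#include <cassert>

// Hypothetical RAII holder: calls Release() on the wrapped pointer when the
// holder goes out of scope. T can be any type with a Release() member,
// for example IDispatch.
template <class T>
class ScopedRelease {
public:
    explicit ScopedRelease(T* p = nullptr) : p_(p) {}
    ~ScopedRelease() { reset(); }

    ScopedRelease(const ScopedRelease&) = delete;
    ScopedRelease& operator=(const ScopedRelease&) = delete;

    T* get() const { return p_; }
    T* operator->() const { assert(p_ != nullptr); return p_; }

    // Release the current pointer (if any) and take ownership of a new one.
    void reset(T* p = nullptr) {
        if (p_) p_->Release();
        p_ = p;
    }

private:
    T* p_;
};

// Test double with the same Release() shape as a COM interface.
struct FakeDispatch {
    int refs = 1;
    unsigned long Release() { return static_cast<unsigned long>(--refs); }
};
```

Holders declared in the order Application, Selection, Font are destroyed in the opposite order, which matches the manual release order used in the font-setting function above.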
Conclusion
All the concepts explained above are the same for Excel as well. Refer to the demo source code for Excel usage. The aim of this article is not to give you the complete set of methods and properties for MS Word and MS Excel Automation, but to give you a hand in doing the Automation yourself. All the best.