Introduction
This code demonstrates how to access Excel spreadsheets using C++, COM and ATL using Visual Studio .NET 2003 (v7.1).
Background
I'm an old school C++ programmer and much of my work revolves around using C++. Although I'm not opposed to using new methodologies like C# (and have in fact for other consulting gigs) I always like to come back to C++.
First, the reason why I needed to access Excel in the first place.
My company is developing a product for Windows 2000 and XP and needs to get its strings translated. The Powers-That-Be have decided that the files we'll be delivering to our translator will be Excel files, one per project. All well and good.
But I'm kind of lazy and I don't feel like maintaining by hand a spreadsheet littered with translation strings. I'd much rather write a program that extracts the strings (which reside in .xml files) and stuff them into the Excel spreadsheet. Well, the first part was easy. Just use the handy-dandy IXMLDOMDocument
COM interface. Piece of cake.
Then I started exploring how to access Excel programmatically via C++.
Oh... my... God! The lack of documentation on this subject wasn't just skimpy. It was ludicrously non-existent. There were a few scant articles about how to use MFC to do it (don't try it, unless you like generating a billion little modules over the course of a few minutes while the MFC import wizard merrily generates IDispatch
code wrappers for Excel's many interfaces). But I'd abandoned MFC long ago to worship at the altar of ATL and WTL, and never looked back.
After hours of search I managed to find one link that talked about where the various interfaces are stored for the various Excel versions. Since I have Office 2003 installed, I went with that one. (It's stored in the actual Excel.EXE binary, by the way).
"Great!" I said to myself. Just add an #import line to stdafx.h and I'm done.
#import "C:\\Program Files\\Microsoft Office\\OFFICE11\\EXCEL.EXE"
Yeah, right. Boy, was I in for disappointment. You get missing, duplicate, triplicate, googleplicate error messages, warnings, curses and general barking behavior from the compiler and/or linker. Time to roll up the sleeves and get ready to wrestle with include files and #import directives.
The first thing I noticed was that a lot of the missing classes had 'mso' prefixes. Hmmmm... Wonder if that means Microsoft Office? Sure enough, it appears that all of the Office products require the same MSO.DLL to perform a lot of their work. So add that in.
#import "C:\\Program Files\\Common Files\\Microsoft Shared\\OFFICE11\\MSO.DLL"
#import "C:\\Program Files\\Microsoft Office\\OFFICE11\\EXCEL.EXE"
Well, that helped a lot. But it was still barking about 'definitions' not having enough parameters. It seems that a lot of the properties, method names, etc. in Microsoft Office collide with standard C/C++ #define
declarations. Fortunately, the Visual Studio hierophants over at Microsoft anticipated this and provided a mechanism for resolving these, namely the 'rename' option for the #import directive. It also appears that Office is deeply in bed with Visual Basic, because you need an #import for that as well.
After all was said and done, I finally came up with the magically alchemical formulae for including the COM interfaces of Excel into a C++ program.
#import "C:\\Program Files\\Common Files\\Microsoft Shared\\OFFICE11\\MSO.DLL" \
rename( "RGB", "MSORGB" )
using namespace Office;
#import "C:\\Program Files\\Common Files\\Microsoft Shared\\VBA\\VBA6\\VBE6EXT.OLB"
using namespace VBIDE;
#import "C:\\Program Files\\Microsoft Office\\OFFICE11\\EXCEL.EXE" \
rename( "DialogBox", "ExcelDialogBox" ) \
rename( "RGB", "ExcelRGB" ) \
rename( "CopyFile", "ExcelCopyFile" ) \
rename( "ReplaceText", "ExcelReplaceText" )
Voila!
I tried getting fancy and including a 'using namespace Excel;' as well, but the compiler started making growling noises while pawing the ground and I decided to quit while I was ahead. I can type in a few 'Excel::' prefixes to keep things social.
Using the code
The application is a simple WTL dialog based applet. The dialog contains a single list-view control in report mode. It's compiled with Unicode so I don't have to worry about messy MBCS strings. All the code that interfaces with Excel is in the OnLoad( )
method that gets invoked when the user presses the "Load..." button.
The first thing this method does is nuke the items in the list control.
m_list.DeleteAllItems( );
while ( m_list.DeleteColumn( 0 ) );
Then it prompts the user to select the Excel file by using the WTL class CFileDialog
. WTL, by the way, if you aren't altogether that familiar with it, is a truly barebones wrapper (with a few notable exceptions) around the Win32 API. Case in point. The CFileDialog
takes a pointer to a series of null terminated strings that define the filters to help the user with the file selection process. These are somewhat of a pain to define on the fly so I wrote a simple class that takes a single string with vertical bars (|) that are placeholders for the null terminators. It has an operator LPCTSTR()
method that returns a pointer to a string buffer with the vertical bars replaced with their null terminators.
I give to you the AFileFilter
class.
class AFileFilter
{
public:
AFileFilter( LPCTSTR pszFilter ) :
m_strFilter( pszFilter ),
m_pszFilter( NULL )
{
m_pszFilter = m_strFilter.GetBuffer( 0 );
LPTSTR psz = m_pszFilter;
while ( *psz )
{
LPTSTR pszNext = ::CharNext( psz );
if ( *psz == _T('|') )
*psz = _T('\0');
psz = pszNext;
}
return;
}
virtual ~AFileFilter( )
{
m_strFilter.ReleaseBuffer( );
return;
}
public:
operator LPCTSTR( ) const
{
return ( m_pszFilter );
}
protected:
CString m_strFilter;
LPTSTR m_pszFilter;
};
Now to the meat of the method. First we need a pointer to the Excel application. If you run this in the debugger you'll notice a couple second pause when tracing through this code as Excel is loaded by COM. I took this to be a good sign the first time I debugged the program: I was actually invoking Excel! (I always step through lines of new code in the debugger the first time I run a program, don't you?)
Excel::_ApplicationPtr pApplication;
if ( FAILED( pApplication.CreateInstance( _T("Excel.Application") ) ) )
{
Errorf( _T("Failed to initialize Excel::_Application!") );
return;
}
The Errorf()
method just allows me to quickly format a message with optional parameters on the fly for the user.
Next, we have to open the .xls file in Excel. This is done via the Workbooks property and its Open()
method.
_variant_t varOption( (long) DISP_E_PARAMNOTFOUND, VT_ERROR );
Excel::_WorkbookPtr pBook;
pBook =
pApplication->Workbooks->Open( dlgFile.m_szFileName,
varOption, varOption, varOption, varOption,
varOption, varOption, varOption, varOption,
varOption, varOption, varOption, varOption );
If you leave out even one of those varOption
parameters, it won't work. Gee, I wish I knew what they all did.
Next I get the pointer to the first Sheet in the workbook. Naturally the index to the first Sheet is 1, not 0. (Remember: Office is in bed with Visual Basic. MPEG at 11mps on BitTorrent!)
Excel::_WorksheetPtr pSheet = pBook->Sheets->Item[ 1 ];
Phew! Finally, we can start sifting through the spreadsheet data. Pretty near as I can figure out, this is all done via the Range
object. Now the Range
object, if used in a scripted language like C# or Visual Basic is a bold, wonderful thing with playful interfaces and powerful methods. In C++, it's like arguing with Zeus while he's fingering his Aegis.
The basic method for using it, I found, was just to oversupply it with a range. I then used a hack by scanning for empty cells as an the end-of-column or end-of-row marker. I'm sure there are much more elegant ways of using this object in C++, which I carefully leave as an exercise for the student.
First thing I do is fetch the first row, which I'm assuming is the header row delineating all the column names.
Excel::RangePtr pRange =
pSheet->GetRange( _bstr_t( _T("A1") ), _bstr_t( _T("Z1" ) ) );
The code scans this row via the Item operator of the Range
object. I go through a two-step process of first stuffing the cell contents into a _variant_t
and then into a _bstr_t
because I'm lazy and don't want to check the type of the _variant_t
.
_variant_t vItem = pRange->Item[ 1 ][ iColumn ];
_bstr_t bstrText( vItem );
When the bstrText
turns up empty, it's time to stop adding columns.
I then follow madness with lunacy and created a humdinger of an instance of a Range
object.
pRange = pSheet->GetRange( _bstr_t( _T("A2") ),
_bstr_t( _T("Z16384" ) ) );
I use a simple nested loop to create the rows in the list-view control and set the column data.
At the end, I close the book with a VARIANT_FALSE
to prevent any inadvertent changes from creeping into the .xls file.
pBook->Close( VARIANT_FALSE );
Finally, I Quit the Excel application. This is an important step because if you neglect to do it, Excel will be left floating in memory and worse yet, it will keep a lock on the file you opened above. Yeck!
pApplication->Quit( );
That's the lot! Have fun!
Points of Interest
Oh, yeah. What if you want to modify data in the spreadsheet? Well, that turns out to be quite simple. Just use the Item
property as a left operand.
_bstr_t bstrText( _T("Some text!") );
_variant_t vItem( bstrText );
pRange->Item[ 5 ][ 1 ] = vItem;
Keep the C++ fires burning. They'll need us. They'll always need us.
History
First revision.