Introduction
The C++ project BasicExcel has been around for quite some years now. It is very useful for reading and writing Excel sheets in .xls format. However, it contains only very basic functionality: text formatting with fonts is missing, and display formats and formulas are not supported. This article covers this additional functionality. The new, derived project ExcelFormat is based on the old BasicExcel code and just adds what I was missing for my own needs. If you want to see where I am using the new library, you can look at the Export functions of Service Manager and Uninstall Manager.
XLS Format
This library processes Excel files in the BIFF8 XLS file format. For detailed documentation of the format and all its internals, there are two documents available:
Compatibility
You can use the code on MS Windows with the MSVC compilers of VC++ 6.0 SP3 or later. Using GCC on MacOS X or Linux is also supported, but with some limitations. It may even work in other environments like Sun Solaris, but this has not been tested yet.
Here you can see a table of environments with their respective status:
| Environment | small XLS files | larger XLS files |
|---|---|---|
| WIN32, using Windows API IStorage | OK | OK |
| WIN64, using Windows API IStorage | OK | OK |
| MacOS X / Linux 32 bit, using BasicExcel CompoundFile implementation | OK | issues |
| MacOS X / Linux 64 bit, using BasicExcel CompoundFile implementation | OK | issues |
The issues listed above are caused by the incomplete CompoundFile implementation of the underlying BasicExcel library. I have already fixed some parts of the BasicExcel code regarding numbers in RKValue format and 64 bit compatibility, but there are still issues that show up when reading or writing larger XLS files which use XBAT or SBAT entries.
Using the Code
To use the new formatting functionality, first create an XLSFormatManager object, like in the example1() function, and attach it to an existing BasicExcel object:
void example1(const char* path)
{
BasicExcel xls;
xls.New(1);
BasicExcelWorksheet* sheet = xls.GetWorksheet(0);
XLSFormatManager fmt_mgr(xls);
You can find all the examples of this article in the source code file Examples.cpp.
To define a custom font, create an ExcelFont object and set any needed properties, for example, the font weight for a bold font:
ExcelFont font_bold;
font_bold._weight = FW_BOLD;
The format of an Excel cell can be defined by a CellFormat object, which holds the chosen font and some more properties:
CellFormat fmt_bold(fmt_mgr);
fmt_bold.set_font(font_bold);
After you have prepared the CellFormat, you can choose the font and display settings of Excel cells by calling SetFormat():
int col, row = 0;
for(col=0; col<10; ++col) {
BasicExcelCell* cell = sheet->Cell(row, col);
cell->Set("TITLE");
cell->SetFormat(fmt_bold);
}
while(++row < 4) {
for(int col=0; col<10; ++col)
sheet->Cell(row, col)->Set("text");
}
++row;
Text color is specified by setting color indices in ExcelFont, for example:
ExcelFont font_red_bold;
font_red_bold._weight = FW_BOLD;
font_red_bold._color_index = EGA_RED;
CellFormat fmt_red_bold(fmt_mgr, font_red_bold);
fmt_red_bold.set_color1(COLOR1_PAT_SOLID); // solid fill pattern
fmt_red_bold.set_color2(MAKE_COLOR2(EGA_BLUE,0)); // pattern color: blue
CellFormat fmt_green(fmt_mgr, ExcelFont().set_color_index(EGA_GREEN));
for(col=0; col<10; ++col) {
BasicExcelCell* cell = sheet->Cell(row, col);
cell->Set("xxx");
cell->SetFormat(fmt_red_bold);
cell = sheet->Cell(row, ++col);
cell->Set("yyy");
cell->SetFormat(fmt_green);
}
ExcelFormat.h contains constants for the basic palette colors in the enumeration EXCEL_COLORS, which you can use in calls to ExcelFont::set_color_index(). The macro MAKE_COLOR2 accepts two color indices to specify the pattern line and pattern background colors. As a shortcut to calling CellFormat::set_color1() and CellFormat::set_color2(), you can also use CellFormat::set_background() to define cells with solid background colors or colorized patterns.
After creating and formatting the Excel cells in memory, all you have to do is to save the new Excel sheet as a file:
xls.SaveAs(path);
}
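The text above does not spell out how MAKE_COLOR2 combines its two arguments into the short stored in CellFormat::_color2. The following sketch assumes the BIFF8 XF layout described in the file format documentation, with the pattern color index in bits 0 to 6 and the pattern background color index in bits 7 to 13. make_color2_sketch(), pattern_color_of() and background_color_of() are hypothetical names, not part of ExcelFormat.h.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical re-implementation of the idea behind MAKE_COLOR2.
// Assumed BIFF8 XF layout: bits 0-6 = pattern color index,
// bits 7-13 = pattern background color index.
inline std::uint16_t make_color2_sketch(unsigned pattern_color, unsigned background_color)
{
    assert(pattern_color < 128 && background_color < 128); // 7-bit palette indices
    return static_cast<std::uint16_t>(pattern_color | (background_color << 7));
}

// Extract the two color indices again.
inline unsigned pattern_color_of(std::uint16_t color2) { return color2 & 0x7F; }
inline unsigned background_color_of(std::uint16_t color2) { return (color2 >> 7) & 0x7F; }
```

Under this assumption, MAKE_COLOR2(EGA_BLUE,0) in example1() would place the EGA_BLUE index in the low seven bits and 0 in the next seven.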
This should be enough for a quick overview on how to use the new ExcelFormat objects. In the source code file ExcelFormat.cpp, you can find some more examples.
ExcelFont
There are nine font properties available:
struct ExcelFont
{
ExcelFont()
...
wstring _name;
short _height;
short _weight;
short _options;
short _color_index;
short _escapement_type;
char _underline_type;
char _family;
char _character_set;
...
};
For detailed information, please look into the XLS file format documentation.
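As one example of such details, the BIFF8 documentation specifies the FONT record height in twips, that is, 1/20 of a point. Assuming ExcelFont::_height is written to that field unchanged, a font size could be converted with helpers like the following hypothetical ones:

```cpp
#include <cassert>

// Hypothetical helpers: BIFF8 FONT records store the font height in twips
// (1/20 of a point), so a 12 pt font is stored as 240.
inline short points_to_font_height(double points)
{
    assert(points > 0);
    return static_cast<short>(points * 20 + 0.5); // round to the nearest twip
}

inline double font_height_to_points(short height)
{
    return height / 20.0;
}
```

For instance, font._height = points_to_font_height(12); would request a 12 pt font under this assumption.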
CellFormat
These CellFormat properties are available in addition to the chosen ExcelFont and the display format:
struct CellFormat
{
CellFormat(XLSFormatManager& mgr)
...
char _alignment;
char _rotation;
char _text_props;
int _borderlines;
int _color1;
short _color2;
...
};
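The meaning of the individual bits is again defined by the XLS documentation. As an illustration, the following sketch packs an alignment byte like _alignment, assuming the BIFF8 XF layout with the horizontal alignment in bits 0 to 2, the text wrap flag in bit 3 and the vertical alignment in bits 4 to 6. The enum names and helper functions are hypothetical and not ExcelFormat's own constants.

```cpp
#include <cassert>

// Hypothetical enums for a few of the BIFF8 alignment values (assumed layout).
enum HorAlign  { HOR_GENERAL = 0, HOR_LEFT = 1, HOR_CENTER = 2, HOR_RIGHT = 3 };
enum VertAlign { VERT_TOP = 0, VERT_CENTER = 1, VERT_BOTTOM = 2 };

// bits 0-2: horizontal alignment, bit 3: wrap text, bits 4-6: vertical alignment
inline unsigned char make_alignment(HorAlign h, bool wrap, VertAlign v)
{
    assert(h <= 7 && v <= 7); // both fields are three bits wide
    return static_cast<unsigned char>(h | (wrap ? 0x08 : 0) | (v << 4));
}

inline HorAlign horizontal_of(unsigned char a) { return static_cast<HorAlign>(a & 0x07); }
inline bool wraps(unsigned char a) { return (a & 0x08) != 0; }
inline VertAlign vertical_of(unsigned char a) { return static_cast<VertAlign>((a >> 4) & 0x07); }
```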
In example2(), you can see how to change the font face and other font properties, as well as the text rotation:
ExcelFont font_header;
font_header.set_weight(FW_BOLD);
font_header.set_underline_type(EXCEL_UNDERLINE_SINGLE);
font_header.set_font_name(L"Times New Roman");
font_header.set_color_index(EGA_BLUE);
font_header._options = EXCEL_FONT_STRUCK_OUT;
CellFormat fmt_header(fmt_mgr, font_header);
fmt_header.set_rotation(30);
int row = 0;
for(int col=0; col<10; ++col) {
BasicExcelCell* cell = sheet->Cell(row, col);
cell->Set("TITLE");
cell->SetFormat(fmt_header);
}
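example2() passes 30 to set_rotation(). The sketch below shows how an angle could map to the rotation byte, assuming the BIFF8 encoding in which 0 to 90 rotate the text counterclockwise, 91 to 180 rotate it clockwise by (value - 90) degrees, and 255 stacks the letters vertically. Under that assumption, 30 tilts the header text upward by 30 degrees. rotation_from_degrees() and degrees_from_rotation() are hypothetical helpers.

```cpp
#include <cassert>

// Hypothetical helpers, assuming the BIFF8 XF rotation encoding:
//   0..90   -> text rotated counterclockwise by that many degrees
//   91..180 -> text rotated clockwise by (value - 90) degrees
//   255     -> letters stacked vertically
inline unsigned char rotation_from_degrees(int degrees)
{
    assert(degrees >= -90 && degrees <= 90); // negative = clockwise
    return static_cast<unsigned char>(degrees >= 0 ? degrees : 90 - degrees);
}

inline int degrees_from_rotation(unsigned char rotation)
{
    assert(rotation <= 180); // 255 (stacked text) has no angle
    return rotation <= 90 ? rotation : 90 - rotation;
}
```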
The example3() function shows how to use CellFormat::set_format_string() and BasicExcelCell::SetFormat() to define text, number, and date format strings. There are some predefined constants for format strings:
#define XLS_FORMAT_GENERAL L"General"
#define XLS_FORMAT_TEXT L"@"
#define XLS_FORMAT_INTEGER L"0"
#define XLS_FORMAT_DECIMAL L"0.00"
#define XLS_FORMAT_PERCENT L"0%"
#define XLS_FORMAT_DATE L"M/D/YY"
#define XLS_FORMAT_TIME L"h:mm:ss"
#define XLS_FORMAT_DATETIME L"M/D/YY h:mm"
However, you can use any valid Excel format string to define custom display formats.
example4() shows how to use many different fonts and colors in one sheet.
Formulas
Formulas are now preserved when reading and writing an Excel sheet. You can even store a formula in an Excel cell by calling BasicExcelCell::SetFormula(). However, currently you either have to copy a Worksheet::CellTable::RowBlock::CellBlock::Formula object from an already existing cell or define it yourself, which is quite tricky because Excel stores formulas in a pre-parsed form as tokens in reverse Polish notation (RPN).
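To illustrate what reverse Polish notation means here: a formula like =A1+B1*2 is stored as the sequence A1 B1 2 * +, so no parentheses or operator precedence rules are needed when evaluating it. The following sketch evaluates such a token list with a stack. The Token struct and evaluate_rpn() are hypothetical; the real BIFF8 tokens are binary records with many more token types.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stack>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical token set illustrating RPN; not the BIFF8 token format.
struct Token {
    enum Kind { Number, CellRef, Add, Sub, Mul, Div } kind;
    double value;  // used by Number
    int row, col;  // used by CellRef (0-based)
};

typedef std::map<std::pair<int, int>, double> CellValues;

// Operands are pushed; each operator pops its two operands and pushes the result.
double evaluate_rpn(const std::vector<Token>& tokens, const CellValues& cells)
{
    std::stack<double> st;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        const Token& t = tokens[i];
        if (t.kind == Token::Number) {
            st.push(t.value);
        } else if (t.kind == Token::CellRef) {
            CellValues::const_iterator it = cells.find(std::make_pair(t.row, t.col));
            st.push(it != cells.end() ? it->second : 0.0); // empty cells count as 0
        } else {
            if (st.size() < 2)
                throw std::runtime_error("malformed RPN sequence");
            double rhs = st.top(); st.pop();
            double lhs = st.top(); st.pop();
            switch (t.kind) {
                case Token::Add: st.push(lhs + rhs); break;
                case Token::Sub: st.push(lhs - rhs); break;
                case Token::Mul: st.push(lhs * rhs); break;
                default:         st.push(lhs / rhs); break; // Token::Div
            }
        }
    }
    if (st.size() != 1)
        throw std::runtime_error("malformed RPN sequence");
    return st.top();
}
```

Evaluating A1 B1 2 * + with A1 = 1 and B1 = 3 multiplies first and then adds, yielding 7.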
Points of Interest
For those who are interested in a bit of background information, I want to provide a description of what has been changed since the predecessor library BasicExcel.
Conditional Compilation
There are two different BasicExcel implementations, one for VC++ version 6.0, and one for newer compilers. ExcelFormat now merges these two code bases by using conditional compilation to handle the differences.
Most of this is accomplished in the header file by this code snippet (and modifications using the newly defined macros):
#if defined(_MSC_VER) && _MSC_VER<=1200 // VC++ 6.0 (an undefined _MSC_VER would evaluate to 0)
#pragma warning(disable: 4786)
#define LONGINT __int64
#define LONGINT_CONST(x) x
#define COMPOUNDFILE
#else // newer Microsoft compilers and non-Microsoft compilers such as GCC
#define LONGINT long long
#define LONGINT_CONST(x) x##LL
#define COMPOUNDFILE CompoundFile::
#ifdef _DEBUG
#define _ITERATOR_DEBUG_LEVEL 0 // speedup iterator operations while debugging
#endif
#endif
To distinguish between MS Windows environments using MSVC and others using GCC, the code tests whether the macro _MSC_VER is defined. This switches between using the Windows API to access the compound binary file format and using the BasicExcel implementation of CompoundFile.
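The macro pattern from the header can be tried out in isolation. This reduced sketch writes the VC++ 6.0 test as defined(_MSC_VER) && _MSC_VER <= 1200, because an undefined macro evaluates to 0 inside an #if expression, so a plain _MSC_VER<=1200 would also be true for GCC. The compound_file_backend string is a hypothetical illustration of the backend switch, not part of ExcelFormat.

```cpp
#include <cassert>

#if defined(_MSC_VER) && _MSC_VER <= 1200   // VC++ 6.0
#define LONGINT __int64
#define LONGINT_CONST(x) x
#else                                       // newer MSVC versions, GCC, Clang
#define LONGINT long long
#define LONGINT_CONST(x) x##LL
#endif

// Hypothetical illustration of the _MSC_VER existence test.
#ifdef _MSC_VER
static const char* const compound_file_backend = "Windows API IStorage";
#else
static const char* const compound_file_backend = "BasicExcel CompoundFile";
#endif

// A 64 bit constant that would not fit into a 32 bit int.
static const LONGINT four_gib = LONGINT_CONST(4294967296);
```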
The #pragma warning statement disables the VC++ 6.0 warning C4786, which reports identifiers truncated to 255 characters in the debug information. Such long compiler-internal names occur for non-trivial usage of STL classes.
Immediately after these defines, there are these preprocessor statements for VS 2005 and above:
#if _MSC_VER>=1400 // VS 2005
#define _CRT_SECURE_NO_WARNINGS //MF
#define _SCL_SECURE_NO_WARNINGS //MF
#endif
They disable the VC++ security library warnings, because the BasicExcel code is not yet prepared for these new runtime library additions. By the way, the MF comments mark my additions and fixes to the old BasicExcel code.
Additionally, I eliminated some compiler warnings by fixing data types. Typically, this was a change from integral types like int to the C runtime library type size_t.
To switch between 32 bit and 64 bit mode using GCC, use the compiler option -m32 or -m64, respectively, as in the Makefile example in the source download.
In the VC++ environment, you can choose the target platform in the project settings.
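To confirm which mode a build actually uses, and to illustrate the int to size_t change mentioned above, a small check like the following could be compiled with -m32 or -m64. pointer_bits() and count_nonzero() are hypothetical helpers, not part of the library.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Width of a pointer in bits: 32 for -m32 / Win32, 64 for -m64 / x64.
inline int pointer_bits()
{
    return static_cast<int>(sizeof(void*) * CHAR_BIT);
}

// Counting with std::size_t instead of int: it matches the type returned by
// size(), so no narrowing conversion (and no warning) occurs on 64 bit targets.
inline std::size_t count_nonzero(const std::vector<int>& values)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < values.size(); ++i)
        if (values[i] != 0)
            ++n;
    return n;
}
```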
New Features
In addition to the new API described above, I had to add code in BasicExcel in order to implement these new features:
- Read and write the Formula structure from/to XLS files
- Define, read, and write the FORMAT structure from/to XLS files
- Get/set the XF index value of BasicExcelCell objects for all the various cell types
Formatting Storage Structures
The information about the format of an Excel cell in the BIFF8 file format is stored using a so-called XF index. This refers to an XF (extended format) record, which consists of the following members:
XF {
short fontRecordIndex
short formatRecordIndex
short protectionType
char alignment
char rotation
char textProperties
char usedAttributes
int borderLines
int colour1
short colour2
}
Besides a few directly associated attributes specifying alignment, rotation, etc. of the Excel cell, there are two more index values: fontRecordIndex and formatRecordIndex. These refer to font and display format descriptions. Overall, this two-level, index-based formatting architecture enables cell formatting with small file sizes and low memory usage, because typically only a few different fonts and display formats are used in an Excel worksheet. The font index refers to a FONT record with the following attributes:
FONT {
short height
short options
short colourIndex
short weight
short escapementType
char underlineType
char family
char characterSet
char unused
SmallString name
}
The third index is a special one. This format index is associated with a record containing only the index itself and a text representation of the display format:
FORMAT {
short index
LargeString fmtstring
}
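How the format index value is chosen is not described here. A plausible scheme, sketched below under the assumption that BIFF8 reserves the indices 0 to 163 for built-in formats so that user-defined FORMAT records start at 164, looks up known strings first and numbers new ones sequentially. FormatIndexSketch lists only a few built-in entries and is a hypothetical class, not XLSFormatManager.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical registry for display formats (assumed BIFF8 numbering:
// 0-163 built-in, user-defined formats from 164 upward).
class FormatIndexSketch {
public:
    FormatIndexSketch() : _next_custom(164)
    {
        // a few built-in formats, which need no FORMAT record of their own
        _indices[L"General"] = 0;
        _indices[L"0"] = 1;
        _indices[L"0.00"] = 2;
        _indices[L"0%"] = 9;
        _indices[L"@"] = 49;
    }

    // Returns the index for fmt_str, allocating a new custom index on first use.
    unsigned short index_of(const std::wstring& fmt_str)
    {
        std::map<std::wstring, unsigned short>::const_iterator it = _indices.find(fmt_str);
        if (it != _indices.end())
            return it->second;
        unsigned short idx = _next_custom++;
        _indices[fmt_str] = idx; // a FORMAT {idx, fmt_str} record would be written for this
        return idx;
    }

private:
    std::map<std::wstring, unsigned short> _indices;
    unsigned short _next_custom;
};
```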
XLSFormatManager manages these three formatting substructures to format Excel cells using the C++ structures CellFormat and ExcelFont:
struct CellFormat -> XF {FORMAT index, FONT index, XF attributes}
struct ExcelFont -> FONT {FONT attributes}
On calling CellFormat::set_font(const ExcelFont& font), the manager class searches for an already registered matching font description. If there is none yet, a new FONT record is created to be stored in the Excel sheet. On calling CellFormat::set_format_string(const wstring& fmt_str), the manager class searches for an already registered matching display format string. If there is none yet, a new FORMAT record stores the display format string.
The same strategy is used when applying a CellFormat to a cell object by calling BasicExcelCell::SetFormat(const CellFormat& fmt): the manager class searches for an already registered matching XF description with identical font and format indices as well as matching XF attributes. If there is none yet, a new XF record is created to be stored in the Excel sheet. In the end, this results in a list of FORMAT, FONT, and XF records, which are stored in the header of the Excel workbook file. Each cell is formatted by storing a single related XF index, which determines all the cell formatting attributes, the font, and the display format.
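The find-or-add strategy for XF records can be sketched with a lookup table keyed by the XF description. XFKey and XFTableSketch are hypothetical: the key is reduced to four attributes, the std::map is just one way to speed up the search, and the default XF records that a real workbook starts with are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical, reduced XF description: font index, format index and two attributes.
struct XFKey {
    unsigned short font_index;
    unsigned short format_index;
    unsigned char alignment;
    unsigned char rotation;

    bool operator<(const XFKey& o) const
    {
        if (font_index != o.font_index) return font_index < o.font_index;
        if (format_index != o.format_index) return format_index < o.format_index;
        if (alignment != o.alignment) return alignment < o.alignment;
        return rotation < o.rotation;
    }
};

class XFTableSketch {
public:
    // Returns the XF index for key, appending a new XF record only if needed.
    std::size_t find_or_add(const XFKey& key)
    {
        std::map<XFKey, std::size_t>::const_iterator it = _lookup.find(key);
        if (it != _lookup.end())
            return it->second;
        std::size_t idx = _records.size();
        _records.push_back(key);
        _lookup[key] = idx;
        return idx;
    }

    std::size_t record_count() const { return _records.size(); }

private:
    std::vector<XFKey> _records;           // would be written to the workbook header
    std::map<XFKey, std::size_t> _lookup;  // key -> position in _records
};
```

Applying the same key twice returns the same index, so, for example, all bold title cells share one XF record.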
For further implementation details, please have a look at ExcelFormat.h, ExcelFormat.cpp, BasicExcel.hpp and BasicExcel.cpp of the source code.
Memory Usage
To limit memory usage, a reference-counting SmartPtr is used to manage the heap structures needed for formula storage. Every cell contains only an optionally filled pointer to the struct Formula. Text and number cells do not need formula information, so their smart pointer stays empty and just contains the value NULL.
The following code shows the implementation of the struct RefCnt, which serves as the base of struct Formula, and of the template struct SmartPtr, which holds the reference-counted heap objects:
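#include <cstddef> // NULL
struct RefCnt
{
    RefCnt()
     : _ref_cnt(0)
    {
    }
    int _ref_cnt; // number of SmartPtr objects referring to this object
};
template<typename T> struct SmartPtr
{
    SmartPtr()
     : _ptr(NULL)
    {
    }
    SmartPtr(T* p)
     : _ptr(p)
    {
        if (p)
            ++_ptr->_ref_cnt;
    }
    SmartPtr(const SmartPtr& other)
     : _ptr(other._ptr)
    {
        if (_ptr)
            ++_ptr->_ref_cnt;
    }
    ~SmartPtr()
    {
        release();
    }
    SmartPtr& operator=(T* p)
    {
        if (p)
            ++p->_ref_cnt; // acquire the new object first, so self-assignment is safe
        release();
        _ptr = p;
        return *this;
    }
    // The compiler-generated copy assignment would copy _ptr without
    // updating the reference counts, so it is defined explicitly.
    SmartPtr& operator=(const SmartPtr& other)
    {
        return *this = other._ptr;
    }
    operator bool() const {return _ptr != NULL;}
    bool operator!() const {return !_ptr;}
    T* operator->() {return _ptr;}
    const T* operator->() const {return _ptr;}
    T& operator*() {return *_ptr;}
    const T& operator*() const {return *_ptr;}
private:
    // Drops one reference and deletes the object when the last one is gone.
    void release()
    {
        if (_ptr && !--_ptr->_ref_cnt)
            delete _ptr;
        _ptr = NULL;
    }
    T* _ptr;
};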
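For comparison only: with C++11 or later, the same sharing semantics could be expressed with std::shared_ptr, which keeps the reference count in a control block outside the object instead of in an intrusive _ref_cnt member. FormulaSketch and CellSketch below are hypothetical stand-ins for Formula and the cell classes.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for BasicExcel's Formula data.
struct FormulaSketch {
    std::string rpn_tokens;
};

// A cell with an optional, shareable formula: text and number cells keep an
// empty pointer, while several formula cells may share one heap object.
struct CellSketch {
    double value;
    std::shared_ptr<FormulaSketch> formula;
};
```

The intrusive counter of RefCnt avoids a separate control block and, unlike std::shared_ptr, also works with VC++ 6.0, which predates C++11.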
String Conversion
There are some new string conversion functions: stringFromSmallString(), stringFromLargeString(), wstringFromSmallString() and wstringFromLargeString(). They use narrow_string() and widen_string() to convert from the internal Excel string structures to STL string classes and vice versa. You can use them to access the internal data storage of BasicExcel.
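For illustration, BIFF8 stores strings either uncompressed as UTF-16 or in a compressed form that keeps only the low byte of each character, which limits them to U+0000 to U+00FF. The following sketch converts between that 8-bit form and std::wstring. widen_latin1() and narrow_latin1() are hypothetical helpers; ExcelFormat's narrow_string() and widen_string() may well use locale-aware C runtime conversions instead.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: zero-extends each byte of a compressed 8-bit string.
inline std::wstring widen_latin1(const std::string& s)
{
    std::wstring result;
    result.reserve(s.size());
    for (std::string::size_type i = 0; i < s.size(); ++i)
        result += static_cast<wchar_t>(static_cast<unsigned char>(s[i]));
    return result;
}

// Hypothetical helper: characters above U+00FF cannot be stored compressed and
// are replaced by '?' here (a real writer would use the uncompressed form).
inline std::string narrow_latin1(const std::wstring& ws)
{
    std::string result;
    result.reserve(ws.size());
    for (std::wstring::size_type i = 0; i < ws.size(); ++i) {
        unsigned long code = static_cast<unsigned long>(ws[i]);
        result += code <= 0xFF ? static_cast<char>(code) : '?';
    }
    return result;
}
```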
History
- 20.09.2009 - Version 1.0 of ExcelFormat (BasicExcel Version 2.0)
- 28.09.2009 - Version 2.0
  - Added new section Formatting storage structures
- 04.10.2009 - Version 2.1
  - Updated source code with macros and constants for cell and font properties
- 07.11.2009 - Version 2.2
  - Fixed VS2008 problem when reading sheets with formula fields
  - Added BasicExcel::Close(), CellFormat::get/set_text_props() and get/set_borderlines()
- 12.01.2010 - Version 2.3 (by Ami Castonguay and Martin Fuchs)
  - Fixed reference counting of Formula data structs
  - Support for shared formulas
  - Support for merged cells
  - Save formatting even if cell is empty
  - Flush fstream instead of closing and reopening it, to prevent races in conjunction with virus scanners
  - Enable reading of XLS files exported by MacOS Numbers.app
- 15.11.2010 - Version 2.4
  - Add second set_borderlines() overload
  - Add ExcelFont::set_italic(), CellFormat::set_wrapping()
  - Handle COLINFO
  - Miscellaneous fixes
- 01.01.2011 - Version 2.5
  - Dynamically allocate memory for unexpectedly high row/column values while loading XLS files
  - Unicode overloads for Load() and SaveAs()
  - Adjust to RKValues written by OpenOffice Calc
- 03.02.2011 - Version 3.0
  - Access compound document files using the Windows API
  - Reduced memory consumption and increased speed
  - 64 bit portability
  - Return current value string from formula cells