Introduction
There are times when we wish to send our output to an
Excel file or some text delimited files like comma separated value file or tab
delimited file. This class encapsulates the functions for reading and writing to
these types of files. It was constructed with a goal of being simple to use. The
main features of this class are:
- Constructing a new Excel or text delimited file and writing rows or a
single cell to it.
- Reading rows, columns or a single cell from an Excel or text delimited
file.
- Replacing, inserting or appending rows to an Excel or text delimited file.
- Convert an existing or newly created Excel to text delimited file and vice
versa
Before I describe the various functions of the class, I
will first state the limitations of this class. The limitations are:
- Requires MFC.
- May or may not support unicode. (didn't check it out)
- Uses ODBC to read and write to Excel file, thus will have the limitations
of ODBC driver. (I am not sure whatever limitation does the ODBC driver has)
- Must have header row in Excel file and every column of the header row must
be unique. (because class is treating Excel file as a database)
- Cannot delete a sheet from Excel file. Can only delete contents of that
sheet
- Cell values are treated as a string, regardless of their format in Excel.
I have inserted error checking code whenever possible but
I may have missed out some cases. Thus the user is required to keep the
limitations in mind to prevent unexpected errors.
Below is a detailed description of the class. It is split
into two parts. The first part is for those who just wish to know how to use the
class. It contains a brief description of all the useful functions and how to
use them. It is followed by an example that demonstrates most of the functions
in the class.
The second part is for those who wish to know all the
functions and variables of the class in detail. That section will describe how
the function works, what are the limitations and possible errors, and how the
functions, after many revisions, came to their final form. This will be useful
to those who intends to change the class to suit their needs.
Now, prepare yourself for a long article ahead..
Part I. How to use the class
The following is an overview of the useful functions
in the class, organised by whether they are common to both Excel and text
delimited spreadsheet or only to one of them
Common Functions
CSpreadSheet(CString File, CString SheetOrSeparator, bool Backup = true)
bool AddHeaders(CStringArray &FieldNames, bool replace = false)
bool AddRow(CStringArray &RowValues, long row = 0, bool replace = false)
bool AddCell(CString CellValue, short column, long row = 0)
bool AddCell(CString CellValue, CString column, long row = 0, bool Auto = true)
bool ReadRow(CStringArray &RowValues, long row = 0)
bool ReadColumn(CStringArray &ColumnValues, short column)
bool ReadColumn(CStringArray &ColumnValues, CString column, bool Auto = true)
bool ReadCell (CString &CellValue, short column, long row = 0)
bool ReadCell (CString &CellValue, CString column, long row = 0, bool Auto = true)
bool DeleteSheet()
bool DeleteSheet(CString SheetName)
bool Convert(CString SheetOrSeparator)
void BeginTransaction()
bool Commit()
bool RollBack()
bool GetTransactionStatus()
void GetFieldNames (CStringArray &FieldNames)
long GetTotalRows()
short GetTotalColumns()
long GetCurrentRow()
bool GetBackupStatus()
CString GetLastError()
Excel Spreadsheet Specific Functions
bool ReplaceRows(CStringArray &NewRowValues, CStringArray &OldRowValues)
Text Delimited Spreadsheet Specific Functions
None
Common Functions
CSpreadSheet(CString File, CString SheetOrSeparator, bool Backup = true)
This constructor will open an Excel file (xls) or a
text delimited file (any other file extensions e.g. csv, tab) for reading (for
existing file) or writing (for new file or existing file). The variable SheetOrSeparator is the name of the sheet if the
file is an Excel file or the type of separator if the file is a text delimited
file. The default constructor will create a backup file (.bak for text delimited
file) or a backup sheet (named CSpreadSheetBackup for Excel file) if opening an
existing file.
bool AddHeaders(CStringArray
&FieldNames, bool replace = false)
This function will add a header row to the opened
spreadsheet. For an Excel spreadsheet, each column of the header row must be
unique. For text delimited file, there is no such limitation. If the opened
spreadsheet is an existing file, the default is to add new columns to the header
row unless the variable replace is set as true. The
return value will indicate whether the addition is successful or not. Note that
for a new Excel spreadsheet, this function should be called before adding any rows
to the spreadsheet. For text delimited spreadsheet, this function is optional.
bool AddRow(CStringArray
&RowValues, long row = 0, bool replace = false)
This function will append, insert or replace a row to
the opened spreadsheet. The default is appending a row to the end of the
spreadsheet. Depending on the value of the variable
replace, the new row will be inserted or replaced at the indicated row.
The default is to insert the new row in the indicated row. The return value will
indicate whether the addition is successful or not. Note that row=1 means the
first row (which is the header row).
bool AddCell(CString CellValue, short column, long row = 0)
bool AddCell(CString CellValue, CString column, long row = 0, bool Auto = true)
The first
AddCell function will add or replace a cell to the opened spreadsheet.
The default is to append a new row with the cell value at the indicated column
to the end of the spreadsheet.
The second
AddCell function is similar except that the variable
column is either the Excel column name (e.g. A, B, AE, EF etc) or the
field name of the header row. Set the variable Auto
to false if you don't want the function to automatically detect whether the
variable column is an Excel column name or a header row field name (for cases
where the field name of the header row is a two letter name).
The return value will indicate whether the addition is
successful or not. Note that row=1 means the
first row (which is the header row).
bool ReadRow(CStringArray
&RowValues, long row = 0)
This function will read a row from the opened
spreadsheet. The default is to read the next row. That is if you run the
function twice either consecutively or after some statements in between, the
first time the function will return the first row and the second time it will
return the second row. The return value will indicate whether the row was
correctly read or whether there are no more rows to read. Note that row=1 means the
first row (which is the header row).
bool ReadColumn(CStringArray
&ColumnValues, short column)
bool ReadColumn(CStringArray
&ColumnValues, CString column, bool Auto = true)
The first function will read a column from the opened
spreadsheet. The variable column is in terms
of the column number.
The second function is similar except that the variable
column is either the Excel column name (e.g. A, B, AE, EF etc) or the
field name of the header row. Set the variable Auto
to false if you don't want the function to automatically detect whether the
variable column is an Excel column name or a header row field name (for cases
where the field name of the header row is a two letter name).
The return value will indicate whether the reading of the
column is successful or not.
bool ReadCell
(CString &CellValue, short column, long row = 0)
bool ReadCell (CString
&CellValue, CString column, long row = 0, bool Auto = true)
The first function will read a single cell from the opened
spreadsheet. The default is to read the cell in the next row (similar to
ReadRow above).
The second function is similar except that the variable
column is either the Excel column name (e.g. A, B, AE, EF etc) or the
field name of the header row. Set the variable Auto
to false if you don't want the function to automatically detect whether the
variable column is an Excel column name or a header row field name (for cases
where the field name of the header row is a two letter name).
The return value will indicate whether the reading of the
column is successful or not. Note that row=1 means the
first row (which is the header row).
bool DeleteSheet()
bool DeleteSheet(CString SheetName)
The first function will erase the entire contents of
the opened spreadsheet.
The second function will erase the entire contents of a
specified sheet in the opened Excel spreadsheet. This sheet can be any sheet
that is present in the opened Excel spreadsheet.
The return value will indicate whether the erase is
successful or not. Note that this two function does not support
RollBack for an Excel spreadsheet
bool Convert(CString SheetOrSeparator)
This function will convert an Excel sheet to a text
delimited file (.csv of the same name) with the specified separator and vice
versa. If converting from text delimited file to an Excel file, the variable separator is not used. The return value will
indicate whether the conversion is successful or not.
void BeginTransaction()
bool Commit()
bool RollBack()
The above three functions is similar to the
corresponding functions in SQL. Use BeginTransaction
to denote the start of a transaction. Use Commit to
save the changes made to the spreadsheet. Use RollBack
to undo changes made to the spreadsheet (i.e. return the spreadsheet to the
state before the BeginTransaction statement). The
return values of Commit and
RollBack will indicate whether the changes are saved successfully or undo
successfully respectively.
bool
GetTransactionStatus()
This function will return the Transaction status.
True if Transaction is started. False if Transaction is not started or has
ended.
void GetFieldNames
(CStringArray &FieldNames)
This function will get the header row of the
spreadsheet.
long GetTotalRows()
This function will return the total number of rows in
the spreadsheet.
short
GetTotalColumns()
This function will return the total number of columns
in an Excel spreadsheet. For a text delimited spreadsheet, it will return the
largest number of columns present in the spreadsheet.
long GetCurrentRow()
This function will return the currently selected row
in the spreadsheet. The currently selected row is the row that will be read by
the default ReadRow function or the default ReadCell function.
bool GetBackupStatus()
This function will return the backup status of the
spreadsheet. True if the backup was performed. False if the backup was not
performed (either because of user's choice or an error occurred).
CString GetLastError()
This function will return the last error message. For
some functions like the AddHeaders, AddRow, AddCell
functions etc, some errors may occur. This function will return a string
containing a description of the error that occurred.
Excel Spreadsheet Specific Functions
bool ReplaceRows(CStringArray
&NewRowValues, CStringArray
&OldRowValues)
This function will search and replace multiple
occurrences of the old row with the new row. This function does not support RollBack
so changes cannot be undo. The return value will indicate
whether the replace is successful or not.
Demonstration of the functions
CSpreadSheet SS("Test.xls", "TestSheet");
CStringArray sampleArray, testRow, Rows, Column;
CString tempString;
char alphabet = 'A';
SS.BeginTransaction();
for (int i = 1; i <= 5; i++)
{
sampleArray.RemoveAll();
for (int j = 1; j <= 5; j++)
{
tempString.Format("%c%d", alphabet++, i);
sampleArray.Add(tempString);
}
alphabet = 'A';
if (i == 1)
{
SS.AddHeaders(sampleArray);
}
else
{
SS.AddRow(sampleArray);
}
}
for (int k = 1; k <= 5; k++)
{
testRow.Add("Test");
}
SS.AddRow(testRow);
SS.AddRow(testRow, 2);
SS.AddRow(testRow, 4, true);
SS.Committ();
SS.Convert(";");
printf("Total number of rows = %d\n\n", SS.GetTotalRows());
for (i = 1; i <= SS.GetTotalRows(); i++)
{
SS.ReadRow(Rows, i);
for (int j = 1; j <= Rows.GetSize(); j++)
{
if (j != Rows.GetSize())
{
printf("%s\t", Rows.GetAt(j-1));
}
else
{
printf("%s\n", Rows.GetAt(j-1));
}
}
}
printf("\nTotal number of columns = %d\n\n", SS.GetTotalColumns());
SS.ReadColumn(Column, 2);
for (i = 0; i < Column.GetSize(); i++)
{
printf("Column 2 row %d: %s\n", i+1, Column.GetAt(i));
}
if (SS.ReadCell(tempString, 3, 3))
{
printf("\nCell value at (3,3): %s\n", tempString);
}
else
{
printf("Error: %s\n", SS.GetLastError);
}
Part II. Detailed description of the functions and variables
CSpreadSheet(CString File, CString SheetOrSeparator, bool Backup = true)
The function will first check the file extension to
determine whether it is an Excel file (.xls) or a text delimited file (any other
extensions). If the file is an Excel file, it will set an internal flag m_bExcel to true. Otherwise,
m_bExcel is false. The filename is stored in
m_sFile. The constructor will then assign the string SheetOrSeparator to be either the Sheet name of the
Excel file (by setting m_sSheetName) or the
Separator (by setting m_sSeparator) for the text
delimited file.
The file is then opened by calling the function Open.
The reason for using another function to open the spreadsheet is because the
function will also be used by the function RollBack
to undo changes to the file. After the file is opened successfully, the total
number of rows and columns of the spreadsheet are found and stored in m_dTotalRows and
m_dTotalColumns respectively. The header row of the spreadsheet is also
found and stored in m_aFieldNames. The current row
(stored in m_dCurrentRow) is then set to 1. This
row corresponds to the row corresponds to the header row. All the rows are
stored in memory (stored in m_aRows).
For an Excel spreadsheet, all the columns in each row are first joined by the
separator: ,;.? before being stored in m_aRows. This
storing of rows in memory permits an implementation of a undo function. Backup of the previous file or sheet is then performed if
necessary. The backup status is then stored in a flag
m_bBackup.
The Excel file is opened using ODBC driver and thus it is
treated as a database. Due to the limitation of the Excel driver, recordsets can
only be used to read from the file. Recordsets cannot be used to write to the
file. Thus SQL statements are used for writing to the Excel file. Even then,
only a subset of SQL are available. Thus the writing to Excel file is somewhat
tedious and may be slow.
bool AddHeaders(CStringArray
&FieldNames, bool replace = false)
The adding, replacing or appending of a header row in
a text delimited spreadsheet is easy and straightforward. However, the same
cannot be said for an Excel spreadsheet. This is because the Excel spreadsheet
is opened using ODBC driver. The first row of an Excel spreadsheet is treated by
default by the ODBC driver as the row containing the field names of the
database. Thus each column of the header row must be unique. There cannot be two
cells in the header row with the same name. Hence, when appending to the header row in an Excel spreadsheet, the function
must first check that the columns of the header row are unique. When adding a
new header row or replacing the
header row in an Excel spreadsheet, the job of checking for duplicate column
names is passed on to the function AddRow.
bool AddRow(CStringArray
&RowValues, long row = 0, bool replace = false)
bool AddCell(CString CellValue, short column, long row = 0)
bool AddCell(CString CellValue, CString column, long row = 0, bool Auto = true)
The three functions are very similar in implementation
except that AddCell does not support insertion. The
insertion of a single cell (either shift cells down or shift cells right) is too
tedious and seldom of use (at least to me). Anyone who needs this type of
insertion is welcome to modify this function. I won't be doing it.
The flow of the functions is as follows: First, the function
checks whether the user specifies any row to add the new row or cell. If the
user does not specify any row, the new row or cell is then appended to the end
of the spreadsheet. If a row is specified, the functions will then check whether
the specified row is greater than the total number of rows in the spreadsheet.
If the specified row is greater than the total number of rows, blank rows are
added to the spreadsheet until just before the specified row is reached. Then
the new row is appended to the end of the spreadsheet. If the specified row is
less than the total number of rows, the function AddRow
will then check to see whether the new row is to be inserted into or replace the
specified row. For AddCell, the new cell will
replace the specified cell.
For Excel spreadsheet, there are a few additional checks.
First, if the row specified is the header row, it will check to ensure that
there is no duplicate column names. It will then check to ensure that the number
of columns in the header row is not reduced as this will affect the integrity of
the Excel spreadsheet (remember Excel is being treated as a database). If the
row specified is not the header row, the function will check to ensure there is
a header row. Then it will check to ensure that the total number of columns
added is not more than the columns in the header row.
The AddCell function is
overloaded to give the user a variety of ways
to specify the desired column. The column can be specified as the field name,
the column alphabet in an Excel spreadsheet or as the column number, starting
from 1. When the column is entered as a string, the function will automatically
detect whether the column is in terms of the field name or column alphabet. It
does this by checking the length of the input string. If the length of the input
string is 1 or 2, it will assume the string is in terms of column alphabet.
Otherwise, it will assume the string is in terms of the field name. For the
column alphabet case, the function will then call the function
CalculateColumnNumber to obtain the correct column number. For the field
name case, the function will use the field name to match against each column of
the header row to obtain the correct column number. The variable
Auto is used to handle the special case of a field name being 1 or 2
characters in length. Because of the algorithm for the automatic detection of
type of column, a field name of 1 or 2 characters in length will be
misclassified as a column alphabet and thus results in a incorrect computed
column number. For such special cases, setting Auto
to false will disable the automatic classification system and cause the function
to interpret the column as a field name instead of a column alphabet.
bool ReadRow(CStringArray
&RowValues, long row = 0)
bool ReadColumn(CStringArray
&ColumnValues, short column)
bool ReadColumn(CStringArray
&ColumnValues, CString column, bool Auto = true)
bool ReadCell (CString
&CellValue, short column, long row = 0)
bool ReadCell (CString
&CellValue, CString column, long row = 0, bool Auto = true)
The implementation of these 5 functions are
relatively simple. The function ReadRow is the most
important function of the five. This function is called by the functions AddRow
and AddCell. It is also used by the functions ReadColumn
and ReadCell when the file is a text delimited
file.
The function ReadRow first
has to split each row into the individual columns. The row is split by the
separator (m_sSeparator). For an Excel spreadsheet,
the separator is ,;.?. The function will first
try to split the rows by assuming the file has the following syntax:
"column1"separator"column2"separator"column3". E.g for a csv file:
"column1","column2","column3". If the function fails to split the rows in this
way, it will try to split the rows by assuming the file has the syntax of
column1separatorcolumn2separatorcolumn3. E.g. for a csv file:
column1,column2,column3. Another type of syntax other than the 2 above mentioned
are not supported. If the text delimited file belongs to the first syntax, the
function will then remove the quotes from both the first and last column. This
step is necessary because of the way the splitting of the row is implemented.
For an Excel spreadsheet, the rows are stored in memory using the first syntax.
The functions ReadColumn and ReadCell
depends on this function ReadRow to work. For ReadCell,
it will call ReadRow to obtain the columns of the
desired row and then get the desired cell value. For ReadColumn,
it will call ReadRow multiple times to extract the
desired column.
As for the case of AddCell, the
functions ReadColumn and ReadCell
are overloaded to allow flexibility in the input of the desired column. The
usage of the variables column and Auto is the
same as that for AddCell.
bool ReplaceRows(CStringArray
&NewRowValues, CStringArray
&OldRowValues)
This function uses the SQL statement, UPDATE...SET...WHERE,
to search and replace multiple rows in an Excel spreadsheet. Since it uses SQL,
the function is not available for a text delimited spreadsheet. Also the
function does not work on a memory copy of the rows but rather on the disk copy
itself. Thus the changes cannot be undo.
bool DeleteSheet()
bool DeleteSheet(CString SheetName)
The two overloaded functions are straightforward for the text
delimited spreadsheet. Basically, it deletes the memory copy of the rows and
field names. It also resets the total number of columns and rows to zero. For an
Excel spreadsheet, it uses the SQL statement, DROP TABLE. Thus the deletion
cannot be undo as it work on the disk copy directly. Also, due to the limitation
of the ODBC driver, the sheet itself is not deleted. Only the contents are
erased.
bool Convert(CString SheetOrSeparator)
The conversion of an Excel spreadsheet to a text
delimited spreadsheet is pretty straightforward. This function can be used for
batch conversion of Excel to text delimited spreadsheets or vice versa. However,
the limitation of the uniqueness of each column of the header row must be borne
in mind when converting from text delimited spreadsheets to Excel.
void BeginTransaction()
bool Commit()
bool RollBack()
The three functions are useful as they enabls changes
to be undone
by using the function RollBack. There is another
advantage to using BeginTransaction if you are
doing a lot of addition of rows or cells. By default, after every addition the
changes are saved to disk. But if BeginTransaction
is used, the saving of the changes to disk are delayed until the function Commit is called. This will speed up the addition
of rows or cells tremendously.
The implementation of Commit is
straightforward for text delimited spreadsheets. For Excel spreadsheet, the
implementation is slightly more tricky. As mentioned before, writing to an Excel
file requires the use of SQL statements. To write to an Excel file, we have to
take note of whether the sheet is already present or it is a new sheet. Both
cases will requires different approaches in the CREATE TABLE statement. For a
new sheet, we will use CREATE TABLE sheetname. For an existing sheet, we have to
first delete the sheet. However as mentioned earlier, deleting an Excel
spreadsheet only deletes its content. The actual sheet is not deleted. This
poses problems as the CREATE TABLE sheetname cannot be used now as the ODBC
driver will return an exception stating that the sheet is already present.
However, if you use INSERT INTO statement to insert rows into the spreadsheet,
it will also return an error. The solution to this problem is to use CREATE
TABLE [sheetname$A1:IV65536] to create the sheet first before using the INSERT
INTO statement to insert rows. Another interesting thing that I found is that
the INSERT INTO statement is not as straightforward also. To insert a row into a
sheet, you have to issue the statement INSERT INTO [sheetname$A1:IVx] where x is
an increasing number which increases with the number of rows in the spreadsheet.
bool Open()
The function Open is
called by the constructor and RollBack. function.
Basically, it open the spreadsheet, read its contents and store them in memory
and then closes the file. This function cannot be called directly by the user.
void GetExcelDriver()
This function gets the name of the Excel-ODBC driver.
It is called by the constructor and cannot be called directly by the user.
short
CalculateColumnNumber(CString column, bool Auto)
This function converts an Excel column in alphabet or
an Excel field name into
column number. It is used by the functions AddCell, ReadCell and ReadColumn.
It cannot be called directly by the user.
Variables
bool m_bAppend;
bool m_bBackup;
bool m_bExcel;
bool m_bTransaction;
long m_dCurrentRow;
long m_dTotalRows;
short m_dTotalColumns;
CString m_sSql;
CString m_sDsn;
CString m_stempSql;
CString m_stempString;
CString m_sSheetName;
CString m_sExcelDriver;
CString m_sFile;
CString m_sSeparator;
CString m_sLastError;
CStringArray m_atempArray;
CStringArray m_aFieldNames;
CStringArray m_aRows;
CDatabase *m_Database;
CRecordset *m_rSheet;
History
When I first started writing this class, I implement
the reading of Excel spreadsheet first. However, it soon turns out that that
implementation wasn't very good for writing to Excel file. So I modified it and
soon it was able to read and write to Excel spreadsheets. There were a few
hiccups regarding the SQL statements. It took quite some time of debugging
before I managed to get the SQL statements correct. Then I started to add in the
functions to handle text delimited files. At first everything seems ok and I was
ready to publish this class. However, the more I think about it, the more I feel
that something is wrong. In that version, the BeginTransaction, Commit and
Rollback functions can only work on text delimited files. Because of the way I
started with the implementation of the Excel portion, any changes made were
immediately written to disk. This prevent any undoing of changes and reduces
performance of the class when many reading or writing are done. So I decided to
revamp the Excel portion and the final implementation was similar to the text
delimited file portion. This enables the BeginTransaction, Commit and Rollback
functions to be used. However, I must admit that I was pretty tired of working
on the class already by this time. Thus I may not have check all the
functions as thoroughly as the first version. So there may be bugs in the class.
I welcome any suggestions to improve the class further and also any bug reports.
Finally, I hope anyone who modify this class releases the modified version for
all to use so that all will benefit.
Acknowledgements
I wish to thank Alexander Mikula for providing the
inspiration to write this class. Without his article on read and writing to
Excel using ODBC drivers, I wouldn't be able to start writing this class. I also
wish to thank Christopher W. Backen for
providing the function GetExcelDriver. Finally, I wish to thank everyone who
have managed to read the whole of this article. I know it is a pretty long
article. Well, the class is not small (1335 lines) and there are many functions
so I hope to give a good documentation. Hopefully with this documentation,
beginners can use this class easily without any problem and advanced user can
modify the class to suit their needs better.