Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

CSpreadSheet - A Class to Read and Write to Excel and Text Delimited Spreadsheet

0.00/5 (No votes)
15 Feb 2002 8  
A Class to Read and Write to Excel and Text Delimited Spreadsheet

Introduction

There are times when we wish to send our output to an Excel file or some text delimited files like comma separated value file or tab delimited file. This class encapsulates the functions for reading and writing to these types of files. It was constructed with a goal of being simple to use. The main features of this class are:

  1. Constructing a new Excel or text delimited file and writing rows or a single cell to it.
  2. Reading rows, columns or a single cell from an Excel or text delimited file.
  3. Replacing, inserting or appending rows to an Excel or text delimited file.
  4. Convert an existing or newly created Excel to text delimited file and vice versa

Before I describe the various functions of the class, I will first state the limitations of this class. The limitations are:

  1. Requires MFC.
  2. May or may not support unicode. (didn't check it out)
  3. Uses ODBC to read and write to Excel file, thus will have the limitations of ODBC driver. (I am not sure whatever limitation does the ODBC driver has)
  4. Must have header row in Excel file and every column of the header row must be unique. (because class is treating Excel file as a database)
  5. Cannot delete a sheet from Excel file. Can only delete contents of that sheet
  6. Cell values are treated as a string, regardless of their format in Excel.

I have inserted error checking code whenever possible but I may have missed out some cases. Thus the user is required to keep the limitations in mind to prevent unexpected errors.

Below is a detailed description of the class. It is split into two parts. The first part is for those who just wish to know how to use the class. It contains a brief description of all the useful functions and how to use them. It is followed by an example that demonstrates most of the functions in the class.

The second part is for those who wish to know all the functions and variables of the class in detail. That section will describe how the function works, what are the limitations and possible errors, and how the functions, after many revisions, came to their final form. This will be useful to those who intends to change the class to suit their needs.

Now, prepare yourself for a long article ahead..

Part I. How to use the class

The following is an overview of the useful functions in the class, organised by whether they are common to both Excel and text delimited spreadsheet or only to one of them

Common Functions

CSpreadSheet(CString File, CString SheetOrSeparator, bool Backup = true)
bool AddHeaders(CStringArray &FieldNames, bool replace = false)
bool AddRow(CStringArray &RowValues, long row = 0, bool replace = false)
bool AddCell(CString CellValue, short column, long row = 0)
bool AddCell(CString CellValue, CString column, long row = 0, bool Auto = true)
bool ReadRow(CStringArray &RowValues, long row = 0)
bool ReadColumn(CStringArray &ColumnValues, short column)
bool ReadColumn(CStringArray &ColumnValues, CString column, bool Auto = true)
bool ReadCell (CString &CellValue, short column, long row = 0)
bool ReadCell (CString &CellValue, CString column, long row = 0, bool Auto = true)
bool DeleteSheet()
bool DeleteSheet(CString SheetName)
bool Convert(CString SheetOrSeparator)
void BeginTransaction()
bool Commit()
bool RollBack()
bool GetTransactionStatus()
void GetFieldNames (CStringArray &FieldNames)
long GetTotalRows()
short GetTotalColumns()
long GetCurrentRow()
bool GetBackupStatus()
CString GetLastError()

Excel Spreadsheet Specific Functions

bool ReplaceRows(CStringArray &NewRowValues, CStringArray &OldRowValues) 

Text Delimited Spreadsheet Specific Functions

None

Common Functions

CSpreadSheet(CString File, CString SheetOrSeparator, bool Backup = true)

This constructor will open an Excel file (xls) or a text delimited file (any other file extensions e.g. csv, tab) for reading (for existing file) or writing (for new file or existing file). The variable SheetOrSeparator is the name of the sheet if the file is an Excel file or the type of separator if the file is a text delimited file. The default constructor will create a backup file (.bak for text delimited file) or a backup sheet (named CSpreadSheetBackup for Excel file) if opening an existing file.

bool AddHeaders(CStringArray &FieldNames, bool replace = false)

This function will add a header row to the opened spreadsheet. For an Excel spreadsheet, each column of the header row must be unique. For text delimited file, there is no such limitation. If the opened spreadsheet is an existing file, the default is to add new columns to the header row unless the variable replace is set as true. The return value will indicate whether the addition is successful or not. Note that for a new Excel spreadsheet, this function should be called before adding any rows to the spreadsheet. For text delimited spreadsheet, this function is optional.

bool AddRow(CStringArray &RowValues, long row = 0, bool replace = false)

This function will append, insert or replace a row to the opened spreadsheet. The default is appending a row to the end of the spreadsheet. Depending on the value of the variable replace, the new row will be inserted or replaced at the indicated row. The default is to insert the new row in the indicated row. The return value will indicate whether the addition is successful or not. Note that row=1 means the first row (which is the header row).

bool AddCell(CString CellValue, short column, long row = 0)
bool AddCell(CString CellValue, CString column, long row = 0, bool Auto = true)

The first AddCell function will add or replace a cell to the opened spreadsheet. The default is to append a new row with the cell value at the indicated column to the end of the spreadsheet.

The second AddCell function is similar except that the variable column is either the Excel column name (e.g. A, B, AE, EF etc) or the field name of the header row. Set the variable Auto to false if you don't want the function to automatically detect whether the variable column is an Excel column name or a header row field name (for cases where the field name of the header row is a two letter name).

The return value will indicate whether the addition is successful or not. Note that row=1 means the first row (which is the header row).

bool ReadRow(CStringArray &RowValues, long row = 0)

This function will read a row from the opened spreadsheet. The default is to read the next row. That is if you run the function twice either consecutively or after some statements in between, the first time the function will return the first row and the second time it will return the second row. The return value will indicate whether the row was correctly read or whether there are no more rows to read. Note that row=1 means the first row (which is the header row).

bool ReadColumn(CStringArray &ColumnValues, short column)
bool ReadColumn(CStringArray &ColumnValues, CString column, bool Auto = true)

The first function will read a column from the opened spreadsheet. The variable column is in terms of the column number.
The second function is similar except that the variable column is either the Excel column name (e.g. A, B, AE, EF etc) or the field name of the header row. Set the variable Auto to false if you don't want the function to automatically detect whether the variable column is an Excel column name or a header row field name (for cases where the field name of the header row is a two letter name).

The return value will indicate whether the reading of the column is successful or not.

bool ReadCell (CString &CellValue, short column, long row = 0)
bool ReadCell (CString &CellValue, CString column, long row = 0, bool Auto = true)

The first function will read a single cell from the opened spreadsheet. The default is to read the cell in the next row (similar to ReadRow above).

The second function is similar except that the variable column is either the Excel column name (e.g. A, B, AE, EF etc) or the field name of the header row. Set the variable Auto to false if you don't want the function to automatically detect whether the variable column is an Excel column name or a header row field name (for cases where the field name of the header row is a two letter name).

The return value will indicate whether the reading of the column is successful or not. Note that row=1 means the first row (which is the header row).

bool DeleteSheet()
bool DeleteSheet(CString SheetName)

The first function will erase the entire contents of the opened spreadsheet.

The second function will erase the entire contents of a specified sheet in the opened Excel spreadsheet. This sheet can be any sheet that is present in the opened Excel spreadsheet.

The return value will indicate whether the erase is successful or not. Note that this two function does not support RollBack for an Excel spreadsheet

bool Convert(CString SheetOrSeparator)

This function will convert an Excel sheet to a text delimited file (.csv of the same name) with the specified separator and vice versa. If converting from text delimited file to an Excel file, the variable separator is not used. The return value will indicate whether the conversion is successful or not.

void BeginTransaction()
bool Commit()
bool RollBack()

The above three functions is similar to the corresponding functions in SQL. Use BeginTransaction to denote the start of a transaction. Use Commit to save the changes made to the spreadsheet. Use RollBack to undo changes made to the spreadsheet (i.e. return the spreadsheet to the state before the BeginTransaction statement). The return values of Commit and RollBack will indicate whether the changes are saved successfully or undo successfully respectively.

bool GetTransactionStatus()

This function will return the Transaction status. True if Transaction is started. False if Transaction is not started or has ended.

void GetFieldNames (CStringArray &FieldNames)

This function will get the header row of the spreadsheet.

long GetTotalRows()

This function will return the total number of rows in the spreadsheet.

short GetTotalColumns()

This function will return the total number of columns in an Excel spreadsheet. For a text delimited spreadsheet, it will return the largest number of columns present in the spreadsheet.

long GetCurrentRow()

This function will return the currently selected row in the spreadsheet. The currently selected row is the row that will be read by the default ReadRow function or the default ReadCell function.

bool GetBackupStatus()

This function will return the backup status of the spreadsheet. True if the backup was performed. False if the backup was not performed (either because of user's choice or an error occurred).

CString GetLastError()

This function will return the last error message. For some functions like the AddHeaders, AddRow, AddCell functions etc, some errors may occur. This function will return a string containing a description of the error that occurred.

Excel Spreadsheet Specific Functions

bool ReplaceRows(CStringArray &NewRowValues, CStringArray &OldRowValues)
This function will search and replace multiple occurrences of the old row with the new row. This function does not support RollBack so changes cannot be undo. The return value will indicate whether the replace is successful or not.

Demonstration of the functions

// Create a new Excel spreadsheet, filename is test.xls, sheetname is TestSheet

CSpreadSheet SS("Test.xls", "TestSheet");

// Fill a sample 5 by 5 sheet

CStringArray sampleArray, testRow, Rows, Column;
CString tempString;
char alphabet = 'A';

SS.BeginTransaction();
for (int i = 1; i <= 5; i++)
{
    sampleArray.RemoveAll();
    for (int j = 1; j <= 5; j++)
    {
        tempString.Format("%c%d", alphabet++, i);
        sampleArray.Add(tempString);
    }
    alphabet = 'A';
    if (i == 1) // Add header rows

    {
        SS.AddHeaders(sampleArray);
    }
    else // Add data rows

    {
        SS.AddRow(sampleArray);
    }
}

// Set up test row for appending, inserting and replacing

for (int k = 1; k <= 5; k++)
{
    testRow.Add("Test");
}

SS.AddRow(testRow); // append test row to spreadsheet

SS.AddRow(testRow, 2); // insert test row into second row of spreadsheet

SS.AddRow(testRow, 4, true); // replace fourth row of spreadsheet with test row


SS.Committ();

SS.Convert(";"); // convert Excel spreadsheet into text delimited format 

                 // with ; as separator


// print out total number of rows

printf("Total number of rows = %d\n\n", SS.GetTotalRows()); 

// Print out entire spreadsheet

for (i = 1; i <= SS.GetTotalRows(); i++)
{
    // Read row

    SS.ReadRow(Rows, i);
    for (int j = 1; j <= Rows.GetSize(); j++)
    {
        if (j != Rows.GetSize())
        {
            printf("%s\t", Rows.GetAt(j-1));
        }
        else
        {
            printf("%s\n", Rows.GetAt(j-1));
        }
    }
}

// print out total number of columns

printf("\nTotal number of columns = %d\n\n", SS.GetTotalColumns()); 

// Read and print out contents of second column of spreadsheet

SS.ReadColumn(Column, 2);
for (i = 0; i < Column.GetSize(); i++)
{
    printf("Column 2 row %d: %s\n", i+1, Column.GetAt(i));
}

// Read in and print out the cell value at column 3, row 3 of spreadsheet

if (SS.ReadCell(tempString, 3, 3))
{
    printf("\nCell value at (3,3): %s\n", tempString);
}
else
{
    // print out error message if cell value cannot be read

    printf("Error: %s\n", SS.GetLastError); 
}

Part II. Detailed description of the functions and variables

CSpreadSheet(CString File, CString SheetOrSeparator, bool Backup = true)

The function will first check the file extension to determine whether it is an Excel file (.xls) or a text delimited file (any other extensions). If the file is an Excel file, it will set an internal flag m_bExcel to true. Otherwise, m_bExcel is false. The filename is stored in m_sFile. The constructor will then assign the string SheetOrSeparator to be either the Sheet name of the Excel file (by setting m_sSheetName) or the Separator (by setting m_sSeparator) for the text delimited file.

The file is then opened by calling the function Open. The reason for using another function to open the spreadsheet is because the function will also be used by the function RollBack to undo changes to the file. After the file is opened successfully, the total number of rows and columns of the spreadsheet are found and stored in m_dTotalRows and m_dTotalColumns respectively. The header row of the spreadsheet is also found and stored in m_aFieldNames. The current row (stored in m_dCurrentRow) is then set to 1. This row corresponds to the row corresponds to the header row. All the rows are stored in memory (stored in m_aRows). For an Excel spreadsheet, all the columns in each row are first joined by the separator: ,;.? before being stored in m_aRows. This storing of rows in memory permits an implementation of a undo function. Backup of the previous file or sheet is then performed if necessary. The backup status is then stored in a flag m_bBackup.

The Excel file is opened using ODBC driver and thus it is treated as a database. Due to the limitation of the Excel driver, recordsets can only be used to read from the file. Recordsets cannot be used to write to the file. Thus SQL statements are used for writing to the Excel file. Even then, only a subset of SQL are available. Thus the writing to Excel file is somewhat tedious and may be slow.

bool AddHeaders(CStringArray &FieldNames, bool replace = false)

The adding, replacing or appending of a header row in a text delimited spreadsheet is easy and straightforward. However, the same cannot be said for an Excel spreadsheet. This is because the Excel spreadsheet is opened using ODBC driver. The first row of an Excel spreadsheet is treated by default by the ODBC driver as the row containing the field names of the database. Thus each column of the header row must be unique. There cannot be two cells in the header row with the same name. Hence, when appending to the header row in an Excel spreadsheet, the function must first check that the columns of the header row are unique. When adding a new header row or replacing the header row in an Excel spreadsheet, the job of checking for duplicate column names is passed on to the function AddRow.

bool AddRow(CStringArray &RowValues, long row = 0, bool replace = false)
bool AddCell(CString CellValue, short column, long row = 0)
bool AddCell(CString CellValue, CString column, long row = 0, bool Auto = true)

The three functions are very similar in implementation except that AddCell does not support insertion. The insertion of a single cell (either shift cells down or shift cells right) is too tedious and seldom of use (at least to me). Anyone who needs this type of insertion is welcome to modify this function. I won't be doing it.
The flow of the functions is as follows: First, the function checks whether the user specifies any row to add the new row or cell. If the user does not specify any row, the new row or cell is then appended to the end of the spreadsheet. If a row is specified, the functions will then check whether the specified row is greater than the total number of rows in the spreadsheet. If the specified row is greater than the total number of rows, blank rows are added to the spreadsheet until just before the specified row is reached. Then the new row is appended to the end of the spreadsheet. If the specified row is less than the total number of rows, the function AddRow will then check to see whether the new row is to be inserted into or replace the specified row. For AddCell, the new cell will replace the specified cell.

For Excel spreadsheet, there are a few additional checks. First, if the row specified is the header row, it will check to ensure that there is no duplicate column names. It will then check to ensure that the number of columns in the header row is not reduced as this will affect the integrity of the Excel spreadsheet (remember Excel is being treated as a database). If the row specified is not the header row, the function will check to ensure there is a header row. Then it will check to ensure that the total number of columns added is not more than the columns in the header row.

The AddCell function is overloaded to give the user a variety of ways to specify the desired column. The column can be specified as the field name, the column alphabet in an Excel spreadsheet or as the column number, starting from 1. When the column is entered as a string, the function will automatically detect whether the column is in terms of the field name or column alphabet. It does this by checking the length of the input string. If the length of the input string is 1 or 2, it will assume the string is in terms of column alphabet. Otherwise, it will assume the string is in terms of the field name. For the column alphabet case, the function will then call the function CalculateColumnNumber to obtain the correct column number. For the field name case, the function will use the field name to match against each column of the header row to obtain the correct column number. The variable Auto is used to handle the special case of a field name being 1 or 2 characters in length. Because of the algorithm for the automatic detection of type of column, a field name of 1 or 2 characters in length will be misclassified as a column alphabet and thus results in a incorrect computed column number. For such special cases, setting Auto to false will disable the automatic classification system and cause the function to interpret the column as a field name instead of a column alphabet.

bool ReadRow(CStringArray &RowValues, long row = 0)
bool ReadColumn(CStringArray &ColumnValues, short column)
bool ReadColumn(CStringArray &ColumnValues, CString column, bool Auto = true)
bool ReadCell (CString &CellValue, short column, long row = 0)
bool ReadCell (CString &CellValue, CString column, long row = 0, bool Auto = true)

The implementation of these 5 functions are relatively simple. The function ReadRow is the most important function of the five. This function is called by the functions AddRow and AddCell. It is also used by the functions ReadColumn and ReadCell when the file is a text delimited file.

The function ReadRow first has to split each row into the individual columns. The row is split by the separator (m_sSeparator). For an Excel spreadsheet, the separator is ,;.?. The function will first try to split the rows by assuming the file has the following syntax: "column1"separator"column2"separator"column3". E.g for a csv file: "column1","column2","column3". If the function fails to split the rows in this way, it will try to split the rows by assuming the file has the syntax of column1separatorcolumn2separatorcolumn3. E.g. for a csv file: column1,column2,column3. Another type of syntax other than the 2 above mentioned are not supported. If the text delimited file belongs to the first syntax, the function will then remove the quotes from both the first and last column. This step is necessary because of the way the splitting of the row is implemented. For an Excel spreadsheet, the rows are stored in memory using the first syntax.
The functions ReadColumn and ReadCell depends on this function ReadRow to work. For ReadCell, it will call ReadRow to obtain the columns of the desired row and then get the desired cell value. For ReadColumn, it will call ReadRow multiple times to extract the desired column.
As for the case of AddCell, the functions ReadColumn and ReadCell are overloaded to allow flexibility in the input of the desired column. The usage of the variables column and Auto is the same as that for AddCell.

bool ReplaceRows(CStringArray &NewRowValues, CStringArray &OldRowValues)

This function uses the SQL statement, UPDATE...SET...WHERE, to search and replace multiple rows in an Excel spreadsheet. Since it uses SQL, the function is not available for a text delimited spreadsheet. Also the function does not work on a memory copy of the rows but rather on the disk copy itself. Thus the changes cannot be undo.

bool DeleteSheet()
bool DeleteSheet(CString SheetName)

The two overloaded functions are straightforward for the text delimited spreadsheet. Basically, it deletes the memory copy of the rows and field names. It also resets the total number of columns and rows to zero. For an Excel spreadsheet, it uses the SQL statement, DROP TABLE. Thus the deletion cannot be undo as it work on the disk copy directly. Also, due to the limitation of the ODBC driver, the sheet itself is not deleted. Only the contents are erased.

bool Convert(CString SheetOrSeparator)

The conversion of an Excel spreadsheet to a text delimited spreadsheet is pretty straightforward. This function can be used for batch conversion of Excel to text delimited spreadsheets or vice versa. However, the limitation of the uniqueness of each column of the header row must be borne in mind when converting from text delimited spreadsheets to Excel.

void BeginTransaction()
bool Commit()
bool RollBack()

The three functions are useful as they enabls changes to be undone by using the function RollBack. There is another advantage to using BeginTransaction if you are doing a lot of addition of rows or cells. By default, after every addition the changes are saved to disk. But if BeginTransaction is used, the saving of the changes to disk are delayed until the function Commit is called. This will speed up the addition of rows or cells tremendously.

The implementation of Commit is straightforward for text delimited spreadsheets. For Excel spreadsheet, the implementation is slightly more tricky. As mentioned before, writing to an Excel file requires the use of SQL statements. To write to an Excel file, we have to take note of whether the sheet is already present or it is a new sheet. Both cases will requires different approaches in the CREATE TABLE statement. For a new sheet, we will use CREATE TABLE sheetname. For an existing sheet, we have to first delete the sheet. However as mentioned earlier, deleting an Excel spreadsheet only deletes its content. The actual sheet is not deleted. This poses problems as the CREATE TABLE sheetname cannot be used now as the ODBC driver will return an exception stating that the sheet is already present. However, if you use INSERT INTO statement to insert rows into the spreadsheet, it will also return an error. The solution to this problem is to use CREATE TABLE [sheetname$A1:IV65536] to create the sheet first before using the INSERT INTO statement to insert rows. Another interesting thing that I found is that the INSERT INTO statement is not as straightforward also. To insert a row into a sheet, you have to issue the statement INSERT INTO [sheetname$A1:IVx] where x is an increasing number which increases with the number of rows in the spreadsheet.

bool Open()
The function Open is called by the constructor and RollBack. function. Basically, it open the spreadsheet, read its contents and store them in memory and then closes the file. This function cannot be called directly by the user.

void GetExcelDriver()
This function gets the name of the Excel-ODBC driver. It is called by the constructor and cannot be called directly by the user.

short CalculateColumnNumber(CString column, bool Auto)
This function converts an Excel column in alphabet or an Excel field name into column number. It is used by the functions AddCell, ReadCell and ReadColumn. It cannot be called directly by the user.

Variables

bool m_bAppend; // Internal flag to denote newly created spreadsheet or 

                // previously created spreadsheet

bool m_bBackup; // Internal flag to denote status of Backup

bool m_bExcel; // Internal flag to denote whether file is Excel spreadsheet 

               // or text delimited spreadsheet

bool m_bTransaction; // Internal flag to denote status of Transaction

long m_dCurrentRow; // Index of current row, starting from 1

long m_dTotalRows; // Total number of rows in spreadsheet

short m_dTotalColumns; // Total number of columns in Excel spreadsheet. Largest number of columns 

                       // in text delimited spreadsheet

CString m_sSql; // SQL statement to open Excel spreadsheet for reading

CString m_sDsn; // DSN string to open Excel spreadsheet for reading and writing

CString m_stempSql; // Temporary string for SQL statements or for use by functions

CString m_stempString; // Temporary string for use by functions

CString m_sSheetName; // Sheet name of Excel spreadsheet

CString m_sExcelDriver; // Name of Excel Driver

CString m_sFile; // Spreadsheet file name

CString m_sSeparator; // Separator in text delimited spreadsheet

CString m_sLastError; // Last error message

CStringArray m_atempArray; // Temporary array for use by functions

CStringArray m_aFieldNames; // Header row in spreadsheet

CStringArray m_aRows; // Content of all the rows in spreadsheet

CDatabase *m_Database; // Database variable for Excel spreadsheet

CRecordset *m_rSheet; // Recordset for Excel spreadsheet

History

When I first started writing this class, I implement the reading of Excel spreadsheet first. However, it soon turns out that that implementation wasn't very good for writing to Excel file. So I modified it and soon it was able to read and write to Excel spreadsheets. There were a few hiccups regarding the SQL statements. It took quite some time of debugging before I managed to get the SQL statements correct. Then I started to add in the functions to handle text delimited files. At first everything seems ok and I was ready to publish this class. However, the more I think about it, the more I feel that something is wrong. In that version, the BeginTransaction, Commit and Rollback functions can only work on text delimited files. Because of the way I started with the implementation of the Excel portion, any changes made were immediately written to disk. This prevent any undoing of changes and reduces performance of the class when many reading or writing are done. So I decided to revamp the Excel portion and the final implementation was similar to the text delimited file portion. This enables the BeginTransaction, Commit and Rollback functions to be used. However, I must admit that I was pretty tired of working on the class already by this time. Thus I may not have check all the functions as thoroughly as the first version. So there may be bugs in the class. I welcome any suggestions to improve the class further and also any bug reports. Finally, I hope anyone who modify this class releases the modified version for all to use so that all will benefit.

Acknowledgements

I wish to thank Alexander Mikula for providing the inspiration to write this class. Without his article on read and writing to Excel using ODBC drivers, I wouldn't be able to start writing this class. I also wish to thank Christopher W. Backen for providing the function GetExcelDriver. Finally, I wish to thank everyone who have managed to read the whole of this article. I know it is a pretty long article. Well, the class is not small (1335 lines) and there are many functions so I hope to give a good documentation. Hopefully with this documentation, beginners can use this class easily without any problem and advanced user can modify the class to suit their needs better.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here