Introduction
This tip presents code to access (read & write) data in an Excel sheet without using any external module, by directly accessing the data in the ooxml format. Also a great effort has been made to make it as fast as possible, minimizing the number of memory allocations, and making it as usable as possible.
You can find the most up to date code at https://github.com/jsegarra1971/SejExcelExport.
Background
There are different techniques to read/write data from an Excel sheet, some of them use the database paradigm (DAO, odbc,..), others use the Excel-like-objects path (worksheets, cells, ...), and others use Excel itself (COM, Interop,...). Each of these techniques has a number of libraries, hiding the dirty details, and making their use fairly simple.
These techniques have their pros and cons, but most of them either, require additional software (i.e.: Excel, Microsoft.ACE.OLEDB,...), usually this is not a big problem, as most of this software is open source, free or very cheap, and easily accessible.
Nevertheless and when it comes to do the job, most of these tools are quite "liberal" in their memory consumption. This usually it is not a problem for small sheets (less than 5,000 rows), but it can bring your system to its knees for big sheets (more than 50,000 rows).
Reading a Sheet
Reading a sheet is quite simple, you just need:
ExcelReader myReader=new ExcelReader("this_is_my_excel_file.xlsx"); myReader.Process(OnExcelCell);
and to get the actual values:
?void OnExcelCell(char Column, int RowNumber, string value)
{
if (Column=='#') Console.Write("Row: "+RowNumber);
else Console.Write("Column: "+Column+" Value: "+value);
}
Currently, only the first sheet and columns up to 'Z' are processed. Nevertheless it is quite simple to overcome this limitations (see ExcelReader.cs).
Writing a Sheet
For writing data into a sheet, we need two things:
- The data, which will come as an implementation of
IDataReader
.
- A template, which will come as an Excel file with placeholders for the actual data. These placeholders are just names beginning with and asterik "*". The values from the data source are retrieved by matching this placeholder with the
FIELDNAME
.
data:image/s3,"s3://crabby-images/0adf8/0adf8982b99979c36a406d325f2baab4e0b4e05f" alt=""
Using this template:
ROW1
is fixed, and will be present in the output file
ROW2
will hold the fields "name
","surname
" and "age
" of the first record of data
ROW3
will hold the fields "name
","surname
" and "age
" of the second record of data
and then repeat until EOF.
ExcelWriter t = new ExcelWriter("Template.xlsx"); MySampleData data = new MySampleData(); t.Export(data,"output_file.xlsx");
Performance
It is hard to read/write an Excel file faster. Included there is a sample that generates an Excel file with 100,000 rows, and then reads all the 100,000 rows. Both operations done in less than 4 seconds, running on a rather outdated system (Windows 7 virtual machine on a 2009 iMac)
data:image/s3,"s3://crabby-images/4307b/4307b4f3aa542489aa29d2c5bdf71dfef158c39c" alt=""