Excel 2003 and Visual Studio 2010
The need to extract information from existing Excel 2003 Workbooks aroused (.xls file extension) within one of our projects. Using the
newer XML based formats was not an option on our customer’s side. The Visual Studio Tools for Office of the selected IDE Visual Studio
2010 does not support Excel 2003 anymore (see Features Available by Office Application and Project Type,
http://msdn.microsoft.com/en-us/library/aa942839.aspx).
Therefore COM Interop of the Microsoft Excel 11.0 Object Library was the remaining way while focusing on Microsoft provisions. But this
solution depends on the installed version of Excel. So to be able to use and test the Excel 2003 binding, it is necessary to install Excel 2003 first.
The attached download (Compare_COM_vs_NPOI_v11_20120221.zip - 1 MB) of this article provides a sample Excel file too.
It has some sheets with content and calculations.
Access an Excel 2003 file using the Excel 11.0 Object Library
The following code snippet shows a simple traversal of the whole content of an Excel 2003 file. At the beginning, an instance of Excel
Application
is used to open the .xls file. Afterwards, all the Worksheets are traversed row by row and column by column.
The value and the comment of each of the cells is printed to the console screen.
A comprehensive step by step tutorial on the COM interop side is provided in the article "Opening and Navigating
Excel with C#" (http://www.codeproject.com/Articles/5123/Opening-and-Navigating-Excel-with-C).
try
{
xlApp = new Excel.Application();
if (null == xlApp)
{
Console.WriteLine("Excel could not be started. Check that your " +
"office installation and project references are correct.");
return;
}
bool openReadOnly = true;
xlWorkbook = xlApp.Workbooks.Open(extractFile,
0, openReadOnly, 5, "", "", false,
Excel.XlPlatform.xlWindows, "",
true, false, 0, true, false, false);
if (null == xlWorkbook)
{
Console.WriteLine(string.Format(
"Excel Workbook '{0}' could not be opened.", extractFile));
}
Console.WriteLine();
foreach (Excel.Worksheet xlSheet in xlWorkbook.Worksheets)
{
if (null != xlSheet)
{
Console.WriteLine();
Console.WriteLine("*** Worksheet " + xlSheet.Name + " ***");
Excel.Range usedRange = xlSheet.UsedRange;
if ((null != usedRange) && (null != usedRange.Cells))
{
xlCells = usedRange.Cells;
for (int rowNumber = 1; rowNumber <= usedRange.Rows.Count; ++rowNumber)
{
if (1 == (rowNumber % 10)) { Console.WriteLine(); }
Console.Write(string.Format("Row {0,2}: ", rowNumber));
for (int colNumber = 1; colNumber <= usedRange.Columns.Count; ++colNumber)
{
dynamic cell = xlCells[rowNumber, colNumber];
if (null != cell)
{
string value = GetValue(cell);
string comment = GetComment(cell);
Console.Write(string.Format("{0}{1};",
string.IsNullOrWhiteSpace(value) ? "" : value,
string.IsNullOrWhiteSpace(comment) ? "" : " [" + comment + "]"));
}
else
{
Console.Write(";");
}
}
Console.WriteLine();
}
}
Marshal.FinalReleaseComObject(xlSheet);
Console.WriteLine();
}
}
xlWorkbook.Close(Type.Missing, Type.Missing, Type.Missing);
xlApp.Quit();
}
catch (Exception ex)
{
Console.WriteLine("Exception: " + ex.Message);
if (null != ex.InnerException)
{
Console.WriteLine("Inner Exception: " + ex.InnerException.Message);
}
}
finally
{
if (null != xlUsedRange) { Marshal.FinalReleaseComObject(xlUsedRange); xlUsedRange = null; }
if (null != xlCells) { Marshal.FinalReleaseComObject(xlCells); xlCells = null; }
if (null != xlWorkbook) { Marshal.FinalReleaseComObject(xlWorkbook); xlWorkbook = null; }
if (null != xlApp) { Marshal.FinalReleaseComObject(xlApp); xlApp = null; }
}
To extract the content from a cell, the method GetValue
is used.
private static string GetValue(dynamic cell)
{
string ret = string.Empty;
if (null == cell) { return ret; }
Excel.Range singleCell = cell as Excel.Range;
if (null == singleCell) { return ret; }
if (null != singleCell.Text)
{
ret = singleCell.Text as string;
}
if (null == ret) { return string.Empty; }
return ret.Replace("\n", " ");
}
The comment of the cell is extracted with the method GetComment
.
private static string GetComment(dynamic cell)
{
string ret = string.Empty;
if ((null == cell) || (null == cell.Comment)) { return ret; }
ret = cell.Comment.Text as string;
if (null == ret) { return string.Empty; }
return ret.Replace("\n", " ");
}
Access Excel 2003 File using the NPOI Library - Step by Step
An alternative access method is provided by the library NPOI. This library is provided on Microsoft CodePlex (http://npoi.codeplex.com/)
and developed on GitHub (https://github.com/tonyqus/npoi). The implementation is performed in C# only and provides therefore a pure
managed access to Excel files.
The NPOI library itself is ported from a Java project named Apache POI (Apache POI - the Java API for Microsoft Documents,
http://poi.apache.org/). Both libraries are mature and actively developed.
The following code snippets use the NPOI library to print the whole content of an Excel 2003 file. Initially, the standard stream
reader is used to read the XLS file. The usual standard flags can be applied to the stream reader as necessary. The stream is read from the
POIFSFileSystem
class. This class reads and writes Microsoft's OLE 2 Compound document format. This is the underlying base format of the
Excel file format. Finally, HSSFWorkbook
reflects the structure of the Excel file that is visible to the Excel user.
using (StreamReader input = new StreamReader(extractFile))
{
IWorkbook workbook = new HSSFWorkbook(new POIFSFileSystem(input.BaseStream));
if (null == workbook)
{
Console.WriteLine(string.Format("Excel Workbook '{0}' could not be opened.", extractFile));
return;
}
...
}
Next, the formulas of the Workbook are evaluated and an instance of a data formatter is created.
IFormulaEvaluator formulaEvaluator = new HSSFFormulaEvaluator(workbook);
DataFormatter dataFormatter = new HSSFDataFormatter(new CultureInfo("en-US"));
The individual sheets of the Workbook are accessible by number, name, or using an iterator. This is the iterator based example.
foreach (ISheet sheet in workbook)
{
Console.WriteLine("\n\n*** Worksheet " + sheet.SheetName + " ***");
...
}
Accessing the rows of a given sheet is done using an iterator, again. The row number is gathered from the IRow
type variable and printed to console output.
foreach (IRow row in sheet)
{
Console.Write(string.Format("Row {0,2}: ", row.RowNum + 1));
}
Now the content from each cell of the row is extracted, formatted, and written to the console. The same applies for the comments that are associated with the cell.
foreach (ICell cell in row)
{
string value = GetValue(cell, dataFormatter, formulaEvaluator);
string commentText = GetComment(cell);
Console.Write(string.Format("{0}{1};",
string.IsNullOrWhiteSpace(value) ? "" : value,
string.IsNullOrWhiteSpace(commentText) ? "" : " [" + commentText + "]"));
}
Access Excel 2003 File using the NPOI Library - Complete Example
This is the complete code snippet to compare it with the COM Interop snippet above.
try
{
using (StreamReader input = new StreamReader(extractFile))
{
IWorkbook workbook = new HSSFWorkbook(new POIFSFileSystem(input.BaseStream));
if (null == workbook)
{
Console.WriteLine(string.Format("Excel Workbook '{0}' could not be opened.", extractFile));
return;
}
IFormulaEvaluator formulaEvaluator = new HSSFFormulaEvaluator(workbook);
DataFormatter dataFormatter = new HSSFDataFormatter(new CultureInfo("en-US"));
Console.WriteLine();
foreach (ISheet sheet in workbook)
{
Console.WriteLine("\n\n*** Worksheet " + sheet.SheetName + " ***");
foreach (IRow row in sheet)
{
if (0 == row.RowNum % 10) { Console.WriteLine(); }
Console.Write(string.Format("Row {0,2}: ", row.RowNum + 1));
foreach (ICell cell in row)
{
string value = GetValue(cell, dataFormatter, formulaEvaluator);
string commentText = GetComment(cell);
Console.Write(string.Format("{0}{1};",
string.IsNullOrWhiteSpace(value) ? "" : value,
string.IsNullOrWhiteSpace(commentText) ? "" : " [" + commentText + "]"));
}
Console.WriteLine();
}
}
}
}
catch (Exception ex)
{
Console.WriteLine("Exception: " + ex.Message);
if (null != ex.InnerException)
{
Console.WriteLine("Inner Exception: " + ex.InnerException.Message);
}
}
For extracting the content from the cell, the method GetValue
is used, again. But this time, it is the NPOI variant:
private static string GetValue(ICell cell, DataFormatter dataFormatter,
IFormulaEvaluator formulaEvaluator)
{
string ret = string.Empty;
if (null == cell) { return ret; }
ret = dataFormatter.FormatCellValue(cell, formulaEvaluator);
return ret.Replace("\n", " ");
}
The comment of the cell is extracted with the method GetComment
.
private static string GetComment(ICell cell)
{
string ret = string.Empty;
if ((null == cell) || (null == cell.CellComment)) { return ret; }
IRichTextString str = cell.CellComment.String;
if (str != null && str.Length > 0)
{
ret = str.ToString();
}
return ret.Replace("\n", " ");
}
Here is a screenshot showing some of the extracted information.
Comparing NPOI and the Excel Object Library
Comparing the two snippets, it becomes obvious that NPOI provides a modern interface, e.g., it is possible to iterate the instances using
foreach
or the resources are automatically and safely freed. Additionally, NPOI provides a strongly type based interface that most C# developers are
expecting today. The library API is designed in a clean C# way. This is in contrast to the COM Interop where most parameter types are object
s.
To our surprise, the NPOI implementation is way faster than the COM alternative. The programs in the accompanying archive contain the
Stopwatch used for measurements. The test machine is a Core i7 860@2.8GHz, 3 GB RAM, Win7 32 bit. Executing both programs multiple
times, COM Interop requires around 2500 - 4200 ms to complete whereas the NPOI program requires only 350 - 410 ms. Even larger samples used in
our project still return nearly immediately. This is especially true when avoiding the lengthy console output. But try it with your own samples.
The documentation of the NPOI library methods are available at the same level provided by standard .NET classes including Intellisense help
and navigation to the documented method. The COM Interop DLL provides only the names of the methods and a parameter list
where the structured types are all of the meaningless base object
type. The NPOI library is available with full source code. The source code is
licensed under Apache License 2.0 and can be used in any project, even commercial and proprietary ones.
Opening the Excel file for extraction as a read only file becomes natural in NPOI because of the StreamReader
. COM Interop requires
the use of the Open
method with a long parameter list. When extracting information from a customer provided .xls file, it prints the content as
desired, but additionally a "Save changes" dialog pops up after each invocation when using COM interop. Opening the .xls file as a read
only file does not avoid this unwanted dialog. The reason seems to be an update of the formulas during the access of some cells. This is in
contrast to NPOI that does not change the .xls file.
Excel is always running in a single instance on the PC. Therefore, it must not be used in a server setting. This limitation does not exist for NPOI.
The download archive of this article provides a Visual Studio 2010 solution with three projects. Each of the projects extracts the whole information
of an Excel file. The first named ReadExcelUsingCom uses the COM Interop DLL to perform the task. This is the project where the above
example at the beginning of this article is extracted. The second one named ReadExcelUsingNPOI is the example using NPOI that is explained in
detail above. This program shows all the content from the Excel file. But the COM Interop program shows empty rows, too. Therefore
a third program was developed that produces exactly the same output as the COM example even with NPOI. This last program is contained in the project ReadExcelUsingNPOICloneCom
.
In addition to the three programs, the solution files have an up to date version of the NPOI DLL included. So these examples are expected to
compile and run just out of the box. Of course, Excel 2003 is missing and must be installed separately to get the COM Interop example working. To
repeat it, it is required to install Excel 2003, other versions are not supported.
The presented comparison of the NPOI library and the COM Interop of Excel 11.0 Object Library can be summarized as follows:
- NPOI provides a strongly type based interface.
- The standalone NPOI library does not depend on the installed version of Excel.
- NPOI is significantly faster even when tested with only three small sheets.
- NPOI does not change .xls file when reading it.
- Only the Excel 11.0 Object Library is supported by Microsoft.
- The Open Source NPOI library provides full source code.
History
- Initial article
- Updated download and article to NPOI 1.2.5 pre release as of 2012-02-21, svn r309
- Source code of the NPOI library moved to https://github.com/tonyqus/npoi. Fixed reference above