Introduction
The C# code snippet demonstrates how to use the open-source NPOI library to read cell values in Microsoft Excel files.
References: NPOI - https://npoi.codeplex.com/
Background
Excel cell can contain values of different types. Also, values may be evaluated from formulae. We need to cater to these cases when reading the cell values.
Using the Code
I write an abstract
class ExcelFileReader
providing basic functions to read values from an Excel file.
public abstract class ExcelFileReader
{
protected IWorkbook workbook;
protected DataFormatter dataFormatter;
protected IFormulaEvaluator formulaEvaluator;
protected void InitializeMembers(Stream excelFileStream)
{
this.workbook = WorkbookFactory.Create(excelFileStream);
if (this.workbook != null)
{
this.dataFormatter = new DataFormatter(CultureInfo.InvariantCulture);
this.formulaEvaluator = WorkbookFactory.CreateFormulaEvaluator(this.workbook);
}
}
protected string GetFormattedValue(ICell cell)
{
string returnValue = string.Empty;
if (cell != null)
{
try
{
returnValue = this.dataFormatter.FormatCellValue(cell, this.formulaEvaluator);
}
catch
{
if (cell.CellType == CellType.FORMULA)
{
switch (cell.CachedFormulaResultType)
{
case CellType.STRING:
returnValue = cell.StringCellValue;
cell.SetCellValue(cell.StringCellValue);
break;
case CellType.NUMERIC:
returnValue = dataFormatter.FormatRawCellContents
(cell.NumericCellValue, 0, cell.CellStyle.GetDataFormatString())
cell.SetCellValue(cell.NumericCellValue);
break;
case CellType.BOOLEAN:
returnValue = cell.BooleanCellValue.ToString();
cell.SetCellValue(cell.BooleanCellValue);
break;
default:
break;
}
}
}
}
return (returnValue ?? string.Empty).Trim();
}
protected string GetUnformattedValue(ICell cell)
{
string returnValue = string.Empty;
if (cell != null)
{
try
{
returnValue = (cell.CellType == CellType.NUMERIC ||
(cell.CellType == CellType.FORMULA &&
cell.CachedFormulaResultType == CellType.NUMERIC)) ?
formulaEvaluator.EvaluateInCell(cell).NumericCellValue.ToString() :
this.dataFormatter.FormatCellValue(cell, this.formulaEvaluator);
}
catch
{
if (cell.CellType == CellType.FORMULA)
{
switch (cell.CachedFormulaResultType)
{
case CellType.STRING:
returnValue = cell.StringCellValue;
cell.SetCellValue(cell.StringCellValue);
break;
case CellType.NUMERIC:
returnValue = cell.NumericCellValue.ToString();
cell.SetCellValue(cell.NumericCellValue);
break;
case CellType.BOOLEAN:
returnValue = cell.BooleanCellValue.ToString();
cell.SetCellValue(cell.BooleanCellValue);
break;
default:
break;
}
}
}
}
return (returnValue ?? string.Empty).Trim();
}
}
You have to implement this abstract
ExcelFileReader
, which gets the ISheet
object from this.workbook and manipulate its IRow
object to get the cell values.
Points of Interest
The try
-catch
block in the function GetFormattedValue()
handles the failure in evaluating the cell formula (says with external reference) by taking the cached value (if any) instead. The cell value is then set to the cached value by calling the SetCellValue()
function for any formulae in other cells which reference to this cell. For numeric value, formatted value will be obtained by calling DataFormatter.FormatRawCellContents()
function.
Similarily, the function GetUnformattedValue()
gets cell value as string
without any format applied. It identities numeric cell, which format may be applied, and gets the unformatted value from the ICell.NumericCellValue
property.
The CellType enum
does not have any item like DATE
for date/time value, which are stored as number in Excel. So, there is no direct way to identity the cell is storing a date/time value. I think one way is to see its format by refering to ICell.CellStyle.DataFormat
property or ICell.CellStyle.GetDataFormatString()
function.
History
- 6th November, 2015 - First publication