Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C#

A Very Easy to Use Excel XML Import-Export Library

4.18/5 (86 votes)
25 Nov 2008CPOL13 min read 5   19.6K  
Import export library for the Excel XML format, which reduces a programmer's work to the bare minimum.
Latest versions [External links]

Introduction

Just like any other Excel XML export article/library writer, my quest started when I wanted to export a simple report to Excel. I knew that writing an Excel XML library is very easy, and I was sure I will find plenty of such libraries on the Internet. I did find many, but none of them were simple, yet intuitive, powerful, and implemented everything I needed. I could take ages to write a background component like an export library, but when using it, I wanted it to do my work fast and without fuss. The library should also be compatible with .NET 2.0/3.0/3.5.

So, I came with this library which is easy and fast to use, yet is very powerful to be considered to be one of the most powerful XML export libraries present. OK, I will count it as a shameless self promotion.

P.S.: The XML file format is only supported by Excel versions XP, 2003, and 2007. Previous versions, i.e., Excel 97 and Excel 2000 do not support this feature.

Features

There are a multitude of features which are present in the library. They are...

  • Full import and export of Excel XML files
  • Access cells from anywhere including ExcelXmlWorkbook, Worksheet, Row, and Cell
  • Full support for formulae and ranges
  • Imported files' formulae and ranges are parsed to Formula and Range instances
  • Multitude of Insert, Delete, and Add functions available for ExcelXmlWorkbook, Worksheet, Row, and Cell
  • Auto management of cell references in the book on any change made via any of the Insert, Delete, and Add functions
  • Full support for styles in Worksheet, Row, and Cell, and even Range
  • No instance declaration; all the dirty work is done by the library
  • Cell ContentType, which enables you to know what the cell actually contains
  • DataSet to ExcelXmlWorkbook conversion facility
  • Absolute and non-absolute ranges and Named ranges
  • Auto filter
  • Very fast as everything is done via XmlWriter
  • Ability to freeze rows or columns, or both
  • Almost all print options
  • Almost all style options
  • Many formatting options
  • Hidden columns and rows

Using the library

Using the code is very easy. This was the primary concern when I was building this library. The primary or top level class is ExcelXmlWorkbook which contains multiple Worksheets. The library resides in Yogesh.Extensions.ExcelXml. The following example shows the various ways of adding cells in a Workbook, right from creating a instance.

C#
// Create the instance
ExcelXmlWorkbook book = new ExcelXmlWorkbook();

// Many such properties exist. Details can be found in the documentation
book.Properties.Author = "Yogesh Jagota"; // The author of the document

// This returns the first worksheet.
// Note that we have not declared a instance of a new worksheet
// All the dirty work is done by the library.
Worksheet sheet = book[0];

// Name is the name of the sheet. If not set, the default name
// style is "sheet" + sheet number, like sheet1, sheet2
sheet.Name = "AgewiseOutstanding";

// More on this in documentation
sheet.FreezeTopRows = 3;

// and this too...
sheet.PrintOptions.Orientation = PageOrientation.Landscape;
sheet.PrintOptions.SetMargins(0.5, 0.4, 0.5, 0.4);

// This is the actual code which sets out the cell values
// Note again, that we don't declare any instance at all.
// All the work is done by the library.
// Index operator takes first value as column and second as row.
sheet[0, 0].Value = "Outstanding as on " + DateTime.Now;

sheet[0, 1].Value = "Name of Party";
sheet[1, 1].Value = "RSM";
sheet[2, 1].Value = "ASM";
sheet[3, 1].Value = "0-30";
sheet[4, 1].Value = "31-60";
sheet[5, 1].Value = "61-90";
sheet[6, 1].Value = "91+";

sheet[0, 2].Value = "M/s Stupid Paymaster";
sheet[1, 2].Value = "Mr. Nonsense";
sheet[2, 2].Value = "Mr. More Nonsense";
sheet[3, 2].Value = 0;
sheet[4, 2].Value = 5000;
sheet[5, 2].Value = 45000;
sheet[6, 2].Value = 0;

sheet[0, 3].Value = "M/s Good Paymaster";
sheet[1, 3].Value = "Mr. Good RSM";
sheet[2, 3].Value = "Mr. Good ASM";
sheet[3, 3].Value = 32000;
sheet[4, 3].Value = 0;
sheet[5, 3].Value = 0;
sheet[6, 3].Value = 0;
sheet[7, 3].Value = sheet[6, 3];

string outputFile = "Outstanding File.xml";
// no extension is added if not present

book.Export(outputFile);

Importing a file

To import a file, you can either supply a file name or a Stream object to the static ExcelXmlWorkbook.Import method which returns a ExcelXmlWorkbook instance loaded with the file. If any error occurs, the function simply returns null, and there is no way to find out what error occurred. So, the preferred way to import is to pass a Stream because it gives you more control with error management.

Exporting the file

All the code is written to disk only when the Export function is called. Export must be supplied with either a Stream or a file name. If any error occurs, the function simply returns false, and there is no way to find out what error occurred. So, the preferred way to export is to pass a Stream because it gives you more control with error management.

Assigning values to cells

Notice the last assignment in the previous example:

C#
sheet[7, 3].Value = sheet[6, 2];

Here, we are actually assigning a cell to a cell. What will be the value of the cell, you might wonder? The cell will not have a value at all. It will have a reference to the assigned cell, something like this when you will open the file in Excel: =G3. It won't be an absolute reference, more on that later. We can assign these values to a cell:

  1. string
  2. bool
  3. All integar types, i.e., byte, sbyte, int, uint, long, ulong, float, double, decimal.
  4. DateTime
  5. Cell
  6. Formula [more on this below]

Knowing the type of content a cell contains

Every Cell contains a ContentType readonly field which can be used to check what value type the cell contains. The available values are String, Number, Boolean, DateTime, Formula, UnresolvedValue.

Retrieving values from cells

A readonly property GetValue<T> returns the cell value converted to the type supplied. You can use ContentType with GetValue<T> to retrieve the exact value of a cell. GetValue<T> enables strict type checking when retrieving a cell value. Further, if the type supplied with GetValue<T> does not match the type of the cell type, default(T) is returned instead. For example, if a cell's ContentType == ContentType.Numeric, the only way to retrieve the value of the cell is to supply a byte, sbyte, int, uint, long, ulong, float, or double. If a cell's ContentType == ContentType.String, the only way to retrieve the value of the cell is to supply a string etc.

Various ways of accessing the cells

There is no hard coded way of accessing a particular cell. There are numerous ways of doing so. For example, the fourth column of the second row in the last example can be set to a value of 1 by...

  1. Directly using the ExcelXmlWorkbook class.
  2. C#
    book[0][3, 1].Value = 1
  3. Using the Worksheet class.
  4. C#
    Worksheet sheet = book[0];
    sheet[3, 1].Value = 1
  5. Using the Row class.
  6. C#
    Worksheet sheet = book[0];
    Row row = sheet[1];
    row[3].Value = 1
  7. Using the Cell class.
  8. C#
    Worksheet sheet = book[0];
    Row row = sheet[1];
    Cell cell = row[3];
    cell.Value = 1

Note that we do not need to declare a instance of a new worksheet, row, or cell. All the dirty work is done by the library. This style of coding opens many ways of accessing cells and rows.

Styles

All cells, rows, and worksheets have styles which can be set individually. These are Font, Alignment, Interior, Border, and DisplayFormat. More information can be found in the documentation about members of the style classes. Changing a worksheet style setting affects all cells in the worksheet. A row setting affects all child cells in the row, and a single cell setting affects, well, that very cell. Example:

C#
sheet[1, 3].Font.Bold = true;

All the functionality of a style is implemented in a class XmlStyle. You can create an instance to XmlStyle in your code and assign it to the Style property which is present in all the cells, rows, and worksheets. Example:

C#
XmlStyle style = new XmlStyle();
style.Font.Bold = true;
sheet[1, 3].Style = style;

Ranges

The main reason for writing my own implementation was ranges, which I found missing, or not having the powers which ranges should have. In this library, ranges are very powerful and extendible. Ranges have all the style elements found in cells, rows, and worksheets. Example:

C#
// This sets the text of cells 1-8 of row 3 to bold
Range range = new Range(sheet[0, 2], sheet[7, 2]);
range.Font.Bold = true;

Even this is valid code, although many might recommend doing it the first way...

C#
new Range(sheet[0, 2], sheet[7, 2]).Font.Bold = true;

Please note that ranges can not be assigned to a cell value. Assigning it will generate an empty cell. A range can contain a single cell or a range of cells. In the above example, we are providing the constructor with the first cell and the last cell. Ranges always contain rectangular ranges, just like in Excel.

Applying auto filter to ranges

To apply auto filter to a range, you only need to call the range's AutoFilter method and you are done. Example:

C#
new Range(sheet[0, 1], sheet[6, 3]).AutoFilter();

Absolute and non-absolute ranges

By default, all ranges output a non-absolute reference. To set up an absolute reference, just set the Absolute property of the range to true.

C#
Range range = new Range(sheet[0 ,2], sheet[7, 2]);
range.Font.Bold = true;
range.Absolute = true;

Functions

Now, we come to the real use of ranges and their Absolute property: adding functions. I think a function in my library can be easily understood by this example which uses the first example in this article.

C#
sheet[7, 3].Value = FormulaHelper.Formula("sum", new Range(sheet[3, 3], sheet[6, 3]));

or:

C#
sheet[7, 3].Value = new Formula().Add("sum").StartGroup().Add(
            new Range(sheet[3, 3], sheet[6, 3])).EndGroup();

When you will open this book in Excel, the value of the cell will be =SUM(D4:G4).

Function parameters

Here, we have added a single parameter in the formula constructor. You can add as many parameters as you want using the Add function of the Formula class. Only two types of parameters are allowed though, string or Range. The string parameter type can be used to add any value and named ranges also (read more about named ranges in the documentation). Example:

C#
Formula formula = new Formula().Add("sum").StartGroup();

formula.Add("D4").Operator(',');

// Here I am using the object initializers just to fit the code in one line
// The library is compatible with both VS2005 and VS2008
formula.Add(new Range(sheet[4, 3]) { Absolute = true } ).Operator(',');

formula.Add(new Range(sheet[5, 3], Range(sheet[6, 3])).EndGroup();

sheet[7, 3].Value = formula;

When you will open this book in Excel, the value of the cell will be =SUM(D4, $E$4, F4:G4).

Filtering cells as parameters by checking the cell value or style

You can filter all cells and auto add them to the parameter list of a formula by passing a parameter, i.e., a delegate which accepts Cell as its value and returns bool to both the Formula constructor or Add. All the value accessors (i.e., Value, IntValue, etc.) and cell styles can be checked. Examples:

  • Example 1:
  • C#
    // Lets assume column 1,2,3,6 and 7 are bold...
    XmlStyle style = new XmlStyle();
    style.Font.Bold = true;
    
    // VS2008 style
    sheet[7, 3].Value = FormulaHelper.Formula("sum", 
                        new Range(sheet[0, 3], sheet[6, 3]), 
                        cell => cell.Style == style);
    
    // or VS2005 style
    sheet[7, 3].Value = FormulaHelper.Formula("sum", 
                        new Range(sheet[0, 3], sheet[6, 3]), 
                        delegate (Cell cell) { return cell.Style == style; } );
  • Example 2:
  • C#
    sheet[7, 3].Value = FormulaHelper.Formula("sum", 
                        new Range(sheet[0, 3], sheet[6, 3]), 
                        cell => cell.GetValue<int>() > 10000 && 
                        cell.GetValue<int>() <= 50000);

In the first example of style, the value of the cell will be =SUM(A4:C4, F4:G4). Continuous range matching to true will be joined as one parameter, i.e., A4:C4, and not three parameters, i.e., A4,B4,C4.

Modifying imported Excel XML files

Imported Excel XML files can be modified directly via direct assignment just like new files. Further to this, there are many functions which allow insertion, deletion, and addition of...

  • One or multiple worksheets in books, e.g., InsertSheetBefore, InsertSheetAfter.
  • One or multiple rows and columns in sheets, e.g., InsertColumnAfter, InsertColumnsAfter, InsertRowBefore, InsertRowsBefore.
  • One or multiple cells in rows e.g., InsertCellBefore, InsertCellsBefore.

See the documentation for more on these functions.

Exporting a DataSet to an ExcelXmlWorksheet

A static member in ExcelXmlWorksheet, DataSetToWorkbook, is provided which converts a DataSet and returns a Worksheet reference. All the tables are converted into different sheets of the Workbook.

Usage

C#
ExcelXmlWorksheet sheet = ExcelXmlWorksheet.DataSetToWorkbook(sourceDataSet)

Cell collection

Cell collection is a strongly typed List collection with full support for LINQ. You can use the Add method to add a Worksheet, Range, Row, or Cell. You can add all cells, or you can filter the cells using a predicate.

Memory

Looking at all this code might make you think that all the cells, rows, Worksheets, ranges must be using too much memory. They must also be having their own separate copy of styles which will cause extra overhead. The answer is no.

I have optimized the library to use as little memory as possible. As far as the styles go, if you have a 100,000 cell Workbook written programmatically, which contains only 10 individual styles, the number of styles in memory will only be 11, i.e., 10 separate styles + 1 default style. Although the styles are added on a book level, if you have 10 books with the 10 same styles present in all of them, the number of style instances active in the program will be 110.

Conclusion

I will love to hear your comments and suggestions. Any bugs can be reported here.

Updates

Code breaking changes in v3.29

The SetHeaderFooterMargin method in PrintOptions has been removed. Use the HeaderMargin and FooterMargin properties instead.

Code breaking changes in v3.06

The formula system does not work the way it used to, so the previous code might break. For backwards compatibility, I have included a static class FormulaHelper which can be used with the previous code. Just replace the previous code in the following way:

C#
cell.Value = new Formula("Sum", new Range(sheet[3, 3], sheet[6, 3]));

with:

C#
cell.Value = FormulaHelper.Formula("Sum", new Range(sheet[3, 3], sheet[6, 3]));

History

  • [26 Nov 2008] Revision 3.35
    • Fixed an error with the GUID type storage.
    • Fixed an error with the format generation of ContentType.Time.
  • [19 Nov 2008] Revision 3.33
    • Added GUID cell content type.
    • Now GetValue<string> converts almost everything to string type including the newly added GUID cell content type.
    • Fixed SetHeader and SetFooter where empty strings caused issues.
    • Fixed error with the new IsRawContent property.
  • [12 Nov 2008] Revision 3.29
    • Added support for tab and comma delimited export.
    • Fixed error with Workbook's Add(string sheetName) method, which added two sheets instead of one.
    • Fixed error in Worksheet's ImportTable method where column index was using an one based index.
    • Fixed error where merged cells spanned across multiple lines causing a Table error in Excel.
    • Fixed error where DateTime values were saved in 12 hour format, but did not took AM/PM into account and always saved as AM.
    • Converted the HeaderMargin and FooterMargin fields to properties, and removed the SetHeaderFooterMargin method.
    • Added the Header and Footer string properties.
    • Added multipart header and footer string helpers, i.e., SetHeader and SetFooter.
    • Added support for print gridlines (EnableGridlines property).
    • Reset many internal values to private and cleaned up the project.
    • Added currency display format.
    • Fixed error where GetValue was not retrieving a decimal value.
    • Set the default display format of dates to GeneralDate.
  • [23 Jul 2008] Revision 3.06
    • New formula system implemented.
    • RowSpan and ColumnSpan properties added to cell.
    • Small error in named range rename fixed.
    • Freeze column won't work if freeze row is set. Fixed.
    • Style was not saved in Column export. Fixed.
    • Assembly file contained wrong information about this library. Fixed.
    • Workbook export can throw a NullReferenceException when new XmlWriter is created. Fixed.
    • Documented the IStyle interfaces.
  • [10 Jun 2008] Revision 2.89
    • Added Print Area support.
    • Added the TabColor property to Worksheet.
    • Added reference (HRef) support to cell.
    • Added support for custom display formats.
    • Added support for cell patterns.
    • Fixed error where "Long Date", "Short Date", "Time", and "@" as DisplayFormat caused an exception.
    • Fixed number format issue where some international formats were not saved properly.
  • [03 Apr 2008] Revision 2.82
    • Fixed single cell merge bug.
  • [19 Mar 2008] Revision 2.81
    • Fixed multi-Worksheet import bug.
  • [10 Mar 2008] Revision 2.80
    • Added decimal support.
    • Fixed an error where assigning 0 to a cell still caused an exception.
  • [06 Mar 2008] Revision 2.79
    • Added cell merge/unmerge support.
    • Added GetEnumerator support for sheets, rows, and ranges.
    • Added the CellCollection class.
    • Fixed an error where numeric output of the cell contained global number format where it should only be US only format. Thanks to Reinhard.
    • Added six new display format types and removed the Custom format type.
    • Added Index property to cell which also has a ExcelColumnIndex property which returns columns in Excel format, e.g., A, AA, AC, FA.
  • [28 Feb 2008] Revision 2.45
    • Fixed an error where GetValue<T> was not accepting string and DateTime types. Thanks to Karl for pointing this out.
  • [21 Feb 2008] Revision 2.44
    • Fixed a error where assigning 0 to a cell caused a exception. Thanks to Ralf for pointing this out.
  • [20 Feb 2008] Revision 2.43
  • [19 Jan 2008] Revision 1.30
    • Added complete documentation.
    • Added DataSet export capability.
    • Added range auto filter.
    • Added print row and column headers
    • Some name changes to remove warnings reported by FxCop. (ExcelXmlWorkBook to ExcelXmlWorkbook, WorkSheet to Worksheet, and CellCompareDelegate to CellCompare.
    • CellCompare is now in Yogesh.Extensions.ExcelXml and not in Yogesh.Extensions.ExcelXml.Formula.
    • Added cell comments.
  • [16 Jan 2008] Revision 1.0
    • First release on The Code Project.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)