Introduction
The purpose of this article is to describe how to create an Excel workbook using solely DocumentFormat.OpenXml.dll (namespace is DocumentFormat.OpenXml
).
In order to test the samples you have to download and install the Open XML SDK 2.0 from Download Center.
The demo is created for both C# and Visual Basic.
When creating an Excel file with Open XML SDK, there's no need to install Microsoft Office, so this library can be used without Excel installation. However, the demo project starts the xlsx document after it hsa been created so in order to view the file created, a program that can open xlsx files should be installed to view the file.
Little bit of explanation about Open XML
Open XML is an open and standardized format for Office files. The standards used are:
These standards define the structure and the elements for the Office files. The Office files (like xlsx for Excel) themselves are zipped files that contain a specific directory and file structure. The files that hold the content of a spreadsheet are xml files like any other xml files.
In case of Excel files a basic xlsx file contains for example following files:
/[Content_Types].xml
: Defines parts and extensions for the spreadsheet /xl/workbook.xml
: For e xample sheets that are included in the workbook /xl/styles.xml
: Styles used in the worksheets /xl/sharedStrings.xml
: Strings that are shared among cells /xl/worksheets/sheet1.xml...
: The actual worksheets
The actual package contains more files but in the scope of this article these are the most interesting ones. The demo projects included show few operations that are done to produce and modify these files.
About the project
The project itself is very simple. It consists of two classes: MainWindow
class and a static Excel
Class. The Excel class is responsible of all the operations done against the Excel spreadsheet. It's kinda utility class, but note that it's nowhere near ready. It's supposed to be used as a learning tool or a seed to an actual implementation.
When writing this demo I found out that Excel is very picky on the XML files. One surprise was that the order of the elements in XML files is very important. For example elements in style sheet such as fonts, fills, borders, cellStyleXfs, cellXfs etc must be in specific order. Otherwise the document is interpreted as corrupted.
Another observation was that the indexes of the elements are quite often used (for example the index of a shared string). However there is no support in the library to fetch the indexes so the collections have to be looped in order to calculate the index of a desired element.
So one of the best tools when building this was a utility to extract data from the xlsx (=zip) file to see what is the actual content.
If you download the project, you'll notice that fully qualified class names are used. In this article I have removed the namespaces before the classes in order to enhance readability. So if you are copying code from the code snippets, don't forget to define using
(in C#) or Imports
(in VB) for the two namespaces:
- DocumentFormat.OpenXml.Packaging
- DocumentFormat.OpenXml.Spreadsheet
To create the spreadsheet
Now to the code. The first thing is to create the spreadsheet. This is actually the xlsx file. The spreadsheet is created in C# as follows
public static SpreadsheetDocument CreateWorkbook(string fileName) {
SpreadsheetDocument spreadSheet = null;
SharedStringTablePart sharedStringTablePart;
WorkbookStylesPart workbookStylesPart;
try {
spreadSheet = SpreadsheetDocument.Create(fileName, SpreadsheetDocumentType.Workbook, false);
spreadSheet.AddWorkbookPart();
spreadSheet.WorkbookPart.Workbook = new Workbook();
spreadSheet.WorkbookPart.Workbook.Save();
sharedStringTablePart = spreadSheet.WorkbookPart.AddNewPart<SharedStringTablePart>();
sharedStringTablePart.SharedStringTable = new SharedStringTable();
sharedStringTablePart.SharedStringTable.Save();
spreadSheet.WorkbookPart.Workbook.Sheets = new Sheets();
spreadSheet.WorkbookPart.Workbook.Save();
workbookStylesPart = spreadSheet.WorkbookPart.AddNewPart<WorkbookStylesPart>();
workbookStylesPart.Stylesheet = new Stylesheet();
workbookStylesPart.Stylesheet.Save();
} catch (System.Exception exception) {
System.Windows.MessageBox.Show(exception.Message, "Excel OpenXML basics", System.Windows.MessageBoxButton.OK, System.Windows.MessageBoxImage.Hand);
}
return spreadSheet;
}
And in Visual Basic
Public Shared Function CreateWorkbook(fileName As String) As SpreadsheetDocument
Dim spreadSheet As SpreadsheetDocument = Nothing
Dim sharedStringTablePart As SharedStringTablePart
Dim workbookStylesPart As WorkbookStylesPart
Try
spreadSheet = SpreadsheetDocument.Create(fileName, SpreadsheetDocumentType.Workbook, False)
spreadSheet.AddWorkbookPart()
spreadSheet.WorkbookPart.Workbook = New Workbook()
spreadSheet.WorkbookPart.Workbook.Save()
sharedStringTablePart = spreadSheet.WorkbookPart.AddNewPart(Of SharedStringTablePart)()
sharedStringTablePart.SharedStringTable = New SharedStringTable()
sharedStringTablePart.SharedStringTable.Save()
spreadSheet.WorkbookPart.Workbook.Sheets = New DocumentFormat.OpenXml.Spreadsheet.Sheets()
spreadSheet.WorkbookPart.Workbook.Save()
workbookStylesPart = spreadSheet.WorkbookPart.AddNewPart(Of WorkbookStylesPart)()
workbookStylesPart.Stylesheet = New Stylesheet()
workbookStylesPart.Stylesheet.Save()
Catch exception As System.Exception
System.Windows.MessageBox.Show(exception.Message, "Excel OpenXML basics", System.Windows.MessageBoxButton.OK, System.Windows.MessageBoxImage.Hand)
End Try
Return spreadSheet
End Function
After the spreadsheet document is created, four elemental parts common to all worksheets are added:
- Workbook: Contains definition of the workbook itself
- SharedStringTable: A table of strings that are shared among cells
- Sheets collection: Collection of sheets in the workbook
- Stylesheet: Style definitions for the cells. Such as borders, fonts etc.
When creating each of these, first a part is created and after that the object itself. Note that parts are found in
DocumentFormat.OpenXml.Packaging
namespace while Excel objects are in
DocumentFormat.OpenXml.Spreadsheet
namespace. So the parts describe the 'storage' while the objects from the Spreadsheet namespace describe the content elements in the xml files.
Creating the styles
Creating using the code
One way to create the style definitions is to create them from the code. In the demo, the following basic styles are created:
- Numbering format (in
x:numFmts
): This is used to properly format a currency value - Font (in
x:fonts
): Calibri with font size 11 is used - Fill (in
x:fills
): A fill with no pattern is defined - Border (in
x:borders
): A border definition with no borders is defined - Cell style format (in
x:CellStyleXfs
): A general cell style format definition is used - Cell formats (in
x:CellXfs
) - General text
- Date: NumberFormatId 22
- Currency: NumberFormatId 164, references the numbering format
- Percentage: NumberFormatId 10
The funny thing is that many of the numbering formats are predefined. So in order to use a correct formatting one has to discover the id's. One way to do this is to find the information by creating an Excel file containing the desired format and to examine the contents of the xlsx file.
Another thing is that, as mentioned before, the order of the style sheet elements is critical. If the order isn't correct, Excel will very easily interpret the document as corrupted.
An excerpt from the style creation is the creation of CellStyleFormats
element and a CellFormat
element inside the formats:
Stylesheet stylesheet = spreadsheet.WorkbookPart.WorkbookStylesPart.Stylesheet;
stylesheet.InsertAt<CellFormats>(new CellFormats(), 5);
stylesheet.GetFirstChild<CellFormats>().InsertAt<CellFormat>(
new CellFormat() {
FormatId = 0,
NumberFormatId = 0
}, 0);
And in Visual Basic
Dim stylesheet As Stylesheet = spreadsheet.WorkbookPart.WorkbookStylesPart.Stylesheet
stylesheet.InsertAt(Of CellFormats)(New CellFormats(), 5)
stylesheet.GetFirstChild(Of CellFormats)().InsertAt(Of CellFormat)(
New CellFormat() With {
.FormatId = 0,
.NumberFormatId = 0
}, 0)
Using a predefined stylesheet
Another way to define the styles is to use an existing XML file that holds the style definition. In the project theres a PredefinedStyles.xml file included in the build output. The contents of this file are read into a string which is then added to the empty stylesheet created previously.
The style definitions look as following (it's actually the same that is also created using the code):
<x:numFmts>
<x:numFmt numFmtId="164" formatCode="#,##0.00\ "€"" />
</x:numFmts>
<x:fonts>
<x:font>
<x:sz val="11" />
<x:name val="Calibri" />
</x:font>
</x:fonts>
<x:fills>
<x:fill>
<x:patternFill patternType="none" />
</x:fill>
</x:fills>
<x:borders>
<x:border>
<x:left />
<x:right />
<x:top />
<x:bottom />
<x:diagonal />
</x:border>
</x:borders>
<x:cellStyleXfs>
<x:xf numFmtId="0" fontId="0" fillId="0" borderId="0" />
</x:cellStyleXfs>
<x:cellXfs>
<x:xf numFmtId="0" xfId="0" />
<x:xf numFmtId="22" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1" />
<x:xf numFmtId="164" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1" />
<x:xf numFmtId="10" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1" />
</x:cellXfs>
And it's added using the following method in C#
public static bool AddPredefinedStyles(SpreadsheetDocument spreadsheet, string xml) {
spreadsheet.WorkbookPart.WorkbookStylesPart.Stylesheet.InnerXml = xml;
spreadsheet.WorkbookPart.WorkbookStylesPart.Stylesheet.Save();
return true;
}
And the function in Visual Basic
Public Shared Function AddPredefinedStyles(spreadsheet As SpreadsheetDocument, xml As String) As Boolean
spreadsheet.WorkbookPart.WorkbookStylesPart.Stylesheet.InnerXml = xml
spreadsheet.WorkbookPart.WorkbookStylesPart.Stylesheet.Save()
Return True
End Function
As a matter of fact basically any element can be filled by just adding an XML into it. For example a worksheet can be filled by adding a proper XML as the inner text of the worksheet.
Adding a worksheet
Next step is to add a worksheet. Adding a worksheet is straight-forward. However, one gotcha is to remember to define the relation to the workbook. Otherwise the sheet isn't included when the workbook is opened.
The creation of the workbook looks as following in C#
public static bool AddWorksheet(SpreadsheetDocument spreadsheet, string name) {
Sheets sheets = spreadsheet.WorkbookPart.Workbook.GetFirstChild<Sheets>();
Sheet sheet;
WorksheetPart worksheetPart;
worksheetPart = spreadsheet.WorkbookPart.AddNewPart<WorksheetPart>();
worksheetPart.Worksheet = new Worksheet(new SheetData());
worksheetPart.Worksheet.Save();
sheet = new DocumentFormat.OpenXml.Spreadsheet.Sheet() {
Id = spreadsheet.WorkbookPart.GetIdOfPart(worksheetPart),
SheetId = (uint)(spreadsheet.WorkbookPart.Workbook.Sheets.Count() + 1),
Name = name
};
sheets.Append(sheet);
spreadsheet.WorkbookPart.Workbook.Save();
return true;
}
And the Visual Basic equivalent function
Public Shared Function AddWorksheet(spreadsheet As SpreadsheetDocument, name As String) As Boolean
Dim sheets As Sheets = spreadsheet.WorkbookPart.Workbook.GetFirstChild(Of Sheets)()
Dim sheet As Sheet
Dim worksheetPart As WorksheetPart
worksheetPart = spreadsheet.WorkbookPart.AddNewPart(Of WorksheetPart)()
worksheetPart.Worksheet = New Worksheet(New SheetData())
worksheetPart.Worksheet.Save()
sheet = New Sheet With {
.Id = spreadsheet.WorkbookPart.GetIdOfPart(worksheetPart),
.SheetId = (spreadsheet.WorkbookPart.Workbook.Sheets.Count() + 1),
.Name = name}
sheets.Append(sheet)
spreadsheet.WorkbookPart.Workbook.Save()
Return True
End Function
Another thing is to remember to number the sheets correctly. Because of this the amount of sheets is queried when defining the SheetId
.
One more common thing, the shared strings
A workbook contains a table of shared strings. The idea is that the worksheets can reference a shared string without actually storing the string value inside the worksheet.
As this sounds a good idea and is the proposed way, the effect on the file size isn't very huge. When I tested this with 10,000 repetitions with a single (short) string the difference between using a shared string compared to repeating the string inside the worksheet was only few hundred bytes. Largely this is because of the compression algorithm compresses the strings quite nicely even if they are repeated. Of course the structure of the workbook and the location of the strings affect this quite much.
In order to add a shared string to the table I use the following method in C#
public static bool AddSharedString(SpreadsheetDocument spreadsheet, string stringItem, bool save = true) {
SharedStringTable sharedStringTable = spreadsheet.WorkbookPart.SharedStringTablePart.SharedStringTable;
if (0 == sharedStringTable.Where(item => item.InnerText == stringItem).Count()) {
sharedStringTable.AppendChild(
new DocumentFormat.OpenXml.Spreadsheet.SharedStringItem(
new DocumentFormat.OpenXml.Spreadsheet.Text(stringItem)));
if (save) {
sharedStringTable.Save();
}
}
return true;
}
And in Visual Basic
Public Shared Function AddSharedString(spreadsheet As SpreadsheetDocument, stringItem As String, Optional save As Boolean = True) As Boolean
Dim sharedStringTable As SharedStringTable = spreadsheet.WorkbookPart.SharedStringTablePart.SharedStringTable
Dim stringQuery = (From item In sharedStringTable
Where item.InnerText = stringItem
Select item).Count()
If 0 = stringQuery Then
sharedStringTable.AppendChild(
New DocumentFormat.OpenXml.Spreadsheet.SharedStringItem(
New DocumentFormat.OpenXml.Spreadsheet.Text(stringItem)))
If save Then
sharedStringTable.Save()
End If
End If
Return True
End Function
First, the existence of the string is checked and if it doesn't exist, it's added to the table.
In many methods, I have an optional save
parameter. This is because saving the changes is quite slow so when the data is added for example in a loop, the saving is deferred.
Now to the data
The demo adds the following data to the worksheet
- A shared string, 3 times
- A decimal number
- An integer number
- A currency value
- A date
- A percentage
- And a boolean value
All the data is added through specific methods but they all use a common method to actually store the value inside the worksheet.
I wont list adding the cell value since it's a bit bigger method but it contains the following operations:
- Check the existence of the defined row. If it isn't found a row is created
- Check the existence of the defined cell (based on the column index). If it isn't found a cell is created
- Check if the column is defined in the columns collection. Added if necessary. Actually this isn't mandatory but by defining the columns, their width can be set.
- And at last the value is added to the cell
Few important things:
- Again, take care about the order. Rows and cells must be in correct order in the xml
- The date data type isn't used. This is because Office 2007 doesn't support date type.
- The references in cells are defined as A1, B3 etc. This is quite cumbersome way from the code point of view. This is why I use indexes for both columns and rows.
The column string is constructed as follows
public static string ColumnNameFromIndex(uint columnIndex) {
uint remainder;
string columnName = "";
while (columnIndex > 0) {
remainder = (columnIndex - 1) % 26;
columnName = System.Convert.ToChar(65 + remainder).ToString() + columnName;
columnIndex = (uint)((columnIndex - remainder) / 26);
}
return columnName;
}
And in VB.Net
Public Shared Function ColumnNameFromIndex(columnIndex As UInt32) As String
Dim remainder As UInt32
Dim columnName As String = ""
While (columnIndex > 0)
remainder = (columnIndex - 1) Mod 26
columnName = System.Convert.ToChar(65 + remainder).ToString() + columnName
columnIndex = ((columnIndex - remainder) / 26)
End While
Return columnName
End Function
Adding a string or a shared string
As explained before a shared string uses just an index in the worksheet to point to a string in the shared strings table. What was quite amazing was that I didn't find a mechanism to get the index of a string from the table directly. Instead I had to build a loop to calculate the index.
public static int IndexOfSharedString(SpreadsheetDocument spreadsheet, string stringItem) {
SharedStringTable sharedStringTable = spreadsheet.WorkbookPart.SharedStringTablePart.SharedStringTable;
bool found = false;
int index = 0;
foreach (SharedStringItem sharedString in sharedStringTable.Elements<SharedStringItem>()) {
if (sharedString.InnerText == stringItem) {
found = true;
break; ;
}
index++;
}
return found ? index : -1;
}
And in Visual Basic
Public Shared Function IndexOfSharedString(spreadsheet As SpreadsheetDocument, stringItem As String) As Int32
Dim sharedStringTable As SharedStringTable = spreadsheet.WorkbookPart.SharedStringTablePart.SharedStringTable
Dim found As Boolean = False
Dim index As Int32 = 0
For Each sharedString As SharedStringItem In sharedStringTable.Elements(Of SharedStringItem)()
If sharedString.InnerText = stringItem Then
found = True
Exit For
End If
index = index + 1
Next
If found Then
Return index
Else
Return -1
End If
End Function
The method for adding the string is quite simple. It gives the option of adding a shared string or a normal string.
public static bool SetCellValue(SpreadsheetDocument spreadsheet, Worksheet worksheet, uint columnIndex, uint rowIndex, string stringValue, bool useSharedString, bool save = true) {
string columnValue = stringValue;
CellValues cellValueType;
if (useSharedString) {
if (Excel.IndexOfSharedString(spreadsheet, stringValue) == -1) {
Excel.AddSharedString(spreadsheet, stringValue, true);
}
columnValue = Excel.IndexOfSharedString(spreadsheet, stringValue).ToString();
cellValueType = CellValues.SharedString;
} else {
cellValueType = CellValues.String;
}
return SetCellValue(spreadsheet, worksheet, columnIndex, rowIndex, cellValueType, columnValue, null, save);
}
And in VB.Net
Public Shared Function SetStringCellValue(spreadsheet As SpreadsheetDocument, worksheet As Worksheet, columnIndex As UInt32, rowIndex As UInt32, stringValue As String, useSharedString As Boolean, Optional save As Boolean = True) As Boolean
Dim columnValue As String = stringValue
Dim cellValueType As CellValues
If (useSharedString) Then
If (Excel.IndexOfSharedString(spreadsheet, stringValue) = -1) Then
Excel.AddSharedString(spreadsheet, stringValue, True)
End If
columnValue = Excel.IndexOfSharedString(spreadsheet, stringValue).ToString()
cellValueType = CellValues.SharedString
Else
cellValueType = CellValues.String
End If
Return SetCellValue(spreadsheet, worksheet, columnIndex, rowIndex, cellValueType, columnValue, Nothing, save)
End Function
Adding a numeric value
Adding a numeric value is much like adding a non-shared string value. The only concern is to use proper decimal separator (.
) inside the xml so the decimal separator may need to be replaced. Another way is to set the current threads culture to en-US. If you compile the project with EN_US_CULTURE
compilation symbol (default in the project) the threads UI
culture will be set to local culture, but the CurrentCulture
of the thread will be replaced with en-US. This eliminates the need to reformat decimals. However, keep in mind that if you are getting for example regional settings like CurrencySymbol
, it must be fetched from CurrentUICulture
.
The code looks like
public static bool SetCellValue(SpreadsheetDocument spreadsheet, Worksheet worksheet, uint columnIndex, uint rowIndex, double doubleValue, uint? styleIndex, bool save = true) {
#if EN_US_CULTURE
string columnValue = doubleValue.ToString();
#else
string columnValue = doubleValue.ToString().Replace(System.Globalization.CultureInfo.CurrentUICulture.NumberFormat.NumberDecimalSeparator, ".");
#endif
return SetCellValue(spreadsheet, worksheet, columnIndex, rowIndex, CellValues.Number, columnValue, styleIndex, save);
}
The Visual Basic version is
Public Shared Function SetDoubleCellValue(spreadsheet As SpreadsheetDocument, worksheet As Worksheet, columnIndex As UInt32, rowIndex As UInt32, doubleValue As Double, styleIndex As UInt32?, Optional save As Boolean = True) As Boolean
#If EN_US_CULTURE Then
Dim columnValue As String = doubleValue.ToString()
#Else
Dim columnValue As String = doubleValue.ToString().Replace(System.Globalization.CultureInfo.CurrentUICulture.NumberFormat.NumberDecimalSeparator, ".")
#End If
Return SetCellValue(spreadsheet, worksheet, columnIndex, rowIndex, CellValues.Number, columnValue, styleIndex, save)
End Function<span style="font-size: 12px; white-space: pre;">
</span>
Adding a date value
Date value is a bit more tricky. Things to keep in mind with dates:
- A style needs to be specified. I've used a predefined format id 22 to format the date correctly
- In the style definition ApplyNumberFormat has to be true.
- The date is expressed as a double value
- Excel starts to count the dates from 1.1.1900. This is different from .Net so OLE automation date has to be used
- The decimal separator may need to be corrected (depending on the culture settings). For example my decimal separator is , so it has to be replaced with . Or as explained earlier with numbers, en-US culture may be used.
The addition of a date value looks as following
public static bool SetCellValue(SpreadsheetDocument spreadsheet, Worksheet worksheet, uint columnIndex, uint rowIndex, System.DateTime datetimeValue, uint? styleIndex, bool save = true) {
#if EN_US_CULTURE
string columnValue = datetimeValue.ToOADate().ToString();
#else
string columnValue = datetimeValue.ToOADate().ToString().Replace(System.Globalization.CultureInfo.CurrentUICulture.NumberFormat.NumberDecimalSeparator, ".");
#endif
return SetCellValue(spreadsheet, worksheet, columnIndex, rowIndex, CellValues.Date, columnValue, styleIndex, save);
}
And the Visual Basic equivalent
Public Shared Function SetDateCellValue(spreadsheet As SpreadsheetDocument, worksheet As Worksheet, columnIndex As UInt32, rowIndex As UInt32, datetimeValue As System.DateTime, styleIndex As UInt32?, Optional save As Boolean = True) As Boolean
#If EN_US_CULTURE Then
Dim columnValue As String = datetimeValue.ToOADate().ToString()
#Else
Dim columnValue As String = datetimeValue.ToOADate().ToString().Replace(System.Globalization.CultureInfo.CurrentUICulture.NumberFormat.NumberDecimalSeparator, ".")
#End If
Return SetCellValue(spreadsheet, worksheet, columnIndex, rowIndex, CellValues.Date, columnValue, styleIndex, save)
End Function
Currency, percentage and boolean values
The next methods for adding different value types look very much the same as the previous so I won't include them in the article. However, few things to keep in mind:
- Percentage is like a decimal number but with different format. I've used format id 10.
- Currency is much like a percentage, but this time also a
numFmt
needs to be defined. - Excel uses 0 and 1 for boolean values. If
True
or False
is used they are interpreted as text. In Visual Basic the value of true (-1) needs to be changed to 1.
Last words
Hopefully this article helps to work with Open XML Excel files. The best way to get to know the code is to debug it and make small changes to see what happens :) Have fun!
History
- 22th April, 2012: Created
- 25th April, 2012: Readability modification, added alternative way to format numbers when writing to xlsx