Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / productivity / Office / MS-Excel

Leveraging the Benefits of Open XML

4.75/5 (3 votes)
30 Jun 2013CPOL2 min read 22.2K  
Leveraging the benefits of Open XML

Introduction

Recently, I had a requirement to import data from an Excel sheet to SQL Server. The Excel workbook was very huge and had 250,000 records under that. It was not an easy task to import that directly from an Excel sheet through SQL Server Import data wizard. Let me tell you some of the options that I tried and their corresponding results.

  1. Use SQL Server Import Data Wizard - This was able to open the file and import the data. When I verified the data, I could observe that there were heavy data losses. Several rows did not get imported, and moreover, I could see many blank values in place where a data was present in Excel.
  2. Write a query that can read data from Excel using the OLEDB connection - SQL Server was not able to open the connection and do the processing. The issues popped up majorly because of 64 bit operating System and 32 bit Office systems, etc.
  3. Created a small .NET application which could open the Excel sheet, read the values and post the data to SQL Server - The application was not able to open that huge file into Memory for Processing.

Background

Having tried and tired with these options, I was looking for an option that could solve my expectations.

  1. Should be able to open a huge Excel file for reading.
  2. Need not load the entire file into memory. Instead, I wanted that to be loaded in parts as may be required.
  3. Should use the OpenXML technologies and need not bother me with InterOp / DLL reference issues.

Here I came across an OpenXML method which helped me open an Excel Spreadsheet in several parts, not loading everything into memory at once, and which uses the Packaging methods and not Interop Methods.

OpenXML exposes an API that is called - Simple API for XML, also called as SAX which I was able to leverage for my requirements.

I am going to provide the C# code for your reference here.

Using the Code

C#
private void ReadExcelFileSAX(string fileName)
{
 //Open the file. You can pass 'false', if you just need to open the file for reading.
 using (SpreadsheetDocument myDoc = SpreadsheetDocument.Open(fileName, true))
 {
  WorkbookPart workbookPart = myDoc.WorkbookPart;
 
  foreach (WorksheetPart worksheetPart in workbookPart.WorksheetParts){
   //Create a OpenXmlReader that can iterate through 
   //the worksheet parts and read the values in it.
   OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
   while (reader.Read()){
    if (reader.ElementType == typeof(Row)){
     reader.ReadFirstChild();
     do{
      if (reader.ElementType == typeof(Cell)){
       Cell c = (Cell)reader.LoadCurrentElement();
       string cellValue;
       if (c.DataType != null && c.DataType == CellValues.SharedString){
        SharedStringItem ssi = workbookPart.SharedStringTablePart.
        SharedStringTable.Elements<SharedStringItem>().ElementAt
        (int.Parse(c.CellValue.InnerText));
        cellValue = ssi.Text.Text;
       }
       else{
        cellValue = c.CellValue.InnerText;
       }
      }
     } while (reader.ReadNextSibling());
    }
   }
  }
 }
}

History

  • 30th June, 2013: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)