Importing Excel Clipboard Data INTO .NET via XML Spreadsheet

Casey Witt

5.00/5 (4 votes)

14 Apr 2015CPOL4 min read

25.8K

Function for parsing Microsoft Excel XML Spreadsheet clipboard DataObject into .NET Array of Objects

Introduction

This tip presents a function to transfer data cut or copied from Microsoft Excel into VB.NET using the clipboard. The function will inspect the clipboard DataObject for an "XML Spreadsheet" object, and convert that to a .NET Array of Objects (which the user can then use as needed).

Background

I am developing an application and need to be able to copy and paste between my application and Microsoft Excel using the clipboard. Copying from my app to Excel is easy (it is already baked into the DevExpress GridView I am using). Copying data from Excel to my application turned out to be a tricky beast. My first try was copying the text from the clipboard and then parsing it, but that didn't work because my text has Asian characters and the text is copied as ASCII (I think). My next try was to grab the text from the clipboard as a "UnicodeText" object and then use a TextFieldParser (with VbTab deliminator), but that didn't work because the TextFieldParser would ignore blank cells (Microsoft says this is not a bug, but I think it is since the TextFieldParser actually performs a transform which deletes empty rows as it parses). I didn't not find any other standardized parser and wasn't confident to roll my own due to issues such as embedded deliminators or line breaks, etc. The other issue was that both of these methods transferred the data as it was "displayed" in Microsoft Excel which meant that according to the Microsoft Excel user's Excel formatting, the data could be transformed in arbitrary formats.

The solution I finally arrived at was to use the "XML Spreadsheet" format to transfer the data. The main benefits of this are that it ensures empty cells are not skipped, and it will always transfer the raw unformatted value. Although the array which is returned is an array of objects, the underlying object type will be either String, Decimal, or DateTime.

The "XML Spreadsheet" object seems to be a self contained XML spreadsheet with a single sheet containing only the cells that were copied to the clipboard. You can inspect the schema for this by setting a debugging breakpoint after the call to ReadXml and then click the magnifying glass in the watch window.

Using the Code

This is a function that will inspect the clipboard for a DataObject containing a "XML Spreadsheet" object and if found return an Array(,) of Object, or Nothing if not found.

The code is mostly self explanatory, and I would appreciate any comments or suggestions for improvement.

VB.NET

Private Class XMLSpreadsheetCellData
    Public Property CellId As Int32
    Public Property RowIndex As Int32
    Public Property ColumnIndex As Int32
    Public Property DataType As String
    Public Property DataValue As String
End Class

Private Function ParseClipboard() As Object(,)
        Dim clipboardData = Clipboard.GetDataObject
        If clipboardData IsNot Nothing Then
            If clipboardData.GetFormats.Contains("XML Spreadsheet") Then
                Dim spreadsheet = New DataSet
                spreadsheet.ReadXml(clipboardData.GetData("XML Spreadsheet"))
                Dim rowCount = spreadsheet.Tables("Table").Rows(0)("ExpandedRowCount")
                Dim columnCount = spreadsheet.Tables("Table").Rows(0)("ExpandedColumnCount")
                If rowCount > 0 AndAlso columnCount > 0 Then
                    Dim result(rowCount - 1, columnCount - 1) As Object
                    If spreadsheet.Tables.Contains("Data") Then 'if there is no "Data" table then all cells are empty and all array elements will be nothing
                        If Not spreadsheet.Tables("Cell").Columns.Contains("Index") Then spreadsheet.Tables("Cell").Columns.Add(New DataColumn("Index", GetType(Int32)))
                    If Not spreadsheet.Tables("Row").Columns.Contains("Index") Then spreadsheet.Tables("Row").Columns.Add(New DataColumn("Index", GetType(Int32)))

                    'Iterate through the Row table and set the row indexes
                    Dim rowIndex = 1
                    With spreadsheet.Tables("Row")
                        For i = 0 To .Rows.Count - 1
                            If IsDBNull(.Rows(i)("Index")) Then
                                .Rows(i)("Index") = rowIndex
                                rowIndex += 1
                            Else
                                rowIndex = .Rows(i)("Index") + 1
                            End If
                        Next
                    End With

                    'Iterate through the cell table and set the column indexes
                    rowIndex = -1
                    Dim columnIndex = 0
                    With spreadsheet.Tables("Cell")
                        For i = 0 To .Rows.Count - 1
                            If .Rows(i)("Row_Id") <> rowIndex Then columnIndex = 1
                            rowIndex = .Rows(i)("Row_Id")
                            If IsDBNull(.Rows(i)("Index")) Then
                                .Rows(i)("Index") = columnIndex
                                columnIndex += 1
                            Else
                                columnIndex = .Rows(i)("Index") + 1
                            End If
                        Next
                    End With

                    Dim cells = (From cellRecord In spreadsheet.Tables("Cell") Join rowRecord In spreadsheet.Tables("Row")
                                 On cellRecord("Row_Id") Equals rowRecord("Row_Id") Join dataRecord In spreadsheet.Tables("Data")
                                 On cellRecord("Cell_Id") Equals dataRecord("Cell_Id")
                                 Select New XMLSpreadsheetCellData With {.CellId = cellRecord("Cell_Id"),
                                                                         .RowIndex = rowRecord("Index") - 1,
                                                                         .ColumnIndex = cellRecord("Index") - 1,
                                                                         .DataType = dataRecord("Type"),
                                                                         .DataValue = dataRecord("Data_Text")})

                    For Each cell In (From entry In cells
                                      Order By entry.RowIndex, entry.CellId)
                        rowIndex = cell.RowIndex
                        columnIndex = cell.ColumnIndex
                        Select Case cell.DataType
                            Case "String"
                                result(rowIndex, columnIndex) = cell.DataValue
                            Case "DateTime"
                                result(rowIndex, columnIndex) = DateTime.Parse(cell.DataValue)
                            Case "Number"
                                result(rowIndex, columnIndex) = Decimal.Parse(cell.DataValue)
                                If Decimal.Floor(result(rowIndex, columnIndex)) = result(rowIndex, columnIndex) Then
                                    result(rowIndex, columnIndex) = Integer.Parse(result(rowIndex, columnIndex))
                                End If
                            Case Else
                                Throw New DataException(String.Format("XML Spreadsheet Type {0} not recognized.", cell.DataType))
                        End Select
                    Next
                    End If
                    Return result
                End If
                Return Nothing
            End If
        End If
        Return Nothing
    End Function

The three tables of interest in the xmlSpreadsheet DataSet schema are "Data" which holds the data values, "Row" which holds the row indexes, and "Cell" which hold the column indexes.

If there are no blank cells in the Excel data then the "Index" columns will be missing from the "Row" and "Cell" tables. In this case a new "Index" column is added to each of these tables.

Then the row indexes are calculated and stored in the "Index" column in the "Row" table using the rule that the first index is 1, and each time the Row_Id field changes the index is incremented.

The column indexes are calculated and stored in the "Index" column in the "Cell" table using the rule that each time the Row_Id changes the column index is reset to 1, and then incremented for each of the following records with the same Row_Id.

If the Excel data contains empty cells in a column then the "Cell" table will already include the "Index" column, so the rule for calculating the column indexes is each time the Row_Id changes the column index is reset to 1, and if the "Index" field is empty set it to the column index, but if it is not empty then don't change it.

If the Excel data contains empty rows then the "Row" table will already include the "Index" column, so the rule for calculating the row indexes is that the first index is 1, and for each row if the "Index" field has a value don't change it otherwise increment the "Index" field by 1 for each row.

History

14April 2015: First posted
16 April 2015: Updated code to fix a bug based on a wrong assumption about how row and column indexes should be calculated

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)