Introduction
This tip presents a function to transfer data cut or copied from Microsoft Excel into VB.NET using the clipboard. The function will inspect the clipboard DataObject
for an "XML Spreadsheet" object, and convert that to a .NET Array of Objects (which the user can then use as needed).
Background
I am developing an application and need to be able to copy and paste between my application and Microsoft Excel using the clipboard. Copying from my app to Excel is easy (it is already baked into the DevExpress GridView
I am using). Copying data from Excel to my application turned out to be a tricky beast. My first try was copying the text from the clipboard and then parsing it, but that didn't work because my text has Asian characters and the text is copied as ASCII (I think). My next try was to grab the text from the clipboard as a "UnicodeText" object and then use a TextFieldParser
(with VbTab deliminator), but that didn't work because the TextFieldParser
would ignore blank cells (Microsoft says this is not a bug, but I think it is since the TextFieldParser
actually performs a transform which deletes empty rows as it parses). I didn't not find any other standardized parser and wasn't confident to roll my own due to issues such as embedded deliminators or line breaks, etc. The other issue was that both of these methods transferred the data as it was "displayed" in Microsoft Excel which meant that according to the Microsoft Excel user's Excel formatting, the data could be transformed in arbitrary formats.
The solution I finally arrived at was to use the "XML Spreadsheet" format to transfer the data. The main benefits of this are that it ensures empty cells are not skipped, and it will always transfer the raw unformatted value. Although the array which is returned is an array of objects, the underlying object type will be either String
, Decimal
, or DateTime
.
The "XML Spreadsheet" object seems to be a self contained XML spreadsheet with a single sheet containing only the cells that were copied to the clipboard. You can inspect the schema for this by setting a debugging breakpoint after the call to ReadXml
and then click the magnifying glass in the watch window.
Using the Code
This is a function that will inspect the clipboard for a DataObject
containing a "XML Spreadsheet" object and if found return an Array(,)
of Object
, or Nothing
if not found.
The code is mostly self explanatory, and I would appreciate any comments or suggestions for improvement.
Private Class XMLSpreadsheetCellData
Public Property CellId As Int32
Public Property RowIndex As Int32
Public Property ColumnIndex As Int32
Public Property DataType As String
Public Property DataValue As String
End Class
Private Function ParseClipboard() As Object(,)
Dim clipboardData = Clipboard.GetDataObject
If clipboardData IsNot Nothing Then
If clipboardData.GetFormats.Contains("XML Spreadsheet") Then
Dim spreadsheet = New DataSet
spreadsheet.ReadXml(clipboardData.GetData("XML Spreadsheet"))
Dim rowCount = spreadsheet.Tables("Table").Rows(0)("ExpandedRowCount")
Dim columnCount = spreadsheet.Tables("Table").Rows(0)("ExpandedColumnCount")
If rowCount > 0 AndAlso columnCount > 0 Then
Dim result(rowCount - 1, columnCount - 1) As Object
If spreadsheet.Tables.Contains("Data") Then
If Not spreadsheet.Tables("Cell").Columns.Contains("Index") Then spreadsheet.Tables("Cell").Columns.Add(New DataColumn("Index", GetType(Int32)))
If Not spreadsheet.Tables("Row").Columns.Contains("Index") Then spreadsheet.Tables("Row").Columns.Add(New DataColumn("Index", GetType(Int32)))
Dim rowIndex = 1
With spreadsheet.Tables("Row")
For i = 0 To .Rows.Count - 1
If IsDBNull(.Rows(i)("Index")) Then
.Rows(i)("Index") = rowIndex
rowIndex += 1
Else
rowIndex = .Rows(i)("Index") + 1
End If
Next
End With
rowIndex = -1
Dim columnIndex = 0
With spreadsheet.Tables("Cell")
For i = 0 To .Rows.Count - 1
If .Rows(i)("Row_Id") <> rowIndex Then columnIndex = 1
rowIndex = .Rows(i)("Row_Id")
If IsDBNull(.Rows(i)("Index")) Then
.Rows(i)("Index") = columnIndex
columnIndex += 1
Else
columnIndex = .Rows(i)("Index") + 1
End If
Next
End With
Dim cells = (From cellRecord In spreadsheet.Tables("Cell") Join rowRecord In spreadsheet.Tables("Row")
On cellRecord("Row_Id") Equals rowRecord("Row_Id") Join dataRecord In spreadsheet.Tables("Data")
On cellRecord("Cell_Id") Equals dataRecord("Cell_Id")
Select New XMLSpreadsheetCellData With {.CellId = cellRecord("Cell_Id"),
.RowIndex = rowRecord("Index") - 1,
.ColumnIndex = cellRecord("Index") - 1,
.DataType = dataRecord("Type"),
.DataValue = dataRecord("Data_Text")})
For Each cell In (From entry In cells
Order By entry.RowIndex, entry.CellId)
rowIndex = cell.RowIndex
columnIndex = cell.ColumnIndex
Select Case cell.DataType
Case "String"
result(rowIndex, columnIndex) = cell.DataValue
Case "DateTime"
result(rowIndex, columnIndex) = DateTime.Parse(cell.DataValue)
Case "Number"
result(rowIndex, columnIndex) = Decimal.Parse(cell.DataValue)
If Decimal.Floor(result(rowIndex, columnIndex)) = result(rowIndex, columnIndex) Then
result(rowIndex, columnIndex) = Integer.Parse(result(rowIndex, columnIndex))
End If
Case Else
Throw New DataException(String.Format("XML Spreadsheet Type {0} not recognized.", cell.DataType))
End Select
Next
End If
Return result
End If
Return Nothing
End If
End If
Return Nothing
End Function
The three tables of interest in the xmlSpreadsheet DataSet schema are "Data" which holds the data values, "Row" which holds the row indexes, and "Cell" which hold the column indexes.
If there are no blank cells in the Excel data then the "Index" columns will be missing from the "Row" and "Cell" tables. In this case a new "Index" column is added to each of these tables.
Then the row indexes are calculated and stored in the "Index" column in the "Row" table using the rule that the first index is 1, and each time the Row_Id field changes the index is incremented.
The column indexes are calculated and stored in the "Index" column in the "Cell" table using the rule that each time the Row_Id changes the column index is reset to 1, and then incremented for each of the following records with the same Row_Id.
If the Excel data contains empty cells in a column then the "Cell" table will already include the "Index" column, so the rule for calculating the column indexes is each time the Row_Id changes the column index is reset to 1, and if the "Index" field is empty set it to the column index, but if it is not empty then don't change it.
If the Excel data contains empty rows then the "Row" table will already include the "Index" column, so the rule for calculating the row indexes is that the first index is 1, and for each row if the "Index" field has a value don't change it otherwise increment the "Index" field by 1 for each row.
History
- 14 April 2015: First posted
- 16 April 2015: Updated code to fix a bug based on a wrong assumption about how row and column indexes should be calculated