Introduction
Microsoft Excel file format (*.xls) is named as Binary Interchange File Format (BIFF). BIFF files are divided into several types:
- Excel 2.x (BIFF2)
- Excel 3.0 (BIFF3)
- Excel 4.0 (BIFF4)
- Excel 5.0 (BIFF5)
- Excel 7.0/95 (BIFF5)
- Excel 8.0/97 (BIFF8)
- Excel 9.0/2000 (BIFF8)
- Excel 10.0/XP (BIFF8)
- Excel 11.0/2003 (BIFF8)
This sample code is created for BIFF8
, which is used by Microsoft Excel 97-2003
. It is showing each block of record inside XLS file to analyze the binary file structure. Excel binary file structure documentation can be downloaded from here.
BIFF files are divided into several types including worksheet, workbook, chart, etc. In this tip, I will focus on workbook
format only.
Workbook file format is divided into several parts called substream
. Every substream is started with Beginning of File (BOF) record and ended with End Of File (EOF) record.
Each record
in general contains:
- ID (2 bytes)
- Length (2 bytes)
- Body (Size is based on Length in bytes)
BOF record ID is 0x0809 and EOF record ID is 0x000A.
Reading Excel Binary File
Basically to read BIFF files, we will need to find the BOF record first. After that, we continue reading other records until we find EOF record. Then we continue reading to find another BOF record and so on.
Dim fName As String = Request("fName")
Dim fStream As FileStream
Dim bReader As BinaryReader
Dim uBuffer As UInt16
Dim bFindBOF As Boolean = True
Dim bBuffer As Byte
Dim j As Integer = 0
If (String.IsNullOrEmpty(fName)) Then
Return
End If
Try
fStream = New FileStream(fName, FileMode.Open, FileAccess.Read)
bReader = New BinaryReader(fStream)
Do While j < fStream.Length - 1
If bFindBOF = True Then
Do While j < fStream.Length - 3 And bFindBOF = True
uBuffer = bReader.ReadUInt16()
If uBuffer = 2057 Then
bFindBOF = False
Response.Write("ID = 0x0809 | ")
End If
j = j + 2
Loop
If bFindBOF = False Then
uBuffer = bReader.ReadUInt16()
Response.Write("Length = " & uBuffer.ToString() & _
" | Offset = " & j.ToString())
j = j + 2
Response.Write(IIf(uBuffer > 0, " | Body = ", String.Empty))
For i = 1 To uBuffer
bBuffer = bReader.ReadByte()
Response.Write(GetByteHex(bBuffer) & " | ")
Next
Response.Write("<br/>")
j = j + uBuffer
Else
j = j + 2
End If
Else
uBuffer = bReader.ReadUInt16()
If uBuffer = 10 Then
bFindBOF = True
End If
Response.Write("ID = " & GetUInt16Hex(uBuffer) & " | ")
j = j + 2
uBuffer = bReader.ReadUInt16()
Response.Write("Length = " & uBuffer.ToString() & _
" | Offset = " & j.ToString())
j = j + 2
Response.Write(IIf(uBuffer > 0, " | Body = ", String.Empty))
For i = 1 To uBuffer
bBuffer = bReader.ReadByte()
Response.Write(GetByteHex(bBuffer) & " | ")
Next
Response.Write("<br/>")
j = j + uBuffer
End If
Loop
Response.Write("*ID and LENGTH take 4 bytes, " & _
"so total bytes per record is 4 + Length.")
bReader.Dispose()
fStream.Close()
Catch ex As Exception
Response.Write("xls-check::Page_Load: exception: " & _
ex.Message & Environment.NewLine & ex.StackTrace)
End Try
First, the code above is opening the XLS file and read the first BOF record. If BOF records is found, then read the next record until EOF record is found. After that, it will try to find another BOF record and so on.
Numeric to Hexadecimal Translator
The below code block is used to translate UInt16 to Hexadecimal string.
Private Function GetUInt16Hex(ByVal pValue As UInt16) As String
Dim sResult As String = "0x"
Dim dTemp As Double = 0
dTemp = Math.Floor(pValue / 4096)
sResult = sResult & GetHexLetter(dTemp).ToString()
pValue = pValue - (dTemp * 4096)
dTemp = Math.Floor(pValue / 256)
sResult = sResult & GetHexLetter(dTemp).ToString()
pValue = pValue - (dTemp * 256)
dTemp = Math.Floor(pValue / 16)
sResult = sResult & GetHexLetter(dTemp).ToString()
pValue = pValue - (dTemp * 16)
sResult = sResult & GetHexLetter(pValue).ToString()
Return sResult
End Function
The below code block is used to translate Byte to Hexadecimal string.
Private Function GetByteHex(ByVal pValue As UInt16) As String
Dim sResult As String = "0x"
Dim dTemp As Double = 0
dTemp = Math.Floor(pValue / 16)
sResult = sResult & GetHexLetter(dTemp).ToString()
pValue = pValue - (dTemp * 16)
sResult = sResult & GetHexLetter(pValue).ToString()
Return sResult
End Function
The below code block is used to get Hexadecimal character of a numeric value.
Private Function GetHexLetter(ByVal pValue As Double) As String
Dim sResult As String = String.Empty
If pValue < 10 Then
sResult = pValue.ToString()
ElseIf pValue = 10 Then
sResult = "A"
ElseIf pValue = 11 Then
sResult = "B"
ElseIf pValue = 12 Then
sResult = "C"
ElseIf pValue = 13 Then
sResult = "D"
ElseIf pValue = 14 Then
sResult = "E"
ElseIf pValue = 15 Then
sResult = "F"
End If
Return sResult
End Function