Introduction
I have used the VB-JSON parser library in one of my projects (VB6/VBA) to parse the response of a JSON web service. The data returned by this web service can be in the order of tens of megabytes. The VB-JSON parser does its job, but with messages of this size it becomes rather slow. Time for a thorough inspection of the code and to implement potential improvements. I take no credit for the VB-JSON Parser Library; that excellent code can be found here and is published under a BSD License, and it is in turn based on this project, also published under the BSD License. The optimized parser only contains methods to parse JSON, not to generate JSON. Those can be taken from the original if required.
The source code is maintained on GitHub.
Background
The VB-JSON parser reads through a string looking for specific tokens that represent arrays, using [ ], objects, using { }, and key-value pairs that represent the properties of the found objects. There is a bit more to it; for a more thorough explanation, please visit json.org. That page also contains a list of many pieces of software related to JSON.
An example of a JSON object representing a Person can be the following:
{ "name" : "Billy Joe", "surname" : "Jim-Bob",
"email" : [ "billy@jim-bob.com", "billyjoe@jim-bob.com" ],
"age" : 42, "married" : true, "weight" : "150.63" }
Imagine you have a huge list of these. Knowing that VB6/VBA is not super fast at handling strings and doing string comparisons, one can imagine that parsing 15 megabytes of persons may become slow... 10 seconds kind of slow! But we can circumvent using strings and string comparisons. Having read this page on the dos and don'ts of VB6 string handling, I have managed to improve the performance of VB-JSON by a factor of 2.5 to 3.
Having managed this with some straightforward improvements, I wanted to share this with the CodeProject community to show that VB6/VBA and JSON make a perfect match, even for large strings, files and web responses.
My focus was mainly on large JSON strings and I have not checked the performance on small JSON strings; perhaps the performance there is worse than it was before. I leave that to the reader to check.
Improvements
To get the idea across, the following sections treat some of the functions and methods that I have refactored to gain the improved performance. The source code of the bas module is included with the article. The original can be downloaded from the VB-JSON website.
parse
The entry method of the Parser is the following:
Public Function parse(ByRef str As String) As Object
    m_decSep = ModLocale.GetRegionalSettings(LOCALE_SDECIMAL)
    m_groupSep = ModLocale.GetRegionalSettings(LOCALE_SGROUPING)
    Dim index As Long
    index = 1
    psErrors = ""
    On Error Resume Next
    Call skipChar(str, index)
    Select Case Mid(str, index, 1)
        Case "{"
            Set parse = parseObject(str, index)
        Case "["
            Set parse = parseArray(str, index)
        Case Else
            psErrors = "Invalid JSON"
    End Select
End Function
The parse function takes a (JSON) string as input, and it is important to note that this string is passed ByRef and not ByVal. Passing a string ByVal would create a copy of the string, which is unwanted in this case since the strings may be very large. So far so good, nothing to change: the original code takes care of this in the way I think it should for this specific case.
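To put the entry point in context, a call might look like the sketch below. It is only an illustration: DemoParse, jsonText and person are hypothetical names, the routine is assumed to live in (or be able to call into) the parser module, and it assumes that objects come back as dictionaries and arrays as 1-based collections.

```vb
' Usage sketch, not part of the parser module itself.
' Assumption: objects are returned as dictionaries, arrays as collections.
Public Sub DemoParse()
    Dim jsonText As String
    Dim person As Object

    jsonText = "{ ""name"" : ""Billy Joe"", " & _
               """email"" : [ ""billy@jim-bob.com"", ""billyjoe@jim-bob.com"" ], " & _
               """age"" : 42 }"

    ' The string is handed over ByRef, so no copy of the (possibly huge) text is made.
    Set person = parse(jsonText)

    If person Is Nothing Then
        Debug.Print "Could not parse the input"
    Else
        Debug.Print person("name")
        Debug.Print person("email")(1)   ' first element of the array
        Debug.Print person("age")
    End If
End Sub
```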
The first method that is called is skipChar, which moves the index forward in the string until it finds an interesting character, one of the tokens that are specified by JSON. Before I move on to the skipChar method, which is the method that is called the most, I will start by improving the parse method. Granted, it is a small improvement, but one that fits with the pattern I use to improve the performance of the complete module.
Instead of doing comparisons using the Mid() function of VB6, which returns a string, I convert the whole input string to an array of Integer where I store the Unicode value of each character in the string, and I use this array for almost all parsing. The improved version of the parse function becomes:
Public Function parse(ByRef str As String) As Object
    Dim index As Long
    index = 1
    ' Convert the input once into the m_str array of character codes
    Call GenerateStringArray(str)
    psErrors = vbNullString
    On Error Resume Next
    Call skipChar(index)
    Select Case m_str(index)
        Case A_SQUARE_BRACKET_OPEN
            Set parse = parseArray(str, index)
        Case A_CURLY_BRACKET_OPEN
            Set parse = parseObject(str, index)
        Case Else
            psErrors = "Invalid JSON"
    End Select
    ' Release the character code array when parsing is done
    Erase m_str
End Function
What you can notice is that I have an extra method called GenerateStringArray that converts the JSON string to an array of Integer called m_str. The Select Case operates on m_str(index), and for each Case statement there is a constant defined which holds the Unicode value of the corresponding character. The following constants have been defined at the top of the module:
Private Const A_CURLY_BRACKET_OPEN As Integer = 123
Private Const A_CURLY_BRACKET_CLOSE As Integer = 125
Private Const A_SQUARE_BRACKET_OPEN As Integer = 91
Private Const A_SQUARE_BRACKET_CLOSE As Integer = 93
Private Const A_BRACKET_OPEN As Integer = 40
Private Const A_BRACKET_CLOSE As Integer = 41
Private Const A_COMMA As Integer = 44
Private Const A_DOUBLE_QUOTE As Integer = 34
Private Const A_SINGLE_QUOTE As Integer = 39
Private Const A_BACKSLASH As Integer = 92
Private Const A_FORWARDSLASH As Integer = 47
Private Const A_COLON As Integer = 58
Private Const A_SPACE As Integer = 32
Private Const A_ASTERIX As Integer = 42
Private Const A_VBCR As Integer = 13
Private Const A_VBLF As Integer = 10
Private Const A_VBTAB As Integer = 9
Private Const A_VBCRLF As Integer = 13
Private Const A_b As Integer = 98
Private Const A_f As Integer = 102
Private Const A_n As Integer = 110
Private Const A_r As Integer = 114
Private Const A_t As Integer = 116
Private Const A_u As Integer = 117
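The reason for comparing Integer codes rather than one-character strings can be checked with a rough timing sketch such as the one below. It is not taken from the module: CompareLookups, the loop count and the test data are made up for illustration, and the timings will differ per machine, so no results are claimed here.

```vb
' Rough micro-benchmark sketch (hypothetical, for illustration only).
Public Sub CompareLookups()
    Const N As Long = 1000000
    Dim s As String
    Dim codes() As Integer
    Dim i As Long
    Dim hits As Long
    Dim t As Single

    ' Test data: N opening curly brackets, as a string and as character codes
    s = String$(N, "{")
    ReDim codes(1 To N)
    For i = 1 To N
        codes(i) = 123
    Next i

    ' Variant 1: extract a one-character string and compare strings
    t = Timer
    For i = 1 To N
        If Mid$(s, i, 1) = "{" Then hits = hits + 1
    Next i
    Debug.Print "Mid$ compare:", Timer - t

    ' Variant 2: compare Integer codes, no temporary strings involved
    hits = 0
    t = Timer
    For i = 1 To N
        If codes(i) = 123 Then hits = hits + 1
    Next i
    Debug.Print "Integer compare:", Timer - t
End Sub
```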
GenerateStringArray
The GenerateStringArray method stores the length of the JSON string in a private variable for later use (instead of recalculating the length of the JSON string in each of the methods) and stores the Unicode value of each character in the m_str array.
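Private Sub GenerateStringArray(ByRef str As String)
    Dim i As Long
    ' Calculate the length once; the other methods reuse m_length
    m_length = Len(str)
    ReDim m_str(1 To m_length)
    For i = 1 To m_length
        ' AscW returns the Unicode value of the character
        m_str(i) = AscW(Mid$(str, i, 1))
    Next i
End Sub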
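The conversion loop above still calls Mid$ once per character. One possible variation, sketched below, relies on the fact that assigning a String to a dynamic Byte array in VB6/VBA copies the raw two-byte Unicode characters. GenerateStringArrayFromBytes is a hypothetical name and not part of the article's module; m_str and m_length are the module-level variables used above. I have not measured whether this is faster for the data described here, so treat it as an idea to test rather than a recommendation.

```vb
' Hypothetical alternative to GenerateStringArray (not in the original module).
' Assumes m_str() As Integer and m_length As Long are declared at module level.
Private Sub GenerateStringArrayFromBytes(ByRef str As String)
    Dim bytes() As Byte
    Dim i As Long
    Dim code As Long

    m_length = Len(str)
    If m_length = 0 Then Exit Sub       ' nothing to convert

    bytes = str                         ' two bytes per character, low byte first, 0-based
    ReDim m_str(1 To m_length)

    For i = 1 To m_length
        ' Combine low and high byte into one 16-bit character code
        code = bytes(2 * i - 2) + bytes(2 * i - 1) * 256&
        ' AscW returns a signed Integer, so map 32768..65535 to negative values
        If code > 32767 Then code = code - 65536
        m_str(i) = code
    Next i
End Sub
```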
skipChar
The skipChar method moves the index or cursor forward in the JSON string. Here is the original method:
Private Sub skipChar(ByRef str As String, ByRef index As Long)
    Dim bComment As Boolean
    Dim bStartComment As Boolean
    Dim bLongComment As Boolean
    Do While index > 0 And index <= Len(str)
        Select Case Mid(str, index, 1)
            Case vbCr, vbLf
                If Not bLongComment Then
                    bStartComment = False
                    bComment = False
                End If
            Case vbTab, " ", "(", ")"
            Case "/"
                If Not bLongComment Then
                    If bStartComment Then
                        bStartComment = False
                        bComment = True
                    Else
                        bStartComment = True
                        bComment = False
                        bLongComment = False
                    End If
                Else
                    If bStartComment Then
                        bLongComment = False
                        bStartComment = False
                        bComment = False
                    End If
                End If
            Case "*"
                If bStartComment Then
                    bStartComment = False
                    bComment = True
                    bLongComment = True
                Else
                    bStartComment = True
                End If
            Case Else
                If Not bComment Then
                    Exit Do
                End If
        End Select
        index = index + 1
    Loop
End Sub
Again, it makes use of string comparisons, which I have improved by using the m_str array, the m_length variable and the constants that are declared in the module. Notice that str is no longer passed as an argument to the skipChar method:
Private Sub skipChar(ByRef index As Long)
    Dim bComment As Boolean
    Dim bStartComment As Boolean
    Dim bLongComment As Boolean
    Do While index > 0 And index <= m_length
        Select Case m_str(index)
            Case A_VBCR, A_VBLF
                If Not bLongComment Then
                    bStartComment = False
                    bComment = False
                End If
            Case A_VBTAB, A_SPACE, A_BRACKET_OPEN, A_BRACKET_CLOSE
            Case A_FORWARDSLASH
                If Not bLongComment Then
                    If bStartComment Then
                        bStartComment = False
                        bComment = True
                    Else
                        bStartComment = True
                        bComment = False
                        bLongComment = False
                    End If
                Else
                    If bStartComment Then
                        bLongComment = False
                        bStartComment = False
                        bComment = False
                    End If
                End If
            Case A_ASTERIX
                If bStartComment Then
                    bStartComment = False
                    bComment = True
                    bLongComment = True
                Else
                    bStartComment = True
                End If
            Case Else
                If Not bComment Then
                    Exit Do
                End If
        End Select
        index = index + 1
    Loop
End Sub
parseNumber
The original parseNumber method works fine for those locales where the decimal separator is a period, but not for those where it is a comma. This is fixed by replacing the "." with a "," in the local Value variable when the regional settings are such that the decimal separator is a comma. The settings are stored in two variables called m_decSep and m_groupSep, and are set in the parse function.
Private Function parseNumber(ByRef str As String, ByRef index As Long)
    Dim Value As String
    Dim Char As String
    Call skipChar(index)
    Do While index > 0 And index <= m_length
        Char = Mid$(str, index, 1)
        If InStr("+-0123456789.eE", Char) Then
            Value = Value & Char
            index = index + 1
        Else
            ' End of the number: adapt the separator to the regional settings
            If Not m_decSep = "." Then
                Value = Replace(Value, ".", m_decSep)
            End If
            If m_groupSep = "." Then
                Value = Replace(Value, ".", m_decSep)
            End If
            parseNumber = CDec(Value)
            Exit Function
        End If
    Loop
End Function
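The locale problem described above can be seen in isolation with a small sketch. It is illustrative only: DemoDecimalSeparator is a hypothetical routine, and it derives the decimal separator from CStr(1.5) instead of calling ModLocale.GetRegionalSettings, which assumes a single-character separator. Val always treats the period as the decimal separator, whereas CDec follows the regional settings.

```vb
' Illustration of locale-aware versus locale-independent conversion (hypothetical).
Public Sub DemoDecimalSeparator()
    Dim raw As String
    Dim decSep As String

    raw = "150.63"                      ' JSON numbers always use a period

    ' Val only recognises "." as decimal separator, whatever the regional settings
    Debug.Print Val(raw)

    ' CStr is locale-aware, so the second character of CStr(1.5) is the separator
    decSep = Mid$(CStr(1.5), 2, 1)

    ' CDec is locale-aware as well, so the period is swapped before converting
    Debug.Print CDec(Replace(raw, ".", decSep))
End Sub
```

Val returns a Double, so it is not a drop-in replacement where the Decimal result of CDec matters.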
parseString
The last method that is described is the parseString method. In the original, you can see that the cStringBuilder class is being used, found at vbaccelerator.com. This class is good for concatenating large strings together. Remember, every time VB6 concatenates two strings, it creates a new string, so it doesn't really append the second string to the first. The cStringBuilder class makes use of the CopyMemory API function for this purpose. I found that in the case of parsing a JSON string and creating dictionaries and collections, it did not make too much sense to use the cStringBuilder class. It was rather costly to use for the number of times it was required to concatenate strings while parsing. The strings that are created are the keys and values of the dictionaries. For my use case, these strings are small, especially the attribute names; the values can be larger, but never of a size at which the cStringBuilder class would make a big difference.
Private Function parseString(ByRef str As String, ByRef index As Long) As String
    Dim quote As String
    Dim Char As String
    Dim Code As String
    Dim SB As New cStringBuilder
    Call skipChar(str, index)
    quote = Mid(str, index, 1)
    index = index + 1
    Do While index > 0 And index <= Len(str)
        Char = Mid(str, index, 1)
        Select Case (Char)
            Case "\"
                index = index + 1
                Char = Mid(str, index, 1)
                Select Case (Char)
                    Case """", "\", "/", "'"
                        SB.Append Char
                        index = index + 1
                    Case "b"
                        SB.Append vbBack
                        index = index + 1
                    Case "f"
                        SB.Append vbFormFeed
                        index = index + 1
                    Case "n"
                        SB.Append vbLf
                        index = index + 1
                    Case "r"
                        SB.Append vbCr
                        index = index + 1
                    Case "t"
                        SB.Append vbTab
                        index = index + 1
                    Case "u"
                        index = index + 1
                        Code = Mid(str, index, 4)
                        SB.Append ChrW(Val("&h" + Code))
                        index = index + 4
                End Select
            Case quote
                index = index + 1
                parseString = SB.ToString
                Set SB = Nothing
                Exit Function
            Case Else
                SB.Append Char
                index = index + 1
        End Select
    Loop
    parseString = SB.ToString
    Set SB = Nothing
End Function
The refactored method does not make use of the cStringBuilder class. This already improved the performance. The major refactoring was done in line with what has already been discussed earlier. One thing that is important to remember is that the m_str() array was created using the AscW() function. Take notice of the W.
- AscW(S) returns the Unicode value of the first character in S
- Asc(S) returns the ANSI value of the first character in S
So, we are working with Unicode here, and not with ANSI. Therefore, when concatenating the resulting parseString with the character at the current index (cursor), we must also use the Unicode version of the Chr function.
parseString = parseString & ChrW$(charint)
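The difference between the Unicode and ANSI variants shows up as soon as a character falls outside the ASCII range. The following sketch (DemoAscW is a hypothetical name) uses the euro sign, U+20AC, as an example.

```vb
' Illustration of AscW/ChrW$ versus Asc (hypothetical demo routine).
Public Sub DemoAscW()
    Dim s As String

    s = ChrW$(&H20AC)                   ' euro sign

    Debug.Print AscW(s)                 ' 8364, the Unicode value
    Debug.Print Asc(s)                  ' depends on the ANSI code page (128 on Windows-1252)

    ' Round-tripping through the W functions gives back the same character
    Debug.Print (ChrW$(AscW(s)) = s)    ' True
End Sub
```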
The complete function is listed below:
Private Function parseString(ByRef str As String, ByRef index As Long) As String
    Dim quoteint As Integer
    Dim charint As Integer
    Dim Code As String
    Call skipChar(index)
    ' Remember which quote character opened the string
    quoteint = m_str(index)
    index = index + 1
    Do While index > 0 And index <= m_length
        charint = m_str(index)
        Select Case charint
            Case A_BACKSLASH
                ' Escape sequence: look at the character after the backslash
                index = index + 1
                charint = m_str(index)
                Select Case charint
                    Case A_DOUBLE_QUOTE, A_BACKSLASH, A_FORWARDSLASH, A_SINGLE_QUOTE
                        parseString = parseString & ChrW$(charint)
                        index = index + 1
                    Case A_b
                        parseString = parseString & vbBack
                        index = index + 1
                    Case A_f
                        parseString = parseString & vbFormFeed
                        index = index + 1
                    Case A_n
                        parseString = parseString & vbLf
                        index = index + 1
                    Case A_r
                        parseString = parseString & vbCr
                        index = index + 1
                    Case A_t
                        parseString = parseString & vbTab
                        index = index + 1
                    Case A_u
                        ' \uXXXX: four hexadecimal digits
                        index = index + 1
                        Code = Mid$(str, index, 4)
                        parseString = parseString & ChrW$(Val("&h" + Code))
                        index = index + 4
                End Select
            Case quoteint
                ' Closing quote: the string is complete
                index = index + 1
                Exit Function
            Case Else
                parseString = parseString & ChrW$(charint)
                index = index + 1
        End Select
    Loop
End Function
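For longer values, where repeated concatenation could start to matter, one middle ground between plain concatenation and a string builder class is to preallocate the result and fill it with the Mid$ statement. The sketch below is a hypothetical helper, not part of the module: SliceToString and its parameters are made-up names, it only handles a run of characters without escape sequences, and it reads from the module-level m_str array. Whether it pays off for the small keys and values described above is something I have not measured.

```vb
' Hypothetical helper (not in the original module): copies m_str(first..last)
' into a String that is allocated once instead of grown by concatenation.
Private Function SliceToString(ByVal first As Long, ByVal last As Long) As String
    Dim buf As String
    Dim i As Long

    If last < first Then Exit Function  ' empty range gives an empty string

    buf = Space$(last - first + 1)      ' allocate the full result up front
    For i = first To last
        ' The Mid$ statement overwrites one character in place
        Mid$(buf, i - first + 1, 1) = ChrW$(m_str(i))
    Next i

    SliceToString = buf
End Function
```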
Improvements
- Replaced the string functions with their $ counterparts, most used: Mid() -> Mid$()
- Refactored the use of Mid$() and replaced it where possible with reading a value from the m_str array that is generated at the beginning of the parser
- Never recalculate Len(jsonstring); calculate it once and reuse a private variable.
- Use the Unicode string functions AscW(S) and ChrW$(U); do not mix them with their ANSI counterparts. The ANSI counterparts are also slower.
History
- Version 1: 2014-02-03
- Version 2: 2014-02-08