
VB-JSON Parser - Improved Performance

17 Jun 2014, BSD License
A factor 2 performance improvement for the VB-JSON parser

Introduction

I have used the VB-JSON parser library in one of my projects (VB6/VBA) to parse the responses of a JSON web service. The data returned by this web service can be on the order of tens of megabytes. The VB-JSON parser does its job, but with messages of this size it becomes rather slow. Time for a thorough inspection of the code and to implement potential improvements. I take no credit for the VB-JSON parser library: that excellent code can be found here and is published under a BSD License, and it is in turn based on this project, also published under the BSD License. The optimized parser only contains methods to parse JSON, not to generate it. Those can be taken from the original if required.

The source code is maintained on GitHub.

Background

The VB-JSON parser scans a string for the tokens that delimit arrays ([ ]) and objects ({ }), and for the key-value pairs that hold the properties of the objects it finds. There is a bit more to it. For a more thorough explanation, visit json.org, which also lists many pieces of software related to JSON.

The following JSON object, for example, represents a person:

JavaScript
{ "name" : "Billy Joe", "surname" : "Jim-Bob",
"email" : [ "billy@jim-bob.com", "billyjoe@jim-bob.com" ],
"age" : 42, "married" : true, "weight" : "150.63" }

Imagine you have a huge list of these. VB6/VBA is not particularly fast at handling strings and comparing them, so parsing 15 megabytes of persons can become slow... 10 seconds kind of slow! We can, however, avoid most of the string handling and string comparisons. After reading this page on the dos and don'ts of VB6 string handling, I managed to improve the performance of VB-JSON by a factor of 2.5 to 3.

Having achieved this with some straightforward improvements, I wanted to share them with the CodeProject community to show that VB6/VBA and JSON make a perfect match, even for large strings, files and web responses.

My focus was mainly on large JSON strings. I have not checked the performance on small JSON strings, and it may be worse there than before. I leave that for the reader to check.

Improvements

The following sections go through some of the functions and methods I refactored to improve performance, to get the idea across. The source code of the .bas module is included with the article. The original can be downloaded from the VB-JSON website.

parse

The entry method of the Parser is the following:

VB.NET
Public Function parse(ByRef str As String) As Object

   m_decSep = ModLocale.GetRegionalSettings(LOCALE_SDECIMAL)
   m_groupSep = ModLocale.GetRegionalSettings(LOCALE_SGROUPING)

   Dim index As Long
   index = 1
   psErrors = ""
   On Error Resume Next
   Call skipChar(str, index)
   Select Case Mid(str, index, 1)
      Case "{"
         Set parse = parseObject(str, index)
      Case "["
         Set parse = parseArray(str, index)
      Case Else
         psErrors = "Invalid JSON"
   End Select

End Function   

The parse function takes a JSON string as input. Note that this string is passed ByRef, not ByVal. Passing a string ByVal would create a copy of it, which is unwanted here because the strings may be very large. So far so good. There is nothing to change here, because the original code already handles this the way I think it should for this case.

The first method called is skipChar. It moves the index forward in the string until it finds an interesting character, meaning one of the tokens specified by JSON. skipChar is the most frequently called method, but before moving on to it, I will start by improving the parse method. Granted, it is a small improvement, but it fits the pattern I use to improve the performance of the whole module.

Instead of comparing the strings returned by VB6's Mid() function, which creates a new string on every call, I convert the whole input string once into an Integer array holding the Unicode value of each character. I then use this array for almost all of the parsing. The improved version of the parse function becomes:

VB.NET
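Public Function parse(ByRef str As String) As Object

    Dim index As Long
    index = 1
    psErrors = vbNullString

    ' regional settings used by parseNumber; without them m_decSep is empty
    ' and parseNumber would strip the decimal point from every number
    m_decSep = ModLocale.GetRegionalSettings(LOCALE_SDECIMAL)
    ' LOCALE_STHOUSAND (&HF) is the thousands separator; LOCALE_SGROUPING
    ' returns the digit group sizes (e.g. "3;0") instead. LOCALE_STHOUSAND
    ' is assumed to be declared in ModLocale like LOCALE_SDECIMAL.
    m_groupSep = ModLocale.GetRegionalSettings(LOCALE_STHOUSAND)

    ' ReDim m_str(1 To 0) would raise an error, so reject empty input first
    If Len(str) = 0 Then
        psErrors = "Invalid JSON"
        Exit Function
    End If

    ' convert the string once into m_str (Unicode values) and m_length
    Call GenerateStringArray(str)

    On Error Resume Next

    Call skipChar(index)

    ' guard against whitespace-only input before reading m_str(index)
    If index <= m_length Then
        Select Case m_str(index)
        Case A_SQUARE_BRACKET_OPEN
            Set parse = parseArray(str, index)
        Case A_CURLY_BRACKET_OPEN
            Set parse = parseObject(str, index)
        Case Else
            psErrors = "Invalid JSON"
        End Select
    Else
        psErrors = "Invalid JSON"
    End If

    ' clean array
    Erase m_str

End Function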

Notice the extra method, GenerateStringArray, which converts the JSON string into an Integer array called m_str. The Select Case now operates on m_str(index), and each Case uses a constant holding the Unicode value of the corresponding character. The following constants are defined at the top of the module:

VB.NET
Private Const A_CURLY_BRACKET_OPEN As Integer = 123  ' AscW("{")
Private Const A_CURLY_BRACKET_CLOSE As Integer = 125 ' AscW("}")
Private Const A_SQUARE_BRACKET_OPEN As Integer = 91  ' AscW("[")
Private Const A_SQUARE_BRACKET_CLOSE As Integer = 93 ' AscW("]")
Private Const A_BRACKET_OPEN As Integer = 40         ' AscW("(")
Private Const A_BRACKET_CLOSE As Integer = 41        ' AscW(")")
Private Const A_COMMA As Integer = 44                ' AscW(",")
Private Const A_DOUBLE_QUOTE As Integer = 34         ' AscW("""")
Private Const A_SINGLE_QUOTE As Integer = 39         ' AscW("'")
Private Const A_BACKSLASH As Integer = 92            ' AscW("\")
Private Const A_FORWARDSLASH As Integer = 47         ' AscW("/")
Private Const A_COLON As Integer = 58                ' AscW(":")
Private Const A_SPACE As Integer = 32                ' AscW(" ")
Private Const A_ASTERIX As Integer = 42              ' AscW("*")
Private Const A_VBCR As Integer = 13                 ' AscW(vbCr)
Private Const A_VBLF As Integer = 10                 ' AscW(vbLf)
Private Const A_VBTAB As Integer = 9                 ' AscW(vbTab)
Private Const A_VBCRLF As Integer = 13               ' AscW(vbCrLf): first character only, same as A_VBCR

Private Const A_b As Integer = 98                    ' AscW("b")
Private Const A_f As Integer = 102                   ' AscW("f")
Private Const A_n As Integer = 110                   ' AscW("n")
Private Const A_r As Integer = 114                   ' AscW("r")
Private Const A_t As Integer = 116                   ' AscW("t")
Private Const A_u As Integer = 117                   ' AscW("u")
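The sketch below shows the pattern in isolation. It is a self-contained standard module, and the function IsStructuralChar is hypothetical, not part of the parser. It classifies a character code with a Select Case over Integer constants, which is the same kind of comparison parse now performs on m_str(index). The six characters it accepts are JSON's structural characters: { } [ ] : and the comma.

```vb
Option Explicit

' Hypothetical example module, not part of the parser.
Private Const A_CURLY_BRACKET_OPEN As Integer = 123  ' AscW("{")
Private Const A_CURLY_BRACKET_CLOSE As Integer = 125 ' AscW("}")
Private Const A_SQUARE_BRACKET_OPEN As Integer = 91  ' AscW("[")
Private Const A_SQUARE_BRACKET_CLOSE As Integer = 93 ' AscW("]")
Private Const A_COMMA As Integer = 44                ' AscW(",")
Private Const A_COLON As Integer = 58                ' AscW(":")

' Returns True when charCode is one of JSON's structural characters.
' Every Case compares two Integers instead of two Strings.
Public Function IsStructuralChar(ByVal charCode As Integer) As Boolean
    Select Case charCode
    Case A_CURLY_BRACKET_OPEN, A_CURLY_BRACKET_CLOSE, _
         A_SQUARE_BRACKET_OPEN, A_SQUARE_BRACKET_CLOSE, _
         A_COMMA, A_COLON
        IsStructuralChar = True
    Case Else
        IsStructuralChar = False
    End Select
End Function
```

The caller converts a character once with AscW, for example IsStructuralChar(AscW("{")), and every later comparison is a plain Integer comparison.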

GenerateStringArray

The GenerateStringArray method stores the length of the JSON string in the private variable m_length for later use, so the other methods do not have to recalculate it. It also stores the Unicode value of each character in the m_str array.

VB.NET
Private Sub GenerateStringArray(ByRef str As String)

Dim i As Long

m_length = Len(str)
ReDim m_str(1 To m_length)

For i = 1 To m_length
    m_str(i) = AscW(Mid$(str, i, 1))
Next i

End Sub
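GenerateStringArray still calls Mid$ once per character to build the array. The sketch below shows a possible alternative, assuming the input is an ordinary VB String; the function name StringToCodes is hypothetical. It relies on the fact that assigning a String to a dynamic Byte array copies the string's UTF-16 bytes, two per character with the low byte first. The loop then combines each byte pair into the same signed value AscW would return. Whether this is measurably faster for your data is worth timing before adopting it.

```vb
Option Explicit

' Hypothetical alternative to GenerateStringArray: fills codes(1 To n)
' with the values AscW would return, without calling Mid$ per character.
' Returns the number of characters.
Public Function StringToCodes(ByRef s As String, ByRef codes() As Integer) As Long
    Dim bytes() As Byte
    Dim n As Long
    Dim i As Long
    Dim lo As Long
    Dim hi As Long

    n = Len(s)
    StringToCodes = n
    If n = 0 Then Exit Function

    ' Assigning a String to a dynamic Byte array copies its UTF-16LE bytes
    ' into a zero-based array: character i uses bytes 2i-2 (low) and 2i-1.
    bytes = s
    ReDim codes(1 To n)

    For i = 1 To n
        lo = bytes(2 * i - 2)
        hi = bytes(2 * i - 1)
        ' AscW returns a signed Integer, so code units >= &H8000 are negative
        If hi >= 128 Then hi = hi - 256
        codes(i) = CInt(hi * 256 + lo)
    Next i
End Function
```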

skipChar

The skipChar method moves the index (cursor) forward in the JSON string, past whitespace, parentheses and // or /* */ comments. Here is the original method:

VB.NET
Private Sub skipChar(ByRef str As String, ByRef index As Long)
   Dim bComment As Boolean
   Dim bStartComment As Boolean
   Dim bLongComment As Boolean
   Do While index > 0 And index <= Len(str)
      Select Case Mid(str, index, 1)
      Case vbCr, vbLf
         If Not bLongComment Then
            bStartComment = False
            bComment = False
         End If

      Case vbTab, " ", "(", ")"

      Case "/"
         If Not bLongComment Then
            If bStartComment Then
               bStartComment = False
               bComment = True
            Else
               bStartComment = True
               bComment = False
               bLongComment = False
            End If
         Else
            If bStartComment Then
               bLongComment = False
               bStartComment = False
               bComment = False
            End If
         End If

      Case "*"
         If bStartComment Then
            bStartComment = False
            bComment = True
            bLongComment = True
         Else
            bStartComment = True
         End If

      Case Else
         If Not bComment Then
            Exit Do
         End If
      End Select

      index = index + 1
   Loop

End Sub

Again, the original relies on string comparisons. I replaced them with lookups in the m_str array, the m_length variable and the constants declared in the module. Notice that str is no longer passed as an argument to skipChar:

VB.NET
Private Sub skipChar(ByRef index As Long)

Dim bComment As Boolean
Dim bStartComment As Boolean
Dim bLongComment As Boolean

Do While index > 0 And index <= m_length

    Select Case m_str(index)
    Case A_VBCR, A_VBLF
        If Not bLongComment Then
            bStartComment = False
            bComment = False
        End If

    Case A_VBTAB, A_SPACE, A_BRACKET_OPEN, A_BRACKET_CLOSE
        'do nothing

    Case A_FORWARDSLASH
        If Not bLongComment Then
            If bStartComment Then
                bStartComment = False
                bComment = True
            Else
                bStartComment = True
                bComment = False
                bLongComment = False
            End If
        Else
            If bStartComment Then
                bLongComment = False
                bStartComment = False
                bComment = False
            End If
        End If
    Case A_ASTERIX
        If bStartComment Then
            bStartComment = False
            bComment = True
            bLongComment = True
        Else
            bStartComment = True
        End If
    Case Else
        If Not bComment Then
            Exit Do
        End If
    End Select

    index = index + 1
Loop

End Sub
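skipChar also tolerates comments and parentheses, which strict JSON does not allow. If your input is known to be strict JSON, a simpler loop is enough, because RFC 8259 permits only space, tab, line feed and carriage return between tokens. The sketch below shows that variant as its own module. It is an alternative, not the module's method. SkipWhitespace is a hypothetical name, and it takes the array and its length as parameters instead of reading m_str and m_length.

```vb
Option Explicit

' Hypothetical strict variant of skipChar: only the four JSON whitespace
' characters are skipped; comments and parentheses are not.
Private Const A_SPACE As Integer = 32 ' AscW(" ")
Private Const A_VBTAB As Integer = 9  ' AscW(vbTab)
Private Const A_VBLF As Integer = 10  ' AscW(vbLf)
Private Const A_VBCR As Integer = 13  ' AscW(vbCr)

' Advances index past whitespace in codes(1 To length). On return, index
' points at the next token, or at length + 1 when the input is exhausted.
Public Sub SkipWhitespace(ByRef codes() As Integer, ByVal length As Long, _
                          ByRef index As Long)
    Do While index <= length
        Select Case codes(index)
        Case A_SPACE, A_VBTAB, A_VBLF, A_VBCR
            index = index + 1
        Case Else
            Exit Do
        End Select
    Loop
End Sub
```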

parseNumber

The original parseNumber method works fine for locales where the decimal separator is a period, but not for those where it is a comma. JSON numbers always use a period, while CDec interprets a string according to the regional settings. This is fixed by replacing the "." in the local Value variable with the regional decimal separator whenever that separator is not a period. The regional settings are stored in two module-level variables, m_decSep and m_groupSep, which are set in the parse function.

VB.NET
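Private Function parseNumber(ByRef str As String, ByRef index As Long) As Variant

Dim Value   As String
Dim Char    As String

Call skipChar(index)

'collect the characters that can occur in a JSON number
Do While index > 0 And index <= m_length
    Char = Mid$(str, index, 1)
    If InStr("+-0123456789.eE", Char) Then
        Value = Value & Char
        index = index + 1
    Else
        Exit Do
    End If
Loop

'JSON always uses "." as decimal separator, CDec expects the regional one
If Not m_decSep = "." Then
    Value = Replace(Value, ".", m_decSep)
End If

'with "." as grouping separator, CDec would treat a "." as grouping
If m_groupSep = "." Then
    Value = Replace(Value, ".", m_decSep)
End If

'converting after the loop also handles a number at the end of the input
parseNumber = CDec(Value)

End Function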
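If you would rather not depend on ModLocale, you can read the regional decimal separator from how VB itself formats a number. The sketch below swaps in that technique. CStr(1.5) yields "1.5" or "1,5" depending on the regional settings, so its second character is the separator. JsonNumberToDecimal and RegionalDecimalSeparator are hypothetical names, and the sketch assumes plain decimal input such as 150.63.

```vb
Option Explicit

' Hypothetical helper: CStr formats numbers with the current regional
' settings, so the character after "1" in CStr(1.5) is the decimal separator.
Private Function RegionalDecimalSeparator() As String
    RegionalDecimalSeparator = Mid$(CStr(1.5), 2, 1)
End Function

' Converts a JSON number (which always uses "." as decimal point) to a
' Decimal subtype, letting CDec parse it with the regional settings.
Public Function JsonNumberToDecimal(ByVal jsonNumber As String) As Variant
    Dim sep As String
    sep = RegionalDecimalSeparator()
    If sep <> "." Then
        jsonNumber = Replace(jsonNumber, ".", sep)
    End If
    JsonNumberToDecimal = CDec(jsonNumber)
End Function
```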

parseString

The last method described is parseString. The original uses the cStringBuilder class from vbaccelerator.com, which is good at concatenating large strings. Remember that every time VB6 concatenates two strings, it creates a new string. It does not really append the second string to the first. The cStringBuilder class uses the CopyMemory API function to avoid this. When parsing JSON into dictionaries and collections, however, I found that the class did not make much sense. It was rather costly for the number of concatenations needed, because the strings being built are the keys and values of the dictionaries. In my use case these strings are small, especially the attribute names. The values can be larger, but never large enough for the cStringBuilder class to make a big difference.

VB.NET
Private Function parseString(ByRef str As String, ByRef index As Long) As String

   Dim quote   As String
   Dim Char    As String
   Dim Code    As String

   Dim SB As New cStringBuilder

   Call skipChar(str, index)
   quote = Mid(str, index, 1)
   index = index + 1

   Do While index > 0 And index <= Len(str)
      Char = Mid(str, index, 1)
      Select Case (Char)
         Case "\"
            index = index + 1
            Char = Mid(str, index, 1)
            Select Case (Char)
               Case """", "\", "/", "'"
                  SB.Append Char
                  index = index + 1
               Case "b"
                  SB.Append vbBack
                  index = index + 1
               Case "f"
                  SB.Append vbFormFeed
                  index = index + 1
               Case "n"
                  SB.Append vbLf
                  index = index + 1
               Case "r"
                  SB.Append vbCr
                  index = index + 1
               Case "t"
                  SB.Append vbTab
                  index = index + 1
               Case "u"
                  index = index + 1
                  Code = Mid(str, index, 4)
                  SB.Append ChrW(Val("&h" + Code))
                  index = index + 4
            End Select
         Case quote
            index = index + 1

            parseString = SB.ToString
            Set SB = Nothing

            Exit Function

         Case Else
            SB.Append Char
            index = index + 1
      End Select
   Loop

   parseString = SB.ToString
   Set SB = Nothing

End Function 
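For contrast, here is a sketch of the idea behind a string builder in plain VB, without the CopyMemory API. It is not vbAccelerator's cStringBuilder, and the procedure names are hypothetical. The buffer is preallocated and overwritten in place with the Mid$ statement, and its capacity doubles whenever it runs out. This pays off when one string grows large, and adds overhead when the strings are as short as typical JSON keys.

```vb
Option Explicit

' Hypothetical minimal string buffer (not vbAccelerator's cStringBuilder).
Private m_buffer As String
Private m_used As Long

Public Sub BufferReset()
    m_buffer = Space$(16)
    m_used = 0
End Sub

Public Sub BufferAppend(ByRef text As String)
    Dim needed As Long
    If Len(m_buffer) = 0 Then m_buffer = Space$(16)
    needed = m_used + Len(text)
    ' Doubling keeps the total copying proportional to the final length,
    ' instead of copying the whole string on every append.
    Do While needed > Len(m_buffer)
        m_buffer = m_buffer & Space$(Len(m_buffer))
    Loop
    ' The Mid$ statement overwrites characters inside the existing buffer.
    If Len(text) > 0 Then Mid$(m_buffer, m_used + 1, Len(text)) = text
    m_used = needed
End Sub

Public Function BufferToString() As String
    BufferToString = Left$(m_buffer, m_used)
End Function
```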

The refactored method does not use the cStringBuilder class, which on its own already improved the performance. The major refactoring follows the lines already discussed above. One important thing to remember is that the m_str() array was created using the AscW() function. Take notice of the W.

  • AscW(S) returns the Unicode value of the first character in S
  • Asc(S) returns the ANSI value of the first character in S

So we are working with Unicode here, not ANSI. Therefore, when appending the character at the current index (cursor) to parseString, we must also use the Unicode version of the Chr function:

VB.NET
parseString = parseString & ChrW$(charint) 

The complete function is listed below:

VB.NET
Private Function parseString(ByRef str As String, ByRef index As Long) As String

    Dim quoteint As Integer
    Dim charint  As Integer
    Dim Code     As String

    Call skipChar(index)

    quoteint = m_str(index)

    index = index + 1

    Do While index > 0 And index <= m_length

        charint = m_str(index)

        Select Case charint
        Case A_BACKSLASH

            index = index + 1
            charint = m_str(index)

            Select Case charint
            Case A_DOUBLE_QUOTE, A_BACKSLASH, A_FORWARDSLASH, A_SINGLE_QUOTE
                parseString = parseString & ChrW$(charint)
                index = index + 1
            Case A_b
                parseString = parseString & vbBack
                index = index + 1
            Case A_f
                parseString = parseString & vbFormFeed
                index = index + 1
            Case A_n
                parseString = parseString & vbLf
                index = index + 1
            Case A_r
                parseString = parseString & vbCr
                index = index + 1
            Case A_t
                parseString = parseString & vbTab
                index = index + 1
            Case A_u
                index = index + 1
                Code = Mid$(str, index, 4)

                parseString = parseString & ChrW$(Val("&h" & Code))
                index = index + 4
            End Select

        Case quoteint

            index = index + 1
            Exit Function

        Case Else
            parseString = parseString & ChrW$(charint)
            index = index + 1
        End Select
    Loop

End Function
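The A_u branch relies on two VB6/VBA details that are worth checking separately. First, Val("&H" & Code) reads four hex digits as a 16-bit value, so code points from &H8000 upward come back negative. Second, ChrW$ accepts those negative arguments, just as AscW returns negative values for the same range. The hypothetical helper below isolates that step so it can be tested on its own.

```vb
Option Explicit

' Hypothetical helper isolating the A_u branch of parseString:
' hex4 holds the four hex digits that follow \u in a JSON string.
Public Function DecodeUnicodeEscape(ByRef hex4 As String) As String
    ' Val reads "&H" plus four hex digits as a 16-bit value, so code points
    ' from &H8000 upward come back negative; ChrW$ accepts that range.
    DecodeUnicodeEscape = ChrW$(Val("&H" & hex4))
End Function
```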

Summary of Improvements

  • Replaced the Variant-returning string functions with their $ counterparts, which return a String. The most used was Mid() -> Mid$()
  • Refactored the use of Mid$() and, where possible, replaced it with a lookup in the m_str array generated at the beginning of the parser
  • Never recalculate Len(jsonstring). Calculate it once and reuse the private variable m_length.
  • Use the Unicode string functions AscW(S) and ChrW$(U), and do not mix them with their ANSI counterparts, which are also slower.
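To check the gains on your own data, for small inputs as well as large ones, a rough timing harness is enough. The sketch below uses a hypothetical Sub, CompareScanTimings. It times three things with VB's Timer function: a Mid$-based scan, the one-off conversion to an Integer array, and an array-based scan of the same text. Timer has limited resolution and resets at midnight, so use inputs large enough to take a noticeable fraction of a second.

```vb
Option Explicit

' Hypothetical micro-benchmark: counts "{" characters with Mid$ string
' comparisons and with an Integer array of AscW values, printing timings.
Public Sub CompareScanTimings(ByRef s As String, ByRef countMid As Long, _
                              ByRef countArray As Long)
    Dim codes() As Integer
    Dim n As Long
    Dim i As Long
    Dim t As Single

    n = Len(s)
    countMid = 0
    countArray = 0
    If n = 0 Then Exit Sub

    t = Timer
    For i = 1 To n
        If Mid$(s, i, 1) = "{" Then countMid = countMid + 1
    Next i
    Debug.Print "Mid$ scan:        "; Format$(Timer - t, "0.000"); " s"

    ' one-off conversion, paid once per parse in the refactored module
    t = Timer
    ReDim codes(1 To n)
    For i = 1 To n
        codes(i) = AscW(Mid$(s, i, 1))
    Next i
    Debug.Print "Array conversion: "; Format$(Timer - t, "0.000"); " s"

    t = Timer
    For i = 1 To n
        If codes(i) = 123 Then countArray = countArray + 1 ' 123 = AscW("{")
    Next i
    Debug.Print "Array scan:       "; Format$(Timer - t, "0.000"); " s"
End Sub
```

The parser inspects most characters more than once (in skipChar and again in the Select Case of the calling method), so the conversion cost is spread over several array lookups. A single scan like this one will not show the full benefit.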


History

  • Version 1: 2014-02-03
  • Version 2: 2014-02-08

License

This article, along with any associated source code and files, is licensed under The BSD License