Introduction
The idea started after I read the nice article by Sam about parsing string
s into JSON objects. Sam decided to copy the string
characters into an array of integers where it becomes much faster to read and parse data. I was surprised by the performance gain that we get even with the time wasted in copying data.
The problem was when we have very large string
that we have to buffer. In this case, we have to create an array of same size and duplicate the string
data inside it. This will consume a lot of memory and time to copy data.
Solution
My idea is to create an array that shares its data with the string
. In other words, the first integer of the array is exactly the first character of the string
and changing any of them will change the other.
In this scenario, we only need to create a small array (one or two elements) and then change its underlying length and data to point to the string length and characters.
Background
Both Visual Basic 6 and Visual Basic for Applications are built over COM structures and all of its data types are Automation data types. So most of the variables are pointers to COM structures.
Here, we are interested in two data types: Strings and Arrays.
1. String
Strings are saved as Unicode characters inside BSTR structure. We don’t need to know about BSTR here. It is enough to know that we can get a pointer to the character sequence of the string
by the famous undocumented function ‘StrPtr ()
’.
2. Array
Here, the situation is more complicated, because we need to change the underlying structure of the array (both length and data pointer).
Arrays are saved in a SAFEARRAY
structure which is defined as:
typedef struct tagSAFEARRAY {
USHORT cDims;
USHORT Features;
ULONG cbElements;
ULONG cLocks;
PVOID pvData;
SAFEARRAYBOUND rgsabound[1];
} SAFEARRAY, *LPSAFEARRAY
So when we get the safe array pointer, we have to change the data pointer of the array (pvData
) which is located after 12 bytes from the beginning of the SAFEARRAY
structure.
We also need to change the array size which is saved in ‘rgsabound
’ member to reflect the size of the string
. SAFEARRAYBOUND
structure is defined as:
typedef struct tagSAFEARRAYBOUND {
ULONG cElements;
ULONG lLbound;
} SAFEARRAYBOUND, *LPSAFEARRAYBOUND;
So we have to change cElements
(which is located after 16 bytes from the beginning of the SAFEARRAY
structure) to reflect the number of elements we need for our array.
Note 1: If you want to change the lower bound of your array, you can also change lLbound
after 20 bytes from the beginning of the SAFEARRAY
structure.
Note 2: Be careful if you want to create a multidimensional array. In this case, every dimension will have a single SAFEARRAYBOUND
elements stored in rgsabound
.
How to Point to SAFEARRAY Structure
It is easy to get a pointer to any object in VB6 or VBA by direct using of VarPtr ()
function. This is true for everything except for arrays!! If you try to pass an array to that function, you will get an error.
To solve this problem, we have to create a new declaration for the VarPtr ()
function that takes an array as parameter. The problem about this way is that we need to change our declaration if we changed our VB version…. The following code describes how to do that:
Private Declare Function VarPtrArray Lib "msvbvm60.dll" Alias "VarPtr" (var () As Any) As Long
Private Declare Function VarPtrArray Lib "msvbvm50.dll" Alias "VarPtr" (var () As Any) As Long
Private Declare Function VarPtrArray Lib "VBE7" Alias "VarPtr" (var () As Any) As Long
Private Declare Function VarPtrArray Lib "VBE6" Alias "VarPtr" (var () As Any) As Long
How to Apply This Method
1. Get the String
If you have a small string
, please don’t waste your time by using this complicated method:
Dim S as string, Count as long
S = "Some large string"
Count = Len(s)
2. Create a Buffer Array
We will create a very small array of integers to create the underlying SAFEARRAY
data:
Dim Buffer (1) as Integer ‘Two elements array
3. Get the SAFEARRAY Pointer
As we said before, the VarPtrArray
function will return a pointer to a pointer to the SAFEARRAY
structure, for that we used a CopyMemoryToAny
function to get the pointer value:
Private Declare Sub CopyMemoryToAny Lib "kernel32.dll" _
Alias "RtlMoveMemory"(ByRef Destination As Any, ByVal Source As Long, ByVal length As Long)
Dim pArray as Long, pSafeArray as Long
pArray = VarPtrArray(Buffer ())
CopyMemoryToAny pSafeArray, pArray, 4
4. Backup the SAFEARRAY Original Data
It is very important to back up the old values of the SAFEARRAY
structure and then restore them before VB cleans this variable. If we fail in this, the original values will remain uncleared and we may get into memory leakage problems:
Dim pOldData as Long
CopyMemoryToAny pOldData, pSafeArray + 12, 4
5. Change to SAFEARRAY Data to the String Data
Here, we have to change the data pointer of the safe array (located after 12 bytes from the beginning) and then number of elements (located after 16 bytes):
Private Declare Sub CopyAnyToMemory Lib "kernel32.dll" _
Alias "RtlMoveMemory"(ByVal Destination As Long, ByRef Source As Any, ByVal length As Long)
CopyAnyToMemory pSafeArray + 12, StrPtr (S), 4
CopyAnyToMemory pSafeArray + 16, Count, 4
6. Do Your Work
Now you can start manipulating, parsing or even changing string
data without changing its size. Please be careful not to erase your string
variable or array data. This may cause unexpected behavior.
7. Restore the Original SAFEARRAY Data
Now we can finish our work by restoring the old values of SAFEARRAY
structure, so when VB starts to clear unused data, it will clear the right data of the array:
CopyAnyToMemory pSafeArray + 12, pOldData, 4
CopyAnyToMemory pSafeArray + 16, 2, 4
Alternatives
We can get direct access to string
data also by using the undocumented function GetMem2
which can be declared as:
‘For VB6 users
Private Declare Sub GetInteger Lib "MSVBVM60.dll" _
Alias "GetMem2" (ByRef Src As Any, ByRef Dst As Integer)
‘For VB5 users
Private Declare Sub GetInteger Lib "MSVBVM50.dll" _
Alias "GetMem2" (ByRef Src As Any, ByRef Dst As Integer)
There are two problems about these functions:
- They are not available for VBA users.
- They are slower than direct access by arrays which we described earlier, because there is little overhead when calling DLL imported functions.
Example
In the attached sample, we created a 20 MB string
, then we started to count the numeric characters inside it using three five methods:
- By using
Mid$()
function (String Access
) - By creating a buffer array of integers and copy the
string
data into it (Array access) - By using
GetMem2()
function (Memory Access) - By creating an array of integers that point to the
string
data (Direct access) - By using a DLL function written using C++ 2012 (C++ API access), this will work only from the EXE file, not from the IDE unless you have changed its declaration.
The performance gain against String
access was really huge (30 times faster in EXE), where the speed’s gain between Array and Direct access was very small… But actually, there was a memory’s save of about 40 MB because we did not use a copy of data to parse the string
. On the other hand, Memory access gave us a good performance with no memory overload and less complicated code, it is about 2-3 times slower than the last two methods. And for sure, the out performance winner is C++ API access which is the fastest, but not much more than direct access method.
Caution!!!
There is a strange behavior in this method….. When you run your application inside the IDE, it will work perfectly, but when you compile it, it will raise a ”Subscript out of range (Error 9)”, unless you compile it with removing bound array checks. You can remove it when you compile the project by pressing Option button, then go to compile tab and press Advanced Optimizations button, then check Remove array bound checks.
More To Do...
In this article, we explained how to parse a string
by direct access to its characters using an array of integers. You can also change its data if you did not change its length, for example, formatting the string
s to change some characters to upper case. You can also use the same concept to access any type of memory data by using arrays, For example, GDI+ bitmaps data through their Lockbits
function.
History
- 10th Mar 2014: Initial version