Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

VB6 and VBA: Creating a Variable and an Array that Points to Same Data

0.00/5 (No votes)
10 Mar 2014 1  
This is a very fast way to access string (or other) data with minimum memory consumption.

Introduction

The idea started after I read the nice article by Sam about parsing strings into JSON objects. Sam decided to copy the string characters into an array of integers where it becomes much faster to read and parse data. I was surprised by the performance gain that we get even with the time wasted in copying data.

The problem was when we have very large string that we have to buffer. In this case, we have to create an array of same size and duplicate the string data inside it. This will consume a lot of memory and time to copy data.

Solution

My idea is to create an array that shares its data with the string. In other words, the first integer of the array is exactly the first character of the string and changing any of them will change the other.

In this scenario, we only need to create a small array (one or two elements) and then change its underlying length and data to point to the string length and characters.

Background

Both Visual Basic 6 and Visual Basic for Applications are built over COM structures and all of its data types are Automation data types. So most of the variables are pointers to COM structures.

Here, we are interested in two data types: Strings and Arrays.

1. String

Strings are saved as Unicode characters inside BSTR structure. We don’t need to know about BSTR here. It is enough to know that we can get a pointer to the character sequence of the string by the famous undocumented function ‘StrPtr ()’.

2. Array

Here, the situation is more complicated, because we need to change the underlying structure of the array (both length and data pointer).

Arrays are saved in a SAFEARRAY structure which is defined as:

typedef struct tagSAFEARRAY {
    USHORT        cDims;        //The number of dimensions.
    USHORT        Features;    //Flags.
    ULONG        cbElements;    //The size of a single element.
    ULONG        cLocks;        //Number of locks.
    PVOID        pvData;        //The data.
    SAFEARRAYBOUND    rgsabound[1];    //one bound for each dimension.
} SAFEARRAY, *LPSAFEARRAY 

So when we get the safe array pointer, we have to change the data pointer of the array (pvData) which is located after 12 bytes from the beginning of the SAFEARRAY structure.

We also need to change the array size which is saved in ‘rgsabound’ member to reflect the size of the string. SAFEARRAYBOUND structure is defined as:

typedef struct tagSAFEARRAYBOUND {
    ULONG    cElements;            //The number of elements in the dimension.
    ULONG    lLbound;              //The lower bound of the dimension.
} SAFEARRAYBOUND, *LPSAFEARRAYBOUND;

So we have to change cElements (which is located after 16 bytes from the beginning of the SAFEARRAY structure) to reflect the number of elements we need for our array.

Note 1: If you want to change the lower bound of your array, you can also change lLbound after 20 bytes from the beginning of the SAFEARRAY structure.

Note 2: Be careful if you want to create a multidimensional array. In this case, every dimension will have a single SAFEARRAYBOUND elements stored in rgsabound.

How to Point to SAFEARRAY Structure

It is easy to get a pointer to any object in VB6 or VBA by direct using of VarPtr () function. This is true for everything except for arrays!! If you try to pass an array to that function, you will get an error.

To solve this problem, we have to create a new declaration for the VarPtr () function that takes an array as parameter. The problem about this way is that we need to change our declaration if we changed our VB version…. The following code describes how to do that:

'For VB6 users:
Private Declare Function VarPtrArray Lib "msvbvm60.dll" Alias "VarPtr" (var () As Any) As Long
'For VB5 users:
Private Declare Function VarPtrArray Lib "msvbvm50.dll" Alias "VarPtr" (var () As Any) As Long
'For VBA users with Office 2010+:
Private Declare Function VarPtrArray Lib "VBE7" Alias "VarPtr" (var () As Any) As Long
'For VBA users with Office before 2010:
Private Declare Function VarPtrArray Lib "VBE6" Alias "VarPtr" (var () As Any) As Long 

How to Apply This Method

1. Get the String

If you have a small string, please don’t waste your time by using this complicated method:

Dim S as string, Count as long
S = "Some large string"
Count = Len(s)  

2. Create a Buffer Array

We will create a very small array of integers to create the underlying SAFEARRAY data:

Dim Buffer (1) as Integer ‘Two elements array 

3. Get the SAFEARRAY Pointer

As we said before, the VarPtrArray function will return a pointer to a pointer to the SAFEARRAY structure, for that we used a CopyMemoryToAny function to get the pointer value:

Private Declare Sub CopyMemoryToAny Lib "kernel32.dll" _
  Alias "RtlMoveMemory"(ByRef Destination As Any, ByVal Source As Long, ByVal length As Long) 
Dim pArray as Long, pSafeArray as Long
pArray = VarPtrArray(Buffer ())
CopyMemoryToAny pSafeArray, pArray, 4  

4. Backup the SAFEARRAY Original Data

It is very important to back up the old values of the SAFEARRAY structure and then restore them before VB cleans this variable. If we fail in this, the original values will remain uncleared and we may get into memory leakage problems:

Dim pOldData as Long
CopyMemoryToAny pOldData, pSafeArray + 12, 4 

5. Change to SAFEARRAY Data to the String Data

Here, we have to change the data pointer of the safe array (located after 12 bytes from the beginning) and then number of elements (located after 16 bytes):

Private Declare Sub CopyAnyToMemory Lib "kernel32.dll" _
  Alias "RtlMoveMemory"(ByVal Destination As Long, ByRef Source As Any, ByVal length As Long)
CopyAnyToMemory pSafeArray + 12, StrPtr (S), 4
CopyAnyToMemory pSafeArray + 16, Count, 4 

6. Do Your Work

Now you can start manipulating, parsing or even changing string data without changing its size. Please be careful not to erase your string variable or array data. This may cause unexpected behavior.

7. Restore the Original SAFEARRAY Data

Now we can finish our work by restoring the old values of SAFEARRAY structure, so when VB starts to clear unused data, it will clear the right data of the array:

CopyAnyToMemory pSafeArray + 12, pOldData, 4
CopyAnyToMemory pSafeArray + 16, 2, 4 

Alternatives

We can get direct access to string data also by using the undocumented function GetMem2 which can be declared as:

‘For VB6 users
Private Declare Sub GetInteger Lib "MSVBVM60.dll" _
    Alias "GetMem2" (ByRef Src As Any, ByRef Dst As Integer)
‘For VB5 users
Private Declare Sub GetInteger Lib "MSVBVM50.dll" _
    Alias "GetMem2" (ByRef Src As Any, ByRef Dst As Integer) 

There are two problems about these functions:

  1. They are not available for VBA users.
  2. They are slower than direct access by arrays which we described earlier, because there is little overhead when calling DLL imported functions.

Example

In the attached sample, we created a 20 MB string, then we started to count the numeric characters inside it using three five methods:

  1. By using Mid$() function (String Access)
  2. By creating a buffer array of integers and copy the string data into it (Array access)
  3. By using GetMem2() function (Memory Access)
  4. By creating an array of integers that point to the string data (Direct access)
  5. By using a DLL function written using C++ 2012 (C++ API access), this will work only from the EXE file, not from the IDE unless you have changed its declaration.

The performance gain against String access was really huge (30 times faster in EXE), where the speed’s gain between Array and Direct access was very small… But actually, there was a memory’s save of about 40 MB because we did not use a copy of data to parse the string. On the other hand, Memory access gave us a good performance with no memory overload and less complicated code, it is about 2-3 times slower than the last two methods. And for sure, the out performance winner is C++ API access which is the fastest, but not much more than direct access method.

Caution!!!

There is a strange behavior in this method….. When you run your application inside the IDE, it will work perfectly, but when you compile it, it will raise a ”Subscript out of range (Error 9)”, unless you compile it with removing bound array checks. You can remove it when you compile the project by pressing Option button, then go to compile tab and press Advanced Optimizations button, then check Remove array bound checks.

More To Do...

In this article, we explained how to parse a string by direct access to its characters using an array of integers. You can also change its data if you did not change its length, for example, formatting the strings to change some characters to upper case. You can also use the same concept to access any type of memory data by using arrays, For example, GDI+ bitmaps data through their Lockbits function.

History

  • 10th Mar 2014: Initial version

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here