Introduction
I've been doing some work lately with the MSHTML control, which takes most of its arguments as VARIANTs. I can see you shuddering already. What's a self-respecting C++ programmer doing dirtying his hands with VB/scripting language datatypes? Well, it's the lesser of two evils. Either I can learn about VARIANTs or I can write my own HTML parser and editor. Which do you think is easier? I knew you'd agree :)
Truth be told, I think the VARIANT concept is actually pretty cool. Wrap your data up in a nice little package with a type descriptor or two, throw it across a function call boundary and let the other side figure it out. If done right, it can solve a lot of otherwise nasty problems. I just wish they were easier to work with!
So that's the apology out of the way. Let's look at what a VARIANT is.
Why a VARIANT?
In contrast to a strongly typed language like C++, Visual Basic and many scripting languages are weakly typed. In a strongly typed language, you must pass a function exactly the argument types it was written to accept. If the function expects a pointer to a string, you can't pass it an integer. Try to do so and you'll get a compile-time error.
Weakly typed languages allow you to pass arguments that don't match the types expected. So the question should arise - if you can pass the wrong argument type to a function, how does the language respond? Most weakly typed languages 'coerce' the value that was passed into the expected type. What does this mean? It means that the language runtime will try to convert the data that was passed into the correct data type. For example, if you were to pass an integer to a function that expected to see a string, the most natural 'coercion' is to convert the integer into a string representation. Pass a date where a string is expected and the natural 'coercion' is to convert it to a string representation.
As C++ programmers, we're already used to coercion on a small scale - we're used to the idea that the compiler can do promotions from short to int and so on. Weakly typed languages just take it a step or two further.
So what has this to do with VARIANTs?
Imagine you're designing your own programming language. You know the kinds of datatypes you want to support. You know the kinds of intrinsic operators you want. You can design your compiler to keep track of the datatype of everything in your program, so that when the programmer passes the wrong datatype to a function, your compiler knows it and can insert the necessary code to convert the data.
Now imagine you're required to support not only your language but another language (say C++). You have complete control over your own language but no control whatsoever over the second language. Yet you want to be able to interoperate with that language. Since it's you who wants to interoperate with something you cannot change, it's up to you to adapt to the 'something you cannot change'. So you design your datatypes in such a way that they contain sufficient information over and above the data they encapsulate to allow anyone else to decipher their contents.
Enter the VARIANT
A VARIANT is a not-so-exotic way of solving this problem. Simplified, a VARIANT looks like this:
struct tagVARIANT
{
    VARTYPE vt;            // type tag: says which union member is valid
    WORD wReserved1;
    WORD wReserved2;
    WORD wReserved3;
    union                  // the value itself; only one member is meaningful
    {
        LONG lVal;
        BYTE bVal;
        SHORT iVal;
        FLOAT fltVal;
        DOUBLE dblVal;
        VARIANT_BOOL boolVal;
        DATE date;
        BSTR bstrVal;
        SAFEARRAY *parray;
        VARIANT *pvarVal;
    };
};
This is a very simplified version of the full VARIANT definition to be found in your nearest copy of oaidl.h. I have no idea what the wReserved values mean, nor do I care.
What we're interested in are the vt value and the union. vt is the value type and the union is the value. You'll see that the union encompasses LONG, BYTE, SHORT, FLOAT and so on (there are a bucketload of 'em). vt tells us how to interpret the value, using the member names. In C++, you might do it like this:
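void SomeFunc(VARIANT& v)
{
    USES_CONVERSION; // required by the ATL W2A conversion macro
    if (v.vt == VT_I4)
        printf("variant value is %d\n", v.lVal);
    else if (v.vt == VT_BSTR)
        // A BSTR is a wide string; convert it to ANSI for the narrow printf
        printf("variant value is %s\n", W2A(v.bstrVal));
}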
This checks the vt member of the VARIANT. If it's a VT_I4, then the data we want is contained in the lVal member of the union. Since the lVal member is a LONG, we can use %d as the format spec in the printf call. If it's a VT_BSTR, then the data is a BSTR contained in the bstrVal member of the union.
Notice how VARIANTs use the BSTR datatype to pass string data. A BSTR is a length-prefixed wide string allocated by the COM allocator, so the standard Automation marshaler already knows how to copy one across a process boundary and you don't have to write any marshaling code of your own. There are many other datatypes (not discussed in this article) that need more work to cross a process boundary, but the passing of strings is so common that having a ready-made, marshalable string type is a nice optimisation.
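The tag-plus-union idea is easier to see without the Windows headers in the way. Here is a minimal, portable sketch of it. MiniType, MiniVariant, MakeLong, MakeString and Describe are names I made up for illustration; they are not part of any Microsoft API, and the real VARIANT carries many more types than this.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for VARTYPE values; not the real VT_* constants.
enum MiniType { MT_EMPTY, MT_I4, MT_STR };

// A toy tagged value: 'type' plays the role of vt, the union holds the data.
struct MiniVariant
{
    MiniType type;
    union
    {
        long        lVal;   // valid when type == MT_I4
        const char *strVal; // valid when type == MT_STR (not owned)
    };
};

MiniVariant MakeLong(long value)
{
    MiniVariant v;
    v.type = MT_I4;
    v.lVal = value;
    return v;
}

MiniVariant MakeString(const char *text)
{
    MiniVariant v;
    v.type = MT_STR;
    v.strVal = text;
    return v;
}

// Inspect the tag first, then read only the matching union member.
std::string Describe(const MiniVariant &v)
{
    switch (v.type)
    {
    case MT_I4:
        return "long: " + std::to_string(v.lVal);
    case MT_STR:
        return std::string("string: ") + v.strVal;
    default:
        return "empty";
    }
}
```

The receiver never has to guess: it reads the tag and then touches only the member the tag says is valid, which is exactly what SomeFunc() does with vt.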
Encapsulating a VARIANT in a Simple Class
Based on the code snippet we saw earlier, it might make sense to hide the dirty details of a VARIANT in a class. We might do it thus:
class CVariant : public VARIANT
{
public:
    CVariant();
    CVariant(int iValue);
    CVariant(LPCTSTR szValue);
    CString ToString() const;
    int ToInt() const;
};
where the implementation of, say, the CVariant(int iValue) overloaded constructor might look like this:
CVariant::CVariant(int iValue)
{
    vt = VT_I4;
    lVal = iValue;
}
and where the implementation of the ToString() function might look like this:
CString CVariant::ToString() const
{
    USES_CONVERSION;
    // W2CT converts into temporary stack storage, so return a CString
    // copy rather than a raw pointer that would dangle on return.
    if (VT_BSTR == vt)
        return CString(W2CT(bstrVal));
    return CString();
}
That simplifies the code a little by hiding the dirty details of figuring out the VARIANT type or converting its contents inside a method call on the object, but it's hardly enough to warrant a new class, let alone an article about it.
Encapsulating a VARIANT in a More Complex Class
The simple class I showed above is probably adequate for most casual VARIANT usage. It's certainly adequate for using the MSHTML control I alluded to in the introduction. It may not be sufficient for other environments. For example, some years ago, I wrote a whole bunch of software using the Microsoft Chat Protocol control, which seems to have been designed by a committee whose members only knew VB. Almost all data passed between the host and the control is passed as VARIANTs and some of those VARIANTs are arrays. A VARIANT represents an array using the SAFEARRAY structure.
The SAFEARRAY definition looks like this (this is the Win32 definition - it's a trifle different for WinCE).
typedef struct tagSAFEARRAY
{
USHORT cDims;
USHORT fFeatures;
ULONG cbElements;
ULONG cLocks;
PVOID pvData;
SAFEARRAYBOUND rgsabound[1];
} SAFEARRAY;
You're going to love the purpose of the SAFEARRAYBOUND member. It's a structure that specifies the number of elements in a dimension and that dimension's lower bound. This allows an index into a particular dimension of the SAFEARRAY to start at any arbitrary number rather than the 0 that we C/C++ programmers know and love. There's an array of these structures, one for each of the cDims dimensions.
So accessing a VARIANT array in C++ involves interpreting the contents of the VARIANT as a pointer to a SAFEARRAY, checking the number of indexes against cDims, validating each index against its entry in the rgsabound array (lower bound and element count), then indexing into pvData in steps of cbElements bytes. Phew, what a mouthful!
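For a single dimension, the bounds arithmetic boils down to a subtraction and a range check. The sketch below shows just that step in portable C++. Bound and SlotFromIndex are hypothetical names of my own; Bound merely imitates the two fields of SAFEARRAYBOUND, and the index passed in is assumed to be in the array's own base.

```cpp
#include <cassert>

// Hypothetical stand-in for one SAFEARRAYBOUND entry.
struct Bound
{
    long          lLbound;   // index of the first element
    unsigned long cElements; // number of elements in this dimension
};

// Translate an index expressed in the array's own base into a zero-based
// slot. Returns -1 when the index falls outside the dimension.
long SlotFromIndex(const Bound &b, long index)
{
    const long offset = index - b.lLbound;
    if (offset < 0 || static_cast<unsigned long>(offset) >= b.cElements)
        return -1;
    return offset;
}
```

With a lower bound of 1 and three elements, indexes 1 to 3 map to slots 0 to 2 and everything else is rejected. The byte address of the element would then be pvData plus the slot times cbElements.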
Suddenly, it's starting to look like maybe a class to encapsulate this stuff might be useful.
The Class Itself
Caveats
The class presented here does not cover all possibilities; not by a long chalk. What it does cover are the situations I've encountered using the Microsoft Chat Protocol control and the MSHTML control. I suspect the code within Visual Basic that handles all the possibilities of the VARIANT type is orders of magnitude more complex than the class presented here.
This class can handle simple VARIANTs with signed integer datatypes or strings. It can also handle 1 dimensional arrays where each element of the array is a VARIANT which can be any of the simple types handled by the class. If you want more, you can follow the code to see how to handle extra types. I've not needed types beyond those supported so I haven't written support for those types.
Ok, so that's the caveat out of the way. Here's the class header:
class CVariant : public VARIANT
{
public:
CVariant();
CVariant(bool bValue);
CVariant(int nValue);
CVariant(LPCTSTR szValue);
CVariant(VARIANT *pV);
CVariant(int lBound, int iElementCount);
~CVariant(void);
BOOL IsArray(int iElement = 0);
BOOL IsString(int iElement = 0);
BOOL IsInt(int iElement = 0);
BOOL IsBool(int iElement = 0);
VARIANT *operator&() { return this; }
VARIANT *ElementAt(int iElement = 0);
CString ToString(int iElement = 0);
int ToInt(int iElement = 0);
BOOL ToBool(int iElement = 0);
void Set(LPCTSTR szString, int iElement = 0);
void Set(int iValue, int iElement = 0);
void Set(bool bValue, int iElement = 0);
};
You've already seen the simple constructors. There are two other constructors. The first constructor lets you define an array. It takes the lower bound for an index, and a count of how many elements. The code looks like this:
CVariant::CVariant(int lBound, int iElementCount)
{
vt = VT_ARRAY | VT_VARIANT;
parray = new SAFEARRAY;
parray->cDims = 1;
parray->fFeatures = FADF_VARIANT | FADF_HAVEVARTYPE | FADF_FIXEDSIZE | FADF_STATIC;
parray->cbElements = sizeof(VARIANT);
parray->cLocks = 0;
parray->pvData = new VARIANT[iElementCount];
memset(parray->pvData, 0, sizeof(VARIANT) * iElementCount);
parray->rgsabound[0].lLbound = lBound;
parray->rgsabound[0].cElements = iElementCount;
}
From my description of the SAFEARRAY structure earlier, this should all be pretty clear. We only support 1 dimensional arrays so we set the various members of the newly created SAFEARRAY instance to reflect that fact. The new SAFEARRAY's rgsabound[0] structure is set with our lower bound and count variables. It's important to remember that the VARIANT we're creating may be used to interoperate with a module created in another language and we can't assume that indexes start at 0. Where you start your indexes depends on what you're interoperating with.
The fFeatures member needs some explanation. The flag values I used specify that the array contains VARIANTs of a fixed size and static (not created on the stack). I specify that it's static because if I need to allocate memory, I do it from the heap.
The other constructor lets you take an existing VARIANT (passed perhaps to an event handler for some foreign object you're hosting) and attach it to a CVariant. The code looks like this:
CVariant::CVariant(VARIANT *pV)
{
ASSERT(pV);
ASSERT(AfxIsValidAddress(pV, sizeof(VARIANT), TRUE));
vt = VT_VARIANT;
pvarVal = pV;
}
If it's a debug build, we do some asserts to be sure that it's a pointer to a block of valid memory at least large enough to actually contain a VARIANT. There's not much more runtime validation we can do. Once we're sure it's something that could be a VARIANT, we assign the pointer to the pvarVal member and set the type to VT_VARIANT. Once that's done, we can use any of the other member functions on the VARIANT as though we'd created it ourselves.
Warning Warning Warning
Now listen up. Never ever use the CVariant::CVariant(VARIANT *pV) constructor to attempt to preserve a VARIANT across a function boundary. The only reason you'd use this constructor is to put the class wrapper around a VARIANT you got from somewhere else. I don't want to say the only way you'd get such a VARIANT is from an event but I'd put it at being asymptotically close to 100% of the time. This is why there's no Attach function. The Attach idiom is a temptation to try and preserve something across function boundaries. It works for objects that are going to be around for a long time, such as window handles, but it doesn't work for things like VARIANTs that are created on the fly to communicate with some other module (such as yours).
Note well that there is no attempt at a copy constructor. Life is way too short to try and write such a beast. Think about it. Your code would have to cope with every possible variation and do deep copies of arrays within arrays within arrays.
VARIANT Attributes
Once we've created our CVariant by whatever method, we use it. You wouldn't use a VARIANT to communicate from one function of your program to another function in the same program. You probably wouldn't want to use it across a DLL boundary either. There's too much overhead to make a VARIANT an attractive proposition. So it's almost a given that you're communicating with something you didn't write yourself. Thus, there are a few functions you can call to check the datatype of something that's been passed to you from the something you didn't write.
BOOL IsArray(int iElement = 0);
BOOL IsString(int iElement = 0);
BOOL IsInt(int iElement = 0);
BOOL IsBool(int iElement = 0);
These IsSomething() functions mirror the datatypes the class supports. If you're not sure about the type of a particular VARIANT, use these functions to determine if some operation you're about to perform has any chance of succeeding.
Why don't I encourage access to the vt member via an explicit member function? Glad you asked. Access to that member would return the exact type. Why is that bad? It's bad because you then have to allow for all the myriad options. It could be VT_USERDEFINED or VT_BLOB_OBJECT or VT_DISPATCH. Since the class doesn't handle those types, you can do nothing useful with the information. Much better, in my opinion, to ask the class: are you a string? Or are you an integer? If the answer is yes, then you can proceed to perform meaningful operations. If not, you do whatever error handling is appropriate.
Of course, there's nothing stopping you from accessing the vt member explicitly, but if you do you're on your own.
VARIANT Access
Once you've determined the data type, you call the appropriate accessor. The accessors are used for both simple VARIANT access and for array access and take a parameter which defaults to zero. The accessors figure out for themselves whether you've got an array or not and do the right thing depending on the exact contents of the VARIANT.
What's the Index Base for Arrays?
Since these are C++ wrappers for VARIANT and SAFEARRAY operations, they treat arrays as being OPTION BASE 0. Internally, they need not be (they could have come from VB, for example, with OPTION BASE 1 set), but the functions correct for the OPTION BASE internally.
The accessors use the ElementAt() helper function to access the data requested and then apply the appropriate data conversion based on the datatype. The ElementAt() function looks like this:
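VARIANT *CVariant::ElementAt(int iElement)
{
    // Wrapping a VARIANT obtained from elsewhere: hand back that VARIANT.
    if (vt == VT_VARIANT)
        return pvarVal;
    // Not an array: the value lives in this object itself.
    if (!(vt & VT_ARRAY))
        return this;
    // Array: the SAFEARRAY pointer is our own parray member.
    int offset = iElement - parray->rgsabound[0].lLbound;
    // Valid offsets run from 0 to cElements - 1.
    if (offset >= 0 && offset < int(parray->rgsabound[0].cElements))
        return &((VARIANT *) parray->pvData)[offset];
    else
        return (VARIANT *) NULL;
}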
You can see what I was talking about earlier. If the VARIANT is wrapping a VARIANT obtained from somewhere else, we return that VARIANT. If the VARIANT isn't an array, we return a pointer to ourselves (remember the class is derived from the VARIANT structure and has no vtable so this is equivalent to a pointer to the base VARIANT structure).
Otherwise, we have an array so we calculate an offset into the SAFEARRAY taking into account the lower bound stored in the rgsabound structure. Then we check that the offset is greater than or equal to 0 and less than the number of elements in the array and if it is, we return a pointer to the SAFEARRAY element. If you've specified an index that's invalid, you get back a NULL pointer.
The actual accessor looks like this:
CString CVariant::ToString(int iElement)
{
USES_CONVERSION;
VARIANT *v = ElementAt(iElement);
if (v != (VARIANT *) NULL && AfxIsValidAddress(v, sizeof(VARIANT), FALSE) && v->vt == VT_BSTR)
return W2A(v->bstrVal);
return _T("");
}
Pretty simple. The other accessors work in much the same way.
Notice that the bool overloads use the lowercase bool datatype, not the typedef'd BOOL. This is necessary to distinguish between the int and bool overloads. We need the different overloads so we can in fact create a VARIANT with the VT_BOOL type.
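The reason is that a typedef introduces a new name, not a new type. The sketch below illustrates the point with a stand-in typedef so that it compiles anywhere. BOOL_LIKE and Kind are hypothetical names for illustration; BOOL_LIKE assumes, as in the Windows headers, that BOOL is a typedef for int.

```cpp
#include <cassert>
#include <string>

// Assumption: mirrors the Windows typedef, where BOOL is another name for int.
typedef int BOOL_LIKE;

// Because BOOL_LIKE *is* int, declaring both Kind(int) and Kind(BOOL_LIKE)
// would be a redefinition error. The built-in bool is a distinct type,
// so it can carry its own overload.
std::string Kind(int)  { return "int"; }
std::string Kind(bool) { return "bool"; }
```

Calling Kind(true) picks the bool overload, while Kind(5) and a value declared as BOOL_LIKE both pick the int overload. That is why a BOOL argument could never select a VT_BOOL constructor on its own.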
Type Coercion
The class presented above doesn't do any 'type coercion' (nor does the class implementation in the download). This is by design. While I accept the idea of 'coercion', I don't think it's appropriate in the C++ environment. I'd much rather know at runtime that an error occurred than have it masked by 'helpful' class design. The class does play safe by returning a zero if the VARIANT isn't in fact numeric data of the expected kind or an empty string if it's not a string VARIANT, but it doesn't do type coercion. However, if you wanted to implement type coercion, you might do it like this.
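CString CVariant::ToString() const
{
    USES_CONVERSION;
    CString csTemp; // stays empty for types not handled below
    switch (vt)
    {
    case VT_BSTR:
        csTemp = W2CT(bstrVal);
        break;
    case VT_I4:
        csTemp.Format(_T("%d"), lVal);
        break;
    case VT_I2:
        csTemp.Format(_T("%d"), iVal);
        break;
    }
    // Returned by value, so the caller never sees a dangling pointer.
    return csTemp;
}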
This little snippet returns a converted string if the VARIANT does indeed contain a string. Otherwise, it attempts to convert numeric data into a string representation and returns that. Finally, if the VARIANT type isn't covered by the switch statement, it returns an empty string.
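If you'd like to experiment with the coercion idea away from MFC, here is a portable sketch of the same switch. Tag, Value and ToStringCoerced are hypothetical names of mine, and std::string stands in for both BSTR and CString; it is an illustration of the technique, not the code from the download.

```cpp
#include <cassert>
#include <string>

// Hypothetical type tags standing in for VT_BSTR, VT_I4 and VT_I2.
enum Tag { TAG_EMPTY, TAG_STR, TAG_I4, TAG_I2 };

struct Value
{
    Tag         tag;
    std::string text; // used when tag == TAG_STR
    long        lVal; // used when tag == TAG_I4
    short       iVal; // used when tag == TAG_I2
};

// Coerce any supported value to a string; unsupported tags yield "".
std::string ToStringCoerced(const Value &v)
{
    switch (v.tag)
    {
    case TAG_STR: return v.text;
    case TAG_I4:  return std::to_string(v.lVal);
    case TAG_I2:  return std::to_string(v.iVal);
    default:      return std::string();
    }
}
```

Note what the caller loses: an empty result no longer tells you whether the value was an empty string or a type the function doesn't handle, which is the masking effect I complained about above.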
History
- 19th March, 2004 - Initial version
- 20th March, 2004 - Added bool overloads
- 28th March, 2004 - Fixed a bug in the ElementAt() function