
A Simple Class to Encapsulate VARIANTs

Rob Manderson


27 Mar 2004

Using Variants in your C++ code

Download source files - 2.1 KB

Introduction

I've been doing some work lately with the MSHTML control, which takes most of its arguments as VARIANTs. I can see you shuddering already. What's a self respecting C++ programmer doing dirtying his hands with VB/Scripting language datatypes? Well it's the lesser of two evils. Either I can learn about VARIANTs or I can write my own HTML parser and editor. Which do you think is easier? I knew you'd agree :)

Truth be told, I think the VARIANT concept is actually pretty cool. Wrap your data up in a nice little package with a type descriptor or two, throw it across a function call boundary and let the other side figure it out. If done right, it can solve a lot of otherwise nasty problems. I just wish they were easier to work with!

So that's the apology out of the way. Let's look at what a VARIANT is.

Why a VARIANT?

In contrast to a strongly typed language like C++, Visual Basic and many scripting languages are weakly typed. What this means is that in a strongly typed language, you must pass the exact types of arguments to a function that the function was written to accept. If the function expects a pointer to a string, you can't pass it an integer. Try to do so and you'll get a compile time error.

Weakly typed languages allow you to pass arguments that don't match the types expected. So the question should arise - if you can pass the wrong argument type to a function, how does the language respond? Most weakly typed languages 'coerce' the value that was passed into the expected type. What does this mean? It means that the language runtime will try and convert the data that was passed into the correct data type. For example, if you were to pass an integer to a function that expected to see a string, the most natural 'coercion' is to convert the integer into a string representation. Pass a date where a string is expected and the natural 'coercion' is to convert it to a string representation.

As C++ programmers, we're already used to coercion on a small scale - we're used to the idea that the compiler can do promotions from short to int and so on. Weakly typed languages just take it a step or two further.
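As a small illustration of the compiler-driven conversions mentioned above, the sketch below passes a short where an int is expected and an int where a double is expected. The function names are made up for this example, and the last helper shows that C++ makes you ask for an integer-to-string conversion explicitly.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers, named only for this illustration.
int Twice(int value) { return value * 2; }
double Half(double value) { return value / 2.0; }

// There is no implicit int-to-string conversion in C++, so the kind of
// coercion a weakly typed language performs has to be requested by name.
std::string AsText(int value) { return std::to_string(value); }

void DemoConversions()
{
    short s = 21;
    assert(Twice(s) == 42);      // short is promoted to int
    int i = 5;
    assert(Half(i) == 2.5);      // int is converted to double
    assert(AsText(i) == "5");    // explicit conversion to a string
}
```

A weakly typed language would perform the equivalent of AsText() on its own whenever an integer turned up where a string was expected.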

So what has this to do with VARIANTs?

Imagine you're designing your own programming language. You know the kinds of datatypes you want to support. You know the kinds of intrinsic operators you want. You can design your compiler to keep track of the datatype of everything in your program, so that when the programmer passes the wrong datatype to a function, your compiler knows it and can insert the necessary code to convert the data.

Now imagine you're required to not only support your language but another language (say C++). You have complete control over your own language but no control whatsoever over the second language. Yet you want to be able to interoperate with that language. Since it's you who wants to interoperate with something you cannot change, it's up to you to adapt to the 'something you cannot change'. So you design your datatypes in such a way that they contain sufficient information over and above the data they encapsulate to allow anyone else to decipher their contents.

Enter the VARIANT

A VARIANT is a not-so-exotic way of solving this problem. Simplified, a VARIANT looks like this:

struct tagVARIANT
{
    VARTYPE vt;                 //  Says which union member is valid
    WORD wReserved1;
    WORD wReserved2;
    WORD wReserved3;
    union 
    {
        LONG lVal;
        BYTE bVal;
        SHORT iVal;
        FLOAT fltVal;
        DOUBLE dblVal;
        VARIANT_BOOL boolVal;
        DATE date;
        BSTR bstrVal;
        SAFEARRAY *parray;
        VARIANT *pvarVal;
    };
};

This is a very simplified version of the full VARIANT definition to be found in your nearest copy of oaidl.h. I have no idea what the wReserved values mean, nor do I care.

What we're interested in are the vt values and the union. vt is the valuetype and the union is the value. You'll see that the union encompasses LONG, BYTE, SHORT, FLOAT and so on (there are a bucketload of em). vt tells us how to interpret the value, using the member names. In C++, you might do it like this:

void SomeFunc(VARIANT& v)
{
    USES_CONVERSION;

    //  printf takes narrow strings, so the format isn't wrapped in _T()
    if (v.vt == VT_I4)
        printf("variant value is %ld\n", v.lVal);
    else if (v.vt == VT_BSTR)
        printf("variant value is %s\n", W2A(v.bstrVal));
}

This checks the vt member of the VARIANT. If it's a VT_I4, then the data we want is contained in the lVal member of the union. Since the lVal member is a LONG, we can use %ld as the format spec in the printf call. If it's a VT_BSTR, then the data is a BSTR contained in the bstrVal member of the union, which W2A converts to a narrow string for printf.
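The same check-the-tag-then-read-the-union pattern can be tried without the Windows headers. The sketch below is only an illustration: MiniType, MiniVariant and the helper functions are hypothetical stand-ins, and a plain char pointer takes the place of a BSTR.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Stand-ins for VARTYPE and VARIANT. The names and values are assumptions
// made for this sketch; they are not the real oaidl.h definitions.
enum MiniType { MT_EMPTY, MT_I4, MT_STR };

struct MiniVariant
{
    MiniType vt;               // says which union member is valid
    union
    {
        long lVal;             // valid when vt == MT_I4
        const char *strVal;    // valid when vt == MT_STR
    };
};

MiniVariant MakeLong(long value)
{
    MiniVariant v;
    v.vt = MT_I4;
    v.lVal = value;
    return v;
}

MiniVariant MakeString(const char *text)
{
    MiniVariant v;
    v.vt = MT_STR;
    v.strVal = text;
    return v;
}

// Read the tag first, then touch only the matching union member.
std::string Describe(const MiniVariant &v)
{
    char buffer[64];
    switch (v.vt)
    {
    case MT_I4:
        std::snprintf(buffer, sizeof(buffer), "long %ld", v.lVal);
        return buffer;
    case MT_STR:
        return std::string("string ") + v.strVal;
    default:
        return "unsupported type";
    }
}

void DemoDescribe()
{
    assert(Describe(MakeLong(42)) == "long 42");
    assert(Describe(MakeString("hello")) == "string hello");
}
```

Reading a union member that doesn't match the tag is the classic mistake with this kind of structure, which is why every access goes through the switch.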

Notice how VARIANTs use the BSTR datatype to pass string data. A BSTR is a length-prefixed wide character string allocated by the COM allocator, so COM always knows how long the string is and can copy it across a process boundary with its standard marshaling; no custom marshaling code is needed. There are many other datatypes (not discussed in this article) which need more work to cross a process boundary, but the passing of strings is so common that having one well-defined string type is a nice optimisation.

Encapsulating a VARIANT in a Simple Class

Based on the code snippet we saw earlier, it might make sense to hide the dirty details of a VARIANT in a class. We might do it thus:

class CVariant : public VARIANT
{
public:
                CVariant();
                CVariant(int iValue);
                CVariant(LPCTSTR szValue);

    CString     ToString() const;
    int         ToInt() const;
};

where the implementation of, say, the CVariant(int iValue) overloaded constructor might look like this:

CVariant::CVariant(int iValue)
{
    vt = VT_I4;
    lVal = iValue;
}

and where the implementation of the ToString() function might look like this:

CString CVariant::ToString() const
{
    USES_CONVERSION;

    //  W2A converts into stack memory that vanishes when we return,
    //  so the result is copied into a CString rather than returned
    //  as a raw pointer
    if (VT_BSTR == vt)
        return W2A(bstrVal);

    //  It's not a string so return an empty string
    return _T("");
}

That simplifies the code a little by hiding the dirty details of figuring out the VARIANT type or converting its contents inside a method call on the object but it's hardly enough to warrant a new class let alone an article about it.

Encapsulating a VARIANT in a More Complex Class

The simple class I showed above is probably adequate for most casual VARIANT usage. It's certainly adequate for using the MSHTML control I alluded to in the introduction. It may not be sufficient for other environments. For example, some years ago, I wrote a whole bunch of software using the Microsoft Chat Protocol control, which seems to have been designed by a committee whose members only knew VB. Almost all data passed between the host and the control is passed as VARIANTs and some of those VARIANTs are arrays. A VARIANT represents an array using the SAFEARRAY structure.

The SAFEARRAY definition looks like this (this is the Win32 definition - it's a trifle different for WinCE).

typedef struct tagSAFEARRAY
{
    USHORT cDims;     // How many dimensions in this array
    USHORT fFeatures; // Allocation control flags
    ULONG cbElements; // The size of each array element
    ULONG cLocks;     // Array lock count.
    PVOID pvData;     // Points at the data in the array
    SAFEARRAYBOUND rgsabound[1];
} SAFEARRAY;

You're going to love the purpose of the SAFEARRAYBOUND member. It's a structure that specifies the number of elements in this dimension and the lower bound. This allows an index into a particular dimension of the SAFEARRAY to start at any arbitrary number rather than the 0 that we C/C++ programmers know and love. There's an array of these structures, one for each dimension (cDims of them in all).

So accessing a VARIANT array in C++ involves interpreting the contents of the VARIANT as a pointer to a SAFEARRAY, checking cDims to see how many dimensions there are, validating each index against that dimension's entry in the rgsabound array, subtracting the lower bound, then indexing into pvData in steps of cbElements. Phew, what a mouthful!
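To make those steps concrete, here is a minimal sketch of the address calculation for a one-dimensional array. MiniBound and MiniArray are hypothetical, cut-down stand-ins for SAFEARRAYBOUND and SAFEARRAY (the flag and lock members are left out), and ElementAddress is a name invented for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, cut-down stand-ins for SAFEARRAYBOUND and SAFEARRAY.
struct MiniBound
{
    unsigned long cElements;  // number of elements in this dimension
    long lLbound;             // index of the first element
};

struct MiniArray
{
    unsigned short cDims;     // number of dimensions (1 in this sketch)
    unsigned long cbElements; // size of one element in bytes
    void *pvData;             // start of the element storage
    MiniBound rgsabound[1];
};

// Returns a pointer to the element at 'index', or nullptr if the array
// is not one-dimensional or the index is out of range.
void *ElementAddress(const MiniArray &a, long index)
{
    if (a.cDims != 1)
        return nullptr;

    // Convert the caller's index into a zero-based offset.
    long offset = index - a.rgsabound[0].lLbound;
    if (offset < 0 || static_cast<unsigned long>(offset) >= a.rgsabound[0].cElements)
        return nullptr;

    // Step through the raw storage one element size at a time.
    return static_cast<char *>(a.pvData) + static_cast<std::size_t>(offset) * a.cbElements;
}

void DemoOneBasedArray()
{
    int data[3] = {10, 20, 30};
    MiniArray a;
    a.cDims = 1;
    a.cbElements = sizeof(int);
    a.pvData = data;
    a.rgsabound[0].cElements = 3;
    a.rgsabound[0].lLbound = 1;   // indexes run 1..3, as with OPTION BASE 1

    assert(*static_cast<int *>(ElementAddress(a, 1)) == 10);
    assert(*static_cast<int *>(ElementAddress(a, 3)) == 30);
    assert(ElementAddress(a, 0) == nullptr);
    assert(ElementAddress(a, 4) == nullptr);
}
```

With a lower bound of 1, index 1 maps to the first element, while indexes 0 and 4 both fall outside the array.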

Suddenly, it's starting to look like maybe a class to encapsulate this stuff might be useful.

The Class Itself

Caveats

The class presented here does not cover all possibilities; not by a long chalk. What it does cover are the situations I've encountered using the Microsoft Chat Protocol control and the MSHTML control. I suspect the code within Visual Basic that handles all the possibilities of the VARIANT type is orders of magnitude more complex than the class presented here.

This class can handle simple VARIANTs with signed integer datatypes, booleans or strings. It can also handle one-dimensional arrays where each element of the array is a VARIANT which can be any of the simple types handled by the class. If you want more, you can follow the code to see how to handle extra types. I've not needed types beyond those supported so I haven't written support for those types.

Ok, so that's the caveat out of the way. Here's the class header:

class CVariant : public VARIANT
{
public:
                    CVariant();
                    CVariant(bool bValue);
                    CVariant(int nValue);
                    CVariant(LPCTSTR szValue);
                    CVariant(VARIANT *pV);
                    CVariant(int lBound, int iElementCount);
                    ~CVariant(void);

//  Attributes
    BOOL            IsArray(int iElement = 0);
    BOOL            IsString(int iElement = 0);
    BOOL            IsInt(int iElement = 0);
    BOOL            IsBool(int iElement = 0);

//  Conversions
    VARIANT         *operator&()        { return this; }

//  Get operations
    VARIANT         *ElementAt(int iElement = 0);

    CString         ToString(int iElement = 0);
    int             ToInt(int iElement = 0);
    BOOL            ToBool(int iElement = 0);

//  Set operations
    void            Set(LPCTSTR szString, int iElement = 0);
    void            Set(int iValue, int iElement = 0);
    void            Set(bool bValue, int iElement = 0);
};

You've already seen the simple constructors. There are two other constructors. The first constructor lets you define an array. It takes the lower bound for an index, and a count of how many elements. The code looks like this:

CVariant::CVariant(int lBound, int iElementCount)
{
    //  Set the type to an array of variants...
    vt = VT_ARRAY | VT_VARIANT;
    parray = new SAFEARRAY;

    //  We only support 1 dimensional arrays..
    parray->cDims = 1;
    parray->fFeatures = FADF_VARIANT | FADF_HAVEVARTYPE | FADF_FIXEDSIZE | FADF_STATIC;
    parray->cbElements = sizeof(VARIANT);
    parray->cLocks = 0;

    //  Allocate the array of variants we point to...
    parray->pvData = new VARIANT[iElementCount];
    memset(parray->pvData, 0, sizeof(VARIANT) * iElementCount);
    parray->rgsabound[0].lLbound = lBound;
    parray->rgsabound[0].cElements = iElementCount;
}

From my description of the SAFEARRAY structure earlier, this should all be pretty clear. We only support one-dimensional arrays so we set the various members of the newly created SAFEARRAY instance to reflect that fact. The new SAFEARRAY's rgsabound[0] structure is set with our lower bound and count variables. It's important to remember that the VARIANT we're creating may be used to interoperate with a module created in another language and we can't assume that indexes start at 0. Where you start your indexes depends on what you're interoperating with.

The fFeatures member needs some explanation. The flag values I used specify that the array contains VARIANTs of a fixed size and static (not created on the stack). I specify that it's static because if I need to allocate memory, I do it from the heap.

The other constructor lets you take an existing VARIANT (passed perhaps to an event handler for some foreign object you're hosting) and attach it to a CVariant. The code looks like this:

CVariant::CVariant(VARIANT *pV)
{
    //  Validate the input (and make sure it's writeable)
    ASSERT(pV);
    ASSERT(AfxIsValidAddress(pV, sizeof(VARIANT), TRUE));

    vt = VT_VARIANT;
    pvarVal = pV;
}

If it's a debug build, we do some asserts to be sure that it's a pointer to a block of valid memory at least large enough to actually contain a VARIANT. There's not much more runtime validation we can do. Once we're sure it's something that could be a VARIANT, we assign the pointer to the pvarVal member and set the type to VT_VARIANT. Once that's done, we can use any of the other member functions on the VARIANT as though we'd created it ourselves.

Warning Warning Warning

Now listen up. Never ever use the CVariant::CVariant(VARIANT *pV) constructor to attempt to preserve a VARIANT across a function boundary. The class stores only the pointer, not a copy, so the wrapper is valid only for as long as the original VARIANT is. The only reason you'd use this constructor is to put the class wrapper around a VARIANT you got from somewhere else. I don't want to say the only way you'd get such a VARIANT is from an event but I'd put it at being asymptotically close to 100% of the time. This is why there's no Attach function. The Attach idiom is a temptation to try and preserve something across function boundaries. It works for objects that are going to be around for a long time, such as window handles, but it doesn't work for things like VARIANTs that are created on the fly to communicate with some other module (such as yours).

Note well that there is no attempt at a copy constructor. Life is way too short to try and write such a beast. Think about it. Your code would have to cope with every possible variation and do deep copies of arrays within arrays within arrays.
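To give a feel for why a general copy is so much work, here is a deliberately tiny sketch of a recursive deep copy. Node is a hypothetical type with just two cases, a number or an owned array of Nodes; a real VARIANT copy would need a case for every type it can hold.

```cpp
#include <cassert>
#include <cstddef>

// A hypothetical value type: either a number or an owned array of values.
struct Node
{
    bool isArray = false;
    long number = 0;          // valid when !isArray
    Node *items = nullptr;    // owned storage, valid when isArray
    std::size_t count = 0;
};

// Recursively duplicates 'src', allocating fresh storage at every level.
Node DeepCopy(const Node &src)
{
    Node dst;
    dst.isArray = src.isArray;
    dst.number = src.number;
    if (src.isArray)
    {
        dst.count = src.count;
        dst.items = new Node[src.count];
        for (std::size_t i = 0; i < src.count; ++i)
            dst.items[i] = DeepCopy(src.items[i]);
    }
    return dst;
}

// Releases everything a Node owns, innermost arrays first.
void Destroy(Node &n)
{
    for (std::size_t i = 0; i < n.count; ++i)
        Destroy(n.items[i]);
    delete[] n.items;
    n.items = nullptr;
    n.count = 0;
    n.isArray = false;
}

void DemoDeepCopy()
{
    // Build an array that contains an array that contains the number 7.
    Node inner;
    inner.isArray = true;
    inner.count = 1;
    inner.items = new Node[1];
    inner.items[0].number = 7;

    Node outer;
    outer.isArray = true;
    outer.count = 1;
    outer.items = new Node[1];
    outer.items[0] = inner;   // outer now owns inner's storage

    Node copy = DeepCopy(outer);
    outer.items[0].items[0].number = 99;          // change the original...
    assert(copy.items[0].items[0].number == 7);   // ...the copy is unaffected

    Destroy(copy);
    Destroy(outer);
}
```

Even with only two cases the copy needs a matching cleanup routine and careful thought about who owns what, which is the effort the class chooses to avoid.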

VARIANT Attributes

Once we've created our CVariant by whatever method, we use it. You wouldn't use a VARIANT to communicate from one function of your program to another function in the same program. You probably wouldn't want to use it across a DLL boundary either. There's too much overhead to make a VARIANT an attractive proposition. So it's almost a given that you're communicating with something you didn't write yourself. Thus, there are a few functions you can call to check the datatype of something that's been passed to you from the something you didn't write.

//  Attributes
BOOL            IsArray(int iElement = 0);
BOOL            IsString(int iElement = 0);
BOOL            IsInt(int iElement = 0);
BOOL            IsBool(int iElement = 0);

These IsAsomething() functions mirror the datatypes the class supports. If you're not sure about the type of a particular VARIANT, use these functions to determine if some operation you're about to perform has any chance of succeeding.

Why don't I encourage access to the vt member via an explicit member function? Glad you asked. Access to that member would return the exact type. Why is that bad? It's bad because you then have to allow for all the myriad options. It could be VT_USERDEFINED or VT_BLOB_OBJECT or VT_DISPATCH. Since the class doesn't handle those types, you can do nothing useful with the information. Much better, in my opinion, to ask the class, are you a string? Or are you an integer? If the answer is yes, then you can proceed to perform meaningful operations. If not, you do whatever error handling is appropriate.

Of course, there's nothing stopping you from accessing the vt member explicitly, but if you do you're on your own.
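The difference between asking "what exactly are you?" and "are you a string?" can be sketched in a few lines. Everything here is hypothetical: the tag names are invented, and treating both a 16-bit and a 32-bit tag as "an integer" is an assumption about how such a predicate might be written, not a description of the downloadable class.

```cpp
#include <cassert>
#include <string>

// Hypothetical type tags; the real VARTYPE list is far longer.
enum MiniType { MT_EMPTY, MT_I2, MT_I4, MT_STR, MT_DISPATCH };

struct MiniVariant
{
    MiniType vt;   // payload omitted, only the tag matters here
};

// Each predicate folds every tag the wrapper can handle for that kind
// of data into a single yes/no answer.
bool IsInt(const MiniVariant &v)    { return v.vt == MT_I2 || v.vt == MT_I4; }
bool IsString(const MiniVariant &v) { return v.vt == MT_STR; }

// The caller never inspects vt; anything unrecognised takes the error path.
std::string Classify(const MiniVariant &v)
{
    if (IsString(v)) return "string";
    if (IsInt(v))    return "integer";
    return "unsupported";
}

void DemoClassify()
{
    MiniVariant a = { MT_I2 };
    MiniVariant b = { MT_DISPATCH };
    assert(Classify(a) == "integer");
    assert(Classify(b) == "unsupported");
}
```

Adding a new tag to the enum doesn't break Classify(); the new type simply lands in the "unsupported" branch until a predicate is taught about it.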

VARIANT Access

Once you've determined the data type, you call the appropriate accessor. The accessors are used for both simple VARIANT access and for array access and take a parameter which defaults to zero. The accessors figure out for themselves whether you've got an array or not and do the right thing depending on the exact contents of the VARIANT.

What's the Index Base for Arrays?

Since these are C++ wrappers for VARIANT and SAFEARRAY operations, they treat arrays as being OPTION BASE 0. Internally, they need not be (they could have come from VB, for example, with OPTION BASE 1 set), but the functions correct for the OPTION BASE internally.

The accessors use the ElementAt() helper function to access the data requested and then apply the appropriate data conversion based on the datatype. The ElementAt() function looks like this:

VARIANT *CVariant::ElementAt(int iElement)
{
    if (vt == VT_VARIANT)
        //  It's a pointer to an external VARIANT
        //  so return that variant
        return pvarVal;

    if (!(vt & VT_ARRAY))
        //  It's not an array so return ourselves
        return this;

    //  We are the array, so the SAFEARRAY is our own parray member.
    //  Calculate our element offset
    int offset = iElement - parray->rgsabound[0].lLbound;

    //  Offset must be zero or greater and less than the bounds
    if (offset >= 0 && offset < int(parray->rgsabound[0].cElements))
        return &((VARIANT *) parray->pvData)[offset];
    else
        return (VARIANT *) NULL;
}

You can see what I was talking about earlier. If the VARIANT is wrapping a VARIANT obtained from somewhere else, we return that VARIANT. If the VARIANT isn't an array, we return a pointer to ourselves (remember the class is derived from the VARIANT structure and has no vtable so this is equivalent to a pointer to the base VARIANT structure).

Otherwise, we have an array so we calculate an offset into the SAFEARRAY taking into account the lower bound stored in the rgsabound structure. Then we check that the offset is greater than or equal to 0 and less than the number of elements in the array and if it is, we return a pointer to the SAFEARRAY element. If you've specified an index that's invalid, you get back a NULL pointer.

The actual accessor looks like this:

CString CVariant::ToString(int iElement)
{
    USES_CONVERSION;

    //  Get the VARIANT at the iElement offset
    VARIANT *v = ElementAt(iElement);

    //  Must be a valid pointer and must be valid readable memory
    if (v != (VARIANT *) NULL && AfxIsValidAddress(v, sizeof(VARIANT), FALSE) && v->vt == VT_BSTR)
        return W2A(v->bstrVal);

    return _T("");
}

Pretty simple. The other accessors work in much the same way.

Notice that the bool overloads use the lowercase bool datatype, not the typedef'd BOOL. This is necessary to distinguish between the int and bool overloads. We need the different overloads so we can in fact create a VARIANT with the VT_BOOL type.
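The reason is that BOOL is only a typedef for int, so it can't take part in overload resolution as a separate type. The sketch below shows the effect in portable C++; BOOL_T stands in for the Windows BOOL typedef and Kind() is a made-up function.

```cpp
#include <cassert>
#include <string>

typedef int BOOL_T;   // stand-in for the Windows BOOL typedef

std::string Kind(int)  { return "int"; }
std::string Kind(bool) { return "bool"; }
// Adding std::string Kind(BOOL_T) here would not compile: a typedef is
// not a distinct type, so it would redefine Kind(int).

void DemoOverloads()
{
    BOOL_T flag = 1;
    assert(Kind(flag) == "int");   // a BOOL-style value picks the int overload
    assert(Kind(true) == "bool");  // only a genuine bool picks the bool overload
    assert(Kind(42) == "int");
}
```

So a caller who wants a boolean VARIANT has to pass true or false (or a bool variable); passing TRUE or FALSE would select the int overload.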

Type Coercion

The class presented above doesn't do any 'type coercion' (nor does the class implementation in the download). This is by design. While I accept the idea of 'coercion', I don't think it's appropriate in the C++ environment. I'd much rather know at runtime that an error occurred than have it masked by 'helpful' class design. The class does play safe by returning a zero if the VARIANT isn't in fact numeric data of the expected kind or an empty string if it's not a string VARIANT but it doesn't do type coercion. However, if you wanted to implement type coercion, you might do it like this.

CString CVariant::ToString() const
{
    USES_CONVERSION;
    CString csTemp;

    switch (vt)
    {
    case VT_BSTR:
        return W2A(bstrVal);

    //  It's not a string, maybe a number?
    case VT_I4:
        csTemp.Format(_T("%d"), lVal);
        break;

    case VT_I2:
        csTemp.Format(_T("%d"), iVal);
        break;

    //  and so forth...
    }

    //  Returned by value; a pointer into the local csTemp would dangle
    return csTemp;
}

This little snippet returns a converted string if the VARIANT does indeed contain a string. Otherwise, it attempts to convert numeric data into a string representation and returns that. Finally, if the VARIANT type isn't covered by the switch statement, it returns an empty string.

History

19th March, 2004 - Initial version
20th March, 2004 - Added bool overloads
28th March, 2004 - Fixed a bug in the ElementAt() function

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
