Introduction
In this article I will try to demonstrate why none of the existing BSTR wrappers is appropriate for
managing BSTR objects with binary contents, and why a specialized wrapper is therefore necessary.
I have implemented one for this purpose.
Why another BSTR wrapper?
Recently I was involved in developing some ATL components for encryption/decryption operations. Since
the results of encryption operations are not restricted to a given set of characters, as text strings are,
special methods for managing strings of binary data are needed. The BSTR data structure seems well suited
to such operations: it can contain any characters, and its length is stored in a prefix before the string
data rather than being determined by a 0 terminator. For example, you can build a string of a given length like this:
BSTR bstr = ::SysAllocStringLen(L"ABCB\0DEFG", 9);
This string contains a 0 in the middle. The problem with this data structure is that it needs a
wrapper in order to avoid some programming mistakes that are otherwise easy to make. To explain what
can go wrong, I will first introduce a simple COM component for converting data from binary strings to
hexadecimal strings and back. It is a subset of a real, more complex component (the code given here
is enough for somebody willing to build the actual hex converter component, but that is not the purpose of
this article). The IDL interface of this component looks something like this (not all code details are given):
interface IHexObj : IDispatch
{
    [id(10), helpstring("String conversion from Binary to Hexa")]
    HRESULT Binary2Hex([in]BSTR* pbsBin, [out]BSTR* pbsHex);
    [id(11), helpstring("String conversion from Hexa to Binary")]
    HRESULT Hex2Binary([in]BSTR* pbsHex, [out]BSTR* pbsBin);
};
and the implementation is something like this (not all code details are given):
static char szError1[] = "HexCom ERROR: WideCharToMultiByte() conversion Error!";
static char szError2[] = "HexCom ERROR: MultiByteToWideChar() conversion Error!";
static char szError3[] = "HexCom ERROR in Hex2Binary(): input is not a Hex string!";

// Writes the two uppercase hex digits of ch, plus a terminating 0, into szHex.
void Char2Hex(unsigned char ch, char* szHex)
{
    static unsigned char saucHex[] = "0123456789ABCDEF";
    szHex[0] = saucHex[ch >> 4];
    szHex[1] = saucHex[ch & 0xF];
    szHex[2] = 0;
}

// Converts the two uppercase hex digits at szHex into one byte.
// Returns false if either character is not a hex digit.
bool Hex2Char(char const* szHex, unsigned char& rch)
{
    rch = 0;
    for(int i = 0; i < 2; i++)
    {
        char c = szHex[i];
        unsigned char ucDigit;
        if(c >= '0' && c <= '9')
            ucDigit = static_cast<unsigned char>(c - '0');
        else if(c >= 'A' && c <= 'F')
            ucDigit = static_cast<unsigned char>(c - 'A' + 10);
        else
            return false;
        rch = static_cast<unsigned char>((rch << 4) | ucDigit);
    }
    return true;
}
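// Encodes iBinSize bytes as hex digits. pszHexStr must have room for
// 2*iBinSize characters plus the terminating 0.
void Binary2Hex(unsigned char const* pucBinStr, int iBinSize, char* pszHexStr)
{
    for(int i = 0; i < iBinSize; i++, pucBinStr++, pszHexStr += 2)
    {
        // Char2Hex() writes two digits and a terminating 0 at the current
        // position, so the output never has to be rescanned as with strcat().
        Char2Hex(*pucBinStr, pszHexStr);
    }
    *pszHexStr = 0;  // also terminates the output when iBinSize is 0
}

// Decodes 2*iBinSize hex digits from pszHexStr into iBinSize bytes.
// Returns false at the first character that is not a hex digit.
bool Hex2Binary(char const* pszHexStr, unsigned char* pucBinStr, int iBinSize)
{
    unsigned char ch;
    for(int i = 0; i < iBinSize; i++, pszHexStr += 2, pucBinStr++)
    {
        if(!Hex2Char(pszHexStr, ch))
            return false;
        *pucBinStr = ch;
    }
    return true;
}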
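STDMETHODIMP CHexObj::Binary2Hex(BSTR* pbsBin, BSTR* pbsHex)
{
    USES_CONVERSION;
    if(pbsBin == NULL || pbsHex == NULL)
        return E_POINTER;
    int iBinLen = ::SysStringLen(*pbsBin);
    char* pcBin = static_cast<char*>(_alloca(iBinLen));
    // WideCharToMultiByte() fails for a zero-length input, so skip it for empty strings.
    if(iBinLen > 0 &&
       !WideCharToMultiByte(CP_ACP, 0, *pbsBin, iBinLen, pcBin, iBinLen, NULL, NULL))
    {
        return Error(szError1, IID_IHexObj);
    }
    char* pcHex = static_cast<char*>(_alloca((iBinLen << 1) + 1));
    ::Binary2Hex(reinterpret_cast<unsigned char*>(pcBin), iBinLen, pcHex);
    // pcHex is an ANSI string, so it is converted with A2OLE() rather than T2OLE().
    if(!::SysReAllocString(pbsHex, A2OLE(pcHex)))
        return E_OUTOFMEMORY;
    return S_OK;
}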
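STDMETHODIMP CHexObj::Hex2Binary(BSTR* pbsHex, BSTR* pbsBin)
{
    USES_CONVERSION;
    if(pbsHex == NULL || pbsBin == NULL)
        return E_POINTER;
    int iBinLen = ::SysStringLen(*pbsHex);
    // Parentheses are required here: != binds tighter than &.
    if((iBinLen & 1) != 0)
    {
        return Error(szError3, IID_IHexObj);
    }
    iBinLen >>= 1;
    // The hex digits are plain ANSI characters; an empty BSTR may be NULL.
    std::string ostrHex(iBinLen > 0 ? OLE2A(*pbsHex) : "");
    char* pcBin = static_cast<char*>(_alloca(iBinLen));
    if(!::Hex2Binary(ostrHex.c_str(),
                     reinterpret_cast<unsigned char*>(pcBin), iBinLen))
    {
        return Error(szError3, IID_IHexObj);
    }
    WCHAR* pW = static_cast<WCHAR*>(_alloca(iBinLen * sizeof(WCHAR)));
    // MultiByteToWideChar() fails for a zero-length input, so skip it for empty strings.
    if(iBinLen > 0 && !MultiByteToWideChar(CP_ACP, 0, pcBin, iBinLen, pW, iBinLen))
    {
        return Error(szError2, IID_IHexObj);
    }
    if(!::SysReAllocStringLen(pbsBin, pW, iBinLen))
        return E_OUTOFMEMORY;
    return S_OK;
}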
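For readers who want to experiment with the conversion logic without building the COM component, the helper functions above can be approximated in portable C++. This is only a sketch under stated assumptions: ToHex, FromHex and HexDigitValue are hypothetical names that do not appear in the component, and std::string is assumed in place of the raw buffers.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encodes every byte of `bin`, including zero bytes,
// as two uppercase hexadecimal digits.
std::string ToHex(const std::string& bin)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string hex;
    hex.reserve(bin.size() * 2);
    for (std::string::size_type i = 0; i < bin.size(); ++i)
    {
        unsigned char ch = static_cast<unsigned char>(bin[i]);
        hex += digits[ch >> 4];
        hex += digits[ch & 0xF];
    }
    return hex;
}

// Returns the value of one uppercase hex digit, or -1 if `c` is not one.
int HexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decodes `hex` into `bin`. Returns false, leaving
// `bin` untouched, when the input has odd length or a non-hex character.
bool FromHex(const std::string& hex, std::string& bin)
{
    if (hex.size() % 2 != 0) return false;
    std::string out;
    out.reserve(hex.size() / 2);
    for (std::string::size_type i = 0; i < hex.size(); i += 2)
    {
        int hi = HexDigitValue(hex[i]);
        int lo = HexDigitValue(hex[i + 1]);
        if (hi < 0 || lo < 0) return false;
        out += static_cast<char>((hi << 4) | lo);
    }
    bin = out;
    return true;
}
```

Because std::string carries its own length, a value built as std::string("ABC\0DEF", 7) passes through ToHex() and FromHex() with the embedded 0 intact.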
First, notice that the input argument of the Binary2Hex() method is also a pointer, like the output argument:
HRESULT Binary2Hex([in]BSTR* pbsBin, [out]BSTR* pbsHex);
Somebody could argue that this is not necessary, but I found experimentally that if I use a
signature like:
HRESULT Binary2Hex([in]BSTR bsBin, [out]BSTR* pbsHex);
and I pass a string like the bstr defined above, then the line:
int iBinLen = ::SysStringLen(bsBin);
inside the Binary2Hex() method gives length 4 instead of the correct value 9. It seems that during
marshalling COM creates a copy of the original string, but stops at the first 0. This works fine for
text strings, but not for binary strings.
In conclusion, when you work with binary data you should pass both the input and the output
BSTR arguments as pointers!
Now, why do you need a wrapper for BSTR?
Notice that both the Binary2Hex() and Hex2Binary() methods use a reallocation function
(::SysReAllocString() and ::SysReAllocStringLen(), respectively) to reallocate the output string before
returning. If the BSTR argument is not already allocated on the client side (for example, if it is an
uninitialized pointer), this call produces a wonderful crash on the client side. "So what?", somebody
could argue, "you could use ::SysAllocStringLen() instead, which works fine in any situation." That is true,
but if the string was already allocated on the client side, ::SysAllocStringLen() overwrites the pointer
without freeing the old string, which leaks memory. To avoid the leak, the client programmer would have to
deallocate the string before calling the method, or make sure that it was never allocated. You cannot impose
this on him, and there is no compile-time or run-time error if he does not do it. Therefore I think the best
solution is to use the reallocation functions on the server side and to recommend that the client programmer
systematically use a BSTR wrapper which initializes the encapsulated BSTR to an empty string (or, if he
prefers, he can make sure by hand that all the BSTR strings are allocated, which is more error prone and
time consuming).
A second problem occurs when your component methods throw exceptions. In that case nobody
deallocates the BSTR strings that were allocated on the server side before the exception occurred.
A wrapper can do it in its destructor, so this is the second reason you should use one.
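Both problems can be reproduced without COM. The sketch below is an illustration under stated assumptions, not the component's code: MockAlloc, MockFree and MockReAlloc are hypothetical stand-ins for ::SysAllocStringLen(), ::SysFreeString() and ::SysReAllocStringLen(), a counter tracks how many strings are still allocated, and StringGuard is a hypothetical minimal wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <stdexcept>

// Mock allocator; g_live counts the strings that are currently allocated.
static int g_live = 0;

wchar_t* MockAlloc(const wchar_t* src, std::size_t len)
{
    wchar_t* p = new wchar_t[len + 1];
    for (std::size_t i = 0; i < len; ++i) p[i] = src[i];
    p[len] = L'\0';
    ++g_live;
    return p;
}

void MockFree(wchar_t* p)
{
    if (p != nullptr)
    {
        delete[] p;
        --g_live;
    }
}

// Mock reallocation: stores the new string and releases the old one.
void MockReAlloc(wchar_t** pp, const wchar_t* src, std::size_t len)
{
    wchar_t* fresh = MockAlloc(src, len);
    MockFree(*pp);
    *pp = fresh;
}

int LiveStrings() { return g_live; }

// Hypothetical minimal wrapper: always holds a valid (possibly empty)
// string and releases it in the destructor.
class StringGuard
{
public:
    StringGuard() : m_p(MockAlloc(L"", 0)) {}
    ~StringGuard() { MockFree(m_p); }
    StringGuard(const StringGuard&) = delete;
    StringGuard& operator=(const StringGuard&) = delete;
    wchar_t** Address() { return &m_p; }
    const wchar_t* Get() const { return m_p; }
private:
    wchar_t* m_p;
};

// "Server" that overwrites the out-parameter: leaks whatever was there.
void ServerAllocates(wchar_t** out) { *out = MockAlloc(L"4142", 4); }

// "Server" that reallocates: safe as long as *out is a valid string.
void ServerReallocates(wchar_t** out) { MockReAlloc(out, L"4142", 4); }

// Client code that fails after the server has already produced a result.
void CallThenThrow()
{
    StringGuard result;
    ServerReallocates(result.Address());
    throw std::runtime_error("simulated failure after the call");
}
```

With a raw pointer that already holds a string, ServerAllocates() leaves one string that nobody can free. With StringGuard and ServerReallocates(), the count of live strings returns to its previous value even when CallThenThrow() exits through the exception.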
Now, analysing the existing wrappers, I could not find one appropriate for the task I was concerned with,
namely managing BSTR objects containing binary data. Let's consider for example the _bstr_t wrapper.
There is no constructor for specifying the string length; for example, the code:
_bstr_t _bstr(L"ABC\0DEF");
cout << _bstr.length() << endl;
will print the length as 3, i.e. it stops at the first 0. A second idea is to first allocate
the string and then take ownership of it, like this:
BSTR bstr = ::SysAllocStringLen(L"ABC\0DEF", 7);
_bstr_t _bstr(bstr, false);
cout << _bstr.length() << endl;
with the fCopy flag set to false. This time the length has the correct value 7, but
if you just make a copy of the external BSTR, like this:
BSTR bstr = ::SysAllocStringLen(L"ABC\0DEF", 7);
_bstr_t _bstr(bstr);
cout << _bstr.length() << endl;
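you get the same wrong result, 3 (most likely because, without the fCopy argument, the constructor
taking a wchar_t const* is selected, and it measures the string by scanning for the first 0). It seems
that there are inherent difficulties in making copies of BSTR objects with binary contents with _bstr_t.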
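The effect does not depend on COM and can be reproduced in portable C++. In the sketch below, MockWrapper is a hypothetical class in which std::wstring stands in for the encapsulated string. It only illustrates how a copy that goes through a bare character pointer loses everything after the first 0, while a copy that also forwards the length does not.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a wrapper around a length-aware string.
class MockWrapper
{
public:
    // Pointer-only constructor: the length has to be found by scanning
    // for the first 0, so anything after an embedded 0 is dropped.
    explicit MockWrapper(const wchar_t* text) : m_text(text) {}

    // Pointer-plus-length constructor: embedded zeros are preserved.
    MockWrapper(const wchar_t* text, std::size_t length) : m_text(text, length) {}

    std::size_t Length() const { return m_text.length(); }

    // Raw character access, comparable to a conversion to a plain pointer.
    const wchar_t* Raw() const { return m_text.c_str(); }

private:
    std::wstring m_text;
};

// Copy that forwards only the raw pointer: repeats the truncation.
MockWrapper CopyThroughPointer(const MockWrapper& w)
{
    return MockWrapper(w.Raw());
}

// Copy that forwards pointer and length: keeps all the characters.
MockWrapper CopyWithLength(const MockWrapper& w)
{
    return MockWrapper(w.Raw(), w.Length());
}
```

For a wrapper built as MockWrapper(L"ABC\0DEF", 7), CopyThroughPointer() yields a length of 3 and CopyWithLength() yields 7.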
Another problem with _bstr_t is that it cannot be passed as a BSTR* argument (for a
BSTR argument it is OK, but as I showed above, we need pointers to pass binary strings correctly).
In this case the only purpose of using the _bstr_t wrapper would be to ensure automatic deallocation.
For example, if we wanted to use _bstr_t for calling the Binary2Hex() method, the code
snippet on the client side would be something like this:
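try
{
    IHexObjPtr pIHexObj(__uuidof(HexObj));
    BSTR bstrBin = ::SysAllocStringLen(L"ABC\0DEF", 7);
    _bstr_t _bstrBin(bstrBin, false);   // input string: freed by the wrapper
    BSTR bstrHex = ::SysAllocString(L"");
    pIHexObj->Binary2Hex(&bstrBin, &bstrHex);
    // The server reallocates the output string, so bstrHex may now hold a
    // different pointer: the wrapper is attached only after the call.
    // If the call throws, bstrHex is not freed.
    _bstr_t _bstrHex(bstrHex, false);
    cout << (char*)_bstrHex << endl;
}
catch(_com_error const& re)
{
    cout << "HRESULT Message: " << re.ErrorMessage() << endl;
    cout << "Description: " << (char*)re.Description() << endl;
}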
Implementation
In order to address all the problems presented above, I decided to implement my own BSTR
wrapper specialized for binary data (it works correctly with text data too). I give only the
interface below; the implementation details are in the associated project:
class CBinBstr
{
public:
    CBinBstr(wchar_t const* const& rpwStr=L"", int iLen=0);
    CBinBstr(unsigned char const* bytes, int iLen=0);
    CBinBstr(BSTR* pBSTR, bool bCopy= false);
    CBinBstr(CBinBstr const& rBstr);
    virtual ~CBinBstr();
    int Compare(wchar_t const* pwStr, int iLen=0) const;
    int Compare(CBinBstr const& rBstr) const;
    int Length() const;
    BSTR Copy() const;
    bool IsEmpty() const;
    void Empty();
    wchar_t GetAt(int nIndex) const;
    void SetAt(int nIndex, wchar_t ch);
    void ToBytes(unsigned char* bytes, int& riLen) const;
    void BinaryToHex();
    void HexToBinary();
    wchar_t operator[](int nIndex) const;
    operator BSTR*();
    operator BSTR&();
    CBinBstr& operator=(CBinBstr const& rBstr);
    CBinBstr& operator=(wchar_t const* pwszStr);
    friend bool operator==(CBinBstr const& rBstr1, CBinBstr const& rBstr2);
    friend bool operator==(CBinBstr const& rBstr, wchar_t const* pwszStr);
    friend bool operator==(wchar_t const* pwszStr, CBinBstr const& rBstr);
    friend bool operator!=(CBinBstr const& rBstr1, CBinBstr const& rBstr2);
    friend bool operator!=(CBinBstr const& rBstr, wchar_t const* pwszStr);
    friend bool operator!=(wchar_t const* pwszStr, CBinBstr const& rBstr);
    friend bool operator<(CBinBstr const& rBstr1, CBinBstr const& rBstr2);
    friend bool operator<(CBinBstr const& rBstr, wchar_t const* pwszStr);
    friend bool operator<(wchar_t const* pwszStr, CBinBstr const& rBstr);
    friend bool operator>(CBinBstr const& rBstr1, CBinBstr const& rBstr2);
    friend bool operator>(CBinBstr const& rBstr, wchar_t const* pwszStr);
    friend bool operator>(wchar_t const* pwszStr, CBinBstr const& rBstr);
    friend bool operator<=(CBinBstr const& rBstr1, CBinBstr const& rBstr2);
    friend bool operator<=(CBinBstr const& rBstr, wchar_t const* pwszStr);
    friend bool operator<=(wchar_t const* pwszStr, CBinBstr const& rBstr);
    friend bool operator>=(CBinBstr const& rBstr1, CBinBstr const& rBstr2);
    friend bool operator>=(CBinBstr const& rBstr, wchar_t const* pwszStr);
    friend bool operator>=(wchar_t const* pwszStr, CBinBstr const& rBstr);
    CBinBstr& operator+=(CBinBstr const& rBstr);
    CBinBstr& operator+=(wchar_t const* pwszStr);
    friend CBinBstr operator+(CBinBstr const& rBstr1, CBinBstr const& rBstr2);
    friend CBinBstr operator+(CBinBstr const& rBstr, wchar_t const* pwszStr);
    friend CBinBstr operator+(wchar_t const* pwszStr, CBinBstr const& rBstr);
    friend std::wostream& operator<<(std::wostream& s, CBinBstr const& rBstr);
};
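The implementation of these members is in the associated project and is not reproduced here. Purely as an illustration, a comparison that is safe for binary contents presumably has to work on explicit lengths rather than on 0 terminators. The sketch below shows one way to do that; CompareBinary and CompareAsText are hypothetical names and are not part of CBinBstr.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Text-style comparison: stops at the first 0 in either string.
int CompareAsText(const wchar_t* a, const wchar_t* b)
{
    return std::wcscmp(a, b);
}

// Hypothetical length-aware comparison: looks at every character up to
// the given lengths, including embedded zeros. Returns -1, 0 or 1.
int CompareBinary(const wchar_t* a, std::size_t aLen,
                  const wchar_t* b, std::size_t bLen)
{
    std::size_t common = aLen < bLen ? aLen : bLen;
    for (std::size_t i = 0; i < common; ++i)
    {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    // All shared characters are equal: the shorter string sorts first.
    if (aLen == bLen)
        return 0;
    return aLen < bLen ? -1 : 1;
}
```

For L"ABC\0DEF" and L"ABC\0XYZ", CompareAsText() reports equality because it only sees "ABC", while CompareBinary() with length 7 for both reports that the first string sorts before the second.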
Now look how easy and elegant CBinBstr is to use, compared to the _bstr_t case above! Let's consider
the following code snippet on the client side:
try
{
    IHexObjPtr pIHexObj(__uuidof(HexObj));
    CBinBstr oBstrBin(L"ABC\0DEF", 7);
    CBinBstr oBstrHex;
    pIHexObj->Binary2Hex(oBstrBin, oBstrHex);
    wcout << (BSTR&)oBstrHex << endl;
}
catch(_com_error const& re)
{
    cout << "HRESULT Message: " << re.ErrorMessage() << endl;
    cout << "Description: " << (char*)re.Description() << endl;
}
- The string length this time is the correct value 7, not 3 as for _bstr_t.
- The default constructor automatically initializes the string to L"".
- The object can be passed as a BSTR* argument (the conversion is done automatically by the BSTR* operator).
- It can be printed easily, and if an exception is thrown, the destructor takes care of the deallocation.
Is this really the end of the troubles? Not really, but I hope it makes life easier.
Working with binary data requires a lot of discipline from the programmer. For example,
if in the definition
CBinBstr oBstrBin(L"ABC\0DEF", 7);
the programmer writes 20 instead of 7, the results are undefined.
But this is a general problem when working with binary strings.
Some possible sources of errors which should still be considered are:
- Declaring a string size larger than the real string size (as in the example above). In this
case you should know what you are doing.
- Taking ownership of an external unallocated BSTR. Generally this should be avoided,
but you need to do it inside functions that take ownership of their BSTR* pointer arguments (a case in
which it would be safer for the calling function to pass a CBinBstr).
- Abusing the BSTR* and BSTR& conversion operators. These operators should be used only
when you need to pass arguments (the conversion is done automatically) or when you need to print
the contents with wcout (which also stops at the first 0, so it is better to transform the contents
into hexadecimal format with the BinaryToHex() method before printing). Otherwise all operations
should be done inside the wrapper, without direct access to the encapsulated BSTR.
If you follow the rules, the problems can be kept under control!
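For the first source of error in the list above, a partial safeguard is possible when the data comes from a literal or an array, because the compiler knows its size. The sketch below is an illustration only: LengthFitsLiteral and MakeBinary are hypothetical helpers that are not part of CBinBstr, and std::wstring stands in for the wrapper. The technique does not help when only a pointer is available.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// N is deduced from the array type of the literal and includes the
// terminating 0, so at most N - 1 characters can be requested safely.
template <std::size_t N>
bool LengthFitsLiteral(const wchar_t (&)[N], std::size_t declaredLen)
{
    return declaredLen <= N - 1;
}

// Builds a length-aware string from a literal, rejecting a declared
// length that would read past the end of the literal.
template <std::size_t N>
std::wstring MakeBinary(const wchar_t (&literal)[N], std::size_t declaredLen)
{
    if (!LengthFitsLiteral(literal, declaredLen))
        throw std::length_error("declared length exceeds the literal");
    return std::wstring(literal, declaredLen);
}
```

With these helpers, MakeBinary(L"ABC\0DEF", 7) succeeds and keeps the embedded 0, while MakeBinary(L"ABC\0DEF", 20) throws instead of reading beyond the literal.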
Conclusion
The project zip file BinBstr.zip attached to this article includes the source code of the
presented CBinBstr class and a test program. I am interested in any opinions and new ideas about
this implementation.