A BSTR Wrapper for Operations with Binary Data

10 Dec 2002
Presenting a C++ class for correct operations on BSTR strings with binary data contents

Introduction

In this article I try to show why none of the existing BSTR wrappers is appropriate for managing BSTR objects with binary contents, and why a specialized wrapper is therefore needed. I have implemented one, which I present here.

Why another BSTR wrapper?

Recently I was involved in developing some ATL components for encryption/decryption operations. Since the result of an encryption operation is not restricted to a given set of characters, as a text string is, some special methods for managing strings of binary data are needed. The BSTR data structure seems very appropriate for this kind of operation: the string length is stored before the string data, so the contents do not rely on 0 termination and can include any character, embedded zeros too. For example, you can build a string of a given length like this:

BSTR bstr = ::SysAllocStringLen(L"ABCB\0DEFG", 9);

This string contains a 0 in the middle. The problem with this data structure is that it needs a wrapper in order to avoid some programming mistakes that are otherwise easy to make. To explain what can go wrong, I will first introduce a simple COM component for converting data from binary strings to hexadecimal strings and back. This is a subset of a real, more complex component (the code I give here is enough for somebody willing to build the actual hex converter component, but that is not the real purpose of this article). The IDL interface of this component is something like this (not all code details are given):

//...

interface IHexObj : IDispatch
{
  [id(10), helpstring("String conversion from Binary to Hex")] 
         HRESULT Binary2Hex([in]BSTR* pbsBin, [out]BSTR* pbsHex);
  [id(11), helpstring("String conversion from Hex to Binary")] 
         HRESULT Hex2Binary([in]BSTR* pbsHex, [out]BSTR* pbsBin);
};
//...

and the implementation is something like this (not all code details are given):

//Error Messages 

static char szError1[] = "HexCom ERROR: WideCharToMultiByte() conversion Error!";
static char szError2[] = "HexCom ERROR: MultiByteToWideChar() conversion Error!";
static char szError3[] = "HexCom ERROR in Hex2Binary(): in is not a Hex string!";
//...

//Some Auxiliary Functions

//Optimized Function to convert an unsigned char to a Hex string of length 2

void Char2Hex(unsigned char ch, char* szHex)
{
  static unsigned char saucHex[] = "0123456789ABCDEF";
  szHex[0] = saucHex[ch >> 4];
  szHex[1] = saucHex[ch&0xF];
  szHex[2] = 0;
}

//Function to convert a Hex string of length 2 to an unsigned char

bool Hex2Char(char const* szHex, unsigned char& rch)
{
  //High nibble (only upper case digits A-F are accepted)
  if(*szHex >= '0' && *szHex <= '9')
    rch = *szHex - '0';
  else if(*szHex >= 'A' && *szHex <= 'F')
    rch = *szHex - 'A' + 10;
  else
    return false; //Is not really a Hex string
  szHex++;
  //Low nibble
  if(*szHex >= '0' && *szHex <= '9')
    rch = (rch << 4) + (*szHex - '0');
  else if(*szHex >= 'A' && *szHex <= 'F')
    rch = (rch << 4) + (*szHex - 'A' + 10);
  else
    return false; //Is not really a Hex string
  return true;
}

//Function to convert binary string to hex string

void Binary2Hex(unsigned char const* pucBinStr, int iBinSize, char* pszHexStr)
{
  //pszHexStr must have room for 2*iBinSize+1 characters.
  //Writing in place avoids rescanning the output with strcat() on every byte.
  for(int i=0; i<iBinSize; i++,pszHexStr+=2)
    Char2Hex(pucBinStr[i], pszHexStr); //writes 2 digits and a terminating 0
  *pszHexStr = 0; //terminates the result also when iBinSize is 0
}

//Function to convert hex string to binary string

bool Hex2Binary(char const* pszHexStr, unsigned char* pucBinStr, int iBinSize)
{
  int i;
  unsigned char ch;
  for(i=0; i<iBinSize; i++,pszHexStr+=2,pucBinStr++)
  {
    if(false == Hex2Char(pszHexStr, ch))
      return false;
    *pucBinStr = ch;
  }
  return true;
}

STDMETHODIMP CHexObj::Binary2Hex(BSTR* pbsBin, BSTR* pbsHex)
{
  USES_CONVERSION;
  int iBinLen = ::SysStringLen(*pbsBin);
  char* pcBin = static_cast<char*>(_alloca(iBinLen));
  if(!WideCharToMultiByte(CP_ACP, 0, *pbsBin, iBinLen, pcBin, iBinLen, NULL, NULL))
  {
    return Error(szError1, IID_IHexObj);
  }
  char* pcHex = static_cast<char*>(_alloca((iBinLen<<1)+1));
  ::Binary2Hex(reinterpret_cast<unsigned char*>(pcBin), iBinLen, pcHex);
  ::SysReAllocString(pbsHex, A2OLE(pcHex)); //A2OLE: pcHex is a char* in any build
  return S_OK;
}

STDMETHODIMP CHexObj::Hex2Binary(BSTR* pbsHex, BSTR* pbsBin)
{
  USES_CONVERSION;
  int iBinLen = ::SysStringLen(*pbsHex);
  if((iBinLen & 1) != 0) //parentheses needed: != binds tighter than &
  {
    return Error(szError3, IID_IHexObj);
  }
  iBinLen >>= 1;
  string ostrHex(OLE2A(*pbsHex)); //OLE2A: std::string holds narrow chars in any build
  char* pcBin = static_cast<char*>(_alloca(iBinLen));
  if(false == ::Hex2Binary(ostrHex.c_str(), 
                           reinterpret_cast<unsigned char*>(pcBin), iBinLen))
  {
    return Error(szError3, IID_IHexObj);
  }
  WCHAR* pW = (WCHAR*)_alloca(iBinLen*sizeof(WCHAR));
  if(!MultiByteToWideChar(CP_ACP, 0, pcBin, iBinLen, pW, iBinLen))
  {
    return Error(szError2, IID_IHexObj);
  }
  ::SysReAllocStringLen(pbsBin, pW, iBinLen);
  return S_OK;
}
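For readers who want to try the conversion logic outside a COM project, the helper functions above can be sketched in portable C++ roughly as follows. This is only an illustration under my own assumptions: the names BytesToHex(), HexDigitValue(), HexToBytes() and HexRoundTripDemo() are hypothetical and do not come from the component, and std::string and std::vector replace the raw buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper names; they are not part of the article's project.

// Converts raw bytes to upper-case hex text, two digits per byte.
std::string BytesToHex(const std::vector<unsigned char>& bytes)
{
    static const char kDigits[] = "0123456789ABCDEF";
    std::string hex;
    hex.reserve(bytes.size() * 2);
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        hex.push_back(kDigits[bytes[i] >> 4]);
        hex.push_back(kDigits[bytes[i] & 0x0F]);
    }
    return hex;
}

// Returns the value of one upper-case hex digit, or -1 if it is not one.
int HexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Converts hex text back to bytes. Returns false (leaving out untouched)
// on an odd length or a non-hex character.
bool HexToBytes(const std::string& hex, std::vector<unsigned char>& out)
{
    if (hex.size() % 2 != 0) return false;
    std::vector<unsigned char> result;
    result.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        const int hi = HexDigitValue(hex[i]);
        const int lo = HexDigitValue(hex[i + 1]);
        if (hi < 0 || lo < 0) return false;
        result.push_back(static_cast<unsigned char>((hi << 4) | lo));
    }
    out.swap(result);
    return true;
}

// Usage: a buffer with an embedded zero byte survives the round trip.
void HexRoundTripDemo()
{
    const unsigned char raw[] = { 0x41, 0x00, 0xFF, 0x10 };
    const std::vector<unsigned char> in(raw, raw + sizeof(raw));
    const std::string hex = BytesToHex(in);
    assert(hex == "4100FF10");
    std::vector<unsigned char> back;
    const bool ok = HexToBytes(hex, back);
    assert(ok && back == in);
    (void)ok;
}
```

Because the containers carry their own length, nothing in this sketch depends on a terminating 0, which is the same property the component relies on in BSTR.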

First notice that the input argument of the Binary2Hex() method is a pointer, just like the output argument:

HRESULT Binary2Hex([in]BSTR* pbsBin, [out]BSTR* pbsHex);

Somebody could argue that this is not necessary, but I have found experimentally that if I use a signature like:

HRESULT Binary2Hex([in]BSTR bsBin, [out]BSTR* pbsHex);

and pass a string like the bstr defined above, then the line:

int iBinLen = ::SysStringLen(bsBin);

inside the Binary2Hex() method gives length 4 instead of the correct value 9. It seems that during marshalling COM creates a copy of the original string, but stops at the first 0. This works fine for text strings, but not for binary strings.

In conclusion, when you work with binary data you should pass both the input and the output BSTR arguments as pointers!
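The difference between the two lengths can be reproduced without COM. The sketch below is a portable model, not the real API: FakeBstr, FakeAllocStringLen(), FakeStringLen(), FakeFreeString(), ZeroTerminatedLen() and CopyAsZeroTerminated() are hypothetical names, and the only property borrowed from BSTR is its documented layout (a four-byte length prefix holding the byte count, then 16-bit characters, then a terminating zero).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a BSTR, for illustration only: a 4-byte length
// prefix (in bytes), then 16-bit characters, then a terminating zero.
typedef char16_t* FakeBstr;

// Models ::SysAllocStringLen(): copies exactly len characters, zeros included.
FakeBstr FakeAllocStringLen(const char16_t* src, std::uint32_t len)
{
    const std::uint32_t byteLen = len * static_cast<std::uint32_t>(sizeof(char16_t));
    unsigned char* raw = static_cast<unsigned char*>(
        std::malloc(sizeof(byteLen) + byteLen + sizeof(char16_t)));
    if (raw == nullptr) return nullptr;
    std::memcpy(raw, &byteLen, sizeof(byteLen));                        // length prefix
    std::memcpy(raw + sizeof(byteLen), src, byteLen);                   // payload
    std::memset(raw + sizeof(byteLen) + byteLen, 0, sizeof(char16_t));  // terminator
    return reinterpret_cast<FakeBstr>(raw + sizeof(byteLen));
}

// Models ::SysStringLen(): reads the prefix instead of scanning for a zero.
std::uint32_t FakeStringLen(FakeBstr s)
{
    if (s == nullptr) return 0;
    std::uint32_t byteLen = 0;
    std::memcpy(&byteLen,
                reinterpret_cast<unsigned char*>(s) - sizeof(byteLen),
                sizeof(byteLen));
    return byteLen / static_cast<std::uint32_t>(sizeof(char16_t));
}

void FakeFreeString(FakeBstr s)
{
    if (s != nullptr)
        std::free(reinterpret_cast<unsigned char*>(s) - sizeof(std::uint32_t));
}

// Length as seen by code that treats the data as a zero-terminated string.
std::uint32_t ZeroTerminatedLen(const char16_t* s)
{
    std::uint32_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

// Models a copy made with zero-terminated semantics: the tail is lost.
FakeBstr CopyAsZeroTerminated(FakeBstr s)
{
    return FakeAllocStringLen(s, ZeroTerminatedLen(s));
}

void LengthPrefixDemo()
{
    FakeBstr s = FakeAllocStringLen(u"ABCB\0DEFG", 9);
    assert(s != nullptr);
    assert(FakeStringLen(s) == 9);        // the prefix knows the real length
    FakeBstr copy = CopyAsZeroTerminated(s);
    assert(FakeStringLen(copy) == 4);     // the copy stopped at the first 0
    FakeFreeString(copy);
    FakeFreeString(s);
}
```

Any layer that copies the string by scanning for the terminator, instead of reading the prefix, produces the truncated length 4 described above.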

Now why do you need a wrapper for BSTR?

Notice that both the Binary2Hex() and Hex2Binary() methods reallocate the output string before returning, with ::SysReAllocString() and ::SysReAllocStringLen() respectively. If the BSTR argument is not already allocated on the client side (for example, an uninitialized pointer), these functions generate a wonderful crash on the client side. "So what?", somebody could argue, "you could use ::SysAllocStringLen() instead, which works fine in any situation." That is true, but if the string was already allocated on the client side, ::SysAllocStringLen() overwrites the old pointer without freeing it and leaks memory. To avoid the leak, the client programmer would have to deallocate the string, or make sure it was never allocated, before calling the method. You cannot impose this on him, and there is no compile-time or run-time error if he forgets. Therefore I think the best solution is to use the reallocation functions on the server side and to recommend that the client programmer systematically use a BSTR wrapper which initializes the encapsulated BSTR to an empty string (or, if he prefers, he can make sure by hand that all BSTR strings are allocated, which is more error-prone and time-consuming).
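The trade-off between the two server styles can be made visible with a counting allocator. The following is a minimal sketch under stated assumptions: CountedAlloc(), CountedFree(), ServerAllocStyle(), ServerReallocStyle() and OutParamDemo() are hypothetical names, and plain char buffers stand in for BSTRs, so this models the ownership rules only, not the real allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical counting allocator standing in for the BSTR allocator.
static int g_liveBlocks = 0;

char* CountedAlloc(const char* src, std::size_t len)
{
    char* p = static_cast<char*>(std::malloc(len + 1));
    if (p == nullptr) return nullptr;
    std::memcpy(p, src, len);
    p[len] = '\0';
    ++g_liveBlocks;
    return p;
}

void CountedFree(char* p)
{
    if (p != nullptr) { std::free(p); --g_liveBlocks; }
}

// "Alloc" style server: overwrites *out without looking at the old value.
void ServerAllocStyle(char** out)
{
    *out = CountedAlloc("4142", 4);
}

// "Realloc" style server: releases the old value first. It requires *out to
// be null or a valid block, never an uninitialized pointer.
void ServerReallocStyle(char** out)
{
    char* fresh = CountedAlloc("4142", 4);
    CountedFree(*out);
    *out = fresh;
}

void OutParamDemo()
{
    // Client pre-allocated an empty string; the demo keeps a second pointer
    // only so that it can clean up after showing the problem.
    char* old = CountedAlloc("", 0);
    char* a = old;
    ServerAllocStyle(&a);
    CountedFree(a);
    assert(g_liveBlocks == 1);   // the overwritten block is still outstanding
    CountedFree(old);            // a real client has lost this pointer

    char* b = CountedAlloc("", 0);
    ServerReallocStyle(&b);
    CountedFree(b);
    assert(g_liveBlocks == 0);   // nothing left behind
}
```

The realloc style is leak-free but depends on the client having initialized the argument, which is exactly the guarantee a wrapper with an initializing default constructor provides.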

A second problem occurs when your component methods throw exceptions. In this case nobody deallocates the BSTR strings that were allocated on the server side before the exception occurred. A wrapper can do it in its destructor, so this is the second reason you should use one.
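The destructor argument can be illustrated with a few lines of portable C++. This is a sketch of the general idiom, assuming a hypothetical ScopedBuffer class that owns a malloc() block; MethodThatThrows() and ExceptionDemo() are also illustrative names, and the counter exists only to make the cleanup observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdexcept>

static int g_liveBuffers = 0;  // hypothetical bookkeeping for the demo

// Minimal owning wrapper: acquires in the constructor, releases in the destructor.
class ScopedBuffer
{
public:
    explicit ScopedBuffer(std::size_t size)
        : m_p(std::malloc(size != 0 ? size : 1))
    {
        if (m_p == nullptr) throw std::bad_alloc();
        ++g_liveBuffers;
    }
    ~ScopedBuffer() { std::free(m_p); --g_liveBuffers; }
    void* Get() const { return m_p; }
private:
    ScopedBuffer(const ScopedBuffer&);             // non-copyable
    ScopedBuffer& operator=(const ScopedBuffer&);
    void* m_p;
};

// Stand-in for a component method that fails half way through its work.
void MethodThatThrows()
{
    ScopedBuffer work(64);
    assert(work.Get() != nullptr);
    throw std::runtime_error("conversion error");
    // No explicit cleanup here: stack unwinding runs ~ScopedBuffer().
}

// Returns true if the buffer was released although the method threw.
bool ExceptionDemo()
{
    try {
        MethodThatThrows();
    } catch (const std::runtime_error&) {
        return g_liveBuffers == 0;
    }
    return false;
}
```

With a raw pointer in place of ScopedBuffer, the throw would skip the matching free call and the block would stay allocated.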

Now, analysing the wrappers already existing, I couldn't find one appropriate for the tasks I was concerned with, namely managing BSTR objects containing binary data. Let's consider for example the _bstr_t wrapper.

It has no constructor that takes the string length. For example, the code:

_bstr_t _bstr(L"ABC\0DEF");
cout << _bstr.length() << endl;

prints the length as 3, i.e. it stops at the first 0. A second idea is to allocate the string first and then take ownership of it, like this:

BSTR bstr = ::SysAllocStringLen(L"ABC\0DEF", 7);
_bstr_t _bstr(bstr, false);
cout << _bstr.length() << endl;

with the fCopy flag set to false. This time the length has the correct value 7. But if you just make a copy of the external BSTR, like this:

BSTR bstr = ::SysAllocStringLen(L"ABC\0DEF", 7);
_bstr_t _bstr(bstr); //fCopy=true by default

cout << _bstr.length() << endl;

you get the same wrong result, 3. It seems that _bstr_t has inherent difficulties making copies of BSTR objects with binary contents.

Another problem with _bstr_t is that it cannot be passed where a BSTR* argument is expected (for a BSTR argument it is OK, but as I showed above, we need pointers to pass binary strings correctly). In this case the only purpose of the _bstr_t wrapper is to ensure automatic deallocation. For example, if we want to use _bstr_t for calling the Binary2Hex() method, the code snippet on the client side looks something like this:

//...

try
{
  //Create the object

  IHexObjPtr pIHexObj(__uuidof(HexObj));
  //Allocation outside the wrapper

  BSTR bstrBin = ::SysAllocStringLen(L"ABC\0DEF", 7);
  //Take ownership

  _bstr_t _bstrBin(bstrBin, false);
  //Allocation outside the wrapper

  BSTR bstrHex = ::SysAllocString(L"");
  //Take ownership

  _bstr_t _bstrHex(bstrHex, false);
  //Still need direct access to the encapsulated BSTRs

  pIHexObj->Binary2Hex(&bstrBin, &bstrHex);
  cout << (char*)_bstrHex << endl;
}
catch(_com_error const& re)
{
  cout << "HRESULT Message: " << re.ErrorMessage() << endl;
  cout << "Description: " << (char*)re.Description() << endl;
}
//...

Implementation

To address all the problems presented above, I decided to implement my own BSTR wrapper specialized for binary data (it works correctly with text data too). I give only the interface below; the implementation details are in the associated project:

class CBinBstr
{
public:
  //Constructor

  CBinBstr(wchar_t const* const& rpwStr=L"", int iLen=0);
  //From Bytes

  CBinBstr(unsigned char const* bytes, int iLen=0);
  //Copy or Take Ownership depending on the bCopy flag

  CBinBstr(BSTR* pBSTR, bool bCopy= false);
  //Copy Constructor

  CBinBstr(CBinBstr const& rBstr);
  //Destructor

  virtual ~CBinBstr();
  //Comparison Functions

  int Compare(wchar_t const* pwStr, int iLen=0) const;
  int Compare(CBinBstr const& rBstr) const;
  //Length

  int Length() const;
  //Returns a copy of the encapsulated BSTR

  BSTR Copy() const;
  //Check if Empty

  bool IsEmpty() const;
  //Make Empty

  void Empty();

  wchar_t GetAt(int nIndex) const;
  void SetAt(int nIndex, wchar_t ch);

  void ToBytes(unsigned char* bytes, int& riLen) const;

  //Transform from Binary to Hex

  void BinaryToHex();
  //Transform from Hex to Binary

  void HexToBinary();
  //Operators:

  wchar_t operator[](int nIndex) const;
  //Pointer to BSTR

  operator BSTR*();
  //Reference to BSTR

  operator BSTR&();
  //Assignment Operator

  CBinBstr& operator=(CBinBstr const& rBstr);
  //Conversions from wchar_t*

  CBinBstr& operator=(wchar_t const* pwszStr);
	
  friend bool operator==(CBinBstr const& rBstr1, CBinBstr const& rBstr2);
  friend bool operator==(CBinBstr const& rBstr, wchar_t const* pwszStr);
  friend bool operator==(wchar_t const* pwszStr, CBinBstr const& rBstr);
  friend bool operator!=(CBinBstr const& rBstr1, CBinBstr const& rBstr2);
  friend bool operator!=(CBinBstr const& rBstr, wchar_t const* pwszStr);
  friend bool operator!=(wchar_t const* pwszStr, CBinBstr const& rBstr);
  friend bool operator<(CBinBstr const& rBstr1, CBinBstr const& rBstr2);
  friend bool operator<(CBinBstr const& rBstr, wchar_t const* pwszStr);
  friend bool operator<(wchar_t const* pwszStr, CBinBstr const& rBstr);
  friend bool operator>(CBinBstr const& rBstr1, CBinBstr const& rBstr2);
  friend bool operator>(CBinBstr const& rBstr, wchar_t const* pwszStr);
  friend bool operator>(wchar_t const* pwszStr, CBinBstr const& rBstr);
  friend bool operator<=(CBinBstr const& rBstr1, CBinBstr const& rBstr2);
  friend bool operator<=(CBinBstr const& rBstr, wchar_t const* pwszStr);
  friend bool operator<=(wchar_t const* pwszStr, CBinBstr const& rBstr);
  friend bool operator>=(CBinBstr const& rBstr1, CBinBstr const& rBstr2);
  friend bool operator>=(CBinBstr const& rBstr, wchar_t const* pwszStr);
  friend bool operator>=(wchar_t const* pwszStr, CBinBstr const& rBstr);

  //Concatenation Operator

  CBinBstr& operator+=(CBinBstr const& rBstr);
  CBinBstr& operator+=(wchar_t const* pwszStr);

  friend CBinBstr operator+(CBinBstr const& rBstr1, CBinBstr const& rBstr2);
  friend CBinBstr operator+(CBinBstr const& rBstr, wchar_t const* pwszStr);
  friend CBinBstr operator+(wchar_t const* pwszStr, CBinBstr const& rBstr);

  //Printing with wide streams. Printing stops at the first 0, so it is
  //recommended to call BinaryToHex() first for correct results.

  friend std::wostream& operator<<(std::wostream& s, CBinBstr const& rBstr);
};
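The interface does not show how the comparison functions treat embedded zeros, and the real code is in the project files. One plausible way to compare counted strings, given here only as a sketch and not as the actual CBinBstr::Compare() implementation, is to walk the common prefix character by character and let the lengths decide a tie. CompareCounted() and CompareDemo() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: compares two counted wide strings, looking at every
// character, embedded zeros included. Returns <0, 0 or >0 like wcscmp().
int CompareCounted(const wchar_t* a, std::size_t aLen,
                   const wchar_t* b, std::size_t bLen)
{
    const std::size_t common = aLen < bLen ? aLen : bLen;
    for (std::size_t i = 0; i < common; ++i) {
        if (a[i] != b[i]) return a[i] < b[i] ? -1 : 1;
    }
    if (aLen == bLen) return 0;
    return aLen < bLen ? -1 : 1;   // the shorter string sorts first
}

void CompareDemo()
{
    const wchar_t a[] = L"ABC\0DEF";
    const wchar_t b[] = L"ABC\0XYZ";
    assert(std::wcscmp(a, b) == 0);           // zero-terminated view: "equal"
    assert(CompareCounted(a, 7, b, 7) < 0);   // counted view: they differ
    assert(CompareCounted(a, 7, a, 7) == 0);
    assert(CompareCounted(a, 3, a, 7) < 0);
}
```

A comparison based on wcscmp() would report the two buffers in CompareDemo() as equal, which is why a binary-safe wrapper needs the lengths as well as the pointers.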

Now look how easy and elegant it is to use, compared to the _bstr_t case above! Let's consider the following code snippet on the client side:

//...

try
{
  //Create the object

  IHexObjPtr pIHexObj(__uuidof(HexObj));
  CBinBstr oBstrBin(L"ABC\0DEF", 7);
  CBinBstr oBstrHex; //initialized to L""

  //Can be passed as BSTR* argument

  pIHexObj->Binary2Hex(oBstrBin, oBstrHex);
  //Can be easily printed

  wcout << (BSTR&)oBstrHex << endl;
}
catch(_com_error const& re)
{
  cout << "HRESULT Message: " << re.ErrorMessage() << endl;
  cout << "Description: " << (char*)re.Description() << endl;
}
//...

This time the string length is the correct value 7, not 3 as with _bstr_t. The default constructor automatically initializes the string to L"". The object can be passed as a BSTR* argument (the conversion is done automatically by the BSTR* operator). It can be printed easily, and if an exception is thrown, the destructor takes care of the deallocation.
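The automatic conversion mentioned above is an ordinary user-defined conversion operator. A minimal sketch of the mechanism, with char buffers standing in for BSTRs and with the hypothetical names OwnedText, FillResult() and ConversionDemo(), could look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a server method with a BSTR*-style [out]
// argument: it frees the old buffer and stores a newly allocated one.
void FillResult(char** out)
{
    static const char kText[] = "41424300444546";
    char* fresh = static_cast<char*>(std::malloc(sizeof(kText)));
    if (fresh == nullptr) return;          // keep the old value on failure
    std::memcpy(fresh, kText, sizeof(kText));
    std::free(*out);
    *out = fresh;
}

class OwnedText
{
public:
    OwnedText() : m_p(static_cast<char*>(std::calloc(1, 1))) {}  // starts as ""
    ~OwnedText() { std::free(m_p); }
    operator char**() { return &m_p; }     // lets the object be passed as char**
    const char* Get() const { return m_p != nullptr ? m_p : ""; }
private:
    OwnedText(const OwnedText&);           // non-copyable in this sketch
    OwnedText& operator=(const OwnedText&);
    char* m_p;
};

void ConversionDemo()
{
    OwnedText result;
    FillResult(result);                    // implicit conversion to char**
    assert(std::strcmp(result.Get(), "41424300444546") == 0);
}
```

Because the callee writes through the address of the wrapped pointer, the wrapper always owns the current buffer, whereas a wrapper that only holds a copy of the pointer is left pointing at the freed one.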

Is this really the end of the troubles? Not really, but I hope it makes life easier. Working with binary data requires a lot of discipline from the programmer. For example, if in the definition

CBinBstr oBstrBin(L"ABC\0DEF", 7);

the programmer puts 20 instead of 7, the results are undefined. But this is a general problem when working with binary strings. Some possible sources of errors which should still be considered are:

  • Declaring a string size larger than the real string size (as in the example above). In this case you should know what you are doing.
  • Taking ownership of an external unallocated BSTR. Generally this should be avoided, but you may need to do it inside functions, to take ownership of BSTR* pointer arguments (a case in which it would be safer for the calling function to pass a CBinBstr).
  • Abusing the BSTR* and BSTR& conversion operators. These operators should be used only when you need to pass arguments (the conversion is done automatically) or when you need to print the contents with wcout (printing also stops at the first 0, so it is better to convert to hexadecimal format with the BinaryToHex() method before printing). Otherwise all operations should be done inside the wrapper, without direct access to the encapsulated BSTR.

If you follow the rules the problems can be kept under control!
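The printing rule is easy to check with a wide string stream. In the sketch below StreamAsPointer(), ToHexText() and PrintingDemo() are hypothetical helpers, and the four-digits-per-character hex format is my own assumption, not necessarily what CBinBstr::BinaryToHex() produces.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// What a wide stream shows when handed a plain pointer: output stops at
// the first 0.
std::wstring StreamAsPointer(const wchar_t* text)
{
    std::wostringstream os;
    os << text;
    return os.str();
}

// Hypothetical lossless rendering: four hex digits per 16-bit character
// value (assumed format, chosen only for this illustration).
std::wstring ToHexText(const wchar_t* text, std::size_t len)
{
    static const wchar_t kDigits[] = L"0123456789ABCDEF";
    std::wstring hex;
    hex.reserve(len * 4);
    for (std::size_t i = 0; i < len; ++i) {
        const unsigned int v = static_cast<unsigned int>(text[i]) & 0xFFFFu;
        hex.push_back(kDigits[(v >> 12) & 0xF]);
        hex.push_back(kDigits[(v >> 8) & 0xF]);
        hex.push_back(kDigits[(v >> 4) & 0xF]);
        hex.push_back(kDigits[v & 0xF]);
    }
    return hex;
}

void PrintingDemo()
{
    const wchar_t data[] = L"ABC\0DEF";
    assert(StreamAsPointer(data) == L"ABC");   // the tail is silently dropped
    assert(ToHexText(data, 7) == L"0041004200430000004400450046");
}
```

The hex text contains no zero characters, so it can be streamed as an ordinary string without losing anything.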

Conclusion

The project zip file BinBstr.zip attached to this article includes the source code of the presented CBinBstr class and a test program. I am interested in any opinions and new ideas about this implementation.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
