Xerces is a powerful validating XML parser that supports both DOM and SAX. It’s written in a simple subset of C++, and designed to be portable across the greatest possible number of platforms. For a number of reasons, the strings used in Xerces are zero-terminated 16-bit integer arrays, and data tends to be passed around by pointers. The responsibility for managing the lifetime of the DOM data passed around is usually Xerces’, but not always. Some types must always be released explicitly, while for others, this is optional.
In other words, this is a job for the RAII idiom. Alas, we can't reach for our boost::shared_ptr [1] or std::auto_ptr, since Xerces has its own memory manager, and when Xerces creates an object for you, it is not guaranteed to be safe to simply call delete. Instead, you must call the object’s release() function.
Something like this would probably do the job:
class auto_xerces_ptr
{
    DOMNode* item_;
public:
    auto_xerces_ptr(DOMNode* i)
        : item_(i)
    {}
    ~auto_xerces_ptr()
    {
        item_->release();
    }
    DOMNode* get()
    {
        return item_;
    }
};
...
auto_xerces_ptr domDocument(parser->adoptDocument());
// get() returns a DOMNode*, so cast back to reach DOMDocument members
static_cast<DOMDocument*>(domDocument.get())->getDocumentElement();
...
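To see the idiom at work without linking against Xerces, here is a minimal sketch that swaps DOMNode for a hypothetical stand-in, FakeNode, whose release() merely counts calls. The names FakeNode, auto_fake_ptr and releases_after_scope are made up for illustration; only the shape of the wrapper mirrors the class above.

```cpp
#include <cassert>

// Hypothetical stand-in for a Xerces DOM node: release() only counts
// calls, so the sketch runs without Xerces.
struct FakeNode
{
    int released;
    FakeNode() : released(0) {}
    void release() { ++released; }
};

// Same shape as the wrapper above, but holding a FakeNode.
class auto_fake_ptr
{
    FakeNode* item_;
    auto_fake_ptr(const auto_fake_ptr&);            // not copyable
    auto_fake_ptr& operator=(const auto_fake_ptr&); // not assignable
public:
    explicit auto_fake_ptr(FakeNode* i) : item_(i) {}
    ~auto_fake_ptr() { item_->release(); }
    FakeNode* get() { return item_; }
};

// Returns how often release() was called on a node held for one scope.
int releases_after_scope()
{
    FakeNode node;
    {
        auto_fake_ptr guard(&node);
        assert(guard.get() == &node);
        assert(node.released == 0);   // still held inside the scope
    }                                 // guard's destructor runs here
    return node.released;
}
```

Calling releases_after_scope() should report exactly one release, performed by the destructor when the guard goes out of scope.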
However, while the DOMNode class serves as the base class for every class that needs to be released, most of its derived classes do not need to be released explicitly. (See the Xerces documentation for the full list.) While they can usually be released without ill effects, it’s probably safest to avoid releasing objects that are already looked after elsewhere. Basically, if the object has an owner, we should leave it alone. So let’s amend that destructor a bit, and add some extra safety and helpfulness.
~auto_xerces_ptr()
{
    xerces_release();
}
void xerces_release()
{
    if ((0 != item_) && (0 == item_->getOwnerDocument()))
    {
        item_->release();
        item_ = 0;
    }
}
DOMNode* yield()
{
    DOMNode* temp = item_;
    item_ = 0;
    return temp;
}
As you can see, I've made a function to release the object explicitly, should you wish to do so, with some sanity checking, and a function to give up the held pointer. Because nomenclature can never be simple and common, I've chosen to call the releasing function xerces_release() rather than simply release(), because std::auto_ptr, which is a quite well-known RAII utility class, also has a function called release(). That function, however, doesn't release the memory, as the Xerces one does, but its hold on the data, like my function yield() above. Without looking at the actual implementation, someone seeing an auto_xerces_ptr::release() function being called in the code might think it does a Xerces DOMNode::release(), or that it does the equivalent of std::auto_ptr::release(). Rather than risk that sort of confusion, I've opted for the verbose.
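The difference between the two functions is easiest to see side by side. The sketch below again uses a hypothetical FakeNode stand-in (with an owner pointer playing the role of getOwnerDocument()) rather than real Xerces types, and the three helper functions are invented purely for the demonstration.

```cpp
#include <cassert>

// Hypothetical stand-in for DOMNode: owner plays the role of
// getOwnerDocument(), and release() only counts calls.
struct FakeNode
{
    FakeNode* owner;
    int released;
    explicit FakeNode(FakeNode* o = 0) : owner(o), released(0) {}
    FakeNode* getOwnerDocument() const { return owner; }
    void release() { ++released; }
};

class auto_fake_ptr
{
    FakeNode* item_;
    auto_fake_ptr(const auto_fake_ptr&);            // not copyable
    auto_fake_ptr& operator=(const auto_fake_ptr&); // not assignable
public:
    explicit auto_fake_ptr(FakeNode* i) : item_(i) {}
    ~auto_fake_ptr() { xerces_release(); }
    void xerces_release()
    {
        if ((0 != item_) && (0 == item_->getOwnerDocument()))
        {
            item_->release();
            item_ = 0;
        }
    }
    FakeNode* yield()
    {
        FakeNode* temp = item_;
        item_ = 0;
        return temp;
    }
};

// An owned node is left alone when the guard is destroyed.
int releases_when_owned()
{
    FakeNode document;
    FakeNode child(&document);
    { auto_fake_ptr guard(&child); }
    return child.released;
}

// After yield() the guard holds nothing, so nothing is released.
int releases_after_yield()
{
    FakeNode orphan;
    {
        auto_fake_ptr guard(&orphan);
        FakeNode* raw = guard.yield();
        assert(raw == &orphan);
    }
    return orphan.released;
}

// An explicit xerces_release() is not repeated by the destructor.
int releases_after_explicit_call()
{
    FakeNode orphan;
    {
        auto_fake_ptr guard(&orphan);
        guard.xerces_release();
    }
    return orphan.released;
}
```

In this sketch an owned node and a yielded node are never released by the guard, while an orphan released explicitly is released exactly once.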
Now, that’s all fine and dandy, but it doesn't help with the biggest Xerces memory leaker: the strings. The Xerces type XMLCh is a UTF-16 code unit, and there is a helpful class, XMLString, to help you convert between XMLCh* and other formats, particularly char*, and to copy these strings. We don't have to worry about any strings we have given to a Xerces object, since these are well managed internally. However, we must be wary when making copies with the XMLString::replicate and XMLString::transcode functions, as they create strings we are responsible for, and which we must release with a call to the XMLString::release function.
...
const XMLCh* s1 = pNode1->getNodeValue();
XMLCh* s2 = XMLString::replicate(s1);
...
XMLString::release(&s2); // release takes the address of the pointer
char* s3 = XMLString::transcode(s1);
...
XMLString::release(&s3);
std::string s4 = XMLString::transcode(s1); // leaks: the transcoded buffer is never released
Takes you back, doesn't it? Just like the olden days, before std::string (and TString, and CString and …) when strings were pure C like K&R intended. [shudder]
So, that’s just another couple of classes to write, right? One to manage XMLCh* and one to manage char*. Let’s call them auto_xerces_XMLCH_ptr and auto_xerces_char_ptr… No, scrap that, that’s bad design. Instead, let’s extend the auto_xerces_ptr to handle multiple types. In other words, let’s make it a template class:
template <typename T>
class auto_xerces_ptr
{
    T* item_;
public:
    auto_xerces_ptr(T* i)
        : item_(i)
    {}
    ~auto_xerces_ptr()
    {
        item_->release();
        ...
Hang on, that won't work; there’s no release() member function for char. If the data type is an XMLCh or a char, we must call XMLString::release; otherwise we should call the data object’s member function. Can we have an internal releasing function (let’s call it do_release) and overload it? Well, not quite:
template <typename T>
class auto_xerces_ptr
{
    void do_release(T* i)
    ...
    void do_release(char* i)
Here, the compiler will complain that for an auto_xerces_ptr<char> there are two definitions of void do_release(char* i). However, you can achieve the desired functionality through template specialisation, where you tell the compiler that for a certain template type, it should use a specialised function (or class, in the case of class templates) rather than the generic one.
template <typename T>
class auto_xerces_ptr
{
    // Not copyable: two owners would release the same object twice.
    auto_xerces_ptr(const auto_xerces_ptr&);
    auto_xerces_ptr& operator=(const auto_xerces_ptr&);

    // Generic case: a DOM object, released only if no document owns it.
    static void do_release(T*& item)
    {
        if (0 == item->getOwnerDocument())
            item->release();
    }

    T* item_;
public:
    auto_xerces_ptr()
        : item_(0)
    {}
    explicit auto_xerces_ptr(T* i)
        : item_(i)
    {}
    ~auto_xerces_ptr()
    {
        xerces_release();
    }
    void operator=(T* i)
    {
        assign(i);
    }
    void xerces_release()
    {
        if (!is_released())
        {
            do_release(item_);
            item_ = 0;
        }
    }
    T* yield()
    {
        T* tempItem = item_;
        item_ = 0;
        return tempItem;
    }
    void assign(T* i)
    {
        xerces_release();
        item_ = i;
    }
    T* get()
    {
        return item_;
    }
    bool is_released() const
    {
        return (0 == item_);
    }
};

// Specialisations for the two string types. They are written at namespace
// scope, after the class, because many compilers reject an explicit
// specialisation placed inside the class body.
template <>
inline void auto_xerces_ptr<char>::do_release(char*& item)
{
    XMLString::release(&item);
}
template <>
inline void auto_xerces_ptr<XMLCh>::do_release(XMLCh*& item)
{
    XMLString::release(&item);
}
auto_xerces_ptr<DOMDocument> domDocument(parser->adoptDocument());
...
const XMLCh* s1 = pNode1->getNodeValue();
auto_xerces_ptr<XMLCh> s2(XMLString::replicate(s1));
...
auto_xerces_ptr<char> s3(XMLString::transcode(s1));
...
std::string s4 = auto_xerces_ptr<char>(XMLString::transcode(s1)).get();
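If you want to try the specialisation mechanism in isolation, here is a small sketch that needs no Xerces at all. FakeNode, fake_string_release, releaser and release_log are hypothetical names invented for the demonstration; fake_string_release merely stands in for a free function that takes the address of the pointer, the way XMLString::release does.

```cpp
#include <cassert>
#include <string>

// Records which release path ran, so the dispatch can be observed.
inline std::string& release_log()
{
    static std::string log;
    return log;
}

// Hypothetical stand-in for a DOM object with a release() member.
struct FakeNode
{
    void release() { release_log() += "member "; }
};

// Hypothetical stand-in for a string release function taking char**.
inline void fake_string_release(char** p)
{
    release_log() += "string ";
    *p = 0;
}

template <typename T>
class releaser
{
public:
    // Generic version: ask the object to release itself.
    static void do_release(T*& item) { item->release(); }
};

// Explicit specialisation of the member function for T = char,
// written at namespace scope after the class template.
template <>
inline void releaser<char>::do_release(char*& item)
{
    fake_string_release(&item);
}

// Runs both paths and returns the order in which they were taken.
std::string dispatch_demo()
{
    release_log().clear();
    FakeNode node;
    FakeNode* pn = &node;
    releaser<FakeNode>::do_release(pn);   // generic member-call path
    char buffer[] = "abc";
    char* pc = buffer;
    releaser<char>::do_release(pc);       // specialised string path
    assert(pc == 0);                      // the stand-in nulls the pointer
    return release_log();
}
```

The generic body is never instantiated for char, which is why the missing release() member on char no longer upsets the compiler.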
There it is, code completed. We don't even have to worry about accidentally using it to wrap a string that is pointing to element data, since those are given as const XMLCh*, and the compiler will complain that there is no constructor for auto_xerces_ptr that takes a const pointer. Take it for a spin and see if it’s useful for you, and let me know what you think.
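The const protection can be checked at compile time. The sketch below assumes a C++11 compiler (for static_assert and std::is_constructible), which is newer than anything the class itself needs, and uses a hypothetical cut-down holder template with the same explicit T* constructor rather than the full auto_xerces_ptr.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical cut-down wrapper with the same constructor signature.
template <typename T>
class holder
{
    T* item_;
public:
    explicit holder(T* i) : item_(i) {}
    T* get() { return item_; }
};

// A mutable pointer is accepted...
static_assert(std::is_constructible<holder<char>, char*>::value,
              "holder<char> should accept char*");
// ...but a pointer to const is not, because const char* does not
// convert to char*.
static_assert(!std::is_constructible<holder<char>, const char*>::value,
              "holder<char> should reject const char*");

// Run-time view of the same fact, handy for a quick check.
bool rejects_const_pointer()
{
    return !std::is_constructible<holder<char>, const char*>::value;
}
```

Note that this only guards the non-const instantiations; nothing stops someone from deliberately writing a wrapper over a const type.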
[1] Now also available as tr1::shared_ptr, and soon (at the time of writing) as std::shared_ptr.