Introduction
This article describes xmlValidator, a simple command-line
utility that validates XML files, using MSXML. It is written using the VOLE C++/COM Automation driver library,
allowing the application to be extremely succinct without binding it to a
particular compiler (vendor).
Background
When working with a client recently, we had a requirement to validate the
correctness of several hundred XML files, in verifying the configuration of an
enterprise Java system. Naturally, we did not want to do this manually, so I
knocked together the tool described in this article,
xmlValidator, in just a few minutes, and we were able to run
the validation in batch mode.
The application has very simple functionality: open and parse the XML file
passed as its single command-line argument. If the XML file can be opened and is
valid, the application follows the "Rule of Silence": it emits nothing and
returns EXIT_SUCCESS. If the XML file cannot be opened, or is not
XML, or has errors, then the application emits a description of the error and
its location, and returns EXIT_FAILURE.
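The success/failure contract just described can be sketched without any XML parser at all. In this minimal sketch, ParseOutcome, describe() and report() are hypothetical names invented for illustration; they are not part of xmlValidator, VOLE or MSXML.

```cpp
#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical summary of a parse attempt (not a VOLE or MSXML type).
struct ParseOutcome
{
    bool        succeeded;
    long        line;
    long        linePos;
    std::string reason;
};

// Builds the one-line description of a failed parse.
inline std::string describe(ParseOutcome const& outcome)
{
    std::ostringstream oss;

    oss << "Parse error at (" << outcome.line << ", " << outcome.linePos
        << "): " << outcome.reason;

    return oss.str();
}

// Rule of Silence: write nothing on success; describe the error on failure.
// The return value is intended to be used as the process exit code.
inline int report(ParseOutcome const& outcome, std::ostream& os)
{
    if(outcome.succeeded)
    {
        return EXIT_SUCCESS;
    }

    os << describe(outcome) << '\n';

    return EXIT_FAILURE;
}
```

Because a valid file produces no output, a batch run over several hundred files shows only the failures, and the exit code lets a calling script count or stop on them.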
Because we were on a Windows system, we decided to use Microsoft's XML
parser, MSXML. MSXML is a COM component. There are several ways to use it from a
C++ program. You can, if you want the heartache, program directly to the COM
interfaces - but that's a lot of work. It's a lot easier to use a
wrapper library. One such library is VOLE, which is a compiler-independent,
open-source project that I released earlier this year. By using
VOLE, we were able to effect all the COM operations - create
the MSXML XML Document object, cause it to parse an XML file, and elicit parse
error information from it - in just six lines of code:
object xmlDocument = object::create("Msxml2.DOMDocument");
bool success = xmlDocument.invoke_method<bool>(L"load", argv[1]);
object parseError = xmlDocument.get_property<object>(L"ParseError");
std::string reason = parseError.get_property<std::string>(L"reason");
long line = parseError.get_property<long>(L"line");
long linePos = parseError.get_property<long>(L"linepos");
Implementation
Application Structure
The basic structure of the application is as follows:
int main(int argc, char** argv)
{
// Strategy:
//
// - Check arguments
// - Initialise the COM libraries
// - Declare VOLE components to be "use"d
// - Create an instance of the MSXML document object
// - Load the XML
// - If succeeded, return success code to outside world
// - If failed, elicit parsing error details and display
. . .
}
Includes
Naturally, we need to ensure we have all the requisite #includes. As well as
including the main VOLE header file, vole/vole.hpp, we also include the header
files for two components from the STLSoft libraries, and the requisite
standard C and C++ header files:
#include <vole/vole.hpp> // for VOLE
#include <comstl/util/initialisers.hpp> // for comstl::com_initialiser
#include <winstl/error/error_desc.hpp> // for winstl::error_desc
#include <iostream> // for std::cout, std::cerr
#include <string> // for std::string
#include <stdlib.h> // for EXIT_SUCCESS, EXIT_FAILURE
Step 1: Check arguments
This is pretty boilerplate stuff:
if(2 != argc)
{
std::cerr << "USAGE: xmlValidator <xml-file>" << std::endl;
}
else try
{
. . .
}
catch( . . . )
{
. . .
}
Step 2: Initialize the COM libraries
This is done using the first of the STLSoft components, com_initialiser, from
the COMSTL sub-project. It ensures that the initialization and
un-initialization of the COM libraries is handled appropriately.
We create a local instance of comstl::com_initialiser, which employs
internally-initialized RAII to initialize the COM libraries in its constructor,
and un-initialize them (if successfully initialized) in its destructor.
comstl::com_initialiser coinit;
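The RAII pattern described here can be sketched in portable C++. This is only an illustration of the idea, not the COMSTL implementation: fake_runtime is a hypothetical stand-in for the COM runtime (its initialise() and uninitialise() play the roles of CoInitialize() and CoUninitialize()), and scoped_initialiser is a hypothetical guard class.

```cpp
// Hypothetical stand-in for the COM runtime; it only counts calls.
struct fake_runtime
{
    int  active;     // number of outstanding successful initialisations
    bool fail_next;  // when true, initialise() reports failure

    fake_runtime()
        : active(0)
        , fail_next(false)
    {}

    bool initialise()
    {
        if(fail_next)
        {
            return false;
        }
        ++active;
        return true;
    }

    void uninitialise()
    {
        --active;
    }
};

// Hypothetical RAII guard: initialises in its constructor and, only if
// that succeeded, un-initialises in its destructor.
class scoped_initialiser
{
public:
    explicit scoped_initialiser(fake_runtime& rt)
        : m_rt(rt)
        , m_initialised(rt.initialise())
    {}

    ~scoped_initialiser()
    {
        if(m_initialised)
        {
            m_rt.uninitialise();
        }
    }

    bool is_initialised() const
    {
        return m_initialised;
    }

private:
    scoped_initialiser(scoped_initialiser const&);             // not copyable
    scoped_initialiser& operator =(scoped_initialiser const&); // not assignable

    fake_runtime& m_rt;
    bool          m_initialised;
};
```

Whichever way control leaves the enclosing scope (a return or an exception), the destructor runs, so the un-initialization cannot be forgotten.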
Step 3: Declare the VOLE components to be used
As with most C++ libraries, the VOLE components are defined within a
namespace, vole. We use "using declarations" to save ourselves the eye-strain
(and finger-ache) of having to qualify each use of a VOLE component.
The two main public types provided by VOLE are object and collection.
vole::object is a generic wrapper for a COM server. vole::collection is a
generic wrapper for a COM "collection", and provides STL-compatible iterators
for enumerating the collection's elements. vole::collection is not needed in
this utility, but I plan to write a follow-up article illustrating its use. If
you can't wait for that, feel free to check out the examples here.
using vole::object;
using vole::of_type;
vole::of_type is a helper function template for old compilers (specifically
VC++ 6) that have problems with the standard VOLE syntax usable by all modern
compilers. That will be explained in the coming sections. We'll discriminate
between the alternate forms shown in the following code using the
XMLVALIDATOR_USE_OLD_SYNTAX pre-processor symbol, defined as follows:
#if defined(STLSOFT_COMPILER_IS_MSVC) && \
_MSC_VER == 1200
# define XMLVALIDATOR_USE_OLD_SYNTAX
#endif
Step 4: Create the MSXML server and wrap it.
This is very simple, using the static method vole::object::create(), as
follows:
// Create an instance of the MSXML document object. We use the static
// object::create() method, which can take either a CLSID, a
// string-form of a CLSID, or, as in this case, a ProgId. If it fails, a
// vole_exception will be thrown.
object xmlDocument = object::create("Msxml2.DOMDocument");
This method has three overloads, allowing creation via a CLSID, a ProgId, or
the string-form of a CLSID (e.g. "{F6D90F11-9C73-11D3-B32E-00C04F990BB4}").
Each overload has two additional defaulted parameters, with which you can
specify the creation context (e.g. CLSCTX_ALL) and the "coercion level" - the
degree of effort with which returned values will be coerced from the
Automation type VARIANT to C++ types. Neither of these two will feature
further in this article.
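To make the difference between a ProgId and the string-form of a CLSID concrete, here is a small portable sketch that recognises the latter purely by its shape. The function looks_like_clsid_string() is a hypothetical helper written for this illustration; it is not part of VOLE, and it makes no claim about how VOLE itself tells the two kinds of string apart.

```cpp
#include <cctype>
#include <string>

// Returns true if s has the textual shape of a CLSID: an opening brace,
// hexadecimal digits grouped 8-4-4-4-12 and separated by hyphens, and a
// closing brace (38 characters in total).
inline bool looks_like_clsid_string(std::string const& s)
{
    static const char pattern[] = "{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}";

    if(s.size() != sizeof(pattern) - 1)
    {
        return false;
    }

    for(std::string::size_type i = 0; i != s.size(); ++i)
    {
        if('x' == pattern[i])
        {
            // A digit position: any hexadecimal digit is acceptable.
            if(!std::isxdigit(static_cast<unsigned char>(s[i])))
            {
                return false;
            }
        }
        else if(s[i] != pattern[i])
        {
            // A punctuation position: must match the pattern exactly.
            return false;
        }
    }

    return true;
}
```

With this check, the brace-delimited example from the text is recognised as a CLSID string, while "Msxml2.DOMDocument" is not, and so would be treated as a ProgId.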
Step 5: Load the XML
Once again, this is a very simple operation,
involving one line:
bool success =
xmlDocument.invoke_method<bool>(L"load", argv[1]);
Unfortunately for users of Visual C++ 6.0, this syntax makes the compiler
have a cow. This is where the vole::of_type() function template comes in. It
is used for the sole purpose of providing a type-advisory to the
vole::object::invoke_method and vole::object::get_property method templates.
Hence, the actual code for step 5 is as follows:
#ifdef XMLVALIDATOR_USE_OLD_SYNTAX
bool success = xmlDocument.invoke_method(of_type<bool>(), L"load", argv[1]);
#else
bool success = xmlDocument.invoke_method<bool>(L"load", argv[1]);
#endif
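One way such a type-advisory can work is tag dispatch: the helper returns an empty object whose type carries the desired return type, so the compiler can deduce it from an ordinary function argument instead of from an explicit template argument on a member call. The sketch below shows the technique in portable C++; return_type_tag and fake_object are hypothetical names, and this is not VOLE's actual implementation.

```cpp
#include <sstream>
#include <string>

// Empty tag type whose only job is to carry T to the callee.
template <typename T>
struct return_type_tag
{};

// Helper so that callers can write of_type<long>() instead of naming the tag.
template <typename T>
return_type_tag<T> of_type()
{
    return return_type_tag<T>();
}

// Hypothetical stand-in for an Automation object holding one textual value.
class fake_object
{
public:
    explicit fake_object(std::string const& value)
        : m_value(value)
    {}

    // Modern form: T is supplied explicitly, as in get_property<long>().
    template <typename T>
    T get_property() const
    {
        std::istringstream iss(m_value);
        T                  result = T();

        iss >> result;

        return result;
    }

    // Old-compiler form: T is deduced from the (otherwise unused) tag.
    template <typename T>
    T get_property(return_type_tag<T>) const
    {
        return get_property<T>();
    }

private:
    std::string m_value;
};
```

Both forms yield the same result; the tag form merely moves the type from the template argument list into the function argument list, which is the part older compilers could cope with.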
Step 6: Parsing success
This is very simple. All we do is return EXIT_SUCCESS.
if(success)
{
return EXIT_SUCCESS;
}
Step 7: Parsing failure
This is the meat of the program. If parsing fails, we need to elicit from the
XML Document instance its ParseError property (itself an Automation object),
and then elicit from it the details of the error. All the values are obtained
from properties, via the vole::object::get_property method templates.
else
{
object parseError = xmlDocument.get_property<object>(L"ParseError");
std::string reason = parseError.get_property<std::string>(L"reason");
long line = parseError.get_property<long>(L"line");
long linePos = parseError.get_property<long>(L"linepos");
std::cout << "Parse error at (" << line << ", " << linePos << "): " << reason <<
std::endl;
}
VOLE provides support for returning values of other COM objects, in the form
of vole::object, and for most common C++ types, including long and
std::string, so all the above code just works. If you wish to obtain a type
that is not supported, you can specialize the vole::com_return_traits traits
class template; this is outside the scope of this discussion, but will be
covered in a future article.
Just as we saw with the method call, the syntax shown above causes
consternation with Visual C++ 6, but there is an alternate syntax that works
with all compilers. So, once again, the application code for step 7 actually
contains the following:
else
{
#ifdef XMLVALIDATOR_USE_OLD_SYNTAX
object parseError = xmlDocument.get_property(of_type<object>(), L"ParseError");
std::string reason = parseError.get_property(of_type<std::string>(), L"reason");
long line = parseError.get_property(of_type<long>(), L"line");
long linePos = parseError.get_property(of_type<long>(), L"linepos");
#else
object parseError = xmlDocument.get_property<object>(L"ParseError");
std::string reason = parseError.get_property<std::string>(L"reason");
long line = parseError.get_property<long>(L"line");
long linePos = parseError.get_property<long>(L"linepos");
#endif
std::cout << "Parse error at (" << line << ", " << linePos << "): " << reason << std::endl;
}
Handling errors
Because VOLE returns objects and values from its (method and property)
functions, it indicates errors by throwing exceptions derived from
vole::vole_exception. Thus, the last part of the application comprises two
catch clauses, as follows:
catch(vole::vole_exception &x)
{
std::cerr << "Validation failed: " << x.what() << ": " << winstl::basic_error_desc<char>(x.hr()) << std::endl;
}
catch(std::exception &x)
{
std::cerr << "Validation failed: " << x.what() << std::endl;
}
return EXIT_FAILURE;
The second is a generic clause that will catch all standard exceptions,
including std::bad_alloc. The first is more interesting. It catches
vole::vole_exception, which derives from the COMSTL exception class
comstl::com_exception, which has an hr() accessor that returns the COM error
code (HRESULT) associated with the error. This is then used with an
ANSI/multibyte specialization of the WinSTL class template basic_error_desc,
which is a helper Facade for the Win32 API function FormatMessage(). For
example, if we change the ProgId to, say, Msxml99999.DOMDocument, the program
prints out:
Validation failed: Could not create coclass: 800401f3, Invalid class string
which is a lot more useful than:
Validation failed: Could not create coclass
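The ordering of the two catch clauses matters: the more-derived exception type must be caught first, otherwise the generic clause would swallow it and the error code would be lost. The following portable sketch illustrates this with a hypothetical coded_exception class standing in for the COM exception type. It prints only the numeric code in hexadecimal; the textual description that the real program obtains from the Win32 API is left out, since that requires Windows.

```cpp
#include <exception>
#include <iomanip>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical exception carrying a COM-style 32-bit status code.
class coded_exception
    : public std::runtime_error
{
public:
    coded_exception(std::string const& message, unsigned long hr)
        : std::runtime_error(message)
        , m_hr(hr)
    {}

    unsigned long hr() const
    {
        return m_hr;
    }

private:
    unsigned long m_hr;
};

// Runs op() and returns a description of any failure ("" if none).
template <typename F>
std::string describe_failure(F op)
{
    std::ostringstream oss;

    try
    {
        op();
    }
    catch(coded_exception const& x) // more-derived type first
    {
        oss << "Validation failed: " << x.what() << ": "
            << std::hex << std::setw(8) << std::setfill('0') << x.hr();
    }
    catch(std::exception const& x)  // everything else from the standard hierarchy
    {
        oss << "Validation failed: " << x.what();
    }

    return oss.str();
}

// Two sample operations used to exercise each clause.
inline void fail_with_code()
{
    throw coded_exception("Could not create coclass", 0x800401F3UL);
}

inline void fail_without_code()
{
    throw std::runtime_error("something else went wrong");
}
```

Swapping the two clauses would still compile, but every coded_exception would then be reported without its code.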
Note: VOLE is a new and still-developing library, and I've not yet completed
the actual implementation of the rich exception hierarchy. Thus, all the VOLE
exception types:
vole::vole_exception
vole::creation_exception
vole::invocation_exception
vole::type_conversion_exception
are currently just aliases for comstl::com_exception, so don't go trying
multiple catch clauses involving different VOLE exception types just yet. (Of
course, if you want to pitch in on the project to drive along this or any other
remaining issues, you'll be most welcome to do so here.)
Setting up the environment
To build the program, you'll need to have access to the VOLE
and STLSoft libraries. Both are open-source. Both use the modified BSD
license. The latest version of VOLE is 0.2.2,
and this requires STLSoft version 1.9.1 beta 47, or later.
Since both libraries are 100% header-only, setting up for their use involves
nothing more than downloading them and setting up the requisite environment
variables. I suggest the environment variables VOLE (e.g.
VOLE=C:\ThirdPartyLibs\VOLE\vole-0.2.2) and STLSOFT (e.g.
STLSOFT=C:\ThirdPartyLibs\STLSoft\stlsoft-1.9.1-beta47). Then you can
incorporate the include paths VOLE/include and STLSOFT/include either into your
project settings or on the command-line, as in:
C:\ThirdPartyTools\xmlValidator>cl -nologo -EHsc
-I%VOLE%/include -I%STLSOFT%/include -DWIN32 -D_CRT_SECURE_NO_DEPRECATE
..\xmlValidator.cpp ole32.lib oleaut32.lib
Using the xmlValidator tool
The following two simple XML files illustrate how easy the tool is to use.
First a correctly formed XML file.
good.xml:
<?xml version="1.0" encoding="UTF-8"?>
<good>
<no-problem-here />
</good>
Now a badly formed one.
bad.xml:
<?xml version="1.0" encoding="UTF-8"?>
<bad>
<problem-here>
</bad>
The following graphic shows the build command and the responses of the tool
to these two XML files:
More to come ...
I plan to write another article about VOLE soon, illustrating the
vole::collection class's ability to provide STL iterators over a COM
Collection's elements (via the IEnumXXXX protocol).
Your comments/criticisms/feature requests for VOLE are welcome via the VOLE
project home here.
Your comments/criticisms/feature requests for STLSoft are welcome via the
STLSoft newsgroup (here), which is kindly provided by Digital Mars, providers
of free high-quality C/C++/D compilers.
History
7th April 2007: First version