CExtensibleMarkupLanguageDocument
$Revision: 44 $
Description
This class is the mother of all XML classes. It holds things like
the element tree and the settings that apply to the entire document.
It is designed to help application developers handle XML-like data.
It will parse (and construct) well-formed, standalone XML documents.
It also lets you loosen the parsing rules when dealing with XML from
sources you can't control.
Construction
CExtensibleMarkupLanguageDocument()
CExtensibleMarkupLanguageDocument( const CExtensibleMarkupLanguageDocument& source )
-
Creates another CExtensibleMarkupLanguageDocument.
Methods
BOOL AddCallback( const char * element_name, XML_ELEMENT_CALLBACK callback, void * callback_parameter )
-
Allows you to specify a function (and a parameter for that function) that
will be called when an element with a tag matching
element_name
has been successfully parsed. The element_name
comparison
is not case sensitive.
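The registration and matching described for AddCallback() can be pictured with a small standalone sketch. It does not use WFC: Element, CallbackRegistry, to_lower and count_hits are hypothetical names invented for illustration, and only the callback shape (a parameter pointer plus an element pointer) follows the Example section of this page. Names are lower-cased on both sides, which is one simple way to get a comparison that ignores case.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed element (not the WFC element class).
struct Element
{
    std::string tag;
};

// Same shape as the callbacks on this page: a user parameter plus the element.
using ElementCallback = void (*)( void * parameter, Element * element_p );

static std::string to_lower( std::string text )
{
    std::transform( text.begin(), text.end(), text.begin(),
        []( unsigned char c ) { return static_cast<char>( std::tolower( c ) ); } );
    return text;
}

class CallbackRegistry
{
public:
    // Remembers the callback under a lower-cased copy of the element name.
    bool Add( const std::string& element_name, ElementCallback callback, void * parameter )
    {
        if ( element_name.empty() || callback == nullptr )
        {
            return false;
        }

        m_Entries.push_back( { to_lower( element_name ), callback, parameter } );
        return true;
    }

    // Calls every callback registered for the element's tag and returns how many ran.
    int Execute( Element * element_p ) const
    {
        assert( element_p != nullptr );

        const std::string tag = to_lower( element_p->tag );
        int number_called = 0;

        for ( const Entry& entry : m_Entries )
        {
            if ( entry.name == tag )
            {
                entry.callback( entry.parameter, element_p );
                ++number_called;
            }
        }

        return number_called;
    }

private:
    struct Entry
    {
        std::string     name;
        ElementCallback callback;
        void *          parameter;
    };

    std::vector<Entry> m_Entries;
};

// Example callback: treats the parameter as an int counter and increments it.
void count_hits( void * parameter, Element * )
{
    ++( *static_cast<int *>( parameter ) );
}
```

Registering count_hits for "Stanza" and then executing the callbacks for an element tagged "STANZA" runs it once; an element tagged "poem" runs nothing.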
void Append( const CExtensibleMarkupLanguageDocument& source )
-
Appends the elements of
source
to this document.
void Copy( const CExtensibleMarkupLanguageDocument& source )
-
Copies the contents of
source
to this object. It will not
copy the callback functions as this may cause unintentional results.
void CopyCallbacks( const CExtensibleMarkupLanguageDocument& source )
-
Copies the callback functions from
source
to this object.
If you are a careful programmer, this is perfectly safe to do. Generally
speaking, you shouldn't have to copy the callbacks of source
because parsing should have already taken place.
DWORD CountElements( const CString& element_name ) const
-
Counts the number of elements.
element_name
takes much the
same form as used in the GetElement() method.
Consider the following
XML
snippet:
<Southpark>
<Characters>
<Boy>Cartman</Boy>
<Boy>Kenny</Boy>
<Boy>Kyle</Boy>
<Boy>Stan</Boy>
</Characters>
<Characters>
<Girl>Wendy</Girl>
<Boy>Chef</Boy>
<Girl>Ms. Ellen</Girl>
</Characters>
</Southpark>
If you wanted to know how many "Boy" elements there
are in the first set of characters, you would use an element name
of "Southpark.Characters.Boy".
If you wanted to
know how many "Girl" elements there are in the second
set of characters, you would use this for element_name:
"Southpark.Characters(1).Girl"
void Empty( void )
-
Empties the contents of the document. The object is reset to an
initial state. All elements are deleted. All callbacks are deleted.
BOOL EnumerateCallbacks( DWORD& enumerator ) const
-
Initializes the
enumerator
in preparation for calling
GetNextCallback(). If there are
no callbacks (i.e. AddCallback() has
not been called), FALSE will be returned. If there are callbacks, TRUE
will be returned.
void ExecuteCallbacks( CExtensibleMarkupLanguageElement * element_p )
-
This is generally called during the parsing of a document by the
CExtensibleMarkupLanguageElement
that just parsed itself. However, you can pull an element out of the
document and call ExecuteCallbacks() yourself.
void GetAutomaticIndentation( BOOL& automatically_indent, DWORD& indentation_level, DWORD& indent_by ) const
-
Retrieves the automatic indentation parameters. Automatic indentation does
nothing but make the XML output look pretty. It makes it easier for humans
to read. If your application is sensitive to white space, don't use automatic
indentation.
DWORD GetConversionCodePage( void ) const
-
Returns the code page that will be used for conversion from UNICODE.
CExtensibleMarkupLanguageElement * GetElement( const CString& element_name ) const
-
Searches and finds the specified element in the document. The
element_name
is in the form of "Parent(0).Child(0)".
Consider the following
XML snippet:
<Southpark>
<Characters>
<Boy>Cartman</Boy>
<Boy>Kenny</Boy>
<Boy>Kyle</Boy>
<Boy>Stan</Boy>
</Characters>
<Characters>
<Girl>Wendy</Girl>
<Boy>Chef</Boy>
<Girl>Ms. Ellen</Girl>
</Characters>
</Southpark>
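To retrieve the element for Cartman, element_name
should
be "Southpark.Characters.Boy". A name without an index means the first
match, so this is the same as "Southpark(0).Characters(0).Boy(0)". If you want Ms. Ellen (even
though she doesn't play for the home team) you would use
"Southpark.Characters(1).Girl(1)"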
void GetEncoding( CString& encoding ) const
-
Returns the encoding of the document.
const CExtensibleMarkupLanguageEntities& GetEntities( void ) const
-
Returns a const reference to the entities for this document.
Basically all you can do with it is enumerate the entries.
BOOL GetIgnoreWhiteSpace( void ) const
-
Returns whether or not the document will suppress the output
of elements that contain only space characters. This output
occurs when you call WriteTo().
BOOL GetNextCallback( DWORD& enumerator, CString& element_name, XML_ELEMENT_CALLBACK& callback, void *& callback_parameter )
-
Retrieves the next callback. It will return TRUE if the callback has been
retrieved or FALSE if you are at the end of the list. If FALSE is returned,
all parameters are set to NULL. Callbacks are added via the
AddCallback() method.
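The enumerate-then-get-next pattern used by EnumerateCallbacks() and GetNextCallback() might look like the following standalone sketch. CallbackList, its members and on_stanza are hypothetical names, and an index is assumed as the enumerator; the real class may store its callbacks differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Element; // hypothetical element type, only used through a pointer here

using ElementCallback = void (*)( void * parameter, Element * element_p );

class CallbackList
{
public:
    void Add( const std::string& element_name, ElementCallback callback, void * parameter )
    {
        assert( callback != nullptr );
        m_Entries.push_back( { element_name, callback, parameter } );
    }

    // Resets the enumerator. Returns false when there is nothing to enumerate.
    bool Enumerate( std::size_t& enumerator ) const
    {
        enumerator = 0;
        return ! m_Entries.empty();
    }

    // Copies out the next entry and advances the enumerator.
    // At the end of the list the outputs are cleared and false is returned.
    bool GetNext( std::size_t& enumerator, std::string& element_name,
                  ElementCallback& callback, void *& parameter ) const
    {
        if ( enumerator >= m_Entries.size() )
        {
            element_name.clear();
            callback  = nullptr;
            parameter = nullptr;
            return false;
        }

        const Entry& entry = m_Entries[ enumerator ];
        ++enumerator;

        element_name = entry.name;
        callback     = entry.callback;
        parameter    = entry.parameter;
        return true;
    }

private:
    struct Entry
    {
        std::string     name;
        ElementCallback callback;
        void *          parameter;
    };

    std::vector<Entry> m_Entries;
};

// Example callback that does nothing; it only gives the list something to hold.
void on_stanza( void *, Element * )
{
}
```

A caller would call Enumerate() once and then loop on GetNext() until it returns false, which mirrors the TRUE/FALSE contract described above.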
DWORD GetNumberOfElements( void ) const
-
Returns the number of elements in this document.
TCHAR GetParentChildSeparatorCharacter( void ) const
-
Returns the character that will be used to separate parent element names
from child element names in the GetElement() method.
DWORD GetParseOptions( void ) const
-
Returns the parse options. This is a bit field (32 bits wide) that
controls the sloppiness of the parser.
void GetParsingErrorInformation( CString& tag_name, CParsePoint& beginning, CParsePoint& error_location, CString * error_message = NULL ) const
-
If
Parse()
returns FALSE, you can call this method to find out
interesting information as to where the parse failed. This will help you
correct the
XML.
If
error_message
is not NULL, it will be filled
with a human readable error message.
The beginning
parameter is filled with the location in
the document where the element began.
The error_location
parameter is filled with the location
where the parser encountered the fatal problem.
CExtensibleMarkupLanguageElement * GetRootElement( void ) const
-
Returns the pointer to the ultimate parent element. This will be the element
that contains the data from the
<?xml ... ?>
line.
void GetVersion( CString& version ) const
-
Returns the version of the document.
DWORD GetWriteOptions( void ) const
-
Returns the writing options. This is a bit field (32 bits wide) that
controls how the
XML
documents are written.
BOOL IsStandalone( void ) const
-
Returns TRUE if this is a standalone document.
BOOL Parse( const CDataParser& source )
-
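Parses the data from
source.
This will construct
the document tree. It returns TRUE if the data parsed successfully. If it
returns FALSE, call GetParsingErrorInformation() to find out where the parse failed.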
BOOL RemoveCallback( const char * element_name, XML_ELEMENT_CALLBACK callback, void * callback_parameter )
-
This will remove the specified callback from the list. All parameters
must match for the callback to be removed.
BOOL ResolveEntity( const CString& entity, CString& resolved_to ) const
-
This method will resolve the
entity
and put the result into resolved_to.
If the entity cannot be resolved, it will return FALSE.
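A lookup of this kind can be sketched with a plain map. This is a standalone illustration, not the WFC implementation: EntityTable and its members are hypothetical, entities are assumed to be written in full (ampersand, name, semicolon), and the table starts with the five entities that XML predefines. The allow_replacement argument loosely mirrors the idea behind the WFC_XML_ALLOW_REPLACEMENT_OF_DEFAULT_ENTITIES parse option.

```cpp
#include <cassert>
#include <map>
#include <string>

class EntityTable
{
public:
    // Starts with the five entities predefined by XML.
    EntityTable()
    {
        m_Entities[ "&amp;"  ] = "&";
        m_Entities[ "&apos;" ] = "'";
        m_Entities[ "&gt;"   ] = ">";
        m_Entities[ "&lt;"   ] = "<";
        m_Entities[ "&quot;" ] = "\"";
    }

    // Adds an entity. An existing name is replaced only when allow_replacement is true.
    bool Add( const std::string& entity, const std::string& text, bool allow_replacement = false )
    {
        assert( ! entity.empty() );

        if ( ! allow_replacement && m_Entities.count( entity ) != 0 )
        {
            return false;
        }

        m_Entities[ entity ] = text;
        return true;
    }

    // Puts the replacement text into resolved_to, or returns false if the entity is unknown.
    bool Resolve( const std::string& entity, std::string& resolved_to ) const
    {
        const auto found = m_Entities.find( entity );

        if ( found == m_Entities.end() )
        {
            resolved_to.clear();
            return false;
        }

        resolved_to = found->second;
        return true;
    }

private:
    std::map<std::string, std::string> m_Entities;
};
```

With this sketch, resolving "&lt;" succeeds and yields "<", while resolving an undeclared name such as "&nbsp;" fails and leaves resolved_to empty.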
void SetAutomaticIndentation( BOOL automatically_indent = TRUE, DWORD starting_column = 0, DWORD indent_by = 2 )
-
This will turn automatic indentation on or off.
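To show what the three parameters plausibly control, here is a minimal standalone writer. Node and write_node are hypothetical, and the layout rules (one element per line, children indented by indent_by columns more than their parent) are an assumption about what "pretty" output means, not a description of WriteTo(). Text is written as-is, with no escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical element: a name, optional text, and child elements.
struct Node
{
    std::string       name;
    std::string       text;
    std::vector<Node> children;
};

// Appends node (and its children) to out.
// With automatically_indent off, everything is written on one line with no padding.
void write_node( const Node& node, bool automatically_indent,
                 std::size_t column, std::size_t indent_by, std::string& out )
{
    assert( ! node.name.empty() );

    const std::string padding = automatically_indent ? std::string( column, ' ' ) : std::string();
    const std::string newline = automatically_indent ? "\n" : "";

    if ( node.children.empty() )
    {
        out += padding + "<" + node.name + ">" + node.text + "</" + node.name + ">" + newline;
        return;
    }

    out += padding + "<" + node.name + ">" + newline;

    for ( const Node& child : node.children )
    {
        write_node( child, automatically_indent, column + indent_by, indent_by, out );
    }

    out += padding + "</" + node.name + ">" + newline;
}
```

Note that the indented form adds white space between elements that was not in the tree, which is why indentation is a poor fit for applications that are sensitive to white space.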
BOOL SetConversionCodePage( DWORD new_page )
-
When you must convert from UNICODE to something else, this is the
code page that will be used. See the
WideCharToMultiByte()
Win32 API
for more information. If the code is run on a real operating system (NT), the
default code page is CP_UTF8.
If you are running on a piece of crap
(Windows 95), the default code page is CP_ACP.
void SetEncoding( LPCTSTR encoding )
-
Sets the encoding of the document. You will usually do this when you are about
to write the document.
BOOL SetIgnoreWhiteSpace( BOOL ignore_whitespace )
-
Tells the document whether or not to ignore text segments that contain
only space characters. It returns what the previous setting was.
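The set-and-return-previous behavior, together with a test for "only space characters", can be sketched as follows. WriterSettings, Filter and is_all_white_space are hypothetical names; treating space, tab, carriage return and line feed as white space follows the XML definition and is an assumption about what WFC counts as a space character. The starting value of false is also an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// True when text is empty or holds only XML white space (space, tab, CR, LF).
bool is_all_white_space( const std::string& text )
{
    return text.find_first_not_of( " \t\r\n" ) == std::string::npos;
}

class WriterSettings
{
public:
    bool GetIgnoreWhiteSpace() const
    {
        return m_IgnoreWhiteSpace;
    }

    // Stores the new setting and hands back the old one so the caller can restore it.
    bool SetIgnoreWhiteSpace( bool ignore_whitespace )
    {
        const bool previous = m_IgnoreWhiteSpace;
        m_IgnoreWhiteSpace  = ignore_whitespace;
        return previous;
    }

    // Returns the text segments that would be written under the current setting.
    std::vector<std::string> Filter( const std::vector<std::string>& segments ) const
    {
        std::vector<std::string> kept;

        for ( const std::string& segment : segments )
        {
            if ( ! m_IgnoreWhiteSpace || ! is_all_white_space( segment ) )
            {
                kept.push_back( segment );
            }
        }

        return kept;
    }

private:
    bool m_IgnoreWhiteSpace = false; // assumed default for this sketch
};
```

Returning the previous value lets a caller change the setting for one write and then put it back without having to call the getter first.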
BOOL SetParentChildSeparatorCharacter( TCHAR separator )
-
Allows you to specify the character that will separate parent and child
names in the GetElement() call.
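How a configurable separator might feed into a path lookup is sketched below. Nothing here is WFC code: Node, PathStep, parse_path and find_element are hypothetical, the tree is a bare-bones stand-in, and the rules (zero-based indexes, a missing "(n)" meaning "(0)") are inferred from the GetElement() examples on this page. Malformed paths are not validated.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, minimal element tree.
struct Node
{
    std::string name;
    std::vector<std::unique_ptr<Node>> children;

    Node * Add( const std::string& child_name )
    {
        children.push_back( std::unique_ptr<Node>( new Node ) );
        children.back()->name = child_name;
        return children.back().get();
    }
};

// One step of a path: an element name and a zero-based index.
struct PathStep
{
    std::string name;
    std::size_t index;
};

// Splits "Parent(0).Child(1)" into steps using the given separator character.
std::vector<PathStep> parse_path( const std::string& path, char separator )
{
    std::vector<PathStep> steps;
    if ( path.empty() ) return steps;

    std::size_t start = 0;
    while ( start <= path.size() )
    {
        std::size_t end = path.find( separator, start );
        if ( end == std::string::npos ) end = path.size();

        const std::string step = path.substr( start, end - start );
        const std::size_t open = step.find( '(' );

        PathStep parsed;
        parsed.name  = step.substr( 0, open );
        parsed.index = 0;

        // Accumulate the decimal digits that follow '(' (index stays 0 if there is no '(')
        for ( std::size_t i = ( open == std::string::npos ) ? step.size() : open + 1;
              i < step.size() && step[ i ] >= '0' && step[ i ] <= '9'; ++i )
        {
            parsed.index = parsed.index * 10 + static_cast<std::size_t>( step[ i ] - '0' );
        }

        steps.push_back( parsed );
        start = end + 1;
    }
    return steps;
}

// Follows the path down from the document node. Returns nullptr when a step has no match.
const Node * find_element( const Node& document, const std::string& path, char separator = '.' )
{
    const std::vector<PathStep> steps = parse_path( path, separator );
    if ( steps.empty() ) return nullptr;

    const Node * current = &document;
    for ( const PathStep& step : steps )
    {
        const Node * match = nullptr;
        std::size_t remaining = step.index;

        for ( const auto& child : current->children )
        {
            if ( child->name != step.name ) continue;
            if ( remaining == 0 ) { match = child.get(); break; }
            --remaining;
        }

        if ( match == nullptr ) return nullptr;
        current = match;
    }
    return current;
}
```

Changing the separator is useful when element names themselves contain a period: with '/' as the separator, the path "Southpark/Characters(1)/Boy" selects the same kind of element that "Southpark.Characters(1).Boy" would with the default.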
DWORD SetParseOptions( DWORD new_options )
-
Sets the parsing options. This allows you to customize the parser to
be as loose or as strict as you want. The default is to be as strict
as possible when parsing. SetParseOptions() returns the previous
options. Here are the current parse options that can be set:
WFC_XML_IGNORE_CASE_IN_XML_DECLARATION
- When set, this option
will allow uppercase letters in the XML declaration. For example:
<?XmL ?>
will be allowed even though it does not
conform to the
specification.
WFC_XML_ALLOW_REPLACEMENT_OF_DEFAULT_ENTITIES
- Though the
XML specification
doesn't talk about it, what should a parser do if default entities
are replaced? If you set this option, the parser will allow replacement
of the default entities. Here is a list of the default entities:
&amp;amp;
&amp;apos;
&amp;gt;
&amp;lt;
&amp;quot;
WFC_XML_FAIL_ON_ILL_FORMED_ENTITIES
- Not yet implemented.
It will allow the parser to ignore ill formed entities such as
<!ENTITY amp "&">
WFC_XML_IGNORE_ALL_WHITE_SPACE_ELEMENTS
- Tells the parser
to ignore elements (of type typeTextSegment
) that contain
nothing but white space characters. WARNING! If you use this option, it will
not be possible to reproduce that input file exactly. Elements that contain
nothing but white spaces will be deleted from the document.
WFC_XML_IGNORE_MISSING_XML_DECLARATION
- Tells the parser
to ignore the fact that the <?xml ?>
element is missing.
If it was not specified in the data stream, one will be automatically
added to the document. This is the default behavior.
WFC_XML_DISALLOW_MULTIPLE_ELEMENTS
- Tells the parser
not to allow multiple root elements in the document. The first rule (Rule 1)
of the
XML specification
says (like Connor MacLeod of the clan MacLeod) there can be only one
element in an XML document. That element can have a billion child elements
but there can be only one root element. If this option is set
(it is not set by default), the parser will strictly enforce this rule. This rule
really gets in the way of using XML for things like log files (where you
want to open the file, append a record to it and close the file).
WFC_XML_LOOSE_COMMENT_PARSING
- Tells the parser
to allow double dashes (--) to appear in comment tags.
WFC_XML_ALLOW_AMPERSANDS_IN_ELEMENTS
- Tells the parser
to allow &'s to appear in the contents of an element without being
a reference of some kind.
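Because the options form a bit field and the setter returns the previous value, a common pattern is to read the current options, OR in one more flag, and later restore the old value. The standalone sketch below shows that pattern. ParserSettings and the SKETCH_ constants are hypothetical, and the numeric values are invented for illustration; the real WFC_XML_ constants and their values come from the WFC headers. Starting from zero is also an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative flag values only. These are NOT the WFC_XML_* constants.
constexpr std::uint32_t SKETCH_LOOSE_COMMENT_PARSING = 1u << 0;
constexpr std::uint32_t SKETCH_ALLOW_AMPERSANDS      = 1u << 1;
constexpr std::uint32_t SKETCH_DISALLOW_MULTIPLE     = 1u << 2;

class ParserSettings
{
public:
    std::uint32_t GetParseOptions() const
    {
        return m_Options;
    }

    // Replaces the whole bit field and returns what was there before.
    std::uint32_t SetParseOptions( std::uint32_t new_options )
    {
        const std::uint32_t previous = m_Options;
        m_Options = new_options;
        return previous;
    }

    // True when every bit of flag is set in the current options.
    bool IsSet( std::uint32_t flag ) const
    {
        return ( m_Options & flag ) == flag;
    }

private:
    std::uint32_t m_Options = 0; // assumed starting point for this sketch
};
```

Since the setter replaces every bit, passing a single flag on its own clears all the others; combining it with the value from the getter keeps whatever was already set.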
void SetParsingErrorInformation( const CString& tag_name, const CParsePoint& beginning, const CParsePoint& error_location, LPCTSTR error_message = NULL )
-
This method is usually called by the element that cannot parse
itself. There is logic that prevents the information from being
overwritten by subsequent calls to SetParsingErrorInformation().
This means you can call SetParsingErrorInformation() as
many times as you want but only information from the first call
will be recorded (and reported via
GetParsingErrorInformation())
for each call to
Parse().
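The "first call wins" logic can be sketched with a flag that is cleared at the start of each parse. This is a standalone illustration with hypothetical names (ParseError, ErrorRecorder, BeginParse); a line number stands in for the parse-point locations, and how the real class resets its state is not documented here. Keeping the first report is useful because, when a nested element fails, the elements that enclose it will typically fail afterwards, and the earliest report is the one closest to the actual problem.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical error record; a line number stands in for the parse-point locations.
struct ParseError
{
    std::string tag_name;
    std::size_t line = 0;
    std::string message;
};

class ErrorRecorder
{
public:
    // Called at the start of every parse so a new first error can be recorded.
    void BeginParse()
    {
        m_Recorded = false;
        m_Error    = ParseError();
    }

    // Keeps only the first error reported since BeginParse(); later calls are ignored.
    void SetError( const std::string& tag_name, std::size_t line, const std::string& message )
    {
        if ( m_Recorded )
        {
            return;
        }

        m_Error.tag_name = tag_name;
        m_Error.line     = line;
        m_Error.message  = message;
        m_Recorded       = true;
    }

    // Copies out the recorded error. Returns false if nothing has been recorded.
    bool GetError( ParseError& error ) const
    {
        if ( ! m_Recorded )
        {
            return false;
        }

        error = m_Error;
        return true;
    }

private:
    bool       m_Recorded = false;
    ParseError m_Error;
};
```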
void SetStandalone( BOOL standalone )
-
Sets the standalone attribute of the document.
This is usually done just before you start writing the document.
void SetVersion( LPCTSTR version )
-
Sets the version of the document.
This is usually done just before you start writing the document.
DWORD SetWriteOptions( DWORD new_options )
-
Sets the writing options. This allows you to customize how the
XML is written.
The default is to be as strict as possible when writing.
SetWriteOptions() returns the previous
options. One option that can be set is WFC_XML_DONT_OUTPUT_XML_DECLARATION, which the example at the end of this page uses.
void WriteTo( CByteArray& destination )
-
Write the data to
destination
in
XML form.
Operators
CExtensibleMarkupLanguageDocument& operator = ( const CExtensibleMarkupLanguageDocument& source )
-
Calls Copy().
CExtensibleMarkupLanguageDocument& operator += ( const CExtensibleMarkupLanguageDocument& source )
-
Calls Append().
Example
#include <wfc.h>
#pragma hdrstop

// Reads the entire file into bytes. Returns FALSE if the file cannot be opened.
BOOL get_bytes( const CString& filename, CByteArray& bytes )
{
   WFCTRACEINIT( TEXT( "get_bytes()" ) );

   bytes.RemoveAll();

   CFile file;

   if ( file.Open( filename, CFile::modeRead ) == FALSE )
   {
      return( FALSE );
   }

   bytes.SetSize( file.GetLength() );
   file.Read( bytes.GetData(), bytes.GetSize() );
   file.Close();

   return( TRUE );
}

// Loads filename and parses it into document. Returns TRUE only if the parse succeeded.
BOOL parse_document( const CString& filename, CExtensibleMarkupLanguageDocument& document )
{
   WFCTRACEINIT( TEXT( "parse_document()" ) );

   CByteArray bytes;

   if ( get_bytes( filename, bytes ) != TRUE )
   {
      return( FALSE );
   }

   CDataParser parser;

   parser.Initialize( &bytes, FALSE );

   if ( document.Parse( parser ) == TRUE )
   {
      _tprintf( TEXT( "Parsed OK\n" ) );
      return( TRUE );
   }

   _tprintf( TEXT( "Can't parse document\n" ) );

   // Report the failure so the caller does not write out a document that did not parse
   return( FALSE );
}

// Called by the document each time a "stanza" element has been parsed.
void stanza_callback( void * parameter, CExtensibleMarkupLanguageElement * element_p )
{
   WFCTRACEINIT( TEXT( "stanza_callback()" ) );

   _tprintf( TEXT( "Got a stanza with %lu children\n" ), (DWORD) element_p->GetNumberOfChildren() );
}

int _tmain( int number_of_command_line_arguments, LPCTSTR command_line_arguments[] )
{
   WFCTRACEINIT( TEXT( "_tmain()" ) );

   CExtensibleMarkupLanguageDocument document;

   // Callbacks must be registered before parsing so they fire as elements are parsed
   document.AddCallback( TEXT( "stanza" ), stanza_callback, NULL );

   if ( parse_document( TEXT( "poem.xml" ), document ) == TRUE )
   {
      CByteArray bytes;

      // Write the document back out without the XML declaration
      document.SetWriteOptions( WFC_XML_DONT_OUTPUT_XML_DECLARATION );
      document.WriteTo( bytes );

      _tprintf( TEXT( "Wrote %d bytes\n" ), (int) bytes.GetSize() );

      CFile file;

      if ( file.Open( TEXT( "xml.out" ), CFile::modeCreate | CFile::modeWrite ) != FALSE )
      {
         file.Write( bytes.GetData(), bytes.GetSize() );
         file.Close();
      }
   }

   return( EXIT_SUCCESS );
}
Copyright, 2000, Samuel R. Blackburn
$Workfile: CExtensibleMarkupLanguageDocument.cpp $
$Modtime: 1/17/00 9:01a $