CDataParser
$Revision: 29 $
Description
This class is a generic class to assist in parsing data.
It provides some basic searching capability as well as
idiot-proofed retrieval.
Construction
CDataParser()
-
Constructs the object.
Methods
void AdvanceByOneCharacter( CParsePoint& parse_point, DWORD character = 0 ) const
-
Advances the parse point by at least one character.
parse_point will be incremented (see AutoIncrement()) by the character it finds.
If you want it to be incremented based on a character of your choosing,
put that character's value in character.
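Advancing a parse point can be sketched in standard C++ as follows. This is a minimal sketch, not WFC's implementation. The ParsePoint struct, its auto_increment() member and the line/column bookkeeping are assumptions modeled loosely on what AutoIncrement() is referenced for. character_width stands in for the 1-, 2- or 4-byte character size chosen with SetTextToASCII() and SetTextToUCS4().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for CParsePoint: a byte index plus line/column bookkeeping.
struct ParsePoint
{
    std::size_t index = 0;   // byte offset into the data
    std::size_t line = 1;    // assumed 1-based line number
    std::size_t column = 1;  // assumed 1-based column number

    // Assumed behavior: skip one character of the given width (in bytes) and
    // update line/column according to the character that was consumed.
    void auto_increment(std::uint32_t character, std::size_t character_width)
    {
        index += character_width;

        if (character == '\n')
        {
            ++line;
            column = 1;
        }
        else
        {
            ++column;
        }
    }
};

// Sketch of AdvanceByOneCharacter(): a non-zero `character` supplied by the caller
// drives the bookkeeping; otherwise the character actually found in the data does.
void advance_by_one_character(ParsePoint& parse_point,
                              std::uint32_t character_found,
                              std::size_t character_width,
                              std::uint32_t character = 0)
{
    assert(character_width == 1 || character_width == 2 || character_width == 4);
    parse_point.auto_increment(character != 0 ? character : character_found, character_width);
}
```

Letting the caller override the character is useful when the bytes in the data should be counted as something else, for example treating a carriage return as a line break.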
void Empty( void )
-
Re-initializes the object. If a CByteArray was attached
(and set to be deleted automatically), it will be deleted.
BOOL Find( const CParsePoint& parse_point, BYTE byte_to_find, CParsePoint& found_at ) const
BOOL Find( const CParsePoint& parse_point, const CString& string_to_find, CParsePoint& found_at ) const
BOOL Find( const CParsePoint& parse_point, const CByteArray& bytes_to_find, CParsePoint& found_at ) const
-
Searches for byte_to_find, string_to_find or bytes_to_find beginning at parse_point.
If what you're looking for is found, its location will be put into found_at
and the return value will be TRUE. If Find() cannot find what you're looking for,
it will return FALSE.
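The behavior of Find() can be sketched as a forward search over a byte buffer that starts at the parse point. This is a minimal sketch under the assumption that a parse point is simply a byte offset. The names find and Bytes are hypothetical, and std::search stands in for whatever search WFC actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Sketch of Find(): search for `bytes_to_find` starting at byte offset `parse_point`.
// On success, store the offset of the first match in `found_at` and return true.
bool find(const Bytes& data, std::size_t parse_point,
          const Bytes& bytes_to_find, std::size_t& found_at)
{
    if (parse_point > data.size() || bytes_to_find.empty())
    {
        return false;  // nothing sensible to search for
    }

    auto start = data.begin() + static_cast<std::ptrdiff_t>(parse_point);
    auto hit = std::search(start, data.end(), bytes_to_find.begin(), bytes_to_find.end());

    if (hit == data.end())
    {
        return false;  // found_at is left untouched
    }

    found_at = static_cast<std::size_t>(hit - data.begin());
    return true;
}

// The string overload can simply forward to the byte overload.
bool find(const Bytes& data, std::size_t parse_point,
          const std::string& string_to_find, std::size_t& found_at)
{
    return find(data, parse_point, Bytes(string_to_find.begin(), string_to_find.end()), found_at);
}
```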
BOOL FindNoCase( const CParsePoint& parse_point, const CString& string_to_find, CParsePoint& found_at ) const
BOOL FindNoCase( const CParsePoint& parse_point, const CByteArray& bytes_to_find, CParsePoint& found_at ) const
-
Searches for string_to_find or bytes_to_find, beginning at parse_point, without
regard to case. For example, it will match 'a' with 'A'.
If what you're looking for is found, its location will be put into found_at
and the return value will be TRUE. If FindNoCase() cannot find what you're
looking for, it will return FALSE.
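A case-insensitive search can be sketched by comparing bytes after folding them to lower case. This minimal sketch only handles one-byte (ASCII) text and uses std::tolower from the C library. The function name find_no_case is hypothetical, and how the real class folds multi-byte characters is not covered here.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Sketch of FindNoCase(): a byte search where each pair of bytes is compared
// after folding both to lower case (single-byte text only).
bool find_no_case(const std::vector<std::uint8_t>& data, std::size_t parse_point,
                  const std::string& string_to_find, std::size_t& found_at)
{
    if (parse_point > data.size() || string_to_find.empty())
    {
        return false;
    }

    auto same_ignoring_case = [](std::uint8_t a, char b)
    {
        return std::tolower(a) == std::tolower(static_cast<unsigned char>(b));
    };

    auto start = data.begin() + static_cast<std::ptrdiff_t>(parse_point);
    auto hit = std::search(start, data.end(),
                           string_to_find.begin(), string_to_find.end(),
                           same_ignoring_case);

    if (hit == data.end())
    {
        return false;
    }

    found_at = static_cast<std::size_t>(hit - data.begin());
    return true;
}
```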
BOOL Get( CParsePoint& parse_point, DWORD length, CByteArray& bytes_to_get ) const
BOOL Get( CParsePoint& parse_point, DWORD length, CString& string_to_get ) const
-
Retrieves length bytes, beginning at parse_point, into bytes_to_get or string_to_get.
BYTE GetAt( DWORD index ) const
-
Retrieves the byte at the given index.
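The "idiot proofed" retrieval mentioned in the description can be pictured as bounds-checked access. The following rules are assumptions that this page does not confirm:

- GetAt() returns 0 for an out-of-range index.
- Get() returns false, without touching its arguments, when fewer than length bytes remain.
- Get() advances parse_point only on success.

The names get_at and get are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Sketch of GetAt(): assumed to return 0 instead of reading out of bounds.
std::uint8_t get_at(const Bytes& data, std::size_t index)
{
    return index < data.size() ? data[index] : 0;
}

// Sketch of Get(): copy `length` bytes starting at `parse_point`.
// Assumed behavior: fail without side effects if the data is too short,
// otherwise advance `parse_point` past the bytes that were copied.
bool get(const Bytes& data, std::size_t& parse_point, std::size_t length, std::string& string_to_get)
{
    // Written so that parse_point + length cannot overflow.
    if (parse_point > data.size() || length > data.size() - parse_point)
    {
        return false;
    }

    string_to_get.assign(data.begin() + static_cast<std::ptrdiff_t>(parse_point),
                         data.begin() + static_cast<std::ptrdiff_t>(parse_point + length));
    parse_point += length;
    return true;
}
```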
DWORD GetCharacter( const CParsePoint& const_parse_point, const DWORD number_of_characters_ahead = 0 ) const
-
Returns the character located number_of_characters_ahead characters past
const_parse_point. NOTE: Don't assume that characters are one byte each as in
ASCII. A character can be made up of several bytes. This happens when
SetTextToASCII( FALSE ) or SetTextToUCS4( TRUE ) has been called.
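Why characters can span several bytes is easiest to see by decoding one. The sketch below assumes three hypothetical settings that mirror SetTextToASCII(), SetTextToUCS4() and SetTextToBigEndian():

- UCS-4 means 4 bytes per character.
- Otherwise, non-ASCII text is 2 bytes per character.
- Big endian puts the most significant byte first.

Returning 0 past the end is also an assumption. The names TextSettings and get_character are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical encoding settings.
struct TextSettings
{
    bool ascii = true;        // one byte per character
    bool ucs4 = false;        // four bytes per character (takes precedence)
    bool big_endian = false;  // byte order for 2- and 4-byte characters
};

std::size_t character_width(const TextSettings& s)
{
    if (s.ucs4) return 4;
    return s.ascii ? 1 : 2;
}

// Sketch of GetCharacter(): decode the character that starts
// `number_of_characters_ahead` characters after byte offset `parse_point`.
std::uint32_t get_character(const Bytes& data, const TextSettings& s,
                            std::size_t parse_point, std::size_t number_of_characters_ahead = 0)
{
    const std::size_t width = character_width(s);
    const std::size_t start = parse_point + number_of_characters_ahead * width;

    if (start > data.size() || width > data.size() - start)
    {
        return 0;  // character would run past the end of the data
    }

    std::uint32_t character = 0;

    for (std::size_t i = 0; i < width; ++i)
    {
        // Big endian: first byte most significant. Little endian: first byte least significant.
        const std::size_t shift = s.big_endian ? (width - 1 - i) * 8 : i * 8;
        character |= static_cast<std::uint32_t>(data[start + i]) << shift;
    }

    return character;
}
```

The same four bytes yield different characters depending on the settings. This is why code that walks the data should step by characters rather than by bytes.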
BOOL GetNextCharacter( CParsePoint& parse_point, DWORD& character ) const
-
Like GetCharacter() except the parse point will be advanced by however many
bytes make up one character (1, 2 or 4). This lets you enumerate through the
data stream. It returns TRUE if character was filled or FALSE if you have
reached (or passed) the end of the data.
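The enumeration pattern that GetNextCharacter() enables is a simple while loop. This minimal sketch fixes the byte order to little endian and takes the character width as a parameter. The names get_next_character and enumerate are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Sketch of GetNextCharacter() for little-endian text whose characters are
// `width` bytes each (1, 2 or 4). On success it fills `character`, advances
// `parse_point` past the character and returns true. At (or past) the end of
// the data, or when only a partial character remains, it returns false.
bool get_next_character(const Bytes& data, std::size_t width,
                        std::size_t& parse_point, std::uint32_t& character)
{
    if (parse_point > data.size() || width > data.size() - parse_point)
    {
        return false;
    }

    character = 0;

    for (std::size_t i = 0; i < width; ++i)
    {
        character |= static_cast<std::uint32_t>(data[parse_point + i]) << (i * 8);
    }

    parse_point += width;
    return true;
}

// Typical enumeration loop: collect every character, narrowing to char for the demo.
std::string enumerate(const Bytes& data, std::size_t width)
{
    std::string result;
    std::size_t parse_point = 0;
    std::uint32_t character = 0;

    while (get_next_character(data, width, parse_point, character))
    {
        result += static_cast<char>(character);
    }

    return result;
}
```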
DWORD GetUCS4Order( void ) const
-
Returns one of the following:
BYTE GetUnicodeToASCIITranslationFailureCharacter( void ) const
-
Returns the ASCII character that will be substituted when a translation from
UNICODE to ASCII fails.
DWORD GetSize( void ) const
-
Returns the number of bytes in the data area.
BOOL GetUntilAndIncluding( CParsePoint& parse_point, BYTE termination_byte, CString& string_to_get ) const
BOOL GetUntilAndIncluding( CParsePoint& parse_point, BYTE termination_byte, CByteArray& bytes_to_get ) const
-
This method retrieves data (filling string_to_get or bytes_to_get) up to and
including termination_byte. parse_point is advanced in the process.
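Reading a line at a time is a typical use of GetUntilAndIncluding(). The following minimal sketch assumes a parse point is a byte offset. What happens when the terminator never appears is also an assumption: here nothing is copied, parse_point is left alone and false is returned. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Sketch of GetUntilAndIncluding(): copy bytes from `parse_point` up to and
// including the first `termination_byte`, then advance `parse_point` past it.
bool get_until_and_including(const Bytes& data, std::size_t& parse_point,
                             std::uint8_t termination_byte, std::string& string_to_get)
{
    for (std::size_t i = parse_point; i < data.size(); ++i)
    {
        if (data[i] == termination_byte)
        {
            string_to_get.assign(data.begin() + static_cast<std::ptrdiff_t>(parse_point),
                                 data.begin() + static_cast<std::ptrdiff_t>(i + 1));
            parse_point = i + 1;
            return true;
        }
    }

    return false;  // terminator not found (assumed behavior)
}
```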
BOOL Initialize( CByteArray * data, BOOL automatically_delete = FALSE )
BOOL Initialize( const CStringArray& strings )
-
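Tells the parser where to get its data. If automatically_delete is TRUE, the
CByteArray pointed to by data will be deleted when Empty() is called.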
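Initialize() and Empty() share an ownership rule: the parser either borrows the caller's buffer or owns it. That rule can be sketched with a small class. This is a hypothetical sketch. The class DataSource, its members, and the choice to also release an owned buffer in the destructor are assumptions, not WFC's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// The parser borrows the caller's buffer unless automatically_delete is true,
// in which case it deletes the buffer when emptied (and, by assumption, when destroyed).
class DataSource
{
public:
    DataSource() = default;
    DataSource(const DataSource&) = delete;             // avoid double deletes
    DataSource& operator=(const DataSource&) = delete;
    ~DataSource() { empty(); }

    bool initialize(Bytes* data, bool automatically_delete = false)
    {
        empty();  // release anything previously attached
        data_ = data;
        automatically_delete_ = automatically_delete;
        return data_ != nullptr;
    }

    void empty()
    {
        if (automatically_delete_)
        {
            delete data_;  // only buffers we were told to own
        }

        data_ = nullptr;
        automatically_delete_ = false;
    }

    std::size_t size() const { return data_ != nullptr ? data_->size() : 0; }

private:
    Bytes* data_ = nullptr;
    bool automatically_delete_ = false;
};
```

Passing FALSE is the safe choice for a buffer that lives on the stack, as in the example at the end of this page. Passing TRUE is only appropriate for a buffer allocated with new.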
BOOL IsTextASCII( void ) const
-
Returns TRUE if characters are to be treated as one byte each.
BOOL IsTextBigEndian( void ) const
-
Returns TRUE if text is in big endian (Sun) format. This has meaning only when
the underlying characters are treated as UNICODE or UCS-4.
BOOL IsTextUCS4( void ) const
-
Returns TRUE if characters are to be treated as four bytes per character.
BOOL PeekAtCharacter( const CParsePoint& parse_point, DWORD& character, const DWORD number_of_characters_ahead = 1 ) const
-
Allows you to peek ahead at characters. It returns TRUE if character was filled
with a character from the data stream. It returns FALSE when you have tried to
read past the end of the stream.
DWORD PeekCharacter( const CParsePoint& parse_point, const LONG number_of_characters_ahead ) const
-
Allows you to peek ahead at characters. It returns the character at the current
location plus number_of_characters_ahead. If you attempt to read a character
past the end of the data, it returns NULL (zero).
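The two peek methods report "past the end" differently. PeekAtCharacter() returns a separate success flag, while PeekCharacter() signals it with a zero value. The sketch below contrasts the two styles using one-byte characters. Three details are assumptions:

- Offsets count from the character at the parse point (offset 0).
- Negative offsets are allowed for the LONG version.
- The function names, which are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// PeekAtCharacter() style: success is reported separately from the character.
bool peek_at_character(const Bytes& data, std::size_t parse_point,
                       std::uint32_t& character, std::size_t number_of_characters_ahead = 1)
{
    const std::size_t index = parse_point + number_of_characters_ahead;

    if (index >= data.size())
    {
        return false;  // tried to read past the end
    }

    character = data[index];
    return true;
}

// PeekCharacter() style: 0 (NULL) means "no character here", so a genuine
// NUL byte in the data cannot be told apart from the end of the data.
std::uint32_t peek_character(const Bytes& data, std::size_t parse_point, long number_of_characters_ahead)
{
    const long index = static_cast<long>(parse_point) + number_of_characters_ahead;

    if (index < 0 || static_cast<std::size_t>(index) >= data.size())
    {
        return 0;
    }

    return data[static_cast<std::size_t>(index)];
}
```

The zero-returning style is more convenient in expressions such as comparisons against '/'. The flag-returning style is the one to use when the data may legitimately contain NUL characters.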
BOOL SetTextToASCII( BOOL text_is_ascii = TRUE )
-
Tells the class to interpret characters as one byte each.
BOOL SetTextToBigEndian( BOOL unicode_is_big_endian = TRUE )
-
Tells the class to interpret UNICODE or UCS-4 characters as big endian (Sun) format.
Little endian is Intel format.
BOOL SetTextToUCS4( BOOL text_is_ucs4 = TRUE )
-
Tells the class to interpret characters as four bytes each.
BOOL SetUCS4Order( DWORD order = 4321 )
-
Tells the parser which byte order to use when interpreting UCS-4 characters.
The default order is 4321.
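Four-digit order numbers such as 4321 can be read as a byte-position map. This sketch borrows the convention used in the XML 1.0 notes on encoding detection: the digit at byte position i gives that byte's significance, with 1 being the most significant. Under that convention, 1234 is big endian, 4321 is little endian, and 2143 and 3412 are the two mixed orders. Whether this class uses exactly this convention is an assumption, and the function name decode_ucs4 is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Sketch: decode one UCS-4 character from 4 bytes under a given order number.
bool decode_ucs4(const std::uint8_t bytes[4], unsigned order, std::uint32_t& character)
{
    if (order != 1234 && order != 4321 && order != 2143 && order != 3412)
    {
        return false;  // not a recognized byte order
    }

    // Digit for byte 0, byte 1, byte 2, byte 3.
    const unsigned digits[4] = { order / 1000, (order / 100) % 10, (order / 10) % 10, order % 10 };

    character = 0;

    for (int i = 0; i < 4; ++i)
    {
        // Significance 1 goes to bits 24..31, significance 4 to bits 0..7.
        const unsigned shift = (4 - digits[i]) * 8;
        character |= static_cast<std::uint32_t>(bytes[i]) << shift;
    }

    return true;
}
```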
void SetUnicodeToASCIITranslationFailureCharacter( BYTE asci_character )
-
This sets the character that will be substituted when a translation must be made
from UNICODE to ASCII. Since a one-byte character can hold only 256 values and a
UNICODE character can hold 65536, some provision must be made for characters that
cannot be translated.
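The substitution idea can be sketched as a narrowing loop. This is a minimal sketch, assuming that any UNICODE code unit above 0xFF counts as a failed translation. The real class may use a different rule, such as a code-page conversion. The function name unicode_to_ascii and the '?' default are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Narrow UNICODE (UTF-16 code units) to one-byte characters, substituting
// `translation_failure_character` for anything that does not fit in a byte.
std::string unicode_to_ascii(const std::u16string& text, char translation_failure_character = '?')
{
    std::string result;
    result.reserve(text.size());

    for (char16_t unit : text)
    {
        result += (unit <= 0xFF) ? static_cast<char>(unit) : translation_failure_character;
    }

    return result;
}
```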
Example
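#include <wfc.h>

// get_bytes() is not part of CDataParser. It is assumed to be an
// application-supplied helper that reads the entire file into bytes
// and returns TRUE on success.
BOOL get_bytes( const CString& filename, CByteArray& bytes );

BOOL parse_document( const CString& filename, CExtensibleMarkupLanguageDocument& document )
{
   WFCTRACEINIT( TEXT( "parse_document()" ) );

   CByteArray bytes;

   if ( get_bytes( filename, bytes ) != TRUE )
   {
      return( FALSE );
   }

   CDataParser parser;

   // bytes lives on the stack, so the parser must not delete it
   // (automatically_delete is FALSE).
   parser.Initialize( &bytes, FALSE );

   if ( document.Parse( parser ) == TRUE )
   {
      WFCTRACE( TEXT( "Parsed OK" ) );
      return( TRUE );
   }
   else
   {
      WFCTRACE( TEXT( "Can't parse document" ) );
      return( FALSE );
   }
}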
Copyright, 2000, Samuel R. Blackburn
$Workfile: CDataParser.cpp $
$Modtime: 1/04/00 5:11a $