CDataParser

$Revision: 29 $

Description

This is a generic class that assists in parsing data. It provides basic searching capability as well as idiot-proofed retrieval.

Construction

CDataParser(): Constructs the object.

Methods

void AdvanceByOneCharacter( CParsePoint& parse_point, DWORD character = 0 ) const

Advances parse_point by at least one character. parse_point is incremented (see AutoIncrement()) according to the character found at its current position. To have it incremented according to a different character, pass that character's value in character.

void Empty( void )

Re-initializes the object. If a CByteArray was attached with automatic deletion enabled, it is deleted.

BOOL Find( const CParsePoint& parse_point, BYTE byte_to_find, CParsePoint& found_at ) const
BOOL Find( const CParsePoint& parse_point, const CString& string_to_find, CParsePoint& found_at ) const
BOOL Find( const CParsePoint& parse_point, const CByteArray& bytes_to_find, CParsePoint& found_at ) const

Searches for byte_to_find, string_to_find or bytes_to_find beginning at parse_point. If what you're looking for is found, the location will be put into found_at and the return value will be TRUE. If Find() cannot find what you're looking for, it will return FALSE.
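To make the found_at/return-value contract concrete, here is a minimal standalone sketch of the byte-array overload. It is not WFC code: it assumes a plain byte offset can stand in for CParsePoint and a std::vector for CByteArray, and the name find_bytes and the handling of an empty pattern are hypothetical choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of Find( parse_point, bytes_to_find, found_at ).
// A plain byte offset stands in for CParsePoint; this is not WFC code.
bool find_bytes(const std::vector<unsigned char>& data, std::size_t parse_point,
                const std::vector<unsigned char>& bytes_to_find, std::size_t& found_at)
{
    // Assumption: an empty pattern or a start past the end never matches.
    if (bytes_to_find.empty() || parse_point > data.size()) {
        return false;
    }

    // Search only from parse_point onward; data before it is ignored.
    auto start = data.begin() + static_cast<std::ptrdiff_t>(parse_point);
    auto match = std::search(start, data.end(), bytes_to_find.begin(), bytes_to_find.end());
    if (match == data.end()) {
        return false; // found_at is left untouched
    }

    found_at = static_cast<std::size_t>(match - data.begin());
    return true;
}
```

Because the search starts at parse_point, a match that begins earlier in the data is never reported.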

BOOL FindNoCase( const CParsePoint& parse_point, const CString& string_to_find, CParsePoint& found_at ) const
BOOL FindNoCase( const CParsePoint& parse_point, const CByteArray& bytes_to_find, CParsePoint& found_at ) const

Searches for string_to_find or bytes_to_find without regard to case, so 'a' matches 'A'. If a match is found, its location is put into found_at and the return value is TRUE. If FindNoCase() cannot find what you're looking for, it returns FALSE.
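A case-insensitive search can be sketched with std::search and a comparison predicate that folds both sides to lower case. This is an illustration under stated assumptions, not WFC's implementation: find_no_case is a hypothetical name, a byte offset stands in for CParsePoint, and only ASCII letters are folded (std::tolower in the default "C" locale).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of FindNoCase( parse_point, string_to_find, found_at ).
bool find_no_case(const std::vector<unsigned char>& data, std::size_t parse_point,
                  const std::string& string_to_find, std::size_t& found_at)
{
    if (string_to_find.empty() || parse_point > data.size()) {
        return false;
    }

    // Compare after folding both bytes to lower case; in the default
    // "C" locale this affects only the ASCII letters A-Z.
    auto same_ignoring_case = [](unsigned char a, char b) {
        return std::tolower(a) == std::tolower(static_cast<unsigned char>(b));
    };

    auto start = data.begin() + static_cast<std::ptrdiff_t>(parse_point);
    auto match = std::search(start, data.end(), string_to_find.begin(), string_to_find.end(),
                             same_ignoring_case);
    if (match == data.end()) {
        return false;
    }

    found_at = static_cast<std::size_t>(match - data.begin());
    return true;
}
```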

BOOL Get( CParsePoint& parse_point, DWORD length, CByteArray& bytes_to_get ) const
BOOL Get( CParsePoint& parse_point, DWORD length, CString& string_to_get ) const

Retrieves length bytes beginning at parse_point.

BYTE GetAt( DWORD index ) const

Retrieves the byte at the given index.

DWORD GetCharacter( const CParsePoint& const_parse_point, const DWORD number_of_characters_ahead = 0 ) const

Returns the character at the given location. NOTE: Don't assume that characters are one byte each as in ASCII. A character can be made up of several bytes. This happens when SetTextToASCII() has been called with FALSE or SetTextToUCS4() has been called with TRUE.

BOOL GetNextCharacter( CParsePoint& parse_point, DWORD& character ) const

Like GetCharacter() except that parse_point is advanced by however many bytes make up one character (1, 2 or 4), which lets you enumerate the characters in the data stream. It returns TRUE if character was filled or FALSE if you have reached (or passed) the end of the data.
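The enumeration pattern can be sketched as follows. This is a hypothetical model, not WFC code: TextSettings stands in for the SetTextToASCII()/SetTextToUCS4()/SetTextToBigEndian() state, a byte offset stands in for CParsePoint, and four-byte characters are read in plain big- or little-endian order only (the 2143 and 3412 UCS-4 orders are not modeled).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the parser's text settings (not a WFC type).
struct TextSettings {
    std::size_t bytes_per_character = 1; // 1 = ASCII, 2 = UNICODE, 4 = UCS-4
    bool big_endian = false;             // ignored when bytes_per_character == 1
};

// Reads one character at parse_point and advances parse_point past it.
// Returns false, leaving both outputs unchanged, when a whole character
// does not fit before the end of the data.
bool get_next_character(const std::vector<unsigned char>& data, const TextSettings& settings,
                        std::size_t& parse_point, std::uint32_t& character)
{
    const std::size_t width = settings.bytes_per_character;
    assert(width == 1 || width == 2 || width == 4);

    if (parse_point > data.size() || data.size() - parse_point < width) {
        return false;
    }

    std::uint32_t value = 0;
    for (std::size_t i = 0; i < width; ++i) {
        // Big endian: first byte is most significant.
        // Little endian: first byte is least significant.
        const std::size_t shift = settings.big_endian ? (width - 1 - i) * 8 : i * 8;
        value |= static_cast<std::uint32_t>(data[parse_point + i]) << shift;
    }

    character = value;
    parse_point += width;
    return true;
}
```

A loop such as `while ( get_next_character( data, settings, point, c ) )` then visits every character once, however many bytes each one occupies.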

DWORD GetUCS4Order( void ) const

Returns one of the following:

1234
2143
3412
4321
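These four values appear to follow the byte-order notation of the XML 1.0 specification (Appendix F), where 1234 is big-endian, 4321 is little-endian, and 2143 and 3412 are the two "unusual" orders; that reading is an assumption, not something this page states. Under it, digit k at stream position i means "byte k of the value is stored here", with byte 1 the most significant. The helper below, decode_ucs4, is hypothetical and not part of WFC.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of WFC. Decodes four bytes (in stream order)
// using XML 1.0 Appendix F order notation: digit k at stream position i means
// byte k of the value is stored there, where byte 1 is the most significant.
std::uint32_t decode_ucs4(const std::array<unsigned char, 4>& bytes, unsigned order)
{
    std::uint32_t value = 0;
    unsigned divisor = 1000;
    for (std::size_t i = 0; i < 4; ++i) {
        const unsigned digit = (order / divisor) % 10; // leftmost digit first
        divisor /= 10;
        assert(digit >= 1 && digit <= 4); // expects 1234, 2143, 3412 or 4321
        value |= static_cast<std::uint32_t>(bytes[i]) << ((4 - digit) * 8);
    }
    return value;
}
```

For example, the character '<' (0x3C) is stored as 00 00 00 3C in 1234 order and as 3C 00 00 00 in 4321 order.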

BYTE GetUnicodeToASCIITranslationFailureCharacter( void ) const

Returns the ASCII character that will be substituted when a translation from UNICODE to ASCII fails.

DWORD GetSize( void ) const

Returns the number of bytes in the data area.

BOOL GetUntilAndIncluding( CParsePoint& parse_point, BYTE termination_byte, CString& string_to_get ) const
BOOL GetUntilAndIncluding( CParsePoint& parse_point, BYTE termination_byte, CByteArray& bytes_to_get ) const

This method retrieves data (filling string_to_get or bytes_to_get) up to and including termination_byte. parse_point is advanced in the process.
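A minimal sketch of the "up to and including" behavior, assuming a byte offset in place of CParsePoint and std::string in place of CString. The name get_until_and_including is hypothetical, and what happens when the terminator is missing is not stated on this page; the sketch assumes the call fails and leaves parse_point alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of GetUntilAndIncluding( parse_point, termination_byte, string_to_get ).
bool get_until_and_including(const std::vector<unsigned char>& data, std::size_t& parse_point,
                             unsigned char termination_byte, std::string& string_to_get)
{
    if (parse_point >= data.size()) {
        return false;
    }

    auto start = data.begin() + static_cast<std::ptrdiff_t>(parse_point);
    auto terminator = std::find(start, data.end(), termination_byte);
    if (terminator == data.end()) {
        return false; // assumption: no partial result when the terminator is missing
    }

    // Copy everything up to and including the terminator, then move past it.
    string_to_get.assign(start, terminator + 1);
    parse_point = static_cast<std::size_t>((terminator + 1) - data.begin());
    return true;
}
```

Calling it repeatedly with ';' splits "a;b;c;" into "a;", "b;" and "c;", because each call leaves parse_point just after the terminator it consumed.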

BOOL Initialize( CByteArray * data, BOOL automatically_delete = FALSE )
BOOL Initialize( const CStringArray& strings )

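Tells the parser where its data comes from: either an existing CByteArray or a CStringArray. If automatically_delete is TRUE, the CByteArray is deleted when the parser is emptied (see Empty()).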

BOOL IsTextASCII( void ) const

Returns TRUE if characters are to be treated as one byte each.

BOOL IsTextBigEndian( void ) const

Returns TRUE if text is in big-endian (Sun) format. This matters only when the underlying characters are treated as UNICODE or UCS-4.

BOOL IsTextUCS4( void ) const

Returns TRUE if characters are to be treated as four bytes per character.

BOOL PeekAtCharacter( const CParsePoint& parse_point, DWORD& character, const DWORD number_of_characters_ahead = 1 ) const

Lets you peek ahead at characters without moving parse_point. It returns TRUE if character was filled with a character from the data stream, or FALSE if you tried to read past the end of the stream.

DWORD PeekCharacter( const CParsePoint& parse_point, const LONG number_of_characters_ahead ) const

Lets you peek ahead at characters. It returns the character at the current location plus number_of_characters_ahead. If you attempt to read a character past the end of the data, it returns NULL (zero).
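The difference from PeekAtCharacter() is that this variant signals "no character" through its return value (zero) rather than a BOOL, and its offset is a signed LONG. The hypothetical model below covers one-byte text only; taking parse_point by value mirrors the const reference in the real signature. Returning 0 for an offset before the start of the data is an assumption, since the page only describes reading past the end.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of PeekCharacter() for one-byte (ASCII) text.
// parse_point is taken by value: peeking never moves it.
std::uint32_t peek_character(const std::vector<unsigned char>& data, std::size_t parse_point,
                             long number_of_characters_ahead)
{
    const long long index = static_cast<long long>(parse_point) + number_of_characters_ahead;

    // Past the end returns 0 (NULL) as documented; before the start is an assumption.
    if (index < 0 || index >= static_cast<long long>(data.size())) {
        return 0;
    }
    return data[static_cast<std::size_t>(index)];
}
```

One consequence of this design is that a genuine NUL byte in the data is indistinguishable from "past the end"; PeekAtCharacter()'s BOOL result avoids that ambiguity.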

BOOL SetTextToASCII( BOOL text_is_ascii = TRUE )

Tells the class to interpret characters as one byte each.

BOOL SetTextToBigEndian( BOOL unicode_is_big_endian = TRUE )

Tells the class to interpret UNICODE or UCS-4 characters as big endian (Sun) format. Little endian is Intel format.

BOOL SetTextToUCS4( BOOL text_is_ucs4 = TRUE )

Tells the class to interpret characters as four bytes each.

BOOL SetUCS4Order( DWORD order = 4321 )

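Tells the parser which byte order to use when interpreting UCS-4 characters. order should be one of the values listed under GetUCS4Order() (1234, 2143, 3412 or 4321); the default is 4321.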

void SetUnicodeToASCIITranslationFailureCharacter( BYTE asci_character )

This sets the character that will be substituted when a translation from UNICODE to ASCII fails. Since a one-byte character can hold only 256 values and a UNICODE character can hold 65,536, some provision must be made for characters that cannot be translated.
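The substitution idea can be sketched as a narrowing loop. This is a hypothetical illustration, not WFC's translation routine: it assumes UNICODE text arrives as UTF-16 code units (std::u16string) and that any value above 0xFF fails to translate; the exact cutoff the library uses is not stated here.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of UNICODE-to-ASCII translation with a failure character.
// Assumption: code units 0x00-0xFF fit in one byte and are copied; anything
// larger cannot be represented and is replaced by failure_character.
std::string to_ascii(const std::u16string& unicode_text, char failure_character)
{
    std::string result;
    result.reserve(unicode_text.size());
    for (char16_t unit : unicode_text) {
        if (unit <= 0xFF) {
            result.push_back(static_cast<char>(unit));
        } else {
            result.push_back(failure_character);
        }
    }
    return result;
}
```

Choosing a substitute such as '?' keeps the output the same length as the input, so character positions still line up after translation.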

Example

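    #include <wfc.h>

    // get_bytes() is not shown here; it is assumed to be an application
    // helper that reads the whole file into bytes and returns TRUE on success.
    BOOL parse_document( const CString& filename, CExtensibleMarkupLanguageDocument& document )
    {
       WFCTRACEINIT( TEXT( "parse_document()" ) );

       CByteArray bytes;

       if ( get_bytes( filename, bytes ) != TRUE )
       {
          return( FALSE );
       }

       CDataParser parser;

       // FALSE: bytes lives on the stack, so the parser must not delete it.
       if ( parser.Initialize( &bytes, FALSE ) != TRUE )
       {
          return( FALSE );
       }

       if ( document.Parse( parser ) == TRUE )
       {
          WFCTRACE( TEXT( "Parsed OK" ) );
          return( TRUE );
       }
       else
       {
          WFCTRACE( TEXT( "Can't parse document" ) );
          return( FALSE );
       }
    }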

Copyright, 2000, Samuel R. Blackburn
$Workfile: CDataParser.cpp $
$Modtime: 1/04/00 5:11a $