Download installation file - 558 Kb
Ultimate FastMaps!
Introduction
Hierarchy
The Ultimate FastMaps! product contains classes to map keys to
corresponding data and to store and retrieve the data quickly by key; for
example, looking up a data structure from thousands of similar structures where
each has an ASCII name that is used as its key. Data storage and lookup is
implemented by means of a self-balancing binary tree. Maps implemented using
balanced binary trees have two significant advantages over those implemented by
other methods such as hashed lists:
- Balanced trees ensure the fastest possible lookup.
- Balanced trees are by nature ordered, and as such can be processed in
order; in addition to the basic store/retrieve functionality, balanced trees
can be used as a fast list-insertion sort.
The product consists of a base class for serialization to iostreams, an
abstract base class for balanced binary trees, a level of non-MFC
application-usable tree classes that support serialization to streams, a level
of MFC application-usable tree classes that support serialization to both
streams and CArchive objects, and three MFC application-usable tree classes that
support serialization to CArchive objects only. The product can be used in
either 16 or 32 bit applications, with or without MFC.
The abstract base class for balanced binary trees, tBalTree, uses a balancing algorithm that was developed
from scratch and has at least one advantage over most other balancing
algorithms; the amount of imbalance that is allowable before an automatic
rebalance occurs is controllable by the programmer and can also be disabled and
re-enabled. Controlling the allowable level of imbalance is performed by using
the SetAutoBalance() interface. The default
imbalance level is 15 nodes; this means that the left and right sides of any
subtree can be out of balance by as much as 15 nodes before an automatic
rebalance occurs, which in turn means that the total number of accesses in a
search operation is increased by 15. This can slow access trivially, but
significantly reduces the number of automatic rebalance operations that occur
during normal operation. A Balance()
interface is also provided for cases where the imbalance level has been reduced
and it is desirable to ensure that the tree is optimally balanced.
The tree class itself requires 20 bytes of data space, and each node requires
28 bytes of data space in addition to any dynamically allocated data needed by a
derived class to store key and data. In debug builds the tree class uses an
additional 16 bytes of data for maintaining statistics that can be used to
monitor application behaviour (see DumpStatistics() for details). The DumpStructure() interface provides a dump of the
tree structure (via TRACE statements) for use during development.
The sets of classes provided for MFC and non-MFC use
differ in the following ways:
- The non-MFC classes begin with the letter t and the MFC classes
begin with Ct.
- The non-MFC classes do not support serialization to
CArchive objects.
- The MFC classes support serialization to either
streams or CArchive objects.
- Three classes exist in the MFC set which do not have counterparts in the
non-MFC set since they deal specifically with CObject data, and which do not
support serialization to streams.
The non-MFC classes can be freely used in MFC programs unless serialization
to CArchive is required. However if the application is written using MFC, the
MFC classes are recommended regardless of current needs for serialization, since
these needs may change in the future.
The programming interface for all derived tree
classes is identical except for the datatypes of key and data, and whether
serialization is supported to streams, CArchive, or both. The classes provided
in the product support string, unsigned long, and long
keys, and map to string, unsigned long, and CObject*
data. See "Selecting a Class" for help in choosing the correct class for
your application.
Programming Interface
Overview
The programming interface provided by the classes in Ultimate
FastMaps! is
consistent between classes; all classes support the same set of public members,
which differ only by datatypes accepted as paremeters and returned as results.
The exception to this is that some classes support serialization to streams,
some to CArchive, and some to either; thus the Load(), Store(), and
Serialize() members may be present or absent in a given class.
No copy constructors or operators are provided by the base programming
interface. Copy constructors and equal-operators are of limited usefulness with
trees, and other operators are likewise of limited use and additionally are
datatype-specific. If your application has need for these, you will need to derive a specialized class to implement them.
The product does not attempt to be thread-safe. Support for thread locking
would require additional calls to virtuals which would not apply to the Win-16
interface but which would affect performance regardless, and the type of locking
required is felt to be more of an application-level issue. It is recommended
that access to maps from multiple threads be handled by a common subroutine that
performs the thread synchronization appropriate to the application environment;
an alternative is to derive thread-safe classes of
the types required by the application.
Classes provided in the product that accept string keys always make a copy of
the key and manage its deallocation. Likewise, classes that accept string data
make a copy of the data and manage its deallocation. Allocation and deallocation
within the product is based on malloc(), free(), and strdup(). Classes that
accept CObject data take ownership of the object passed and manage its
destruction. When a tree class is destroyed all storage it owns, including keys
and data, is released, and any CObject-derived objects stored as data are
destroyed through the CObject virtual destructor.
If you wish to store an object pointer as data, but wish to avoid giving
ownership of it to the tree, either use a class that stores unsigned long data
and cast the object pointer appropriately, or derive a
specialized class. Signed and unsigned short data can be stored in the
classes provided by performing the appropriate casting.
Note that the major difference between classes that use unsigned long keys
and classes that use long keys is where items with their high-order bit set are
stored; unsigned long keys cause these to be stored rightmost in the tree
(largest) and signed long keys cause these to be stored leftmost in the tree
(smallest).
The classes provided which use string keys use them in a case-sensitive
manner, thus items whose keys contain the same letters but differ in case will
be stored separately in the tree. Case-insensitive keys can be handled by
deriving a specialized class, or by simply converting the keys to a standard
case prior to using them to access the tree.
Data is always initially stored in the tree by defining
a key,data association via the Set(key,data) member. Once stored in the
tree, data can be returned by referencing its key as the argument of a
Get(key) member call. Note that Get(key) returns zero if no such item is
found in the tree; alternatively the program can call Find(key) to determine
whether a node exists with that key, then use the resulting POSITION value
to directly access that node's data (or key) without performing an additional
search.
The tree can be navigated as a list, using the
First(), Last(), Prev(), and Next() members, or as an array by
using the GetCount() and GetItem() members. The key or data of a given
node can be retrieved during this navigation by using the GetKey() and
GetData() members, and the data can be updated by using the SetData() member.
The members provided are listed below.
Construction/Destruction
Constructor
Virtual Destructor
General
long GetCount(void); |
Get items in tree
|
int Remove(key); |
Remove node by key
|
int Remove(POSITION& node); |
Remove node
|
void RemoveAll(void); |
Empty tree
|
Data Manipulation
data Get(key); |
Get data for key
|
data GetData(POSITION node); |
Get node's data
|
key GetKey(POSITION node); |
Get node's key
|
int Set(key, data); |
Set data for key
|
int SetData(POSITION node, data); |
Update node's data
|
Balancing
int Balance(int force = 0); |
Rebalance tree
|
long SetAutoBalance(long maxImbalance); |
Control balancing
|
Navigation
POSITION Find(key); |
Find node by key
|
POSITION GetItem(long item); |
Get node by index
|
long GetItemIndex(POSITION node); |
Get node's index
|
POSITION First(void); |
Get first node position
|
POSITION Last(void); |
Get last node position
|
POSITION Next(POSITION node); |
Get position next node
|
POSITION Prev(POSITION node); |
Get position prev node
|
Debugging/Tuning
DumpStatistics() |
TRACE tree statistics
|
DumpStructure() |
TRACE tree structure |
ResetStatistics() |
Zero tree statistics
|
Serialization
int Load(istream* istrm); |
Load tree from stream
|
void Serialize(CArchive& ar); |
Serialize using CArchive
|
int Store(ostream* ostrm); |
Store tree in stream |
Using FastMaps! In Your Project
This section describes the components that are added during FastMaps!
installation, specifies the names of the header files you need to include in
order to use the product, and describes the preprocessor symbols you can define
to tailor product operation.
Installed Directories And Files
The installation program is a simple one that, given the name of a directory
"install", creates several subdirectories and installs files in them (to
uninstall the product simply delete the subdirectories and the files in them.)
The subdirectories created are shown in the following picture; click on any of
these to see what it contains:
Including Ultimate FastMaps! In Your Project
Because the source code for FastMaps! is compact, and can be used with
or without MFC, in a DLL or static library, or included in an application's EXE
file, the decision on how to "package" FastMaps! for use in your project
is left up to you. (NB: your license agreement prohibits distribution of
FastMaps! source code.)
Note that the FastMaps!
source files need to be compiled with the rest of your project to ensure that
the correct version of iostreams is used.
The source code for FastMaps! is organized in four levels as shown in
the following picture; select the functionality level you need and place copies
of all files at or below that level in your project directory:
The zlib.h Header - Cross-System Customization
The zlib.h header file is included, either directly or indirectly, by all of
the other include files in the product. This file includes a number of other
files that are required by the product, such as "malloc.h". It defines the
POSITION type if it is not already defined. It nullifies the ASSERT() and
TRACE() macros if MFC is not being used. It defines the EXT_CLASS symbol
that is used to export classes when the product is built into a library.
If you are using the library with a non-Microsoft compiler, or with an
operating system other than MS-Windows, you should examine this header file in
detail and make the changes necessary for correct compilation in your
programming environment.
Pre-Processor Symbols You May Need
Three pre-processor symbols are defined by the product for use in defining
the language and support configuration that is used.
USE_TEMPLATED_STREAMS
The USE_TEMPLATED_STREAMS symbol determines whether the STL or MFC version of
iostreams is to be used. Visual C++ (starting with 4.2) includes two
incompatible versions of iostreams. The older version is pulled in using
the "iostream.h" header, and the newer (STL, or Standard Template Library)
version is pulled in using the "iostream" header. Since FastMaps! uses
iostreams and it must use the same version that the rest of your application
uses, the USE_TEMPLATED_STREAMS pre-processor symbol is provided. To use the
STL version of iostreams, define the USE_TEMPLATED_STREAMS symbol in your
project file (to be passed to the compiler as a pre-processor symbol). By
default the older version of iostreams is used.
NO_MFC
If your project does not use MFC, define the NO_MFC symbol in your project
file (to be passed to the compiler as a pre-processor symbol). By default
the product assumes that MFC will be used.
HAVE_TEMPLATES
The HAVE_TEMPLATES pre-processor symbol is provided in case you are using a
non-Microsoft compiler that supports templates.
Note: If you are using a non-MS
compiler that supports templates, define the HAVE_TEMPLATES symbol in your
project file (to be passed to the compiler as a pre-processor symbol).
By
default the product determines whether template support is present by examining
the values of the _WIN32 and _MSC_VER pre-defined symbols provided by the
Microsoft C++ compiler.
NQ
The NQ pre-processor symbol is a namespace qualifier that defines the
namespace for STL classes. This symbol is used in FastMaps! because Microsoft
moved the STL definitions from the global namespace to "namespace std" in VC++
5.0. When used in versions below 5.0 or when USE_TEMPLATED_STREAMS is not
defined, the value of NQ is null; when used in 5.0 and above with
USE_TEMPLATED_STREAMS it is "std::" . You can use this symbol to qualify the
names of STL classes you use, for example if you use "fstream" you could code
"NQ fstream" and the correct namespace qualification would be substituted for
your environment.
Class Heirachy
Selecting a class
Use the following matrix to select the class most approriate for your needs.
Deriving Specialized Classes
You can derive specialized classes that use keys and store data of any type.
Each node within the tree contains two void* items, one for the node key and
the other for the node data. Virtuals are called to compare keys, set and get
data, set and get key values, and delete key and data values. When these
virtuals are called, they are passed references to the void* items for
key or data as appropriate. Your implementation can thus use the void* items to
store the address of the key or data, or the value of the key or
data, depending on the kind of data that makes up the key or data.
For example, the tULongToULong class stores the value of the
unsigned-long key and the unsigned-long data in the appropriate void* references
passed to its virtuals; on the other hand, the tStringToULong stores the
address of the key and the value of the data.
If you derive a class directly from tBalTree (the
abstract base tree class), you must implement overrides for the following pure
virtuals:
onDeleteKey() To delete a key value
onDeleteData() To delete a data value
To store a new key value
onSetData() To store a new data value
onCompareKeys() To compare two keys
Additionally, it is recommended that you implement an override for the
following member,
onGetKeyName() Used during DumpStructure()
It is important to note that the above pure virtual functions need perform
only those operations for which they are specifically called. For
example, if a class's implementation stores the address of an object in the data
pointer, the class's onSetData() member does
not need to delete the existing data before storing a pointer to the new data;
the onDeleteData() member will already have been
called by the tBalTree class. Likewise the onDeleteData() member does not need to zero the pointer
after deleting dynamic data, because onDeleteData()
is only called in two cases; immediately before the node itself is destroyed,
and immediately before onSetData() is called
to update the node's data.
It is also important to note that the virtual destructor of every class
derived directly from tBalTree must call the
RemoveAll() member, or memory leaks will
occur. This is because the tree is emptied only by RemoveAll(), and because RemoveAll() depends on the existence of the
pure virtual functions onDeleteData() and onDeleteKey(). The base-class destructor cannot
call these pure virtuals because they are no longer valid at the time the base
class destructor is called. Always code a virtual destructor that contains a
call to RemoveAll().
You should refer to Handles Serialization
for additional information before implementing your derived class if you require
serialization support.
How FastMaps! Handles Serialization
FastMaps! supports serialization to iostreams or CArchive obejcts.
Serialization Using iostream
Serialization to iostreams is based on the zSerial class, which defines the Load() and Store() interfaces and several helper functions.
Serialization to an ostream object is performed by the Store() member. The tBalTree implementation of Store() begins by writing the stream header to the output stream. It then
calls the storeTree() member to navigate the
tree; for each node in the tree, the onStore() virtual is called to write the key and
data of the node to the stream. After the tree has been fully navigated, the stream header is updated and the serialization
operation is complete.
Serialization from an istream object is performed by the Load() member. The tBalTree implementation of Load() begins by disabling autobalance and examining the stream header. If
the onStore() virtual tracks buffer sizes for key and data
buffers, these are allocated at the start of the Load() operation. The Load() implementation then calls the onLoad() virtual repeatedly until the stream
segment has been fully loaded. Once the tree has been reloaded, autobalance is re-enabled. The order in which
the storeTree() member navigates the tree
ensures that the tree that results from the Load() operation will have the same structure as
the tree had when it was serialized by Store(). No balancing is required during
serialization to or from an iostream.
To support serialization to iostreams, your derived
class will need to implement the onStore() and onLoad() virtuals. All classes provided in the
product include implementations of these members; note that the classes which
store CObject-derived data block them since CObject does not support
serialization to iostreams.
Serialization Using CArchive
Serialization using CArchive is implemented by the Serialize() member
inherited from CObject and overridden by the tree class.
Serialization to a CArchive object begins by writing the number of items in
the tree to the CArchive object. After the total number of items has been
written to the CArchive object, the storeTree() member is called to navigate the
tree and call the onStore() virtual for each
node. The onStore() virtual simply writes the
key followed by the data to the CArchive object.
Serialization from a CArchive object begins by disabling autobalance and reading the total number of
items from the CArchive object. Then key,data pairs are read from the CArchive
object until the item count is exhausted, and the set(key,data) member is called to add each pair
to the tree. Once all items have been reloaded, autobalance is re-enabled. The order in which
the storeTree() member navigates the tree
ensures that no balancing is required during serialization from a CArchive
object.
To support serialization using CArchive objects, your class needs to override
the Serialize() and onStore() member
functions. All MFC-oriented classes in the product contain implementations for
these virtuals.
Serialization Using EITHER iostream Or CArchive
Serialization using either an iostream or a CArchive object is
performed as described above; the derived class must implement overrides for the
Serialize(), onStore(), and onLoad() members.
The onLoad() member is only called during
serialization from iostreams, but the onStore() member is called during serialization
to either an iostream or a CArchive object, and must behave differently for
each.
The implementations provided in the product use the value of the m_bufSize protected member variable as a flag
to determine whether onStore() is being
called from Serialize() or Store(). The tBalTree implementation of Store() initializes the value of m_bufSize to zero in expectation that the onStore() member will track the size of the
largest buffer needed. The implementations of Serialize() that are provided in
the product classes all set m_bufSize to -1 at the start of their operation;
thus their onStore() member can tell that if
m_bufSize is -1 the call originates in Serialize(), otherwise it originates in
Store().
It is recommended that you examine the source code of a class similar to the
one you are deriving before beginning the actual coding process.
Demonstration Applications
Wrdcount Demo
The wrdcount demo is an MFC SDI application that counts the words in a
text file using the FastMaps! CtStringToULong
class, the MFC CMapStringToPtr class, and the STL map<string,int> template
(32-bit version only).
The wrdcount demo can be built for 16-bits using VC++ 1.52c, or
32-bits using VC++ 4.2 or higher. The 16-bit version does not handle the STL
maps since they are unavailable in that environment, and displays data for only
the FastMaps! and MFC classes.
Whenever a file is opened via the File/Open command, the file is
scanned several times and parsed for words. The parsing is performed by a
subroutine that accepts a word-count-exit parameter. Each time a word is
encountered the word-count-exit is called.
The first time the file is scanned, the parsing routine is passed a pointer
to a null exit, which simply returns. This allows the amount of time required
for the parsing operation itself to be calculated.
The second time the file is scanned, the parsing routine is passed a pointer
to an exit routine that uses the FastMaps! class. The total time is
reduced by the parsing time determined in the first scan, the resulting map time
is displayed, and an unsorted listbox is filled with the words and their
associated counts in ascending order (with the key lowest in the ASCII collating
order first).
The third time the file is scanned, the parsing routine is passed a pointer
to an exit that uses the MFC CMapStringToPtr class; again the total time is
reduced by the parsing time and the resulting map time is displayed. Again the
words and their associated counts are displayed in an unsorted listbox in the
order in which they are returned by CMapStringToPtr (note that since
CMapStringToPtr uses hashed lists, the order is unpredictable).
The fourth time the file is scanned (32-bit version only) the parsing routine
is passed a pointer to an exit that uses the STL map<string,int> template.
The same timing calculations are performed, and the words and their associated
counts are again loaded into an unsorted listbox in the order they are returned
by the map (ascending ASCII collating sequence).
The code that performs these timing calculations is in the
wordcodoc.cpp file.
It turns out that counting the words in an arbitrary text file is a fairly
good overall test for associative container performance. For each word, a lookup
is performed to obtain the word's previous count, then the count is updated and
the element is updated or added to the container.
Non-MFC demo
The non_mfc demo is a simple DOS application that accepts a single
argument. The argument passed is taken to be the full path specification of a
text file.
The file specified is parsed for words. Each time a word is located, the
total number of words in the file is incremented, and a usage-count for the word
is likewise incremented. When the file has been completely parsed, the total
number of words is displayed, the number of unique words is displayed, and the
first 15 words (those lowest in the ASCII sort-order) and their associated
counts are displayed.
The purpose of the non_mfc demo is simply to illustrate that the non-MFC map
classes can be used independently of MFC. The class used in this demo is tStringToULong.