Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

Ultimate FastMaps!

0.00/5 (No votes)
19 Jan 2000 4  
A fully featured map class that uses balanced trees to store and retrieve data quickly by key
  • Download installation file - 558 Kb
  • Ultimate FastMaps!

    Introduction

    Hierarchy

    The Ultimate FastMaps! product contains classes to map keys to corresponding data and to store and retrieve the data quickly by key; for example, looking up a data structure from thousands of similar structures where each has an ASCII name that is used as its key. Data storage and lookup is implemented by means of a self-balancing binary tree. Maps implemented using balanced binary trees have two significant advantages over those implemented by other methods such as hashed lists:

    • Balanced trees ensure the fastest possible lookup.
    • Balanced trees are by nature ordered, and as such can be processed in order; in addition to the basic store/retrieve functionality, balanced trees can be used as a fast list-insertion sort.

    The product consists of a base class for serialization to iostreams, an abstract base class for balanced binary trees, a level of non-MFC application-usable tree classes that support serialization to streams, a level of MFC application-usable tree classes that support serialization to both streams and CArchive objects, and three MFC application-usable tree classes that support serialization to CArchive objects only. The product can be used in either 16 or 32 bit applications, with or without MFC.

    The abstract base class for balanced binary trees, tBalTree, uses a balancing algorithm that was developed from scratch and has at least one advantage over most other balancing algorithms; the amount of imbalance that is allowable before an automatic rebalance occurs is controllable by the programmer and can also be disabled and re-enabled. Controlling the allowable level of imbalance is performed by using the SetAutoBalance() interface. The default imbalance level is 15 nodes; this means that the left and right sides of any subtree can be out of balance by as much as 15 nodes before an automatic rebalance occurs, which in turn means that the total number of accesses in a search operation is increased by 15. This can slow access trivially, but significantly reduces the number of automatic rebalance operations that occur during normal operation. A Balance() interface is also provided for cases where the imbalance level has been reduced and it is desirable to ensure that the tree is optimally balanced.

    The tree class itself requires 20 bytes of data space, and each node requires 28 bytes of data space in addition to any dynamically allocated data needed by a derived class to store key and data. In debug builds the tree class uses an additional 16 bytes of data for maintaining statistics that can be used to monitor application behaviour (see DumpStatistics() for details). The DumpStructure() interface provides a dump of the tree structure (via TRACE statements) for use during development.

    The sets of classes provided for MFC and non-MFC use differ in the following ways:

    • The non-MFC classes begin with the letter t and the MFC classes begin with Ct.
    • The non-MFC classes do not support serialization to CArchive objects.
    • The MFC classes support serialization to either streams or CArchive objects.
    • Three classes exist in the MFC set which do not have counterparts in the non-MFC set since they deal specifically with CObject data, and which do not support serialization to streams.

    The non-MFC classes can be freely used in MFC programs unless serialization to CArchive is required. However if the application is written using MFC, the MFC classes are recommended regardless of current needs for serialization, since these needs may change in the future.

    The programming interface for all derived tree classes is identical except for the datatypes of key and data, and whether serialization is supported to streams, CArchive, or both. The classes provided in the product support string, unsigned long, and long keys, and map to string, unsigned long, and CObject* data. See "Selecting a Class" for help in choosing the correct class for your application.

    Programming Interface

    Overview

    The programming interface provided by the classes in Ultimate FastMaps! is consistent between classes; all classes support the same set of public members, which differ only by datatypes accepted as paremeters and returned as results. The exception to this is that some classes support serialization to streams, some to CArchive, and some to either; thus the Load(), Store(), and Serialize() members may be present or absent in a given class.

    No copy constructors or operators are provided by the base programming interface. Copy constructors and equal-operators are of limited usefulness with trees, and other operators are likewise of limited use and additionally are datatype-specific. If your application has need for these, you will need to derive a specialized class to implement them.

    The product does not attempt to be thread-safe. Support for thread locking would require additional calls to virtuals which would not apply to the Win-16 interface but which would affect performance regardless, and the type of locking required is felt to be more of an application-level issue. It is recommended that access to maps from multiple threads be handled by a common subroutine that performs the thread synchronization appropriate to the application environment; an alternative is to derive thread-safe classes of the types required by the application.

    Classes provided in the product that accept string keys always make a copy of the key and manage its deallocation. Likewise, classes that accept string data make a copy of the data and manage its deallocation. Allocation and deallocation within the product is based on malloc(), free(), and strdup(). Classes that accept CObject data take ownership of the object passed and manage its destruction. When a tree class is destroyed all storage it owns, including keys and data, is released, and any CObject-derived objects stored as data are destroyed through the CObject virtual destructor.

    If you wish to store an object pointer as data, but wish to avoid giving ownership of it to the tree, either use a class that stores unsigned long data and cast the object pointer appropriately, or derive a specialized class. Signed and unsigned short data can be stored in the classes provided by performing the appropriate casting.

    Note that the major difference between classes that use unsigned long keys and classes that use long keys is where items with their high-order bit set are stored; unsigned long keys cause these to be stored rightmost in the tree (largest) and signed long keys cause these to be stored leftmost in the tree (smallest).

    The classes provided which use string keys use them in a case-sensitive manner, thus items whose keys contain the same letters but differ in case will be stored separately in the tree. Case-insensitive keys can be handled by deriving a specialized class, or by simply converting the keys to a standard case prior to using them to access the tree.

    Data is always initially stored in the tree by defining a key,data association via the Set(key,data) member. Once stored in the tree, data can be returned by referencing its key as the argument of a Get(key) member call. Note that Get(key) returns zero if no such item is found in the tree; alternatively the program can call Find(key) to determine whether a node exists with that key, then use the resulting POSITION value to directly access that node's data (or key) without performing an additional search.

    The tree can be navigated as a list, using the First(), Last(), Prev(), and Next() members, or as an array by using the GetCount() and GetItem() members. The key or data of a given node can be retrieved during this navigation by using the GetKey() and GetData() members, and the data can be updated by using the SetData() member.

    The members provided are listed below.

    Construction/Destruction

    Constructor
    Virtual Destructor

    General

    long GetCount(void); Get items in tree
    int Remove(key); Remove node by key
    int Remove(POSITION& node); Remove node
    void RemoveAll(void); Empty tree

    Data Manipulation

    data Get(key); Get data for key
    data GetData(POSITION node); Get node's data
    key GetKey(POSITION node); Get node's key
    int Set(key, data); Set data for key
    int SetData(POSITION node, data); Update node's data

    Balancing

    int Balance(int force = 0); Rebalance tree
    long SetAutoBalance(long maxImbalance); Control balancing

    Navigation

    POSITION Find(key); Find node by key
    POSITION GetItem(long item); Get node by index
    long GetItemIndex(POSITION node); Get node's index
    POSITION First(void); Get first node position
    POSITION Last(void); Get last node position
    POSITION Next(POSITION node); Get position next node
    POSITION Prev(POSITION node); Get position prev node

    Debugging/Tuning

    DumpStatistics() TRACE tree statistics
    DumpStructure() TRACE tree structure
    ResetStatistics() Zero tree statistics

    Serialization

    int Load(istream* istrm); Load tree from stream
    void Serialize(CArchive& ar); Serialize using CArchive
    int Store(ostream* ostrm); Store tree in stream

    Using FastMaps! In Your Project

    This section describes the components that are added during FastMaps! installation, specifies the names of the header files you need to include in order to use the product, and describes the preprocessor symbols you can define to tailor product operation.

    Installed Directories And Files

    The installation program is a simple one that, given the name of a directory "install", creates several subdirectories and installs files in them (to uninstall the product simply delete the subdirectories and the files in them.) The subdirectories created are shown in the following picture; click on any of these to see what it contains:

    Including Ultimate FastMaps! In Your Project

    Because the source code for FastMaps! is compact, and can be used with or without MFC, in a DLL or static library, or included in an application's EXE file, the decision on how to "package" FastMaps! for use in your project is left up to you. (NB: your license agreement prohibits distribution of FastMaps! source code.)

    Note that the FastMaps! source files need to be compiled with the rest of your project to ensure that the correct version of iostreams is used.

    The source code for FastMaps! is organized in four levels as shown in the following picture; select the functionality level you need and place copies of all files at or below that level in your project directory:

    The zlib.h Header - Cross-System Customization

    The zlib.h header file is included, either directly or indirectly, by all of the other include files in the product. This file includes a number of other files that are required by the product, such as "malloc.h". It defines the POSITION type if it is not already defined. It nullifies the ASSERT() and TRACE() macros if MFC is not being used. It defines the EXT_CLASS symbol that is used to export classes when the product is built into a library.

    If you are using the library with a non-Microsoft compiler, or with an operating system other than MS-Windows, you should examine this header file in detail and make the changes necessary for correct compilation in your programming environment.

    Pre-Processor Symbols You May Need

    Three pre-processor symbols are defined by the product for use in defining the language and support configuration that is used.

    USE_TEMPLATED_STREAMS

    The USE_TEMPLATED_STREAMS symbol determines whether the STL or MFC version of iostreams is to be used. Visual C++ (starting with 4.2) includes two incompatible versions of iostreams. The older version is pulled in using the "iostream.h" header, and the newer (STL, or Standard Template Library) version is pulled in using the "iostream" header. Since FastMaps! uses iostreams and it must use the same version that the rest of your application uses, the USE_TEMPLATED_STREAMS pre-processor symbol is provided. To use the STL version of iostreams, define the USE_TEMPLATED_STREAMS symbol in your project file (to be passed to the compiler as a pre-processor symbol). By default the older version of iostreams is used.

    NO_MFC

    If your project does not use MFC, define the NO_MFC symbol in your project file (to be passed to the compiler as a pre-processor symbol). By default the product assumes that MFC will be used.

    HAVE_TEMPLATES

    The HAVE_TEMPLATES pre-processor symbol is provided in case you are using a non-Microsoft compiler that supports templates.

    Note: If you are using a non-MS compiler that supports templates, define the HAVE_TEMPLATES symbol in your project file (to be passed to the compiler as a pre-processor symbol).

    By default the product determines whether template support is present by examining the values of the _WIN32 and _MSC_VER pre-defined symbols provided by the Microsoft C++ compiler.

    NQ

    The NQ pre-processor symbol is a namespace qualifier that defines the namespace for STL classes. This symbol is used in FastMaps! because Microsoft moved the STL definitions from the global namespace to "namespace std" in VC++ 5.0. When used in versions below 5.0 or when USE_TEMPLATED_STREAMS is not defined, the value of NQ is null; when used in 5.0 and above with USE_TEMPLATED_STREAMS it is "std::" . You can use this symbol to qualify the names of STL classes you use, for example if you use "fstream" you could code "NQ fstream" and the correct namespace qualification would be substituted for your environment.

    Class Heirachy

    Selecting a class

    Use the following matrix to select the class most approriate for your needs.

    Deriving Specialized Classes

    You can derive specialized classes that use keys and store data of any type.

    Each node within the tree contains two void* items, one for the node key and the other for the node data. Virtuals are called to compare keys, set and get data, set and get key values, and delete key and data values. When these virtuals are called, they are passed references to the void* items for key or data as appropriate. Your implementation can thus use the void* items to store the address of the key or data, or the value of the key or data, depending on the kind of data that makes up the key or data.

    For example, the tULongToULong class stores the value of the unsigned-long key and the unsigned-long data in the appropriate void* references passed to its virtuals; on the other hand, the tStringToULong stores the address of the key and the value of the data.

    If you derive a class directly from tBalTree (the abstract base tree class), you must implement overrides for the following pure virtuals:

    onDeleteKey() To delete a key value

    onDeleteData() To delete a data value

    To store a new key value

    onSetData() To store a new data value

    onCompareKeys() To compare two keys

    Additionally, it is recommended that you implement an override for the following member,

    onGetKeyName() Used during DumpStructure()

    It is important to note that the above pure virtual functions need perform only those operations for which they are specifically called. For example, if a class's implementation stores the address of an object in the data pointer, the class's onSetData() member does not need to delete the existing data before storing a pointer to the new data; the onDeleteData() member will already have been called by the tBalTree class. Likewise the onDeleteData() member does not need to zero the pointer after deleting dynamic data, because onDeleteData() is only called in two cases; immediately before the node itself is destroyed, and immediately before onSetData() is called to update the node's data.

    It is also important to note that the virtual destructor of every class derived directly from tBalTree must call the RemoveAll() member, or memory leaks will occur. This is because the tree is emptied only by RemoveAll(), and because RemoveAll() depends on the existence of the pure virtual functions onDeleteData() and onDeleteKey(). The base-class destructor cannot call these pure virtuals because they are no longer valid at the time the base class destructor is called. Always code a virtual destructor that contains a call to RemoveAll().

    You should refer to Handles Serialization for additional information before implementing your derived class if you require serialization support.

    How FastMaps! Handles Serialization

    FastMaps! supports serialization to iostreams or CArchive obejcts.

    Serialization Using iostream

    Serialization to iostreams is based on the zSerial class, which defines the Load() and Store() interfaces and several helper functions.

    Serialization to an ostream object is performed by the Store() member. The tBalTree implementation of Store() begins by writing the stream header to the output stream. It then calls the storeTree() member to navigate the tree; for each node in the tree, the onStore() virtual is called to write the key and data of the node to the stream. After the tree has been fully navigated, the stream header is updated and the serialization operation is complete.

    Serialization from an istream object is performed by the Load() member. The tBalTree implementation of Load() begins by disabling autobalance and examining the stream header. If the onStore() virtual tracks buffer sizes for key and data buffers, these are allocated at the start of the Load() operation. The Load() implementation then calls the onLoad() virtual repeatedly until the stream segment has been fully loaded. Once the tree has been reloaded, autobalance is re-enabled. The order in which the storeTree() member navigates the tree ensures that the tree that results from the Load() operation will have the same structure as the tree had when it was serialized by Store(). No balancing is required during serialization to or from an iostream.

    To support serialization to iostreams, your derived class will need to implement the onStore() and onLoad() virtuals. All classes provided in the product include implementations of these members; note that the classes which store CObject-derived data block them since CObject does not support serialization to iostreams.

    Serialization Using CArchive

    Serialization using CArchive is implemented by the Serialize() member inherited from CObject and overridden by the tree class.

    Serialization to a CArchive object begins by writing the number of items in the tree to the CArchive object. After the total number of items has been written to the CArchive object, the storeTree() member is called to navigate the tree and call the onStore() virtual for each node. The onStore() virtual simply writes the key followed by the data to the CArchive object.

    Serialization from a CArchive object begins by disabling autobalance and reading the total number of items from the CArchive object. Then key,data pairs are read from the CArchive object until the item count is exhausted, and the set(key,data) member is called to add each pair to the tree. Once all items have been reloaded, autobalance is re-enabled. The order in which the storeTree() member navigates the tree ensures that no balancing is required during serialization from a CArchive object.

    To support serialization using CArchive objects, your class needs to override the Serialize() and onStore() member functions. All MFC-oriented classes in the product contain implementations for these virtuals.

    Serialization Using EITHER iostream Or CArchive

    Serialization using either an iostream or a CArchive object is performed as described above; the derived class must implement overrides for the Serialize(), onStore(), and onLoad() members.

    The onLoad() member is only called during serialization from iostreams, but the onStore() member is called during serialization to either an iostream or a CArchive object, and must behave differently for each.

    The implementations provided in the product use the value of the m_bufSize protected member variable as a flag to determine whether onStore() is being called from Serialize() or Store(). The tBalTree implementation of Store() initializes the value of m_bufSize to zero in expectation that the onStore() member will track the size of the largest buffer needed. The implementations of Serialize() that are provided in the product classes all set m_bufSize to -1 at the start of their operation; thus their onStore() member can tell that if m_bufSize is -1 the call originates in Serialize(), otherwise it originates in Store().

    It is recommended that you examine the source code of a class similar to the one you are deriving before beginning the actual coding process.

    Demonstration Applications

    Wrdcount Demo

    The wrdcount demo is an MFC SDI application that counts the words in a text file using the FastMaps! CtStringToULong class, the MFC CMapStringToPtr class, and the STL map<string,int> template (32-bit version only).

    The wrdcount demo can be built for 16-bits using VC++ 1.52c, or 32-bits using VC++ 4.2 or higher. The 16-bit version does not handle the STL maps since they are unavailable in that environment, and displays data for only the FastMaps! and MFC classes.

    Whenever a file is opened via the File/Open command, the file is scanned several times and parsed for words. The parsing is performed by a subroutine that accepts a word-count-exit parameter. Each time a word is encountered the word-count-exit is called.

    The first time the file is scanned, the parsing routine is passed a pointer to a null exit, which simply returns. This allows the amount of time required for the parsing operation itself to be calculated.

    The second time the file is scanned, the parsing routine is passed a pointer to an exit routine that uses the FastMaps! class. The total time is reduced by the parsing time determined in the first scan, the resulting map time is displayed, and an unsorted listbox is filled with the words and their associated counts in ascending order (with the key lowest in the ASCII collating order first).

    The third time the file is scanned, the parsing routine is passed a pointer to an exit that uses the MFC CMapStringToPtr class; again the total time is reduced by the parsing time and the resulting map time is displayed. Again the words and their associated counts are displayed in an unsorted listbox in the order in which they are returned by CMapStringToPtr (note that since CMapStringToPtr uses hashed lists, the order is unpredictable).

    The fourth time the file is scanned (32-bit version only) the parsing routine is passed a pointer to an exit that uses the STL map<string,int> template. The same timing calculations are performed, and the words and their associated counts are again loaded into an unsorted listbox in the order they are returned by the map (ascending ASCII collating sequence).

    The code that performs these timing calculations is in the wordcodoc.cpp file.

    It turns out that counting the words in an arbitrary text file is a fairly good overall test for associative container performance. For each word, a lookup is performed to obtain the word's previous count, then the count is updated and the element is updated or added to the container.

    Non-MFC demo

    The non_mfc demo is a simple DOS application that accepts a single argument. The argument passed is taken to be the full path specification of a text file.

    The file specified is parsed for words. Each time a word is located, the total number of words in the file is incremented, and a usage-count for the word is likewise incremented. When the file has been completely parsed, the total number of words is displayed, the number of unique words is displayed, and the first 15 words (those lowest in the ASCII sort-order) and their associated counts are displayed.

    The purpose of the non_mfc demo is simply to illustrate that the non-MFC map classes can be used independently of MFC. The class used in this demo is tStringToULong.

    License

    This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

    A list of licenses authors might use can be found here