Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

On-heap fixed allocation outside MFC

0.00/5 (No votes)
13 Mar 2006 2  
An alternative to MFC undocumented classes to perform fixed memory allocation.

Contents

Introduction

This article starts its move from this other article, published by lano1106 some weeks ago.

In that article, the author was proposing the use of an undocumented internal MFC class (CFixAlloc, in that case) to optimize on-heap object construction / destruction in case of frequent small objects allocation/deallocations.

Here, I'll propose an alternative implementation that's not MFC dependent and has almost no requirements.

I'll not go in to the details of why and when (lano1106's article provides good explanations about), but I'll be more detailed about how.

Concepts

Behind fixed allocation, there is the need to avoid to continuously ask and return memory from/to the system for short-living small objects, to avoid system memory fragmentation.

The problem workaround is to perform in-place object construction/destruction on a pre-allocated and reused memory, and the different implementations -essentially- differ in the way this pre-allocated memory is obtained, managed, and released.

In this implementation, I want to be simple and fast, hence I'll not optimize every choice, but I'll point out the best efficiency in memory allocation and dispatching.

Trivia

While searching for such optimizations, I attempted different variants.

The initial attempt was to create a base for a class overriding the new and delete operators.

It has the problem of a not known object size (the base can be derived ...), the advantage to be easy to be used, but it is completely useless where allocation is hidden in some other classes (think of std::list, for example).

This problem can be solved by re-implementing the std::allocator<.> class, so, now there are two classes providing a different interface to a same problem.

I so decided to implement the raw bytes algorithm in a separate class (providing essentially as a set of static member functions) and use those functions in a number of classes that provide convenient interfaces to be used with STL containers rather than classic inheritance-based polymorphisms.

The implementation

It is based on the "libs/stdx/fixedallocator.h" file contained inline classes. Such a file depends on a bounce of small headers all inside the "libs/stdx" directory that provides some frequently used code of many of my projects.

STDX dependency

The "general.h" file contains -in the GE_::stdx and GE_::dbg namespaces- a number of small classes and general purpose inline / template functions. I think the header is self-explanatory enough, and it is not the subject of this article, so I'll not spend any more words on it. The "trace.h" header contains the GE_::dbg::trace class that is used to prompt strings in the debug window with a stream-like syntax (essentially, based on the override of the operator <<).

The raw algorithms

The class stdx::fixalloc_base provides the static raw_allocate and raw_free functions. They have no knowledge about objects and classes: they just handle bytes.

To be fast in memory dispatching, I decided to organize the memory pool by directly indexing linked lists of elements of a homogeneous size. The required element's size is so used as an index for a static array of pointers to these lists.

Practically, to avoid huge memory allocation, I limited the maximum handled object size to ELEM_MAX_SIZE (512 bytes). Wider objects are allocated by fall-through to a direct operator new call. This should not be a big limitation, since the purpose of fixed allocation is to handle frequent alloc/rellease of small objects.

The required size is used to index a static array of data, that is an internal struct, carrying the following members:

    struct data
    {
        element_header* freelist; //the element free list

        size_t plexedobjs; //the plex size (in contained objects)

        ...
    };

The element_header along with the plex_header are internal structures used to form linked lists. The plexedobjs is the number of elements of the required size+header contained in the plex that will be allocated next.

Essentially, memory is allocated in "plexes" containing an integer amount of "elements" of the required size. "Plexes" are linked to each other to allow them to be walked at program termination to be destroyed in a clean way. "Elements" are also linked to each other to form "free lists".

Like every other static data, the static data array is destroyed at program termination, as well the plexes chain.

By calling raw_allocate(size_t sz), the following actions are taken:

  • If a free list -for the given size- exists, the head element is removed from the list, the size stored in the element header, and the element's raw bytes address returned.
  • If the list is empty, a plex for plexedobjs elements (and relative headers) is allocated, and its space is internally formatted as a linked list of elements. The plex and the element list head are then linked by the sz corresponding data. Then we are ready to execute the previous point.
  • After a plex is allocated, the plexedobjs value is doubled. This lets the program to minimize the number of different requested plexes, in case of large amount of memory requirements. Such doubling is limited to a maximum defined in MAX_PLXSIZE (actually 64 KBytes), while the starting plex capacity is defined in MIN_PLEXED_OBJ (actually 4).

Vice versa, when raw_free(void* p) is called, the element header is fetched, the size determined, and the element itself pushed into the corresponding free-list.

Note that -after an arbitrary number of arbitrary raw_allocate / raw_free- the sequence of the elements inside the free list doesn't necessarily match the sequence of the plexes. As a consequence, unless implementing reordering algorithms, it is not possible to delete plexes before program termination since it is not known if they are or not partially in use. I didn't implement any tracking to avoid slowing down the process.

Non-trivial classes

The trivial fixalloc_base is used in two less trivial classes: fixallocator<Type> and fixalloc.

An STL compliant allocator

The class fixallocator<Type> is an STL compliant allocator that inherits from std::allocator and overrides the allocate and deallocate functions to call the fixalloc_base raw functions.

This class can be used as the allocation policy with all the STL containers, like in std::list<MyType, fixallocator<MyType> >, thus making the list to buy its node from the fixed allocator rather than directly from the heap, as with the default std:allocator.

A common base for inheritance

Similarly, the class fixalloc defines the proper inheritable new and delete operators, allowing all the derived classes to be used with fixed allocation.

Its typical use is like in class MyClass: public fixalloc { ... };.

Testing the code

The "test.cpp" source declares a Fixallocated class and a Collected class.

The first inherits from stdx::fixalloc, and is designed to be allocated on the heap, while the second is a value class, with overridden copy and assignment.

The main function operates four distinct tests, all of them allocating and deallocating a given number of objects for a given number of time in a sort of repeated do/undo/do/undo...:

  • By calling new Fixallocated and delete, i.e., by using inherited fixed allocation.
  • By pushing nodes into an std::list that uses fixallocator, cleared on each loop scope exiting, causing the list to continuously buy/tidy.
  • By calling new Collected and delete, thus performing continuous regular OS heap allocation and release.
  • By pushing nodes into an std::list with its own default allocator, thus buying/tidying from the OS heap.

The number of objects and of loops vary from the debug and release versions: small numbers in debug versions don't allow to appreciate performance, but make debug tracking easy to be understood. Conversely, wider numbers (250 objects for 100'000 cycles) allow to appreciate (especially when running outside the debugger) the sensible performance difference between the first test pair and the second.

Some precautions in particular cases

The vector that indexes the free lists is static, and allocated the very first time that raw_allocate is called. The same is for the plex chain handler.

In normal situations, this works fine, but care must be taken when a static object (say A) containing a pointer is initialized to a fixalloc-ated object (say B): when this happens, the construction order will be A (by the module startup), then the fixalloc indexes, then the B object (by the A constructor).

Since the allocation indexes are constructed after, B will be deleted first at program termination, letting the A destructor to be called later, probably attempting to delete B that already had been freed through its plex destruction, operated by the already happened index destruction.

To avoid this kind of problems, use a static dummy initializer object to be constructed first, that allocates and immediately deallocates something inside its constructor, so that when A will be constructed, the allocation index already exists.

Conclusion

This is not -and doesn't want to be- the best professional solution for this kind of a problem. I was only meant to provide an idea about a useful concept that can be implemented even outside the place it originally started. Many of you will probably think of different optimizations and other possible algorithms. For example, by making fixallocator to take one extra template parameter and by making fixalloc a template class, the fixalloc_base class can be used as a policy... and other considerations can arise.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here