Contents
Introduction
This article starts its move from this other article, published by lano1106 some weeks ago.
In that article, the author was proposing the use of an undocumented internal MFC class (CFixAlloc
, in that case) to optimize on-heap object construction / destruction in case of frequent small objects allocation/deallocations.
Here, I'll propose an alternative implementation that's not MFC dependent and has almost no requirements.
I'll not go in to the details of why and when (lano1106's article provides good explanations about), but I'll be more detailed about how.
Concepts
Behind fixed allocation, there is the need to avoid to continuously ask and return memory from/to the system for short-living small objects, to avoid system memory fragmentation.
The problem workaround is to perform in-place object construction/destruction on a pre-allocated and reused memory, and the different implementations -essentially- differ in the way this pre-allocated memory is obtained, managed, and released.
In this implementation, I want to be simple and fast, hence I'll not optimize every choice, but I'll point out the best efficiency in memory allocation and dispatching.
Trivia
While searching for such optimizations, I attempted different variants.
The initial attempt was to create a base for a class overriding the new
and delete
operators.
It has the problem of a not known object size (the base can be derived ...), the advantage to be easy to be used, but it is completely useless where allocation is hidden in some other classes (think of std::list
, for example).
This problem can be solved by re-implementing the std::allocator<.>
class, so, now there are two classes providing a different interface to a same problem.
I so decided to implement the raw bytes algorithm in a separate class (providing essentially as a set of static member functions) and use those functions in a number of classes that provide convenient interfaces to be used with STL containers rather than classic inheritance-based polymorphisms.
The implementation
It is based on the "libs/stdx/fixedallocator.h" file contained inline classes. Such a file depends on a bounce of small headers all inside the "libs/stdx" directory that provides some frequently used code of many of my projects.
STDX dependency
The "general.h" file contains -in the GE_::stdx
and GE_::dbg
namespaces- a number of small classes and general purpose inline / template functions. I think the header is self-explanatory enough, and it is not the subject of this article, so I'll not spend any more words on it. The "trace.h" header contains the GE_::dbg::trace
class that is used to prompt strings in the debug window with a stream-like syntax (essentially, based on the override of the operator <<
).
The raw algorithms
The class stdx::fixalloc_base
provides the static raw_allocate
and raw_free
functions. They have no knowledge about objects and classes: they just handle bytes.
To be fast in memory dispatching, I decided to organize the memory pool by directly indexing linked lists of elements of a homogeneous size. The required element's size is so used as an index for a static array of pointers to these lists.
Practically, to avoid huge memory allocation, I limited the maximum handled object size to ELEM_MAX_SIZE
(512 bytes). Wider objects are allocated by fall-through to a direct operator new
call. This should not be a big limitation, since the purpose of fixed allocation is to handle frequent alloc/rellease of small objects.
The required size is used to index a static array of data
, that is an internal struct, carrying the following members:
struct data
{
element_header* freelist;
size_t plexedobjs;
...
};
The element_header
along with the plex_header
are internal structures used to form linked lists. The plexedobjs
is the number of elements of the required size+header contained in the plex that will be allocated next.
Essentially, memory is allocated in "plexes" containing an integer amount of "elements" of the required size. "Plexes" are linked to each other to allow them to be walked at program termination to be destroyed in a clean way. "Elements" are also linked to each other to form "free lists".
Like every other static data, the static data
array is destroyed at program termination, as well the plexes chain.
By calling raw_allocate(size_t sz)
, the following actions are taken:
- If a free list -for the given size- exists, the head element is removed from the list, the size stored in the element header, and the element's raw bytes address returned.
- If the list is empty, a plex for
plexedobjs
elements (and relative headers) is allocated, and its space is internally formatted as a linked list of elements. The plex and the element list head are then linked by the sz
corresponding data
. Then we are ready to execute the previous point.
- After a plex is allocated, the
plexedobjs
value is doubled. This lets the program to minimize the number of different requested plexes, in case of large amount of memory requirements. Such doubling is limited to a maximum defined in MAX_PLXSIZE
(actually 64 KBytes), while the starting plex capacity is defined in MIN_PLEXED_OBJ
(actually 4).
Vice versa, when raw_free(void* p)
is called, the element header is fetched, the size determined, and the element itself pushed into the corresponding free-list.
Note that -after an arbitrary number of arbitrary raw_allocate
/ raw_free
- the sequence of the elements inside the free list doesn't necessarily match the sequence of the plexes. As a consequence, unless implementing reordering algorithms, it is not possible to delete plexes before program termination since it is not known if they are or not partially in use. I didn't implement any tracking to avoid slowing down the process.
Non-trivial classes
The trivial fixalloc_base
is used in two less trivial classes: fixallocator<Type>
and fixalloc
.
An STL compliant allocator
The class fixallocator<Type>
is an STL compliant allocator that inherits from std::allocator
and overrides the allocate
and deallocate
functions to call the fixalloc_base
raw functions.
This class can be used as the allocation policy with all the STL containers, like in std::list<MyType, fixallocator<MyType> >
, thus making the list to buy its node from the fixed allocator rather than directly from the heap, as with the default std:allocator
.
A common base for inheritance
Similarly, the class fixalloc
defines the proper inheritable new
and delete
operators, allowing all the derived classes to be used with fixed allocation.
Its typical use is like in class MyClass: public fixalloc { ... };
.
Testing the code
The "test.cpp" source declares a Fixallocated
class and a Collected
class.
The first inherits from stdx::fixalloc
, and is designed to be allocated on the heap, while the second is a value class, with overridden copy and assignment.
The main
function operates four distinct tests, all of them allocating and deallocating a given number of objects for a given number of time in a sort of repeated do/undo/do/undo...:
- By calling
new Fixallocated
and delete
, i.e., by using inherited fixed allocation.
- By pushing nodes into an
std::list
that uses fixallocator
, cleared on each loop scope exiting, causing the list to continuously buy/tidy.
- By calling
new Collected
and delete
, thus performing continuous regular OS heap allocation and release.
- By pushing nodes into an
std::list
with its own default allocator, thus buying/tidying from the OS heap.
The number of objects and of loops vary from the debug and release versions: small numbers in debug versions don't allow to appreciate performance, but make debug tracking easy to be understood. Conversely, wider numbers (250 objects for 100'000 cycles) allow to appreciate (especially when running outside the debugger) the sensible performance difference between the first test pair and the second.
Some precautions in particular cases
The vector that indexes the free lists is static, and allocated the very first time that raw_allocate
is called. The same is for the plex chain handler.
In normal situations, this works fine, but care must be taken when a static object (say A
) containing a pointer is initialized to a fixalloc
-ated object (say B
): when this happens, the construction order will be A
(by the module startup), then the fixalloc
indexes, then the B
object (by the A
constructor).
Since the allocation indexes are constructed after, B
will be deleted first at program termination, letting the A
destructor to be called later, probably attempting to delete B
that already had been freed through its plex destruction, operated by the already happened index destruction.
To avoid this kind of problems, use a static dummy initializer object to be constructed first, that allocates and immediately deallocates something inside its constructor, so that when A
will be constructed, the allocation index already exists.
Conclusion
This is not -and doesn't want to be- the best professional solution for this kind of a problem. I was only meant to provide an idea about a useful concept that can be implemented even outside the place it originally started. Many of you will probably think of different optimizations and other possible algorithms. For example, by making fixallocator
to take one extra template parameter and by making fixalloc
a template class, the fixalloc_base
class can be used as a policy... and other considerations can arise.