A Fixed Block Memory Allocator in C

David Lafreniere

5.00/5 (15 votes)

24 Dec 2018CPOL8 min read

32K

839

Unique allocator features improve performance and protect against heap fragmentation faults on any C or C++ project.

Download C_Allocator.zip - 23.2 KB

Introduction

In 1998, I wrote an article on fixed block memory allocators for Embedded Systems Programming magazine. I received $750 for the piece. Now, articles are written for free on websites such as Code Project. Oh, how times keep a-changin’.

One thing that hasn’t changed is the usefulness of fixed block allocators. It’s a no-no using the global heap on some devices. Throughout my career, I’ve written numerous fixed block memory allocators to provide high-speed dynamic-like allocation in memory constrained systems. A generic, well-designed fixed block allocator opens up application implementation possibilities by offering dynamic memory allocations in systems where heap usage is forbidden.

On Code Project, I’ve documented various C++ allocator implementations (see the Reference Articles section). This time, I’ll present a C language version with unique features suitable for embedded devices or otherwise.

The solution presented here will:

be faster than the global heap
be easy to use
be thread safe
support fixed block versions of malloc, free, realloc and calloc
use minimal code space
execute in fixed time
eliminate heap fragmentation memory faults
require no additional storage overhead (except for a few bytes of static memory)
handle memory alignment
automatically dispense variable block size based on demand (a la malloc)

Two simple C modules that dispense and reclaim memory will provide all of the aforementioned benefits, as I'll show.

CMake is used to create the build files. CMake is free and open-source software. Windows, Linux and other toolchains are supported. See the CMakeLists.txt file for more information.

See GitHub for latest source code:

A Fixed Block Memory Allocator in C - by David Lafreniere

Background

Custom fixed block memory allocators are used to solve at least two types of memory related problems. First, global heap allocations/deallocations can be slow and nondeterministic. You never know how long the memory manager is going to take. Secondly, to eliminate the possibility of a memory allocation fault caused by a fragmented heap – a valid concern, especially on mission-critical type systems.

Even if the system isn't considered mission-critical, some embedded systems are designed to run for weeks or years without a reboot. Depending on allocation patterns and heap implementation, long-term heap use can lead to heap faults.

fb_allocator

Each fb_allocator instance handles a single block size. The interface is shown below:

C++

void ALLOC_Init(void);
void ALLOC_Term(void);
void* ALLOC_Alloc(ALLOC_HANDLE hAlloc, size_t size);
void* ALLOC_Calloc(ALLOC_HANDLE hAlloc, size_t num, size_t size);
void ALLOC_Free(ALLOC_HANDLE hAlloc, void* pBlock);

ALLOC_Init() is called one time at startup and ALLOC_Term() at shutdown. Each of the remaining APIs operate in the same manner as the CRT counterparts; ALLOC_Alloc() allocates memory and ALLOC_Free() deallocates.

An fb_allocator instance is created using the ALLOC_DEFINE macro at file scope. The TestAlloc allocator below defines a fixed block allocator with a maximum of five 16-byte blocks.

C++

ALLOC_DEFINE(TestAlloc, 16, 5);

Allocating a 16-byte block is simple.

C++

void* data = ALLOC_Alloc(TestAlloc, 16);

Deallocate the block when done.

C++

ALLOC_Free(TestAlloc, data);

x_allocator

The x_allocator module handles multiple memory block sizes using two or more fb_allocator instances; one fb_allocator per block size. During allocation, x_allocator returns a block sized from one of the fb_allocator instances based on the caller’s requested size. The x_allocator API is shown below:

C++

void* XALLOC_Alloc(XAllocData* self, size_t size);
void XALLOC_Free(void* ptr);
void* XALLOC_Realloc(XAllocData* self, void *ptr, size_t new_size);
void* XALLOC_Calloc(XAllocData* self, size_t num, size_t size);

Users of x_allocator typically create a thin wrapper module that (a) defines two or more fb_allocator instances and (b) provides a custom API to access the x_allocator memory. It’s easier to explain with a simple example.

my_allocator

Let’s say we want a fixed block allocator to dispense two block sizes: 32 and 128. We’ll call it my_allocator and the API is shown below:

C++

void* MYALLOC_Alloc(size_t size);
void MYALLOC_Free(void* ptr);
void* MYALLOC_Realloc(void *ptr, size_t new_size);
void* MYALLOC_Calloc(size_t num, size_t size);

The implementation creates multiple fb_allocator instances; one to handle each desired block size. In this case, we’ll allow at most ten 32-byte blocks and five 128-byte blocks.

C++

#include "my_allocator.h"
#include "x_allocator.h"

// Maximum number of blocks for each size
#define MAX_32_BLOCKS   10
#define MAX_128_BLOCKS  5

// Define size of each block including meta data overhead
#define BLOCK_32_SIZE     32 + XALLOC_BLOCK_META_DATA_SIZE
#define BLOCK_128_SIZE    128 + XALLOC_BLOCK_META_DATA_SIZE

// Define individual fb_allocators
ALLOC_DEFINE(myDataAllocator32, BLOCK_32_SIZE, MAX_32_BLOCKS)
ALLOC_DEFINE(myDataAllocator128, BLOCK_128_SIZE, MAX_128_BLOCKS)

// An array of allocators sorted by smallest block first
static ALLOC_Allocator* allocators[] = {
    &myDataAllocator32Obj,
    &myDataAllocator128Obj
};

#define MAX_ALLOCATORS   (sizeof(allocators) / sizeof(allocators[0]))

static XAllocData self = { allocators, MAX_ALLOCATORS };

Now, simple one line wrapper functions provide access to the underlying x_allocator module.

C++

void* MYALLOC_Alloc(size_t size)
{
    return XALLOC_Alloc(&self, size);
}

void MYALLOC_Free(void* ptr)
{
    XALLOC_Free(ptr);
}

void* MYALLOC_Realloc(void *ptr, size_t new_size)
{
    return XALLOC_Realloc(&self, ptr, new_size);
}

void* MYALLOC_Calloc(size_t num, size_t size)
{
    return XALLOC_Calloc(&self, num, size);
}

When the caller calls MYALLOC_Alloc() with a size between 1 to 32, a 32-byte block is returned. If the requested size is between 33 and 128, a 128-byte block is provided. MYALLOC_Free() returns the block to the originating fb_allocator instance. In this way, a collection of fixed block memory allocators are grouped together providing variable sized memory blocks at runtime based on application demand. The sample wrapper pattern is used again and again offering groups of memory blocks for specific purposes within the system.

Implementation Details

Most of the allocator implementation is relatively straight forward. However, I’ll explain a few details to assist with key concepts.

fb_allocator Free-List

This is a handy technique for linking blocks together in the free-list without consuming any extra storage for the pointers. After the user calls ALLOC_Free(), a fixed memory block is no longer being utilized and is freed to be used for other things, like a next pointer. Since the fb_allocator module needs to keep the deleted blocks around, we put the list's next pointer in that currently unused block space. When the block is reused by the application, the pointer is no longer needed and will be overwritten by the user object. This way, there is no per-instance storage overhead incurred linking blocks together.

The free-list is actually implemented as a singly linked list, but the code only adds/removes from the head so the behavior is that of a stack. Using a stack makes the allocation/deallocations really fast and deterministic. No loop searching for a free block – just push or pop a block and go.

Using freed object space as the memory to link blocks together means the object must be large enough to hold a pointer. The ALLOC_BLOCK_SIZE macro ensures that the minimum size is met.

fb_allocator Memory Alignment

Some embedded systems require memory to be aligned on a particular byte boundary. Since the allocator’s memory is a contiguous static byte array, having blocks start on an unaligned boundary could cause a hardware exception on some CPUs. For instance, 13-byte blocks will cause a problem if 4-byte alignment is required. Change ALLOC_MEM_ALIGN to the byte boundary desired. The block size will be rounded up to the next nearest aligned boundary.

x_allocator Meta Data

The x_allocator implementation adds 4-bytes of meta data per block. For instance, if 32-byte blocks are required by the user, the x_allocator actually uses 36-byte blocks. The extra 4-bytes are used to hide an fb_allocator pointer inside the block (assuming the pointer is 4-bytes in size).

When deleting memory, x_allocator needs the original fb_allocator instance so the deallocation request can be routed to the correct fb_allocator instance for processing. Unlike XALLOC_Alloc(), XALLOC_Free() does not take a size and only uses a void* argument. Therefore, XALLOC_Alloc() actually hides a pointer to the fb_allocator within an unused portion of the memory block by adding an additional 4-bytes to the request. The caller gets a pointer to the block’s client region where the hidden fb_allocator pointer is not overwritten.

C++

void* XALLOC_Alloc(XAllocData* self, size_t size)
{
    ALLOC_Allocator* pAllocator;
    void* pBlockMemory = NULL;
    void* pClientMemory = NULL;

    ASSERT_TRUE(self);

    // Get an allocator instance to handle the memory request
    pAllocator = XALLOC_GetAllocator(self, size);

    // An allocator found to handle memory request?
    if (pAllocator)
    {
        // Get a fixed memory block from the allocator instance
        pBlockMemory = ALLOC_Alloc(pAllocator, size + XALLOC_BLOCK_META_DATA_SIZE);
        if (pBlockMemory)
        {
            // Set the block ALLOC_Allocator* ptr within the raw memory block region
            pClientMemory = XALLOC_PutAllocatorPtrInBlock(pBlockMemory, pAllocator);
        }
    }
    else
    {
        // Too large a memory block requested
        ASSERT();
    }

    return pClientMemory;
}

When XALLOC_Free() is called, the allocator pointer is extracted from the memory block so the correct fb_allocator instance can be called to deallocate the block.

C++

void XALLOC_Free(void* ptr)
{
    ALLOC_Allocator* pAllocator = NULL;
    void* pBlock = NULL;

    if (!ptr)
        return;

    // Extract the original allocator instance from the caller's block pointer
    pAllocator = XALLOC_GetAllocatorPtrFromBlock(ptr);
    if (pAllocator)
    {
        // Convert the client pointer into the original raw block pointer
        pBlock = XALLOC_GetBlockPtr(ptr);

        // Deallocate the fixed memory block
        ALLOC_Free(pAllocator, pBlock);
    }
}

Benchmarking

Benchmarking the allocator performance vs. the global heap on a Windows PC shows just how fast the code is. A basic test of allocating and deallocating 20000 4096 and 2048 sized blocks in a somewhat interleaved fashion tests the speed improvement. See the attached source code for the exact algorithm.

Windows Allocation Times in Milliseconds

Allocator	Mode	Run	Benchmark Time (mS)
Global Heap	Release	1	36.3
Global Heap	Release	2	33.8
Global Heap	Release	3	32.8
`fb_allocator`	Static Pool	1	22.6
`fb_allocator`	Static Pool	2	3.7
`fb_allocator`	Static Pool	3	4.9
`x_allocator`	Static Pool	1	33.9
`x_allocator`	Static Pool	2	6.9
`x_allocator`	Static Pool	3	7.7

Windows uses a debug heap when executing within the debugger. The debug heap adds extra safety checks slowing its performance. The release heap is much faster as the checks are disabled. The debug heap can be disabled within Visual Studio by setting _NO_DEBUG_HEAP=1 in the Debugging > Environment project option.

This benchmark test is very simplistic and a more realistic scenario with varying blocks sizes and random new/delete intervals might produce different results. However, the basic point is illustrated nicely; the memory manager is slower than allocator and highly dependent on the platform's implementation.

The fb_allocator uses a static memory pool and doesn't rely upon the heap. This has a fast execution time of around 4ms once the free-list is populated with blocks. The 22.6ms on fb_allocator Run 1 accounts for dicing up the fixed memory pool into individual blocks on the first run.

The x_allocator used within the bm_allocator module is a bit slower at ~7ms since it has overhead to allocate/deallocate multiple sized blocks whereas fb_allocator only supports one block size.

In comparison to the Windows global heap, the fb_allocator is about 8 times faster and x_allocator is about 5 times faster. On embedded devices, I've seen as high as a 15x speed increase over the global heap.

Thread Safety

The LK_LOCK and LK_UNLOCK macros within the LockGuard module implement the software locks needed for thread safety. Update the lock implementation as required for your platforms operating system.

Reference Articles

An Efficient C++ Fixed Block Memory Allocator - by David Lafreniere
Replace malloc/free with a Fast Fixed Block Memory Allocator - by David Lafreniere
A Custom STL std::allocator Replacement Improves Performance - by David Lafreniere

Conclusion

The C-based fixed block memory allocator presented here is suitable for any C or C++ system. For a C++ specific implementation with its own unique features, see the referenced articles.

Use the fb_allocator whenever you need to allocate a single block size. Use x_allocator when dispensing multiple block sizes is desired. Create multiple x_allocator wrappers to segregate memory pools depending on intended usage.

If you have an application that really hammers the heap and is causing slow performance, or if you’re worried about a fragmented heap fault, integrating fb_allocator and x_allocator may help solve those problems. The implementation was kept to a minimum facilitating use on even the smallest of embedded systems.

History

23^rd December, 2018
- Initial release
24^th December, 2018
- Added benchmark section to article
- Updated attached source code

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)