Introduction
I started to write an article about garbage collection in C++, and one of my comparisons was that real garbage-collected languages may be faster than C++ in some situations because they allocate memory in blocks, which makes allocating many small objects extremely fast. This doesn't happen by default in C++.
Well, actually we can do the same in C++ but it is not automatic, so it is up to us to use it. To do that we should use some kind of memory or object pooling.
I looked for some existing implementations and I actually didn't like them. Boost, for example, has a pool that's fast to allocate memory but that's slow to free memory if we want to control when we destroy each object. So, I decided to create my own implementation, which I am going to present here.
The Solution
This solution is made of a single template class, named ObjectPool. I can say that choosing between naming the class "object pool" and "memory pool" is a little problematic. As the solution doesn't keep a certain number of objects already initialized, we can say that it is only a memory pool, so the cost to invoke the constructor and the destructor of individual objects will continue to happen. Yet I didn't want to say it is a "memory pool", as users will request objects, not raw memory, from the pool. Maybe I should find another name that doesn't cause confusion but, for now, it is called ObjectPool.
The implementation is like this:
- An initial amount of memory is allocated (by default, it is a block capable of holding 32 objects, with each slot rounded up to a multiple of the pointer size [4 bytes on 32-bit computers and 8 bytes on 64-bit computers]) and there's a reference to a "first deleted" object set to NULL;
- Each time a new object is requested, the code first checks if there's a pointer to a first deleted object. If there is, then that's the address that will be used and the first pointer will point to the "next" free object (which is the content at that pointer location). If there's no deleted object that we can reuse, then we check if we still have room in our memory block. If not, a new block of memory is requested (which doubles in size, up to a specific limit). In any case, we will call the constructor of the object at the address that we chose and then we will return it;
- Deleting an object is pretty simple. We invoke the destructor and then treat the object's memory as a pointer. That pointer is set to point to the current "first deleted" object, and then the object's own address becomes the new "first deleted" item;
- When we delete the pool itself, it walks through its list of blocks and frees all of them, one after the other.
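The "first deleted" mechanism described in the list above can be sketched in isolation. This is only a minimal illustration, not the pool itself: SlotBlock, Take, Give and the fixed capacity of 4 are names and values I made up for the sketch, and it assumes a single block of pointer-sized slots.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for one pool block: a fixed number of pointer-sized
// slots. The real ObjectPool sizes its slots from sizeof(T) and chains blocks.
class SlotBlock
{
public:
    static const std::size_t Capacity = 4;

    SlotBlock() : _firstDeleted(nullptr), _count(0) {}

    // Returns raw memory for one slot, or nullptr when the block is exhausted
    // (the real pool would allocate another block instead).
    void *Take()
    {
        if (_firstDeleted)
        {
            void *result = _firstDeleted;
            // The freed slot itself stores the address of the next free slot.
            _firstDeleted = *static_cast<void **>(_firstDeleted);
            return result;
        }
        if (_count >= Capacity)
            return nullptr;
        return &_slots[_count++];
    }

    // Pushes the slot onto the front of the free list.
    void Give(void *slot)
    {
        assert(slot != nullptr);
        *static_cast<void **>(slot) = _firstDeleted;
        _firstDeleted = slot;
    }

private:
    void *_slots[Capacity];  // pointer-sized, pointer-aligned storage
    void *_firstDeleted;     // head of the singly linked free list
    std::size_t _count;      // slots handed out from the never-used region
};
```

Note that the free list works as a stack: the slot that was given back last is the first one to be handed out again, and no extra memory is needed to remember which slots are free.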
So, we can summarize things like this:
- All object allocations are O(1). Surely in some cases memory needs to be allocated, which can take some time, but the performance of such allocation is not directly affected by the number of already allocated items as we put a limit on how big the blocks can become;
- All object destructions are O(1), as we call the destructor and simply "swap pointers";
- It is not important how many objects we have deleted. The pool will keep all memory blocks allocated until the pool itself is deleted;
- When we delete the pool, the memory blocks are freed but the destructors of the inner items are not called, so it is up to us to delete each item before destroying the pool (if we need to call the object destructor at all).
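To see why the block size limit keeps allocations predictable, here is a small sketch of how the block capacities grow. The function names NextBlockCapacity and BlockCapacities are mine, and the sketch only assumes the doubling-with-a-limit rule described above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Capacity of the block that follows one holding `current` items:
// double it, but never exceed `maxBlockLength`.
std::size_t NextBlockCapacity(std::size_t current, std::size_t maxBlockLength)
{
    assert(maxBlockLength >= 1);
    if (current >= maxBlockLength)
        return maxBlockLength;
    std::size_t doubled = current * 2;
    if (doubled < current)  // unsigned wrap-around
        throw std::overflow_error("size became too big.");
    return doubled >= maxBlockLength ? maxBlockLength : doubled;
}

// Lists the capacities of the first `blockCount` blocks of a pool.
std::vector<std::size_t> BlockCapacities(std::size_t initial, std::size_t maxBlockLength, std::size_t blockCount)
{
    std::vector<std::size_t> result;
    std::size_t capacity = initial;
    for (std::size_t i = 0; i < blockCount; ++i)
    {
        result.push_back(capacity);
        capacity = NextBlockCapacity(capacity, maxBlockLength);
    }
    return result;
}
```

With an initial capacity of 32 and a limit of 1 million, the capacities go 32, 64, 128 and so on up to 524288, and every block after that holds exactly 1000000 items.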
There are also alternative methods that don't initialize or destroy the objects. Those methods do all the work of "allocating" or "deallocating" an object from the pool but don't call the constructor or the destructor. This may be useful if we need to call a specific, non-default constructor or if we know that the object doesn't have a destructor (or for some reason we don't want to call it).
When is this pool useful?
- When we have a unit of work that allocates and deallocates many objects and we know that we can keep the memory allocated until the end of that work (most loops fall into this category);
- When we know that we have a limited number of objects that will be in memory at the same time, yet we keep "allocating" and "deallocating" them.
Using the code
To use the code we must initialize the pool giving the initial capacity and the maximum size for the other blocks. If we don't give any parameter, the defaults of 32 (for the initial capacity) and 1 million (for the maximum block size) are used.
So, a line like the following will initialize our pool on the stack (or statically) using those defaults:
ObjectPool<Test> pool;
Then, to allocate an object we do:
Test *test = pool.New();
And to deallocate an object we do:
pool.Delete(test);
If we don't want to use the default constructor, we can do:
Test *uninitializedObject = pool.GetNextWithoutInitializing();
Test *test = new (uninitializedObject) Test(/* arguments of the constructor we want */);
And if we don't want to allow the destructor to be called but we want to return the memory to the pool, we can use:
pool.DeleteWithoutDestroying(test);
Obviously, we should use the appropriate class instead of Test and the appropriate variable names instead of test.
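The non-default constructor case can be shown with a small sketch that works on any raw memory. The Test struct below, with its two-argument constructor, and the helpers ConstructAt and DestroyOnly are hypothetical; in real use the raw memory would be the pointer returned by GetNextWithoutInitializing().

```cpp
#include <cassert>
#include <new>
#include <string>

// Hypothetical class without a default constructor.
struct Test
{
    std::string name;
    int value;
    Test(const std::string &n, int v) : name(n), value(v) {}
};

// Constructs a Test in caller-provided raw memory using placement new.
// The memory must be big enough and suitably aligned for a Test.
Test *ConstructAt(void *rawMemory, const std::string &name, int value)
{
    assert(rawMemory != nullptr);
    return new (rawMemory) Test(name, value);
}

// Runs the destructor only; the memory itself is not released here. With the
// pool we would then hand it back through DeleteWithoutDestroying().
void DestroyOnly(Test *test)
{
    test->~Test();
}
```

If we construct an object this way and it has a non-trivial destructor (here, because of the std::string member), we either call Delete on it, which runs the destructor for us, or we call the destructor ourselves and then use DeleteWithoutDestroying, never both.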
The source code of the pool is this:
#include <cstddef>    // size_t, NULL
#include <new>        // std::nothrow, std::bad_alloc, placement new
#include <stdexcept>  // std::invalid_argument, std::overflow_error

// Default allocator: uses the global operator new/delete.
class DefaultMemoryAllocator
{
public:
    static inline void *Allocate(size_t size)
    {
        return ::operator new(size, ::std::nothrow);
    }
    static inline void Deallocate(void *pointer, size_t size)
    {
        ::operator delete(pointer);
    }
};

template<typename T, class TMemoryAllocator=DefaultMemoryAllocator>
class ObjectPool
{
private:
    // One block of raw memory, linked to the next block.
    struct _Node
    {
        void *_memory;
        size_t _capacity;
        _Node *_nextNode;

        _Node(size_t capacity)
        {
            if (capacity < 1)
                throw std::invalid_argument("capacity must be at least 1.");

            _memory = TMemoryAllocator::Allocate(_itemSize * capacity);
            if (_memory == NULL)
                throw std::bad_alloc();

            _capacity = capacity;
            _nextNode = NULL;
        }
        ~_Node()
        {
            TMemoryAllocator::Deallocate(_memory, _itemSize * _capacity);
        }
    };

    void *_nodeMemory;     // memory of the block currently being filled
    T *_firstDeleted;      // head of the list of freed slots
    size_t _countInNode;   // slots already handed out from the current block
    size_t _nodeCapacity;  // capacity of the current block
    _Node _firstNode;
    _Node *_lastNode;
    size_t _maxBlockLength;

    static const size_t _itemSize;

    // Declared private and never defined: the pool can't be copied.
    ObjectPool(const ObjectPool<T, TMemoryAllocator> &source);
    void operator = (const ObjectPool<T, TMemoryAllocator> &source);

    // Appends a new block, doubling the capacity up to _maxBlockLength.
    void _AllocateNewNode()
    {
        size_t size = _countInNode;
        if (size >= _maxBlockLength)
            size = _maxBlockLength;
        else
        {
            size *= 2;

            if (size < _countInNode)
                throw std::overflow_error("size became too big.");

            if (size >= _maxBlockLength)
                size = _maxBlockLength;
        }

        _Node *newNode = new _Node(size);
        _lastNode->_nextNode = newNode;
        _lastNode = newNode;
        _nodeMemory = newNode->_memory;
        _countInNode = 0;
        _nodeCapacity = size;
    }

public:
    explicit ObjectPool(size_t initialCapacity=32, size_t maxBlockLength=1000000):
        _firstDeleted(NULL),
        _countInNode(0),
        _nodeCapacity(initialCapacity),
        _firstNode(initialCapacity),
        _maxBlockLength(maxBlockLength)
    {
        if (maxBlockLength < 1)
            throw std::invalid_argument("maxBlockLength must be at least 1.");

        _nodeMemory = _firstNode._memory;
        _lastNode = &_firstNode;
    }
    ~ObjectPool()
    {
        // _firstNode is a member and frees itself; the loop frees the others.
        _Node *node = _firstNode._nextNode;
        while(node)
        {
            _Node *nextNode = node->_nextNode;
            delete node;
            node = nextNode;
        }
    }

    T *New()
    {
        if (_firstDeleted)
        {
            // Reuse a freed slot; it stores the address of the next free one.
            T *result = _firstDeleted;
            _firstDeleted = *((T **)_firstDeleted);
            new(result) T();
            return result;
        }

        if (_countInNode >= _nodeCapacity)
            _AllocateNewNode();

        char *address = (char *)_nodeMemory;
        address += _countInNode * _itemSize;
        T *result = new(address) T();
        _countInNode++;
        return result;
    }

    // Same as New(), but the constructor is not called.
    T *GetNextWithoutInitializing()
    {
        if (_firstDeleted)
        {
            T *result = (T *)_firstDeleted;
            _firstDeleted = *((T **)_firstDeleted);
            return result;
        }

        if (_countInNode >= _nodeCapacity)
            _AllocateNewNode();

        char *address = (char *)_nodeMemory;
        address += _countInNode * _itemSize;
        _countInNode++;
        return (T *)address;
    }

    void Delete(T *content)
    {
        content->~T();

        // The freed slot becomes the new head of the free list.
        *((T **)content) = _firstDeleted;
        _firstDeleted = content;
    }

    // Same as Delete(), but the destructor is not called.
    void DeleteWithoutDestroying(T *content)
    {
        *((T **)content) = _firstDeleted;
        _firstDeleted = content;
    }
};

// Size of each slot: sizeof(T) rounded up to a multiple of the pointer size.
template<typename T, class TMemoryAllocator>
const size_t ObjectPool<T, TMemoryAllocator>::_itemSize = ((sizeof(T) + sizeof(void *)-1) / sizeof(void *)) * sizeof(void *);
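The _itemSize expression at the end of the source may look cryptic, so here is the same arithmetic as a standalone sketch. The helper names RoundUpToPointerSize and BlockBytes are mine and are not part of the pool.

```cpp
#include <cassert>
#include <cstddef>

// Rounds `size` up to the next multiple of sizeof(void *), the same
// arithmetic the pool uses to compute _itemSize from sizeof(T).
constexpr std::size_t RoundUpToPointerSize(std::size_t size)
{
    return ((size + sizeof(void *) - 1) / sizeof(void *)) * sizeof(void *);
}

// A 1-byte object still gets a slot big enough to store a pointer.
static_assert(RoundUpToPointerSize(1) == sizeof(void *), "small sizes round up");
// Exact multiples of the pointer size are left unchanged.
static_assert(RoundUpToPointerSize(sizeof(void *)) == sizeof(void *), "multiples stay as they are");
// One byte more than a pointer takes two pointer-sized units.
static_assert(RoundUpToPointerSize(sizeof(void *) + 1) == 2 * sizeof(void *), "next multiple");

// Bytes requested for a block holding `capacity` objects of `objectSize` bytes.
inline std::size_t BlockBytes(std::size_t objectSize, std::size_t capacity)
{
    assert(capacity >= 1);
    return RoundUpToPointerSize(objectSize) * capacity;
}
```

This rounding is what guarantees that every freed slot has room for the "next free" pointer. The price is that, on a 64-bit computer, the default block of 32 one-byte objects asks for 256 bytes instead of 32.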
Thread-safety
The presented code is not thread-safe, and avoiding any synchronization is part of what makes it fast. So, if we put a pool in a static variable or share it with other threads, it is up to us to use some kind of locking.
Personally I think that we should not have a real static pool and, if needed, we should have a pool per thread. This will avoid the performance degradations caused by locking.
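Both options can be sketched as follows, assuming C++11 or later. StandInPool is a hypothetical replacement for ObjectPool<T>, backed by plain new/delete, so that the sketch compiles on its own; LockedPool and ThreadLocalPool are also names I invented for the example.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in with the same New()/Delete() shape as ObjectPool<T>.
template<typename T>
class StandInPool
{
public:
    T *New() { return new T(); }
    void Delete(T *content) { delete content; }
};

// Option 1: a shared pool where every call is serialized by a mutex.
template<typename T, typename TPool = StandInPool<T> >
class LockedPool
{
public:
    T *New()
    {
        std::lock_guard<std::mutex> lock(_mutex);
        return _pool.New();
    }
    void Delete(T *content)
    {
        assert(content != nullptr);
        std::lock_guard<std::mutex> lock(_mutex);
        _pool.Delete(content);
    }
private:
    std::mutex _mutex;
    TPool _pool;
};

// Option 2: one pool per thread. No lock is needed, but an object must be
// returned to the pool of the thread that created it.
template<typename T, typename TPool = StandInPool<T> >
TPool &ThreadLocalPool()
{
    thread_local TPool pool;
    return pool;
}
```

With the real pool we would write something like LockedPool<Test, ObjectPool<Test> >. Keep in mind that a thread-local pool is destroyed when its thread ends, so objects taken from it should not outlive that thread.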
Sample
The sample is an application that keeps creating and deleting 100 objects 1 million times, then it shows how much time it takes to finish the job, using the pool and using normal new and delete calls.
Of course this is not a real situation, but it shows how fast the pool can be compared to normal new/delete calls. It is up to you to use it in better situations.
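For readers who want to run a similar comparison on their own, here is a rough sketch of a timing harness. It is not the code of the sample application: Payload, TimeRounds and TimeNewDelete are hypothetical names, and only the new/delete baseline is included so that the sketch stays self-contained.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical payload; the sample application may use a different class.
struct Payload { int values[4]; };

// Creates and then releases `objectsPerRound` objects, `rounds` times, and
// returns the elapsed wall-clock time in milliseconds.
template<typename TCreate, typename TRelease>
long long TimeRounds(std::size_t rounds, std::size_t objectsPerRound, TCreate create, TRelease release)
{
    assert(objectsPerRound > 0);
    std::vector<Payload *> live(objectsPerRound);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t round = 0; round < rounds; ++round)
    {
        for (std::size_t i = 0; i < objectsPerRound; ++i)
            live[i] = create();
        for (std::size_t i = 0; i < objectsPerRound; ++i)
            release(live[i]);
    }
    const auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}

// Baseline: plain new/delete.
inline long long TimeNewDelete(std::size_t rounds, std::size_t objectsPerRound)
{
    return TimeRounds(rounds, objectsPerRound,
        [] { return new Payload(); },
        [](Payload *p) { delete p; });
}
```

To time the pool with the same harness we would pass lambdas that capture a pool by reference and call pool.New() and pool.Delete(p). The numbers depend a lot on the compiler, the optimization level and the runtime's allocator, so they should be read as an indication only.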
Version History
- 12, April, 2014: Added a memory allocator parameter to the template, the default one uses the new/delete operators instead of malloc/free, declared the copy constructor/copy operator as private to avoid the default implementation, made the constructor explicit and changed the delete of the memory blocks to be in a while instead of being recursive to avoid excessive use of the stack;
- 21, March, 2014: Corrected a bug (double allocation) in the constructor;
- 19, March, 2014: Initial version.