Introduction
64 bit is wonderful - we are no longer limited to just 2GB (per Win32 process) - you can now address a full 8TB of memory (per Win64 process)... which is great if your application really needs 8TB of memory.
If you don't need the full address space, 64-bit can be wasteful: pointers take twice the memory, which may actually result in a slower application, especially when using a pointer-laden data structure such as a binary tree (larger structures take more cache space).
A simple scheme is presented that allows using 32-bit "pointers" to access up to 64GB of memory on a 64-bit machine.
Encoding 64-bit pointers in 32-bits
The main idea of sptr
is data alignment - modern processors feel much more comfortable when accessing "aligned" addresses: accessing a char
at address 0xC0001
means more work than accessing an int
at 0xC0000
.
malloc()
(which does the heavy lifting for new
) only returns aligned addresses (see http://msdn.microsoft.com/en-us/library/ycsb6wwf.aspx) - a pointer returned by malloc()
will always be aligned to a 16-byte boundary on Visual Studio - the 4 lowest bits will always be 0.
Therefore, at least for pointers we allocate on the heap, we can just "delete" the 4 lowest bits (by shifting right) without losing information.
If all memory addresses were allocated from the "bottom" (the highest bits are also 0), we could just encode a pointer as follows:
#include <stddef.h> // for uintptr_t
typedef unsigned __int32 uint32_t;
uint32_t encode(void* p)
{
uintptr_t i = reinterpret_cast<uintptr_t>(p) >> 4;
return static_cast<uint32_t>(i);
}
(uintptr_t
is an integral data type that is guaranteed to be the same size as a pointer.)
This will encode any (16-byte aligned) address in the range 0x00 0000 0000
- 0x10 0000 0000
into a 32-bit integer.
However, the Operating System is free to map physical memory into other virtual memory segments - for example, mapping kernel memory "at the top", so it is preferable to keep such a map ourselves:
#include <boost/thread/mutex.hpp>
#include <boost/thread/locks.hpp>
class sptr_base
{
protected:
static const uint32_t ALIGNMENT_BITS = 4;
static const uint32_t ALIGNMENT = (1<<ALIGNMENT_BITS);
static const uintptr_t ALIGNMENT_MASK = ALIGNMENT - 1;
protected:
static uintptr_t _seg_map[ALIGNMENT];
static uintptr_t _segs;
static boost::mutex _m;
inline static uintptr_t ptr2seg(uintptr_t p)
{
p &= ~0xFFFFFFFFULL; uintptr_t s = _segs;
uintptr_t i = 0;
for (; i < s; ++i)
if (_seg_map[i] == p)
return i;
boost::lock_guard<boost::mutex> lock(_m);
for (i = 0; i < s; ++i)
if (_seg_map[i] == p)
return i;
i = _segs++;
if (_segs > ALIGNMENT)
throw bad_segment("Segment out of range");
_seg_map[i] = p;
return i;
}
};
ptr2seg()
maps the high bits of a pointer to one of a few segments. The code refrains from using a mutex
when reading the map, based on the atomicity of integer assignment.
The actual pointer encode/decode is then as follows:
#include <boost/pool/detail/singleton.hpp>
typedef boost::details::pool::singleton_default<segment_mapper> the_segment_mapper;
uint32_t encode(const_native_pointer_type ptr)
{
uintptr_t p = reinterpret_cast<uintptr_t>(ptr);
if ((p & ALIGNMENT_MASK) != 0)
throw bad_alignment("Pointer is not aligned");
return (uint32_t)(ptr2seg(p) + p);
}
inline native_pointer_type decode(uint32_t e)
{
uintptr_t el = e;
uintptr_t ptr = (_seg_map[el & ALIGNMENT_MASK] + el) & ~ALIGNMENT_MASK;
return reinterpret_cast<native_pointer_type>(ptr);
}
sptr
sptr
gives a pointer-like interface to the "compressed" pointer. It allows accessing our "pointer" similar to a real pointer:
sptr<int> p = new int;
*p = 5;
delete p;
int* q = new int;
p = q;
p++;
q = p;
Limitations
sptr
has several limitations (which may make it inappropriate in many places).
First, sptr
can't access the whole 64-bit address space (duh!). It is appropriate only for applications requiring just a bit more than the 32-bit address space (server machines are usually limited to 16GB-32GB of memory; Desktops usually can't deal with more than 8GB).
Second, although an address allocated on the heap will be aligned properly, pointer arithmetic may complicate things:
sptr<char> buf = new char[100];
will work just fine as new
returns an aligned address.
However, something like:
for (sptr<char> p = buf; p != buf + 100; ++p)
*p = '0';
will not work as p
will now point to an unaligned address (the sptr
implementation will assert at compile time).
However, in most cases, a standard pointer can be used:
for (char* p = buf; p != buf + 100; ++p)
*p = '0';
Third, mix sptr
and multiple-inheritence with caution.
For example, given:
struct A { char i; };
struct B { int j; };
struct C : public A, public B { };
This may or may not work (depending on structure alignment):
sptr<C> c = new C;
sptr<B> b = c;
This is because b
does not point to the same address as c
(this is not the place for a discussion on multiple-inheritance, but you can try the following code to convince yourself):
C* c = new C;
B* b = c;
std::cout << ((uintptr_t)b - (uintptr_t)c) << std::endl;
There is no problem accessing the object via a standard pointer:
sptr<C> c = new C;
B* b = c;
Performance
A small benchmark is provided with the code, implementing a simple linked-list which is repeatedly traversed.
The list's node is:
struct node
{
int val;
node_ptr prev;
node_ptr next;
};
The size of the node is therefore 20 bytes with standard pointers vs. 12 bytes with sptr
.
However, this comes at a tradeoff - the extra bit shuffling impacts performance - the standard pointer version will perform 1.5x faster on the linked-list benchmark.
Accessing the map accounts for much of this performance hit. If it could be guaranteed that all addresses are allocated from the region 0x00 0000 0000 - 0x10 0000 0000
, we could discard the map, significantly improving performance.