Here you will also learn about techniques to avoid wasting space on unique_ptr deleter and on other structures allocators like std vector or map, taking into account Empty Base Optimization (EBO) and no_unique_address attribute sice C++20.
It’s been a long dark (mode) night. The last thing I can remember is the pointers I used. There was a tiny space that left for me to use, 8 bytes only. The exact size of a pointer. I wanted to protect it, and used std::unique_ptr
. Everything seemed working, and then it hit me. std::unique_ptr has to store an additional space for its deleter. And that’s where our adventure begins.
Chapter 1: The Phenomenon
Before investigating a phenomenon, we have to dig deeper and understand the exact case which is in front of us.
So this phenomenon happens only when using an empty captured lambda or an empty functor by value as a deleter. In any other case, the std::unique_ptr
size get changed. Whether we pass the functor by reference, or using std::function
/ function pointer as deleters.
So it actually seems like some kind of optimization, but which one? You have no such available optimization for a case when you store an instance as a class member, at least not before C++20.
The first thing that came to my head is “Do compilers use a hidden optimization technique that no one knows about?”
Chapter 2: Known Techniques
Anytime I notice an unknown or unexplained phenomenon, the first thing I do is to look for a similar behavior. And fortunately, there is one, pre C++20 (and one more that is added in 20) that might explain this phenomenon.
Empty Base Optimization
This optimization is applied when we have a derived class, whose base class doesn’t contain any non-static
data members. Whenever we have such a case, the compiler can optimize the space usage, and avoid allocating a byte for the base class (although the rule forces any instance to be at least 1 byte size). Examples (taken from cppreference):
struct Base {};
struct Derived1 : Base {
int i;
};
int main()
{
static_assert(sizeof(Base) >= 1);
static_assert(sizeof(Derived1) == sizeof(int));
}
This optimization has some limitations and sometimes, its behavior is compiler specific. For example, this optimization won’t apply if the first non-static
data member’s type is the same of the base class or derived from it:
struct Base {};
struct Derived1 : Base {
int i;
};
struct Derived2 : Base {
Base c; int i;
};
struct Derived3 : Base {
Derived1 c; int i;
};
int main()
{
static_assert(sizeof(Derived2) == 2*sizeof(int));
static_assert(sizeof(Derived3) == 3*sizeof(int));
}
For more information about this optimization, see cppreference – Empty Base Optimization.
[[no_unique_address]] // Since C++20
This attribute takes the empty base optimization some steps ahead, and allowing us to perform the same optimization (and a little bit more) on contained data members, without the need to inherit from them. Moreover, any tail padding of the marked data member, may be used for other data members. Examples from cppreference
:
struct Empty {};
struct X {
int i;
Empty e;
};
struct Y {
int i;
[[no_unique_address]] Empty e;
};
struct Z {
char c;
[[no_unique_address]] Empty e1, e2;
};
struct W {
char c[2];
[[no_unique_address]] Empty e1, e2;
};
int main()
{
static_assert(sizeof(Empty) >= 1);
static_assert(sizeof(X) > sizeof(int));
std::cout << "sizeof(Y) == sizeof(int) is " << std::boolalpha
<< (sizeof(Y) == sizeof(int)) << '\n';
static_assert(sizeof(Z) >= 2);
std::cout << "sizeof(W) == 2 is " << (sizeof(W) == 2) << '\n';
}
Note about the test (sizeof(W) == 2)
: Currently gcc, clang and msvc return false
for this test. It’s unclear if it’s an incomplete feature, or a mistake in cppreference.
One more note (thanks to obsidian_golem): In MSVC compiler, this attribute doesn’t exist, instead we should use the attribute [[msvc::no_unique_address
]].
Chapter 3: The Bridge between Possible and Impossible
This optimization is easy to implement using [[no_unique_address
]]. But compilers can use this feature only when compiled with C++20 standard, and because this optimization is applied also in old standard versions, we didn’t finish our research yet.
Now, we left with the first option of empty base optimization. And it’s time to get inside the compilers to see exactly how it’s done.
LLVM Technique
LLVM uses something they called (and probably taken from boost) compressed_pair
. Using a wrapper for each element in the pair, they can decide if they inherit the type or contain it inside. The wrapper is written in the following way:
template <class _Tp, int _Idx,
bool _CanBeEmptyBase = is_empty<_Tp>::value && !__libcpp_is_final<_Tp>::value>
struct __compressed_pair_elem {
public:
const _Tp& get() { return __value_; }
private:
_Tp __value_;
};
template <class _Tp, int _Idx>
struct __compressed_pair_elem<_Tp, _Idx, true> : private _Tp {
public:
const _Tp& get() { return *this; }
};
Let’s split it to multiple parts:
template <class _Tp, int _Idx,
bool _CanBeEmptyBase = is_empty<_Tp>::value && !__libcpp_is_final<_Tp>::value>
Here, we can see the identifier for a class type. _Tp
is the type we want to store (whether it’s the deleter or the managed object). _Idx
is used as a class key (we’ll see where it comes to use in few moments. And one more interesting non-type template parameter _CanBeEmptyBase
which has a significant hint in the name (something like Empty Base Optimization?). The default value it gets is:
is_empty<_Tp>::value && !__libcpp_is_final<_Tp>::value>
It might be a little confusing, but don’t forget that we are inside std
namespace. So we are actually using here std::is_empty
and std::__libcpp_is_final
functions.
std::is_empty
has value true
if it satisfies some conditions. The main condition that is important for our understanding is that the type (and its base classes hierarchy) doesn’t have non-static data members inside. For more conditions and explanations, you can refer to cppreference.
std::__libcpp_is_final
is another hint we get here, because a final
class cannot be inherited to another class.
So the combination to get a true
value to _CanBeEmptyBase
parameter is if the type is empty and not final
. Which means if we can inherit it and use the empty base class optimization on it.
In the default case for this class, we will use the first implementation (which is holding the type as a private
data member: _Tp __value_;
. And the specialization for true
value in _CanBeEmptyBase
is derived from this type:
template <class _Tp, int _Idx>
struct __compressed_pair_elem<_Tp, _Idx, true> : private _Tp
And now that we have compressed_pair_element
, we can construct a compressed_pair
out of it:
template <class _T1, class _T2>
class __compressed_pair : private __compressed_pair_elem<_T1, 0>,
private __compressed_pair_elem<_T2, 1> {
typedef _LIBCPP_NODEBUG_TYPE __compressed_pair_elem<_T1, 0> _Base1;
typedef _LIBCPP_NODEBUG_TYPE __compressed_pair_elem<_T2, 1> _Base2;
_LIBCPP_INLINE_VISIBILITY
typename _Base1::reference first() _NOEXCEPT {
return static_cast<_Base1&>(*this).__get();
}
_LIBCPP_INLINE_VISIBILITY
typename _Base1::const_reference first() const _NOEXCEPT {
return static_cast<_Base1 const&>(*this).__get();
}
_LIBCPP_INLINE_VISIBILITY
typename _Base2::reference second() _NOEXCEPT {
return static_cast<_Base2&>(*this).__get();
}
_LIBCPP_INLINE_VISIBILITY
typename _Base2::const_reference second() const _NOEXCEPT {
return static_cast<_Base2 const&>(*this).__get();
}
};
I admit, it might seem a little intimidating at first look (and it's much more intimidating at the source where they declared also the constructors and a swap function). But let’s clear that implementation a little to see what it is really about.
template <class _T1, class _T2>
class __compressed_pair : private __compressed_pair_elem<_T1, 0>,
private __compressed_pair_elem<_T2, 1> {
typedef __compressed_pair_elem<_T1, 0> _Base1;
typedef __compressed_pair_elem<_T2, 1> _Base2;
inline typename _Base1::reference first() noexcept() {
return static_cast<_Base1&>(*this).__get();
}
inline typename _Base1::const_reference first() const noexcept() {
return static_cast<_Base1 const&>(*this).__get();
}
inline typename _Base2::reference second() noexcept() {
return static_cast<_Base2&>(*this).__get();
}
inline typename _Base2::const_reference second() const noexcept() {
return static_cast<_Base2 const&>(*this).__get();
}
};
So now, we’ve got here a class that inherits from compressed_pair_elem
twice. And here is the explanation for the _Idx
: This is necessary to enable inherit _compressed_pair_elem
that holds the same inner type multiple times. Because we can't inherit from the same class twice, we have to give the class an additional unique key, and LLVM chose an integer (which can also be combined with variadic template and __make_tuple_indices
or different similar approaches). Note that in this specific implementation of compressed_pair
, there is an additional static_assert
that forbids the two types to be the same:
static_assert((!is_same<_T1, _T2>::value),
"__compressed_pair cannot be instantiated when T1 and T2 are the same type; "
"The current implementation is NOT ABI-compatible with the previous "
"implementation for this configuration");
Now, all we have to do is to use this magnificent creature inside the unique_ptr
class, and to potentially save some bytes:
template <class _Tp, class _Dp = default_delete<_Tp> >
class unique_ptr {
public:
typedef _Tp element_type;
typedef _Dp deleter_type;
typedef typename __pointer_type<_Tp, deleter_type>::type pointer;
private:
__compressed_pair<pointer, deleter_type> __ptr_;
};
GCC Technique
In GCC, the technique is a little bit different, but based on the same type of optimization. Inside the unique_ptr
class, they used the following data member declaration:
tuple<_Tp, _Dp> _M_t;
A tuple in some compilers uses empty base class optimization, and it’s time to look over tuples in GCC.
template<typename _Tp>
using __empty_not_final
= __conditional_t<__is_final(_Tp), false_type,
__is_empty_non_tuple<_Tp>>;
template<size_t _Idx, typename _Head,
bool = __empty_not_final<_Head>::value>
struct _Head_base;
#if __has_cpp_attribute(__no_unique_address__)
template<size_t _Idx, typename _Head>
struct _Head_base<_Idx, _Head, true>
{
static constexpr _Head&
_M_head(_Head_base& __b) noexcept { return __b._M_head_impl; }
static constexpr const _Head&
_M_head(const _Head_base& __b) noexcept { return __b._M_head_impl; }
[[__no_unique_address__]] _Head _M_head_impl;
};
#else
template<size_t _Idx, typename _Head>
struct _Head_base<_Idx, _Head, true>
: public _Head
{
static constexpr _Head&
_M_head(_Head_base& __b) noexcept { return __b; }
static constexpr const _Head&
_M_head(const _Head_base& __b) noexcept { return __b; }
};
#endif
template<size_t _Idx, typename _Head>
struct _Head_base<_Idx, _Head, false>
{
static constexpr _Head&
_M_head(_Head_base& __b) noexcept { return __b._M_head_impl; }
static constexpr const _Head&
_M_head(const _Head_base& __b) noexcept { return __b._M_head_impl; }
_Head _M_head_impl;
};
And we got a twist – GCC checked if they are allowed to use no_unique_address
attribute, and in case of true
, they implemented a really similar class to the class specialization that doesn’t use the empty base optimization, only with this attribute.
Now we have the optimization in _Head_base
(similarly to _compressed_pair_elem
in LLVM implementation), and we can inherit it in _Tuple_impl
:
template<size_t _Idx, typename... _Elements>
struct _Tuple_impl;
template<size_t _Idx, typename _Head, typename... _Tail>
struct _Tuple_impl<_Idx, _Head, _Tail...>
: public _Tuple_impl<_Idx + 1, _Tail...>,
private _Head_base<_Idx, _Head>
{ };
Here, we can see the unpacking of the tail, in a classic recursion way, and each unpacked _Head
is being wrapped with _Head_base
class, which is being derived to the _Tuple_impl
.
Chapter 4: There’ll Always be More Rivers to Cross and More Bridges to Build
std::unique_ptr
is just one of many std containers which accept allocator or deleter. And each one of them may use this ability in order to save some bytes. We won’t cover the way it’s done in other containers here, however the idea is the same.
Hope you enjoyed reading. If you have any thoughts about it, or feel that this knowledge assists you in your development, feel free to share it in the comments section.