Table of contents
Introduction
The idea of this article started from some consideration about the availability in the C++ standard library of containers to create open data structures. It is a matter of fact that STL doesn't help so much: it provides well structured containers, but it blinds the inner structure so that you cannot expand it. Whoever of you found himself in the need to implement self-linking or self-sorting objects, in fact, didn't find any answer in STL.
Other alternative approaches don't consider the idea of a generic container of "values", but consider the idea of making the object aware of the containment. And when this is not suitable, wrap values in container-aware boxes.
So I restarted without STL, to write my own ones, attempting to move away from the STL concept of "value semantics".
Generic programing and "reference semantics"
Behind the concept of generic programming, there is the idea that every functionality defined for a type T
can be specialized in order to be "tuned" to types not conformant to a given default.
This is typically done with global template functions or through traits classes (classes containing only typedefs and static functions) acting as policies in respect to a parameter type specifying a custom behavior.
This is fine with "value semantics", for which it is simple to imagine a function returning a value, but goes to nightmare when acting with reference semantics due to the unmanaged nature of C++.
In practice, it is never clear who owns a referenced object, who creates it, and who has to destroy it during the passage, to parameters and return values.
An answer to this kind of problems can become from "smart pointers" like std::auto_ptr
or boost::shared_ptr
.
But they fail supporting another C++ feature: operator overloading. You define operators as operating with classes (thus, having value semantics), and you loose them when acting through "pointers". You always have to explicitly dereference them and - at the same time - you still are in the problem of the copy on return.
To better explain this aspect, consider a sort of "very complex" number, and define an addition between two of them: there's no other way than copy on return. And if you want to return an auto_ptr<my_verycomplex>
, you end up with an
operator+
that takes two my_verycomplex
s and returns a "pointer".
Or consider the idea to have only operators walking on "pointers".
Other garbage collected languages, like C# or the more recent D, always allocate objects dynamically, so they don't need to distinguish between accessing and dereference (so they don't have a ->
operator) and consider references as "values" in respect to operators.
So here I started with the idea to generalize the dereference operation and to specialize the dereference of a pointer into an auto-dereference. The result is a somewhat strange smart pointer, that, unlike its similar, self.converts to and from... its respective values (not pointer) although - being C++ rigid about the use of the ".
" operator, they have a ->
and *
operators.
Generalizing the dereference
Just like we can generalize comparisons using template functions like
template<class A, class B> bool is_less(const A& a, const B& b)
{ return a < b; }
so that we can specialize is_less
differently depending on A
and B
, dereference can be specialized with a traits like this:
template<class T>
struct dereference
{
typedef T value_t; typedef T access_t;
static access_t* access(value_t* p) { return p; }
};
Here we are essentially saying that - by default - a pointer to a value accesses the value itself.
At this point, we can declare a smart handler (a pointer that self converts into its value) with a class like the following:
template<class T>
class auto_H
{
T* p;
public:
auto_H() :p() {}
auto_H(const auto_H& h) :p(h.p) { const_cast<auto_H&>(h).p = 0; }
auto_H(const T& t) :p(new T(t)) {}
explicit auto_H(T* s) :p(s) {}
~auto_H() { delete p; }
H& operator=(const H& h)
{ if(this != &h) { this->~H(); new(this)H(h); } return *this; }
const T* get() const { return p; }
T* get() { return p; }
typedef T value_t;
typedef typename dereference<T>::access_t access_t;
const access_t* operator->() const
{ if(!p) throw dbg::excp_nullptr(); return dereference<T>::access(p); }
const access_t& operator*() const { return *operator->(); }
access_t* operator->()
{ if(!p) p = new T; return dereference<T>::access(p); }
access_t& operator*() { return *operator->(); }
operator const access_t&() const { return operator*(); }
operator access_t&() { return operator*(); }
};
Essentially, it cumulates the following features:
- Copy and assignment operate an ownership transfer (a la
auto_ptr
). - Access and dereference (
->
and *
) operate through dereference traits, and - unlike regular pointers - retain the const
-ness. - Self convert to and from the respective reference (not pointer), so that if
T
provides its own operators, we can easily write expressions mixing-up both T
and auto_H<T>
. Due to this last fact, T
can return a value using auto_H<T>
itself, thus avoiding the copy on return. - Auto create a value when accessed in non-const mode.
But an important aspect is that, by specializing dereference
, we can come to a curious recursion like the following:
template<class T>
struct dereference<auto_H<T> >
{
typedef auto_H<T> value_t;
typedef typename dereference<T>::access_t access_t;
static access_t* access(value_t* p) { return p->operator->(); }
};
Where we are saying that accessing an auto_H
will result in its-own deep access, thus making all levels of indirection to self-convert into their respective deep values.
Class H<T,cow>
implements a handler similar to auto_H
, but using reference counting instead of transfer ownership and, when cow
is defined as true
, performing copy-on-write by cloning (via copy constructor) a shared value if accessed in a non-const way.
Vector and its anomaly
Unlike most containers, vectors present an anomaly: their elements cannot be allocated independently of each other, and - in the case of growing, insertion, and removal of elements, the entire structure must be relocated to preserve their essential nature of "compact structure of consecutive elements", that is required to be efficiently accessed by integer indexes.
Vectors are, in fact, dynamically allocated arrays by a "manager" that defines how such an array can grow and shrink, passed as a parameter or as a return value, shared, copied, etc. Such a manager is itself a handle, and hence vectors must behave themselves as "handled".
But vectors are also collections that can be enumerated and iterated. It is so necessary to have an interface that allows a class to be consistently used by such operations.
Collections and iterations
The STL defines an abstract concept of "iterator" (that is nothing more than a generalized pointer that supports the ++
and --
operations), and implements it privately in every collection type. Here we will keep this concept open, defining iterators as generic types that rely on functionalities implemented in the collection they iterate.
An iterator, in this framework, expects to operate on a class implementing this interface model:
struct ti_enumerable
{
struct index_type;
struct iterated_type;
iterated_type& at(index_type i);
const iterated_type& at(index_type i) const;
index_type first() const;
index_type last() const;
index_type prev(index_type ) const;
index_type next(index_type ) const;
};
Practically, index_type
will be overridden as "whatever type can be suitable to mark a position in a collection": essentially a generalized "index". Meanwhile, iterated_type
will be the type to what it is expected that an iterator dereference.
You will not need an iterator class to walk a collection, since these functions are accessible. But iterators can simplify the walking providing a syntax much similar to handles than using the collection enumeration functions directly.
The "dereferencing" must be implemented in the collection through the at
functions, while iteration is supported by first
, last
, prev
, and next
. To properly support iteration boundaries, next(last())
and prev(first())
must return a recognizable index value to mark the end of the iteration.
For the vector
base-class buffer
, this becomes:
typedef int index_type;
typedef typename dereference<T>::access_t iterated_type;
int first() const { return p->size? 0: npos; }
int last() const { return p->size? p->size-1: npos; }
int next(int n) { return (n<p->size-1)? n+1: npos; }
int prev(int n) { return (n>0)? n-1: npos; }
const iterated_type& at(int idx) const {
return *dereference<T>::access(&p->get()[idx]); }
iterated_type& at(int idx) { unshare();
return *dereference<T>::access(&p->get()[idx]); }
where T
is the type to what the buffer is parametrized. Note that we are using specializing dereference to access the stored value through index and iterations, so that the vector of handles behaves as holding values.
Buffer, vector, and strings
The buffer
class implements a reference-counted copy-on-write array with growing capabilities for 50% of its capacity without relocation need, using in-place construction and deletion of its elements.
Such operations are driven by a traits class (buffer_traits<T>
) that, for a generic type, iterates among default and copy constructors and destructor.
They can be easily specialized for character types to avoid such overhead, and call the traditional C functions (typically implemented in the direct assembler) memcpy
, memset
, and memmove
.
The class vector
is a better "tuned" buffer (from which it derives), driven by vector_traits
, that overrides buffer_traits
and adds some specific features to support lexicographic sorting. Unlike buffer
, vector
supports comparison operators through an override of the <
==
operator inherited by comparable
and delegating to the traits.
Unlike many other frameworks do, string
s, here, aren't vector
, but just another flavor of buffer
, with different traits (string_traits
) and some more specific member functions.
The main difference from vectors are in the comparisons and in the conversion capabilities.
| vector | string |
---|
comparisons | Compares lexicographically the elements for < and for == , relying on the comparison operation of the embedded elements, | Compares lexicographically using the C "collate" functions, using the current locale and ignoring case. |
---|
conversions | Convert form another vector, element by element, relying on the el elements conversion if supported by the elements themselves. | Conversion supported only between char and whar_t based strings, by means of the wcstombs and vice versa C functions. |
---|
concatenations | No operator provided. The insert function is available for this purpose. | The insert function is delegated from the + and += operators. |
---|
The choice to use "ignore case collate" for string comparisons is because I found that in most of the programs I wrote, this is the most used function to match strings in lists or to match a user input to a data set.
Of course, every other string function can be used: strings are null terminated, and self-convert to const T*
, where T
is char
or wchar_t
(and TCHAR
).
The way the buffer is stored also makes the string size to be stored just before the first character, thus making them compatible to BSTR
, although not allocated via OLE.
Linked, chains, and lists
Self-linking items can be required as elements of complex structures, where algorithms can be implemented distributed across the element's member functions (rather than across iterations).
For this kind of stuff, STL gives practically no support.
Here, two paired classes (linked
and chain
) are used as bases to implement complex chains of complex linked elements. Both the classes take as template parameters the two derived classes for which linked
and chain
are used as bases.
Because template instances are different types, the same derived class can derive from more differently parameterized bases. It is so possible to have elements for more chains and chains for more elements independently.
A linked
element is considered to be owned by the chain
it is inserted into, and it is aware of the chain and of the neighbor elements. It is so possible to implement recursive algorithms as member functions calling themselves applied on next()->
or prev()->
, making such constructs suitable for responsibility chain implementations.
A chain
owns linked
elements and - on clear
- performs the action defined in the static free
function (that calls delete
, but can be overridden). static_cast
is used to refer to the derived elements from their recurring template bases. Chains are also implemented as enumerable, so that stdx::iterator<your_chain>
works coherently, dereferencing to the chained elements.
Lists are then nothing more than chains of list_node<T>
, that are in turn node boxes holding a T
value. The class list<T>
is a chain<list<T>, list_node<T> >
with list_node<T>
that is a linked<list_node<T>, list<T> >
. The class list
overrides the at
functions and the iterated_type
so that iterators dereference into T
across its own eventually specialized dereference<T>::access
and types.
Also, insertion and removal of items are defined in terms of T
rather than items themselves. The class list
becomes so blended against its nodes, even if based on visible structures, and becomes so equivalent to std::list
, but, unlike in STL, it is not the only possible kind of list.
Btrees, nodes, sets, and maps
In a similar way of what has been done with lists, btree
implements an AVL tree of btree_node
-s, or better, both these classes are the bases for implementing a Btree of Btree nodes.
Like list nodes, btree nodes are aware of their owner and logical neighbors (the prev
and next
functions walk the sorted sequence of the nodes), but unlike list nodes whose values are themselves, btree nodes must specify a way for the btree to get a sorting key and a mapped value.
This is achieved by overriding the key_t
and mapped_t
in btree_node
, as well as the get_key
and get_mapped
functions. The default implementation just returns a const-reference to the nodes themselves, that is expected in their override to be comparable (supporting <
and ==
).
Like we did with buffer to get vectors and strings, and with chain to get lists, some "tuned" btrees are derived from this class pair.
The class set
is a btree<set<T>,seet_node<T> >
with set_node
as btree_node<set_node<T>,set<T> >
: it carries the T
value and uses it as its own key and mapped value. It corresponds to the std::set
.
Similarly, map<K,V>
is a btree<map<K,V>, map_node<K,V> >
and map_node
is a btree_node<map_node<K,V>, map<K,V> >
, declaring K
as a key_t
and V
as value_t
. It also introduces the operator[](const K&)
returning const V&
in const mode (throws if no key found), and V&
in non-const (create if not found) mode.
Like all the other classes, access and dereference is implemented through the specializing dereference
traits, thus allowing self-dereference of handlers or of types that support similar behavior (that have a specialized stdx::dereference
).
Memory allocation
An extensive use of linked elements and chains is the fixed-allocation manager in alloc.cpp. It is nothing more than what has been presented in this other article, rewritten using these collection modules.
By overriding the global new
and delete
operators, memory under a given size is not queried to the Operating System, but sub-allocated by a manager that in turn requests to the system always chunks of fixed size. This is useful in applications that do large use of dynamic objects (may be via a handler) where many of them can be small and live for a relatively short time.
Unit tests
Simple unit tests that demonstrate the functionality of these classes are enclosed in the T0A project, that access the classes in the form of a static library.
Such unit tests compile in debug form to be easily tracked by a step-by-step execution, and in release form to act as a performance test for containers. This may, to build, require at least 1GB of memory since millions of objects are allocated and deallocated to check the container capabilities.
Since this is a non-STL project, printf
functions have been used to print to the screen.
In conclusion
Two simple ideas (using smart "handles" instead of "pointers", and using accessible building blocks to build containers) makes C++ operate more similar to other garbage collecting languages, and can even make STL not required.
The unit tests code is a good sample about how this has been achieved.