Introduction
This article deals with the algorithmic complexity induced by the common implementation of the one-to-many relationship. It proposes a simple and efficient alternative in C++.
Background
The relationship between objects is one of the main topics a developer has to deal with in object oriented programming. These relationships are classified according their cardinality: one-to-one, one-to-many, many-to-many,
For the one-to-many, the various resources (books, tutorial, forum) available to developers offers, in my experience, a terribly inefficient implementation from the point of view of algorithmic complexity.
These implementations mainly use aggregated arrays or lists, some using sets. This example code, applied to the subject-observer pattern, illustrates my point.
class Observer
{
Subject* _subject;
public:
Observer() : _subject(nullptr) { }
~Observer() { detach(); }
void attach(Subject* s)
{
detach();
if(_subject = s)
_subjet->attach(this);
}
void detach()
{
if(_subject)
_subject->detach(this);
_subject = nullptr;
}
};
class Subject
{
friend class Observer;
array<Observer*> _observers;
list<Observer*> _observers;
set<Observer*> _observers;
void _notify() { ... }
void _attach(Observer* o)
{
_observers.insert(o);
}
void _detach(Observer* o)
{
_observers.remove(o);
}
~Subject()
{
while(_observers.size())
_observers[0]->detach();
}
};
On an array container, the complexity of the remove is in O (n). First, the position of the observer must be sought, then the following elements of the table must be moved to fill the vacancy left by removing the observer.
On a list container, the complexity is identical, the problem being similar. The position of the observer must first be sought before its removing. The search for the element leads to complexity in O(n), not the remove itself.
On a set container, the complexity depends on the implementation of the set. On a binary tree (such as the std::set
), the insertion as the remove will be done in O (log2 (n)).
To sum up, all the complexity comes from the need to find the position of the observer in the container that points to it.
Simple but Efficient Alternative
The find complexity can be resolved in a very simple way, by using a double linked list implementation directly inside the Observer
class.
class Observer
{
friend class Subject;
Subject* _subject;
Observer* _prev;
Observer* _next;
public:
Observer() : _subject(nullptr) , _prev(nullptr) , _next(nullptr) { }
~Observer(); void attach(Subject*); void detach(); };
class Subject
{
friend class Observer;
Observer* _first;
void attach(Observer* o)
{
o->_next = _first;
if(_first)
_first->_prev = o;
_first = o;
}
void detach(Observer* o)
{
if(_first == o)
_first = o->_next;
if(o->_prev)
o->_prev->_next = o->_next;
if(o->_next)
o->_next->_prev = o->_prev;
o->_prev = o->_next = nullptr;
}
};
Because the observer knows its position in the list (_prev
and _next
members), there is no need to search for it. Complexity is O(1).
Implementation of attach or detach is straightforward,
It occupies a memory space similar to a list or a set, and two times that of an array.
Its only fault is the presence of naked pointers (not fashionable today), and it requires rewriting the same few lines of code for each new relationship.
Using Inheritance to Resolve the Last Faults
The members and attach/detach methods of the related classes can be moved to a pair of template classes: inList
and inCell
. The "in
" prefix stands for inheritance.
template <typename LIST, typename CELL>
class inCell
{
public:
inList<LIST,CELL>* _list;
inCell<LIST,CELL>* _prev;
inCell<LIST,CELL>* _next;
inCell() : _list(nullptr) , _prev(nullptr) , _next(nullptr) { }
~inCell(); void attach(inList<LIST,CELL>*); void detach(); };
template <typename LIST, typename CELL>
class inList
{
public:
inCell<LIST,CELL>* _first;
void attach(inCell<LIST,CELL>*); void detach(inCell<LIST,CELL>*); };
Then, each implementation of the one-to-many relationship is ensured by an inheritance. For design questions, private inheritance will be privileged with the writing of facade methods.
class Observer : private inCell<Subject,Observer>
{
public:
void attach(Subject* s) { InCell<Subject,Observer>::attach(s); }
void detach() { InCell<Subject,Observer>::detach(); }
};
class Subject : private inList<Subject,Observer>
{
friend class Observer;
};
For the subject-observer, the one-to-many relationship is guaranteed, "sealed" forever in the inList
-inCell
implementation. This is not the case for the "common" implementation, where few lines have to be written for each new relation.
Because C++ allows multiple inheritance, all needed one-to-many relationships between classes can be achieved easily this way. Here is an example with 3 related classes.
class Record : private inCell<Artist,Record>
{
};
class Label : private inList<Label,Artist>
{
friend class Artist;
};
class Artist : private inList<Artist,Record>,
private inCell<Label,Artist>
{
friend class Record;
void sign(Label* l) { inCell<Label,Artist>::attach(l); }
...
};
In addition, this way of doing things is very expressive. The one-to-many relationships are clearly visible, grouped in one place.
Benchmarks
For the test, vector, list and set come from the standard library.
The test consists in attaching N observers to the subject, then in detaching them. The order of detachment is important for measuring algorithmic complexity. To simulate real use cases, the observers located in the middle of the container will be detached, to avoid favoring the first or last positions. In the particular case of a list container, if the removed observers are always at the first position, the execution time will be in O (1) (not visible in this test).
Inherited lists (inList
) are linear in execution. The execution time is normalized to inList
execution time.
Attach
All container types are slower in execution than inList
.
All are linear in execution time, except std::set
in O(log2(n)).
vector
becomes relatively faster as the container grows in size. This is because of the reallocation strategy of the vector
.
list
needs to allocate/free each cell contained in it, which is time consuming. This is not the case for inCell
, because the observer is the cell itself.
| attach |
| vector | list | set | inList |
10 | 20 | 16 | 23 | 1 |
100 | 11 | 36 | 61 | 1 |
1000 | 7 | 40 | 86 | 1 |
10000 | 7 | 42 | 101* | 1 |
(*) the attach of the 10k elements in the set are achieved in 0.0015 seconds on a common laptop today.
Detach
vector
and list
are in O(n). If the set grows by 10
, the execution time, relatively to inList
becomes 10 times longer.
The set is in O(log2(n)). Each time the container doubles in size, the execution time becomes one step slower.
| detach |
| vector | list | set | inList |
10 | 5 | 15 | 29 | 1 |
100 | 42 | 71 | 80 | 1 |
1000 | 575 | 703 | 187 | 1 |
10000 | 5135 | 8681* | 233 | 1 |
(*) the detach of the 10k elements in the list are achieved in 0.05 seconds on a common laptop today.
Discussions
A common implementation (i.e., with aggregated vector/list/set) may suffice for some needs; containers may be small in size, attach/detach happens rarely, computers are fast enough, etc.
However, the inherited list is always the best, far ahead, in runtime, regardless of the size of the set. It is easy to implement and understand, robust in its design, clear in its presentation, and in O(1). In short, it deserves to be known to all.
I would like to pay tribute to my mentor, Michel Ziackovic ("Rendre à César ce qui est à César" in French). He taught me this way of doing things 25 years ago. He was the architect of sophisticated CAD software where thousands of one-to-many relationships, for hundred of classes, are still implemented this way.
History
- 23rd January, 2020: Initial version