C++ comes with a collection of highly optimized algorithms for sorting, partitioning, and filtering arrays of objects. Typically, the data for each object is stored in a contiguous span of memory and can be treated as a single chunk - read, replaced, or moved. In some cases, however, the data is stored in an interleaved manner, i.e., a list of x-coordinates, then y-coordinates, then z-coordinates. The out-of-the-box iterators won't work in such a case. The article shows how to write a custom iterator alongside a memory-efficient wrapper class.
Introduction
Let's suppose that we store the coordinates for 10 points as a soa[30]
array of floats. The first ten elements are x-coordinates, the following ten elements are y-coordinates, and the last ten elements are z-coordinates. Such a way of storing the data is often called the Structure of Arrays (SOA). We want to (1) remove elements whose z-coordinates are zero and sort the remaining data by the y-coordinate. Surely, we can rearrange the data so that the x,y,z values are stored in groups, for example:
std::vector<std::array<float,3>> v;
The vector could then be processed by conventional means. Another solution is to make a separate index sequence of the form:
std::vector<int> v(10); std::iota(v.begin(), v.end(), 0);
The original data could be accessed indirectly via the indexing array v
. A result of the sorting operation would be a new sequence, which could be used to re-arrange the elements.
The issue in both approaches is the duplication of the original data, which is not a big deal when sorting 10 points. Unfortunately, the SOA format often shows up when working with GPU computing, where such a format is optimal for memory throughput. And when someone uses GPUs to process optimally stored data, they often have more than 10 points (possibly eight GPU devices holding ~80 GB each).
Since we are processing the data host-side (on a CPU), it would suffice to rearrange it while copying from a GPU to a host. Unfortunately, such an operation would be extremely inefficient, so we are back to the initial problem.
Obviously, sorting, partitioning, and other algorithms from the C++ library can be performed on SOA. Instead of moving objects as contiguous chunks, we need to move the individual pieces separately, e.g., x, y, z coordinates one by one. But we don't want to rewrite these functions. Things would be more convenient if one could reuse the existing C++ library.
Background
Custom iterators are used to define custom traversal behavior for user-defined data structures. We can create a custom iterator to traverse our soa[30]
array, but how would this help? The array is the simplest data structure - we could just use pointers instead of custom iterators to traverse it.
The issue with invoking the standard algorithms is not traversal but reading and writing the elements. For example, the std::sort()
function extensively uses std::iter_swap(iterator a, iterator b)
to exchange the data. Iterators do not store the data - they can provide a reference to an object that could be read, copied, or overwritten. But how can we get a reference to a non-contiguous object like SOA? We can't.
A proxy object
A proxy object pretends to be a contiguous chunk of data while it is not. More precisely, sometimes, it holds a copy of the data, and sometimes it accesses the SOA to extract the data. When we only need to read the data, e.g., the z-coordinate of the 5th element, it suffices to access soa[5*pitch+2]
to get it. However, functions like std::swap()
also want to make copies of objects. When a copy constructor is invoked, the new object takes a full copy of the data as well. Thankfully, only a handful of these are created by functions such as std::sort()
and std::remove_if()
.
Sample code
Proxy class
The code below is a proof of concept, not a "library." It includes printout functions spdlog::info()
from the spdlog
library to trace the execution step by step. The proxy object Point
contains a boolean variable isReference
, which keeps track of whether the data should be extracted from *soa
at index pos
, or the data comes from the local storage data[nArrays]
. The number of arrays in SOA is nArrays
and the total size of soa
is nArrays*pitch
, where pitch
is the maximum number of elements that can be stored. The value of pitch
does not change. The proxy class defines a regular constructor (isReference
is set to true
), copy constructor (isReference
is set to false
), a way to access data via x()
, y()
, z()
functions. It overloads operator=
so that it writes into *soa
when isReference
is set.
struct Point
{
constexpr static unsigned nArrays = 3; bool isReference = false;
unsigned pos, pitch; float *soa;
float data[nArrays];
Point()
{
isReference = false;
spdlog::info("Point() constructor");
}
Point(const Point &other)
{
isReference = false;
if(other.isReference)
{
for(int i=0;i<nArrays;i++) data[i] = other.soa[other.pos + i*other.pitch];
spdlog::info("Point(other) constructor ({},{}); from ref {}; ({},{})",
x(),y(), other.pos, other.x(), other.y());
}
else
{
for(int i=0;i<nArrays;i++) data[i] = other.data[i];
spdlog::info("Point(other) constructor ({},{}); from val: ({},{})",
x(),y(), other.x(), other.y());
}
}
float x() const
{
if(isReference) return soa[pos];
else return data[0];
}
float y() const
{
if(isReference) return soa[pos+pitch];
else return data[1];
}
float z() const
{
if(isReference) return soa[pos+2*pitch];
else return data[2];
}
Point& operator=(Point other)
{
if(isReference)
{
if(other.isReference)
{
for(int i=0;i<nArrays;i++) soa[pos + i*pitch] = other.soa[other.pos + i*other.pitch];
spdlog::info("rp {} ({},{}) = rp {} ({},{})", pos, x(), y(), other.pos, other.x(), other.y());
}
else
{
for(int i=0;i<nArrays;i++) soa[pos + i*pitch] = other.data[i];
spdlog::info("rp {} ({},{}) = vp ({},{})", pos, x(), y(), other.x(), other.y());
}
}
else
{
if(other.isReference)
{
for(int i=0;i<nArrays;i++) data[i] = other.soa[other.pos + i*other.pitch];
spdlog::info("vp ({},{}) = rp {} ({},{})", x(), y(), other.pos, other.x(), other.y());
}
else
{
for(int i=0;i<nArrays;i++) data[i] = other.data[i];
spdlog::info("vp ({},{}) = vp ({},{})", x(), y(), other.x(), other.y());
}
}
return *this;
}
};
The container
The role of the container is to store the actual SOA, to keep track of the number of elements size
, which can be smaller than its capacity (pitch
). The container also provides custom iterators begin()
and end()
.
class MyContainer
{
public:
std::vector<float> data;
unsigned pitch, size;
SOAIterator begin(){return SOAIterator(0, data.data(), pitch);}
SOAIterator end(){return SOAIterator(size, data.data(), pitch);}
void print_contents()
{
for(int i=0;i<size;i++) std::cout << "(" << data[i] << ',' << data[pitch+i] << ',' << data[2*pitch+i] << ") ";
std::cout << '\n';
}
};
The iterator
Custom iterators are notoriously full of boilerplate code. Here, we try to keep the functionality to the minimum. The iterator stores one copy of the Point
object, which has isReference
set to true
. This is the proxy object, which the iterator returns when dereferenced via reference operator*()
. The Point
already keeps track of how to access SOA, so moving the iterator means modifying the pos
index of the Point
. The full code of SOAIterator
is provided below.
class SOAIterator
{
public:
using iterator_category = std::random_access_iterator_tag;
using value_type = Point;
using difference_type = int;
using pointer = Point*;
using reference = Point&;
SOAIterator(unsigned pos, float *soa_data, unsigned pitch)
{
m_point.isReference = true;
m_point.pos = pos;
m_point.soa = soa_data;
m_point.pitch = pitch;
spdlog::info("SOAIterator() {}", pos);
}
SOAIterator(const SOAIterator& other)
{
m_point.isReference = true;
m_point.pos = other.m_point.pos;
m_point.soa = other.m_point.soa;
m_point.pitch = other.m_point.pitch;
spdlog::info("SOAIterator from other {}", other.m_point.pos);
}
SOAIterator& operator=(const SOAIterator& other)
{
m_point.isReference = other.m_point.isReference;
m_point.pos = other.m_point.pos;
m_point.soa = other.m_point.soa;
m_point.pitch = other.m_point.pitch;
return *this;
}
SOAIterator(){}
~SOAIterator(){}
bool operator<(const SOAIterator& t) const {return m_point.pos<t.m_point.pos;}
bool operator==(const SOAIterator& rawIterator)const{return (m_point.pos == rawIterator.m_point.pos);}
bool operator!=(const SOAIterator& rawIterator)const{return (m_point.pos != rawIterator.m_point.pos);}
SOAIterator& operator++()
{
spdlog::info("operator ++; {}", this->m_point.pos);
++m_point.pos;
return (*this);
}
SOAIterator& operator--()
{
spdlog::info("operator --; {}", this->m_point.pos);
--m_point.pos;
return (*this);
}
SOAIterator operator+(const difference_type& movement)
{
spdlog::info("operator +; {}", this->m_point.pos);
SOAIterator result = *this;
result.m_point.pos += movement;
return result;
}
SOAIterator operator-(const difference_type& movement)
{
spdlog::info("operator -; {}", this->m_point.pos);
SOAIterator result = *this;
result.m_point.pos -= movement;
return result;
}
difference_type operator-(const SOAIterator& rawIterator){return m_point.pos-rawIterator.m_point.pos;}
reference operator*()
{
spdlog::info("dereference operator on iterator {}", m_point.pos);
return m_point;
}
Point m_point;
};
Working example
To test the approach we fill the container of 30 floats with 10 x-values, 10 y-values, and 10 z-values:
[2024-03-31 19:16:43.205] [info] INITIAL DATA
(0,20.1,0) (1,19.1,1) (2,18.1,1) (3,17.1,0) (4,16.1,1) (5,15.1,1) (6,14.1,0) (7,13.1,1) (8,12.1,1) (9,11.1,0)
We invoke std::remove_if()
to get rid of the elements with zero z-values. As a result of this operation, the custom iterator now points to the end of the data, so we trim the size of the container mc
.
SOAIterator it_result = std::remove_if(mc.begin(), mc.end(), [](Point &p){return p.z()==0;});
mc.size = it_result.m_point.pos;
The printout now shows:
AFTER REMOVING ELEMENTS with z()==0
[2024-03-31 19:16:43.206] [info] it_result.pos 6
(1,19.1,1) (2,18.1,1) (4,16.1,1) (5,15.1,1) (7,13.1,1) (8,12.1,1)
Next, we sort the points by y-value:
std::sort(mc.begin(),mc.end(),[](Point &p1, Point &p2){return p1.y()<p2.y();});
The result becomes:
AFTER SORTING
(8,12.1,1) (7,13.1,1) (5,15.1,1) (4,16.1,1) (2,18.1,1) (1,19.1,1)
Code for the example:
#include <iostream>
#include <vector>
#include <algorithm>
#include <spdlog/spdlog.h>
using namespace std;
int main()
{
MyContainer mc;
mc.data.resize(30);
mc.size = mc.pitch = 10;
for(int i=0;i<mc.pitch;i++)
{
mc.data[i] = i;
mc.data[mc.pitch+i] = 20.1-i;
mc.data[2*mc.pitch+i] = i%3 == 0 ? 0 : 1;
}
spdlog::info("INITIAL DATA");
mc.print_contents();
SOAIterator it_result = std::remove_if(mc.begin(), mc.end(), [](Point &p){return p.z()==0;});
mc.size = it_result.m_point.pos;
spdlog::info("\n\n AFTER REMOVING ELEMENTS with z()==0");
spdlog::info("it_result.pos {}", it_result.m_point.pos);
mc.print_contents();
spdlog::info("\n\n BEGIN SORTING");
std::sort(mc.begin(),mc.end(),[](Point &p1, Point &p2){return p1.y()<p2.y();});
spdlog::info("\n\n AFTER SORTING");
mc.print_contents();
return 0;
}
Conclusion
The proposed approach was not thoroughly tested, e.g., it may be slow, but seems to work correctly. The container currently works on floating-point values with 3 arrays but may be rewritten as a template to hold any data. For actual use, the tracing spdlog
functions should be removed. My intent is to sort point-cloud data that comes from a GPU simulation. If anyone finds this code useful, please drop a line below. Any comments are appreciated.