Sorting and Removing Elements from the Structure of Arrays (SOA) in C++

Igor Gribanov

5.00/5 (1 vote)

2 Apr 2024CPOL5 min read

4.3K

C++ iterators and algorithms work well for containers, but can we sort the Structure of Arrays?

C++ comes with a collection of highly optimized algorithms for sorting, partitioning, and filtering arrays of objects. Typically, the data for each object is stored in a contiguous span of memory and can be treated as a single chunk - read, replaced, or moved. In some cases, however, the data is stored in an interleaved manner, i.e., a list of x-coordinates, then y-coordinates, then z-coordinates. The out-of-the-box iterators won't work in such a case. The article shows how to write a custom iterator alongside a memory-efficient wrapper class.

Introduction

Let's suppose that we store the coordinates for 10 points as a soa[30] array of floats. The first ten elements are x-coordinates, the following ten elements are y-coordinates, and the last ten elements are z-coordinates. Such a way of storing the data is often called the Structure of Arrays (SOA). We want to (1) remove elements whose z-coordinates are zero and sort the remaining data by the y-coordinate. Surely, we can rearrange the data so that the x,y,z values are stored in groups, for example:

C++

std::vector<std::array<float,3>> v;

The vector could then be processed by conventional means. Another solution is to make a separate index sequence of the form:

C++

std::vector<int> v(10);           // array of indices
std::iota(v.begin(), v.end(), 0); // writes a sequence 0,1,2,etc.

The original data could be accessed indirectly via the indexing array v. A result of the sorting operation would be a new sequence, which could be used to re-arrange the elements.

The issue in both approaches is the duplication of the original data, which is not a big deal when sorting 10 points. Unfortunately, the SOA format often shows up when working with GPU computing, where such a format is optimal for memory throughput. And when someone uses GPUs to process optimally stored data, they often have more than 10 points (possibly eight GPU devices holding ~80 GB each).

Since we are processing the data host-side (on a CPU), it would suffice to rearrange it while copying from a GPU to a host. Unfortunately, such an operation would be extremely inefficient, so we are back to the initial problem.

Obviously, sorting, partitioning, and other algorithms from the C++ library can be performed on SOA. Instead of moving objects as contiguous chunks, we need to move the individual pieces separately, e.g., x, y, z coordinates one by one. But we don't want to rewrite these functions. Things would be more convenient if one could reuse the existing C++ library.

Background

Custom iterators are used to define custom traversal behavior for user-defined data structures. We can create a custom iterator to traverse our soa[30] array, but how would this help? The array is the simplest data structure - we could just use pointers instead of custom iterators to traverse it.

The issue with invoking the standard algorithms is not traversal but reading and writing the elements. For example, the std::sort() function extensively uses std::iter_swap(iterator a, iterator b) to exchange the data. Iterators do not store the data - they can provide a reference to an object that could be read, copied, or overwritten. But how can we get a reference to a non-contiguous object like SOA? We can't.

A proxy object

A proxy object pretends to be a contiguous chunk of data while it is not. More precisely, sometimes, it holds a copy of the data, and sometimes it accesses the SOA to extract the data. When we only need to read the data, e.g., the z-coordinate of the 5th element, it suffices to access soa[5*pitch+2] to get it. However, functions like std::swap() also want to make copies of objects. When a copy constructor is invoked, the new object takes a full copy of the data as well. Thankfully, only a handful of these are created by functions such as std::sort() and std::remove_if().

Sample code

Proxy class

The code below is a proof of concept, not a "library." It includes printout functions spdlog::info() from the spdlog library to trace the execution step by step. The proxy object Point contains a boolean variable isReference, which keeps track of whether the data should be extracted from *soa at index pos, or the data comes from the local storage data[nArrays]. The number of arrays in SOA is nArrays and the total size of soa is nArrays*pitch, where pitch is the maximum number of elements that can be stored. The value of pitch does not change. The proxy class defines a regular constructor (isReference is set to true), copy constructor (isReference is set to false), a way to access data via x(), y(), z() functions. It overloads operator= so that it writes into *soa when isReference is set.

C++

struct Point
{
    constexpr static unsigned nArrays = 3;  // count of arrays in SOA
    bool isReference = false;
    unsigned pos, pitch;    // element # and capacity of each array in SOA
    float *soa;             // reference to SOA (assume contiguous space of size nArrays*pitch)

    float data[nArrays];    // local copy of the data when isReference==true

    Point()
    {
        isReference = false;
        spdlog::info("Point() constructor");
    }

    Point(const Point &other)
    {
        isReference = false;
        // local copy
        if(other.isReference)
        {
            for(int i=0;i<nArrays;i++) data[i] = other.soa[other.pos + i*other.pitch];
            spdlog::info("Point(other) constructor ({},{}); from ref {}; ({},{})",
                         x(),y(), other.pos, other.x(), other.y());
        }
        else
        {
            for(int i=0;i<nArrays;i++) data[i] = other.data[i];
            //            spdlog::info("vp {} ({},{}) = vp ({},{})", x(), y(), other.x(), other.y());
            spdlog::info("Point(other) constructor ({},{}); from val: ({},{})",
                         x(),y(), other.x(), other.y());
        }

    }

    // x(), y(), z() are for testing/tracing only
    float x() const
    {
        if(isReference) return soa[pos];
        else return data[0];
    }
    float y() const
    {
        if(isReference) return soa[pos+pitch];
        else return data[1];
    }
    float z() const
    {
        if(isReference) return soa[pos+2*pitch];
        else return data[2];
    }

    // user-defined copy assignment (copy-and-swap idiom)
    Point& operator=(Point other)
    {
        if(isReference)
        {
            // distribute into soa
            if(other.isReference)
            {
                for(int i=0;i<nArrays;i++) soa[pos + i*pitch] = other.soa[other.pos + i*other.pitch];
                spdlog::info("rp {} ({},{}) = rp {} ({},{})", pos, x(), y(), other.pos, other.x(), other.y());
            }
            else
            {
                for(int i=0;i<nArrays;i++) soa[pos + i*pitch] = other.data[i];
                spdlog::info("rp {} ({},{}) = vp ({},{})", pos, x(), y(), other.x(), other.y());
            }
        }
        else
        {
            // local copy
            if(other.isReference)
            {
                for(int i=0;i<nArrays;i++) data[i] = other.soa[other.pos + i*other.pitch];
                spdlog::info("vp ({},{}) = rp {} ({},{})", x(), y(), other.pos, other.x(), other.y());
            }
            else
            {
                for(int i=0;i<nArrays;i++) data[i] = other.data[i];
                spdlog::info("vp ({},{}) = vp ({},{})", x(), y(), other.x(), other.y());
            }
        }
        return *this;
    }
};

The container

The role of the container is to store the actual SOA, to keep track of the number of elements size, which can be smaller than its capacity (pitch). The container also provides custom iterators begin() and end().

C++

class MyContainer
{
public:
    std::vector<float> data;
    unsigned pitch, size;

    SOAIterator begin(){return SOAIterator(0, data.data(), pitch);}
    SOAIterator end(){return SOAIterator(size, data.data(), pitch);}

    void print_contents()
    {
        for(int i=0;i<size;i++) std::cout << "(" << data[i] << ',' << data[pitch+i] << ',' << data[2*pitch+i] << ") ";
        std::cout << '\n';
    }
};

The iterator

Custom iterators are notoriously full of boilerplate code. Here, we try to keep the functionality to the minimum. The iterator stores one copy of the Point object, which has isReference set to true. This is the proxy object, which the iterator returns when dereferenced via reference operator*(). The Point already keeps track of how to access SOA, so moving the iterator means modifying the pos index of the Point. The full code of SOAIterator is provided below.

class SOAIterator
{
public:

    using iterator_category = std::random_access_iterator_tag;
    using value_type = Point;
    using difference_type = int;
    using pointer = Point*;
    using reference = Point&;

    SOAIterator(unsigned pos, float *soa_data, unsigned pitch)
    {
        m_point.isReference = true;
        m_point.pos = pos;
        m_point.soa = soa_data;
        m_point.pitch = pitch;
        spdlog::info("SOAIterator() {}", pos);
    }

    SOAIterator(const SOAIterator& other)
    {
        m_point.isReference = true;
        m_point.pos = other.m_point.pos;
        m_point.soa = other.m_point.soa;
        m_point.pitch = other.m_point.pitch;
        spdlog::info("SOAIterator from other {}", other.m_point.pos);
    }

    
    SOAIterator& operator=(const SOAIterator& other)
    {
        m_point.isReference = other.m_point.isReference;
        m_point.pos = other.m_point.pos;
        m_point.soa = other.m_point.soa;
        m_point.pitch = other.m_point.pitch;
        return *this;
    }

    SOAIterator(){} 
    ~SOAIterator(){} 

    bool operator<(const SOAIterator& t) const {return m_point.pos<t.m_point.pos;}
    bool operator==(const SOAIterator& rawIterator)const{return (m_point.pos == rawIterator.m_point.pos);}
    bool operator!=(const SOAIterator& rawIterator)const{return (m_point.pos != rawIterator.m_point.pos);}

    SOAIterator&                  operator++()
    {
        spdlog::info("operator ++; {}", this->m_point.pos);
        ++m_point.pos;
        return (*this);
    }
    SOAIterator&                  operator--()
    {
        spdlog::info("operator --; {}", this->m_point.pos);
        --m_point.pos;
        return (*this);
    }

    SOAIterator                   operator+(const difference_type& movement)
    {
        spdlog::info("operator +; {}", this->m_point.pos);
        SOAIterator result = *this;
        result.m_point.pos += movement;
        return result;
    }

    SOAIterator                   operator-(const difference_type& movement)
    {
        spdlog::info("operator -; {}", this->m_point.pos);
        SOAIterator result = *this;
        result.m_point.pos -= movement;
        return result;
    }

    difference_type operator-(const SOAIterator& rawIterator){return m_point.pos-rawIterator.m_point.pos;}

    reference operator*()
    {
        spdlog::info("dereference operator on iterator {}", m_point.pos);
        return m_point;
    }

    Point m_point;
};

Working example

To test the approach we fill the container of 30 floats with 10 x-values, 10 y-values, and 10 z-values:

C++

[2024-03-31 19:16:43.205] [info] INITIAL DATA
(0,20.1,0) (1,19.1,1) (2,18.1,1) (3,17.1,0) (4,16.1,1) (5,15.1,1) (6,14.1,0) (7,13.1,1) (8,12.1,1) (9,11.1,0)

We invoke std::remove_if() to get rid of the elements with zero z-values. As a result of this operation, the custom iterator now points to the end of the data, so we trim the size of the container mc.

C++

SOAIterator it_result = std::remove_if(mc.begin(), mc.end(), [](Point &p){return p.z()==0;});
mc.size = it_result.m_point.pos;

The printout now shows:

C++

 AFTER REMOVING ELEMENTS with z()==0
[2024-03-31 19:16:43.206] [info] it_result.pos 6
(1,19.1,1) (2,18.1,1) (4,16.1,1) (5,15.1,1) (7,13.1,1) (8,12.1,1)

Next, we sort the points by y-value:

C++

std::sort(mc.begin(),mc.end(),[](Point &p1, Point &p2){return p1.y()<p2.y();});

The result becomes:

C++

 AFTER SORTING
(8,12.1,1) (7,13.1,1) (5,15.1,1) (4,16.1,1) (2,18.1,1) (1,19.1,1)

Code for the example:

C++

#include <iostream>
#include <vector>
#include <algorithm>

#include <spdlog/spdlog.h>

using namespace std;

int main()
{
    MyContainer mc;
    mc.data.resize(30);
    mc.size = mc.pitch = 10;
    for(int i=0;i<mc.pitch;i++)
    {
        mc.data[i] = i;
        mc.data[mc.pitch+i] = 20.1-i;
        mc.data[2*mc.pitch+i] = i%3 == 0 ? 0 : 1;
    }
    spdlog::info("INITIAL DATA");
    mc.print_contents();

    SOAIterator it_result = std::remove_if(mc.begin(), mc.end(), [](Point &p){return p.z()==0;});
    mc.size = it_result.m_point.pos;
    spdlog::info("\n\n AFTER REMOVING ELEMENTS with z()==0");
    spdlog::info("it_result.pos {}", it_result.m_point.pos);
    mc.print_contents();

    spdlog::info("\n\n BEGIN SORTING");
    std::sort(mc.begin(),mc.end(),[](Point &p1, Point &p2){return p1.y()<p2.y();});
    spdlog::info("\n\n AFTER SORTING");
    mc.print_contents();
    return 0;
}

Conclusion

The proposed approach was not thoroughly tested, e.g., it may be slow, but seems to work correctly. The container currently works on floating-point values with 3 arrays but may be rewritten as a template to hold any data. For actual use, the tracing spdlog functions should be removed. My intent is to sort point-cloud data that comes from a GPU simulation. If anyone finds this code useful, please drop a line below. Any comments are appreciated.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)