Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C++20

C++ GroupBy

5.00/5 (9 votes)
17 Mar 2023CPOL1 min read 8.4K  
A small and simple C++ GroupBy with the same syntax and use of DotNet GroupBy
C++ GroupBy

Introduction

I think C++ is the most beautiful programming language but some functionality of other languages is useful. The list' extension of the DotNet GroupBy is one of them. This is my C++ simple implementation.

Requirements

  • C++ 20 or higher

Function

The idea is to write a function that executes an input function on every item of a generic input container range and extracts a key that will be used to split the input container in a key-sublist values.

With a little bit of metaprogramming:

C++
    // Support struct
    template<class KeyFunc, typename It >
    struct ___st_GroupBy___
    {
        typedef typename std::invoke_result<KeyFunc,It>::type Key_Type;
    };

    // Main Function
   template<class KeyFunc, typename It >
   std::unordered_map<typename ___st_GroupBy___<KeyFunc, It>::Key_Type, std::vector<It>> GroupBy(KeyFunc&& Func, It _begin, It _end )
   {
        std::unordered_map<typename ___st_GroupBy___<KeyFunc, It>::Key_Type, std::vector<It>> map;
        for(It Curr = _begin; Curr != _end; Curr++ )
            map[Func(Curr)].push_back(Curr);
        return map;
   }

The function uses an auxiliary internal struct to determine the Key type returned by the input function.

The returned type is a map has as key, the returned key value from input function call, and as value, a vector of iterator of the original input container. This solution is a little bit more complex as syntax to use but avoid copy of item inside sublist output. The disadvantage is that if I modify the original input container, the iterators inside the returned map can be invalid and you can fall in access violation exception.

Another version that is less optimized but safer can be the following:

C++
template<class KeyFunc, typename It >
std::unordered_map<typename ___st_GroupBy___<KeyFunc, It>::Key_Type, std::vector<typename It::value_type>> GroupByCpy(KeyFunc&& Func, It _begin, It _end )
{
    std::unordered_map<typename ___st_GroupBy___<KeyFunc, It>::Key_Type, std::vector<typename It::value_type>> map;
    for(It Curr = _begin; Curr != _end; Curr++ )
        map[Func(Curr)].push_back(*Curr);
    return map;
}

This function creates a copy of the elements of the input container and it requires that the copy operator is been defined in the type of the items of the input container.

Usage

C++
#include <string>
#include <vector>
using std::string;
struct TestData
{
   string Key;
   int Data1;
   int Data2;
};

vector<TestData> InputData{{"pippo",1,2},{"pippo",2,3},{"pippo",3,4},{"pluto",1,0},{"paperino",2,3},{"paperino",3,4}};

auto result = GroupBy( [&](decltype(InputData.begin()) Item)->string
              { return Item->Key; }, InputData.begin(), InputData.end());

for (auto KeyValuesPair : result)
{
    string str_Key = KeyValuesPair.first;
    vector<vector<TestData>::iterator> &vec_Vector = KeyValuesPair.second;

    std::cout << "Key : " << str_Key << std::endl;
    for (const auto &it_Value : vec_Vector)
        std::cout << "\tData1 : " << it_Value->Data1 << " - " << "\tData2 : " << it_Value->Data2 << std::endl;
}

/*
The returned type in this exemple is 
std::unordered_map<std::string,std::vector<vector<TestData>::iterator>>

output: 

Key : paperino
    Data1 : 2 -     Data2 : 3
    Data1 : 3 -     Data2 : 4
Key : pippo
    Data1 : 1 -     Data2 : 2
    Data1 : 2 -     Data2 : 3
    Data1 : 3 -     Data2 : 4
Key : pluto
    Data1 : 1 -     Data2 : 0

*/

std::unordered_map<string, vector<TestData>> result2 = GroupByCpy( [&](decltype(InputData.begin()) Item)->string{ return Item->Key; }, InputData.begin(), InputData.end());

for (auto KeyValuesPair : result2)
{
    string str_Key = KeyValuesPair.first;
    vector<TestData> &vec_Vector = KeyValuesPair.second;

    std::cout << "Key : " << str_Key << std::endl;
    for (const auto &obj_Value : vec_Vector)
        std::cout << "\tData1 : " << obj_Value.Data1 << " - " << "\tData2 : " << obj_Value.Data2 << std::endl;
}
/*
The returned type in this example is
std::unordered_map<std::string,std::vector<TestData>>

output:

Key : paperino
    Data1 : 2 -     Data2 : 3
    Data1 : 3 -     Data2 : 4
Key : pippo
    Data1 : 1 -     Data2 : 2
    Data1 : 2 -     Data2 : 3
    Data1 : 3 -     Data2 : 4
Key : pluto
    Data1 : 1 -     Data2 : 0

*/

Another variation can be using std::map instead of unordered_map in order to have sorting by key.

History

  • 17th March, 2023: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)