cpplinq - LINQ Query Operators for C++ Sequences

Marius Bancila

4.96/5 (37 votes)

3 Nov 2012CPOL14 min read

64.7K

An introduction to cpplinq, a open-source template library that provides LINQ-like operators for querying collections (arrays and STL containers) in C++11.

Introduction

cpplinq is an open-source native template library that provides a set of LINQ-like operators for querying collections (arrays and STL containers) in C++11. The API consists of a set of template classes and functions and is compatible with all major compilers (VC++, mingw, g++, clang++). The library is available on Codeplex.

Here are some quick samples, before we start with some considerations about LINQ.

Sum the even numbers from an array of integers:

C++

int numbers[] = {1,2,3,4,5,6,7,8,9};
auto s = from_array(numbers)
      >> where([](int i) {return i%2==0;})
      >> sum(); // yields 20

Order descending all the distinct numbers from an array of integers, transform them into strings and print the result.

C++

int numbers[] = {3,1,4,1,5,9,2,6};
auto result = from_array(numbers)
           >> distinct()
           >> orderby_descending([](int i) {return i;})
           >> select([](int i){std::stringstream s; s<<i; return s.str();})
           >> to_vector();
for(auto i : result)
    std::cout << i << std::endl;

A Few Words About LINQ

Developers familiar with .NET LINQ and its query operators should skip this paragraph.

LINQ (Language Integrated Query) is a component of the .NET Framework that adds native query capabilities to .NET. It is basically a gateway between a language environment and a data store. LINQ defines a set of standard query operators for traversal, filtering and projection of data. In addition to the standard query operators, the C# and VB.NET languages also provide language extensions to map on these query operators and enable developers to write code in a more declarative, SQL-like manner.

The advantages of LINQ include:

Write code in a declarative style
Use familiar SQL-like syntax to query not just SQL databases, but virtually any source of data
Correctness of the query is determined at compile-time, not run time
Productivity and easier maintainability as a result of simplified and easier to write (and read) code

If you want to read more about LINQ and the .NET standard query operators, see these articles:

LINQ in C++

LINQ was made possible after .NET added support for lambdas. C++11 however, followed the path and added support for lambdas and other features that would enable us to write something similar in C++. Though not all C++11 features are yet implemented by the C++ compilers, all compilers do currently support lambdas, auto, decltype and other key features.

There have already been attempts to implement LINQ-like query operators in C++. One of these attempts is the boolinq. It's a template library that allows writing queries like this (is_prime is a lambda that takes an int and returns true if the number is prime, or false otherwise):

C++

vector<int> some_primes(int src[], int howMany)
{
    return  from(src)
           .where(is_prime)
           .take(howMany)
           .toVector();
}

The library has an important drawback: it's based on the operator ".", which is not overloadable. Therefore, any extensions to the library are not possible without changing the existing library code.

On the other hand, Microsoft is working on a native RX library (see Reactive Extensions), that would provide query capabilities for C++. The library is not yet available, but they have delivered a presentation on Channel9 about it. The trivial example above would look extremely similar in native RX:

C++

vector<int> some_primes(int src[], int howMany)
{
    return  from(src)
           .where(is_prime)
           .take(howMany);
           .to_vector();
}

Therefore, native RX will probably suffer from the same problem when it comes to extensions (though it's too early to know how the library will finally look).

On the other hand, boost also contains a library for working with ranges, called range. boost::range provides the means to enable generic algorithms to work with standard-like containers, null terminated strings, pairs of iterators, raw arrays and others. Some of the algorithms provided by the library are similar to the .NET standard query operators, but the library features more than just that. Though the library is powerful and can be used with a variety of containers, it's not small. I also find it, like boost libraries in general, deficient when it comes to simplicity and readability. And if you’re already familiar to .NET and would like to write similar code in C++, boost::range is not the best choice, in my opinion.

Enter cpplinq

cpplinq takes another approach and uses operator >> to pipe query operators, which makes it open for extensions without having to change existing implementation. The sample shown earlier with the prime numbers would look like this:

C++

vector<int> some_primes(int src[], int howMany)
{
    return    from_array(src)
           >> where(is_prime)
           >> take(howMany);
           >> to_vector();
}

The library requires C++11 support (for lambda, auto, decltype, static_assert and others) and is tested with all major compilers: VC++ 10 (from Visual Studio 2010), VC++ 11 (from Visual Studio 2012) and mingw (v4.7.0) on Windows, g++ (v4.7.1) and clang++ (v3.1) on Linux.

The library:

consists of a series of template classes and functions
all query operators are defined under namespace cpplinq
is available in a single header (cpplinq.hpp)

Supported Query Operators

cpplinq supports most of the .NET standard query operators, and more operators will probably be implemented in the future. The following table shows a comparison with the .NET standard query operators and provides a short description of each implemented operator.

Restriction operators
.NET Operators	cpplinq support	cpplinq name	Description
`Where`	yes	`where`	Filters a sequence based on a predicate
Project Operators
`Select`	yes	`select`	Performs a projection over a sequence
`SelectMany`	yes	`select_many`	Performs a one-to-many element projection over a sequence
Partitioning Operators
`Take`	yes	`take`	Creates a new sequence from a given sequence, keeping the first given number of elements and skipping the remainder of the sequence
`TakeWhile`	yes	`take_while`	Creates a new sequence from a given sequence, keeping elements as long as a predicate applied to the element holds true and skipping the remainder of the sequence
`Skip`	yes	`skip`	Creates a new sequence from a given sequence, skipping a given number of elements and taking the rest of the sequence
`SkipWhile`	yes	`skip_while`	Creates a new sequence from a given sequence, skipping elements as long as a predicate applied to each element holds true and taking the rest of the sequence
Join Operators
`Join`	yes	`join`	Performs an inner join of two sequences based on matching keys extracted from the elements
GroupJoin	no
Concatenation Operators
`Concat`	yes	`concat`	Concatenates two sequences
Ordering Operators
`OrderBy`	yes	`orderby`	Orders a sequence by a key in an ascending or descending manner
`OrderByDescending`	yes	`orderby_descending`	Same as `orderby` except that the ordering is descending
		`ordery_ascending`	Same as `orderby` except that the ordering is ascending
`ThenBy`	yes	`thenby`	Establishes a secondary ordering of a sequence by a key, in an ascending or descending manner
`ThenByDescending`	yes	`thenby_descending`	Same as `thenby` except that the ordering is descending
		`thenby_ascending`	Same as `thenby` except that the ordering is ascending
`Reverse`	yes	`reverse`	Reverses the elements of a sequence
Grouping Operators
`GroupBy`	no
Set Operators
`Distinct`	yes	`distinct`	Eliminates duplicate elements from a sequence
`Union`	yes	`union_with`	Produces the set union of two sequences
`Intersect`	yes	`intersect_with`	Produces the set intersection of two sequences
`Except`	yes	`except`	Produces the set difference between two sequences
Conversion Operators
`AsEnumerable`	no
`ToArray`	no
`ToList`	yes	`to_list`	Creates a `std::list<TValue>` from a range, where `TValue` is the type of the elements of the range
		`to_vector`	Creates a `std::vector<TValue>` from a range, where `TValue` is the type of the elements of the range
`ToDictionary`	yes	`to_map`	Creates a `std::map<TKey, TValue>` from a range. It takes a predicate that selects the value to use a key for each element of the range
`ToLookup`	yes	`to_lookup`	Creates a `cpplinq::lookup<TKey, TElement>` from a sequence
`OfType`	no
`Cast`	no
Equality Operators
`SequenceEqual`	yes	`sequence_equal`	Checks whether two sequences are equal
Element Operators
`First`	no
`FirstOrDefault`	yes	`first_or_default`	Returns the first element of a sequence, or a default value if no element is found
`Last`	no
`LastOrDefault`	yes	`last_or_default`	Returns the last element of a sequence, or a default value if no element is found
`Single`	no
`SingleOrDefault`	no
`ElementAt`	no
`ElementAtOrDefault`	yes	`element_at_or_default`	Returns the element at a given index in a sequence, or a default value if the index is out of range (or the sequence is empty)
`DefaultIfEmpty`	no
Generation Operators
`Range`	yes	`range`	Generates a range of integral numbers
`Repeat`	yes	`repeat`	Generates a range by repeating a value a given number of times
`Empty`	yes	`empty`	Returns an empty range of a given type
Qantifiers
`Any`	yes	`any`	Checks whether any element of a sequence satisfies a condition
`All`	yes	`all`	Checks whether all elements of a sequence satisfy a condition
`Contains`	yes	`contains`	Checks whether a sequence contains a given element
Aggregate Operators
`Count`	yes	`count`	Counts the number of elements in a sequence
`LongCount`
`Sum`	yes	`sum`	Computes the sum of a sequence of numeric values
`Min`	yes	`min`	Finds the minimum of a range of numeric values
`Max`	yes	`max`	Finds the maximum of a range of numeric values
`Average`	yes	`avg`	Computes the average of a range of numeric values
`Aggregate`	yes	`aggregate`	Applies a function over a sequence, accumulating the result
Other operators
		`for_each`	Applies a function on each element of a sequence
		`concatenate`	Concatenates the elements of a sequence of strings with a specified separator

Samples

The following sample shows some of the capabilities of the library in querying sequences.

The first examples displays a list of prime numbers, separater by a comma and a space. First, it generates a range from 0 to INT_MAX, then it filters the prime numbers from this range, takes only the first count elements and projects the result as a string. Then, the elements of the resulted sequence are concatenated in a string using ", " as delimiter.

C++

using namespace cpplinq;

auto is_prime = [](int i) -> bool 
{
    if (i < 2) { return false; }
    else if (i == 2) { return true; }
    else
    {
        auto r = (int)std::ceil (std::sqrt (i));
        for (auto iter = 2; iter <= r; ++iter)
        {
            if (i % iter == 0) { return false; }
        }
        return true;
    }
};

void show_some_primes(int count)
{
    auto primes = range(0, INT_MAX)   // generate range
               >> where(is_prime)     // filter elements and keep only primes
               >> take(count)         // take first count elements
               >> select(             // project the numbers as strings
                         [](int i) {std::stringstream s; s << i; return s.str();});

    auto result = primes              // source sequence
               >> concatenate(", ");  // fold the sequence into a string

    std::cout << result << std::endl;
}

int main()
{
    show_some_primes(10); // prints 2, 3, 5, 7, 11, 13, 17, 19, 23, 29
    return 0;
}

It is important to note that all the above operators, except for concatenate(), do not return a sequence, but rather an object representing a "description" of a sequence, which enables lazy evaluation. The range is only generated when we start iterating over it, and a new element is only generated when the next iteration is performed. This behavior however varies with some operators, such as the set operators. concatenate() is the operator that iterates over the sequence and folds it into a string.

The next sample shows how to compute the set union and intersection of an array and a vector of integers.

C++

int set1[] = {1,3,5,7,9,7,5,3,1};
std::vector<int> set2; 
set2.push_back(1);
set2.push_back(2);
set2.push_back(3);
set2.push_back(4);

auto u = from_array(set1)
      >> union_with(from(set2))
      >> to_vector(); // yields {1,2,3,4,5,7,9}
      
auto i = from_array(set1)
      >> intersect_with(from(set2))
      >> to_vector(); // yields {1,3}

The next sample shows how to retrieve the first and last elements of a sequence, or a default value if no such element exists. Notice that these operators have an overload that takes a predicate, so that you can return the first or last element that satisfies a condition.

C++

struct customer
{
    std::size_t id;
    std::string first_name;
    std::string last_name;

    customer (std::size_t id = 0, std::string first_name = "", std::string last_name = "")
        :id(std::move(id)), first_name(std::move(first_name)), last_name(std::move(last_name))
    {
    }
};

customer customers[] = {
    customer (1 , "Bill"    , "Gates"   ),
    customer (2 , "Steve"   , "Jobs"    ),
    customer (4 , "Linus"   , "Torvalds"),
    customer (11, "Steve"   , "Ballmer" )
};

auto f = from_array(customers) >> first_or_default();
std::cout << f.last_name << std::endl; // prints Gates

auto l = from_array(customers) >> last_or_default();
std::cout << l.last_name << std::endl; // prints Ballmer

In the next sample, we join a sequence of customers with a sequence of customer addresses to determine who lives where. The query produces a left join between an array of customers and an array of addresses. The customer class is the one introduced in the previous sample, while the customer_address class is shown below:

C++

struct customer_address
{
    std::size_t id;
    std::size_t customer_id;
    std::string country;

    customer_address(std::size_t id, std::size_t customer_id, std::string country)
        : id(std::move (id)), 
        customer_id (std::move(customer_id)), country(std::move(country))
    {
    }
};

customer customers[] = {
    customer (1 , "Bill"    , "Gates"   ),
    customer (2 , "Steve"   , "Jobs"    ),
    customer (4 , "Linus"   , "Torvalds"),
    customer (11, "Steve"   , "Ballmer" )
};

customer_address customer_addresses[] =
{
    customer_address (1, 1, "USA"       ),
    customer_address (2, 4, "Finland"   ),
    customer_address (3, 11, "USA"      ),
};

auto result = from_array(customers)                         
           >> join (
                    from_array(customer_addresses),
                    [](customer const & c) {return c.id;},
                    [](customer_address const & ca) {return ca.customer_id;},
                    [](customer const & c, customer_address const & ca) 
                                          {return std::make_pair (c, ca);});

result >> for_each([](std::pair<customer, customer_address> const & p) {
    std::cout << p.first.last_name << " lives in " << p.second.country << std::endl;
});

The result that the for_each operator prints is:

C++

Gates lives in USA
Torvalds lives in Finland
Ballmer lives in USA

Let's suppose we also have some orders (with order head and order lines) and we want to print the last order of a particular customer. Suppose the order head and lines look like this:

C++

struct order 
{
    std::size_t id;
    std::size_t customer_id;
    time_t      date;

    order(std::size_t id, std::size_t cid, time_t date):
        id(id), customer_id(cid), date(date)
    {
    }
};

struct order_line 
{
    std::size_t id;
    std::size_t order_id;
    std::size_t article_id;
    double      quantity;

    order_line(std::size_t id, std::size_t oid, std::size_t aid, double quantity):
        id(id), order_id(oid), article_id(aid), quantity(quantity)
    {
    }
};

and we have the orders and the lines in these arrays (ignore the dates and suppose they hold real values):

C++

order orders[] = 
{
    order(1, 1, time(NULL)),
    order(2, 2, time(NULL)),
    order(3, 1, time(NULL)),
};

order_line orderlines [] = 
{
    order_line(1, 1, 100, 1.0),
    order_line(2, 1, 101, 5.0),
    order_line(3, 1, 102, 2.0),
    order_line(4, 2, 400, 1.0),
    order_line(5, 3, 401, 1.0),
};

A function that prints the last order from a particular customer could look like this:

C++

void print_last_order_for_customer(std::size_t custid)
{
    auto result = from_array(orders)
               >> where([=](order const & o) {return o.customer_id == custid;})
               >> orderby_descending([](order const & o) {return o.date;})
               >> take(1)
               >> join(from_array(orderlines),
                        [](order const & o) {return o.id;},
                        [](order_line const & ol) {return ol.order_id;},
                        [](order const & o, order_line const & ol) 
                          {return std::make_pair(o, ol);});

    result >> for_each([](std::pair<order, order_line> const & p){
        std::cout << "order=" << p.first.id << ", " << ctime(&(p.first.date)) 
                  << "  article=" << p.second.article_id << ", 
                     quantity=" << p.second.quantity << std::endl; 
    });
}

Calling this function for customer 1 would print the following text:

C++

order=3, Wed Oct 24 19:26:18 2012
  article=401, quantity=1

Range Generators and Conversion Operators

The most important and most used operators are the range generators (used to build an object that represents the range on which query operators are applied) and range conversion operators (used to fold a range into a container that holds the values of the range). The library provides the following range generators:

Name	Description	Sample
`from_iterators`	constructs a range from a pair of iterators	C++ std::vector<int> numbers; auto result = from_iterators(numbers.begin(), numbers.end()); result >> for_each([](int i) {std::cout << i << std::endl;});
`from`	constructs a range from an STL-like container that provides `begin()` and `end()` methods (representing the first and past-the-end elements). This is basically a wrapper on `from_iterators` operator.	C++ std::vector<int> numbers; auto result = from(numbers); result >> for_each([](int i) {std::cout << i << std::endl;});
`from_array`	constructs a range from an array	C++ int numbers[] = {1,2,3,4,5}; auto result = from_array(numbers); result >> for_each([](int i) {std::cout << i << std::endl;});
`range`	generates a range of integral, consecutive numbers, starting with an initial seed and having a specified number of elements	C++ // create a range of numbers in the interval [10, 100) auto result = range(10, 90);
`repeat`	generates a range by repeating a value a given number of times	C++ // create a range with 10 strings with the value "cpplinq" auto result = repeat("cpplinq", 10);
`empty`	returns an empty range of a given type	C++ // create an empty range of customers auto result = empty<customer>();

On the other hand, the following range conversion operators, that fold a range into an STL or STL-like container, are available:

Name	Description	Sample
`to_vector`	creates a `std::vector<TValue>` from a range, where `TValue` is the type of the elements of the range	C++ // create a vector with ten integer elements, from [1, 10] auto result = range(1, 10) >> to_vector();
`to_list`	creates a `std::list<TValue>` from a range, where `TValue` is the type of the elements of the range.	C++ // create a vector with ten integer elements, from [1, 10] auto result = range(1, 10) >> to_list();
`to_map`	creates a `std::map<TKey, TValue>` from a range. It takes a predicate that selects the value to use as the key for each element of the range. It implements a one-to-one dictionary that maps keys to single values.	C++ // create a map where key is the customer ID, // and the value is the customer object auto result = from_array (customers) >> to_map ([](customer const & c){return c.id;});
`to_lookup`	creates a `cpplinq::lookup<TKey, TElement>` from a sequence. It implements a one-to-many dictionary that maps keys to sequences of values	C++ customer_address customer_addresses[] = { customer_address (2, 4, "Finland" ), customer_address (3, 4, "USA" ), customer_address (1, 1, "USA" ), }; auto lookup = from_array (customer_addresses) >> to_lookup ([] (customer_address const & ca) {return ca.customer_id;}); auto countries = lookup[4] >> select([](customer_address const & ca) {return ca.country;}) >> to_vector(); // yields {"Finland", "USA"}

Extending the Library

Because the cpplinq library is built on operator>>, it is possible to extend it with new query operators, without making changes to the existing code. The library consists of range builders, range operators and helper methods.

A range builder is a class that folds the range (sequence) into a value. It could be a single value, for instance, the sum of the elements of a sequence, or a container of values, like a vector, containing the elements of a sequence.

A range operator is a class that transforms a range into another range. It could be seen as a range builder that returns another range, instead of a single (folded value). For instance, a select range can transform a sequence of integers into a sequence of strings, and a where range can filter a range of integers to keep only even numbers. Every range operator has a companion range builder whose purpose is to produce a new range.

In addition, for each query operator, we define a helper, stand-alone function that creates and returns an instance of a builder class.

Building a Range Aggregator

A range aggregator has several requirements:

must inherit base_builder
must be copyable
must be moveable
has a member type called this_type representing the type of the builder class
has a method called build that accepts a range and returns the aggregated value

Below is the code for the sum range aggregator.

C++

struct sum_builder : base_builder
{
    typedef                 sum_builder                     this_type;

    CPPLINQ_INLINEMETHOD sum_builder () throw ()
    {
    }

    CPPLINQ_INLINEMETHOD sum_builder (sum_builder const & v) throw ()
    {
    }

    CPPLINQ_INLINEMETHOD sum_builder (sum_builder && v) throw ()
    {
    }

    template<typename TRange>
    CPPLINQ_INLINEMETHOD typename TRange::value_type build (TRange range)
    {
        auto sum = typename TRange::value_type ();
        while (range.next ())
        {
            sum += range.front ();
        }
        return std::move (sum);
    }

};

The build method does the work by traversing the input range and summing the elements.

In addition, we define a helper method that creates and returns an instance of the builder.

C++

template<typename TSelector>
CPPLINQ_INLINEMETHOD detail::sum_selector_builder<TSelector>  sum(TSelector selector) throw()
{
    return detail::sum_selector_builder<TSelector> (std::move (selector));
}

Having the helper method defined like that, we can use it to sum the elements of a sequence.

C++

int numbers [] = {1,2,3,4};
auto s = from_array(numbers) >> sum(); // yields 10

Building a Range Operator

A range operator has several requirements, some identical to a range builder:

must inherit base_range
must be copyable
must be moveable
has a member type called this_type which is the type of the range class
has a member type called value_type which is the type of values in the range
has a member type called return_type which is value_type const & or value_type
has an enum member return_reference which is either 0 or 1, depending on whether return_type is a reference type or not
has a member method called front which returns the current value as a return_type
has a member method called next which moves the next value if available and returns true, or returns false if no other value is available
has a member method called operator>> that accepts a range builder and returns a new range

The front method:

If called on a range where next has not be been called or the previous next returned false, the behavior is undefined

The next method:

If called on a range where the previous next returned false, next should do nothing and return false
A range where next never has been called should be seen as "pointing" to no-value (as opposed to begin() which "points" to the first value in a collection)

Note: There is no method to reset to range to the first position again. Once it's traversed the race is over. The rationale for that is to support range sources that doesn't support resets. In practice, for in-memory collections, resets are possible but as long as the range source supports normal copy-semantics resets are achievable using that.

The following is the code for the where_range operator.

C++

template<typename TRange, typename TPredicate>

struct where_range : base_range
{
    typedef                 where_range<TRange, TPredicate> this_type       ;
    typedef                 TRange                          range_type      ;
    typedef                 TPredicate                      predicate_type  ;

    typedef                 typename TRange::value_type     value_type      ;
    typedef                 typename TRange::return_type    return_type     ;
    enum    
    { 
        returns_reference   = TRange::returns_reference   , 
    };

    range_type              range       ;
    predicate_type          predicate   ;

    CPPLINQ_INLINEMETHOD where_range (
            range_type      range
        ,   predicate_type  predicate
        ) throw ()
        :   range       (std::move (range))
        ,   predicate   (std::move (predicate))
    {
    }

    CPPLINQ_INLINEMETHOD where_range (where_range const & v)
        :   range       (v.range)
        ,   predicate   (v.predicate)
    {
    }

    CPPLINQ_INLINEMETHOD where_range (where_range && v) throw ()
        :   range       (std::move (v.range))
        ,   predicate   (std::move (v.predicate))
    {
    }

    template<typename TRangeBuilder<
    CPPLINQ_INLINEMETHOD typename get_builtup_type<TRangeBuilder, 
            this_type<::type operator>>(TRangeBuilder range_builder) throw ()   
    {
        return range_builder.build (*this);
    }

    CPPLINQ_INLINEMETHOD return_type front () const 
    {
        return range.front ();
    }

    CPPLINQ_INLINEMETHOD bool next ()
    {
        while (range.next ())
        {
            if (predicate (range.front ()))
            {
                return true;
            }
        }

        return false;
    }
};

In addition, we define a where_range builder, whose purpose is to build a where_range operator.

C++

template<typename TPredicate>
struct where_builder : base_builder
{
    typedef                 where_builder<TPredicate>   this_type       ;
    typedef                 TPredicate                  predicate_type  ;

    predicate_type          predicate   ;

    CPPLINQ_INLINEMETHOD explicit where_builder (predicate_type predicate) throw ()
        :   predicate (std::move (predicate))
    {
    }

    CPPLINQ_INLINEMETHOD where_builder (where_builder const & v)
        :   predicate (v.predicate)
    {
    }

    CPPLINQ_INLINEMETHOD where_builder (where_builder && v) throw ()
        :   predicate (std::move (v.predicate))
    {
    }

    template<typename TRange>
    CPPLINQ_INLINEMETHOD where_range<TRange, TPredicate> build (TRange range) const throw ()
    {
        return where_range<TRange, TPredicate>(std::move (range), predicate);
    }

};

Last, we also add a helper method that instantiates and returns a where_range builder.

C++

template<typename TPredicate>
CPPLINQ_INLINEMETHOD detail::where_builder<TPredicate> where(TPredicate predicate) throw()
{
    return detail::where_builder<TPredicate> (std::move (predicate));
}

Having the helper method defined, we can put it to use.

C++

int numbers [] = {1,2,3,4,5,6,7,8,9};
auto result = from_array(numbers)
           >> where([](int i) {return i%2==0;})
           >> to_vector(); // yields {2,4,6,8}

Conclusion

The cpplinq template library is an attempt to provide .NET-like query operators for sequences of objects in C++11. The library implements most of the standard .NET query operators and can be used with all major compilers. The library is built around operator>> which makes it easily extensible without needing to change existing library code. You can download the library (and the unit tests) from the library's home page at codeplex.

History

4^th November, 2012: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)