Introduction
cpplinq is an open-source native template library that provides a set of LINQ-like operators for querying collections (arrays and STL containers) in C++11. The API consists of a set of template classes and functions and is compatible with all major compilers (VC++, mingw, g++, clang++). The library is available on Codeplex.
Here are some quick samples, before we start with some considerations about LINQ.
Sum the even numbers from an array of integers:
int numbers[] = {1,2,3,4,5,6,7,8,9};
auto s = from_array(numbers)
>> where([](int i) {return i%2==0;})
>> sum();
Order descending all the distinct numbers from an array of integers, transform them into string
s and print the result.
int numbers[] = {3,1,4,1,5,9,2,6};
auto result = from_array(numbers)
>> distinct()
>> orderby_descending([](int i) {return i;})
>> select([](int i){std::stringstream s; s<<i; return s.str();})
>> to_vector();
for(auto i : result)
std::cout << i << std::endl;
A Few Words About LINQ
Developers familiar with .NET LINQ and its query operators should skip this paragraph.
LINQ (Language Integrated Query) is a component of the .NET Framework that adds native query capabilities to .NET. It is basically a gateway between a language environment and a data store. LINQ defines a set of standard query operators for traversal, filtering and projection of data. In addition to the standard query operators, the C# and VB.NET languages also provide language extensions to map on these query operators and enable developers to write code in a more declarative, SQL-like manner.
The advantages of LINQ include:
- Write code in a declarative style
- Use familiar SQL-like syntax to query not just SQL databases, but virtually any source of data
- Correctness of the query is determined at compile-time, not run time
- Productivity and easier maintainability as a result of simplified and easier to write (and read) code
If you want to read more about LINQ and the .NET standard query operators, see these articles:
LINQ in C++
LINQ was made possible after .NET added support for lambdas. C++11 however, followed the path and added support for lambdas and other features that would enable us to write something similar in C++. Though not all C++11 features are yet implemented by the C++ compilers, all compilers do currently support lambdas, auto, decltype and other key features.
There have already been attempts to implement LINQ-like query operators in C++. One of these attempts is the boolinq. It's a template library that allows writing queries like this (is_prime
is a lambda that takes an int
and returns true
if the number is prime, or false
otherwise):
vector<int> some_primes(int src[], int howMany)
{
return from(src)
.where(is_prime)
.take(howMany)
.toVector();
}
The library has an important drawback: it's based on the operator ".
", which is not overloadable. Therefore, any extensions to the library are not possible without changing the existing library code.
On the other hand, Microsoft is working on a native RX library (see Reactive Extensions), that would provide query capabilities for C++. The library is not yet available, but they have delivered a presentation on Channel9 about it. The trivial example above would look extremely similar in native RX:
vector<int> some_primes(int src[], int howMany)
{
return from(src)
.where(is_prime)
.take(howMany);
.to_vector();
}
Therefore, native RX will probably suffer from the same problem when it comes to extensions (though it's too early to know how the library will finally look).
On the other hand, boost also contains a library for working with ranges, called range. boost::range
provides the means to enable generic algorithms to work with standard-like containers, null
terminated string
s, pairs of iterators, raw arrays and others. Some of the algorithms provided by the library are similar to the .NET standard query operators, but the library features more than just that. Though the library is powerful and can be used with a variety of containers, it's not small. I also find it, like boost libraries in general, deficient when it comes to simplicity and readability. And if you’re already familiar to .NET and would like to write similar code in C++, boost::range
is not the best choice, in my opinion.
Enter cpplinq
cpplinq takes another approach and uses operator >>
to pipe query operators, which makes it open for extensions without having to change existing implementation. The sample shown earlier with the prime numbers would look like this:
vector<int> some_primes(int src[], int howMany)
{
return from_array(src)
>> where(is_prime)
>> take(howMany);
>> to_vector();
}
The library requires C++11 support (for lambda, auto, decltype, static_assert and others) and is tested with all major compilers: VC++ 10 (from Visual Studio 2010), VC++ 11 (from Visual Studio 2012) and mingw (v4.7.0) on Windows, g++ (v4.7.1) and clang++ (v3.1) on Linux.
The library:
- consists of a series of template classes and functions
- all query operators are defined under namespace cpplinq
- is available in a single header (cpplinq.hpp)
Supported Query Operators
cpplinq supports most of the .NET standard query operators, and more operators will probably be implemented in the future. The following table shows a comparison with the .NET standard query operators and provides a short description of each implemented operator.
.NET Operators | cpplinq support | cpplinq name | Description |
Restriction operators |
Where | yes | where | Filters a sequence based on a predicate |
Project Operators |
Select | yes | select | Performs a projection over a sequence |
SelectMany | yes | select_many | Performs a one-to-many element projection over a sequence |
Partitioning Operators |
Take | yes | take | Creates a new sequence from a given sequence, keeping the first given number of elements and skipping the remainder of the sequence |
TakeWhile | yes | take_while | Creates a new sequence from a given sequence, keeping elements as long as a predicate applied to the element holds true and skipping the remainder of the sequence |
Skip | yes | skip | Creates a new sequence from a given sequence, skipping a given number of elements and taking the rest of the sequence |
SkipWhile | yes | skip_while | Creates a new sequence from a given sequence, skipping elements as long as a predicate applied to each element holds true and taking the rest of the sequence |
Join Operators |
Join | yes | join | Performs an inner join of two sequences based on matching keys extracted from the elements |
GroupJoin | no | | |
Concatenation Operators |
Concat | yes | concat | Concatenates two sequences |
Ordering Operators |
OrderBy | yes | orderby | Orders a sequence by a key in an ascending or descending manner |
OrderByDescending | yes | orderby_descending | Same as orderby except that the ordering is descending |
| | ordery_ascending | Same as orderby except that the ordering is ascending |
ThenBy | yes | thenby | Establishes a secondary ordering of a sequence by a key, in an ascending or descending manner |
ThenByDescending | yes | thenby_descending | Same as thenby except that the ordering is descending |
| | thenby_ascending | Same as thenby except that the ordering is ascending |
Reverse | yes | reverse | Reverses the elements of a sequence |
Grouping Operators |
GroupBy | no | | |
Set Operators |
Distinct | yes | distinct | Eliminates duplicate elements from a sequence |
Union | yes | union_with | Produces the set union of two sequences |
Intersect | yes | intersect_with | Produces the set intersection of two sequences |
Except | yes | except | Produces the set difference between two sequences |
Conversion Operators |
AsEnumerable | no | | |
ToArray | no | | |
ToList | yes | to_list | Creates a std::list<TValue> from a range, where TValue is the type of the elements of the range |
| | to_vector | Creates a std::vector<TValue> from a range, where TValue is the type of the elements of the range |
ToDictionary | yes | to_map | Creates a std::map<TKey, TValue> from a range. It takes a predicate that selects the value to use a key for each element of the range |
ToLookup | yes | to_lookup | Creates a cpplinq::lookup<TKey, TElement> from a sequence |
OfType | no | | |
Cast | no | | |
Equality Operators |
SequenceEqual | yes | sequence_equal | Checks whether two sequences are equal |
Element Operators |
First | no | | |
FirstOrDefault | yes | first_or_default | Returns the first element of a sequence, or a default value if no element is found |
Last | no | | |
LastOrDefault | yes | last_or_default | Returns the last element of a sequence, or a default value if no element is found |
Single | no | | |
SingleOrDefault | no | | |
ElementAt | no | | |
ElementAtOrDefault | yes | element_at_or_default | Returns the element at a given index in a sequence, or a default value if the index is out of range (or the sequence is empty) |
DefaultIfEmpty | no | | |
Generation Operators |
Range | yes | range | Generates a range of integral numbers |
Repeat | yes | repeat | Generates a range by repeating a value a given number of times |
Empty | yes | empty | Returns an empty range of a given type |
Qantifiers |
Any | yes | any | Checks whether any element of a sequence satisfies a condition |
All | yes | all | Checks whether all elements of a sequence satisfy a condition |
Contains | yes | contains | Checks whether a sequence contains a given element |
Aggregate Operators |
Count | yes | count | Counts the number of elements in a sequence |
LongCount | | | |
Sum | yes | sum | Computes the sum of a sequence of numeric values |
Min | yes | min | Finds the minimum of a range of numeric values |
Max | yes | max | Finds the maximum of a range of numeric values |
Average | yes | avg | Computes the average of a range of numeric values |
Aggregate | yes | aggregate | Applies a function over a sequence, accumulating the result |
Other operators |
| | for_each | Applies a function on each element of a sequence |
| | concatenate | Concatenates the elements of a sequence of strings with a specified separator |
Samples
The following sample shows some of the capabilities of the library in querying sequences.
The first examples displays a list of prime numbers, separater by a comma and a space. First, it generates a range from 0
to INT_MAX
, then it filters the prime numbers from this range, takes only the first count elements and projects the result as a string
. Then, the elements of the resulted sequence are concatenated in a string
using ",
" as delimiter.
using namespace cpplinq;
auto is_prime = [](int i) -> bool
{
if (i < 2) { return false; }
else if (i == 2) { return true; }
else
{
auto r = (int)std::ceil (std::sqrt (i));
for (auto iter = 2; iter <= r; ++iter)
{
if (i % iter == 0) { return false; }
}
return true;
}
};
void show_some_primes(int count)
{
auto primes = range(0, INT_MAX) >> where(is_prime) >> take(count) >> select( [](int i) {std::stringstream s; s << i; return s.str();});
auto result = primes >> concatenate(", ");
std::cout << result << std::endl;
}
int main()
{
show_some_primes(10); return 0;
}
It is important to note that all the above operators, except for concatenate()
, do not return a sequence, but rather an object representing a "description" of a sequence, which enables lazy evaluation. The range is only generated when we start iterating over it, and a new element is only generated when the next iteration is performed. This behavior however varies with some operators, such as the set operators. concatenate()
is the operator that iterates over the sequence and folds it into a string.
The next sample shows how to compute the set union and intersection of an array and a vector of integers.
int set1[] = {1,3,5,7,9,7,5,3,1};
std::vector<int> set2;
set2.push_back(1);
set2.push_back(2);
set2.push_back(3);
set2.push_back(4);
auto u = from_array(set1)
>> union_with(from(set2))
>> to_vector();
auto i = from_array(set1)
>> intersect_with(from(set2))
>> to_vector();
The next sample shows how to retrieve the first and last elements of a sequence, or a default value if no such element exists. Notice that these operators have an overload that takes a predicate, so that you can return the first or last element that satisfies a condition.
struct customer
{
std::size_t id;
std::string first_name;
std::string last_name;
customer (std::size_t id = 0, std::string first_name = "", std::string last_name = "")
:id(std::move(id)), first_name(std::move(first_name)), last_name(std::move(last_name))
{
}
};
customer customers[] = {
customer (1 , "Bill" , "Gates" ),
customer (2 , "Steve" , "Jobs" ),
customer (4 , "Linus" , "Torvalds"),
customer (11, "Steve" , "Ballmer" )
};
auto f = from_array(customers) >> first_or_default();
std::cout << f.last_name << std::endl;
auto l = from_array(customers) >> last_or_default();
std::cout << l.last_name << std::endl;
In the next sample, we join a sequence of customers with a sequence of customer addresses to determine who lives where. The query produces a left join between an array of customers and an array of addresses. The customer
class is the one introduced in the previous sample, while the customer_address
class is shown below:
struct customer_address
{
std::size_t id;
std::size_t customer_id;
std::string country;
customer_address(std::size_t id, std::size_t customer_id, std::string country)
: id(std::move (id)),
customer_id (std::move(customer_id)), country(std::move(country))
{
}
};
customer customers[] = {
customer (1 , "Bill" , "Gates" ),
customer (2 , "Steve" , "Jobs" ),
customer (4 , "Linus" , "Torvalds"),
customer (11, "Steve" , "Ballmer" )
};
customer_address customer_addresses[] =
{
customer_address (1, 1, "USA" ),
customer_address (2, 4, "Finland" ),
customer_address (3, 11, "USA" ),
};
auto result = from_array(customers)
>> join (
from_array(customer_addresses),
[](customer const & c) {return c.id;},
[](customer_address const & ca) {return ca.customer_id;},
[](customer const & c, customer_address const & ca)
{return std::make_pair (c, ca);});
result >> for_each([](std::pair<customer, customer_address> const & p) {
std::cout << p.first.last_name << " lives in " << p.second.country << std::endl;
});
The result that the for_each
operator prints is:
Gates lives in USA
Torvalds lives in Finland
Ballmer lives in USA
Let's suppose we also have some orders (with order head and order lines) and we want to print the last order of a particular customer
. Suppose the order head and lines look like this:
struct order
{
std::size_t id;
std::size_t customer_id;
time_t date;
order(std::size_t id, std::size_t cid, time_t date):
id(id), customer_id(cid), date(date)
{
}
};
struct order_line
{
std::size_t id;
std::size_t order_id;
std::size_t article_id;
double quantity;
order_line(std::size_t id, std::size_t oid, std::size_t aid, double quantity):
id(id), order_id(oid), article_id(aid), quantity(quantity)
{
}
};
and we have the orders and the lines in these arrays (ignore the dates and suppose they hold real values):
order orders[] =
{
order(1, 1, time(NULL)),
order(2, 2, time(NULL)),
order(3, 1, time(NULL)),
};
order_line orderlines [] =
{
order_line(1, 1, 100, 1.0),
order_line(2, 1, 101, 5.0),
order_line(3, 1, 102, 2.0),
order_line(4, 2, 400, 1.0),
order_line(5, 3, 401, 1.0),
};
A function that prints the last order from a particular customer
could look like this:
void print_last_order_for_customer(std::size_t custid)
{
auto result = from_array(orders)
>> where([=](order const & o) {return o.customer_id == custid;})
>> orderby_descending([](order const & o) {return o.date;})
>> take(1)
>> join(from_array(orderlines),
[](order const & o) {return o.id;},
[](order_line const & ol) {return ol.order_id;},
[](order const & o, order_line const & ol)
{return std::make_pair(o, ol);});
result >> for_each([](std::pair<order, order_line> const & p){
std::cout << "order=" << p.first.id << ", " << ctime(&(p.first.date))
<< " article=" << p.second.article_id << ",
quantity=" << p.second.quantity << std::endl;
});
}
Calling this function for customer 1 would print the following text:
order=3, Wed Oct 24 19:26:18 2012
article=401, quantity=1
Range Generators and Conversion Operators
The most important and most used operators are the range generators (used to build an object that represents the range on which query operators are applied) and range conversion operators (used to fold a range into a container that holds the values of the range). The library provides the following range generators:
Name | Description | Sample |
from_iterators | constructs a range from a pair of iterators |
std::vector<int> numbers;
auto result =
from_iterators(numbers.begin(), numbers.end());
result >> for_each([](int i)
{std::cout << i << std::endl;});
|
from | constructs a range from an STL-like container that provides begin() and end() methods (representing the first and past-the-end elements). This is basically a wrapper on from_iterators operator. |
std::vector<int> numbers;
auto result = from(numbers);
result >> for_each([](int i) {std::cout << i << std::endl;});
|
from_array | constructs a range from an array |
int numbers[] = {1,2,3,4,5};
auto result = from_array(numbers);
result >> for_each([](int i) {std::cout << i << std::endl;});
|
range | generates a range of integral, consecutive numbers, starting with an initial seed and having a specified number of elements |
auto result = range(10, 90);
|
repeat | generates a range by repeating a value a given number of times |
auto result = repeat("cpplinq", 10);
|
empty | returns an empty range of a given type |
auto result = empty<customer>();
|
On the other hand, the following range conversion operators, that fold a range into an STL or STL-like container, are available:
Name | Description | Sample |
to_vector | creates a std::vector<TValue> from a range, where TValue is the type of the elements of the range |
auto result = range(1, 10) >> to_vector();
|
to_list | creates a std::list<TValue> from a range, where TValue is the type of the elements of the range. |
auto result = range(1, 10) >> to_list();
|
to_map | creates a std::map<TKey, TValue> from a range. It takes a predicate that selects the value to use as the key for each element of the range. It implements a one-to-one dictionary that maps keys to single values. |
auto result = from_array (customers)
>> to_map ([](customer const & c){return c.id;});
|
to_lookup | creates a cpplinq::lookup<TKey, TElement> from a sequence. It implements a one-to-many dictionary that maps keys to sequences of values |
customer_address customer_addresses[] =
{
customer_address (2, 4, "Finland" ),
customer_address (3, 4, "USA" ),
customer_address (1, 1, "USA" ),
};
auto lookup = from_array (customer_addresses)
>> to_lookup ([] (customer_address const & ca)
{return ca.customer_id;});
auto countries = lookup[4]
>> select([](customer_address const & ca)
{return ca.country;})
>> to_vector();
|
Extending the Library
Because the cpplinq
library is built on operator>>
, it is possible to extend it with new query operators, without making changes to the existing code. The library consists of range builders, range operators and helper methods.
A range builder is a class that folds the range (sequence) into a value. It could be a single value, for instance, the sum of the elements of a sequence, or a container of values, like a vector, containing the elements of a sequence.
A range operator is a class that transforms a range into another range. It could be seen as a range builder that returns another range, instead of a single (folded value). For instance, a select range can transform a sequence of integers into a sequence of string
s, and a where
range can filter a range of integers to keep only even numbers. Every range operator has a companion range builder whose purpose is to produce a new range.
In addition, for each query operator, we define a helper, stand-alone function that creates and returns an instance of a builder class.
Building a Range Aggregator
A range aggregator has several requirements:
- must inherit
base_builder
- must be copyable
- must be moveable
- has a member type called
this_type
representing the type of the builder
class - has a method called
build
that accepts a range and returns the aggregated value
Below is the code for the sum range aggregator.
struct sum_builder : base_builder
{
typedef sum_builder this_type;
CPPLINQ_INLINEMETHOD sum_builder () throw ()
{
}
CPPLINQ_INLINEMETHOD sum_builder (sum_builder const & v) throw ()
{
}
CPPLINQ_INLINEMETHOD sum_builder (sum_builder && v) throw ()
{
}
template<typename TRange>
CPPLINQ_INLINEMETHOD typename TRange::value_type build (TRange range)
{
auto sum = typename TRange::value_type ();
while (range.next ())
{
sum += range.front ();
}
return std::move (sum);
}
};
The build
method does the work by traversing the input range and summing the elements.
In addition, we define a helper method that creates and returns an instance of the builder.
template<typename TSelector>
CPPLINQ_INLINEMETHOD detail::sum_selector_builder<TSelector> sum(TSelector selector) throw()
{
return detail::sum_selector_builder<TSelector> (std::move (selector));
}
Having the helper method defined like that, we can use it to sum the elements of a sequence.
int numbers [] = {1,2,3,4};
auto s = from_array(numbers) >> sum();
Building a Range Operator
A range operator has several requirements, some identical to a range builder:
- must inherit
base_range
- must be copyable
- must be moveable
- has a member type called
this_type
which is the type of the range class - has a member type called
value_type
which is the type of values in the range - has a member type called
return_type
which is value_type const &
or value_type
- has an
enum
member return_reference
which is either 0
or 1
, depending on whether return_type
is a reference type or not - has a member method called
front
which returns the current value as a return_type
- has a member method called
next
which moves the next value if available and returns true
, or returns false
if no other value is available - has a member method called
operator>>
that accepts a range builder and returns a new range
The front method:
- If called on a range where next has not be been called or the previous next returned
false
, the behavior is undefined
The next method:
- If called on a range where the previous next returned
false
, next
should do nothing and return false
- A range where
next
never has been called should be seen as "pointing" to no-value (as opposed to begin()
which "points" to the first value in a collection)
Note: There is no method to reset to range to the first position again. Once it's traversed the race is over. The rationale for that is to support range sources that doesn't support resets. In practice, for in-memory collections, resets are possible but as long as the range source supports normal copy-semantics resets are achievable using that.
The following is the code for the where_range
operator.
template<typename TRange, typename TPredicate>
struct where_range : base_range
{
typedef where_range<TRange, TPredicate> this_type ;
typedef TRange range_type ;
typedef TPredicate predicate_type ;
typedef typename TRange::value_type value_type ;
typedef typename TRange::return_type return_type ;
enum
{
returns_reference = TRange::returns_reference ,
};
range_type range ;
predicate_type predicate ;
CPPLINQ_INLINEMETHOD where_range (
range_type range
, predicate_type predicate
) throw ()
: range (std::move (range))
, predicate (std::move (predicate))
{
}
CPPLINQ_INLINEMETHOD where_range (where_range const & v)
: range (v.range)
, predicate (v.predicate)
{
}
CPPLINQ_INLINEMETHOD where_range (where_range && v) throw ()
: range (std::move (v.range))
, predicate (std::move (v.predicate))
{
}
template<typename TRangeBuilder<
CPPLINQ_INLINEMETHOD typename get_builtup_type<TRangeBuilder,
this_type<::type operator>>(TRangeBuilder range_builder) throw ()
{
return range_builder.build (*this);
}
CPPLINQ_INLINEMETHOD return_type front () const
{
return range.front ();
}
CPPLINQ_INLINEMETHOD bool next ()
{
while (range.next ())
{
if (predicate (range.front ()))
{
return true;
}
}
return false;
}
};
In addition, we define a where_range
builder, whose purpose is to build a where_range
operator.
template<typename TPredicate>
struct where_builder : base_builder
{
typedef where_builder<TPredicate> this_type ;
typedef TPredicate predicate_type ;
predicate_type predicate ;
CPPLINQ_INLINEMETHOD explicit where_builder (predicate_type predicate) throw ()
: predicate (std::move (predicate))
{
}
CPPLINQ_INLINEMETHOD where_builder (where_builder const & v)
: predicate (v.predicate)
{
}
CPPLINQ_INLINEMETHOD where_builder (where_builder && v) throw ()
: predicate (std::move (v.predicate))
{
}
template<typename TRange>
CPPLINQ_INLINEMETHOD where_range<TRange, TPredicate> build (TRange range) const throw ()
{
return where_range<TRange, TPredicate>(std::move (range), predicate);
}
};
Last, we also add a helper method that instantiates and returns a where_range
builder.
template<typename TPredicate>
CPPLINQ_INLINEMETHOD detail::where_builder<TPredicate> where(TPredicate predicate) throw()
{
return detail::where_builder<TPredicate> (std::move (predicate));
}
Having the helper method defined, we can put it to use.
int numbers [] = {1,2,3,4,5,6,7,8,9};
auto result = from_array(numbers)
>> where([](int i) {return i%2==0;})
>> to_vector();
Conclusion
The cpplinq template library is an attempt to provide .NET-like query operators for sequences of objects in C++11. The library implements most of the standard .NET query operators and can be used with all major compilers. The library is built around operator>>
which makes it easily extensible without needing to change existing library code. You can download the library (and the unit tests) from the library's home page at codeplex.
History
- 4th November, 2012: Initial version