Introduction
This article compares LINQ ports from .NET to PHP, mostly in terms of performance. In .NET, LINQ is used for writing SQL-like queries over various collections, including databases. In PHP, it is usually used for transforming arrays, like the built-in functions array_filter and array_map, but in a more readable form and with many more features. Due to limitations of PHP and the current state of the libraries, LINQ ports are best suited for transforming relatively small datasets, such as those returned from web services.
This is not an introductory article; I will not include tutorials on using LINQ. I will likely write another, more detailed article for beginners. However, some examples should be self-explanatory, so even if you have not used LINQ before, you can compare the code.
You are expected to know how to use closures. Knowing LINQ is a huge bonus.
Background
Before developing yet another port of LINQ from .NET to PHP, I thoroughly investigated all available libraries. There were plenty of them: LINQ for PHP, Phinq, PHPLinq and Plinq. Unfortunately, none of them supports lazy evaluation, most of them include few tests (if any), and documentation is either missing or incomplete. Overall, they are clearly not production-ready.
This is why YaLinqo was born. At the time, it was the only LINQ port that truly implemented LINQ to Objects. It has 100% test coverage and very detailed PHPDoc, supports "string lambdas", and does not lose keys during transformations. The first version was implemented for PHP 5.3; later it was updated to rely on generators (yield), introduced in PHP 5.5.
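Lazy evaluation and key preservation can be illustrated with a minimal sketch built on generators. The functions lazy_where and lazy_select are hypothetical and are not YaLinqo's API; they only show the general technique of chaining yield-based transformations so that nothing is computed until the result is iterated.

```php
<?php
// Hypothetical helpers, not part of YaLinqo. Each one is a generator, so no
// element is processed until the caller starts iterating.

// Keep only the elements for which $predicate returns true, preserving keys.
function lazy_where($source, callable $predicate)
{
    foreach ($source as $key => $value) {
        if ($predicate($value)) {
            yield $key => $value;
        }
    }
}

// Transform every element with $selector, preserving keys.
function lazy_select($source, callable $selector)
{
    foreach ($source as $key => $value) {
        yield $key => $selector($value);
    }
}

$calls = 0;
$query = lazy_select(
    lazy_where(['a' => 1, 'b' => 2, 'c' => 3, 'd' => 4], function ($v) use (&$calls) {
        $calls++;
        return $v % 2 === 0;
    }),
    function ($v) { return $v * 10; }
);

// Building the query has not run a single callback yet.
$callsBeforeIteration = $calls;

// Iterating runs the callbacks; the original keys survive both steps.
$result = iterator_to_array($query);
```

Because the steps are generators, a foreach that breaks early over such a query never processes the remaining elements at all.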
Since then, two libraries have appeared that rival YaLinqo. The first one is Ginq. Unlike YaLinqo, it relies on manually implemented iterators. In a way, it is closer to the "PHP way" than the first version of YaLinqo, which relied on "hackish" iterators inspired by LINQ.js. It doesn't support "string lambdas"; instead, it supports "property access" from Symfony, which comes in handy when sorting, grouping and joining. Many methods have aliases from functional programming, for example "map" in addition to "select". The documentation is not detailed.
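To make the idea of "string lambdas" more concrete, here is a simplified sketch of how a string such as '$o ==> count($o["items"]) > 5' could be turned into a closure. The function string_lambda is hypothetical: YaLinqo's real implementation supports more forms (for example, implicit parameter names) and is not reproduced here. Because this sketch relies on eval, it must never be fed untrusted input.

```php
<?php
// Hypothetical helper: compiles '$x ==> expr' or '($a, $b) ==> expr' into a
// closure. Simplified for illustration; uses eval, so trusted input only.
function string_lambda($source)
{
    // Capture the parameter list (with or without parentheses) and the body.
    if (!preg_match('/^\s*\(?\s*([^)=]*?)\s*\)?\s*==>\s*(.+)$/s', $source, $m)) {
        throw new InvalidArgumentException("Not a string lambda: $source");
    }
    $params = $m[1];
    $body = $m[2];
    // Build and evaluate "return function (params) { return body; };".
    return eval("return function ($params) { return $body; };");
}

$hasManyItems = string_lambda('$o ==> count($o["items"]) > 5');
$multiply = string_lambda('($a, $b) ==> $a * $b');
```

A real library would also cache compiled lambdas so that the same string is not evaluated again on every call.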
The other library is Pinq. It is potentially the most powerful of the three, supporting both objects and databases. It parses PHP code using PHP-Parser and can generate SQL. Unfortunately, at the time of writing, the only query provider is for MySQL, and its status is "demonstration". I suspect there is a lot of work to do before it becomes production-ready and supports multiple DBMSs. Another drawback is that, surprisingly, it contains fewer functions, and its functions are less featureful.
All three libraries have permissive open-source licenses, good test coverage and documentation, support lots of functions, are available on Packagist, and overall are ready to use in any project that doesn't require heavy optimization. If you count every microsecond, keep in mind that these libraries add considerable overhead, so in a high-load project where LINQ queries are a substantial part of the executed code, you may prefer to stick with good old for and foreach. However, I don't consider scripting languages a good option for high-load projects, and most heavy logic is usually done in the DB, so in most cases, increased readability and maintainability are worth some performance loss.
It is interesting to note that the three libraries differ greatly in size: YaLinqo contains 4 classes and has zero dependencies, Ginq contains more than 70 classes and depends on Symfony's Property Access component, and Pinq contains more than 500 classes and depends on PHP-Parser. The difference lies in their architecture. YaLinqo uses only PHP arrays and callbacks. Ginq includes iterator classes for every transformation, collections, comparers etc. Pinq contains even more classes and interfaces inspired by LINQ in .NET and includes all the plumbing necessary for supporting databases: repositories, parsing etc. (I haven't thoroughly investigated Pinq's sources.)
About tests
I have very little experience in performance testing, so the tests are quick and dirty, without much thought put into getting precise results. Memory usage isn't considered at all. However, the differences in performance are so huge that I don't think precision really matters. If you find a bug in the code or can improve the tests, the project is available on GitHub; pull requests are welcome.
All following tests call the benchmark_linq_groups function, which accepts arrays of functions implementing the test in plain PHP, YaLinqo, Ginq and Pinq. It consumes the produced collections using foreach and makes sure that all implementations return the same results.
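The real benchmark_linq_groups lives in the YaLinqoPerf repository, and its internals are not shown here. As a rough illustration of the approach described above, a much simpler hypothetical harness (benchmark_simple and materialize are invented names) might time each implementation, force lazy results by consuming them with foreach, and check that all implementations agree:

```php
<?php
// Hypothetical, simplified harness; it is NOT the real benchmark_linq_groups.

// Recursively turn arrays and Traversables (e.g. lazy LINQ queries) into
// plain arrays; consuming them with foreach forces lazy evaluation.
function materialize($value)
{
    if (!is_array($value) && !($value instanceof Traversable)) {
        return $value;
    }
    $array = [];
    foreach ($value as $key => $item) {
        $array[$key] = materialize($item);
    }
    return $array;
}

// Run every named implementation $iterations times, return the average
// seconds per run, and throw if any result differs (strictly) from the first.
function benchmark_simple(array $implementations, $iterations)
{
    $iterations = max(1, (int) $iterations);
    $timings = [];
    $hasExpected = false;
    $expected = null;
    foreach ($implementations as $name => $implementation) {
        $start = microtime(true);
        for ($i = 0; $i < $iterations; $i++) {
            $result = materialize($implementation());
        }
        $timings[$name] = (microtime(true) - $start) / $iterations;
        if (!$hasExpected) {
            $expected = $result;
            $hasExpected = true;
        } elseif ($result !== $expected) {
            throw new RuntimeException("Implementation '$name' returned a different result");
        }
    }
    return $timings;
}
```

Averaging over many runs smooths out timer noise, which matters when a single run takes only tens of microseconds.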
Tests are performed on PHP 5.5.14, Windows 7 SP1 64-bit.
Tests
Let's start with pure overhead:
benchmark_linq_groups("Iterating over $ITER_MAX ints", 100, null,
[
"for" => function () use ($ITER_MAX) {
$j = null;
for ($i = 0; $i < $ITER_MAX; $i++)
$j = $i;
return $j;
},
"array functions" => function () use ($ITER_MAX) {
$j = null;
foreach (range(0, $ITER_MAX - 1) as $i)
$j = $i;
return $j;
},
],
[
function () use ($ITER_MAX) {
$j = null;
foreach (E::range(0, $ITER_MAX) as $i)
$j = $i;
return $j;
},
],
[
function () use ($ITER_MAX) {
$j = null;
foreach (G::range(0, $ITER_MAX - 1) as $i)
$j = $i;
return $j;
},
],
[
function () use ($ITER_MAX) {
$j = null;
foreach (P::from(range(0, $ITER_MAX - 1)) as $i)
$j = $i;
return $j;
},
]);
The generator function range is not available in Pinq, so I use PHP's built-in range function instead, as suggested in Pinq's documentation.
Here are the results:
Iterating over 1000 ints
------------------------
PHP [for] 0.00006 sec x1.0 (100%)
PHP [array functions] 0.00011 sec x1.8 (+83%)
YaLinqo 0.00041 sec x6.8 (+583%)
Ginq 0.00075 sec x12.5 (+1150%)
Pinq 0.00169 sec x28.2 (+2717%)
Iterators waste lots of time. Pinq is the most surprising here: almost 30 times slower than a for loop. However, this is far from the most surprising result, as you will see.
Let's generate an array instead of just iterating:
benchmark_linq_groups("Generating array of $ITER_MAX integers", 100, 'consume',
[
"for" =>
function () use ($ITER_MAX) {
$a = [ ];
for ($i = 0; $i < $ITER_MAX; $i++)
$a[] = $i;
return $a;
},
"array functions" =>
function () use ($ITER_MAX) {
return range(0, $ITER_MAX - 1);
},
],
[
function () use ($ITER_MAX) {
return E::range(0, $ITER_MAX)->toArray();
},
],
[
function () use ($ITER_MAX) {
return G::range(0, $ITER_MAX - 1)->toArray();
},
],
[
function () use ($ITER_MAX) {
return P::from(range(0, $ITER_MAX - 1))->asArray();
},
]);
And results:
Generating array of 1000 integers
---------------------------------
PHP [for] 0.00025 sec x1.3 (+32%)
PHP [array functions] 0.00019 sec x1.0 (100%)
YaLinqo 0.00060 sec x3.2 (+216%)
Ginq 0.00107 sec x5.6 (+463%)
Pinq 0.00183 sec x9.6 (+863%)
YaLinqo is now only about 2.4 times slower than the solution with for. The other libraries performed worse, but still acceptably.
Let's count items in the test data: the number of orders having more than 5 order items, and the number of orders having more than 2 order items with a quantity greater than 5.
benchmark_linq_groups("Counting values in arrays", 100, null,
[
"for" => function () use ($DATA) {
$numberOrders = 0;
foreach ($DATA->orders as $order) {
if (count($order['items']) > 5)
$numberOrders++;
}
return $numberOrders;
},
"array functions" => function () use ($DATA) {
return count(
array_filter(
$DATA->orders,
function ($order) { return count($order['items']) > 5; }
)
);
},
],
[
function () use ($DATA) {
return E::from($DATA->orders)
->count(function ($order) { return count($order['items']) > 5; });
},
"string lambda" => function () use ($DATA) {
return E::from($DATA->orders)
->count('$o ==> count($o["items"]) > 5');
},
],
[
function () use ($DATA) {
return G::from($DATA->orders)
->count(function ($order) { return count($order['items']) > 5; });
},
],
[
function () use ($DATA) {
return P::from($DATA->orders)
->where(function ($order) { return count($order['items']) > 5; })
->count();
},
]);
benchmark_linq_groups("Counting values in arrays deep", 100, null,
[
"for" => function () use ($DATA) {
$numberOrders = 0;
foreach ($DATA->orders as $order) {
$numberItems = 0;
foreach ($order['items'] as $item) {
if ($item['quantity'] > 5)
$numberItems++;
}
if ($numberItems > 2)
$numberOrders++;
}
return $numberOrders;
},
"array functions" => function () use ($DATA) {
return count(
array_filter(
$DATA->orders,
function ($order) {
return count(
array_filter(
$order['items'],
function ($item) { return $item['quantity'] > 5; }
)
) > 2;
})
);
},
],
[
function () use ($DATA) {
return E::from($DATA->orders)
->count(function ($order) {
return E::from($order['items'])
->count(function ($item) { return $item['quantity'] > 5; }) > 2;
});
},
],
[
function () use ($DATA) {
return G::from($DATA->orders)
->count(function ($order) {
return G::from($order['items'])
->count(function ($item) { return $item['quantity'] > 5; }) > 2;
});
},
],
[
function () use ($DATA) {
return P::from($DATA->orders)
->where(function ($order) {
return P::from($order['items'])
->where(function ($item) { return $item['quantity'] > 5; })
->count() > 2;
})
->count();
},
]);
Points to note: first, functional style with standard array functions turns the code into funny, barely readable stairs. Second, "string lambdas" don't help here, because escaping code inside escaped code is incomprehensible. Third, Pinq does not provide an overload of the count function that accepts a predicate, so a method chain is necessary. Results:
Counting values in arrays
-------------------------
PHP [for] 0.00023 sec x1.0 (100%)
PHP [array functions] 0.00052 sec x2.3 (+126%)
YaLinqo 0.00056 sec x2.4 (+143%)
YaLinqo [string lambda] 0.00059 sec x2.6 (+157%)
Ginq 0.00129 sec x5.6 (+461%)
Pinq 0.00382 sec x16.6 (+1561%)
Counting values in arrays deep
------------------------------
PHP [for] 0.00064 sec x1.0 (100%)
PHP [array functions] 0.00323 sec x5.0 (+405%)
YaLinqo 0.00798 sec x12.5 (+1147%)
Ginq 0.01416 sec x22.1 (+2113%)
Pinq 0.04928 sec x77.0 (+7600%)
The results are more or less predictable, except for Pinq's scary one. I've looked at the code: it seems to generate a complete collection and then call the built-in count on it...
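The difference between counting on the fly and materializing the filtered collection first can be sketched as follows. Both helper names are hypothetical; count_materialized only imitates the behavior suspected above and is not Pinq's actual code.

```php
<?php
// Streaming count: walks the source once and keeps only an integer counter.
function count_where($source, callable $predicate)
{
    $count = 0;
    foreach ($source as $value) {
        if ($predicate($value)) {
            $count++;
        }
    }
    return $count;
}

// Materializing count: first copies every matching element into a new
// array, then calls the built-in count() on it. Same answer, more work.
function count_materialized($source, callable $predicate)
{
    $matches = [];
    foreach ($source as $value) {
        if ($predicate($value)) {
            $matches[] = $value;
        }
    }
    return count($matches);
}
```

In the deep test, the extra allocations are repeated for every order's item list, which is one plausible reason the gap widens there.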
Let's filter arrays. The conditions are the same as before, but instead of counting, we generate the collections.
benchmark_linq_groups("Filtering values in arrays", 100, 'consume',
[
"for" => function () use ($DATA) {
$filteredOrders = [ ];
foreach ($DATA->orders as $order) {
if (count($order['items']) > 5)
$filteredOrders[] = $order;
}
return $filteredOrders;
},
"array functions" => function () use ($DATA) {
return array_filter(
$DATA->orders,
function ($order) { return count($order['items']) > 5; }
);
},
],
[
function () use ($DATA) {
return E::from($DATA->orders)
->where(function ($order) { return count($order['items']) > 5; });
},
"string lambda" => function () use ($DATA) {
return E::from($DATA->orders)
->where('$order ==> count($order["items"]) > 5');
},
],
[
function () use ($DATA) {
return G::from($DATA->orders)
->where(function ($order) { return count($order['items']) > 5; });
},
],
[
function () use ($DATA) {
return P::from($DATA->orders)
->where(function ($order) { return count($order['items']) > 5; });
},
]);
benchmark_linq_groups("Filtering values in arrays deep", 100,
function ($e) { consume($e, [ 'items' => null ]); },
[
"for" => function () use ($DATA) {
$filteredOrders = [ ];
foreach ($DATA->orders as $order) {
$filteredItems = [ ];
foreach ($order['items'] as $item) {
if ($item['quantity'] > 5)
$filteredItems[] = $item;
}
if (count($filteredItems) > 0) {
$filteredOrders[] = [
'id' => $order['id'],
'items' => $filteredItems,
];
}
}
return $filteredOrders;
},
"array functions" => function () use ($DATA) {
return array_filter(
array_map(
function ($order) {
return [
'id' => $order['id'],
'items' => array_filter(
$order['items'],
function ($item) { return $item['quantity'] > 5; }
)
];
},
$DATA->orders
),
function ($order) {
return count($order['items']) > 0;
}
);
},
],
[
function () use ($DATA) {
return E::from($DATA->orders)
->select(function ($order) {
return [
'id' => $order['id'],
'items' => E::from($order['items'])
->where(function ($item) { return $item['quantity'] > 5; })
->toArray()
];
})
->where(function ($order) {
return count($order['items']) > 0;
});
},
"string lambda" => function () use ($DATA) {
return E::from($DATA->orders)
->select(function ($order) {
return [
'id' => $order['id'],
'items' => E::from($order['items'])->where('$v["quantity"] > 5')->toArray()
];
})
->where('count($v["items"]) > 0');
},
],
[
function () use ($DATA) {
return G::from($DATA->orders)
->select(function ($order) {
return [
'id' => $order['id'],
'items' => G::from($order['items'])
->where(function ($item) { return $item['quantity'] > 5; })
->toArray()
];
})
->where(function ($order) {
return count($order['items']) > 0;
});
},
],
[
function () use ($DATA) {
return P::from($DATA->orders)
->select(function ($order) {
return [
'id' => $order['id'],
'items' => P::from($order['items'])
->where(function ($item) { return $item['quantity'] > 5; })
->asArray()
];
})
->where(function ($order) {
return count($order['items']) > 0;
});
},
]);
Code using standard array functions becomes very hard to comprehend, mostly due to the inconsistent argument order of array_map (callback first) and array_filter (array first).
Code using LINQ is intentionally not optimal: objects are generated even if they will be discarded later. It is a tradition in LINQ to rely on "anonymous objects" to pass data between transformations.
Compared to previous results, these results are unusually even:
Filtering values in arrays
--------------------------
PHP [for] 0.00049 sec x1.0 (100%)
PHP [array functions] 0.00072 sec x1.5 (+47%)
YaLinqo 0.00094 sec x1.9 (+92%)
YaLinqo [string lambda] 0.00094 sec x1.9 (+92%)
Ginq 0.00295 sec x6.0 (+502%)
Pinq 0.00328 sec x6.7 (+569%)
Filtering values in arrays deep
-------------------------------
PHP [for] 0.00514 sec x1.0 (100%)
PHP [array functions] 0.00739 sec x1.4 (+44%)
YaLinqo 0.01556 sec x3.0 (+203%)
YaLinqo [string lambda] 0.01750 sec x3.4 (+240%)
Ginq 0.03101 sec x6.0 (+503%)
Pinq 0.05435 sec x10.6 (+957%)
Let's get to sorting:
benchmark_linq_groups("Sorting arrays", 100, 'consume',
[
function () use ($DATA) {
$orderedUsers = $DATA->users;
usort(
$orderedUsers,
function ($a, $b) {
$diff = $a['rating'] - $b['rating'];
if ($diff !== 0)
return -$diff;
$diff = strcmp($a['name'], $b['name']);
if ($diff !== 0)
return $diff;
$diff = $a['id'] - $b['id'];
return $diff;
});
return $orderedUsers;
},
],
[
function () use ($DATA) {
return E::from($DATA->users)
->orderByDescending(function ($u) { return $u['rating']; })
->thenBy(function ($u) { return $u['name']; })
->thenBy(function ($u) { return $u['id']; });
},
"string lambda" => function () use ($DATA) {
return E::from($DATA->users)
->orderByDescending('$v["rating"]')->thenBy('$v["name"]')->thenBy('$v["id"]');
},
],
[
function () use ($DATA) {
return G::from($DATA->users)
->orderByDesc(function ($u) { return $u['rating']; })
->thenBy(function ($u) { return $u['name']; })
->thenBy(function ($u) { return $u['id']; });
},
"property path" => function () use ($DATA) {
return G::from($DATA->users)
->orderByDesc('[rating]')->thenBy('[name]')->thenBy('[id]');
},
],
[
function () use ($DATA) {
return P::from($DATA->users)
->orderByDescending(function ($u) { return $u['rating']; })
->thenByAscending(function ($u) { return $u['name']; })
->thenByAscending(function ($u) { return $u['id']; });
},
]);
The code for usort's callback is a little scary, but with some practice, writing comparers becomes pretty easy. The code using LINQ is very clean, especially in the case of Ginq, where "property access" makes the code beautiful.
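One way to see what an orderByDescending(...)->thenBy(...)->thenBy(...) chain amounts to is to build a usort comparator from a list of key selectors. The helper make_comparator is a hypothetical sketch (PHP 7+ for the <=> operator), not how any of the three libraries actually implements sorting.

```php
<?php
// Each rule is [callable $keySelector, bool $descending]. The comparator
// compares by the first rule and falls back to the next one only when the
// keys are equal, like an orderBy()->thenBy() chain.
function make_comparator(array $rules)
{
    return function ($a, $b) use ($rules) {
        foreach ($rules as $rule) {
            list($selector, $descending) = $rule;
            $diff = $selector($a) <=> $selector($b);
            if ($diff !== 0) {
                return $descending ? -$diff : $diff;
            }
        }
        return 0;
    };
}

$users = [
    ['id' => 3, 'name' => 'Bob', 'rating' => 5],
    ['id' => 1, 'name' => 'Alice', 'rating' => 5],
    ['id' => 2, 'name' => 'Carol', 'rating' => 9],
];
usort($users, make_comparator([
    [function ($u) { return $u['rating']; }, true],  // orderByDescending(rating)
    [function ($u) { return $u['name']; }, false],   // thenBy(name)
    [function ($u) { return $u['id']; }, false],     // thenBy(id)
]));
```

Unlike the hand-written comparator, each key selector is called twice per comparison here; a library can avoid that by computing the keys once before sorting.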
Results are unanticipated:
Sorting arrays
--------------
PHP 0.00037 sec x1.0 (100%)
YaLinqo 0.00161 sec x4.4 (+335%)
YaLinqo [string lambda] 0.00163 sec x4.4 (+341%)
Ginq 0.00402 sec x10.9 (+986%)
Ginq [property path] 0.01998 sec x54.0 (+5300%)
Pinq 0.00132 sec x3.6 (+257%)
First, for the first time (spoiler: and the last), Pinq is the fastest of the LINQ libraries.
Second, Ginq's property access is incredibly slow. I would say it is unusable: its convenience is not worth a 50x increase in time.
Now we get to the interesting part: joining two arrays on equal keys.
benchmark_linq_groups("Joining arrays", 100, 'consume',
[
function () use ($DATA) {
$ordersByCustomerId = [ ];
foreach ($DATA->orders as $order)
$ordersByCustomerId[$order['customerId']][] = $order;
$pairs = [ ];
foreach ($DATA->users as $user) {
$userId = $user['id'];
if (isset($ordersByCustomerId[$userId])) {
foreach ($ordersByCustomerId[$userId] as $order) {
$pairs[] = [
'order' => $order,
'user' => $user,
];
}
}
}
return $pairs;
},
],
[
function () use ($DATA) {
return E::from($DATA->orders)
->join($DATA->users,
function ($o) { return $o['customerId']; },
function ($u) { return $u['id']; },
function ($o, $u) {
return [
'order' => $o,
'user' => $u,
];
});
},
"string lambda" => function () use ($DATA) {
return E::from($DATA->orders)
->join($DATA->users,
'$o ==> $o["customerId"]', '$u ==> $u["id"]',
'($o, $u) ==> [
"order" => $o,
"user" => $u,
]');
},
],
[
function () use ($DATA) {
return G::from($DATA->orders)
->join($DATA->users,
function ($o) { return $o['customerId']; },
function ($u) { return $u['id']; },
function ($o, $u) {
return [
'order' => $o,
'user' => $u,
];
});
},
"property path" => function () use ($DATA) {
return G::from($DATA->orders)
->join($DATA->users,
'[customerId]', '[id]',
function ($o, $u) {
return [
'order' => $o,
'user' => $u,
];
});
},
],
[
function () use ($DATA) {
return P::from($DATA->orders)
->join($DATA->users)
->onEquality(
function ($o) { return $o['customerId']; },
function ($u) { return $u['id']; }
)
->to(function ($o, $u) {
return [
'order' => $o,
'user' => $u,
];
});
},
]);
Pinq's code differs from the others: it turns a single method call into a chain. This increases readability, but may look unusual to those accustomed to LINQ method chains in .NET.
And results:
Joining arrays
--------------
PHP 0.00021 sec x1.0 (100%)
YaLinqo 0.00065 sec x3.1 (+210%)
YaLinqo [string lambda] 0.00070 sec x3.3 (+233%)
Ginq 0.00103 sec x4.9 (+390%)
Ginq [property path] 0.00200 sec x9.5 (+852%)
Pinq 1.24155 sec x5,911.8 (+591084%)
Wow. Just wow. No, it is not a joke. I thought the script had hung, but eventually it returned this startling result: Pinq is 5,912 times slower than raw PHP. I could not find exactly where this happens in Pinq's code, but it looks like it is basically a for-for-if with no lookups. I totally didn't expect this from a developer who implemented 500 classes.
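The gap between a join that builds a lookup and one that compares every pair can be sketched in plain PHP. The helpers hash_join and nested_loop_join are hypothetical; the nested-loop version only illustrates the kind of algorithm suspected above and is not Pinq's code. With n outer and m inner elements, the first does roughly n + m key computations, while the second does n * m comparisons.

```php
<?php
// Hash join: index the inner sequence by key once, then probe the index for
// each outer element. Keys must be usable as PHP array keys (ints/strings).
function hash_join(array $outer, array $inner, callable $outerKey, callable $innerKey, callable $result)
{
    $lookup = [];
    foreach ($inner as $i) {
        $lookup[$innerKey($i)][] = $i;
    }
    $joined = [];
    foreach ($outer as $o) {
        $key = $outerKey($o);
        if (isset($lookup[$key])) {
            foreach ($lookup[$key] as $i) {
                $joined[] = $result($o, $i);
            }
        }
    }
    return $joined;
}

// Nested-loop join: compare every outer element with every inner element
// (loose comparison), so the work grows as n * m.
function nested_loop_join(array $outer, array $inner, callable $outerKey, callable $innerKey, callable $result)
{
    $joined = [];
    foreach ($outer as $o) {
        foreach ($inner as $i) {
            if ($outerKey($o) == $innerKey($i)) {
                $joined[] = $result($o, $i);
            }
        }
    }
    return $joined;
}
```

Both produce the same pairs in the same order; only the amount of work differs, and that difference grows quadratically with the input size.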
Okay, let's look at a simpler test: aggregating (also known as accumulating or folding).
benchmark_linq_groups("Aggregating arrays", 100, null,
[
"for" => function () use ($DATA) {
$sum = 0;
foreach ($DATA->products as $p)
$sum += $p['quantity'];
$avg = 0;
foreach ($DATA->products as $p)
$avg += $p['quantity'];
$avg /= count($DATA->products);
$min = PHP_INT_MAX;
foreach ($DATA->products as $p)
$min = min($min, $p['quantity']);
$max = -PHP_INT_MAX;
foreach ($DATA->products as $p)
$max = max($max, $p['quantity']);
return "$sum-$avg-$min-$max";
},
"array functions" => function () use ($DATA) {
$sum = array_sum(array_map(function ($p) { return $p['quantity']; }, $DATA->products));
$avg = array_sum(array_map(function ($p) { return $p['quantity']; }, $DATA->products)) / count($DATA->products);
$min = min(array_map(function ($p) { return $p['quantity']; }, $DATA->products));
$max = max(array_map(function ($p) { return $p['quantity']; }, $DATA->products));
return "$sum-$avg-$min-$max";
},
],
[
function () use ($DATA) {
$sum = E::from($DATA->products)->sum(function ($p) { return $p['quantity']; });
$avg = E::from($DATA->products)->average(function ($p) { return $p['quantity']; });
$min = E::from($DATA->products)->min(function ($p) { return $p['quantity']; });
$max = E::from($DATA->products)->max(function ($p) { return $p['quantity']; });
return "$sum-$avg-$min-$max";
},
"string lambda" => function () use ($DATA) {
$sum = E::from($DATA->products)->sum('$v["quantity"]');
$avg = E::from($DATA->products)->average('$v["quantity"]');
$min = E::from($DATA->products)->min('$v["quantity"]');
$max = E::from($DATA->products)->max('$v["quantity"]');
return "$sum-$avg-$min-$max";
},
],
[
function () use ($DATA) {
$sum = G::from($DATA->products)->sum(function ($p) { return $p['quantity']; });
$avg = G::from($DATA->products)->average(function ($p) { return $p['quantity']; });
$min = G::from($DATA->products)->min(function ($p) { return $p['quantity']; });
$max = G::from($DATA->products)->max(function ($p) { return $p['quantity']; });
return "$sum-$avg-$min-$max";
},
"property path" => function () use ($DATA) {
$sum = G::from($DATA->products)->sum('[quantity]');
$avg = G::from($DATA->products)->average('[quantity]');
$min = G::from($DATA->products)->min('[quantity]');
$max = G::from($DATA->products)->max('[quantity]');
return "$sum-$avg-$min-$max";
},
],
[
function () use ($DATA) {
$sum = P::from($DATA->products)->sum(function ($p) { return $p['quantity']; });
$avg = P::from($DATA->products)->average(function ($p) { return $p['quantity']; });
$min = P::from($DATA->products)->minimum(function ($p) { return $p['quantity']; });
$max = P::from($DATA->products)->maximum(function ($p) { return $p['quantity']; });
return "$sum-$avg-$min-$max";
},
]);
benchmark_linq_groups("Aggregating arrays custom", 100, null,
[
function () use ($DATA) {
$mult = 1;
foreach ($DATA->products as $p)
$mult *= $p['quantity'];
return $mult;
},
],
[
function () use ($DATA) {
return E::from($DATA->products)->aggregate(function ($a, $p) { return $a * $p['quantity']; }, 1);
},
"string lambda" => function () use ($DATA) {
return E::from($DATA->products)->aggregate('$a * $v["quantity"]', 1);
},
],
[
function () use ($DATA) {
return G::from($DATA->products)->aggregate(1, function ($a, $p) { return $a * $p['quantity']; });
},
],
[
function () use ($DATA) {
return P::from($DATA->products)
->select(function ($p) { return $p['quantity']; })
->aggregate(function ($a, $q) { return $a * $q; });
},
]);
There is not much to explain in the first group of functions.
In the second group, I am multiplying all product quantities together (yes, this does not make much sense, but who cares). Pinq has no overload that accepts a seed; it always starts from the first element (and silently returns null if there are no elements...), so I had to use a method chain again.
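The difference between a fold with an explicit seed and one that starts from the first element is sketched below. Both helpers are hypothetical. Without a seed, the first element itself becomes the accumulator, so the elements must already have the accumulator's type; that is why the Pinq version above first projects products to quantities with select. For an empty input, this sketch throws instead of returning null, which mirrors the behavior of .NET's Aggregate.

```php
<?php
// Seeded fold: the seed is the initial accumulator, so any element type works
// and an empty source simply returns the seed.
function fold_with_seed($source, callable $func, $seed)
{
    $accumulator = $seed;
    foreach ($source as $value) {
        $accumulator = $func($accumulator, $value);
    }
    return $accumulator;
}

// Unseeded fold: the first element becomes the accumulator; an empty source
// is reported as an error instead of silently producing null.
function fold_without_seed($source, callable $func)
{
    $hasValue = false;
    $accumulator = null;
    foreach ($source as $value) {
        if (!$hasValue) {
            $accumulator = $value;
            $hasValue = true;
        } else {
            $accumulator = $func($accumulator, $value);
        }
    }
    if (!$hasValue) {
        throw new UnderflowException('Sequence contains no elements');
    }
    return $accumulator;
}
```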
Results:
Aggregating arrays
------------------
PHP [for] 0.00059 sec x1.0 (100%)
PHP [array functions] 0.00193 sec x3.3 (+227%)
YaLinqo 0.00475 sec x8.1 (+705%)
YaLinqo [string lambda] 0.00515 sec x8.7 (+773%)
Ginq 0.00669 sec x11.3 (+1034%)
Ginq [property path] 0.03955 sec x67.0 (+6603%)
Pinq 0.03226 sec x54.7 (+5368%)
Aggregating arrays custom
-------------------------
PHP 0.00007 sec x1.0 (100%)
YaLinqo 0.00046 sec x6.6 (+557%)
YaLinqo [string lambda] 0.00057 sec x8.1 (+714%)
Ginq 0.00046 sec x6.6 (+557%)
Pinq 0.00610 sec x87.1 (+8615%)
All LINQ libraries performed badly. Ginq in property-access mode and Pinq performed remarkably badly. Even built-in functions turned out to be far from performant. The for loop rules.
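Since plain loops win here, note that the four separate loops in the for version above could also be merged into a single pass. This is only a sketch of that idea (stats_single_pass is a hypothetical name); it was not part of the benchmark, so no timing is claimed for it.

```php
<?php
// Compute sum, average, minimum and maximum of $key over $rows in one pass.
// Returns null for an empty input instead of dividing by zero.
function stats_single_pass(array $rows, $key)
{
    if (count($rows) === 0) {
        return null;
    }
    $sum = 0;
    $min = null;
    $max = null;
    foreach ($rows as $row) {
        $value = $row[$key];
        $sum += $value;
        if ($min === null || $value < $min) {
            $min = $value;
        }
        if ($max === null || $value > $max) {
            $max = $value;
        }
    }
    return ['sum' => $sum, 'avg' => $sum / count($rows), 'min' => $min, 'max' => $max];
}
```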
And finally, the last test with a complex query from YaLinqo's ReadMe, which uses several functions and subqueries:
benchmark_linq_groups("Process data from ReadMe example", 5,
function ($e) { consume($e, [ 'products' => null ]); },
[
function () use ($DATA) {
$productsSorted = [ ];
foreach ($DATA->products as $product) {
if ($product['quantity'] > 0) {
if (empty($productsSorted[$product['catId']]))
$productsSorted[$product['catId']] = [ ];
$productsSorted[$product['catId']][] = $product;
}
}
foreach ($productsSorted as $catId => $products) {
usort($productsSorted[$catId], function ($a, $b) {
$diff = $a['quantity'] - $b['quantity'];
if ($diff != 0)
return -$diff;
$diff = strcmp($a['name'], $b['name']);
return $diff;
});
}
$result = [ ];
$categoriesSorted = $DATA->categories;
usort($categoriesSorted, function ($a, $b) {
return strcmp($a['name'], $b['name']);
});
foreach ($categoriesSorted as $category) {
$categoryId = $category['id'];
$result[$category['id']] = [
'name' => $category['name'],
'products' => isset($productsSorted[$categoryId]) ? $productsSorted[$categoryId] : [ ],
];
}
return $result;
},
],
[
function () use ($DATA) {
return E::from($DATA->categories)
->orderBy(function ($cat) { return $cat['name']; })
->groupJoin(
from($DATA->products)
->where(function ($prod) { return $prod['quantity'] > 0; })
->orderByDescending(function ($prod) { return $prod['quantity']; })
->thenBy(function ($prod) { return $prod['name']; }),
function ($cat) { return $cat['id']; },
function ($prod) { return $prod['catId']; },
function ($cat, $prods) {
return array(
'name' => $cat['name'],
'products' => $prods
);
}
);
},
"string lambda" => function () use ($DATA) {
return E::from($DATA->categories)
->orderBy('$cat ==> $cat["name"]')
->groupJoin(
from($DATA->products)
->where('$prod ==> $prod["quantity"] > 0')
->orderByDescending('$prod ==> $prod["quantity"]')
->thenBy('$prod ==> $prod["name"]'),
'$cat ==> $cat["id"]', '$prod ==> $prod["catId"]',
'($cat, $prods) ==> [
"name" => $cat["name"],
"products" => $prods
]');
},
],
[
function () use ($DATA) {
return G::from($DATA->categories)
->orderBy(function ($cat) { return $cat['name']; })
->groupJoin(
G::from($DATA->products)
->where(function ($prod) { return $prod['quantity'] > 0; })
->orderByDesc(function ($prod) { return $prod['quantity']; })
->thenBy(function ($prod) { return $prod['name']; }),
function ($cat) { return $cat['id']; },
function ($prod) { return $prod['catId']; },
function ($cat, $prods) {
return array(
'name' => $cat['name'],
'products' => $prods
);
}
);
},
],
[
function () use ($DATA) {
return P::from($DATA->categories)
->orderByAscending(function ($cat) { return $cat['name']; })
->groupJoin(
P::from($DATA->products)
->where(function ($prod) { return $prod['quantity'] > 0; })
->orderByDescending(function ($prod) { return $prod['quantity']; })
->thenByAscending(function ($prod) { return $prod['name']; })
)
->onEquality(
function ($cat) { return $cat['id']; },
function ($prod) { return $prod['catId']; }
)
->to(function ($cat, $prods) {
return array(
'name' => $cat['name'],
'products' => $prods
);
});
},
]);
Results:
Process data from ReadMe example
--------------------------------
PHP 0.00620 sec x1.0 (100%)
YaLinqo 0.02840 sec x4.6 (+358%)
YaLinqo [string lambda] 0.02920 sec x4.7 (+371%)
Ginq 0.07720 sec x12.5 (+1145%)
Pinq 2.71616 sec x438.1 (+43707%)
GroupJoin killed the performance of Pinq. I guess the reason is the same as in the join test.
All results
Iterating over 1000 ints
------------------------
PHP [for] 0.00006 sec x1.0 (100%)
PHP [array functions] 0.00011 sec x1.8 (+83%)
YaLinqo 0.00041 sec x6.8 (+583%)
Ginq 0.00075 sec x12.5 (+1150%)
Pinq 0.00169 sec x28.2 (+2717%)
Generating array of 1000 integers
---------------------------------
PHP [for] 0.00025 sec x1.3 (+32%)
PHP [array functions] 0.00019 sec x1.0 (100%)
YaLinqo 0.00060 sec x3.2 (+216%)
Ginq 0.00107 sec x5.6 (+463%)
Pinq 0.00183 sec x9.6 (+863%)
Generating lookup of 1000 floats, calculate sum
-----------------------------------------------
PHP 0.00124 sec x1.0 (100%)
YaLinqo 0.00381 sec x3.1 (+207%)
YaLinqo [string lambda] 0.00403 sec x3.3 (+225%)
Ginq 0.01390 sec x11.2 (+1021%)
Pinq * Not implemented
Counting values in arrays
-------------------------
PHP [for] 0.00023 sec x1.0 (100%)
PHP [arrays functions] 0.00052 sec x2.3 (+126%)
YaLinqo 0.00056 sec x2.4 (+143%)
YaLinqo [string lambda] 0.00059 sec x2.6 (+157%)
Ginq 0.00129 sec x5.6 (+461%)
Pinq 0.00382 sec x16.6 (+1561%)
Counting values in arrays deep
------------------------------
PHP [for] 0.00064 sec x1.0 (100%)
PHP [arrays functions] 0.00323 sec x5.0 (+405%)
YaLinqo 0.00798 sec x12.5 (+1147%)
Ginq 0.01416 sec x22.1 (+2113%)
Pinq 0.04928 sec x77.0 (+7600%)
Filtering values in arrays
--------------------------
PHP [for] 0.00049 sec x1.0 (100%)
PHP [arrays functions] 0.00072 sec x1.5 (+47%)
YaLinqo 0.00094 sec x1.9 (+92%)
YaLinqo [string lambda] 0.00094 sec x1.9 (+92%)
Ginq 0.00295 sec x6.0 (+502%)
Pinq 0.00328 sec x6.7 (+569%)
Filtering values in arrays deep
-------------------------------
PHP [for] 0.00514 sec x1.0 (100%)
PHP [arrays functions] 0.00739 sec x1.4 (+44%)
YaLinqo 0.01556 sec x3.0 (+203%)
YaLinqo [string lambda] 0.01750 sec x3.4 (+240%)
Ginq 0.03101 sec x6.0 (+503%)
Pinq 0.05435 sec x10.6 (+957%)
Sorting arrays
--------------
PHP 0.00037 sec x1.0 (100%)
YaLinqo 0.00161 sec x4.4 (+335%)
YaLinqo [string lambda] 0.00163 sec x4.4 (+341%)
Ginq 0.00402 sec x10.9 (+986%)
Ginq [property path] 0.01998 sec x54.0 (+5300%)
Pinq 0.00132 sec x3.6 (+257%)
Joining arrays
--------------
PHP 0.00016 sec x1.0 (100%)
YaLinqo 0.00065 sec x4.1 (+306%)
YaLinqo [string lambda] 0.00070 sec x4.4 (+337%)
Ginq 0.00105 sec x6.6 (+556%)
Ginq [property path] 0.00194 sec x12.1 (+1112%)
Pinq 1.21249 sec x7,577.5 (+757648%)
Aggregating arrays
------------------
PHP [for] 0.00059 sec x1.0 (100%)
PHP [array functions] 0.00193 sec x3.3 (+227%)
YaLinqo 0.00475 sec x8.1 (+705%)
YaLinqo [string lambda] 0.00515 sec x8.7 (+773%)
Ginq 0.00669 sec x11.3 (+1034%)
Ginq [property path] 0.03955 sec x67.0 (+6603%)
Pinq 0.03226 sec x54.7 (+5368%)
Aggregating arrays custom
-------------------------
PHP 0.00007 sec x1.0 (100%)
YaLinqo 0.00046 sec x6.6 (+557%)
YaLinqo [string lambda] 0.00057 sec x8.1 (+714%)
Ginq 0.00046 sec x6.6 (+557%)
Pinq 0.00610 sec x87.1 (+8615%)
Process data from ReadMe example
--------------------------------
PHP 0.00620 sec x1.0 (100%)
YaLinqo 0.02840 sec x4.6 (+358%)
YaLinqo [string lambda] 0.02920 sec x4.7 (+371%)
Ginq 0.07720 sec x12.5 (+1145%)
Pinq 2.71616 sec x438.1 (+43707%)
Conclusion
If you need to perform queries on relatively small sets of data, for example data returned from web services, you can use either YaLinqo or Ginq.
YaLinqo has better performance, more functions and much better documentation. It is a minimalistic library that relies on modern PHP features. It supports both anonymous functions and string lambdas (in all varieties). Apart from a wrapper around an iterator, it does not introduce its own collection classes and relies on good old PHP arrays, so it is easy to learn.
Ginq uses multiple classes of iterators, collections and comparers. Thanks to this, it more closely resembles LINQ from .NET. However, this comes at a price: unlike in .NET, custom dictionaries implemented in PHP are much slower than native arrays. Public iterator classes, on the other hand, are alien to .NET developers, but PHP developers using SPL are used to seeing them. They come at a price too: iterating using an SPL iterator is much slower than using yield. Overall, Ginq is 1.5 to 3 times slower than YaLinqo.
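To illustrate the two styles contrasted above, here is the same filtering step written once as a hand-written Iterator class (the general approach Ginq takes, though this class is not Ginq's code) and once as a generator. The names WhereIterator and where_generator are hypothetical, and the class uses PHP 8 return types. Both produce the same key/value pairs, but the class routes every step through several method calls (valid, current, key, next), each delegating to the inner iterator, which is a plausible source of the extra overhead.

```php
<?php
// Class-based filter: every foreach step calls valid(), current(), key()
// and next() on this object, and each of them calls into the inner iterator.
class WhereIterator implements Iterator
{
    private $inner;
    private $predicate;

    public function __construct(Iterator $inner, callable $predicate)
    {
        $this->inner = $inner;
        $this->predicate = $predicate;
    }

    // Advance the inner iterator until it points at a matching element.
    private function skipNonMatching(): void
    {
        while ($this->inner->valid() && !call_user_func($this->predicate, $this->inner->current())) {
            $this->inner->next();
        }
    }

    public function rewind(): void { $this->inner->rewind(); $this->skipNonMatching(); }
    public function valid(): bool { return $this->inner->valid(); }
    public function current(): mixed { return $this->inner->current(); }
    public function key(): mixed { return $this->inner->key(); }
    public function next(): void { $this->inner->next(); $this->skipNonMatching(); }
}

// Generator-based filter: the engine keeps the loop state for us.
function where_generator($source, callable $predicate)
{
    foreach ($source as $key => $value) {
        if ($predicate($value)) {
            yield $key => $value;
        }
    }
}

$data = ['a' => 1, 'b' => 2, 'c' => 3, 'd' => 4];
$even = function ($v) { return $v % 2 === 0; };
$fromClass = iterator_to_array(new WhereIterator(new ArrayIterator($data), $even));
$fromGenerator = iterator_to_array(where_generator($data, $even));
```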
Pinq is unbelievably slow. No amount of architecture can justify slowing an application down 6,000 times because of a simple query. The library has a pretty website, the unique feature of supporting databases, and a complex architecture, and it is already at version 3, so I am very sad to come to the conclusion that the library is absolutely unusable. I hope the developer improves the performance and implements at least one full-featured query provider. Once that is done, the library may become the library of choice when LINQ to databases is needed.
Another library to consider is Underscore.php. It is not LINQ, it is not lazy, but it follows the same functional idea and its methods may look familiar if you have used functional languages or various Underscore.* libraries in other languages.
Other libraries
I have written an extensive article in Russian that compares the old "LINQ" libraries: LINQ for PHP, Phinq, PHPLinq and Plinq. However, I cannot recommend using any of them. They are incomplete, untested, undocumented, and above all, they are not LINQ: none of them supports lazy evaluation. Discussing them in detail would be a waste of time now that the newer libraries exist.
The only one of them worth mentioning is PHPLinq. It supports querying databases, in fact many of them. However, keep in mind that the library is almost untested, the order of function calls is fixed (it is more like a DAL for generating SQL), single and first are treated as the same, etc. I would never use code like this in production, but you can decide for yourself.
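In .NET LINQ, First and Single have different contracts: First returns the first matching element, while Single additionally requires that exactly one element matches and fails otherwise. The hypothetical helpers below sketch that distinction in PHP; the exception types are an arbitrary choice for the sketch.

```php
<?php
// Return the first element matching $predicate; fail if there is none.
// Stops scanning as soon as a match is found.
function first_match($source, callable $predicate)
{
    foreach ($source as $value) {
        if ($predicate($value)) {
            return $value;
        }
    }
    throw new UnexpectedValueException('No element satisfies the condition');
}

// Return the only element matching $predicate; fail on zero or several
// matches. Unlike first_match, it has to look past the first match.
function single_match($source, callable $predicate)
{
    $found = false;
    $result = null;
    foreach ($source as $value) {
        if ($predicate($value)) {
            if ($found) {
                throw new UnexpectedValueException('More than one element satisfies the condition');
            }
            $found = true;
            $result = $value;
        }
    }
    if (!$found) {
        throw new UnexpectedValueException('No element satisfies the condition');
    }
    return $result;
}
```

Treating the two as equivalent silently hides data errors that Single is meant to catch, such as duplicate rows for a supposedly unique key.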
Licenses
- YaLinqoPerf — WTFPL* License
- YaLinqo — Simplified BSD License
- Ginq — MIT License
- Pinq — MIT License + BSD 3-clause License (dependencies)
History
- 2015-05-30: first version