Table of Contents
Primary Motivation
My Github repository, Boost Concurrent Permutations and Combinations on CPU, has been up since January 2017. However, download is close to none while my older single threaded next_combination
has plenty of downloads. So I figured it must be the code usage is deemed too complex. Aim of this article is to show a very simple, albeit inefficient, multithreaded example to give users a guide. It should be relatively easy for user to come up with an efficient multithreaded code, once they understand the examples in this article. For each permutation and combination section, I am going to start off with a single threaded example then followed by a multi-threaded example.
Single Threaded Permutation
First, I'll show you the example on using next_permutation
in single threaded scenario. The while
loop displays std_permuted
until next_permutation
returned false
when std_permuted
is detected to be in descending order. That is "54321
".
void usage_of_next_perm()
{
std::string std_permuted = "12345";
do
{
std::cout << std_permuted << std::endl;
} while (std::next_permutation(std_permuted.begin(), std_permuted.end()));
}
Output is as follows:
12345
12354
12435
12453
12534
12543
...
54123
54132
54213
54231
54312
54321
Multi-Threaded Permutation
Next, I'll show you how to find permutation in multithreaded scenario. There are a total of 120 different permutations in collection of 5 elements. We'll use 4 threads in my example, meaning each thread will find 120/4=30 permutations. To tell you the truth, each thread will find 30-1=29 permutations, because the 1st one is already computed by find_perm_by_idx
in the main thread and passed to the thread. To keep the discussion simple, we pretend each thread computes 30 permutations. 1st thread finds 0th to 29th permutations, while 2nd thread starts from 30th to 59th permutations, and 3rd thread begins from 60th to 89 and so on. From 2nd thread and 3rd thread to start from 30th and 60th permutation respectively, we need to make use of find_perm_by_idx
to find 30th and 60th permutation given their index, we'll find the rest with next_permutation
. Technically, it is possible to use find_perm_by_idx
to find all permutations by giving indices from 0 to 119 but it is simply not fleasible to do so, because find_perm_by_idx
is much much slower than next_permutation
. As a word of caution, index_to_find
must be of type that is large enough to store the factorial(n). For instance, you are just finding the 1st 10 permutations, index_to_find
type can store 0 to 10 but if it cannot store the factorial(n), find_perm_by_idx
will not generate the permutation correctly. This applies to find_comb_by_idx
as well. If your largest integer type is not large enough, you can consider using Boost Multiprecision Integer. And vector_type
can be any collection type that supports push_back()
and size()
methods, like std::string
. You can see I use std::string
in my example.
I join my theads before starting the next thread because all_results
is shared with all threads and it is not guarded with a lock, therefore not thread-safe. It is only meant to be a simple example on how to find_perm_by_idx
. You can see I call the slow find_perm_by_idx
in the main thread. It is your job to try to figure out how to call find_perm_by_idx
in the worker threads.
template<typename int_type, typename vector_type>
vector_type find_perm_by_idx(int_type index_to_find,
vector_type& original_vector);
void usage_of_perm_by_idx()
{
uint64_t index_to_find = 0;
std::string original_text = "12345";
std::string std_permuted = "12345";
std::vector<std::string> all_results;
size_t repeat_times = 30;
for (size_t i=0; i<4; ++i)
{
std::string permuted = concurrent_perm::find_perm_by_idx(index_to_find, original_text);
std::thread th([permuted, &all_results, repeat_times] () mutable {
all_results.push_back(permuted);
for (size_t j=1; j<repeat_times; ++j)
{
std::next_permutation(permuted.begin(), permuted.end());
all_results.push_back(permuted);
}
});
th.join();
index_to_find += repeat_times;
}
}
I skip the output here, I have verified they are the same as generated by single threaded next_permutation
.
Single Threaded Combination
Next, we have come to next_combination
section. To compute the total combination count, we use compute_total_comb
. total
must be of type that is large enough to store the factorial(n).
void usage_of_next_comb()
{
std::string original_text = "123456";
std::string std_combined = "123";
uint64_t total = 0;
if(concurrent_comb::compute_total_comb(original_text.size(), std_combined.size(), total))
{
std::cout << std_combined << std::endl;
for (uint64_t i = 1; i < total; ++i)
{
stdcomb::next_combination(original_text.begin(), original_text.end(),
std_combined.begin(), std_combined.end());
std::cout << std_combined << std::endl;
}
}
}
Output is as follows. If your collection is sorted like original_text
, all generated combination will be sorted as well, as you can see below:
123
124
125
126
134
135
136
145
146
156
234
235
236
245
246
256
345
346
356
456
Multi-Threaded Combination
index_to_find
must be of type that is large enough to store the factorial(n) and vector_type
can be any collection type that supports push_back()
and size()
methods, like std::string
. Similar to find_perm_by_idx
mentioned above, it is possible to find all combinations with find_comb_by_idx
but it is much slower compared to next_combination
, so we resort to next_combination
to find the rest of combinations. There are total of 20 combinations, so each of the 4 thread will find 20/4=5 combinations. Actually, each thread will find 5-1=4 combination, because the 1st one is already computed by find_comb_by_idx
in the main thread and passed to the thread.
template<typename int_type, typename vector_type>
vector_type find_comb_by_idx(const uint32_t subset,
int_type index_to_find,
vector_type& original_vector);
void usage_of_comb_by_idx()
{
uint64_t index_to_find = 0;
std::string original_text = "123456";
std::string std_combined = "123";
std::vector<std::string> all_results;
size_t repeat_times = 5;
for (size_t i = 0; i < 4; ++i)
{
std::string combined = concurrent_comb::find_comb_by_idx(std_combined.size(),
index_to_find, original_text);
std::thread th([original_text, combined, &all_results, repeat_times] () mutable {
all_results.push_back(combined);
for (size_t j = 1; j < repeat_times; ++j)
{
stdcomb::next_combination(original_text.begin(), original_text.end(),
combined.begin(), combined.end());
all_results.push_back(combined);
}
});
th.join();
index_to_find += repeat_times;
}
}
Output is skipped here, afterall I have verified they are the same as generated by single threaded next_combination
.