InsertionSort performs better than QuickSort on almost In-Order Array and examines the caveats to be aware of when utilizing InsertionSort.
This short writeup is a tip that demonstrates performance of insertion sort over quick sort on sorting almost in-order array. As such, it is not eligible to participate in the CodeProject monthly article competition.
Table of Contents
Performance Characteristics
The average time complexity of BubbleSort
, SelectionSort
and InsertionSort
is O(N2) and QuickSort
is O(N.Log N). When it comes to sorting almost sorted array, QuickSort
's performance is degraded to O(N2) and InsertionSort
's runtime complexity is promoted to O(N). Below is a list of time complexity where the best to the worst is ranked from left to right. O(N2) is the most undesirable algorithmic runtime. This reason led to C++17's Standard Library to implement std::sort
based on Introsort. Unlike QuickSort
, Introsort
has a consistent runtime performance of O(N.Log N) but it still loses out to InsertionSort
's O(N) on sorting almost in-order container.
O(1) < O(log N) < O(N) < O(N.log N) < O(N^2)
The use case for InsertionSort
is an array in your application is required to be maintained in-order and from time to time, a new element is appended to it. A perfect example is the function to calculate median from a sorted array of elements. As an optimization, this array does not need to be sorted whenever an element is appended. Sorting is only required during median retrieval and new elements have been appended since the last retrieval. Obviously, the situation of too many unsorted element addition turning InsertionSort
runtime from O(N) back to O(N2) must be avoided at all costs. Set a threshold when a certain ratio of appended elements to array size is reached, the InsertionSort
must be triggered.
The BubbleSort
, SelectionSort
and InsertionSort
routines used in this tip are listed as follows:
template<typename T, typename Predicate>
void BubbleSort(T* arr, int size, Predicate compare)
{
for (int i = size - 1; i > 0; --i)
{
for (int j = 0; j < i; ++j)
{
if (compare(arr[j + 1], arr[j]))
{
T temp = arr[j];
arr[j] = arr[j + 1];
arr[j + 1] = temp;
}
}
}
}
template<typename T, typename Predicate>
void SelectionSort(T* arr, int size, Predicate compare)
{
for (int i = 0; i < size; ++i)
{
int minIndex = i;
for (int j = i + 1; j < size; ++j)
{
if (compare(arr[j], arr[minIndex]))
minIndex = j;
}
if (i != minIndex)
{
T temp = arr[i];
arr[i] = arr[minIndex];
arr[minIndex] = temp;
}
}
}
template<typename T, typename Predicate>
void InsertionSort(T* arr, int size, Predicate compare)
{
for (int i = 1; i < size; ++i)
{
T temp = arr[i];
int j = i - 1;
while (j > -1 && compare(temp, arr[j]))
{
arr[j + 1] = arr[j];
arr[j] = temp;
--j;
}
}
}
template<typename T, typename Predicate>
void InsertionSort2(T* arr, int size, Predicate compare)
{
for (int i = 1; i < size; ++i)
{
T temp = arr[i];
int j = i;
while (j > 0 && compare(temp, arr[j - 1]))
{
arr[j] = arr[j - 1];
--j;
}
arr[j] = temp;
}
}
Benchmark Code
The benchmark setup is as follows. The source vector, vec
, contains 100000
elements which are initialized from 1
to 100000
. And 5
elements are removed and appended to the vector end so as to create an almost in-order array.
const size_t VEC_SIZE = 100000;
std::vector<int> vec(VEC_SIZE);
std::iota(vec.begin(), vec.end(), 1);
vec.erase(std::remove(vec.begin(), vec.end(), 50000), vec.end());
vec.erase(std::remove(vec.begin(), vec.end(), 89), vec.end());
vec.erase(std::remove(vec.begin(), vec.end(), 14568), vec.end());
vec.erase(std::remove(vec.begin(), vec.end(), 78962), vec.end());
vec.erase(std::remove(vec.begin(), vec.end(), 60022), vec.end());
vec.push_back(50000);
vec.push_back(89);
vec.push_back(14568);
vec.push_back(78962);
vec.push_back(60022);
std::vector<int> temp(VEC_SIZE);
The benchmark code is as follows. At each iteration, vec
is assigned to the temp
vector and the sorting is done on the temp
vector. Each benchmark sorts temp
in ascending order.
Note: The assert
macro expands to nothing in release build so they do not negatively affect the release mode result. As a note of interest, I tried to add my home-made QuickSort
to the benchmark but it resulted in stack overflow due to its recursive calls. My guess is the C++ Standard Library's quick sort routines are iterative in nature because they do not have the stack overflow problem.
timer cStopWatch;
cStopWatch.start("C qsort Benchmark");
for (size_t k = 0; k < MAX_LOOP; ++k)
{
temp = vec;
assert(std::is_sorted(temp.cbegin(), temp.cend()) == false);
qsort((void*)temp.data(), temp.size(), sizeof(temp[0]), comparator);
assert(std::is_sorted(temp.cbegin(), temp.cend()));
}
cStopWatch.stop();
timer cppStopWatch;
cppStopWatch.start("C++ sort Benchmark");
for (size_t k = 0; k < MAX_LOOP; ++k)
{
temp = vec;
assert(std::is_sorted(temp.cbegin(), temp.cend()) == false);
std::sort(temp.begin(), temp.end());
assert(std::is_sorted(temp.cbegin(), temp.cend()));
}
cppStopWatch.stop();
timer cppParStopWatch;
cppParStopWatch.start("C++ par sort Benchmark");
for (size_t k = 0; k < MAX_LOOP; ++k)
{
temp = vec;
assert(std::is_sorted(temp.cbegin(), temp.cend()) == false);
std::sort(std::execution::par, temp.begin(), temp.end());
assert(std::is_sorted(temp.cbegin(), temp.cend()));
}
cppParStopWatch.stop();
timer insertionSortStopWatch;
insertionSortStopWatch.start("Insertion Sort Benchmark");
for (size_t k = 0; k < MAX_LOOP; ++k)
{
temp = vec;
assert(std::is_sorted(temp.cbegin(), temp.cend()) == false);
InsertionSort(temp.data(), (int)temp.size(), [](int a, int b) {
return a < b;
});
assert(std::is_sorted(temp.cbegin(), temp.cend()));
}
insertionSortStopWatch.stop();
timer bubbleSortStopWatch;
bubbleSortStopWatch.start("Bubble Sort Benchmark");
for (size_t k = 0; k < MAX_LOOP; ++k)
{
temp = vec;
assert(std::is_sorted(temp.cbegin(), temp.cend()) == false);
BubbleSort(temp.data(), (int)temp.size(), [](int a, int b) {
return a < b;
});
assert(std::is_sorted(temp.cbegin(), temp.cend()));
}
bubbleSortStopWatch.stop();
timer selectionSortStopWatch;
selectionSortStopWatch.start("Selection Sort Benchmark");
for (size_t k = 0; k < MAX_LOOP; ++k)
{
temp = vec;
assert(std::is_sorted(temp.cbegin(), temp.cend()) == false);
SelectionSort(temp.data(), (int)temp.size(), [](int a, int b) {
return a < b;
});
assert(std::is_sorted(temp.cbegin(), temp.cend()));
}
selectionSortStopWatch.stop();
Benchmark Results
This is the benchmark result on 1000
iterations of sorting of arrays of 100000
elements. As you can see, InsertionSort
performs even better than parallel std::sort
shown in Visual C++ results. VC++'s qsort
and std::sort
could be implemented using QuickSort
and IntroSort
. Remember, on almost sorted array, QuickSort
has the worst performance of O(N2) while IntroSort
is O(N.Log N) and InsertionSort
has the best performance of O(N). The C++ Standard Library on Ubuntu 2022 has not yet implemented the parallelized version of std::sort
as the serial and parallel results are largely in the same ballpark. The BubbleSort
and SelectionSort
benchmark are commented out because they took too long to complete. The code is benchmarked on Intel i7 8700 with 6 physical cores and 12 logical cores and 16GB memory.
VC++ 2022
C qsort Benchmark: 7881ms
C++ sort Benchmark: 1105ms
C++ par sort Benchmark: 418ms
Insertion Sort Benchmark: 279ms
Insertion Sort Optimized Benchmark: 206ms
gcc version 11.4.0
C qsort Benchmark: 2394ms
C++ sort Benchmark: 2782ms
C++ par sort Benchmark: 2884ms
Insertion Sort Benchmark: 451ms
Insertion Sort Optimized Benchmark: 354ms
clang version 14.0.0
C qsort Benchmark: 2573ms
C++ sort Benchmark: 2774ms
C++ par sort Benchmark: 2513ms
Insertion Sort Benchmark: 383ms
Insertion Sort Optimized Benchmark: 351ms
To give the reader the perspective of magnitude of time complexity difference of BubbleSort
and SelectionSort
's O(N2) versus InsertionSort
's O(N). I ran the 10 iterations instead of 1000 iterations. Below are the results:
VC++ 2022
C qsort Benchmark: 83ms
C++ sort Benchmark: 12ms
C++ par sort Benchmark: 4ms
Insertion Sort Benchmark: 2ms
Bubble Sort Benchmark:26641ms
Selection Sort Benchmark:39899ms
gcc version 11.4.0
C qsort Benchmark: 74ms
C++ sort Benchmark: 31ms
C++ par sort Benchmark: 28ms
Insertion Sort Benchmark: 3ms
Bubble Sort Benchmark:46323ms
Selection Sort Benchmark:26222ms
clang version 14.0.0
C qsort Benchmark: 75ms
C++ sort Benchmark: 27ms
C++ par sort Benchmark: 28ms
Insertion Sort Benchmark: 3ms
Bubble Sort Benchmark:17038ms
Selection Sort Benchmark:39072ms
The GCC and clang
command to build BenchmarkInsertionSort.cpp is as follows:
g++ BenchmarkInsertionSort.cpp -O3 -std=c++17
clang++ BenchmarkInsertionSort.cpp -O3 -std=c++17
References
History
- 31th March, 2024: Added the Optimized Insertion Sort by Rehorav to the benchmark results and the source code downloads.
- 18th February, 2024: First release