1. Introduction
1.1 A New Sorting Algorithm
One of the most common and fundamental problems in computer science is ordering a list of items which is well known as ‘Sorting’. Considering information processing performed by millions of computers along the whole world, ‘Sorting’ takes a very important role.
As it gained its importance in real world problems, it also gets the attention of computer scientists. As a matter of fact, various types of sorting algorithms have been developed. Some well known sorting algorithms are Bubble sort, Heap sort, Insertion sort, Merge sort, Quick sort, Selection sort, Shell sort, etc. [1-4]. An implementer uses a certain algorithm depending on the characteristics of distribution of the data elements or on some other context.
My algorithm, however, can provide optimized performance in a context where it is known that the data elements are distributed in an order that, there exists a considerable low amount of sorted subsets. Computer applications, that contain data with the characteristic like this and need to be sorted, can be highly benefited using ‘Squeeze Sort’ algorithm.
1.2 The Concept
The concept beneath the algorithm is, say, given a global data array as d [0: n-1] with n number of unsorted data and another global data array as s [0: n-1], which initially doesn't contain any data elements. On each call of the algorithm with one argument ‘t’ (where t<=n), it basically does two things.
First, it forms a sorted sequence p [0…k-1] (where k<=t<=n) from d [0: t-1]. Thus, the unsorted sequence d [0: t-1] is compressed as a new unsorted sequence as d [0: t-k-1]. As the name ‘Squeeze Sort’ implies, the amount of data elements keep getting Squeezed.
Second, the sorted sequence p [0: k-1] is then merged to an existing sorted sequence s [0: j-1] (where j<=n), and thus a new sorted sequence will be formed as s [0: j+k-1].
The algorithm will be recursively called until the unsorted sequence becomes (conceptually) empty. Finally there is a sorted sequence s [0: n-1], which is actually the desired sorted data.
2. Understanding the Detail
We have two global arrays named data [0: n-1] and sorted [0: n-1], where n indicates the number of elements to be sorted. We have another 2 global data named sorted_so_far
and unsorted_so_far
which indicate the number of elements sorted so far and the number of elements unsorted so far respectively.
Initially, data [0: n-1] contain n unsorted elements, whereas sorted [0: n-1] is conceptually empty. My algorithm will finally keep the desired sorted data into sorted [0: n-1].
The algorithm Squeeze_Sort
has an argument named ‘size
’, which indicates the number of elements to be sorted.
Here is the algorithm:
n, sorted_so_far=0,unsorted_so_far=n;
data[n];sorted[n];
merge (start, end) {
h=0;j=start;i=unsorted_so_far;
mid=sorted_so_far-1;
while(h<=mid) {
if(sorted[h]>=sorted[j])
data[i++]=sorted[h++];
else {
data[i++]=sorted[j++];
if(j==n)j=mid+1;
if(j==end) break;
} }
if(h>mid) for(k=j;;) {
data[i++]=sorted[k++]; if(k==n)k=mid+1;
if(k==end)break;
} else for( k=h; k<=mid;)
data[i++]=sorted[k++];
sorted_so_far=0;
for(k=unsorted_so_far<n; k++)
sorted[sorted_so_far++]=data[k];
}
Squeeze_Sort(size)
{
start=n; end=sorted_so_far;
max=data[0];min=data[0];
unsorted_so_far=0;
for(i=0;i<size; i++)
if(data[i] >= max )
sorted[--start]=max=data[i];
else if (data[i] < min)
sorted[end++]=min=data[i];
else
data[unsorted_so_far++]=data[i];
merge(start, end);
if (unsorted_so_far>0)
Squeeze_Sort(unsorted_so_far);
}
2.1 Compressing the Unsorted Data
The first portion of the algorithm checks consecutive maximum and minimum with respect to the first element of data [0: size-1] (i, e, data [0]) and stores the result in a sorted sequential manner. This results in compression of data as data [0:unsorted_so_far-1] with unsorted data and two array items sorted [start: n-1] and sorted [sorted_so_far: end-1] together, consecutively form a sorted sequence.
2.2 Extending the Number of Sorted Elements
The second portion of the algorithm merges the newly found sorted sequence (sorted [start: n-1] and sorted [sorted_so_far: end-1] together consecutively) with the existing sorted array item sorted [0: sorted_so_far], thus increasing the number of sorted elements.
2.3 Recursion Until All the Elements are Sorted
The algorithm Squeeze_Sort
will be recursively called until the number of unsorted elements becomes not greater than zero.
3. Simulation
The data blocks below show the simulation process of the sorting, as described in this section.
Initial Data:
data [0:unsorted_so_far]: 2, 40, 9, 100, 7, 22, 1, 8, 11, 102, 54, 53, 5, 71, 79, 66, 3
sorted: <empty>
First Execution of Squeeze Sort:
sorted [start: end]: 1, [2], 40, 100, 102
sorted [0:sorted_so_far]: 1, 2, 40, 100, 102
data [0:unsorted_so_far]: 9, 7, 22, 8, 11, 54, 53, 5, 71, 79, 66, 3
Second Execution of Squeeze Sort:
sorted [start: end]: 3, 5, 7, [9], 22, 54, 71, 79
sorted [0:sorted_so_far]: 1, 2, 3, 5, 7, 9, 22, 40, 54, 71, 79, 100, 102
data [0:unsorted_so_far]: 8,11,53,66
Finally:
sorted [start: end]: [8], 11, 53, 66
sorted [0:sorted_so_far]: 1, 2, 3, 5, 7, 9, 11, 22, 40, 53, 54, 66, 71, 79, 100, 102
data [0:unsorted_so_far] : <empty>
To simulate the execution process of Squeeze Sort simply, let’s go through the example.
We initialize the data [0: n-1] array with some random data.
data [0:16]: {2, 40, 9, 100, 7, 22, 1, 8, 11, 102, 54, 53, 5, 71, 79, 66, 3}
sorted [0:16] :{< empty>}.
When the execution of Squeeze Sort begins, based on the first element in data [0:16], a sorted sequence is generated and stored with respect to the sorted sequence as sorted [start: n-1] and sorted [sorted_so_far: end-1] consecutively. The term start indicates the starting index, whereas the term end indicates the ending tag, of newly found sorted array. The remaining data elements that can’t be involved are stored in data [0:unsorted_so_far] (i.e., the unsorted data getting compressed!). As the name implies, the term sorted_so_far
indicates the number of elements that have been sorted so far and stored in sorted[0: sorted_so_far-1], whereas the term unsorted_so_far
indicates the number of elements that have not been sorted so far and stored in data[0: unsorted_so_far-1].
During the first execution, we get a new sorted sequence as sorted [start: end] = {1, [2], 40, 100, 102} which was generated based on the first element of the previous data [0:unsorted_of_far-1]. This sequence then merged to the existing sorted array sorted [0:sorted_so_far] (which is empty now). As well as, we get the remaining unsorted data in data [0:unsorted_so_far] = {9, 7, 22, 8, 11, 54, 53, 5, 71, 79, 66, 3}.
Since there is some data remaining as unsorted, a recursion of Squeeze Sort will be happen.
During the second execution of Squeeze Sort, we get the further compressed data as data [0:unsorted_so_far] = {8, 11, 53, 66} and a more sorted sequence as sorted [0:sorted_so_far] = {1, 2, 3, 5, 7, 9, 22, 40, 54, 71, 79, 100, 102}.
The third execution of the Squeeze Sort gives us the desired result as sorted [0:16] = {1, 2, 3, 5, 7, 9, 11, 22, 40, 53, 54, 66, 71, 79, 100, 102}.
4. Complexity
4.1 Time Complexity
As usual, the time complexity of the Squeeze Sort depends on the characteristics of the distribution of the data element. However, the actual time complexity of this algorithm is proportional to the number of times its recursion occurred. Without considering the recursion, the general time complexity is O (n). So, it is clear to say that, considering recursion, if the number of recursion is k, then the time complexity can be defined as T (n) = k.O (n).
As the value of k can be decreased, more performance can be achieved.
So, the best case time complexity can be achieved (as shown in the Figure 1), if k=1.
That is, T (n) =1.O (n).
We have to consider the worst case as well.
The data set, containing the characteristics, as shown in Figure 2, will produce a worst case time complexity. For example, if the data elements are distributed as the manner below,
data [0: n-1] = {1, 100, 2, 99, 3, 98, 4, 97 …}, then for each time the algorithm is executed (considering recursion), the data array will be compressed into only 2 elements. Then, the complexity will be,
T (n) = 2 + 4 + …………+ (n-4) + (n-2) + n
=O (n²).
Figure 1: A situation that depicts the best case complexity O (n) of Squeeze Sort
Figure 2: A situation that depicts the worse case complexity O (n²) of Squeeze Sort
Comparing with two well known algorithms, we can get a rough estimate about the time complexity of Squeeze Sort. In the table below, I compared the time complexity of Quick Sort (average case time complexity is O (nlogn)) [1-4] and Bubble Sort (average case time complexity is O (n²)) [2] with Squeeze Sort.
Table: Comparative Results (Time Complexity)
Data Size (000) |
Quick Sort |
Squeeze Sort |
Bubble Sort |
30 |
.054945 |
.21978 |
3.901099 |
35 |
.054945 |
.32967 |
5.10989 |
70 |
.10989 |
.604396 |
9.450549 |
77 |
.10989 |
.659341 |
10.384615 |
105 |
.164835 |
.879121 |
13.846154 |
112 |
.164835 |
.989011 |
14.835165 |
154 |
.21978 |
1.208791 |
20.27475 |
200 |
.274725 |
1.273626 |
28.813188 |
4.2 Space Complexity
Since Squeeze Sort requires an extra array with the same size of the data array, 2n memory space is needed, so the space complexity is S (n) =O (n).
5. Conclusion
Although the sorting problem is old enough, it still retains the attention of the researchers. According to the various characteristics of the distribution of the real world data element and different situations that are faced by the programmers, more effective and creative ideas considering sorting usually provide a broader way.
I hope, researchers in this area will find my algorithm ‘Squeeze Sort’ useful to expand their better and creative ideas.
Reference
[1] Horowitz E. and Sahni S., “Fundamentals of Computer Algorithms”, Galgotia Pubs Ltd., New Delhi, 1995.
[2] Edward M. Reingold, “Data Structures”: Page 384, Little Brown and Company, Boston, 1999.
[3] Robert L. Kruse, “Data Structures and Program Design in C”, Prentice-Hall, Inc., 1991.
[4] Thomas H. Cormen, “Introduction to Algorithms”, MIT Press, 1990.
History
- 18th March, 2006: Initial post