|
I have a MASM routine that generates assembly code to implement a Quicksort. All is working, including support for embedded keys in records, support for signed/unsigned, big-endian/little endian, keys of BYTES/WORDS/DWORDS/FLOATS/DOUBLES/EXTENDED_FLOATING, null terminated strings for BYTES or WORDS (element count=0), multiple elements for any data type (counts from 1-2048), multiple data types for a secondary key including its use as a tie-breaker for equal primary keys to make the QuickSort Stable. I will add support for MMX and XMM keys, but the current quest is to add the capability to declare Ascending or Descending in any mixture for either the primary key or for the secondary key (one can be ascending, the other descending, or both the same - either ascending or descending). I could just invert the conditional transfers and keep the comparison order the same, or invert the comparison order and keep the conditional transfers the same, or finally, just swap the indexes at entry and restore them at exit (XCHG) and keep all of the rest of the extensive code the same.
Note, the source code is extensive, but the macro invocation conditionally assembles only what is required and there is not a lot of checking what should be done next. Inserting the XCHG's for each comparison would add code (small) and execution cycles (small), but the compare logic is the highest use code of the whole sort. Conditionally assembling a version for ascending order and a different version for descending order is basically a copy and paste with an exchange of either the indices (and has no extra code or cycles), or a change of the conditional transfers. It seems that maintaining the two copies in parallel might be dangerous - fixes to one set of code would have to always be applied to the other set, and we all know how often that strategy fails - duplicate versions aren't!
So the question is, should I swap the indices and which way should I do this - swap the indices or swap the comparison order, or should I create two versions, one for ascending and the other for descending?
Please feel free to jump in with both feet and kick up a little dust, but this is a Quicksort so I don't want to hear about C++ or C# and overloaded operators with all of the bloat that goes along with HLL.
Hmm, I just thought of a way to allow both the primary key and the secondary key to have null terminated strings. The lowest offset of ether key has to end at the higher offset, and the highest offset has to end at the end of the record. More goodie to add, but carefully since the secondary key logic was a copy of the primary key logic, with different parameters, and I already stripped out the support for a null terminated string (more than one line). Remember that bit about duplicate versions that aren't? Oh, well!
Dave.
|
|
|
|
|
Hi,
Wow. Quite a question.
not sure why you are doing all this in assembly, but whatever the language, this would be my approach:
for each comparison, calculate a difference, which is negative if object1<object2, zero if equal, and positive-non-zero when object1>object2
then multiply that difference by a factor which is +1 for ascending and -1 for descending. The result indicates precede (negative), don't know (zero), or follow (positive-non-zero).
When you have multiple sort criteria, handle the primary one first, if that results in "don't know", continue with the next criterion.
[EDITED because of HTML eater]
Luc Pattyn [Forum Guidelines] [My Articles]
- before you ask a question here, search CodeProject, then Google
- the quality and detail of your question reflects on the effectiveness of the help you are likely to get
- use the code block button (PRE tags) to preserve formatting when showing multi-line code snippets
modified on Tuesday, April 7, 2009 9:14 PM
|
|
|
|
|
Luc,
Thank you for the reply.
And I was worried about adding two XCHGs, and you want me to put in a MUL? OMG!
Actually, you can get the same effect for descending by just changing the ordering of the compare arguments, i.e. compare object2 to object1. This is exactly what C++ does for an IF, it keeps your comparison code (i.e. >=, generates a jae or jge depending on sign) but reverses the compared values and skips around the body of the IF if it is not >=. This, however, leaves the code much different than the code for an ascending sort, somewhat harder to maintain and enhance. That was why I was thinking about XCHG'ing the indices and keeping the same code for both sort types.
Another difficulty with that approach is that floating compares do not affect the usual flags, only the flags in the FPU. You have to extract the FPU flags and make special compares to effect a ja or a jb, and the tests are of a TEST nature with a je or jne, not a ja or jb. I have all of this working, and I am now testing the tie-breaker/secondary_key code.
Dave Augustine.
|
|
|
|
|
if you don't want to multiply by +1/-1 you can conditionally negate the difference, that's bound to be cheaper than two conditional XCHG. However a single MUL is probably the fastest on any Pentium or better thanks to having separate integer units and needing no unpredictable change of program flow.
An alternative is self-modifying code where you execute either a NOP or a NEG on diff.
Luc Pattyn [Forum Guidelines] [My Articles]
- before you ask a question here, search CodeProject, then Google
- the quality and detail of your question reflects on the effectiveness of the help you are likely to get
- use the code block button (PRE tags) to preserve formatting when showing multi-line code snippets
|
|
|
|
|
Luc,
Thank you again. When I get done, I will post (if not too big) or send you some zips of an ascending and a descending DWORD sort and the same for float - just the actual quick sort code.
Dave.
|
|
|
|
|
Luc,
I have completed development on my Quicksort. The following is the timing for the sort using just an array of DWORDS. The rest of the tests work as planed, ascending/descending, signed/unsigned, big-endian/little-endian, two different sort keys allowed, BYTES/WORDS/DWORDS/Floats/Doubles/extended_floating, BYTE or WORD null terminated strings, up to 2048 elements per key, fully automated test and tracing environment to verify results.
Do you have a feel for what a good timing benchmark might be for this size of a sort?
The following timings are only for the actual sorting, not the setup. The data
is prepared, start time taken, sorted, end time taken, time difference is
accumulated, loop back 1023 more times.
The random data originally was created by randomly selecting a number from the
set, saving the number, shrinking the set, and recalculating a new random number
on a decremented range. This was taking too long to shrink up a 1024*1024 DWORD
buffer for each extraction. The method was changed to just create a random
number in the range of 0 to ((1024*1024)-1), save the random number, decrement
the range, and calculate another random number for the decremented range. Since
there are potentially many duplicates, and probably not all numbers are present,
the checking was changed from comparing the sorted array against a known
ascending or descending array, to individually comparing each DWORD pair
(comparing element n to element n+1) and verifying that it is not above (if an
ascending sort) or not below (if a descending sort).
Note: The sort completed with only a 4096 byte stack, sorting 1024*1024 DWORDS.
Building Ascending data.
Building Descending data.
1024*1024 Ascending DWORDS sorted 1024 times Ascending, time in msec: 37144, good compare. 36.27 msec/iteration.
1024*1024 Ascending DWORDS sorted 1024 times Descending, time in msec: 46130, good compare. 45.05 msec/iteration.
1024*1024 Descending DWORDS sorted 1024 times Ascending, time in msec: 47689, good compare. 45.57 msec/iteration.
1024*1024 Descending DWORDS sorted 1024 times Descending, time in msec: 53870, good compare. 52.61 msec/iteration.
Building Random data.
1024*1024 Random DWORDS sorted 1024 times Ascending, time in msec: 158389, good compare. 154.67 msec/iteration.
1024*1024 Random DWORDS sorted 1024 times Descending, time in msec: 141898, good compare. 138.57 msec/iteration.
I do have code samples (.LST output) if you are interested.
Dave.
|
|
|
|
|
I don't fully understand your test cases, however here are two questions you can ask yourself:
1. why is there a big difference between "Ascending DWORDS sorted 1024 times Ascending" and "Descending DWORDS sorted 1024 times Descending"?
2. how does the timing compare with a regular quicksort as is offered in some libraries?
|
|
|
|
|
Luc,
What's with the membership change? After 5 years and 1,000,000 posts do they take away your Gold Membership? You must have been a very naughty boy.
The timing differences, I believe, are just normal Windows getting in the way of everything. I mean, my development machine is not connected to the internet, it is only doing one thing (executing my program), so why is there all this activity as indicated by the Task Manager? Paging with 4 million bytes involved probably plays a part in it also.
I will execute the two tests again with tracing enabled and pick some shorter tests (my standard testing is the character set A-Z for easy visual checking of results). The tracing will report the number of compares and exchanges, as well as report all of the exchanges as they occur. My algo starts by swapping the middle element for the right end (the pivot) and then finally determines that there is nothing to do (because it is already in order) so it just re-swaps the middle and end, but then it sorts the front half and then sorts the back half, both of which are in order so the same thing happens again twice, then 4 times... If the sort has to reverse the order (ascending data sorted descending), then after the first swap to put the middle at the end, it will end up swapping end elements, working to the middle (since everything is on the wrong side of the pivot), and ends up with a re-swap of the middle and the end.
Dave.
|
|
|
|
|
Hi Dave,
Don't worry, I'm using a new account to avoid all the crap gold members get to digest on the home page.
And its not 1 million posts, even CG does not reach 100K!
If all numbers are ascending and that is what you don't want, everything gets swapped.
If all numbers are descending and that is what you don't want, everything gets swapped.
I expect these two test cases to have the same speed. I understood they did.
If all numbers are ascending and that is what you want, no swaps occur.
If all numbers are descending and that is what you want, no swaps occur.
I expect these two test cases to have the same speed. I understood they did not, they were quite different, and by much more than some Windows unpredictability could explain.
Member 4194593 wrote: all this activity as indicated by the Task Manager? Paging with 4 million bytes involved probably plays a part in it
Huh. What is your hardware? how could 4MB be cause for paging?
If this is a miniature PC, try the same code on a real one!
|
|
|
|
|
Luc,
I found an error in the Descending Descending where I had supplied the ascending
data instead of the descending data, and I corrected that. I also modified the
timing procedure to create all of the data first, then execute the tests, and
then added a loop to run all of the tests 10 times. I added code to call
SetPriorityClass to set the priority to HIGH_PRIORITY_CLASS. I added code to
put the start of each procedure on an aligned 16 location to make sure that the
same code in the ascending and descending routines stayed on the same relative
boundaries to avoid differences in the hardware branch predictions - this would
only cause a 1 or 2 cycle difference in time, but with 1,000,000,000 *
1,000,000,000 compares it will make a difference - note that something as small
as a 26 record sort takes 109 compares and 14 exchanges.
I put the loops for the left scan and the right scan on 16 BYTE boundaries to
aid in the hardware branch prediction.
I modified the code to clean up the floating jump on condition tests (had to
re-order my register usages to free up eax, then debug and re-test).
I modified the test code to add a timing test for floats.
The following are the results from the first two iterations of the tests (almost
the same times as for DWORDS):
uilding Ascending data.
Building Descending data.
Building Random data.
Floating tests DWORD tests
1024*1024 Ascending Floats sorted 1024 times Ascending, time in msec: 32359, good compare. 32068, good compare.
1024*1024 Ascending Floats sorted 1024 times Descending, time in msec: 36407, good compare. 37310, good compare.
1024*1024 Descending Floats sorted 1024 times Ascending, time in msec: 36749, good compare. 37276, good compare.
1024*1024 Descending Floats sorted 1024 times Descending, time in msec: 36674, good compare. 38611, good compare.
1024*1024 Random Floats sorted 1024 times Ascending, time in msec: 130539, good compare. 132599, good compare.
1024*1024 Random Floats sorted 1024 times Descending, time in msec: 131883, good compare. 131187, good compare.
1024*1024 Ascending Floats sorted 1024 times Ascending, time in msec: 32005, good compare. 32005, good compare.
1024*1024 Ascending Floats sorted 1024 times Descending, time in msec: 37254, good compare. 36943, good compare.
1024*1024 Descending Floats sorted 1024 times Ascending, time in msec: 37126, good compare. 35773, good compare.
1024*1024 Descending Floats sorted 1024 times Descending, time in msec: 37446, good compare. 37359, good compare.
1024*1024 Random Floats sorted 1024 times Ascending, time in msec: 130639, good compare. 133531, good compare.
1024*1024 Random Floats sorted 1024 times Descending, time in msec: 130431, good compare. 131447, good compare.
Dave.
|
|
|
|
|
OK, this looks good. The timing anomaly has disappeared. Well done.
|
|
|
|
|
Luc,
It was an interesting search. The only "mistake" was the use of ascending data instead of descending data for the test. The rest of the changes were to avoid the branch prediction problems - keeping all of the tight loops on a 16 byte boundary and keeping the loops down to 16 bytes or less.
Dave.
|
|
|
|
|
Hi all
I would be very happy, if anyone could provide me with a algorithm to rotate a bitmap around it's center. The problem is that it has to run in .NET micro v3.0(otherwise I would have just used that very nice Graphics that exist in full version of .NET) in which they removed the Bitmap.RotateBlt method. Speed is not a(major) concern.
|
|
|
|
|
Hi,
without a method that offers free rotation, you would have to do it yourself, along these lines:
foreach row y2 in destination image
foreach column x2 in destination image
calculate corresponding location (x1,y1) in source image
copy from source image at (x1,y1) to destination image (x2,y2)
}
}
without translation, the rotation formulas would be something like
x1 = x2 * cosa - y2 * sina
y1 = x2 * sina + y2 * cosa
[EDIT: fixed a typo /]
where cosa and sina are the (constant) cosine and sine of the rotation angle.
You would need to add some constant to x1 and y1 to choose the center of rotation.
Luc Pattyn [Forum Guidelines] [My Articles]
- before you ask a question here, search CodeProject, then Google
- the quality and detail of your question reflects on the effectiveness of the help you are likely to get
- use the code block button (PRE tags) to preserve formatting when showing multi-line code snippets
modified on Monday, April 6, 2009 11:25 AM
|
|
|
|
|
Hi thanks for the reply
I'm having trouble making the center of rotation the center of the image. If I choose the constant to add to x1 and y1 to be 32(i'm using 64x64 pixel bitmaps), part of source bitmap is not copied.
What am I doing wrong?
in advance, Thx a bunch
|
|
|
|
|
Hi,
if you want to rotate an image around a point C that is at (CX1,CY1) in the source image and at
(CX2, CY2) in the destination image, then obviously you have to translate the coordinate system in both images like so:
(x1-CX1) = (x2-CX2) * cosa - (y2-CY2) * sina
(y1-CY1) = (x2-CX2) * sina + (y2-CY2) * cosa
Luc Pattyn [Forum Guidelines] [My Articles]
- before you ask a question here, search CodeProject, then Google
- the quality and detail of your question reflects on the effectiveness of the help you are likely to get
- use the code block button (PRE tags) to preserve formatting when showing multi-line code snippets
|
|
|
|
|
Hi,
I'm new to recurrent SOMs. However, I've already created a SOM. In order to make it an RSOM do I simply add a leaky integrator, or are there other aspects I need to change from my original SOM?
Thanks for any help.
|
|
|
|
|
Member 6038196 wrote: I'm new to recurrent SOMs.
I.e. are you new to Turkic currency [^]?
If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler.
-- Alfonso the Wise, 13th Century King of Castile.
This is going on my arrogant assumptions. You may have a superb reason why I'm completely wrong.
-- Iain Clarke
[My articles]
|
|
|
|
|
lol I wish it was that simple. More like [^]
|
|
|
|
|
I posted this on C# and realized later this might be a more appropriate location for it.
Hi,
I have been searching around for a single elimination algorithm / pseudo code for tournament brackets and I haven't found a single site on it. The best I found was Wikipedia explaining single elimination. I have been reading code project articles for a while now and I know there are many experience and very knowledgeable developers here. Since this is fairly simple to code I was hoping if somebody could help me out with the code/pseudo code or an algorithm for it. I would appreciate it a lot. Looking forward to your replies.
|
|
|
|
|
You create a binary tree where the leaves represent the tournament's contestants. Each pair of contestants under an interior node play a match, and the winner then occupies that interior node.
Likewise, an interior node that has two other interior nodes under it represents the winner of the match between the winners corresponding to those two interior nodes.
The root of the binary tree represents the winner of the tournament.
Tree construction: One approach uses the Composite design pattern, an abstract Node class with two derived classes: Contestant and InteriorNode. Start with a list of Contestants. While that list has more than one element, combine two elements under a new InteriorNode, remove those elements from the list, and insert the new InteriorNode in the list. The list should be ordered by increasing depth, so that, for example, all Contestants are paired before any InteriorNodes are paired.
When the list is down to one element, that's the root of the tree.
modified on Friday, April 3, 2009 10:21 AM
|
|
|
|
|
Is there an algorithm that does a upside-down, upside-down tree. i.e, right side up(root on bottom). Thanks.
|
|
|
|
|
Tree algo:
1. Grasp page in both hands.
2. Turn page 180 degrees.
|
|
|
|
|
Very insightful. I see why your an expert now. Would have never figured that one out myself.
Quick question. If said tree isn't on paper but on my computer screen, do I grab the computer screen and flip it 180 degrees or the computer itself.
|
|
|
|
|
Seriously, it wasn't clear what your problem was. Do you just want to draw the tree upside down on the screen? If so, for a display of height H, substitute (H - y) for every y coordinate to flip the tree.
|
|
|
|
|