|
actually, i retested. VC6 is too smart - it was able to tell that i wasn't actually using the swapped numbers for anything and optimized the loop to nothing (verified by looking at the assembly).
here's a better test:
void myswap(short &x, short &y)
{
#if 0
short t;
t=x;
x = y;
y = t;
#else
x^=y^=x^=y;
#endif
}
int _tmain(int argc, TCHAR* argv[], TCHAR* envp[])
{
int nRetCode = 0;
if (!AfxWinInit(::GetModuleHandle(NULL), NULL, ::GetCommandLine(), 0))
{
cerr << _T("Fatal Error: MFC initialization failed") << endl;
nRetCode = 1;
}
else
{
short tx;
short ty;
DWORD dwStart = GetTickCount();
for (short x=0;x>=0;x++)
{
for (short y=0;y>=0;y++)
{
tx = x;
ty = y;
myswap(tx, ty);
if ((tx!=y) && (ty!=x))
{
cerr << tx << "!=" << ty << endl;
exit(0);
}
}
printf("%d\n", x);
}
DWORD dwStop = GetTickCount();
printf("%6.4f\n", ((double)dwStop - (double)dwStart) / 1000.);
}
return nRetCode;
}
in this one, the xor comes in at 37.2 seconds and the traditional way at 51.3 seconds.
interesting note. this is what the assembly looks like for the XOR:
; 22 : #if 0
; 23 : short t;
; 24 : t=x;
; 25 : x = y;
; 26 : y = t;
; 27 : #else
; 28 : x^=y^=x^=y;
00000 8b 4c 24 08 mov ecx, DWORD PTR _y$[esp-4]
00004 8b 44 24 04 mov eax, DWORD PTR _x$[esp-4]
00008 66 8b 11 mov dx, WORD PTR [ecx]
0000b 66 31 10 xor WORD PTR [eax], dx
0000e 66 8b 10 mov dx, WORD PTR [eax]
00011 66 31 11 xor WORD PTR [ecx], dx
00014 66 8b 09 mov cx, WORD PTR [ecx]
00017 66 31 08 xor WORD PTR [eax], cx
and here's the traditional way:
; 22 : #if 1
; 23 : short t;
; 24 : t=x;
; 25 : x = y;
00000 8b 54 24 08 mov edx, DWORD PTR _y$[esp-4]
00004 8b 44 24 04 mov eax, DWORD PTR _x$[esp-4]
00008 56 push esi
00009 66 8b 32 mov si, WORD PTR [edx]
0000c 66 8b 08 mov cx, WORD PTR [eax]
0000f 66 89 30 mov WORD PTR [eax], si
; 26 : y = t;
00012 66 89 0a mov WORD PTR [edx], cx
00015 5e pop esi
-c
Conservative:
One who admires radicals centuries after they're dead.
-- Leo C. Rosten
|
|
|
|
|
Russ Freeman wrote:
you *did* average it out right
yes.
i don't know that anyone here is recommending doing it this way. it's just interesting to see.
-c
Faith is the quality that enables you to eat blackberry jam
on a picnic without looking to see whether the seeds move.
|
|
|
|
|
Its pretty fascinating how much work and fuss this silly subject is causing. I'm using it just as a sig because it distracts people, now I wrote an article about it and see - even more distraction is caused.
I would never ever recomend using this in commercial code. But still its nice to know the technic.
cheers
int x=1, y=5;
x^=y^=x^=y;
<a href="http://www.codeproject.com/useritems/StupidXORTrick.asp" target="_blank">ClickHereForHelp();</a>
|
|
|
|
|
Russ Freeman wrote:
It is dangerous as the outcome is undefined according to the spec. See the C FAQ for details at http://www.eskimo.com/~scs/C-faq/q3.1.html[^]
What is undefined?
x^=y^=x^=y; is most definatly defined. In C++ all operators follow as strick precedence which includes associativity.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclang/html/_pluslang_c.2b2b_.operators.asp
Assignment opperators are of lower precedence than xor and are right to left associative. So
a^=b^=c^=d; is always equivalent to
c = c ^d;
d = b ^ c;
a = a ^ b;
However in this case the associativity of assignment doesn't matter as either way.
x^=y^=x^=y; is equivalent to
x = x ^ y;
y = y ^ x;
x = x ^ y;
While the precedence table always specifies the evaluation order of expressions, it does not specify the order in which terms are evaluated.
So with
int a = b() + c() * d();
You know for certian that
temp = c() * d(); is evaluated first
temp = b() + temp; is evaluated second
and int a = temp; is evaluated last
Whether c() or d() is evaluated fist is undefined, or at least it was last time I checked. This might have changed when C++ was standardized.
|
|
|
|
|
Sorry that was poorly written. I was actually agreeing that the order in which the functions c() and d() were evaluated is undefined.
However I still don't see how any part of the posted code has undefined behaviour. What line are you refering to?
"And there is one more interesting little thing about this code...
It is dangerous as the outcome is undefined according to the spec"
|
|
|
|
|
I think the standard method is faster because contemporary processors have multiple processing pipelines. As long as you can keep feeding the processor more instructions it can process them in parallel. Using the XOR trick cause the processor to stall waiting for the result of the previous XOR instruction to complete before it can process the current XOR instruction.
I guessing at least, I’m probably wrong.
|
|
|
|
|
I don’t think my browser is refreshing. I just noticed Goran’s post.
|
|
|
|
|
Russ Freeman wrote:
And there is one more interesting little thing about this code...
It is dangerous as the outcome is undefined according to the spec. See the C FAQ for details at http://www.eskimo.com/~scs/C-faq/q3.1.html[^]
For another example of undefined behaviour see this: http://www.eskimo.com/~scs/C-faq/q3.3.html[^]
So, whilst the code works today you may be scratching your head wondering why the latest patch or upgrade has broken your application only to discover that the undefined behaviour you took advantage is now undefined in a completely different way Good Luck.
In the same vane...which function is executed first?
int a = b() + c() * d();
Actually, I'm not convinced this would be considered a side-effect. Consider the more common:
int a;
int b;
int c;
a = b = c = 5;
Now, given that there's no guarantee what the values of a , b , and c are before initialization, what is the result of this operation? Obviously, a doesn't take on whatever value b had before initialization, nor b c 's. It evaluates right-to-left.
Likewise, since the author is using assignment operators (even the "modifying" ones), my interpretation is that the results would not be undefined/unspecified/implementation-defined, or, if they are, that the standard chain-initialization would also be undefined/unspecified/implementation-defined. This wouldn't be a "side-effect" any more than "a = a + 1" would be.
|
|
|
|
|
Russ Freeman wrote:
Now, given that there's no guarantee what the values of a, b, and c are before initialization, what is the result of this operation?
Honestly, with a question like that the only answer I can give is; Read a book on "C".
(A: a=5, b=5, c=5 thanks to the way the language works with c being assigned first, read the spec or a good book for an explanation of the above discussion).
I suggest the same to you, if you believe that Andreas' code produces undefined results.
|
|
|
|
|
It cannot be faster since when doing
xor eax, ebx
xor ebx, eax
xor eax, ebx
you're stalling pipelines because all three successive instructions wait on the result of the previous ones.
On the other hand, when you have something like this:
mov [temp], eax
mov eax, ebx
mov ebx, [temp]
only the third one depends on a result of previous instruction (first).
Although, there are lots of factors, but, not going deeply into details, you save one cycle with the second approach (3 to 2). BTW, don't get fooled by the 'temp' variable. Because of the cache and delayed writting, 'temp' pretty much acts as a normal register.
BTW, your question doesn't really make sense... People should focus on algorithms, not low level optimising. For quite a while, compilers are far better than humans in making optimisations, especially when the generated code targets more complex architectures...
- Goran.
|
|
|
|
|
Too bad real world tests don't jive with what you say. XOR is faster.
As far as not bothering with opimization, yes and no. Blind optimization is always stupid. You need to run tests to see what needs to be improved. Compilers aren't mind readers. They can generate horrific code. Expecially in the case of heavily templated code, the compiler the lose track of registers and do some really stupid things.
Tim Smith
"Programmers are always surrounded by complexity; we can not avoid it... If our basic tool, the language in which we design and code our programs, is also complicated, the language itself becomes part of the problem rather that part of the solution."
Hoare - 1980 ACM Turing Award Lecture
|
|
|
|
|
1. No it isn't. Test example is incredibly badly written. Variables to be swapped should be full integers, not short ones.
2. In the low level department, compilers are pretty good. Only issue optimizers have with templates is the depth of optimisations they are allowed to perform. MSVC often stinks here, but, for example, Intel's compiler is exceptional, especially in the latest release (Kai C++).
- Goran.
|
|
|
|
|
My question in no way implies that the fastest method will is the always most correct.
Readability is usually far more important than execution speed. Efficient Design is far more imortant than efficent code. Code tuning is only appropriate after the finished program has been profiled and even then the effect of each optimization needs to be measured to ensure the the loss in readiblity and maintanability is justified.
|
|
|
|
|
I'm surprised no one pointed this out yet, even after someone pointed out the assembley generated for an xor, but the xor method has to copy the values that need to be xor'd into registers first (at least one of the variables), do the xor, and copy them back. To see how stupid this actually is, write the code in assembley where both of the variables originally reside in memory. You find out that at least one of the values has to be copied to a register anyway. So assuming that ecx and edx contain the address of the variables to be swapped, this is basically the xor code:
[code]
mov eax,[ecx]
xor eax,[edx]
xor [edx],eax
xor eax,[edx]
mov [ecx],eax
[/code]
not [i]too[/i] shabby, but why not just do this:
[code]
mov eax,[ecx]
mov ebx,[edx]
mov [ecx],ebx
mov [edx],eax
[/code]
or this:
[code]
mov eax,[ecx]
xchg [edx],eax
xchg [ecx],eax
[/code]
Optimizing compilers don't know what you are intending to do, otherwise it would probably generate this:
[code]
mov eax,[ecx]
mov ebx,[edx]
xchg eax,ebx
mov [ecx],ebx
mov [edx],eax
[/code]
but I still like the code above that better for exchanging two integers in memory.
|
|
|
|
|
Its just rather cool, if you ask me.
Roger Allen
Sonork 100.10016
I think I need a new quote, I am on the prowl, so look out for a soft cute furry looking animal, which is really a Hippo in disguise. Its probably me.
|
|
|
|
|
Yes. But you have to explain again and again and again what it does and how it works.
At least now you can refer to this article
int x=1, y=5;
x^=y^=x^=y;
<a href="http://www.codeproject.com/useritems/StupidXORTrick.asp" target="_blank">ClickHereForHelp();</a>
|
|
|
|
|
do you know any other tricks?
|
|
|
|
|
I can juggle, but only with 1 ball.
int x=1, y=5;
x^=y^=x^=y;
|
|
|
|
|
x ^= x; is one way to do x = 0; . But I guess the compiler would optimize away MOV reg/mem, 0 to XOR reg/mem, reg/mem .
XOR requires less opcode and is at least as fast (if not faster) than MOV. XOR on the 386 required only one CPU cycle, while a MOV on memory would require at least 2 cycles IIRC.
|
|
|
|
|
yep, the MS compiler does exactly this: whenever you assign 0 to a variable it just XOR's it with itself.
int x=1, y=5;
x^=y^=x^=y;
<a href="http://www.codeproject.com/useritems/StupidXORTrick.asp" target="_blank">ClickHereForHelp();</a>
|
|
|
|
|