|
Take it easy, man!
Rajesh, just enjoy the discussion with people around the world who have different opinions on the same question.
regards,
George
|
|
|
|
|
The point was that there was nothing wrong with my answer (my opinion), and someone still voted it down. I just wanted their feedback, in particular to know if anything was wrong with my answer, so that I could learn something new. Not that I care about the vote.
|
|
|
|
|
Whosoever downvoted you will be beaten with a stick! Anyway, let me square it off too!
"Opinions are neither right nor wrong. I cannot change your opinion. I can, however, change what influences your opinion." - David Crow
Never mind - my own stupidity is the source of every "problem" - Mixture
cheers,
Alok Gupta
|
|
|
|
|
Thanks man. You're kind.
|
|
|
|
|
George_George wrote: Why is using LEA to do multiplication faster than using MUL?
You'd have to know the internal architecture and circuitry of the CPU to answer that. I don't, and I doubt many people would, apart from those who work (or have worked) at Intel.
George_George wrote: "The TEB's linear address can be found at offset 0x18 in the TEB." -- what does linear address mean? Something like an array, whose elements are placed next to each other? What does non-linear address mean?
To understand what's going on here you have to know a little about Intel CPUs and segment registers. Basically, C/C++ has no concept of segment registers and the like (it assumes a linear address space), so this is a page-table mapping trick done by the OS to make the TEB addressable in such an environment.
Steve
|
|
|
|
|
Thanks Steve,
1.
Where can I find documentation on the cycles needed for a specific instruction (e.g. LEA and MUL)?
2.
"it assumes a linear address space" -- do you mean the low-level CPU/registers or the high-level C/C++?
regards,
George
|
|
|
|
|
George_George wrote: Where can I find documentation on the cycles needed for a specific instruction (e.g. LEA and MUL)?
Download the datasheet for the CPU.
George_George wrote: "it assumes a linear address space" -- do you mean the low-level CPU/registers or the high-level C/C++?
It's complicated and very low level. I suggest you start reading something like this[^].
Steve
|
|
|
|
|
Thanks Steve,
"datasheet for the CPU" -- could you give me some links or keywords to search? I am new to this topic.
regards,
George
|
|
|
|
|
Have you Googled for "Pentium Datasheet"?
|
|
|
|
|
|
Thanks Steve,
1.
I have read the article you linked to. It talks about how protected mode uses a segment-based access model. But nowhere in the article does it explain what linear and non-linear mean -- and that is my question.
2.
Could you provide a link to the datasheet you mean, please? Sorry, I am new to this area.
regards,
George
|
|
|
|
|
George_George wrote: I have read the article you linked to. It talks about how protected mode uses a segment-based access model. But nowhere in the article does it explain what linear and non-linear mean -- and that is my question.
This is an oversimplification, but here goes.
In protected mode a number of segment registers are used:
CS : Code segment
DS : Data segment
SS : Stack segment
ES : Extra segment
FS : Another extra segment
GS : Yet another extra segment
Different instructions use different segments for different purposes (some instructions also allow the default segment to be overridden). Each segment can represent a physically distinct, linearly addressable memory space; it's possible to set things up so that memory addressable in one segment is not addressable in another. In Windows most of the segment registers map to the same memory, which is good, as languages such as C/C++ have no concept of segments (or you could think of them as supporting only one segment). The FS segment is an exception, however, and serves a special purpose: the memory in it is the TIB. So that languages like C/C++ can access the TIB, the same memory is also mapped into the other segments. The address at which it's mapped in the other segments is stored at FS:[0x18].
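To put the term "linear address" in concrete terms: the CPU adds the offset an instruction uses to the base of the selected segment, and that sum is the linear address. The C sketch below only models that arithmetic. The type `segment_t`, the helper `linear_address` and the base value 0x7FFDE000 used in the note after it are made up for illustration; they are not taken from Windows or from the article.

```c
#include <stdint.h>

/* Hypothetical model: a segment is described only by its base address.
   Real segment descriptors also carry a limit and access rights. */
typedef struct {
    uint32_t base;
} segment_t;

/* linear address = segment base + offset within the segment */
static uint32_t linear_address(segment_t seg, uint32_t offset)
{
    return seg.base + offset;
}
```

With a flat data segment (base 0) and an FS segment whose base is assumed to be 0x7FFDE000, `linear_address(fs, 0x18)` and `linear_address(ds, 0x7FFDE018)` name the same byte. That is why a pointer stored in the TIB can be used from plain C/C++ code.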
George_George wrote: Could you provide a link to the datasheet you mean, please? Sorry, I am new to this area.
See here[^]. Makes good bedtime reading.
Steve
|
|
|
|
|
Thanks Steve,
1.
I understand the concept of a segment. My question is: what do linear and non-linear address mean? My confusion is that I did some searching but could not find any concept like linear or non-linear. Any comments or ideas?
2.
Sorry, I am new to the Intel manuals, and it looks like there are quite a few of them. To look up the machine cycles of an instruction for x64, should I look at the following two manuals?
Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2A: Instruction Set Reference, A-M (SKU #253666)
Describes the format of the instruction and provides reference pages for instructions (from A to M). This volume also contains the table of contents for both Volumes 2A and 2B.
Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2B: Instruction Set Reference, N-Z (SKU #253667)
Provides reference pages for instructions (from N to Z). VMX instructions are treated in a separate chapter. This volume also contains the appendices and index support for Volumes 2A and 2B.
regards,
George
|
|
|
|
|
1.
From the same article, just below the sentence you quoted:
"The LEA instruction uses hardwired address generation tables that makes multiplying by a select set of numbers very fast (for example, multiplying by 3, 5, and 9). Twisted, but true."
That means the LEA instruction is faster than MUL only for a small set of multiplier values.
2.
George_George wrote: "The TEB's linear address can be found at offset 0x18 in the TEB." -- what does linear address mean? Something like an array, whose elements are placed next to each other?
I think it means a direct address, i.e.
mov eax,dword ptr fs:[00000018h] ; loads eax with the address of the TEB, so the following instruction
mov eax,dword ptr [eax+24h] ; loads eax with the value found at offset 0x24 in the TEB (the thread ID).
George_George wrote: What does non-linear address mean?
I suppose it is indirect addressing (via the FS register in this context).
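The two instructions can be traced in a toy model. This is only a sketch: `memory` is a small byte array standing in for the flat address space, `fs_base` stands in for the base of the FS segment, and all helper names are invented for the example. Only the offsets 0x18 and 0x24 come from the discussion above.

```c
#include <stdint.h>
#include <string.h>

enum { TEB_SELF = 0x18, TEB_THREAD_ID = 0x24 };

static uint8_t memory[4096];            /* toy flat (linear) address space */

/* Read and write a 32-bit value at a linear address in the toy memory. */
static uint32_t load32(uint32_t linear)
{
    uint32_t v;
    memcpy(&v, &memory[linear], sizeof v);
    return v;
}

static void store32(uint32_t linear, uint32_t v)
{
    memcpy(&memory[linear], &v, sizeof v);
}

/* Place a fake TEB at linear address fs_base (hypothetical setup step). */
static void setup_fake_teb(uint32_t fs_base, uint32_t thread_id)
{
    store32(fs_base + TEB_SELF, fs_base);        /* TEB stores its own linear address */
    store32(fs_base + TEB_THREAD_ID, thread_id);
}

static uint32_t current_thread_id(uint32_t fs_base)
{
    uint32_t eax = load32(fs_base + TEB_SELF);   /* mov eax,dword ptr fs:[18h] */
    eax = load32(eax + TEB_THREAD_ID);           /* mov eax,dword ptr [eax+24h] */
    return eax;
}
```

After `setup_fake_teb(0x100, 1234)`, calling `current_thread_id(0x100)` goes through the FS-relative load once and then uses the returned linear address without any segment override, as the second instruction does.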
If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler.
-- Alfonso the Wise, 13th Century King of Castile.
This is going on my arrogant assumptions. You may have a superb reason why I'm completely wrong.
-- Iain Clarke
|
|
|
|
|
Thanks CPallini,
1.
What does "hardwired address generation tables" mean? Could you describe it in more detail, please?
2.
I am confused by your two assembly instructions, because I think direct access means not using a pointer, i.e. the [] operator in assembly language. But you use [] in both statements, so I think both samples are indirect/pointer/non-linear accesses. Please feel free to correct me if I am wrong.
regards,
George
|
|
|
|
|
George_George wrote: "hardwired address generation tables"
It's referring to the electronics in the CPU.
Steve
|
|
|
|
|
Thanks Steve,
I am a little confused. MUL is based on shift operations, but as mentioned, it is slower than LEA for some special cases, e.g. *2, *3, *5, etc.
How is LEA implemented, and why is it faster than MUL?
regards,
George
|
|
|
|
|
Carlo's first reply to you was very clear:
"The LEA instruction uses hardwired address generation tables that makes multiplying by a select set of numbers very fast (for example, multiplying by 3, 5, and 9)."
LEA is evaluated by the CPU's dedicated address-generation hardware and is therefore significantly faster than MUL. But this applies only to a small set of multipliers.
|
|
|
|
|
Sorry for my bad English, Rajesh!
My confusion is this:
LEA means load effective address, and the purpose of the instruction is to get the value of an address, so how can it be used for multiplication?
regards,
George
|
|
|
|
|
Yes, LEA means "Load effective address". But LEA can not only load an address, it can also compute it on the fly.
LEA EAX,[ESP+14]
puts the result of (value of ESP) plus 14 into EAX.
And it can perform more complicated calculations:
LEA EAX,[EAX*4+EAX]
works, and as I understand the text you quoted, it does not actually perform a calculation (involving things like cache access and multiply units). Instead, it uses a table lookup. But that only works for some multipliers.
So, in effect, for certain multipliers, LEA is faster than MUL.
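Whatever the hardware does internally, the addressing form that LEA evaluates is fixed: base + index * scale + displacement, where the scale can only be 1, 2, 4 or 8. The C sketch below models that form. The function name `lea` and the choice to express the scale as a left shift are illustrative assumptions, not a description of the actual circuitry.

```c
#include <stdint.h>

/* Model of an x86 effective address: base + index * scale + disp.
   Only scales 1, 2, 4 and 8 exist, so the multiply reduces to a shift;
   any other scale value is treated as 1 in this sketch. */
static uint32_t lea(uint32_t base, uint32_t index, unsigned scale, int32_t disp)
{
    unsigned shift = (scale == 8) ? 3 : (scale == 4) ? 2 : (scale == 2) ? 1 : 0;
    return base + (index << shift) + (uint32_t)disp;
}

/* LEA EAX,[EAX*4+EAX] : the same register is used as base and index. */
static uint32_t times5(uint32_t eax)
{
    return lea(eax, eax, 4, 0);
}
```

Here `lea(esp, 0, 1, 14)` corresponds to LEA EAX,[ESP+14], and `times5(x)` returns x * 5 for any x (modulo 2^32), so the set of inputs is not limited; only the set of multipliers is.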
Let's think the unthinkable, let's do the undoable, let's prepare to grapple with the ineffable itself, and see if we may not eff it after all. Douglas Adams, "Dirk Gently's Holistic Detective Agency"
|
|
|
|
|
Thanks jhwurmbach,
You bring up a good point. My further question: in my previous experience with table lookups or similar implementations, they only deal with looking up a limited number of constants (e.g. looking up a person's ID by name in a large table). So both the input and the output are finite.
How could LEA use a table to do multiplication? My confusion is that, since multiplication has an unlimited number of possible inputs and outputs, how could a table of constant size cover all values?
regards,
George
|
|
|
|
|
Hmm. Here, at least the input is limited to the multipliers mentioned.
But the number of possible results would still be very high.
|
|
|
|
|
Thanks jhwurmbach,
Could you explain or describe how multiplication could be implemented as a table lookup, please?
regards,
George
|
|
|
|
|
Some more reading material: Pentium Optimization Cross-Reference[^].
From the page: LEA is better than SHL on the Pentium because it pairs in both pipes, SHL pairs only in the U pipe.
Also, as CPallini pointed out, the document states that LEA can be more beneficial than MUL only when multiplying by 2, 3, 4, 5, 7, 8, 9.
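One way to see where those multipliers could come from, offered as a sketch rather than as the document's own explanation: each of them can be written with the base + index * scale form, where the scale is 1, 2, 4 or 8. The function names below are made up, and the assembly in the comments is an assumed encoding. Note that 7 does not fit a single LEA in this scheme; the sketch assumes an extra SUB.

```c
#include <stdint.h>

static uint32_t mul2(uint32_t x) { return x + x;     }  /* lea eax,[eax+eax]   */
static uint32_t mul3(uint32_t x) { return x + x * 2; }  /* lea eax,[eax+eax*2] */
static uint32_t mul4(uint32_t x) { return x * 4;     }  /* lea eax,[eax*4]     */
static uint32_t mul5(uint32_t x) { return x + x * 4; }  /* lea eax,[eax+eax*4] */
static uint32_t mul8(uint32_t x) { return x * 8;     }  /* lea eax,[eax*8]     */
static uint32_t mul9(uint32_t x) { return x + x * 8; }  /* lea eax,[eax+eax*8] */

/* Assumption: 7 needs two instructions, e.g. lea ecx,[eax*8] then sub ecx,eax. */
static uint32_t mul7(uint32_t x) { return x * 8 - x; }
```

Multipliers such as 6 or 10 have no single-instruction form here because neither 6 = 1 + 5 nor 10 = 1 + 9 uses a permitted scale; they would need two LEAs or a different sequence.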
|
|
|
|
|
Thanks Rajesh,
I read through the related section you referred. Really good.
1.
Is MUL implemented through SHL? I had thought the CPU has a separate multiplication implementation, especially for multipliers that are not a power of 2 (e.g. *3).
2.
Why is it better to use LEA when multiplying by "2, 3, 4, 5, 7, 8, 9"? How do we arrive at these numbers?
regards,
George
|
|
|
|
|