Introduction
You can use this code on small controllers used on embedded systems, that lacks many arithmetic operations, like Arduino. It calculates the square root stored in BX
(31 bit) and puts the result on AL
(7 bit) and the remainder on AH
(7 bit). It is possible to use the 63/15 bit wide version using the intel extended registers (EBX
for input and EAX
for results). It is also possible to modify it for arbitrary precision. Only shifts and compare used:
asm( "mov ax, 0; stc" )
asm( "loop:" )
asm( "rcl bx, 1; rcl ah, 1" )
asm( "shl bx, 1; rcl ah, 1" )
asm( "shl al, 2" )
asm( "or al, 1" )
asm( "cmp ah, al" )
asm( "jc skip" )
asm( "sub ah, al" )
asm( "or al, 2" )
asm( "skip:" )
asm( "shr al, 1" )
asm( "cmp bx,0x8000" )
asm( "clc" )
asm( "jnz loop" )
As you can see, it is really tiny and fast.
Background
While remembering the old long division method for square roots, I thought that the particularities of base 2 binary numeric system could be exploited to simplify this trick, related to "human" base 10 approach.
Using the Code
The easiest way to test this code is using GCC, as a wrapper:
unsigned char sqrt( unsigned short w )
{ asm( "mov bx, %0 " : "=r" ( w ) );
asm( "mov ax, 0; stc" );
asm( "loop:" );
asm( "rcl bx, 1; rcl ah, 1" );
asm( "shl bx, 1; rcl ah, 1" );
asm( "shl al, 2" );
asm( "or al, 1" );
asm( "cmp ah, al" );
asm( "jc skip" );
asm( "sub ah, al" );
asm( "or al, 2" );
asm( "skip:" );
asm( "shr al, 1" );
asm( "cmp bx,0x8000" );
asm( "clc" );
asm( "jnz loop" );
}
int main()
{ unsigned short w= sqrt( 25*25+1 );
return( w );
}
You can cut and paste to sqrt.c, add trace points as wished and build it with: gcc.exe -masm=intel sqrt.c
Another option is to use an IDE like codeblocks and trace the execution.
Points of Interest
This code only uses the source and result registers (BX and AX), so it doesn't "pollute" any other register. It is possible that a quick version exploits the register boundary, but it takes more registers to achieve this. The instruction set used is present in almost all microcontrollers, so porting it is easy.
As an observation, the square root or a binary number takes half of the bits of operand, thus, if operand is 15 bits long, the result can fit on 7 bits (this case), If 31 bits long, fits on 15 bits, and so.
History
- 5th June, 2019: Initial version