Targetting 8-bit CPUs with no division instruction (in hardware), this snippet finds the number X that, given N (both 8-bit unsigned) so that N*X=1 (mod 256).
Background
Finding the multiplicative inverse in modular arithmetics can be accomplished with Extended Euclid's Algorithm, but that requires division operations and looping/recursion for an average of 6 iterations/depth (over all input values from 1..255).
This is a piece of ugly code that CLANG (or also GCC a bit less optimized) boils down to something finishing in less than 100~200 cycles on the tinyest 8-bit CPUs, in constant time, with no lookup tables, no multiplication/division/modulo instructions.
The Code
typedef unsigned char byte;
struct bit8
{ int a:1,b:1,c:1,d:1,e:1,f:1,g:1,h:1;
};
__attribute__((noinline))
byte inv8(byte _i)
{
byte is {_i}, ix; bit8 i_1,i_2,i_3,i_4,i_5;
is>>=1; ix=_i^is; i_1=*(bit8*)&ix;
is>>=1; ix=_i^is; i_2=*(bit8*)&ix;
is>>=1; ix=_i^is; i_3=*(bit8*)&ix;
is>>=1; ix=_i^is; i_4=*(bit8*)&ix;
is>>=1; ix=_i^is; i_5=*(bit8*)&ix;
bit8 k {0,0,0, i_1.b, i_1.c&i_2.b,
i_3.b^(i_1.d&!i_2.c&!i_3.b),
(!i_4.b&!i_1.e&(i_3.c|!i_2.d)) ^ (!i_1.c&!i_2.b),
(i_5.b^(i_1.e&i_2.c&!i_3.b))
| (((i_2.e|i_1.f)^i_4.c) & (i_4.c|!i_3.d&!i_5.b))
};
return _i^*(byte*)&k;
};
Points of Interest
Note that the input parameter should be an ODD number, so the result can make sense, if you think about it.
The scheme does not seem to me to be extendable to wider integers... ideas, comments and questions welcome!
History
- 8th July, 2022: Initial version