In this article, I show how I adapted a mental calculation technique to check some small factors faster than division.
Download
Background
This article is part of a set of articles. Each article highlight an aspect around Integer Factorization.
The xPrompt Download and prg files
xPrompt.zip contains the binary of the xHarbour Scripting Engine for Windows. This is the app needed to run the prg code.
Usage :
- Copy prg file in same directory than xPrompt
- Launch xPromt
- Type "do FileName.prg" to run the code
xHarbour is freeware and can be dowloaded at xHarbour.org[^]
xHarbour is part of the xBase family languages:
xBase - Wikipedia[
^]
Introduction
Division is a good general purpose algorithm to check if a prime is factor of an integer, but sometimes we can get faster.
Small primes: Faster than division
When dealing with integer factorization, it is common to have a first stage for checking small factors before starting main algorithm. It happen that it is possible to check some of those factors faster than with a division.
Division is an efficient general purpose algorithm, but faster algorithms exist for particular divisors. Those algorithms resort to Mental Calculation techniques.
Some mental calculation techniques allow you to check easily the divisibility of a number for specific factors. The same techniques can be used on computer to check a few factors faster than with divisions because those thechniques also works with base 2.
Checking the Base
Humans Counting
Base 10 is how we count. Everyone can tell instantly if a number is multiple of 10 without resorting to division because the unit digit is all what matters. And everyone can also deduce if multiple of 2 and 5 as they are factors of 10.
2 is a factor if unit is 0, 2, 4, 6, 8.
5 is a factor if unit is 0, 5.
Ex: 3141592653
unit is 3, so 2 and 5 are not factor of 3141592653.
Cost is constant time. O(n) = 1.
Computer Counting
Base 2 is how computers count. The unit allow to know if 2 is a factor of a number. The fast operation is a bitwise AND.
2 is a factor if unit is 0.
Ex: 3141592653 => 0b10111011010000001110011001001101
unit is 1, so 2 is not a factor of 3141592653.
Cost is constant time. O(n) = 1.
if (N & 1) = 0
return(2)
endif
if ((Cand & 1) == 0) {
return 2;
}
Checking Other Factors
Humans Counting
Everyone having learned to do additions and multiplications by hand have also learned how to proof check those operations by using the casting out nines method. The method can be used to know if a number have 3 as factor, and also 11 and others with a little teaking. All this is possible because, 3 is a factor of 10 - 1 and 11 is a factor of 100 - 1.
The casting out nines method is about adding the digits of a number and reapply the method until the result is 1 digit.
Factor 3
The techbique is about checking 9 which is base 10 - 1, 3 being only a factor of 9.
3 is a factor if result is 3, 6 or 9.
Ex: 3141592653 => 3+1+4+1+5+9+2+6+5+3 => 39 => 3+9 => 12 => 1+2 => 3
Result is 3, so 3 is a factor of 3141592653.
Cost for n is log(n). O(n) = log(n).
Factor 11
The technique is about checking 11 which is base 10 + 1. We adapt the method with packets of 2 digits.
for 11, we use the fact that 99 (100-1) is a multiple of 11.
11 is a factor if result is 0, 11, 22, 33 ... 88, 99 or if difference of the 2 digits is 0.
3141592653 => 31+41+59+26+53 => 210 => 2+10 => 12
Result is 12, so 11 is not a factor of 3141592653 because 1<>2.
Cost for n is log(n). O(n) = log(n).
Factor 101
The technique is about checking 101 which is base 10^2 + 1. We adapt the method with packets of 4 digits.
for 101, we use the fact that 9999 (10000-1) is a multiple of 101.
101 is a factor if result is 0, 101, 202, 303 ... 9999. or if difference of pairs of digits is 0.
3141592653 => 31+4159+2653 => 6843
So 101 is not a factor of 3141592653 because 68<>43.
Cost for n is log(n). O(n) = log(n).
Cascade
Those factor check can be combined when the number of digits in a check is a multiple of the number of digit of another.
Because 10000 is a power of 100, the result for factor 101 can be used to check factor 11 and 9.
3141592653 => 31+4159+2653 => 6843 => 101 is not a factor
Because 10000 is a power of 100, the result for 10000 can be reused for 100.
6843 => 68+43 => 111 => 1+11 => 12 => 11 is not a factor
Because 100 is a power of 10, the result for 100 can be reused for 10.
12 => 1+2 => 3 => 3 is a factor
The cost for factors 11 and 3 are in constant time when done after 101.
Computer Counting
The principle of "casting out nines" can be applied to base 2. Processors have very efficient tools that can be used to butcher numbers, those tools are addition/subtraction, bitwise operation and shifting.
Interesting Small Primes
3 2^2-1=3=1*3
5 2^2+1=5=1*5
7 2^3-1=7=1*7
11 2^5+1=33=1*3*11
13 2^6+1=65=1*5*13
17 2^4+1=17=1*17
19 2^9+1=513=1*3*3*3*19
23 2^11-1=2047=1*23*89
29 2^14+1=16385=1*5*29*113
31 2^5-1=31=1*31
37 2^18+1=262145=1*5*13*37*109
41 2^10+1=1025=1*5*5*41
43 2^7+1=129=1*3*43
47 2^23-1=8388607=1*47*178481
53 2^26+1=67108865=1*5*53*157*1613
59 2^29+1=536870913=1*3*59*3033169
Cascade Factors 17, 5 and 3
Ex: 3141592653 to binary 0b10111011010000001110011001001101
Reduce by 2 ^ 16 as intermediate optimization
0b10111011010000001110011001001101 => 0b1011101101000000+0b1110011001001101
=> 0b11010000110001101 => 0b1+0b1010000110001101 => 0b1010000110001110
Check 17: Reduce by 2 ^ 8
0b10100001+0b10001110 => 0b1000101110 => 0b10+0b00101110 => 0b00110000 => 0b0011 0b0000
17 is not a factor of 3141592653 because 0b0011<>0b0000
Check 5: Reduce by 2 ^ 4
0b00110000 => 0b0011+0b0000 => 0b0011 => 0b00 0b11
5 is not a factor of 3141592653 because 0b00<>0b11
Check 3: Reduce by 2 ^ 2
0b0011 => 0b00+0b11 => 0b11
3 is a factor of 3141592653
Cost for n is log(n) + constant. O(n) = log(n).
Intermediate oprimization
In modern processors, registers are 64 bits and simple operation take 1 clockcycle, no matter the size of an operation, the cost is the same. When one want to reduce a 64 bits value to a 8 bits value, it takes 7 operations.
By using intermediate steps, we can divide the size of operation by 2 every time. A 64 bits value is first reduced to 32 bits, the 16 bits, and finally to 8 bits. That 3 operations.
Algorithm
There are 4 cases to handle:
- Factor is a power of 2 minus 1
Reduce to power of 2.
Compare result with factor.
- Factor is a factor of a power of 2 minus 1
Reduce to power of 2.
do a modulo.
- Factor is a power of 2 plus 1
Reduce to double of power of 2.
test if both half are same.
- Factor is a factor of a power of 2 plus 1
Reduce to double of power of 2.
do a modulo on the difference of both half.
and optimizations:
Generator
As things can get complicated, It was easier to automate the code generation.
List of prime to integrate
list={ {3, 2, -1, .F.}, {5, 2, +1, .F.}, {7, 3, -1, .F.}, {11, 5, +1, .T.}, ;
{13, 6, +1, .T.}, {17, 4, +1, .F.}, {19, 9, +1, .T.}, {23, 11, -1, .T.}, ;
{29, 14, +1, .T.}, {31, 5, -1, .F.}, {37, 18, +1, .T.}, {41, 10, +1, .T.}, ;
{43, 7, +1, .T.}, {47, 23, -1, .T.}, {53, 26, +1, .T.}, {59, 29, +1, .T.}}
Topological sort for cascade
for scan=2 to len(list)
ptr= 0
for place= 1 to scan-1
if list[scan,5] % list[place,5] = 0
if list[scan,5] = list[place,5]
tmp= list[scan]
adel(list, scan, .T.)
ains(list, place, tmp, .T.)
ptr=0
exit
endif
if ispower(list[scan,5] / list[place,5], 2)= 1
tmp= list[scan]
adel(list, scan, .T.)
ains(list, place, tmp, .T.)
ptr=0
exit
endif
endif
if list[place,5] % list[scan,5] = 0
if ispower(list[place,5] / list[scan,5], 2)= 1
ptr= place
endif
endif
next
if ptr != 0
tmp= list[scan]
adel(list, scan, .T.)
ains(list, ptr+1, tmp, .T.)
endif
next
Adding intermediate steps
for scan=len(list) to 2 step -1
while list[scan-1,5] % list[scan,5] = 0
tmp= list[scan-1,5] / list[scan,5]
tmpp= ispower(list[scan-1,5] / list[scan,5], 2)
if tmp > 2 .and. tmpp=1
ains(list, scan, {0,0,0,.F.,list[scan,5]*2}, .T.)
else
exit
endif
enddo
while list[scan-1,5] < list[scan,5]
if list[scan,5] < int_size/2
ains(list, scan, {0,0,0,.F.,list[scan,5]*2}, .T.)
else
exit
endif
enddo
next
while list[1,5] < int_size/2
ains(list, 1, {0,0,0,.F.,list[1,5]*2}, .T.)
enddo
Code generator
? int_type+" IsSFactNonDiv("+int_type+" Cand) {"
? " "+int_type+" Mask, Prime, tmp;"
? " int Size;"
? " // fact 2"
? " if ((Cand & 1) == 0) {"
? " return 2;"
? " }"
?
for scan=1 to len(list)
if list[scan,1] != 0
? " Prime="+str(list[scan,1],,,.T.)+";"
endif
if scan=1
org= "Cand"
elseif list[scan,5]*2=list[scan-1,5]
org= "N"+str(list[scan,5]*2,,,.T.)
else
org= "Cand"
endif
if (scan=1 .or. list[scan,5] != list[scan-1,5])
if list[scan,5] < int_size
? " Size="+str(list[scan,5],,,.T.)+";"
? " Mask= (0ULL-1) >> ("+str(int_size,,,.T.)+"-Size);"
if list[scan,3]=0
? " "+int_type+" N"+str(list[scan,5],,,.T.)+"= ("+org+" >> Size) + ("+org+" & Mask);"
else
? " "+int_type+" N"+str(list[scan,5],,,.T.)+"= FactFold("+org+", Size, Mask);"
endif
else
? " "+int_type+" N"+str(list[scan,5],,,.T.)+"= "+org+";"
endif
endif
if list[scan,3]= 1 .and. list[scan,4]
? " Size="+str(list[scan,2],,,.T.)+";"
? " Mask= (0ULL-1) >> ("+str(int_size,,,.T.)+"-Size);"
? " tmp = abs(( N"+str(list[scan,5],,,.T.)+" >> Size) - ( N"+str(list[scan,5],,,.T.)+" & Mask));"
? " tmp = FactReduce(tmp, Prime);"
? " if (tmp == 0) {"
? " return Prime;"
? " }"
elseif list[scan,3]= 1
? " Size="+str(list[scan,2],,,.T.)+";"
? " Mask= (0ULL-1) >> ("+str(int_size,,,.T.)+"-Size);"
? " if (( N"+str(list[scan,5],,,.T.)+" >> Size) == ( N"+str(list[scan,5],,,.T.)+" & Mask)) {"
? " return Prime;"
? " }"
elseif list[scan,3]= -1 .and. list[scan,4]
? " tmp = FactReduce( N"+str(list[scan,5],,,.T.)+", Prime);"
? " if (tmp == 0) {"
? " return Prime;"
? " }"
elseif list[scan,3]= -1
? " if ( N"+str(list[scan,5],,,.T.)+" == Prime) {"
? " return Prime;"
? " }"
endif
?
next
? " return Cand;"
? "}"
Resulting code
function SmallPrimes(N)
if (N & 1) = 0
return(2)
endif
N16= Reduce(N, 16)
if (N16 >> 8) = (N16 & 255)
return(257)
endif
N8= Reduce(N16, 8)
if (N8 >> 4) = (N8 & 15)
return(17)
endif
N4= Reduce(N8, 4)
if (N4 >> 2) = (N4 & 3)
return(5)
endif
N2= Reduce(N4, 2)
if N2 = 3
return(3)
endif
N12= Reduce(N, 12)
Tmp= abs((N12 >> 6) - (N12 & 63))
while Tmp >= 13
Tmp= Tmp- 13
enddo
if Tmp = 0
return(13)
endif
N3= Reduce(N12, 3)
if N3 = 7
return(7)
endif
N10= Reduce(N, 10)
Tmp= abs((N10 >> 5) - (N10 & 31))
while Tmp >= 11
Tmp= Tmp- 11
enddo
if Tmp = 0
return(11)
endif
N5= Reduce(N10, 5)
if N5 = 31
return(31)
endif
N7= Reduce(N, 7)
if N7 = 127
return(127)
endif
return 1
function Reduce(N, Size)
local Tmp, Mask
Tmp= N
Mask= (1 << Size) -1
while (Tmp >> Size) <> 0
tmp= (Tmp >> Size) + (Tmp & Mask)
enddo
return(Tmp)
inline long long FactModulo(long long cand, long long fact) {
while (cand >= fact) {
if (cand & 1) {
cand = cand - fact;
}
cand = cand >> 1;
}
return cand;
}
inline long long FactFold(long long cand, long long shift, long long Mask) {
do {
cand = (cand >> shift) + (cand & Mask);
} while (cand >> shift);
return cand;
}
long long IsSFactNonDiv(long long Cand) {
long long Mask, Prime, tmp;
int Size;
if ((Cand & 1) == 0) {
return 2;
}
Size = 32;
Mask = (0ULL - 1) >> (64 - Size);
long long N32 = (Cand >> Size) + (Cand & Mask);
Size = 16;
Mask = (0ULL - 1) >> (64 - Size);
long long N16 = (N32 >> Size) + (N32 & Mask);
Prime = 17;
Size = 8;
Mask = (0ULL - 1) >> (64 - Size);
long long N8 = FactFold(N16, Size, Mask);
Size = 4;
Mask = (0ULL - 1) >> (64 - Size);
if ((N8 >> Size) == (N8 & Mask)) {
return Prime;
}
Prime = 5;
Size = 4;
Mask = (0ULL - 1) >> (64 - Size);
long long N4 = FactFold(N8, Size, Mask);
Size = 2;
Mask = (0ULL - 1) >> (64 - Size);
if ((N4 >> Size) == (N4 & Mask)) {
return Prime;
}
Prime = 3;
Size = 2;
Mask = (0ULL - 1) >> (64 - Size);
long long N2 = FactFold(N4, Size, Mask);
if (N2 == Prime) {
return Prime;
}
Size = 48;
Mask = (0ULL - 1) >> (64 - Size);
long long N48 = (Cand >> Size) + (Cand & Mask);
Size = 24;
Mask = (0ULL - 1) >> (64 - Size);
long long N24 = (N48 >> Size) + (N48 & Mask);
Prime = 13;
Size = 12;
Mask = (0ULL - 1) >> (64 - Size);
long long N12 = FactFold(N24, Size, Mask);
Size = 6;
Mask = (0ULL - 1) >> (64 - Size);
tmp = abs((N12 >> Size) - (N12 & Mask));
tmp = FactModulo(tmp, Prime);
if (tmp == 0) {
return Prime;
}
Size = 6;
Mask = (0ULL - 1) >> (64 - Size);
long long N6 = (N12 >> Size) + (N12 & Mask);
Prime = 7;
Size = 3;
Mask = (0ULL - 1) >> (64 - Size);
long long N3 = FactFold(N6, Size, Mask);
if (N3 == Prime) {
return Prime;
}
Size = 40;
Mask = (0ULL - 1) >> (64 - Size);
long long N40 = (Cand >> Size) + (Cand & Mask);
Prime = 41;
Size = 20;
Mask = (0ULL - 1) >> (64 - Size);
long long N20 = FactFold(N40, Size, Mask);
Size = 10;
Mask = (0ULL - 1) >> (64 - Size);
tmp = abs((N20 >> Size) - (N20 & Mask));
tmp = FactModulo(tmp, Prime);
if (tmp == 0) {
return Prime;
}
Prime = 11;
Size = 10;
Mask = (0ULL - 1) >> (64 - Size);
long long N10 = FactFold(N20, Size, Mask);
Size = 5;
Mask = (0ULL - 1) >> (64 - Size);
tmp = abs((N10 >> Size) - (N10 & Mask));
tmp = FactModulo(tmp, Prime);
if (tmp == 0) {
return Prime;
}
Prime = 31;
Size = 5;
Mask = (0ULL - 1) >> (64 - Size);
long long N5 = FactFold(N10, Size, Mask);
if (N5 == Prime) {
return Prime;
}
Prime = 37;
Size = 36;
Mask = (0ULL - 1) >> (64 - Size);
long long N36 = FactFold(Cand, Size, Mask);
Size = 18;
Mask = (0ULL - 1) >> (64 - Size);
tmp = abs((N36 >> Size) - (N36 & Mask));
tmp = FactModulo(tmp, Prime);
if (tmp == 0) {
return Prime;
}
Prime = 19;
Size = 18;
Mask = (0ULL - 1) >> (64 - Size);
long long N18 = FactFold(N36, Size, Mask);
Size = 9;
Mask = (0ULL - 1) >> (64 - Size);
tmp = abs((N18 >> Size) - (N18 & Mask));
tmp = FactModulo(tmp, Prime);
if (tmp == 0) {
return Prime;
}
Prime = 23;
Size = 11;
Mask = (0ULL - 1) >> (64 - Size);
long long N11 = FactFold(Cand, Size, Mask);
tmp = FactModulo(N11, Prime);
if (tmp == 0) {
return Prime;
}
Size = 56;
Mask = (0ULL - 1) >> (64 - Size);
long long N56 = (Cand >> Size) + (Cand & Mask);
Prime = 29;
Size = 28;
Mask = (0ULL - 1) >> (64 - Size);
long long N28 = FactFold(N56, Size, Mask);
Size = 14;
Mask = (0ULL - 1) >> (64 - Size);
tmp = abs((N28 >> Size) - (N28 & Mask));
tmp = FactModulo(tmp, Prime);
if (tmp == 0) {
return Prime;
}
Prime = 43;
Size = 14;
Mask = (0ULL - 1) >> (64 - Size);
long long N14 = FactFold(N28, Size, Mask);
Size = 7;
Mask = (0ULL - 1) >> (64 - Size);
tmp = abs((N14 >> Size) - (N14 & Mask));
tmp = FactModulo(tmp, Prime);
if (tmp == 0) {
return Prime;
}
Size = 46;
Mask = (0ULL - 1) >> (64 - Size);
long long N46 = (Cand >> Size) + (Cand & Mask);
Prime = 47;
Size = 23;
Mask = (0ULL - 1) >> (64 - Size);
long long N23 = FactFold(N46, Size, Mask);
tmp = FactModulo(N23, Prime);
if (tmp == 0) {
return Prime;
}
Prime = 53;
Size = 52;
Mask = (0ULL - 1) >> (64 - Size);
long long N52 = FactFold(Cand, Size, Mask);
Size = 26;
Mask = (0ULL - 1) >> (64 - Size);
tmp = abs((N52 >> Size) - (N52 & Mask));
tmp = FactModulo(tmp, Prime);
if (tmp == 0) {
return Prime;
}
Prime = 59;
Size = 58;
Mask = (0ULL - 1) >> (64 - Size);
long long N58 = FactFold(Cand, Size, Mask);
Size = 29;
Mask = (0ULL - 1) >> (64 - Size);
tmp = abs((N58 >> Size) - (N58 & Mask));
tmp = FactModulo(tmp, Prime);
if (tmp == 0) {
return Prime;
}
return Cand;
}
Benchmark
Runtime is so fast that timing is done on repeated execution of routines. Timing done on a i3 3GHz.
Timing
Integer Factor Division My Method
6561 3 9 ms 6 ms
125 5 18.001 ms 4 ms
2401 7 39.002 ms 9 ms
14641 11 41.002 ms 20.001 ms
3601 13 42.002 ms 11 ms
83521 17 52.003 ms 3 ms
49999 49999 147.008 ms 59.003 ms
4611686018427387899 4611686018427387899 143.008 ms 55.003 ms
4611686018427387877 4611686018427387877 143.008 ms 55.003 ms
On last three numbers, my method is 2.5 times faster than division.
Points of Interest
Interest is limited if value is within register size, is value is big integer, the technique can be extended with more gains.
Unfortunately, it works only on a limited set of primes.
External Links:
History
- 20190501: First draft
- 20201230: Second version