When trying to perform integer division of some number n by some divisor d, we may instead wish to approximate this with multiplication by some magic factor followed by integer division with a power of the base of the number system of our hardware. The last step can be performed using a right shift operation, and the multiply-shift combination is likely to be faster than performing integer division. For example our base might be 2 and our divisor 10, as when trying to produce the decimal string representing our number n. The correctness condition we impose is that we must obtain the same result as if we performed the integer division. Since the right shift operation rounds downward on positive integers, our approximation must be no less than the result of the integer division, and may only be greater by an amount that does not overflow into the result for the successor integer. As an example, we cannot allow the result of the integer division 20//10 to be approximated by anything less since downward rounding will follow, and we cannot allow the approximation for the integer division 29//10 to reach as high as 30//10. The correctness condition can be expressed as:
n n * magic n + 1
- <= --------- < -----
d basepower d
Taking the cross product of the first inequality yields
basepower <= d * magic
The n has simplified away so this inequality can be made to hold regardless of n. We are left with a d-multiple that is no less than basepower. As an example we might take basepower to be 2^16=65536 and magic to be 6554 which gives
n // 10 <= n * 6554 // 65536
The second inequality yields
n * magic * d < basepower * (n + 1)
Since magic * d is a d-multiple that is no less than basepower, we make this explicit by writing
magic * d = basepower + over
Substitution and simplification yield
n * over < basepower
As an example we might have
6554 * 10 = 65536 + 4
which yields
n * 4 < 65536
Several conclusions can be drawn. One is that minimizing 'over' helps satisfy this inequality. This is equivalent to choosing magic * d to be the smallest d-multiple no less than basepower. Another conclusion is that if 'over' is not 0, it will be at least 1 and thus basepower will be greater than n. The main conclusion, however, is that n has not simplified away and therefore in the general case the inequality will not hold for arbitrary n. For any choice of magic factor and basepower where 'over' is not 0, we can always find a large enough n to break this inequality. It is for this reason that the process of picking an appropriate magic factor and basepower does not depend merely on the base and the divisor, it also depends on the maximum value of n upto which we expect our method to work. To go over this limit we need to find a higher basepower and magic factor to satisfy both inequaliies of the correctness condition. This strategy of iterative search can be implemented as follows:
def diim_basebits (base, target, strict):
shift = 1
exp = base
while exp < target:
exp *= base
shift += 1
if strict and (exp == target):
exp *= base
shift += 1
return (shift, exp)
def work_diim (base, div, limit, *, debug = False):
msg = print
if debug:
msg ("base {:,d} div {:,d} limit {:,d}".format (
base, div, limit))
shift, exp = diim_basebits (base, div, False)
more = True
tries = 0
while more:
tries += 1
exptodiv = divmod (exp, div)
if exptodiv [1]:
factor = exptodiv [0] + 1
over = div - exptodiv [1]
else:
factor = exptodiv [0]
over = 0
product = over * limit
if debug:
msg ("try {:,d} shift {:,d} exp {:,d}".format (
tries, shift, exp))
msg (" factor {:,d} over {:,d} product {:,d}".format (
factor, over, product))
if product < exp:
more = False
else:
exp *= base
shift += 1
product = limit * factor
pbits, pexp = diim_basebits (base, product, True)
if debug:
msg ("x div {:,d} -> (x * {:,d}) >>({:,d}) {:,d}".format (
div, factor, base, shift))
msg (" basebits {:,d} max {:,d}".format (
pbits, product))
return {
"factor": factor,
"shift": shift,
"basebits": pbits,
}
This is an inner working function called by a wrapper that takes care of invalid arguments. An example that divides an u16 by 10:
base 2 div 10 limit 65,535
try 1 shift 4 exp 16
factor 2 over 4 product 262,140
try 2 shift 5 exp 32
factor 4 over 8 product 524,280
try 3 shift 6 exp 64
factor 7 over 6 product 393,210
try 4 shift 7 exp 128
factor 13 over 2 product 131,070
try 5 shift 8 exp 256
factor 26 over 4 product 262,140
try 6 shift 9 exp 512
factor 52 over 8 product 524,280
try 7 shift 10 exp 1,024
factor 103 over 6 product 393,210
try 8 shift 11 exp 2,048
factor 205 over 2 product 131,070
try 9 shift 12 exp 4,096
factor 410 over 4 product 262,140
try 10 shift 13 exp 8,192
factor 820 over 8 product 524,280
try 11 shift 14 exp 16,384
factor 1,639 over 6 product 393,210
try 12 shift 15 exp 32,768
factor 3,277 over 2 product 131,070
try 13 shift 16 exp 65,536
factor 6,554 over 4 product 262,140
try 14 shift 17 exp 131,072
factor 13,108 over 8 product 524,280
try 15 shift 18 exp 262,144
factor 26,215 over 6 product 393,210
try 16 shift 19 exp 524,288
factor 52,429 over 2 product 131,070
x div 10 -> (x * 52,429) >>(2) 19
basebits 32 max 3,435,934,515
To divide an u16 by 10 we can multiply by 52,429 then right shift by 19, and we need 32 bits for the multiplication. One can occasionally find these constants in library code that generates decimal representations. Examples:
>>> (12345 * 52429) >> 19
1234
>>> (2020 * 52429) >> 19
202
>>> (1337 * 52429) >> 19
133
For an u32:
base 2 div 10 limit 4,294,967,295
try 1 shift 4 exp 16
factor 2 over 4 product 17,179,869,180
try 2 shift 5 exp 32
factor 4 over 8 product 34,359,738,360
try 3 shift 6 exp 64
factor 7 over 6 product 25,769,803,770
try 4 shift 7 exp 128
factor 13 over 2 product 8,589,934,590
try 5 shift 8 exp 256
factor 26 over 4 product 17,179,869,180
try 6 shift 9 exp 512
factor 52 over 8 product 34,359,738,360
try 7 shift 10 exp 1,024
factor 103 over 6 product 25,769,803,770
try 8 shift 11 exp 2,048
factor 205 over 2 product 8,589,934,590
try 9 shift 12 exp 4,096
factor 410 over 4 product 17,179,869,180
try 10 shift 13 exp 8,192
factor 820 over 8 product 34,359,738,360
try 11 shift 14 exp 16,384
factor 1,639 over 6 product 25,769,803,770
try 12 shift 15 exp 32,768
factor 3,277 over 2 product 8,589,934,590
try 13 shift 16 exp 65,536
factor 6,554 over 4 product 17,179,869,180
try 14 shift 17 exp 131,072
factor 13,108 over 8 product 34,359,738,360
try 15 shift 18 exp 262,144
factor 26,215 over 6 product 25,769,803,770
try 16 shift 19 exp 524,288
factor 52,429 over 2 product 8,589,934,590
try 17 shift 20 exp 1,048,576
factor 104,858 over 4 product 17,179,869,180
try 18 shift 21 exp 2,097,152
factor 209,716 over 8 product 34,359,738,360
try 19 shift 22 exp 4,194,304
factor 419,431 over 6 product 25,769,803,770
try 20 shift 23 exp 8,388,608
factor 838,861 over 2 product 8,589,934,590
try 21 shift 24 exp 16,777,216
factor 1,677,722 over 4 product 17,179,869,180
try 22 shift 25 exp 33,554,432
factor 3,355,444 over 8 product 34,359,738,360
try 23 shift 26 exp 67,108,864
factor 6,710,887 over 6 product 25,769,803,770
try 24 shift 27 exp 134,217,728
factor 13,421,773 over 2 product 8,589,934,590
try 25 shift 28 exp 268,435,456
factor 26,843,546 over 4 product 17,179,869,180
try 26 shift 29 exp 536,870,912
factor 53,687,092 over 8 product 34,359,738,360
try 27 shift 30 exp 1,073,741,824
factor 107,374,183 over 6 product 25,769,803,770
try 28 shift 31 exp 2,147,483,648
factor 214,748,365 over 2 product 8,589,934,590
try 29 shift 32 exp 4,294,967,296
factor 429,496,730 over 4 product 17,179,869,180
try 30 shift 33 exp 8,589,934,592
factor 858,993,460 over 8 product 34,359,738,360
try 31 shift 34 exp 17,179,869,184
factor 1,717,986,919 over 6 product 25,769,803,770
try 32 shift 35 exp 34,359,738,368
factor 3,435,973,837 over 2 product 8,589,934,590
x div 10 -> (x * 3,435,973,837) >>(2) 35
basebits 64 max 14,757,395,256,390,660,915
we can multiply by 3,435,973,837 then right shift by 35, and we need 64 bits for the multiplication. Examples:
>>> (123456789 * 3435973837) >> 35
12345678
>>> (20202020 * 3435973837) >> 35
2020202
>>> (13371337 * 3435973837) >> 35
1337133
To divide an u32 by 3:
base 2 div 3 limit 4,294,967,295
try 1 shift 2 exp 4
factor 2 over 2 product 8,589,934,590
try 2 shift 3 exp 8
factor 3 over 1 product 4,294,967,295
try 3 shift 4 exp 16
factor 6 over 2 product 8,589,934,590
try 4 shift 5 exp 32
factor 11 over 1 product 4,294,967,295
try 5 shift 6 exp 64
factor 22 over 2 product 8,589,934,590
try 6 shift 7 exp 128
factor 43 over 1 product 4,294,967,295
try 7 shift 8 exp 256
factor 86 over 2 product 8,589,934,590
try 8 shift 9 exp 512
factor 171 over 1 product 4,294,967,295
try 9 shift 10 exp 1,024
factor 342 over 2 product 8,589,934,590
try 10 shift 11 exp 2,048
factor 683 over 1 product 4,294,967,295
try 11 shift 12 exp 4,096
factor 1,366 over 2 product 8,589,934,590
try 12 shift 13 exp 8,192
factor 2,731 over 1 product 4,294,967,295
try 13 shift 14 exp 16,384
factor 5,462 over 2 product 8,589,934,590
try 14 shift 15 exp 32,768
factor 10,923 over 1 product 4,294,967,295
try 15 shift 16 exp 65,536
factor 21,846 over 2 product 8,589,934,590
try 16 shift 17 exp 131,072
factor 43,691 over 1 product 4,294,967,295
try 17 shift 18 exp 262,144
factor 87,382 over 2 product 8,589,934,590
try 18 shift 19 exp 524,288
factor 174,763 over 1 product 4,294,967,295
try 19 shift 20 exp 1,048,576
factor 349,526 over 2 product 8,589,934,590
try 20 shift 21 exp 2,097,152
factor 699,051 over 1 product 4,294,967,295
try 21 shift 22 exp 4,194,304
factor 1,398,102 over 2 product 8,589,934,590
try 22 shift 23 exp 8,388,608
factor 2,796,203 over 1 product 4,294,967,295
try 23 shift 24 exp 16,777,216
factor 5,592,406 over 2 product 8,589,934,590
try 24 shift 25 exp 33,554,432
factor 11,184,811 over 1 product 4,294,967,295
try 25 shift 26 exp 67,108,864
factor 22,369,622 over 2 product 8,589,934,590
try 26 shift 27 exp 134,217,728
factor 44,739,243 over 1 product 4,294,967,295
try 27 shift 28 exp 268,435,456
factor 89,478,486 over 2 product 8,589,934,590
try 28 shift 29 exp 536,870,912
factor 178,956,971 over 1 product 4,294,967,295
try 29 shift 30 exp 1,073,741,824
factor 357,913,942 over 2 product 8,589,934,590
try 30 shift 31 exp 2,147,483,648
factor 715,827,883 over 1 product 4,294,967,295
try 31 shift 32 exp 4,294,967,296
factor 1,431,655,766 over 2 product 8,589,934,590
try 32 shift 33 exp 8,589,934,592
factor 2,863,311,531 over 1 product 4,294,967,295
x div 3 -> (x * 2,863,311,531) >>(2) 33
basebits 64 max 12,297,829,381,041,378,645
the magic factor is:
>>> hex (2863311531)
'0xaaaaaaab'
Examples:
>>> (123456789 * 2863311531) >> 33
41152263
>>> 123456789 // 3
41152263
>>> (20202020 * 2863311531) >> 33
6734006
>>> 20202020 // 3
6734006
>>> (13371337 * 2863311531) >> 33
4457112
>>> 13371337 // 3
4457112
If we have a ternary machine and we want to produce hex strings of numbers up to a million:
base 3 div 16 limit 1,000,000
try 1 shift 3 exp 27
factor 2 over 5 product 5,000,000
try 2 shift 4 exp 81
factor 6 over 15 product 15,000,000
try 3 shift 5 exp 243
factor 16 over 13 product 13,000,000
try 4 shift 6 exp 729
factor 46 over 7 product 7,000,000
try 5 shift 7 exp 2,187
factor 137 over 5 product 5,000,000
try 6 shift 8 exp 6,561
factor 411 over 15 product 15,000,000
try 7 shift 9 exp 19,683
factor 1,231 over 13 product 13,000,000
try 8 shift 10 exp 59,049
factor 3,691 over 7 product 7,000,000
try 9 shift 11 exp 177,147
factor 11,072 over 5 product 5,000,000
try 10 shift 12 exp 531,441
factor 33,216 over 15 product 15,000,000
try 11 shift 13 exp 1,594,323
factor 99,646 over 13 product 13,000,000
try 12 shift 14 exp 4,782,969
factor 298,936 over 7 product 7,000,000
try 13 shift 15 exp 14,348,907
factor 896,807 over 5 product 5,000,000
x div 16 -> (x * 896,807) >>(3) 15
basebits 26 max 896,807,000,000
we can multiply by 896,807 then right shift by 15 ternary positions. The multiplication requires 26 ternary positions. Examples:
>>> (123456 * 896807) // (3 ** 15)
7716
>>> 123456 // 16
7716
>>> (202020 * 896807) // (3 ** 15)
12626
>>> 202020 // 16
12626
>>> (133713 * 896807) // (3 ** 15)
8357
>>> 133713 // 16
8357
If we meet a time traveler from ancient Sumer who has a base-60 machine, and we want to produce hex strings of numbers up to a million:
base 60 div 16 limit 1,000,000
try 1 shift 1 exp 60
factor 4 over 4 product 4,000,000
try 2 shift 2 exp 3,600
factor 225 over 0 product 0
x div 16 -> (x * 225) >>(60) 2
basebits 5 max 225,000,000
The final 'over' being 0 shows that this will work for arbitrarily large numbers, provided that we have the space for the multiplication. Examples:
>>> (123456 * 225) // (60 ** 2)
7716
>>> 123456 // 16
7716
>>> (202020 * 225) // (60 ** 2)
12626
>>> 202020 // 16
12626
>>> (133713 * 225) // (60 ** 2)
8357
>>> 133713 // 16
8357
The advantage of this iterative method is that it allows the problem to be tackled using only basic knowledge of fractions, without having to resort to rings and modular inverses. Here is a small challenge. Try to find a necessary and sufficient condition for the base and the divisor so that the final 'over' value will be 0. Once you have the condition try to write some code that implements it with some reasonable efficiency.