[ prog / sol / mona ]

prog


What are you working on?

165 2020-12-31 15:10

When trying to perform integer division of some number n by some divisor d, we may instead wish to approximate this with multiplication by some magic factor followed by integer division with a power of the base of the number system of our hardware. The last step can be performed using a right shift operation, and the multiply-shift combination is likely to be faster than performing integer division. For example our base might be 2 and our divisor 10, as when trying to produce the decimal string representing our number n. The correctness condition we impose is that we must obtain the same result as if we performed the integer division. Since the right shift operation rounds downward on positive integers, our approximation must be no less than the result of the integer division, and may only be greater by an amount that does not overflow into the result for the successor integer. As an example, we cannot allow the result of the integer division 20//10 to be approximated by anything less since downward rounding will follow, and we cannot allow the approximation for the integer division 29//10 to reach as high as 30//10. The correctness condition can be expressed as:

n    n * magic   n + 1
- <= --------- < -----
d    basepower     d  

Taking the cross product of the first inequality yields

basepower <= d * magic

The n has simplified away so this inequality can be made to hold regardless of n. We are left with a d-multiple that is no less than basepower. As an example we might take basepower to be 2^16=65536 and magic to be 6554 which gives

n // 10 <= n * 6554 // 65536

The second inequality yields

n * magic * d < basepower * (n + 1)

Since magic * d is a d-multiple that is no less than basepower, we make this explicit by writing

magic * d = basepower + over

Substitution and simplification yield

n * over < basepower

As an example we might have

6554 * 10 = 65536 + 4

which yields

n * 4 < 65536

Several conclusions can be drawn. One is that minimizing 'over' helps satisfy this inequality. This is equivalent to choosing magic * d to be the smallest d-multiple no less than basepower. Another conclusion is that if 'over' is not 0, it will be at least 1 and thus basepower will be greater than n. The main conclusion, however, is that n has not simplified away and therefore in the general case the inequality will not hold for arbitrary n. For any choice of magic factor and basepower where 'over' is not 0, we can always find a large enough n to break this inequality. It is for this reason that the process of picking an appropriate magic factor and basepower does not depend merely on the base and the divisor, it also depends on the maximum value of n upto which we expect our method to work. To go over this limit we need to find a higher basepower and magic factor to satisfy both inequaliies of the correctness condition. This strategy of iterative search can be implemented as follows:

def diim_basebits (base, target, strict):
   shift = 1
   exp   = base

   while exp < target:
      exp   *= base
      shift += 1

   if strict and (exp == target):
      exp   *= base
      shift += 1

   return (shift, exp)

def work_diim (base, div, limit, *, debug = False):
   msg = print

   if debug:
      msg ("base {:,d} div {:,d} limit {:,d}".format (
         base, div, limit))

   shift, exp = diim_basebits (base, div, False)
   more       = True
   tries      = 0

   while more:
      tries   += 1
      exptodiv = divmod (exp, div)

      if exptodiv [1]:
         factor = exptodiv [0] + 1
         over   = div - exptodiv [1]
      else:
         factor = exptodiv [0]
         over   = 0

      product = over * limit

      if debug:
         msg ("try {:,d} shift {:,d} exp {:,d}".format (
            tries, shift, exp))
         msg ("  factor {:,d} over {:,d} product {:,d}".format (
            factor, over, product))

      if product < exp:
         more = False
      else:
         exp   *= base
         shift += 1

   product     = limit * factor
   pbits, pexp = diim_basebits (base, product, True)

   if debug:
      msg ("x div {:,d} -> (x * {:,d}) >>({:,d}) {:,d}".format (
         div, factor, base, shift))
      msg ("  basebits {:,d} max {:,d}".format (
         pbits, product))

   return {
      "factor":   factor,
      "shift":    shift,
      "basebits": pbits,
   }
166 2020-12-31 15:12

This is an inner working function called by a wrapper that takes care of invalid arguments. An example that divides an u16 by 10:

base 2 div 10 limit 65,535
try 1 shift 4 exp 16
  factor 2 over 4 product 262,140
try 2 shift 5 exp 32
  factor 4 over 8 product 524,280
try 3 shift 6 exp 64
  factor 7 over 6 product 393,210
try 4 shift 7 exp 128
  factor 13 over 2 product 131,070
try 5 shift 8 exp 256
  factor 26 over 4 product 262,140
try 6 shift 9 exp 512
  factor 52 over 8 product 524,280
try 7 shift 10 exp 1,024
  factor 103 over 6 product 393,210
try 8 shift 11 exp 2,048
  factor 205 over 2 product 131,070
try 9 shift 12 exp 4,096
  factor 410 over 4 product 262,140
try 10 shift 13 exp 8,192
  factor 820 over 8 product 524,280
try 11 shift 14 exp 16,384
  factor 1,639 over 6 product 393,210
try 12 shift 15 exp 32,768
  factor 3,277 over 2 product 131,070
try 13 shift 16 exp 65,536
  factor 6,554 over 4 product 262,140
try 14 shift 17 exp 131,072
  factor 13,108 over 8 product 524,280
try 15 shift 18 exp 262,144
  factor 26,215 over 6 product 393,210
try 16 shift 19 exp 524,288
  factor 52,429 over 2 product 131,070
x div 10 -> (x * 52,429) >>(2) 19
  basebits 32 max 3,435,934,515

To divide an u16 by 10 we can multiply by 52,429 then right shift by 19, and we need 32 bits for the multiplication. One can occasionally find these constants in library code that generates decimal representations. Examples:

>>> (12345 * 52429) >> 19
1234
>>> (2020 * 52429) >> 19
202
>>> (1337 * 52429) >> 19
133
167 2020-12-31 15:14

For an u32:

base 2 div 10 limit 4,294,967,295
try 1 shift 4 exp 16
  factor 2 over 4 product 17,179,869,180
try 2 shift 5 exp 32
  factor 4 over 8 product 34,359,738,360
try 3 shift 6 exp 64
  factor 7 over 6 product 25,769,803,770
try 4 shift 7 exp 128
  factor 13 over 2 product 8,589,934,590
try 5 shift 8 exp 256
  factor 26 over 4 product 17,179,869,180
try 6 shift 9 exp 512
  factor 52 over 8 product 34,359,738,360
try 7 shift 10 exp 1,024
  factor 103 over 6 product 25,769,803,770
try 8 shift 11 exp 2,048
  factor 205 over 2 product 8,589,934,590
try 9 shift 12 exp 4,096
  factor 410 over 4 product 17,179,869,180
try 10 shift 13 exp 8,192
  factor 820 over 8 product 34,359,738,360
try 11 shift 14 exp 16,384
  factor 1,639 over 6 product 25,769,803,770
try 12 shift 15 exp 32,768
  factor 3,277 over 2 product 8,589,934,590
try 13 shift 16 exp 65,536
  factor 6,554 over 4 product 17,179,869,180
try 14 shift 17 exp 131,072
  factor 13,108 over 8 product 34,359,738,360
try 15 shift 18 exp 262,144
  factor 26,215 over 6 product 25,769,803,770
try 16 shift 19 exp 524,288
  factor 52,429 over 2 product 8,589,934,590
try 17 shift 20 exp 1,048,576
  factor 104,858 over 4 product 17,179,869,180
try 18 shift 21 exp 2,097,152
  factor 209,716 over 8 product 34,359,738,360
try 19 shift 22 exp 4,194,304
  factor 419,431 over 6 product 25,769,803,770
try 20 shift 23 exp 8,388,608
  factor 838,861 over 2 product 8,589,934,590
try 21 shift 24 exp 16,777,216
  factor 1,677,722 over 4 product 17,179,869,180
try 22 shift 25 exp 33,554,432
  factor 3,355,444 over 8 product 34,359,738,360
try 23 shift 26 exp 67,108,864
  factor 6,710,887 over 6 product 25,769,803,770
try 24 shift 27 exp 134,217,728
  factor 13,421,773 over 2 product 8,589,934,590
try 25 shift 28 exp 268,435,456
  factor 26,843,546 over 4 product 17,179,869,180
try 26 shift 29 exp 536,870,912
  factor 53,687,092 over 8 product 34,359,738,360
try 27 shift 30 exp 1,073,741,824
  factor 107,374,183 over 6 product 25,769,803,770
try 28 shift 31 exp 2,147,483,648
  factor 214,748,365 over 2 product 8,589,934,590
try 29 shift 32 exp 4,294,967,296
  factor 429,496,730 over 4 product 17,179,869,180
try 30 shift 33 exp 8,589,934,592
  factor 858,993,460 over 8 product 34,359,738,360
try 31 shift 34 exp 17,179,869,184
  factor 1,717,986,919 over 6 product 25,769,803,770
try 32 shift 35 exp 34,359,738,368
  factor 3,435,973,837 over 2 product 8,589,934,590
x div 10 -> (x * 3,435,973,837) >>(2) 35
  basebits 64 max 14,757,395,256,390,660,915

we can multiply by 3,435,973,837 then right shift by 35, and we need 64 bits for the multiplication. Examples:

>>> (123456789 * 3435973837) >> 35
12345678
>>> (20202020 * 3435973837) >> 35
2020202
>>> (13371337 * 3435973837) >> 35
1337133
168 2020-12-31 15:16

To divide an u32 by 3:

base 2 div 3 limit 4,294,967,295
try 1 shift 2 exp 4
  factor 2 over 2 product 8,589,934,590
try 2 shift 3 exp 8
  factor 3 over 1 product 4,294,967,295
try 3 shift 4 exp 16
  factor 6 over 2 product 8,589,934,590
try 4 shift 5 exp 32
  factor 11 over 1 product 4,294,967,295
try 5 shift 6 exp 64
  factor 22 over 2 product 8,589,934,590
try 6 shift 7 exp 128
  factor 43 over 1 product 4,294,967,295
try 7 shift 8 exp 256
  factor 86 over 2 product 8,589,934,590
try 8 shift 9 exp 512
  factor 171 over 1 product 4,294,967,295
try 9 shift 10 exp 1,024
  factor 342 over 2 product 8,589,934,590
try 10 shift 11 exp 2,048
  factor 683 over 1 product 4,294,967,295
try 11 shift 12 exp 4,096
  factor 1,366 over 2 product 8,589,934,590
try 12 shift 13 exp 8,192
  factor 2,731 over 1 product 4,294,967,295
try 13 shift 14 exp 16,384
  factor 5,462 over 2 product 8,589,934,590
try 14 shift 15 exp 32,768
  factor 10,923 over 1 product 4,294,967,295
try 15 shift 16 exp 65,536
  factor 21,846 over 2 product 8,589,934,590
try 16 shift 17 exp 131,072
  factor 43,691 over 1 product 4,294,967,295
try 17 shift 18 exp 262,144
  factor 87,382 over 2 product 8,589,934,590
try 18 shift 19 exp 524,288
  factor 174,763 over 1 product 4,294,967,295
try 19 shift 20 exp 1,048,576
  factor 349,526 over 2 product 8,589,934,590
try 20 shift 21 exp 2,097,152
  factor 699,051 over 1 product 4,294,967,295
try 21 shift 22 exp 4,194,304
  factor 1,398,102 over 2 product 8,589,934,590
try 22 shift 23 exp 8,388,608
  factor 2,796,203 over 1 product 4,294,967,295
try 23 shift 24 exp 16,777,216
  factor 5,592,406 over 2 product 8,589,934,590
try 24 shift 25 exp 33,554,432
  factor 11,184,811 over 1 product 4,294,967,295
try 25 shift 26 exp 67,108,864
  factor 22,369,622 over 2 product 8,589,934,590
try 26 shift 27 exp 134,217,728
  factor 44,739,243 over 1 product 4,294,967,295
try 27 shift 28 exp 268,435,456
  factor 89,478,486 over 2 product 8,589,934,590
try 28 shift 29 exp 536,870,912
  factor 178,956,971 over 1 product 4,294,967,295
try 29 shift 30 exp 1,073,741,824
  factor 357,913,942 over 2 product 8,589,934,590
try 30 shift 31 exp 2,147,483,648
  factor 715,827,883 over 1 product 4,294,967,295
try 31 shift 32 exp 4,294,967,296
  factor 1,431,655,766 over 2 product 8,589,934,590
try 32 shift 33 exp 8,589,934,592
  factor 2,863,311,531 over 1 product 4,294,967,295
x div 3 -> (x * 2,863,311,531) >>(2) 33
  basebits 64 max 12,297,829,381,041,378,645

the magic factor is:

>>> hex (2863311531)
'0xaaaaaaab'

Examples:

>>> (123456789 * 2863311531) >> 33
41152263
>>> 123456789 // 3
41152263
>>> (20202020 * 2863311531) >> 33
6734006
>>> 20202020 // 3
6734006
>>> (13371337 * 2863311531) >> 33
4457112
>>> 13371337 // 3
4457112
169 2020-12-31 15:18

If we have a ternary machine and we want to produce hex strings of numbers up to a million:

base 3 div 16 limit 1,000,000
try 1 shift 3 exp 27
  factor 2 over 5 product 5,000,000
try 2 shift 4 exp 81
  factor 6 over 15 product 15,000,000
try 3 shift 5 exp 243
  factor 16 over 13 product 13,000,000
try 4 shift 6 exp 729
  factor 46 over 7 product 7,000,000
try 5 shift 7 exp 2,187
  factor 137 over 5 product 5,000,000
try 6 shift 8 exp 6,561
  factor 411 over 15 product 15,000,000
try 7 shift 9 exp 19,683
  factor 1,231 over 13 product 13,000,000
try 8 shift 10 exp 59,049
  factor 3,691 over 7 product 7,000,000
try 9 shift 11 exp 177,147
  factor 11,072 over 5 product 5,000,000
try 10 shift 12 exp 531,441
  factor 33,216 over 15 product 15,000,000
try 11 shift 13 exp 1,594,323
  factor 99,646 over 13 product 13,000,000
try 12 shift 14 exp 4,782,969
  factor 298,936 over 7 product 7,000,000
try 13 shift 15 exp 14,348,907
  factor 896,807 over 5 product 5,000,000
x div 16 -> (x * 896,807) >>(3) 15
  basebits 26 max 896,807,000,000

we can multiply by 896,807 then right shift by 15 ternary positions. The multiplication requires 26 ternary positions. Examples:

>>> (123456 * 896807) // (3 ** 15)
7716
>>> 123456 // 16
7716
>>> (202020 * 896807) // (3 ** 15)
12626
>>> 202020 // 16
12626
>>> (133713 * 896807) // (3 ** 15)
8357
>>> 133713 // 16
8357

If we meet a time traveler from ancient Sumer who has a base-60 machine, and we want to produce hex strings of numbers up to a million:

base 60 div 16 limit 1,000,000
try 1 shift 1 exp 60
  factor 4 over 4 product 4,000,000
try 2 shift 2 exp 3,600
  factor 225 over 0 product 0
x div 16 -> (x * 225) >>(60) 2
  basebits 5 max 225,000,000

The final 'over' being 0 shows that this will work for arbitrarily large numbers, provided that we have the space for the multiplication. Examples:

>>> (123456 * 225) // (60 ** 2)
7716
>>> 123456 // 16
7716
>>> (202020 * 225) // (60 ** 2)
12626
>>> 202020 // 16
12626
>>> (133713 * 225) // (60 ** 2)
8357
>>> 133713 // 16
8357

The advantage of this iterative method is that it allows the problem to be tackled using only basic knowledge of fractions, without having to resort to rings and modular inverses. Here is a small challenge. Try to find a necessary and sufficient condition for the base and the divisor so that the final 'over' value will be 0. Once you have the condition try to write some code that implements it with some reasonable efficiency.

199


VIP:

do not edit these