While working on the pattern for incrementing the S-group order, I realized that there is no need to call seq3_inc with modified arguments in seq4_inc >>67 after it has already been called in seq4_covered. The two lines:
incS0 = countS1 * S1 + seq3_inc (s, c, 0) [0]
countS0 = covered
from seq4_inc can be replaced with:
countS0 = covered
incS0 = countS0 + countS1
This does not alter the runtime complexity, but it gives a small speed boost by eliminating a few multiplications. The new timing for the u256 overflow R(10^39) is just over 400 milliseconds in vanilla Python.
$ python3 -m timeit -s 'import flake.hofs as mod' 'mod.work_seq4 (1000000000000000000000000000000000000000)'
10 loops, best of 3: 417 msec per loop
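For anyone who wants to take the same kind of measurement from inside Python rather than from the shell, here is a minimal sketch using the standard timeit module. The helper name best_msec and the stand_in function are assumptions made up for illustration; flake.hofs is not reproduced here, so swap in mod.work_seq4 after importing it to time the real thing.

```python
import timeit

def best_msec(func, arg, number=10, repeat=3):
    """Best per-call time in milliseconds over `repeat` runs of `number` calls.

    Mirrors the "10 loops, best of 3" reporting of `python3 -m timeit`.
    """
    totals = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return min(totals) / number * 1000.0

# Hypothetical stand-in workload, NOT the real work_seq4.
# Replace with mod.work_seq4 after `import flake.hofs as mod`.
def stand_in(n):
    return n * (n + 1) // 2

if __name__ == "__main__":
    print("%.3f msec per loop" % best_msec(stand_in, 10**39))
```

The numbers will include a small overhead from the lambda wrapper, which is negligible next to a call that takes hundreds of milliseconds.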
More importantly, a similar relation will hold for all higher S-groups.