Partial S-groups have been added to the endgame. The time for computing R(10^20) is down to 2 milliseconds in vanilla Python, from the 80 milliseconds of >>67:
$ python3 -m timeit -s 'import flake.hofs as mod' 'mod.work_seq4 (100000000000000000000)'
1000 loops, best of 3: 1.98 msec per loop
The time for computing R(10^30) is down to 45 milliseconds in vanilla Python, from the 14 seconds of >>67:
$ python3 -m timeit -s 'import flake.hofs as mod' 'mod.work_seq4 (1000000000000000000000000000000)'
10 loops, best of 3: 45.3 msec per loop
Such an improvement in runtime cannot be explained by a reduction in the constant factor. The natural conclusion is that the previously stated order of growth of the runtime was wrong, but I do not yet see why. I will give it some thought. Regardless, while adding this trick I realized that it applies not only to the endgame but to all S-group order switches, so I'll add it between the second and third as well.
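The claim that this is not a constant-factor gain can be checked from the four timings alone. What follows is a minimal sketch. It assumes that runtime behaves like c * n^k between the two measured points, and it estimates k from the log-log slope. With only two points per version these exponents are rough. They cannot distinguish a power law from, say, a polylogarithmic factor. The names `loglog_slope`, `old` and `new` are hypothetical and exist only for this illustration.

```python
import math

# Timings in seconds, keyed by n, as reported in >>67 (old) and in this post (new).
old = {10**20: 80e-3, 10**30: 14.0}
new = {10**20: 1.98e-3, 10**30: 45.3e-3}

def loglog_slope(t, n1, n2):
    """Empirical exponent k, assuming runtime ~ c * n**k between n1 and n2."""
    return math.log(t[n2] / t[n1]) / math.log(n2 / n1)

for name, t in (("old", old), ("new", new)):
    print(f"{name}: runtime ~ n^{loglog_slope(t, 10**20, 10**30):.3f}")

# A pure constant-factor speedup would give the same ratio at both sizes.
for label, n in (("10^20", 10**20), ("10^30", 10**30)):
    print(f"speedup at n = {label}: {old[n] / new[n]:.0f}x")
```

The estimated exponent drops from about 0.22 to about 0.14. The speedup grows from roughly 40x at 10^20 to roughly 309x at 10^30. A growing ratio is what a change in order of growth looks like, whereas a constant-factor improvement would keep the ratio fixed.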