Here is what happened with the runtime. The stated power is the runtime of the highest-order loop, and I naively assumed that because this gave the overall runtime in >>36, it will continue to do so in >>67. This assumption was false.
In >>36 the highest-order loop folds second-order S-groups. It continues until eventually one of the second-order S-groups doesn't fit because there aren't enough first-order S-groups left to fill it up. The number of passes through the loop is proportional to n^(1/4), and the number of first-order S-groups that the unsatisfied final second-order S-group wanted to consume is also proportional to n^(1/4). These match up nicely here but not in >>67. The number of first-order S-groups that the unsatisfied final second-order S-group had available gives the number of passes through the second 'while' loop of >>36. This 'while' loop will fold the remaining first-order S-groups one by one. Both loops having O(n^(1/4)) passes the overall runtime is O(n^(1/4)).
But in >>67 this chain of reasoning only holds for the first two 'while' loops. The first one folds G3s until a final G3 wants more G2s than are left. The number of passes is O(n^(1/8)), and so is the number of leftover G2s. The latter gives the number of passes through the second 'while' loop. This loop folds the remaining G2s one by one. Eventually a final G2 doesn't fit, not having enough G1s left to fill it up. The remaining G1s are consumed by the third loop one by one. However, groups contain each other via a chain of square roots, and expand back into each other as squares. So the number of leftover G1s is now O(n^(1/4)), and thus the third loop degrades the overall runtime back to O(n^(1/4)).
This only showed up for larger numbers because it had to overcome the much higher constant factor of the first loop. In the new, corrected version all the 'while' loops except the first one have been eliminated, and have been replaced with a single partial S-group each. The first loop now determines the overall runtime because it is the only S-group loop left. So the runtime is now properly O(n^(1/8)), and I'll also post some empirical data to back this up. In the meantime here is R(10^41):
$ g4 41
k 41 index 100000000000000000000000000000000000000000
levels: 5
100000000000000000000000000000000000000000 -> 5000000000000000000029814239698402168816728284052931912470389462588984951289354475 0xA8ACD7C0222311BCD695E19C79874B9A881767964E801B562297EFD84483853F18EB 272