It is fairly obvious that the space consumption can be brought down to O(n^0.25). I might write that version next since the current version struggles to go above R(10^15). Or I might post the epsilon analysis first, since there is already enough data to empirically show that epsilon is zero.
You store the first 2 * n ** 0.5 R values encoded in your groups list.
This is false. No R value is ever stored other than the last one. The groups are not R values. The groups are (start, length) pairs of gapless S blocks.
If you computed work_seq(10000) and dumped its internal state before returning, you couldn't just restore it and continue to R(20000), you would have to recompute the R values that were not saved.
This is entirely false due to your misreading of the 'groups' list. No R value is ever stored other than the last one. No R value is ever used to compute the S sequence. The S sequence generates itself -- this observation is key to the group technique's efficiency.