The GMP port in >>143 of >>101 now has its infrastructure complete and the first two layers ported. The Python version with the same order of growth is in >>36; it computed R(10^20), which overflows u128, in 300 milliseconds. C+GMP takes 50 milliseconds, a speedup by a factor of 6.
$ time bin/gmphofs 100000000000000000000
levels: 5
5000000000942800098290022420982686040347 132
real 0m0,049s
user 0m0,049s
sys 0m0,000s
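A quick sanity check of the overflow claim, as a sketch in plain Python. It assumes the trailing 132 in the output above is the bit length of the result; the value itself is copied from the run.

```python
# Value printed by gmphofs for n = 10^20, copied from the run above.
r = 5000000000942800098290022420982686040347

# Largest value an unsigned 128-bit integer can hold.
U128_MAX = 2**128 - 1

print(r > U128_MAX)    # True: the result does not fit in u128
print(r.bit_length())  # 132, matching the second field of the output
```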
I expect the speedup for higher layers to suffer because a smaller share of the work will be non-layer bookkeeping and a larger share will be multiplications, but to gain from eliminating the allocations that immutable ints require. I do not know which effect will prevail. Both Python ints and gmpy2.mpz values are immutable, so the layer computations do allocate, although gmpy2 helps with caching.
https://gmpy2.readthedocs.io/en/latest/overview.html#miscellaneous-gmpy2-functions -> get_cache
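To make the immutability point concrete, here is a minimal sketch with plain Python ints (not the code from >>36). An "in-place" update cannot modify the existing object, so it has to produce a new one.

```python
a = 10**30
b = a    # second reference to the same int object
a += 1   # ints are immutable: this binds a to a newly created object

print(a is b)             # False: a now refers to a different object
print(b == 10**30)        # True: the original value is untouched
print(a == 10**30 + 1)    # True
```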
But in GMP the mpz_t values are mutable and can be presized, and the port uses this to avoid any allocations in the main loop.
https://gmplib.org/manual/Efficiency
Time to port the third layer.