IIUC they're transforming the data before compressing it. Also IIUC this is an established method.
Because of the nature of the data and the transform involved, you can get reasonable results with random numbers. That's been done before; what this work adds is brute-forcing seeds to optimize the compression ratio, then re-deriving the transform on the fly from the seed so you don't pay memory bandwidth for it.
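To make that concrete, here's a toy sketch of the seed-search idea as I understand it (all names, block sizes, and seed counts are my own invention, not from the actual work): approximate a block of weights as a small linear combination of pseudorandom basis vectors, and search seeds for the basis that fits best. You then only store the seed plus a few coefficients, and regenerate the basis at decode time.

```python
import numpy as np

def reconstruct(seed, coeffs, n):
    # Regenerate the random basis from the seed and recombine.
    rng = np.random.default_rng(seed)
    basis = rng.standard_normal((len(coeffs), n))
    return coeffs @ basis

def compress_block(block, n_coeffs=4, n_seeds=256):
    # Try many seeds; keep the one whose random basis fits the block
    # best in the least-squares sense. Only (seed, coeffs) is stored.
    best = None
    for seed in range(n_seeds):
        rng = np.random.default_rng(seed)
        basis = rng.standard_normal((n_coeffs, len(block)))
        coeffs, *_ = np.linalg.lstsq(basis.T, block, rcond=None)
        err = np.linalg.norm(block - coeffs @ basis)
        if best is None or err < best[0]:
            best = (err, seed, coeffs)
    return best[1], best[2]

block = np.random.default_rng(42).standard_normal(64)
seed, coeffs = compress_block(block)
approx = reconstruct(seed, coeffs, len(block))
```

The point of the search is that while any random basis works "okay" on average, some seeds happen to fit a given block noticeably better, and a seed is nearly free to store.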
I feel like (again, non-expert) there are much deeper implications here for current ML models. The fact that a randomized transform works this well seems to imply that the weights encode much less information than their sheer size would suggest.
Regarding Pi: you can't encode arbitrary data as indices into an arbitrary sequence and expect to come out ahead on average. But you can encode specific data using algorithms that exploit its specific structure.
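The "on average" part is just a counting argument: the offset where an n-bit pattern first appears in a random stream is typically around 2^n, so storing the offset costs about as many bits as the pattern itself. A quick toy check (stream length, pattern size, and trial count all made up for illustration):

```python
import math
import random

def offset_cost_bits(pattern, stream):
    # Bits needed to store the offset of the first occurrence
    # of `pattern` in `stream`.
    idx = stream.find(pattern)
    assert idx >= 0, "pattern not found in stream"
    return math.ceil(math.log2(idx + 2))

# A fixed pseudorandom bit stream, standing in for digits of Pi.
stream_rng = random.Random(0)
stream = ''.join(stream_rng.choice('01') for _ in range(1 << 20))

n = 12  # pattern length in bits
pattern_rng = random.Random(1)
costs = []
for _ in range(50):
    pattern = ''.join(pattern_rng.choice('01') for _ in range(n))
    costs.append(offset_cost_bits(pattern, stream))
avg = sum(costs) / len(costs)
# avg hovers around n: pointing into the stream saves nothing on average.
```

Occasionally a pattern shows up early and the offset is cheap, but across many patterns the savings wash out, which is the pigeonhole argument in miniature.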