
171 points | pizza | 1 comment
RainyDayTmrw:
How do you reconcile this with the (I believe) widely accepted idea that you can't meaningfully compress data using offsets into Pi?
fc417fc802:
Not an expert but my impression is that the title and intro are worded in a highly misleading manner.

IIUC they're transforming the data before compressing it. Also IIUC this is an established method.

Because of the nature of the data and the transform involved, you can get reasonable results with random numbers. That's already been done, but this work brute-forces seeds to optimize the compression ratio and then derives the transform on the fly from the seed in order to save on memory bandwidth.
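
A minimal sketch of how I read that (my own invention, not the paper's code; the function names, the zlib-size proxy, and the orthogonalized transform are all assumptions on my part): try many seeds, regenerate a random transform from each seed, and keep the seed whose transformed data compresses best. Only the winning seed has to be stored, since the transform can be rebuilt from it at decode time.

```python
# Hypothetical sketch: brute-force a PRNG seed so that a seeded random
# transform makes the data more compressible. Not the article's actual method.
import zlib
import numpy as np

def compressed_size(data: np.ndarray) -> int:
    """Crude proxy for compression ratio: zlib size of the quantized bytes."""
    q = np.clip(np.round(data), -128, 127).astype(np.int8)
    return len(zlib.compress(q.tobytes()))

def seeded_transform(seed: int, dim: int) -> np.ndarray:
    """Regenerate the same random (orthogonalized) transform from a seed,
    so only the seed, not the matrix, needs to be stored or shipped."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))  # invertible basis
    return q

def best_seed(data: np.ndarray, n_seeds: int = 256) -> int:
    """Brute-force seeds and keep the one whose transform compresses best."""
    dim = data.shape[-1]
    sizes = {s: compressed_size(data @ seeded_transform(s, dim)) for s in range(n_seeds)}
    return min(sizes, key=sizes.get)

weights = np.random.default_rng(0).standard_normal((64, 64)) * 20
seed = best_seed(weights)
print("best seed:", seed, "size:", compressed_size(weights @ seeded_transform(seed, 64)))
```

If that reading is right, the memory-bandwidth saving would come from the decoder regenerating the transform from the seed instead of reading a stored matrix.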

I feel like (again, non-expert) there are much deeper implications about current ML models here. The fact that a randomized transform can have this sort of impact seems to imply that the data encodes much less information than we might otherwise expect given its sheer size.

Regarding Pi: you can't encode arbitrary data via offsets into an arbitrary sequence and expect to come out ahead on average. But you can encode specific data with an algorithm whose behavior is matched to that data.
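
A back-of-the-envelope illustration of the Pi point (my own toy experiment, not from the article): to address an arbitrary n-digit string inside Pi you typically need an offset on the order of 10^n, and writing that offset down costs about n digits, so on average you save nothing.

```python
# Rough illustration of why "offsets into Pi" don't compress arbitrary data:
# the offset you'd have to store is, on average, about as long as the data.
from mpmath import mp
import random

mp.dps = 100_010                     # work with ~100k decimal digits of pi
pi_digits = str(mp.pi)[2:]           # drop the leading "3."

random.seed(42)
for n in (3, 4, 5, 6):
    target = "".join(random.choice("0123456789") for _ in range(n))
    offset = pi_digits.find(target)
    if offset < 0:
        print(f"{n}-digit target {target}: not found in the first 100k digits")
    else:
        print(f"{n}-digit target {target}: offset {offset} "
              f"({len(str(offset))} digits to store)")
```

Short targets show up early, but the offset already costs roughly as many digits as the target itself, and longer targets often don't appear at all within any feasible prefix.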

fc417fc802:
Maybe I'm wrong. Figure 2 seems to depict exactly what the title describes: searching for a combination of random numbers that recovers an approximation of the weights. But if that's true then I have the same information-theoretic question that you posed above.
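
For concreteness, here's what that more literal reading would look like as code (entirely hypothetical on my part, including the block size, seed budget, and single-scale reconstruction): for each block of weights, search seeds for the pseudorandom vector that best matches the block, and store only (seed, scale).

```python
# Hypothetical reading of "searching for random numbers that approximate the
# weights": store only (seed, scale) per block and regenerate the vector later.
import numpy as np

def fit_block(block: np.ndarray, n_seeds: int = 4096) -> tuple[int, float]:
    """Find the seed whose unit random vector best matches this weight block."""
    best = (0, 0.0, np.inf)
    for seed in range(n_seeds):
        v = np.random.default_rng(seed).standard_normal(block.size)
        v /= np.linalg.norm(v)
        scale = float(v @ block)                      # least-squares scale along v
        err = float(np.linalg.norm(block - scale * v))
        if err < best[2]:
            best = (seed, scale, err)
    return best[0], best[1]

def decode_block(seed: int, scale: float, size: int) -> np.ndarray:
    """Rebuild the approximation from the stored (seed, scale) pair."""
    v = np.random.default_rng(seed).standard_normal(size)
    return scale * v / np.linalg.norm(v)

block = np.random.default_rng(123).standard_normal(64)
seed, scale = fit_block(block)
approx = decode_block(seed, scale, 64)
print("relative error:", np.linalg.norm(block - approx) / np.linalg.norm(block))
```

You'd expect the relative error here to stay close to 1, since a seed drawn from a few thousand possibilities plus one float simply can't carry 64 floats' worth of information, which is the same counting argument as the Pi question above.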