
181 points ekiauhce | 2 comments
nojvek ◴[] No.42230025[source]
I’d take this bet.

> With this offer, you can tune your algorithm to my data.

One can generate a 1GB or 10GB file. It is highly likely that there is some repeatable pattern in there long enough to shave off 50-100 bytes via a sliding-window search.

Then the decompressor is essentially: at this index, expand this pattern. The compressed file excludes the range covered by the pattern.

One may not always get such a pattern, but over multiple tries it’s very feasible. The payout just needs one win in 50 tries.

You could generate a 100GB file. The bigger the file, the higher the chance of a repeating pattern.

The challenge is won if compressed_file + decompressor is at least one byte smaller than the original file.

One could use a self-executing decompressor to save some file-overhead bytes.
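
A minimal sketch of that scheme, assuming nothing beyond what the comment describes (the function names, 4-byte offsets, and minimum match length are all illustrative, and a real attempt on a multi-GB file would need a rolling hash or suffix array rather than this naive dictionary scan):

    import os

    def find_repeat(data: bytes, k: int) -> tuple[int, int] | None:
        # Return positions (src, dst) of two non-overlapping occurrences
        # of some k-byte substring, or None if no repeat is found.
        seen = {}
        for i in range(len(data) - k + 1):
            chunk = data[i:i + k]
            if chunk in seen:
                if seen[chunk] + k <= i:      # skip overlapping repeats
                    return seen[chunk], i
            else:
                seen[chunk] = i
        return None

    def compress(data: bytes, k: int = 10) -> bytes | None:
        # Drop one copy of a repeated k-byte run; prepend a 9-byte header
        # (two 4-byte offsets + 1-byte length) telling the decompressor
        # where to paste it back.
        hit = find_repeat(data, k)
        if hit is None:
            return None
        src, dst = hit
        header = src.to_bytes(4, "big") + dst.to_bytes(4, "big") + bytes([k])
        return header + data[:dst] + data[dst + k:]

    def decompress(blob: bytes) -> bytes:
        src = int.from_bytes(blob[0:4], "big")
        dst = int.from_bytes(blob[4:8], "big")
        k = blob[8]
        body = blob[9:]
        # body[:dst] is untouched, so the surviving copy still sits at src.
        return body[:dst] + body[src:src + k] + body[dst:]

    data = os.urandom(1 << 20)        # 1 MiB of random bytes
    blob = compress(data, k=4)        # 4-byte repeats are easy to find...
    if blob is not None:
        assert decompress(blob) == data
        print(len(blob) - len(data))  # ...but the 9-byte header costs more

With the default k = 10 (long enough to beat the 9-byte header), the search comes up empty on realistic file sizes, which is what the replies below get into.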

replies(4): >>42230142 #>>42230987 #>>42232566 #>>42237324 #
1. crazygringo ◴[] No.42230987[source]
As a general principle, absolutely.

In practice, I wonder what size of file we're talking about that would result in net compression on random data 50% of the time?

I have no intuition whether it's 1 GB, 1 TB, 1 PB, or beyond.

replies(1): >>42232573 #
2. SideQuark ◴[] No.42232573[source]
Nope. As your file gets longer, so does the amount of data needed to specify where the repeats are. See my other estimate in this thread. It fails spectacularly.
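
A rough back-of-envelope of that failure mode (not the estimate referenced above, and it assumes uniformly random bytes): an N-byte file has about N² position pairs, each matching for L bytes with probability 256^-L, so the longest repeat you can expect is roughly 2·log256(N) bytes, which is the same size as the two offsets needed to point at it.

    import math

    def expected_longest_repeat_bytes(n: float) -> float:
        # E[# of L-byte repeats] ~ n^2 * 256**-L; the longest repeat you can
        # expect is roughly the L where that expectation falls to 1.
        return 2 * math.log(n, 256)

    def pointer_cost_bytes(n: float) -> float:
        # Two offsets into an n-byte file, log2(n)/8 bytes each
        # (length byte and the decompressor itself not counted).
        return 2 * math.log2(n) / 8

    for label, n in [("1 GB", 1e9), ("1 TB", 1e12), ("1 PB", 1e15)]:
        print(f"{label}: ~{expected_longest_repeat_bytes(n):.1f}-byte repeat "
              f"vs ~{pointer_cost_bytes(n):.1f}-byte pointer")

The two quantities grow at the same rate (about 7.5 bytes at 1 GB, 10 at 1 TB, 12.5 at 1 PB), so a bigger file never opens up a margin; the length field and the decompressor itself only push it further underwater.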