
181 points by ekiauhce | 1 comment
nojvek No.42230025
I’d take this bet.

> With this offer, you can tune your algorithm to my data.

One can generate a 1GB or 10GB file. It is highly likely that a sliding-window search will turn up some repeated pattern in there that can be used to shave off 50-100 bytes.

The decompressor is then essentially: at this index, expand this pattern. The compressed file excludes the byte range of the pattern's second occurrence.
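
To make that concrete, here is a minimal sketch of such a decompressor in Python. The 9-byte header layout (two 4-byte offsets plus a 1-byte length) is an assumption for illustration; the comment doesn't specify a format.

    # Sketch of the proposed "expand one repeated pattern" decompressor.
    # Assumed (hypothetical) format: a 9-byte header -- 4-byte offset of the
    # surviving copy of the pattern, 4-byte offset where the removed copy goes
    # back in, 1-byte pattern length -- followed by the original file with the
    # second copy of the pattern cut out. Assumes the surviving copy comes
    # before the insertion point, and 4-byte offsets (files up to 4 GB).
    import struct
    import sys

    def decompress(blob: bytes) -> bytes:
        src, dst, length = struct.unpack(">IIB", blob[:9])
        body = blob[9:]
        pattern = body[src:src + length]          # first copy is still in the body
        return body[:dst] + pattern + body[dst:]  # re-insert the removed copy

    if __name__ == "__main__":
        sys.stdout.buffer.write(decompress(sys.stdin.buffer.read()))

Note the header already costs 9 bytes, so the repeated run has to be at least 10 bytes long before the compressed file plus header comes out smaller, which is exactly the number the reply below works with.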

One may not always get such a pattern, but over multiple tries it's very feasible. The payout only needs one win in 50 tries.

You could generate a 100GB file. The bigger the file, the higher the chance of a repeated pattern.

The challenge is won if compressed_file + decompressor is at least one byte smaller than the original file.
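
As a concrete check (file names here are placeholders):

    # The winning condition as stated above: total submitted bytes must be
    # smaller than the original file.
    import os
    won = (os.path.getsize("compressed.bin") + os.path.getsize("decompressor")
           < os.path.getsize("original.bin"))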

One could use a self-executing decompressor to save a few bytes of file overhead.

replies(4): >>42230142 #>>42230987 #>>42232566 #>>42237324 #
1. SideQuark No.42232566
Your idea fails pretty easily. Do the math and you'll find it won't work.

One argument: if your method worked, we could compress nearly any file, which violates the pigeonhole principle: there are strictly fewer shorter files than files of a given length, so no lossless scheme can shrink them all.
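
The counting behind that, as a quick Python sanity check (n = 4 is just a small example; the same inequality holds for every length):

    # Pigeonhole: strictly fewer shorter files than files of length n,
    # so some n-byte file must fail to compress.
    n = 4  # small example; the inequality holds for every n >= 1
    files_of_length_n = 256 ** n
    strictly_shorter = sum(256 ** k for k in range(n))  # lengths 0 .. n-1
    assert strictly_shorter < files_of_length_n
    print(files_of_length_n, strictly_shorter)  # 4294967296 16843009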

Another example: for a 4GB file you'd need roughly a 4-byte integer to specify where the repeat starts and another to say where the second copy was, for 8 bytes, plus a length byte. So you need a repeated sequence of 10+ bytes before this compresses anything. There are 256^10 = 2^80 possible 10-byte sequences, and only ~2^32 windows in a 4GB file, so the odds that any particular 10-byte sequence shows up a second time are around 1 in 2^48.
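
Roughly the same arithmetic in Python, plus a birthday-style pair count over the whole file (that last step is my addition, not in the comment above; the windows overlap, so it's only an approximation):

    # Back-of-envelope: chance of a repeated 10-byte run in 4 GB of random
    # data, under the 8-bytes-of-offsets + 1-byte-length costing above.
    header_cost = 4 + 4 + 1          # two 4-byte offsets + 1-byte length
    min_repeat = header_cost + 1     # need >= 10 repeated bytes to save anything
    windows = 2 ** 32                # ~number of 10-byte windows in a 4 GB file
    p_match = 256.0 ** -min_repeat   # two random 10-byte strings match: 2^-80
    # Odds that one particular 10-byte sequence appears a second time:
    print(windows * p_match)                   # ~2^-48
    # Birthday-style: expected repeated pairs anywhere in the file (approximate):
    pairs = windows * (windows - 1) / 2        # ~2^63 pairs of windows
    print(pairs * p_match)                     # ~2^-17, still essentially never

Even the more generous birthday-style figure is roughly 1 in 130,000 per file, so the 50 tries mentioned above don't change the picture.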

Tweak the method and the estimate as you wish; it still fails.