
181 points by ekiauhce | 1 comment
nojvek No.42230025
I’d take this bet.

> With this offer, you can tune your algorithm to my data.

One can generate a 1GB or 10GB file. It is highly likely that a sliding-window search will turn up some repeated pattern in there that can be used to shave off 50-100 bytes.

The decompressor is then essentially: at this index, expand this pattern. The compressed file excludes the byte range of the pattern's second occurrence.
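
To make that concrete, here is a minimal sketch of such a decompressor in Python. The 9-byte header layout (two 4-byte offsets plus a 1-byte length) is an assumption for illustration; the comment doesn't specify a format.

    # Sketch of the proposed "expand one repeated pattern" decompressor.
    # Assumed (hypothetical) format: a 9-byte header -- 4-byte offset of the
    # surviving copy of the pattern, 4-byte offset where the removed copy goes
    # back in, 1-byte pattern length -- followed by the original file with the
    # second copy of the pattern cut out. Assumes the surviving copy comes
    # before the insertion point, and 4-byte offsets (files up to 4 GB).
    import struct
    import sys

    def decompress(blob: bytes) -> bytes:
        src, dst, length = struct.unpack(">IIB", blob[:9])
        body = blob[9:]
        pattern = body[src:src + length]          # first copy is still in the body
        return body[:dst] + pattern + body[dst:]  # re-insert the removed copy

    if __name__ == "__main__":
        sys.stdout.buffer.write(decompress(sys.stdin.buffer.read()))

Note the header already costs 9 bytes, so the repeated run has to be at least 10 bytes long before the compressed file plus header comes out smaller, which is exactly the number the reply below works with.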

One may not always get such a pattern, but over multiple tries it's very feasible. The payout only needs one win in 50 tries.

You could generate a 100GB file. The bigger the file, the higher the chance of a repeated pattern.

The challenge is won if compressed_file + decompressor is at least one byte smaller than the original file.
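
As a concrete check (file names here are placeholders):

    # The winning condition as stated above: total submitted bytes must be
    # smaller than the original file.
    import os
    won = (os.path.getsize("compressed.bin") + os.path.getsize("decompressor")
           < os.path.getsize("original.bin"))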

One could use a self-executing decompressor to save a few bytes of file overhead.

replies(4): >>42230142 #>>42230987 #>>42232566 #>>42237324 #
1. SideQuark No.42232566
Your idea fails pretty easily. Do the math and you'll find it won't work.

One argument: if your method worked, we could compress nearly any file, which violates the pigeonhole principle: there are strictly fewer shorter files than files of a given length, so no lossless scheme can shrink them all.
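
The counting behind that, as a quick Python sanity check (n = 4 is just a small example; the same inequality holds for every length):

    # Pigeonhole: strictly fewer shorter files than files of length n,
    # so some n-byte file must fail to compress.
    n = 4  # small example; the inequality holds for every n >= 1
    files_of_length_n = 256 ** n
    strictly_shorter = sum(256 ** k for k in range(n))  # lengths 0 .. n-1
    assert strictly_shorter < files_of_length_n
    print(files_of_length_n, strictly_shorter)  # 4294967296 16843009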

Another example: for a 4GB file you'd need roughly a 4-byte integer to specify where the repeat starts and another to say where the second copy was, for 8 bytes, plus a length byte. So you need a repeated sequence of 10+ bytes before this compresses anything. There are 256^10 = 2^80 possible 10-byte sequences, and only ~2^32 windows in a 4GB file, so the odds that any particular 10-byte sequence shows up a second time are around 1 in 2^48.
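
Roughly the same arithmetic in Python, plus a birthday-style pair count over the whole file (that last step is my addition, not in the comment above; the windows overlap, so it's only an approximation):

    # Back-of-envelope: chance of a repeated 10-byte run in 4 GB of random
    # data, under the 8-bytes-of-offsets + 1-byte-length costing above.
    header_cost = 4 + 4 + 1          # two 4-byte offsets + 1-byte length
    min_repeat = header_cost + 1     # need >= 10 repeated bytes to save anything
    windows = 2 ** 32                # ~number of 10-byte windows in a 4 GB file
    p_match = 256.0 ** -min_repeat   # two random 10-byte strings match: 2^-80
    # Odds that one particular 10-byte sequence appears a second time:
    print(windows * p_match)                   # ~2^-48
    # Birthday-style: expected repeated pairs anywhere in the file (approximate):
    pairs = windows * (windows - 1) / 2        # ~2^63 pairs of windows
    print(pairs * p_match)                     # ~2^-17, still essentially never

Even the more generous birthday-style figure is roughly 1 in 130,000 per file, so the 50 tries mentioned above don't change the picture.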

Tweak the method and the estimate as you wish; it still fails.