
181 points ekiauhce | 3 comments
ccleve ◴[] No.42224858[source]
I wonder if it would have been possible to win the challenge legitimately?

If a randomly-generated file happened to contain some redundancy through sheer chance, you could hand-craft a compressor to take advantage of it. This compressor would not work in general for random data, but it could work for this one particular case.

It's a bet worth taking, because the payoff, 50:1 ($5,000 to $100), is pretty good. Play the game 50 times and you might get a file you could compress.

The challenge, then, would be for the person offering the bet to generate a really random file that contained no such redundancy. That might not be easy.
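As a rough sketch of what "take advantage of it" could look like in practice (the thresholds and the 100-byte decompressor budget here are illustrative assumptions, not part of the challenge), you could scan each candidate file for obvious slack before deciding whether to play:

    # Sketch: decide whether a given "random" file looks weak enough to attack.
    # The decompressor-size budget and the scoring are assumptions for illustration.
    import collections
    import sys

    DECOMPRESSOR_BUDGET = 100  # bytes the hand-written decompressor might cost

    def longest_run(data: bytes) -> int:
        best = run = 1
        for prev, cur in zip(data, data[1:]):
            run = run + 1 if cur == prev else 1
            best = max(best, run)
        return best

    def most_repeated_window(data: bytes, width: int = 8) -> int:
        counts = collections.Counter(data[i:i + width] for i in range(len(data) - width))
        return counts.most_common(1)[0][1] if counts else 0

    data = open(sys.argv[1], "rb").read()
    # Crude savings estimate: a long single-byte run saves about its own length;
    # a repeated 8-byte window saves roughly 7 bytes per extra occurrence.
    estimate = max(longest_run(data), 7 * (most_repeated_window(data) - 1))
    print("worth attempting" if estimate > DECOMPRESSOR_BUDGET else "pass")

For a genuinely random multi-megabyte file the estimate will essentially never clear the budget, which is what the replies below argue.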

replies(3): >>42224907 #>>42225027 #>>42225057 #
dmurray ◴[] No.42224907[source]
I think you can make some argument about why this isn't possible at 50:1 odds.

A plausible "decompressor" is at least, say, 30 or 100 bytes, so the random file needs to contain at least 30 bytes (240 bits) less entropy than expected. By a counting argument that happens with probability at most about 2^-240, which is astronomically smaller than 1/50. Sum over the whole domain of reasonable decompressors and you still don't get anywhere close.

This argument could do with more rigor, but I think it's correct. Give me 100 million to 1 odds, though, and I'll take my chances trying to brute force a compressor.
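To put a hedged number on it (the 30-byte decompressor size is the assumption above; the bound below doesn't even charge for the decompressor, which only makes things worse for the player):

    # Counting bound: there are fewer than 2^(N-k+1) bit strings shorter than
    # N-k+1 bits, so at most about 2^-k of all N-bit files can be described
    # in k or more fewer bits -- regardless of how clever the scheme is.
    DECOMPRESSOR_BYTES = 30            # assumed minimum size of a working decompressor
    k = 8 * DECOMPRESSOR_BYTES         # bits of slack the file must contain

    p_win_bound = 2.0 ** (1 - k)       # upper bound on P(a random file is that weak)
    print(f"P(win) <= 2^{1 - k} ~ {p_win_bound:.2e}")   # about 1e-72
    print(f"break-even at 50:1 needs P(win) >= {1 / 50}")

Even multiplying that bound by a generous count of "reasonable decompressors" leaves you dozens of orders of magnitude short of 1/50.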

replies(1): >>42225074 #
lambdaone ◴[] No.42225074[source]
This is actually an extremely interesting question. 'Weak' files that are more easily compressible than others certainly exist, but they occur with low probability.

For example, the all-zeros file is a member of the set of all random 3 megabyte files, and it would certainly be possible to compress it if you were lucky enough to receive it - albeit something that is unlikely to happen even once in the lifetime of the universe, given current physical theories.

Is it possible to quantify this idea of a 'weak' file more accurately?
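One crude way to attach a number to it (using an off-the-shelf compressor as a computable stand-in for the ideal, uncomputable measure):

    # "Weakness" here = bytes saved by a general-purpose compressor.
    # A truly random 3 MB file comes out slightly negative (zlib adds framing
    # overhead); a genuinely weak file comes out positive.
    import os
    import zlib

    def weakness(data: bytes) -> int:
        return len(data) - len(zlib.compress(data, 9))

    print(weakness(os.urandom(3 * 1024 * 1024)))   # slightly negative: no slack
    print(weakness(bytes(3 * 1024 * 1024)))        # all zeros: almost 3 MB saved

Of course zlib only finds certain kinds of structure (repeats and byte-frequency bias), so this can only underestimate how weak a file really is.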

replies(4): >>42225130 #>>42225562 #>>42232581 #>>42239570 #
1. l33t7332273 ◴[] No.42225562[source]
One thing you can do, as the other commenter pointed out, is consider entropy of the file.

However, this restriction is stronger than the challenge requires. We don't actually need a file with low overall entropy; in fact, I claim that weak files exist with entropy 8 - epsilon bits per byte (8 being the maximum) for every epsilon > 0.

What we actually need is for a sufficiently large chunk of the file to have low entropy, where "sufficiently large" is measured in absolute terms, not relative to the file size.

A very simple example would be to take a very large file with maximum entropy and append 200 zero bytes to the end. That barely decreases the entropy of the file, but it admits a special-purpose compression algorithm that should be able to save ~100 bytes.
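A minimal sketch of that special-purpose scheme (the trailing-zero trick is as described above; the exact encoding, with a 4-byte length footer, is just one way to do it):

    # Compressor/decompressor pair specialised for files that end in a run of
    # zero bytes: strip the zeros and record their count in a 4-byte footer.
    # The challenge counts the decompressor's own size against you, which is
    # presumably why the net estimate is ~100 bytes rather than ~200.
    import os
    import struct

    def compress(data: bytes) -> bytes:
        stripped = data.rstrip(b"\x00")
        return stripped + struct.pack(">I", len(data) - len(stripped))

    def decompress(blob: bytes) -> bytes:
        n_zeros = struct.unpack(">I", blob[-4:])[0]
        return blob[:-4] + b"\x00" * n_zeros

    original = os.urandom(256_000) + b"\x00" * 200   # maximum-entropy body + zero tail
    assert decompress(compress(original)) == original
    print(len(original) - len(compress(original)))   # ~196 bytes smaller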

replies(1): >>42226941 #
2. kevinventullo ◴[] No.42226941[source]
Note that if this large chunk occurs in the middle of the file, then you will need extra space to encode that position. For example, a random bit string of length 2^n is decently likely to have a run of n zeroes. But this doesn’t help you because you need n bits just to encode where that run happens.
replies(1): >>42231354 #
3. l33t7332273 ◴[] No.42231354[source]
But storing an index into a file of length 2^n takes only n bits, so you need that run of 0's to be of length at least n+1 to come out ahead.
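Back-of-the-envelope, ignoring the cost of signalling that the trick is used at all (which only raises the bar further):

    # For a file of 2^n bits, a position needs n bits, so eliding a run of n
    # zeros at a recorded position breaks even at best; only length n+1 or
    # more actually gains anything.
    n = 25                          # 2^25 bits = 4 MiB, around the 3 MB scale above
    run_length = n + 1              # shortest run that yields any profit
    cost_bits = n                   # bits to encode where the run starts
    print(run_length - cost_bits)   # net gain: 1 bit

And a single bit of profit is still nowhere near enough to pay for the decompressor itself.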