But FYI someone else actually managed to compress that exact same data: https://jasp.net/tjnn/2018/1.xhtml
He obviously knows that. He knows exactly what compression is and that his solution is not it. He showed (successfully) that meeting the challenge as it was worded did not require compression, which the setter of the challenge didn't realize.
If it's a true claim, they must have identified some "non-random" aspect of the original data, in which case they could have given more info.
Save the sha256 hash of original.dat in compressed.dat. The decompressor cats /dev/random until data of the right size comes out with the correct hash.
Now there are two cases.
1. The reconstructed data is actually equal to original.dat. Challenge won, cash in $5000.
2. The reconstructed data differs from original.dat. It has the same hash though, so you found a collision in sha256. World fame.
In either case, win!
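For concreteness, a minimal sketch of that scheme in Python (filenames and file size are my assumptions; note the decompressor needs on the order of 2^256 attempts, so it will never terminate in practice):

    import hashlib
    import os

    SIZE = 1_000_000  # assumed size of original.dat in bytes

    def compress(original="original.dat", compressed="compressed.dat"):
        # "Compression": store only the 32-byte SHA-256 digest.
        with open(original, "rb") as f:
            digest = hashlib.sha256(f.read()).digest()
        with open(compressed, "wb") as f:
            f.write(digest)

    def decompress(compressed="compressed.dat", output="reconstructed.dat"):
        # Draw random data until a buffer of the right size hashes correctly.
        with open(compressed, "rb") as f:
            target = f.read()
        while True:
            candidate = os.urandom(SIZE)  # stand-in for cat /dev/random
            if hashlib.sha256(candidate).digest() == target:
                with open(output, "wb") as f:
                    f.write(candidate)
                return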
With a big enough file those fractions of a bit add up to a non-trivial number of bits (rough arithmetic below). You can also be cunning about how you encode your deltas (the next delta makes use of the unused bits left over from the previous delta).
I haven't worked through all the details, so it may be that in the end everything rebalances to a no, but I'd like to withhold judgement for the moment.
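One way to read the "fractions of a bit" point, assuming the saving comes from a byte value that is known not to occur at a given position (my assumption, not necessarily the parent's):

    import math

    # If one byte value can never occur at a position, that position carries
    # log2(255) bits of information instead of 8.
    per_byte = 8 - math.log2(255)   # ~0.00565 bits saved per byte
    file_size = 3 * 2**20           # hypothetical 3 MiB file
    total_bits = per_byte * file_size
    print(f"{total_bits:.0f} bits ~= {total_bits / 8 / 1024:.1f} KiB saved")

Packing those fractional savings across symbols, so each delta reuses the bits left over from the previous one, is essentially what an arithmetic (range) coder does.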
I don’t follow. Wouldn’t that be compression (because not random)?