181 points | ekiauhce | 3 comments

omoikane ◴[] No.42224958[source]
The original email thread was from 2001, and it gets posted to HN periodically:

https://news.ycombinator.com/from?site=patrickcraig.co.uk

For another compression challenge that is still ongoing, try "500000€ Prize for Compressing Human Knowledge" (also known as "Hutter Prize"):

http://prize.hutter1.net/

https://news.ycombinator.com/item?id=37502329 - Hutter Prize for compressing human knowledge (2023-09-13, 215 comments)

replies(2): >>42225155 #>>42232092 #
vlovich123 ◴[] No.42225155[source]
I have a fundamental problem with the Hutter Prize claiming that intelligence is related to compression and then sponsoring a prize for lossless compression. Intelligence is related to lossy compression; lossless compression is mainly a mechanistic act.
replies(5): >>42225198 #>>42225266 #>>42231630 #>>42232462 #>>42233497 #
_hark ◴[] No.42225266[source]
Say we have some dataset composed of D bytes.

Next, say I find some predictive model M of the data, where M is composed of N bytes. Furthermore, let us say that the entropy of the dataset under the model is H bytes.

Then, if N + H < D, my model compresses the data.

It doesn't matter if the model is deterministic or probabilistic: a probabilistic model can be used to (losslessly) compress a dataset with entropy coding.
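
To make the entropy-coding step concrete, here is a minimal sketch (my own illustration, not part of the argument above; the toy unigram model, the 256*8-bit stand-in for the model description, and all names are assumptions) of how a model's probabilities turn into a lossless code length that can be checked against N + H < D:

  import math
  from collections import Counter

  def code_length_bits(data, prob):
      # Ideal entropy-coded length: sum of -log2 P(symbol | context) over the data.
      # An arithmetic coder losslessly achieves this total to within a few bits.
      return sum(-math.log2(prob(data[:i], data[i])) for i in range(len(data)))

  def unigram_model(counts, total):
      # Toy context-free model: Laplace-smoothed byte frequencies. Any model that
      # assigns probabilities (including a neural one) plugs in the same way.
      def prob(context, symbol):
          return (counts.get(symbol, 0) + 1) / (total + 256)
      return prob

  data = b"abracadabra " * 50
  D = 8 * len(data)                  # raw size in bits
  model = unigram_model(Counter(data), len(data))
  H = code_length_bits(data, model)  # entropy of the data under the model, in bits
  N = 256 * 8                        # crude stand-in for the bits needed to describe the model
  print(f"D={D}  H={H:.0f}  N+H={N + H:.0f}  compresses: {N + H < D}")

A better predictive model shrinks H at the cost of a larger N, and the comparison N + H < D is exactly the compression criterion above.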

One more argument for compression being equivalent to intelligence: Across many fields of statistical machine learning, there are generalization bounds which have the form:

test error <= train error + model complexity

That is, we don't expect to do any worse on new (test) data than the sum of the train error and the model complexity (smallest compressed size). Notice that if you interpret the train error as the entropy of the data under the model (i.e., the error under a cross-entropy loss), then the models which satisfy the statistical generalization bound correspond to those which best compress the data. In other words: the model which produces the shortest description of the data is the one which is expected to generalize best.
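
For a concrete instance of such a bound (my addition, not from the comment above): the classic Occam/finite-class bound says that for a hypothesis h describable in |h| bits, with probability at least 1 - δ over m i.i.d. training examples,

  \mathrm{err}(h) \;\le\; \widehat{\mathrm{err}}(h) \;+\; \sqrt{\frac{|h| \ln 2 + \ln(1/\delta)}{2m}}

so among hypotheses with the same training error, the one with the shortest description carries the tightest guarantee, which matches the "shortest description generalizes best" reading.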

replies(2): >>42225426 #>>42231019 #
1. dooglius ◴[] No.42225426[source]
This seems to assume that there is a tractable way to actually encode the data at the entropy H, but that seems very difficult given a model that is focused on understanding the content. Ex: I can easily write a program that can do basic arithmetic, but given, say, a bitmap scan of elementary-school math materials, such a program gives me no easy way to compress that scan; rather, something generic like PNG (which does not know or understand the material) will far outperform it.
replies(1): >>42225704 #
2. _hark ◴[] No.42225704[source]
Great point. This raises the related issue: what do we want to compress? Do we want to compress "the answer", here the arithmetic expression's solution, or do we want to compress the image itself?

You can formalize this with rate-distortion theory, by defining a distortion function that specifies what your objective is. That implies a well-defined complexity relative to that distortion function.
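
For reference, the textbook rate-distortion definition this appeals to (standard information theory, not anything specific to the paper mentioned next; here D is the allowed expected distortion, not the dataset size from the earlier comment) is

  R(D) \;=\; \min_{p(\hat{x} \mid x)\,:\; \mathbb{E}[d(X,\hat{X})] \le D} I(X; \hat{X})

Choosing d to penalize getting the arithmetic answer wrong, versus penalizing per-pixel differences in the scan, gives two very different notions of the complexity of the same bitmap.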

Okay to be clear, I've written a paper on exactly this topic which will be announced in a week or so. So you won't find anything on the subject yet, haha. But I use almost exactly this example.

replies(1): >>42226127 #
3. Jerrrry ◴[] No.42226127[source]

  >Okay to be clear, I've written a paper on exactly this topic which will be announced in a week or so. So you won't find anything on the subject yet, haha. But I use almost exactly this example.

I would use floating-point arithmetic as the example/analogy: one trades off accuracy/precision for exactness/correctness out at the extremities. Values near the extremities, where the representation is coarser, can only be represented by their nearest representable value. If this is foreseeable, the floating-point implementation can be adjusted to change where those "extremities" lie.
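
A minimal illustration of that tradeoff (my own sketch, not from the comment; assumes CPython's 64-bit IEEE 754 doubles and Python 3.9+ for math.ulp):

  import math

  print(0.1 + 0.2 == 0.3)   # False: none of these decimals is exactly representable
  print(1e16 + 1 == 1e16)   # True: at this magnitude adjacent floats are 2 apart, so +1 rounds away
  print(math.ulp(1.0))      # ~2.2e-16: spacing between representable values near 1
  print(math.ulp(1e16))     # 2.0: spacing near 1e16; precision has been traded for range

One way to read "adjusting where the extremities are" is rescaling the data, or switching to a fixed-point or decimal format, so that the finely spaced representable values land where the inputs actually live.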

I used the above analogy, and the following one, when articulating the magnitude of near-losslessness that large LLMs have managed to attain, especially when all of humanity's corpus is compressed onto a USB flash drive: the Kolmogorov complexity captured is similar to a masterpiece like the Mona Lisa having to be described in X many brush strokes to achieve Y amount of fidelity.