
What Is Entropy?

(jasonfantl.com)
287 points | jfantl | 7 comments
glial ◴[] No.43685469[source]
One thing that helped me was the realization that, at least as used in the context of information theory, entropy is a property of an individual (typically the person receiving a message) and NOT purely of the system or message itself.

> entropy quantifies uncertainty

This sums it up. Uncertainty is a property of a person, not of a system or message. That uncertainty is a function of both the person's model of the system/message and their prior observations.

You and I may have different entropies about the content of the same message. If we're calculating the entropy of dice rolls (where the outcome is the 'message'), and I know the dice are loaded but you don't, my entropy will be lower than yours.
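(A quick sketch of this asymmetry in Python; the loaded-die weights are made up for illustration, and entropy is Shannon entropy in bits:)

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Your model: a fair six-sided die.
fair = [1/6] * 6

# My model: I know the die is loaded toward 6 (hypothetical weights).
loaded = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]

print(shannon_entropy(fair))    # ~2.585 bits
print(shannon_entropy(loaded))  # ~2.161 bits: my uncertainty is lower
```

Same die, same "message", different entropies, because we carry different models.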

replies(4): >>43685585 #>>43686121 #>>43687411 #>>43688999 #
ninetyninenine ◴[] No.43685585[source]
Not true. The uncertainty of the dice rolls is not controlled by you; it is a property of the loaded dice themselves.

Here's a better way to put it: if I roll the dice infinitely many times, the uncertainty of the outcome will become evident in the distribution of the outcomes. Whether you or another person is certain or uncertain of this does not indicate anything.

Now when you realize this, you'll start to think about the debate in probability between frequentists and Bayesians, and you'll realize that all entropy is is a consequence of probability. The philosophical debate in probability applies to entropy as well, because they are one and the same.

I think the word "entropy" confuses people into thinking it's some other thing when really it's just probability at work.
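(The frequentist picture here can be simulated directly: roll a loaded die many times and the distribution, and hence a plug-in entropy estimate, emerges from the counts alone. Weights are hypothetical:)

```python
import collections
import math
import random

def empirical_entropy(samples):
    """Plug-in Shannon entropy (bits) from observed frequencies."""
    counts = collections.Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

random.seed(0)
faces = [1, 2, 3, 4, 5, 6]
weights = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]  # hypothetical loaded die
rolls = random.choices(faces, weights=weights, k=100_000)

print(empirical_entropy(rolls))  # approaches the true entropy (~2.161 bits)
```

No observer's beliefs enter the computation; only the outcomes do.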

replies(3): >>43685604 #>>43686183 #>>43691399 #
bloppe ◴[] No.43686183[source]
Probability is subjective though, because macrostates are subjective.

The notion of probability relies on the notion of repeatability: if you repeat a coin flip infinitely many times, what proportion of outcomes will be heads, and so on. But if you actually repeated the toss exactly the same way every time, say with a finely tuned coin-flipping machine in a perfectly still environment, you would always get the same result.

We say that a regular human flipping a coin is a single macrostate that represents infinite microstates (the distribution of trajectories and spins you could potentially impart on the coin). But who decides that? Some subjective observer. Another finely tuned machine could conceivably detect the exact trajectory and spin of the coin as it leaves your thumb and predict with perfect accuracy what the outcome will be. According to that machine, you're not repeating anything. You're doing a new thing every time.
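(A toy model of this, with an invented deterministic "physics": the outcome is fully fixed by spin rate and flight time, so the machine that knows those exactly sees no randomness, while the coarse-grained observer who only knows a range of throws sees ~50/50:)

```python
import random

def flip(spin_rate_hz, flight_time_s):
    """Toy deterministic coin: the outcome is fixed by the initial conditions."""
    half_turns = int(2 * spin_rate_hz * flight_time_s)
    return "H" if half_turns % 2 == 0 else "T"

# The machine's view: exact initial conditions -> the same result every time.
assert all(flip(20.0, 0.51) == flip(20.0, 0.51) for _ in range(10))

# The human's view: slight variation in each throw -> a ~50/50 macrostate.
random.seed(1)
outcomes = [flip(random.uniform(15, 25), random.uniform(0.4, 0.6))
            for _ in range(10_000)]
print(outcomes.count("H") / len(outcomes))  # close to 0.5
```

The "randomness" lives entirely in which microstates the observer lumps together, not in the coin.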

replies(1): >>43687296 #
canjobear ◴[] No.43687296[source]
Probability is a bunch of numbers that add to 1. Sometimes you can use them to represent subjective beliefs; sometimes you can use them to represent objectively existing probability distributions. For example, an LLM is a probability distribution over the next token given the previous tokens. If two "observers" disagree about the probability an LLM assigns to some token, then at most one of them can be correct. So the probability is objective.
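(In that sense the distribution is just a readout of the model's logits through a softmax; the logits and four-token vocabulary below are hypothetical:)

```python
import math

def softmax(logits):
    """Convert raw model logits into a probability distribution."""
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits over a 4-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)
print(probs)       # the distribution "read out" from the model
print(sum(probs))  # 1.0: a bunch of numbers that add to 1
```

Anyone who reads the same weights out of the same model gets the same numbers; disagreement means someone computed them wrong.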
replies(2): >>43688108 #>>43689218 #
1. kgwgk ◴[] No.43689218[source]
> If two "observers" disagree about an LLM's probability assigned to some token, then only at most one of them can be correct.

The observer who knows the implementation in detail and the state of the pseudo-random number generator can predict the next token with certainty. (Or with near certainty, if we allow for bit-flipping cosmic rays, etc.)
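(Concretely: sampling from a distribution with a known generator state is fully deterministic. The next-token distribution below is a made-up stand-in for a model's output:)

```python
import random

# Hypothetical next-token distribution produced by some model.
probs = {"the": 0.6, "a": 0.3, "an": 0.1}

def sample_token(seed):
    """Sample a token; with a known seed, the outcome is fully determined."""
    rng = random.Random(seed)
    return rng.choices(list(probs), weights=list(probs.values()))[0]

# Knowing the generator state, the "random" token is predictable with certainty:
assert sample_token(42) == sample_token(42)
```

To that observer, the only probabilities left are 0 and 1.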

replies(1): >>43689432 #
2. canjobear ◴[] No.43689432[source]
That’s the probability of observing a token given the prompt and the seed. The probability assigned to a token given the prompt alone is a separate thing, which is objectively defined independent of any observer and can be found by reading out the model's logits.
replies(1): >>43689453 #
3. kgwgk ◴[] No.43689453[source]
Yes, that’s a purely mathematical abstract concept that exists outside of space and time. The labels “objective” and “subjective” are usually used to talk about probabilities in relation to the physical world.
replies(2): >>43689575 #>>43689589 #
4. ◴[] No.43689575{3}[source]
5. canjobear ◴[] No.43689589{3}[source]
An LLM distribution exists in the physical world, just as much as this comment does. It didn’t exist before the model was trained. It has relation to the physical world: it assigns probabilities to subword units of text. It has commercial value that it wouldn’t have if its objective probability values were different.
replies(1): >>43689760 #
6. kgwgk ◴[] No.43689760{4}[source]
> It has relation to the physical world: it assigns probabilities to subword units of text.

How exactly is that probability assignment linked to the physical world? In the physical world the computer will produce a token, and you rejected earlier the idea that this was about predicting which token would be produced.

replies(1): >>43689822 #
7. kgwgk ◴[] No.43689822{5}[source]
Or maybe you mean that the probability assignments are not about the output of a particular LLM implementation in the real world but about subword units of text in the wild.

In that case, how could two different LLMs make different assignments to the same physical world without being wrong? Would they be “objective” but unrelated to the “object”?