
What Is Entropy?

(jasonfantl.com)
287 points by jfantl | 1 comment | source
glial No.43685469
One thing that helped me was the realization that, at least as used in the context of information theory, entropy is a property of an individual (typically the person receiving a message) and NOT purely of the system or message itself.

> entropy quantifies uncertainty

This sums it up. Uncertainty is a property of a person, not of a system/message. That uncertainty is a function of both the person's model of the system/message and their prior observations.

You and I may have different entropies about the content of the same message. If we're calculating the entropy of dice rolls (where the outcome is the 'message'), and I know the dice are loaded but you don't, my entropy will be lower than yours.
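
To make that concrete, here is a minimal sketch in plain Python (the loading probabilities are made up for illustration): the same die, two observers, two entropies.

    from math import log2

    def shannon_entropy(probs):
        # Entropy in bits of a discrete distribution given as probabilities.
        return -sum(p * log2(p) for p in probs if p > 0)

    # Observer A knows the die is loaded toward 6 (hypothetical loading).
    loaded_model = [0.05, 0.05, 0.05, 0.05, 0.05, 0.75]
    # Observer B assumes a fair die.
    fair_model = [1 / 6] * 6

    print(shannon_entropy(loaded_model))  # ~1.39 bits
    print(shannon_entropy(fair_model))    # ~2.58 bits, i.e. log2(6)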

ninetyninenine No.43685585
Not true. The uncertainty of the dice rolls is not controlled by you. It is a property of the loaded dice themselves.

Here's a better way to put it: if I roll the dice infinitely many times, the uncertainty of the outcome becomes evident in the distribution of the outcomes. Whether you or another person happens to be certain or uncertain about it doesn't change anything.
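
As a rough sketch of that frequentist reading (plain Python; the loading below is hypothetical), the entropy estimated from the empirical distribution of many rolls converges to the entropy of the die's true distribution, regardless of what any observer believes:

    import random
    from collections import Counter
    from math import log2

    faces = [1, 2, 3, 4, 5, 6]
    weights = [0.05, 0.05, 0.05, 0.05, 0.05, 0.75]  # hypothetical loading

    rolls = random.choices(faces, weights=weights, k=100_000)
    counts = Counter(rolls)
    freqs = [counts[f] / len(rolls) for f in faces]

    # Entropy of the empirical distribution, in bits.
    print(-sum(p * log2(p) for p in freqs if p > 0))  # approaches ~1.39 bits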

Once you realize this, you'll start thinking about the frequentist vs. Bayesian debate in probability, and you'll see that entropy is just a consequence of probability, so the philosophical debate in probability applies to entropy as well, because they are one and the same.

I think the word "entropy" confuses people into thinking it's some other thing when really it's just probability at work.

glial No.43685604
I concede that my framing was explicitly Bayesian, but with that caveat, it absolutely is true: your uncertainty is a function of your knowledge, which is a model of the world, but is not equivalent to the world itself.

Suppose I had a coin that only landed on heads. You don't know this, and you flip the coin. According to your argument, entropy is a property of the coin itself, so your entropy about the outcome of the first flip is zero, since the coin is deterministic. However, you wouldn't be able to tell me which way the coin will land, which makes your uncertainty, and hence your entropy, nonzero. This is a contradiction.
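
A minimal sketch of that contradiction in plain Python: the coin's true distribution carries zero entropy, while the flipper's belief (say, a fair coin) does not.

    from math import log2

    def shannon_entropy(probs):
        return -sum(p * log2(p) for p in probs if p > 0)

    true_coin = [1.0, 0.0]  # always heads: entropy as a "property of the coin"
    belief = [0.5, 0.5]     # the flipper's model before any observations

    print(shannon_entropy(true_coin))  # 0.0 bits
    print(shannon_entropy(belief))     # 1.0 bit of uncertainty about the first flip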

nyrikki No.43686906
To add to this.

Both the Bayesian and frequentist interpretations make understanding the problem challenging: both are powerful tools for finding the needle in the haystack, when the real problem is finding the hay in the haystack.

A better lens is that a recursively enumerable real (think of its binary expansion as a sequence of coin flips) is algorithmically random if and only if it is a Chaitin Omega number.[1]

Chaitin's number is normal, which is probably easiest to understand with decimal digits: for any window size, the digits 0-9 (and every block of digits of that size) appear with the same limiting frequency.

This is why the halting problem ≈ the open frame problem ≈ system identification ≈ the symbol grounding problem.

Probabilities are very powerful for problems like the dining philosophers problem or the Byzantine generals problem, but they are still grabbing needles every time they reach into the haystack.

Pretty much any "almost all" statement is a hay-in-the-haystack problem. For example, almost all real numbers are normal, but we have explicitly identified only a few.

We can construct examples, say Champernowne's constant 0.12345678910111213... (provably normal in base 10), but the typical normal numbers remain inaccessible to us.
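
As a rough illustration (plain Python, and only an illustration: a finite prefix shows single-digit frequencies, which can hint at but never demonstrate normality), here are the digit frequencies in a prefix of Champernowne's constant:

    from collections import Counter

    # First ~100,000 digits of 0.123456789101112..., formed by concatenating 1, 2, 3, ...
    digits = "".join(str(n) for n in range(1, 30000))[:100_000]

    counts = Counter(digits)
    for d in "0123456789":
        # In the limit each digit has frequency 0.1; a prefix this short still shows visible bias.
        print(d, round(counts[d] / len(digits), 4))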

Given access to the true reals, you have a 0% chance of picking a computable number (or a rational, etc.), but a 100% chance of getting a normal number and a 100% chance of getting an uncomputable one.

Bayesian vs frequentist interpretations allow us to make useful predictions, but they are the map, not the territory.

Bayesian iid data and frequentist iid random variables play roles exactly analogous to those of enthalpy, Gibbs free energy, statistical entropy, information-theoretic entropy, Shannon entropy, etc.

The difference between them is the independent variables that they depend on and the needs of the model they are serving.

You can also approach the property people usually want to communicate with the term "entropy" via effective measure-zero sets, null covers, martingales, Kolmogorov complexity, compressibility, set shattering, etc.
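
A rough sketch of the compressibility lens (plain Python; zlib's output length is only a crude stand-in for Kolmogorov complexity, which is uncomputable): a patterned string compresses to almost nothing, while a "random-looking" one barely compresses at all.

    import os
    import zlib

    patterned = b"10" * 50_000        # a simple repeating pattern: easy to describe
    random_ish = os.urandom(100_000)  # OS randomness as a stand-in for an algorithmically random string

    print(len(zlib.compress(patterned)))   # a few hundred bytes
    print(len(zlib.compress(random_ish)))  # close to 100,000 bytes: little structure to exploit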

As a lens, the null cover is the most useful in my mind: a random real number should not have any "uncommon" (effectively testable) properties; it should look like the typical (normal) reals.

This is very different from statistical methods, or any effective usable algorithm/program, which absolutely depend on "uncommon" properties.

Which is exactly the problem with finding the hay in the haystack: hay is boring.

[1]https://www.cs.auckland.ac.nz/~cristian/samplepapers/omegast...