
169 points | mattmarcus
lsy ◴[] No.43614042[source]
The article puts scare quotes around "understand" etc. to try to head off critiques around the lack of precision or scientific language, but I think this is a really good example of where casual use of these terms can get pretty misleading.

Because code LLMs have been trained on the syntactic form of the program and not its execution, it's not correct — even if the correlation between variable annotations and requested completions were perfect (which it's not) — to say that the model "understands nullability", because nullability means that under execution the variable in question can become null, which is not a state that a model trained only on a million programs' syntax can possibly "understand". You could get the same result if, e.g., "Optional" meant that the variable becomes poisonous, checking "> 0" is eating it, and "!= None" is an antidote. Human programmers can understand nullability because they have (hopefully) run programs and understand the semantics of making something null.
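
For concreteness, here's a minimal sketch of the surface pattern in question (the function and names are hypothetical, not from the article): an Optional annotation followed by the kind of None check a code model learns to emit. The model is trained on text like this; it never observes a run in which the value actually is None.

    from typing import Optional

    def describe(count: Optional[int]) -> str:
        # The textual pattern a code LLM sees: an "Optional" annotation
        # followed, a few tokens later, by a None check and a comparison.
        if count is not None and count > 0:
            return f"{count} items"
        return "no items"

    # Only at execution time does nullability mean anything:
    print(describe(3))     # "3 items"
    print(describe(None))  # "no items"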

The paper could use precise, scientific language (e.g. "the presence of nullable annotation tokens correlates to activation of vectors corresponding to, and emission of, null-check tokens with high precision and accuracy"), which would help us understand what we can rely on the LLM to do and what we can't. But it seems like there is some subconscious incentive to muddy how people see these models in the hope that we start ascribing things to them that they aren't capable of.
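
To see what that more precise phrasing would actually cash out to, here is a toy sketch of the measurement it implies; the observations below are made up purely for illustration, and this is an assumed setup, not the paper's methodology:

    # Hypothetical (annotation_present, null_check_emitted) pairs, one per completion.
    # The values are illustrative only, not results from the paper.
    observations = [
        (True, True), (True, True), (True, False),      # nullable-annotated parameters
        (False, False), (False, False), (False, True),  # non-annotated parameters
    ]

    tp = sum(ann and check for ann, check in observations)
    fp = sum((not ann) and check for ann, check in observations)
    fn = sum(ann and (not check) for ann, check in observations)

    # Precision: of the emitted null checks, how many followed a nullable annotation.
    # Recall: of the nullable annotations, how many were followed by a null check.
    print(f"precision={tp / (tp + fp):.2f} recall={tp / (tp + fn):.2f}")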

replies(9): >>43614302 #>>43614352 #>>43614384 #>>43614470 #>>43614508 #>>43614723 #>>43615651 #>>43616059 #>>43616871 #
waldrews ◴[] No.43614508[source]
I was going to say "so you believe LLMs don't have the capacity to understand", but then I realized that the precise language would be something like "the presence of photons in this human's retinas in patterns encoding statements about LLMs having understanding correlates to the activation of neuron signaling chains corresponding to, and emission of, muscle activations engaging keyboard switches, which produce patterns of 'no they don't' with high frequency."

The critiques of mental state applied to LLMs are increasingly applicable to us biologicals, and that's the philosophical abyss we're staring down.

replies(3): >>43615279 #>>43615833 #>>43615903 #
shafyy ◴[] No.43615903[source]
Countering the argument that LLMs are just glorified probability machines and do not understand or think with "how do you know humans are not the same?" has been the biggest achievement of AI hypemen (and yes, it's mostly men).

Of course, now you can say "how do you know that our brains are not just efficient computers that run LLMs", but I feel like the onus of proof lies on the makers of this claim, not on the other side.

It is very likely that human intelligence is not just autocomplete on crack, given all we know about neuroscience so far.

replies(2): >>43616482 #>>43618260 #
BobbyTables2 ◴[] No.43618260[source]
I’ve come to realize AI works as well as it does because it was trained extensively on the same kinds of things people normally ask. So, it already has the benefit of vast amounts of human responses.

Of course, ask it a PhD level question and it will confidently hallucinate more than Beavis & Butthead.

It really is a damn glorified autocomplete, unfortunately very useful as a search engine replacement.

replies(1): >>43619114 #
uh_uh ◴[] No.43619114[source]
The LLM is a glorified autocomplete inasmuch as you are a glorified replicator. Yes, it was trained on autocomplete but that doesn't say much about what capabilities might emerge.
replies(1): >>43619130 #
shafyy ◴[] No.43619130[source]
> Yes, it was trained on autocomplete but that doesn't say much about what capabilities might emerge.

No, but we know how it works and it is just a stochastic parrot. There is no magic in there.

What is more surprising to me is that humans are so predictable that a glorified autocomplete works this well. Then again, it's not that surprising....

replies(1): >>43619959 #
uh_uh ◴[] No.43619959[source]
Sorry but this is nonsense. Do you have a theory about when certain LLM capabilities emerge? AFAIK we don't have a good theory about when and why they do emerge.

But even if we knew how something works (which in the present case we don't), that shouldn't diminish our opinion of it. Will you have a lesser opinion of human intelligence once we figure out how it works?

replies(3): >>43620237 #>>43620572 #>>43624170 #
slowmovintarget ◴[] No.43624170{5}[source]
There has been, to date, no demonstrated emergence from LLMs. There has been probabilistic drift in their outputs based on their inputs (training set, training time, reinforcement, fine-tuning, system prompts, and inference parameters). All of these effects on outputs are predictable, and all are first order effects. We don't have any emergence yet.

We do have proofs that hallucination will always be a problem. We have proofs that the "reasoning" of models that "think" is actually regurgitation of human explanations written out. When asked to do truly novel things, the models fail. When asked to do high-precision things, the models fail. When asked to do high-accuracy things, the models fail.

LLMs don't understand. They are search engines. We are experience engines, and philosophically, we don't have a way to tokenize experience; we can only tokenize its description. So while LLMs can juggle descriptions all day long, these algorithms do so disconnected from the underlying experiences required for understanding.

replies(1): >>43624815 #
uh_uh ◴[] No.43624815{6}[source]
Examples of emergence:

1. Multi-step reasoning with backtracking when DeepSeek R1 was trained via GRPO.

2. Translation of languages they haven't even seen via in-context learning.

3. Arithmetic: heavily correlated with model size, but it does appear.

I could go on.

Granted, it's not an LLM but a deep learning model trained via RL, but would you say that AlphaZero's move 37 also doesn't count as emergence and that the model has no understanding of Go?