
454 points nathan-barry | 6 comments
kibwen ◴[] No.45645307[source]
To me, the diffusion-based approach "feels" more akin to what's going on in an animal brain than the token-at-a-time approach of the in-vogue LLMs. Speaking for myself, I don't generate words one at a time based on previously spoken words; I start by having some fuzzy idea in my head, and the challenge is in serializing it into language coherently.
replies(14): >>45645350 #>>45645383 #>>45645401 #>>45645402 #>>45645509 #>>45645523 #>>45645607 #>>45645665 #>>45645670 #>>45645891 #>>45645973 #>>45647491 #>>45648578 #>>45652892 #
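To make the contrast concrete, here is a minimal toy sketch (Python invented for illustration, not any particular model's API): an autoregressive loop commits to one token at a time, conditioning only on what has already been emitted, while a diffusion-style loop starts from a fully masked draft and refines every position in parallel. The uniform next_token_dist and random denoise_step are hypothetical stand-ins for trained models; only the shape of the two loops is the point.

    import random

    VOCAB = ["I", "have", "a", "hot", "dog", "cooked", "."]

    def next_token_dist(prefix):
        # Hypothetical stand-in: a trained model would return P(next token | prefix).
        return {tok: 1.0 / len(VOCAB) for tok in VOCAB}

    def autoregressive_decode(n_tokens):
        # Left to right: each token is committed before the next is even considered.
        out = []
        for _ in range(n_tokens):
            dist = next_token_dist(out)
            tokens, weights = zip(*dist.items())
            out.append(random.choices(tokens, weights=weights)[0])
        return out

    def denoise_step(seq):
        # Hypothetical stand-in: a text-diffusion model would re-predict all positions jointly.
        return [random.choice(VOCAB) if tok == "<mask>" else tok for tok in seq]

    def diffusion_decode(n_tokens, steps=4):
        # Start from a fully masked draft and refine every position in parallel.
        seq = ["<mask>"] * n_tokens
        for _ in range(steps):
            seq = denoise_step(seq)
        return seq

    print(autoregressive_decode(6))
    print(diffusion_decode(6))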
ma2rten ◴[] No.45645523[source]
Interpretability research has found that autoregressive LLMs also plan ahead for what they are going to say.
replies(2): >>45645712 #>>45646027 #
1. aidenn0 ◴[] No.45645712[source]
This seems likely just from the simple fact that they can reliably generate contextually correct sentences in e.g. German Imperfekt.
replies(3): >>45651812 #>>45651822 #>>45653730 #
2. ma2rten ◴[] No.45651812[source]
It's actually true on many levels, if you think about what is needed to generate syntactically and grammatically correct sentences, coherent text, and working code.
replies(1): >>45658031 #
3. treis ◴[] No.45651822[source]
I don't think you're wrong, but I don't think your logic holds up here. If you have a literal translation like:

I have a hot dog _____

The word in the blank is not necessarily determined when the sentence is started. Several verbs fit at the end, and the LLM doesn't need to know which it's going to pick when it starts. Each word narrows down the possibilities:

I - trillions
have - billions
a - millions
hot - thousands
dog - dozens
_____ - could be eaten, cooked, thrown, whatever.

If it chooses "cooked" at this point, that doesn't necessarily mean that the LLM was going to do that when it chose "I" or "have".

replies(1): >>45652378 #
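A toy sketch of the narrowing described above, against a made-up five-sentence corpus rather than a real model: each committed word shrinks the set of consistent continuations, but the blank stays undecided until it is actually filled.

    # Made-up corpus, purely to illustrate how a growing prefix narrows the options.
    corpus = [
        "I have a hot dog cooked".split(),
        "I have a hot dog eaten".split(),
        "I have a hot dog thrown".split(),
        "I have a cold drink poured".split(),
        "You have a hot dog grilled".split(),
    ]

    prefix = []
    for word in ["I", "have", "a", "hot", "dog"]:
        prefix.append(word)
        matches = [s for s in corpus if s[:len(prefix)] == prefix]
        options = {s[len(prefix)] for s in matches if len(s) > len(prefix)}
        print(" ".join(prefix), "->", len(matches), "matches, next-word options:", options)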
4. aidenn0 ◴[] No.45652378[source]
That's why I hedged with "seems likely" and added "in context." If this is in the middle of a paragraph, then there are many fewer options to fit in the blank from the very start.
5. rcxdude ◴[] No.45653730[source]
And, to pick an example from the research, being able to generate output that rhymes. In fact, it's hard to see how you would produce anything that would be considered coherent text without some degree of planning ahead at some level of abstraction. If it were truly one token at a time, without any regard for what comes next, it would constantly 'paint itself into a corner' and be forced to produce nonsense (which, it seems, does still happen sometimes, but without any planning it would occur constantly).
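A minimal sketch of the "paint itself into a corner" point, with an invented word graph standing in for a real model (this is not the cited research): a purely greedy decoder commits to a high-scoring word that has no continuation, while a single step of lookahead (a crude form of planning) avoids the dead end.

    # Invented word graph: each word maps its possible successors to a score.
    # Generation must reach "<end>"; "wilting" is a trap with no continuation.
    GRAPH = {
        "<start>": {"roses": 0.6, "the": 0.4},
        "roses":   {"are": 1.0},
        "are":     {"wilting": 0.7, "red": 0.3},
        "red":     {"<end>": 1.0},
        "wilting": {},
        "the":     {"rose": 1.0},
        "rose":    {"<end>": 1.0},
    }

    def greedy(max_len=6):
        # Pick the highest-scoring next word with no regard for what could follow it.
        word, out = "<start>", []
        for _ in range(max_len):
            options = GRAPH[word]
            if not options:
                return out, "stuck"          # painted into a corner
            word = max(options, key=options.get)
            if word == "<end>":
                return out, "ok"
            out.append(word)
        return out, "too long"

    def greedy_with_lookahead(max_len=6):
        # Same loop, but refuse any word whose own continuation set is empty.
        word, out = "<start>", []
        for _ in range(max_len):
            options = {w: s for w, s in GRAPH[word].items()
                       if w == "<end>" or GRAPH[w]}
            if not options:
                return out, "stuck"
            word = max(options, key=options.get)
            if word == "<end>":
                return out, "ok"
            out.append(word)
        return out, "too long"

    print(greedy())                  # (['roses', 'are', 'wilting'], 'stuck')
    print(greedy_with_lookahead())   # (['roses', 'are', 'red'], 'ok')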
6. aidenn0 ◴[] No.45658031[source]
Just generating syntactically and grammatically correct sentences doesn't need much lookahead; prefixes to sentences that cannot be properly completed are going to be extremely unlikely to be generated.