
454 points | nathan-barry | 1 comment
kibwen No.45645307
To me, the diffusion-based approach "feels" more akin to what's going on in an animal brain than the token-at-a-time approach of the in-vogue LLMs. Speaking for myself, I don't generate words one at a time based on previously spoken words; I start by having some fuzzy idea in my head, and the challenge is in serializing it into language coherently.
ma2rten No.45645523
Interpretability research has found that autoregressive LLMs also plan ahead for what they are going to say.
aidenn0 No.45645712
This seems likely just from the simple fact that they can reliably generate contextually correct sentences in e.g. German Imperfekt.
rcxdude No.45653730
And, to pick an example from the research, being able to generate output that rhymes. In fact, it's hard to see how you could produce anything that would be considered coherent text without some degree of planning ahead at some level of abstraction. If it were truly one token at a time, without any regard for what comes next, the model would constantly "paint itself into a corner" and be forced to produce nonsense (which, it seems, does still happen sometimes, but without any planning it would occur constantly).
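The "painting itself into a corner" failure mode can be sketched with a toy example. Everything below is invented for illustration (a hand-made bigram table standing in for a language model, not anything from the thread or from real LM internals): a purely greedy one-token-at-a-time decoder walks into a dead end where no rhyme is reachable, while a decoder that searches ahead over continuations, a crude stand-in for "planning", picks a prefix that can still end on a rhyme.

```python
# Toy bigram "model": scores are made up so that the greedy choice at
# "the" leads down a path that cannot end on a rhyme.
BIGRAM = {
    "the": {"dog": 0.6, "cat": 0.4},   # greedy prefers "dog"
    "dog": {"ran": 1.0},
    "ran": {"<end>": 1.0},             # "ran" doesn't rhyme with "hat"
    "cat": {"sat": 1.0},
    "sat": {"<end>": 1.0},             # "sat" does rhyme with "hat"
}

RHYMES_WITH_HAT = {"cat", "sat"}

def greedy(word):
    """One token at a time, always the top-scoring next token."""
    path = [word]
    while path[-1] in BIGRAM:
        nxt = max(BIGRAM[path[-1]], key=BIGRAM[path[-1]].get)
        if nxt == "<end>":
            break
        path.append(nxt)
    return path

def lookahead(word):
    """Exhaustively search continuations; keep one whose final token
    rhymes -- i.e. commit to a next token only if the ending works."""
    best = None
    def walk(path):
        nonlocal best
        opts = BIGRAM.get(path[-1], {})
        if "<end>" in opts and path[-1] in RHYMES_WITH_HAT:
            best = path
        for nxt in opts:
            if nxt != "<end>":
                walk(path + [nxt])
    walk([word])
    return best

print(greedy("the"))     # ['the', 'dog', 'ran'] -- dead end, no rhyme
print(lookahead("the"))  # ['the', 'cat', 'sat'] -- ends on a rhyme
```

The point of the sketch is only the contrast: a decoder with literally zero lookahead is at the mercy of its locally best choice, which is why the parent comments note that coherent rhyming output implies the model is doing some planning internally, even though it emits tokens one at a time.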