454 points | nathan-barry

kibwen No.45645307
To me, the diffusion-based approach "feels" more akin to what's going on in an animal brain than the token-at-a-time approach of the in-vogue LLMs. Speaking for myself, I don't generate words one at a time based on previously spoken words; I start by having some fuzzy idea in my head, and the challenge is in serializing it into language coherently.
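
For contrast, here is a minimal sketch of masked-diffusion decoding, where the whole sequence is refined at once rather than left to right. `model` is a hypothetical stand-in for a network that scores every position in parallel; the unmasking schedule is one common choice, not any particular paper's:

    import torch

    def diffusion_decode(model, length, mask_id, rounds=8):
        # Start from an all-masked "fuzzy idea" and sharpen it iteratively.
        seq = torch.full((length,), mask_id)
        for _ in range(rounds):
            masked = seq == mask_id
            if not masked.any():
                break
            logits = model(seq)                  # scores every position at once
            conf, pred = logits.softmax(-1).max(-1)
            conf[~masked] = -1.0                 # keep already-committed tokens
            k = max(1, int(masked.sum()) // 2)
            idx = conf.topk(k).indices           # commit the most confident slots
            seq[idx] = pred[idx]
        seq[seq == mask_id] = pred[seq == mask_id]   # fill any stragglers
        return seq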

sailingparrot No.45645973
> the token-at-a-time approach of the in-vogue LLMs. Speaking for myself, I don't generate words one at a time based on previously spoken words

Autoregressive LLMs don't do that either, actually. Sure, with one forward pass you only get one token at a time, but looking at what is happening in the latent space, there are clear signs of long-term planning and reasoning that go beyond just the next token.

So I don't think it's necessarily more or less similar to us than diffusion: we too say one word at a time, sequentially, even if we have the bigger picture in mind.
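
Concretely, the token-at-a-time loop looks something like this (a minimal sketch, assuming the Hugging Face transformers API, with GPT-2 as a stand-in model):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    ids = tok("The big picture, serialized one token at a time:",
              return_tensors="pt").input_ids
    for _ in range(20):
        with torch.no_grad():
            logits = model(ids).logits      # one forward pass over the prefix...
        next_id = logits[0, -1].argmax()    # ...yields exactly one next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
    print(tok.decode(ids[0]))

Any longer-range plan has to live in hidden states that are recomputed from `ids` on every pass; only the tokens themselves persist between iterations.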

wizzwizz4 No.45646422
If a process is necessary for performing a task, a (sufficiently large) neural network trained on that task will approximate that process. That doesn't mean it's doing so with anything resembling efficiency, or that a different architecture / algorithm wouldn't produce a better result.

sailingparrot No.45647495
I'm not arguing about efficiency, though? I'm simply saying that next-token predictors cannot be thought of as thinking only about the next token, with no long-term plan.

wizzwizz4 No.45648362
They rebuild the "long-term plan" anew for every token: there's no guarantee that the reconstructed plan will remain similar from one token to the next. That's not how planning normally works. (You can find something like this wherever there's this kind of gross inefficiency, which is why I gave the general principle.)
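
One way to see this (a minimal sketch, again assuming transformers and GPT-2): the hidden states are a pure function of the token prefix, so whatever "plan" they encode is reconstructed from scratch at each step, and only the emitted tokens carry over:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    ids = tok("First, outline the plan.", return_tensors="pt").input_ids
    with torch.no_grad():
        h1 = model(ids, output_hidden_states=True).hidden_states[-1]
        h2 = model(ids, output_hidden_states=True).hidden_states[-1]

    # Identical both times: the latent "plan" is a function of the tokens
    # alone. No other state survives from one decoding step to the next.
    print(torch.allclose(h1, h2))  # True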

nl No.45651754
Right, and this is what "reasoning LLMs" work around by emitting explicitly labeled "reasoning tokens".

This lets them "save" the plan between steps: when generating the next token, the model follows the plan written out explicitly in its own context.
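
As an illustration (a hypothetical scratchpad format; real reasoning models use their own special tokens, but the mechanism is the same):

    # The plan is ordinary text in the context, re-read verbatim on every
    # forward pass rather than re-derived in latent space each step.
    prompt = "Q: What is 17 * 24?\n"
    plan = "<think>17*24 = 17*20 + 17*4 = 340 + 68 = 408</think>\n"
    context = prompt + plan + "A:"
    # Each answer token is now generated conditioned on `context`, which
    # already contains the plan explicitly.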