
454 points nathan-barry | 1 comments
kibwen ◴[] No.45645307[source]
To me, the diffusion-based approach "feels" more akin to what's going on in an animal brain than the token-at-a-time approach of the in-vogue LLMs. Speaking for myself, I don't generate words one at a time based on previously spoken words; I start by having some fuzzy idea in my head, and the challenge is in serializing it into language coherently.
sailingparrot ◴[] No.45645973[source]
> the token-at-a-time approach of the in-vogue LLMs. Speaking for myself, I don't generate words one at a time based on previously spoken words

Autoregressive LLMs don't do that either, actually. Sure, with one forward pass you only get one token at a time, but looking at what is happening in the latent space, there are clear signs of long-term planning and reasoning that go beyond just the next token.

So I don't think it's necessarily more or less similar to us than diffusion is. We do say one word at a time, sequentially, even if we have the bigger picture in mind.
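
As a toy illustration of the "one token per forward pass" point, here is a minimal sketch of a greedy decoding loop. It is not a real LLM: `toy_model` and `PLAN` are made-up stand-ins, and the "plan" is hard-coded rather than learned. The only thing it is meant to show is that the loop emits one token per call while each call still sees the whole prefix, so picking "an" can depend on the "apple" that comes next.

```python
from typing import Callable, List

# Hypothetical fixed "plan"; a real network would carry this in its hidden state.
PLAN = ["I", "ate", "an", "apple", "<eos>"]


def toy_model(prefix: List[str]) -> str:
    """Stand-in for one forward pass: reads the whole prefix, returns one token."""
    return PLAN[len(prefix)] if len(prefix) < len(PLAN) else "<eos>"


def greedy_decode(
    model: Callable[[List[str]], str],
    prompt: List[str],
    max_new: int = 10,
    eos: str = "<eos>",
) -> List[str]:
    """Autoregressive loop: one call to the model yields exactly one new token."""
    tokens = list(prompt)
    for _ in range(max_new):
        tok = model(tokens)  # conditioned on everything generated so far
        if tok == eos:
            break
        tokens.append(tok)
    return tokens


print(greedy_decode(toy_model, ["I"]))  # ['I', 'ate', 'an', 'apple']
```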

wizzwizz4 ◴[] No.45646422[source]
If a process is necessary for performing a task, (sufficiently large) neural networks trained on that task will approximate that process. That doesn't mean they're doing it in anything resembling an efficient way, or that a different architecture / algorithm wouldn't produce a better result.
jama211 ◴[] No.45646920[source]
It also doesn’t mean they’re doing it inefficiently.
pinkmuffinere ◴[] No.45647093{3}[source]
I read this to mean “just because the process doesn’t match the problem, that doesn’t mean it’s inefficient”. But I think it does mean that. I expect we intuitively know that data structures which match the structure of a problem are more efficient than those that don’t. I think the same thing applies here.
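
A small sketch of the data-structure intuition, under the assumption that "efficient" means time: the problem here is a first-in-first-out queue, and the two hypothetical helpers below give identical results, but `list.pop(0)` shifts every remaining element (O(n) per pop) while `deque.popleft()` is O(1) because a deque is shaped like the problem.

```python
from collections import deque
from typing import Iterable, List


def drain_with_list(items: Iterable[int]) -> List[int]:
    """FIFO drain using a list: correct, but each pop(0) is O(n)."""
    queue = list(items)
    out = []
    while queue:
        out.append(queue.pop(0))  # shifts all remaining elements left
    return out


def drain_with_deque(items: Iterable[int]) -> List[int]:
    """FIFO drain using a deque: same result, each popleft() is O(1)."""
    queue = deque(items)
    out = []
    while queue:
        out.append(queue.popleft())
    return out


print(drain_with_list(range(5)) == drain_with_deque(range(5)))  # True
```

Both versions "solve" the task; only one matches its structure. Whether that analogy carries over to neural network architectures is exactly what is being argued here, not something this snippet settles.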

I realize my argument is hand-wavy, I haven’t defined “efficient” (in space? time? energy?), and there are other shortcomings, but I feel this is “good enough” to be convincing.

1. jama211 ◴[] No.45658371{4}[source]
I suppose there’s something in what you’re saying; it’s just that it’s sorta vague and hard for me to parse. It also depends on the higher-order problem space. For example: is it efficient if the problem is defined as “make something that can adapt to a problem space and solve it without manual engineering” rather than “make something with a long lead-up time where you understand the problem space in advance and therefore have time to optimise the engine”? In the former case, the neural network would indeed count as solving this efficiently, because of the given definition of the goal.