
454 points nathan-barry | 6 comments
kibwen ◴[] No.45645307[source]
To me, the diffusion-based approach "feels" more akin to what's going on in an animal brain than the token-at-a-time approach of the in-vogue LLMs. Speaking for myself, I don't generate words one at a time based on previously spoken words; I start by having some fuzzy idea in my head, and the challenge is in serializing it into language coherently.
replies(14): >>45645350 #>>45645383 #>>45645401 #>>45645402 #>>45645509 #>>45645523 #>>45645607 #>>45645665 #>>45645670 #>>45645891 #>>45645973 #>>45647491 #>>45648578 #>>45652892 #
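To make the contrast concrete, here is a minimal toy sketch (Python invented for illustration, not any particular model's API): an autoregressive loop commits to one token at a time, conditioning only on what has already been emitted, while a diffusion-style loop starts from a fully masked draft and refines every position in parallel. The uniform next_token_dist and random denoise_step are hypothetical stand-ins for trained models; only the shape of the two loops is the point.

    import random

    VOCAB = ["I", "have", "a", "hot", "dog", "cooked", "."]

    def next_token_dist(prefix):
        # Hypothetical stand-in: a trained model would return P(next token | prefix).
        return {tok: 1.0 / len(VOCAB) for tok in VOCAB}

    def autoregressive_decode(n_tokens):
        # Left to right: each token is committed before the next is even considered.
        out = []
        for _ in range(n_tokens):
            dist = next_token_dist(out)
            tokens, weights = zip(*dist.items())
            out.append(random.choices(tokens, weights=weights)[0])
        return out

    def denoise_step(seq):
        # Hypothetical stand-in: a text-diffusion model would re-predict all positions jointly.
        return [random.choice(VOCAB) if tok == "<mask>" else tok for tok in seq]

    def diffusion_decode(n_tokens, steps=4):
        # Start from a fully masked draft and refine every position in parallel.
        seq = ["<mask>"] * n_tokens
        for _ in range(steps):
            seq = denoise_step(seq)
        return seq

    print(autoregressive_decode(6))
    print(diffusion_decode(6))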
ma2rten ◴[] No.45645523[source]
Interpretability research has found that autoregressive LLMs also plan ahead for what they are going to say.
replies(2): >>45645712 #>>45646027 #
1. aidenn0 ◴[] No.45645712[source]
This seems likely just from the simple fact that they can reliably generate contextually correct sentences in e.g. German Imperfekt.
replies(3): >>45651812 #>>45651822 #>>45653730 #
2. ma2rten ◴[] No.45651812[source]
It's actually true on many levels, if you think about what is needed to generate syntactically and grammatically correct sentences, coherent text, and working code.
replies(1): >>45658031 #
3. treis ◴[] No.45651822[source]
I don't think you're wrong, but I don't think your logic holds up here. If you have a literal translation like:

I have a hot dog _____

The word in the blank is not necessarily determined when the sentence is started. Several verbs fit at the end, and the LLM doesn't need to know which it's going to pick when it starts. Each word narrows down the possibilities:

I - trillions
have - billions
a - millions
hot - thousands
dog - dozens
_____ - could be eaten, cooked, thrown, whatever.

If it chooses "cooked" at this point, that doesn't necessarily mean that the LLM was going to do that when it chose "I" or "have".

replies(1): >>45652378 #
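A toy sketch of the narrowing described above, against a made-up five-sentence corpus rather than a real model: each committed word shrinks the set of consistent continuations, but the blank stays undecided until it is actually filled.

    # Made-up corpus, purely to illustrate how a growing prefix narrows the options.
    corpus = [
        "I have a hot dog cooked".split(),
        "I have a hot dog eaten".split(),
        "I have a hot dog thrown".split(),
        "I have a cold drink poured".split(),
        "You have a hot dog grilled".split(),
    ]

    prefix = []
    for word in ["I", "have", "a", "hot", "dog"]:
        prefix.append(word)
        matches = [s for s in corpus if s[:len(prefix)] == prefix]
        options = {s[len(prefix)] for s in matches if len(s) > len(prefix)}
        print(" ".join(prefix), "->", len(matches), "matches, next-word options:", options)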
4. aidenn0 ◴[] No.45652378[source]
That's why I hedged with "seems likely" and added "in context." If this is in the middle of a paragraph, then there are many fewer options to fit in the blank from the very start.
5. rcxdude ◴[] No.45653730[source]
And, to pick an example from the research, being able to generate output that rhymes. In fact, it's hard to see how you would produce anything that would be considered coherent text without some degree of planning ahead at some level of abstraction. If it were truly one token at a time, without any regard for what comes next, it would constantly 'paint itself into a corner' and be forced to produce nonsense (which, it seems, does still happen sometimes, but without any planning it would occur constantly).
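A minimal sketch of the "paint itself into a corner" point, with an invented word graph standing in for a real model (this is not the cited research): a purely greedy decoder commits to a high-scoring word that has no continuation, while a single step of lookahead (a crude form of planning) avoids the dead end.

    # Invented word graph: each word maps its possible successors to a score.
    # Generation must reach "<end>"; "wilting" is a trap with no continuation.
    GRAPH = {
        "<start>": {"roses": 0.6, "the": 0.4},
        "roses":   {"are": 1.0},
        "are":     {"wilting": 0.7, "red": 0.3},
        "red":     {"<end>": 1.0},
        "wilting": {},
        "the":     {"rose": 1.0},
        "rose":    {"<end>": 1.0},
    }

    def greedy(max_len=6):
        # Pick the highest-scoring next word with no regard for what could follow it.
        word, out = "<start>", []
        for _ in range(max_len):
            options = GRAPH[word]
            if not options:
                return out, "stuck"          # painted into a corner
            word = max(options, key=options.get)
            if word == "<end>":
                return out, "ok"
            out.append(word)
        return out, "too long"

    def greedy_with_lookahead(max_len=6):
        # Same loop, but refuse any word whose own continuation set is empty.
        word, out = "<start>", []
        for _ in range(max_len):
            options = {w: s for w, s in GRAPH[word].items()
                       if w == "<end>" or GRAPH[w]}
            if not options:
                return out, "stuck"
            word = max(options, key=options.get)
            if word == "<end>":
                return out, "ok"
            out.append(word)
        return out, "too long"

    print(greedy())                  # (['roses', 'are', 'wilting'], 'stuck')
    print(greedy_with_lookahead())   # (['roses', 'are', 'red'], 'ok')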
6. aidenn0 ◴[] No.45658031[source]
Just generating syntactically and grammatically correct sentences doesn't need much lookahead; prefixes to sentences that cannot be properly completed are going to be extremely unlikely to be generated.