
454 points nathan-barry | 5 comments
1. thatguysaguy ◴[] No.45645680[source]
Back when BERT came out, everyone was trying to get it to generate text. These attempts generally didn't work; here's one for reference, though: https://arxiv.org/abs/1902.04094

This doesn't have an explicit diffusion tie-in, but Savinov et al. at DeepMind figured out that doing two steps at training time and randomizing the masking probability is enough to get it to work reasonably well.
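
Rough sketch of what that looks like, if it's useful: assume "model" is any BERT-style masked LM mapping token ids to per-position vocab logits. This follows the description above, not the paper's exact recipe.

    # Two ideas: (1) draw the masking probability at random per batch
    # instead of fixing it at 15%, (2) add a second "unrolled" step where
    # the model re-denoises its own first-pass samples.
    import torch
    import torch.nn.functional as F

    MASK_ID = 103  # [MASK] id in the standard BERT vocab

    def two_step_loss(model, tokens):          # tokens: (batch, seq_len) long
        p = torch.rand(()) * 0.999 + 0.001     # random mask rate in (0, 1]
        mask = torch.rand(tokens.shape) < p
        corrupted = torch.where(mask, torch.full_like(tokens, MASK_ID), tokens)

        # Step 1: denoise the randomly masked input.
        logits1 = model(corrupted)             # (batch, seq_len, vocab)
        loss1 = F.cross_entropy(logits1.transpose(1, 2), tokens)

        # Step 2: feed the model's own samples back in and ask it to
        # correct them, so it learns to refine whole drafts.
        with torch.no_grad():
            sampled = torch.distributions.Categorical(logits=logits1).sample()
        logits2 = model(sampled)
        loss2 = F.cross_entropy(logits2.transpose(1, 2), tokens)

        # Loss on every position for simplicity; standard MLM training
        # only scores the masked slots.
        return loss1 + loss2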

replies(2): >>45648159 #>>45649173 #
2. binarymax ◴[] No.45648159[source]
Interesting, as I was in the (very large) camp that never considered it for generation and saw it as a pure encoder for things like semantic similarity, with an easy jump to classification, etc.
3. thatjoeoverthr ◴[] No.45649173[source]
I'm just learning this from your text, after spending last week trying to get a BERT model to talk.

https://joecooper.me/blog/crosstalk/

I've still got a few ideas to try, though, so I'm not done having fun with it.

replies(1): >>45655874 #
4. Anon84 ◴[] No.45655874[source]
The trick is to always put the [MASK] at the end:

"The [MASK]" "The quick [MASK]" etc

replies(1): >>45668581 #
5. thatjoeoverthr ◴[] No.45668581{3}[source]
I've saved this and I'll study it when I come back to it. Thanks!