
454 points | nathan-barry | 1 comment
jaaustin:
To my knowledge, this connection was first noted in 2021 in https://arxiv.org/abs/2107.03006 (page 5). We wanted to do text diffusion where you corrupt words into semantically similar words (like "quick brown fox" -> "speedy black dog"), but we kept finding that masking was the corruption the model found easiest to undo. Historically this goes back even further, to https://arxiv.org/abs/1904.09324, which built a generative MLM without framing it in diffusion math.
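
For intuition, here's a minimal sketch (my own illustration, not code from either paper) of the masking-style forward corruption, assuming a simple linear schedule where each token is independently masked with probability t/T at noise level t:

  import random

  MASK = "[MASK]"

  def corrupt(tokens, t, T):
      # Absorbing-state ("masking") forward process: at noise level t of T,
      # each token has independently been replaced by [MASK] with
      # probability t / T. At t = T the sequence is fully masked.
      p = t / T
      return [MASK if random.random() < p else tok for tok in tokens]

  tokens = "the quick brown fox jumps over the lazy dog".split()
  for t in (2, 5, 8):
      print(t, corrupt(tokens, t, T=10))

A denoising model trained to fill the masked positions back in, conditioned on the noise level, is essentially a BERT-style MLM, which is the connection being discussed.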
axiom92:
Yeah, that's the first formal reference I remember as well (although BERT is probably the first thing NLP folks will think of after reading about text diffusion).

I collected a few other early text-diffusion references here about three years ago: https://github.com/madaan/minimal-text-diffusion?tab=readme-....