454 points by nathan-barry
BoiledCabbage No.45645494
To me, part of the appeal of image diffusion models was starting from random noise to produce an image. Why do text diffusion models start from a blank slate (i.e., all "masked" tokens) instead of from random tokens?
1. ttul No.45646617
It depends on what you want the model to do for you. If you want the model to complete text, then you would provide the input text unmasked followed by a number of masked tokens that it's the model's job to fill in. Perhaps your goal is to have the model simply make edits to a bit of code. In that case, you'd mask out the part that it's supposed to edit and the model would iteratively fill in those masked tokens with generated tokens.
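The iterative fill-in described above can be sketched as a loop that predicts every masked position in parallel and commits only the most confident guesses on each pass. This is a minimal, hypothetical sketch: `toy_predict` stands in for a trained denoiser (it just looks up a fixed target sentence and fakes a confidence score), and confidence-ordered unmasking is one common schedule, assumed here, not necessarily what any particular model uses.

```python
MASK = "<mask>"

def toy_predict(tokens, position):
    """Hypothetical stand-in for a trained denoiser: returns (token, confidence)
    for one masked position. It simply looks up a fixed target sentence."""
    target = ["the", "cat", "sat", "on", "the", "mat"]
    # Fake confidence: higher when the neighbouring tokens are already unmasked.
    left_known = position > 0 and tokens[position - 1] != MASK
    right_known = position + 1 < len(tokens) and tokens[position + 1] != MASK
    confidence = 0.5 + 0.25 * left_known + 0.25 * right_known
    return target[position], confidence

def diffusion_fill(tokens, predict, per_step=1):
    """Repeatedly replace MASK tokens, committing the most confident predictions first.
    Unmasked input tokens (the prompt or surrounding context) are never changed."""
    tokens = list(tokens)
    steps = []
    while MASK in tokens:
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Predict every masked position, conditioned on the current sequence.
        preds = {i: predict(tokens, i) for i in masked}
        # Commit the per_step most confident positions; the rest stay masked.
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:per_step]
        for i in best:
            tokens[i] = preds[i][0]
        steps.append(list(tokens))
    return tokens, steps

# Completion: the prompt stays unmasked and is followed by masked slots to fill.
prompt = ["the", "cat", MASK, MASK, MASK, MASK]
result, steps = diffusion_fill(prompt, toy_predict)
for s in steps:
    print(" ".join(s))
```

Raising `per_step` commits more tokens per pass, so the loop finishes in fewer model calls; the same function also handles infilling, since any position left unmasked is treated as fixed context.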

One of the supposed strengths of text diffusion models is coding. Auto-regressive LLMs don't inherently come with the ability to edit; they can generate instructions that another system interprets as editing commands. Being able to literally mask out the parts you want edited and have the model unmask them is a powerful paradigm that could improve many coding tasks, or at the very least speed them up.
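To make the contrast concrete: with a masked diffusion model, the edit is specified by which positions are masked, rather than by a diff or edit command the model has to emit. The sketch below assumes whitespace tokenization and uses hypothetical helper names (`mask_for_edit`, `apply_fill`); it only prepares the masked input and writes the model's proposals back, leaving the denoiser itself out.

```python
MASK = "<mask>"

def mask_for_edit(tokens, start, end, n_slots=None):
    """Replace tokens[start:end] with mask slots; everything else stays fixed context.

    n_slots lets the caller give the model more or fewer slots than the original
    span (assumption: the replacement length must be chosen before denoising)."""
    if not 0 <= start <= end <= len(tokens):
        raise ValueError("edit span out of range")
    if n_slots is None:
        n_slots = end - start
    masked = tokens[:start] + [MASK] * n_slots + tokens[end:]
    editable = list(range(start, start + n_slots))
    return masked, editable

def apply_fill(masked, editable, proposals):
    """Write proposals into the editable slots only; context tokens are never touched."""
    if len(proposals) != len(editable):
        raise ValueError("need exactly one proposal per masked slot")
    out = list(masked)
    for pos, tok in zip(editable, proposals):
        out[pos] = tok
    return out

# Edit only the expression after `return`, keeping the function signature intact.
code = "def area ( r ) : return 3.14 * r * r".split()
masked, editable = mask_for_edit(code, 7, 12)
# In a real system these proposals would come from the diffusion model.
edited = apply_fill(masked, editable, "math.pi * r ** 2".split())
print(" ".join(edited))
```

Because the untouched tokens are copied straight through, the surrounding code is guaranteed to come back unchanged, which is the property that makes masking attractive for edits compared with regenerating a whole file.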

I suspect that elements of text diffusion will be baked into coding models like GPT Codex (if they aren't already). There's no reason you couldn't train a diffusion output head specifically for code editing, which the same model could then use whenever that makes the most sense.