
1. nodja No.45646627
I think another easy improvement to this diffusion model would be to let the logprobs also affect the chance of a token being turned back into a mask. Higher-confidence tokens would have less chance of being pruned, so the model should converge faster. I wonder if backprop would be able to exploit that. (I'm not an ML engineer.)
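
A minimal sketch of what confidence-weighted remasking could look like, assuming the denoiser exposes a per-token logprob for each currently unmasked position; the function name, the `remask_ratio` parameter, and the Gumbel-noise trick are illustrative choices, not anything from the linked model:

```python
import torch

def confidence_weighted_remask(logprobs: torch.Tensor, remask_ratio: float) -> torch.Tensor:
    """Choose which tokens to turn back into [MASK] for the next step.

    logprobs:     (seq_len,) log-probability the model assigned to each
                  currently unmasked token (higher = more confident).
    remask_ratio: fraction of tokens to re-mask this step.

    Returns a boolean tensor, True where the token should be re-masked.
    Low-confidence tokens are preferentially selected, so high-confidence
    tokens are more likely to survive into the next denoising step.
    """
    k = max(1, int(remask_ratio * logprobs.numel()))
    # Gumbel noise keeps the selection stochastic; a hard bottom-k cut
    # on logprobs alone would always prune the same tokens.
    gumbel = -torch.log(-torch.log(torch.rand_like(logprobs)))
    scores = -logprobs + gumbel            # higher score = more likely to prune
    remask_idx = scores.topk(k).indices
    mask = torch.zeros_like(logprobs, dtype=torch.bool)
    mask[remask_idx] = True
    return mask
```

On the backprop question: a top-k selection like this is not differentiable, so for gradients to "exploit" the confidence-weighted pruning one would presumably need a relaxation (e.g. Gumbel-softmax style) rather than the hard mask sketched above.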