
566 points | PaulHoule | 1 comment
1. awaymazdacx5 No.44490962
Token embeddings are combined with a diffusion model, using 16x16-patch transformer encoding: the image is tokenized into patches before the transformer processes it, and the decomposed representation is then modulated according to the diffusion model.
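
A minimal sketch of what the "16x16" tokenization step typically looks like in a ViT-style setup (this is an illustration, not the commenter's code; the `PatchEmbed` class, sizes, and PyTorch usage are assumptions): the image is split into 16x16 patches, each patch is linearly projected to a token embedding, and the resulting token sequence is what a transformer (or diffusion transformer) consumes.

```python
# Sketch of ViT-style patch tokenization (assumed setup, not from the comment).
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into 16x16 patches and project each patch to a token embedding."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A conv with kernel = stride = patch_size is equivalent to a
        # per-patch linear projection.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                  # x: (B, 3, 224, 224)
        x = self.proj(x)                   # (B, embed_dim, 14, 14)
        x = x.flatten(2).transpose(1, 2)   # (B, 196, embed_dim) token sequence
        return x

tokens = PatchEmbed()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768]) -- tokens fed to the transformer
```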