cpldcpu:
>We introduce a new Autoencoder (AE) that aggressively increases the scaling factor to 32. Compared with AE-F8, our AE-F32 outputs 16× fewer latent tokens,

Basically, they compress/decompress the images more, which means they need less computation during generation. But on the flip side, this should mean less variability.
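Rough arithmetic on the 16x claim (my own sketch, assuming a 1024x1024 input and square latent grids; the paper may use other resolutions):

    # Latent token count for a 1024x1024 image at two
    # autoencoder downsampling factors (assumed setup).
    H = W = 1024
    f8_tokens = (H // 8) * (W // 8)     # 128 * 128 = 16384
    f32_tokens = (H // 32) * (W // 32)  # 32 * 32 = 1024
    print(f8_tokens // f32_tokens)      # 16 -> the "16x fewer" claim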

Isn't this more of a design trade-off than an optimization?

Lerc:
It might not be compressing more (I haven't yet looked at the paper). You can have fewer but larger tokens carrying the same amount of data.
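A toy version of that point (channel counts invented for illustration, not taken from the paper): the latent grid shrinks, but each token can carry more channels, so the total number of latent values can stay the same.

    # Fewer-but-larger tokens: same total latent values,
    # just split differently. Channel counts are made up.
    f8_values = 16384 * 4   # 16384 tokens x 4 channels each
    f32_values = 1024 * 64  # 1024 tokens x 64 channels each
    print(f8_values == f32_values)  # True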

It would decrease the workload by having fewer things to compare against, balanced against a higher workload per comparison. For normal O(N²) attention that makes sense, but the page says:

> We introduce a new linear DiT, replacing vanilla quadratic attention and reducing complexity from O(N²) to O(N) Mix-FFN

So not sure what's up there.
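For what it's worth, a back-of-envelope comparison (toy cost model, constants ignored, hidden size d is a made-up number): vanilla attention scales as N²·d while kernel-style linear attention scales as N·d², so shrinking N from 16384 to 1024 cuts the quadratic term 256x but the linear term only 16x.

    # Toy FLOP estimates: vanilla attention ~ N^2 * d,
    # linear attention ~ N * d^2 (constants ignored).
    d = 1152  # invented hidden size
    for N in (16384, 1024):  # F8 vs F32 token counts
        print(N, N**2 * d, N * d**2)
    # 16384 -> 1024 tokens: the quadratic term drops 256x,
    # the linear term drops only 16x.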