The VAE Used for Stable Diffusion Is Flawed

1. ants_everywhere ◴[01 Feb 24 13:44 UTC] No.39215828[source]▶

>>39215242 (OP) #

I'm curious, was this already well known to experts? How surprising is this?

I enjoyed the write-up.

replies(3): >>39215866 #>>39215887 #>>39218438 #

2. GaggiX ◴[01 Feb 24 13:50 UTC] No.39215866[source]▶

>>39215828 (TP) #

I have never heard of this problem before, and I have seen a lot of discussion about VAEs from researchers.

3. dwringer ◴[01 Feb 24 13:53 UTC] No.39215887[source]▶

>>39215828 (TP) #

Anyone who has tried editing the latents before decoding them with the VAE, first in SD1.5 and then in SDXL, will have seen the difference. In SD1.5, local changes had somewhat unpredictable, global effects on the image. In SDXL, the changes have a more predictable impact on the output image, and some of the latent channels end up corresponding more directly to the resulting image channels.
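To make the "local edit, global effect" idea concrete, here is a rough toy sketch of how one could measure it. Both decoders are hypothetical stand-ins I made up for illustration (they are not the SD1.5 or SDXL VAE), and `leak_fraction` is just an illustrative name. The only real detail borrowed is the 8x-per-side upsampling factor.

```python
# Toy probe: how far does a local latent edit spread after decoding?
# Both decoders are hypothetical stand-ins, NOT the SD1.5 or SDXL VAE.

SCALE = 8  # SD-style VAEs map each latent cell to an 8x8 pixel block


def decode_local(latent):
    """Nearest-neighbour upsample: each pixel depends on exactly one latent cell."""
    return [[latent[y // SCALE][x // SCALE]
             for x in range(len(latent[0]) * SCALE)]
            for y in range(len(latent) * SCALE)]


def decode_global(latent):
    """Same upsample plus a term driven by the mean of the whole latent."""
    cells = [v for row in latent for v in row]
    mean = sum(cells) / len(cells)
    return [[v + 0.5 * mean for v in row] for row in decode_local(latent)]


def leak_fraction(decode, latent, y, x, delta=1.0, tol=1e-9):
    """Edit one latent cell and return the fraction of pixels OUTSIDE
    that cell's 8x8 footprint whose value changed."""
    edited = [row[:] for row in latent]
    edited[y][x] += delta
    before, after = decode(latent), decode(edited)
    changed = outside = 0
    for py in range(len(before)):
        for px in range(len(before[0])):
            if py // SCALE == y and px // SCALE == x:
                continue  # skip the edited cell's own footprint
            outside += 1
            if abs(after[py][px] - before[py][px]) > tol:
                changed += 1
    return changed / outside


latent = [[0.0] * 4 for _ in range(4)]
local_leak = leak_fraction(decode_local, latent, 1, 2)
global_leak = leak_fraction(decode_global, latent, 1, 2)
print(local_leak, global_leak)
```

With a real VAE you would swap in the actual decode call and probably use a looser tolerance, since a convolutional decoder always has some receptive field around each latent cell.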

Definitely a fascinating write-up. I have been curious about these differences for a while, though I had never considered this a "problem" per se.

4. numpad0 ◴[01 Feb 24 17:20 UTC] No.39218438[source]▶

>>39215828 (TP) #

I once saw someone on Twitter wondering about what was, to them, obviously a bug in the VAE leading to oddly saturated images in the anime space. That's just a keyword search in my dumb brain, though.