220 points Vt71fcAqt7 | 1 comment
amelius ◴[] No.41869065[source]
Does this finally solve the class of "6 fingers/hand" problems?
replies(1): >>41869314 #
1. ttul ◴[] No.41869314[source]
That problem can be fixed through careful fine-tuning, at the cost of losing some generality, because the model is punished for drawing bad fingers. The new method outlined in the paper operates in a highly spatially compressed latent space, but with more channels than previous models, so each latent pixel carries 2x the information content of Flux and 8x that of SDXL. I do wonder whether the high spatial compression means that high-resolution features like fingers will be messed up. On the other hand, the higher channel count in the latent space gives the model more detail per pixel to work with… I guess we’ll just have to see.
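To make the trade-off concrete, here is a rough back-of-the-envelope sketch. The numbers are assumptions, not figures from the comment: 8x downsampling with 4 latent channels for SDXL and 16 for Flux (the commonly cited VAE configurations), and a hypothetical 32x downsampling with 32 channels for the new model, picked so that the channel ratios come out to the 2x and 8x mentioned above. `LatentSpec` and `channel_ratio` are names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentSpec:
    """Shape parameters of an autoencoder latent space (illustrative only)."""
    name: str
    downsample: int  # spatial compression factor per side
    channels: int    # number of channels stored per latent pixel

    def latent_shape(self, height: int, width: int) -> tuple:
        """Latent tensor shape (C, H, W) for an image of the given size."""
        return (self.channels, height // self.downsample, width // self.downsample)

    def image_pixels_per_latent_pixel(self) -> int:
        """How many image pixels one latent pixel has to summarize."""
        return self.downsample ** 2


# Assumed configurations; none of these numbers appear in the comment itself.
SDXL = LatentSpec("SDXL", downsample=8, channels=4)
FLUX = LatentSpec("Flux", downsample=8, channels=16)
NEW = LatentSpec("new model (hypothetical)", downsample=32, channels=32)


def channel_ratio(a: LatentSpec, b: LatentSpec) -> float:
    """Ratio of per-latent-pixel channel counts, a relative to b."""
    return a.channels / b.channels


if __name__ == "__main__":
    for spec in (SDXL, FLUX, NEW):
        print(spec.name,
              spec.latent_shape(1024, 1024),
              spec.image_pixels_per_latent_pixel())
    print(channel_ratio(NEW, FLUX), channel_ratio(NEW, SDXL))
```

Under these assumptions, each latent pixel of the new model has to summarize a 32x32 patch (1024 image pixels) instead of an 8x8 patch (64), so a finger only a few pixels wide falls entirely inside one latent pixel. That is the worry; the extra channels are the counterweight.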