396 points doener | 3 comments | | HN request time: 0.001s | source
pawelduda ◴[] No.46174861[source]
Did anyone test it on 5090? I saw some 30xx reports and it seemed very fast
replies(2): >>46175501 #>>46177259 #
egeres ◴[] No.46177259[source]
Incredibly fast, on my 5090 with CUDA 13 (& the latest diffusers, xformers, transformers, etc...), 9 samplig steps and the "Tongyi-MAI/Z-Image-Turbo" model I get:

- 1.5s to generate an image at 512x512

- 3.5s to generate an image at 1024x1024

- 26s to generate an image at 2048x2048

It uses almost all of the 32 GB of VRAM, with GPU utilization near 100%. I'm using the script from the HF post: https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
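The timing spread (1.5s → 3.5s → 26s) lines up roughly with how the token count grows in a diffusion transformer. A back-of-envelope sketch, assuming a typical latent-diffusion setup with a VAE downsampling factor of 8 and 2×2 patchification (both are assumptions; I haven't checked Z-Image's actual config):

```python
# Token count per denoising step under assumed (not verified) Z-Image
# hyperparameters: VAE downsampling factor 8, 2x2 patchify.
VAE_FACTOR = 8
PATCH = 2

def tokens(side_px: int) -> int:
    latent = side_px // VAE_FACTOR       # spatial side of the latent
    return (latent // PATCH) ** 2        # tokens after patchify

for side in (512, 1024, 2048):
    n = tokens(side)
    print(f"{side}x{side}: {n} tokens ({n // tokens(512)}x the 512px count)")
# -> 512x512: 1024 tokens, 1024x1024: 4096 tokens, 2048x2048: 16384 tokens
```

Under these assumptions, 2048x2048 pushes 16x as many tokens through the model as 512x512, which is close to the ~17x wall-clock ratio reported above.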

replies(1): >>46179262 #
1. SV_BubbleTime ◴[] No.46179262[source]
Weird, even at 2048 I don’t think it should be using all your 32GB VRAM.
replies(1): >>46180877 #
2. egeres ◴[] No.46180877[source]
It stays around 26 GB even at 512x512. I still haven't profiled the execution or looked much into the details of the architecture, but I would assume it trades memory for speed by creating caches for each inference step.
replies(1): >>46182526 #
3. SV_BubbleTime ◴[] No.46182526[source]
IDK. Seems odd. It's an 11 GB model; I don't know what it could be caching in VRAM.
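One thing worth noting before blaming a cache: naively materialized attention matrices at high resolution are enormous, and PyTorch's caching allocator holds onto freed blocks, so nvidia-smi often overstates live usage. A rough sketch of the arithmetic, assuming fp16/bf16 (2 bytes/element) and the same hypothetical VAE-factor-8 / 2×2-patchify setup (not taken from the model config):

```python
# Size of a single naively materialized attention matrix at 2048x2048,
# assuming 2-byte elements and hypothetical latent/patch sizes (f=8, p=2).
BYTES = 2
tokens = ((2048 // 8) // 2) ** 2          # 16384 tokens (assumed config)

attn_matrix = tokens ** 2 * BYTES         # one head, one layer
print(f"one {tokens}x{tokens} attention matrix: {attn_matrix / 2**20:.0f} MiB")
# -> 512 MiB per head per layer if materialized
```

With xformers / flash-style attention these matrices are never materialized, so weights plus transient activations shouldn't reach 26 GB on their own; comparing `torch.cuda.memory_allocated()` against what nvidia-smi reports would show how much is just the allocator's reserved-but-free pool.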