Z-Image: Powerful and highly efficient image generation model with 6B parameters

1. pawelduda ◴[06 Dec 25 17:10 UTC] No.46174861[source]▶

>>46095817 (OP) #

Did anyone test it on 5090? I saw some 30xx reports and it seemed very fast

replies(2): >>46175501 #>>46177259 #

2. Wowfunhappy ◴[06 Dec 25 18:35 UTC] No.46175501[source]▶

>>46174861 (TP) #

Even on my 4080 it's extremely fast, it takes ~15 seconds per image.

replies(1): >>46177791 #

3. egeres ◴[06 Dec 25 22:43 UTC] No.46177259[source]▶

>>46174861 (TP) #

Incredibly fast. On my 5090 with CUDA 13 (and the latest diffusers, xformers, transformers, etc.), 9 sampling steps, and the "Tongyi-MAI/Z-Image-Turbo" model, I get:

- 1.5s to generate an image at 512x512

- 3.5s to generate an image at 1024x1024

- 26.s to generate an image at 2048x2048

It uses almost all of the 32GB of VRAM and nearly all of the GPU. I'm using the script from the HF post: https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
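
To compare those timings across resolutions, a rough throughput calculation helps. This is only arithmetic on the numbers listed above; the 2048x2048 entry is read as roughly 26 s, which is an assumption, and `megapixels_per_second` is just an illustrative helper name.

```python
# Timings reported above for Z-Image-Turbo on a 5090 (9 sampling steps).
# The 2048x2048 value is taken as roughly 26 s, which is an assumption.
timings = {512: 1.5, 1024: 3.5, 2048: 26.0}


def megapixels_per_second(side: int, seconds: float) -> float:
    """Throughput for one square image with the given side length."""
    return (side * side) / 1e6 / seconds


for side, seconds in timings.items():
    print(f"{side}x{side}: {megapixels_per_second(side, seconds):.3f} MP/s")
```

Going from 512 to 1024 quadruples the pixel count for about 2.3x the time, while going from 1024 to 2048 quadruples it again for about 7.4x the time, so per-pixel throughput is best at 1024x1024 in these numbers.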

replies(1): >>46179262 #

4. accrual ◴[06 Dec 25 23:56 UTC] No.46177791[source]▶

>>46175501 #

Did you use PyTorch Native or Diffusers Inference? I couldn't get the former working yet so I used Diffusers, but it's terribly slow on my 4080 (4 min/image). Trying again with PyTorch now, seems like Diffusers is expected to be slow.

replies(1): >>46177830 #

5. Wowfunhappy ◴[07 Dec 25 00:01 UTC] No.46177830{3}[source]▶

>>46177791 #

Uh, not sure? I downloaded the portable build of ComfyUI and ran the CUDA-specific batch file it comes with.

(I'm not used to using Windows and I don't know how to do anything complicated on that OS. Unfortunately, the computer with the big GPU also runs Windows.)

replies(1): >>46177979 #

6. accrual ◴[07 Dec 25 00:19 UTC] No.46177979{4}[source]▶

>>46177830 #

Haha, I know how it goes. Thanks, I'll give that a try!

Update: works great and much faster via ComfyUI + the provided workflow file.

7. SV_BubbleTime ◴[07 Dec 25 04:54 UTC] No.46179262[source]▶

>>46177259 #

Weird, even at 2048 I don’t think it should be using all your 32GB VRAM.

replies(1): >>46180877 #

8. egeres ◴[07 Dec 25 11:08 UTC] No.46180877{3}[source]▶

>>46179262 #

It stays around 26GB at 512x512. I still haven't profiled the execution or looked much into the details of the architecture, but I would assume it trades memory for speed by creating caches for each inference step.
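
One quick check short of full profiling, sketched here under the assumption that the pipeline runs on PyTorch with CUDA: PyTorch's caching allocator keeps freed blocks reserved, so the figure shown by tools like nvidia-smi can be well above what live tensors occupy. `vram_report` and `to_gib` are illustrative names, not part of any library.

```python
def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / 2**30


def vram_report():
    """Return allocated, reserved, and peak VRAM in GiB, or None without CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return {
        # Memory currently occupied by live tensors.
        "allocated_gib": to_gib(torch.cuda.memory_allocated()),
        # Memory held by the caching allocator, including freed blocks.
        "reserved_gib": to_gib(torch.cuda.memory_reserved()),
        # High-water mark of tensor memory since the process started.
        "peak_allocated_gib": to_gib(torch.cuda.max_memory_allocated()),
    }


# Call this in the same process right after generating an image.
print(vram_report())
```

If reserved is far above peak allocated, most of the 26GB would be allocator cache rather than anything the model needs at once.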

replies(1): >>46182526 #

9. SV_BubbleTime ◴[07 Dec 25 15:44 UTC] No.46182526{4}[source]▶

>>46180877 #

IDK. Seems odd. It’s an 11GB model, I don’t know what it could be caching in RAM.
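
For reference, the 11GB figure lines up with 6B parameters stored at 2 bytes each. A back-of-the-envelope sketch, assuming the 6B count from the title covers only the main model and that weights are held in bf16 (`weights_gib` is an illustrative helper):

```python
# Bytes per parameter for a few common weight formats.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weights_gib(num_params: float, dtype: str) -> float:
    """Approximate size of the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weights_gib(6e9, dtype):.1f} GiB")
```

That gives about 11.2 GiB at bf16. Any text encoder and VAE loaded alongside, plus activations, would come on top of that.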