
396 points doener | 2 comments
vunderba No.46175068
I've done some preliminary testing with Z-Image Turbo in the past week.

Thoughts

- It's fast (~3 seconds on my RTX 4090)

- Surprisingly capable of maintaining image integrity even at high resolutions (1536x1024, sometimes 2048x2048)

- Prompt adherence is impressive for a 6B-parameter model

Some tests (2 / 4 passed):

https://imgpb.com/exMoQ

Personally, I find it works better as a refiner model downstream of Qwen-Image 20B, which has significantly better prompt understanding but produces images with an unnatural "smoothness".
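
For anyone curious what that two-stage setup looks like in practice, here's a rough sketch using diffusers. The model IDs, the assumption that both checkpoints load through the AutoPipeline classes, and the step/strength numbers are all my own guesses rather than anything the model cards guarantee, so treat it as pseudocode for the workflow:

    # Sketch: Qwen-Image for the base generation, Z-Image Turbo as a
    # low-strength img2img refiner. Model IDs and loaders are assumptions.
    import torch
    from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image

    prompt = "a cluttered workbench lit by a single desk lamp, 35mm photo"

    # Stage 1: base image from Qwen-Image (better prompt understanding).
    base = AutoPipelineForText2Image.from_pretrained(
        "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
    ).to("cuda")
    image = base(prompt, num_inference_steps=30, height=1024, width=1024).images[0]

    # Stage 2: refine with Z-Image Turbo to add texture and break up the
    # "smoothness" of the base output. Low strength keeps the composition.
    refiner = AutoPipelineForImage2Image.from_pretrained(
        "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16
    ).to("cuda")
    refined = refiner(prompt, image=image, strength=0.35,
                      num_inference_steps=8).images[0]
    refined.save("refined.png")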

echelon No.46175104
So does this finally replace SDXL?

Are Flux 1/2/Kontext left in the dust by the Z-Image and Qwen combo?

tripplyons No.46175236
SDXL has been outclassed for a while, especially since Flux came out.
aeon_ai No.46175257
Subjective. Most people in creative industries still use SDXL regularly.

Once the Z-Image base model comes out and some real fine-tuning can be done, I think it has a chance of taking over the role SDXL currently fills.

Scrapemist No.46176045
Source?
echelon No.46177285
Most of the people I know doing local AI prefer SDXL to Flux. Lots of people are still using SDXL, even today.

Flux has largely been met with a collective yawn.

The only things Flux had going for it were photorealism and prompt adherence. But the skin and jaws of the humans it generated looked off, it was difficult to fine tune, and the licensing was awkward. On top of that, Flux never had good aesthetics; it always felt plain.

Nobody doing anime or cartoons used Flux. SDXL continues to shine here. People doing photoreal kept using Midjourney.

kouteiheika No.46178815
> it was difficult to fine tune

Yep. It's pretty difficult to fine-tune, mostly because it's a distilled model. You can fine-tune it a little, but it quickly collapses and starts producing garbage, even though it should fundamentally have been an easier architecture to fine-tune than SDXL (since it uses the much more modern flow-matching paradigm).
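
(For the curious: the flow-matching objective itself is simple enough to sketch in a few lines of PyTorch. This is the generic rectified-flow formulation, not Flux's actual training code, and it doesn't capture how distillation breaks fine-tuning.)

    # Generic flow-matching / rectified-flow training step (sketch only):
    # the model learns the velocity that carries noise to data along a
    # straight line between the two.
    import torch
    import torch.nn.functional as F

    def flow_matching_loss(model, x1):
        x0 = torch.randn_like(x1)                 # noise endpoint
        t = torch.rand(x1.shape[0], device=x1.device).view(-1, 1, 1, 1)
        xt = (1 - t) * x0 + t * x1                # point on the straight path
        v_target = x1 - x0                        # constant target velocity
        v_pred = model(xt, t.flatten())           # model predicts velocity
        return F.mse_loss(v_pred, v_target)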

I think that's probably why we never really got any good anime Flux models (at least nothing as good as the SDXL ones). You just don't have enough leeway to train the model long enough to make it great at a domain it's currently weak in without completely collapsing it.

echelon No.46186850
How much would it cost the community to pretrain something with a more modern architecture?

Assuming it were done carefully in stages (with more compute) to make sure no mistakes were made?

I suppose we may not need to, with Chinese labs gifting us so much open source lately?

kouteiheika No.46195403
> How much would it cost the community to pretrain something with a more modern architecture?

Quite a lot. Search for "Chroma" (a partial-ish retraining of Flux Schnell) or Pony (a partial-ish retraining of SDXL). You're probably looking at a cost of at least tens of thousands, or even hundreds of thousands, of dollars. Even bigger SDXL community finetunes like bigASP cost thousands.

And compute isn't the only issue. You also need a ton of data: a big dataset with millions of images, and you need it cleaned, filtered, and labeled.
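
To give a concrete sense of what "cleaned, filtered, and labeled" means in practice, the usual approach is to run every image through an automated captioner plus some cheap quality filters before training. The captioning checkpoint and the size threshold below are placeholders, not a recommendation:

    # Sketch of an auto-labeling pass over a raw image dump. The captioner
    # checkpoint and the resolution filter are placeholder choices.
    from pathlib import Path
    from PIL import Image
    from transformers import pipeline

    captioner = pipeline("image-to-text",
                         model="Salesforce/blip-image-captioning-large")

    with open("captions.tsv", "w") as out:
        for path in sorted(Path("raw_images").glob("*.jpg")):
            img = Image.open(path).convert("RGB")
            if min(img.size) < 512:      # skip images too small to train on
                continue
            caption = captioner(img)[0]["generated_text"]
            out.write(f"{path.name}\t{caption}\n")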

And of course you need someone who knows what they're doing. Training these state-of-the-art models takes quite a bit of skill, especially since a lot of it is pretty much a black art.