
Gemini 2.5 Flash Image

(developers.googleblog.com)
1092 points by meetpateltech | 3 comments
fariszr ◴[] No.45027760[source]
This is the GPT-4 moment for image editing models. Nano Banana, aka Gemini 2.5 Flash Image, is insanely good. It made a 171-point Elo jump on LMArena!

Just search "nano banana" on Twitter to see the crazy results. An example: https://x.com/D_studioproject/status/1958019251178267111

qingcharles ◴[] No.45029173[source]
I've been testing it for several weeks. It can produce results that are truly epic, but it's still a case of rerolling the prompt a dozen times to get an image you can use. It's not God. It's definitely an enormous step though, and totally SOTA.
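
That rerolling workflow is easy to script. A minimal sketch, assuming the google-genai Python SDK and the public preview model name gemini-2.5-flash-image-preview; the prompt and filenames are placeholders, not anything the commenter specified:

    # Sketch of the reroll loop described above: generate several candidates
    # for the same prompt and keep them all, so a human can pick the usable one.
    # Assumes the google-genai SDK and a GEMINI_API_KEY environment variable.
    from google import genai

    client = genai.Client()  # picks up GEMINI_API_KEY from the environment

    prompt = "A photorealistic product shot of a banana on a marble counter"  # placeholder

    for i in range(5):  # five "rerolls" of the same prompt
        response = client.models.generate_content(
            model="gemini-2.5-flash-image-preview",  # preview name for "nano banana"
            contents=prompt,
        )
        for part in response.candidates[0].content.parts:
            if part.inline_data:  # generated images arrive as inline bytes
                with open(f"candidate_{i}.png", "wb") as f:
                    f.write(part.inline_data.data)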
druskacik ◴[] No.45029473[source]
Is it because the model is not good enough at following the prompt, or because the prompt is unclear?

Something similar has been the case with text models: people write vague instructions and are dissatisfied when the model doesn't correctly guess their intentions. With image models it's even harder for the model to get it right without enough detail.
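
To make the point concrete, compare a vague instruction with one that carries the detail the model would otherwise have to guess (both prompts invented for illustration):

    # Invented example prompts: the first leaves the model guessing the user's
    # intent; the second specifies the edit region, direction, and constraints.
    vague_prompt = "Make this photo look better"

    detailed_prompt = (
        "Brighten the underexposed face by roughly one stop, keep the "
        "background and colors unchanged, and remove the small lens flare "
        "in the top-left corner"
    )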

qingcharles ◴[] No.45029572[source]
No, my prompts are very, very clear. It just won't follow them sometimes. Also, this model seems to prefer shorter prompts, in my experience.
toddmorey ◴[] No.45030646[source]
Remember that in image editing, the source image itself is a huge part of the prompt, and that's often where the ambiguity comes from. The model may clearly understand your instruction to change the color of a shirt but struggle to work out the boundaries of the shirt. I was just struggling to use AI to edit an image where the model kept treating the person's hat as their hair. My guess for that bias is that it was simply trained on more faces without hats than with them.