Gemini 2.5 Flash Image

(developers.googleblog.com)

1092 points meetpateltech | 2 comments | 26 Aug 25 14:01 UTC

Also: https://deepmind.google/models/gemini/image/, https://techcrunch.com/2025/08/26/google-geminis-ai-image-mo...


vunderba [26 Aug 25 16:49 UTC] No.45029113

I've updated the GenAI Image comparison site (which focuses heavily on strict text-to-image prompt adherence) to reflect the new Google Gemini 2.5 Flash model (aka nano-banana).

https://genai-showdown.specr.net

This model gets 8 of the 12 prompts correct, easily comes within striking distance of the best-in-class models Imagen and gpt-image-1, and is a significant upgrade over the old Gemini Flash 2.0 model. The reigning champ, gpt-image-1, only manages to edge out Flash 2.5 on the maze and 9-pointed star.

What's honestly most astonishing to me is how long gpt-image-1 has remained at the top of the class - closing in on half a year, which is basically a lifetime in this field. Fair warning, though: gpt-image-1 is borderline useless as an "editor", since it almost always changes the whole image instead of doing localized inpainting-style edits like Kontext, Qwen, or Nano-Banana.

Comparison of gpt-image-1, Flash, and Imagen:

https://genai-showdown.specr.net?models=OPENAI_4O%2CIMAGEN_4...


1. MrOrelliOReilly [27 Aug 25 08:31 UTC] No.45036899

>>45029113

This is incredibly useful! I was manually generating my own model comparisons last night, so great to see this :)

I will note that, personally, while adherence is a useful measure, it does miss some of the qualitative differences between models. For your "spheron" test, for example, you note that "4o absolutely dominated this test," but the image exhibits all the hallmarks of a ChatGPT-generated image that I personally dislike (a yellow cast, with veiny, almost impasto brush strokes). I have stopped using ChatGPT for image generation altogether because I find the style so awful. I wonder what objective measures one could track for "style"?

It reminds me a bit of ChatGPT vs. Claude for software development... Regardless of how each scores on benchmarks, Claude has been a clear winner in terms of actual results.


2. vunderba [27 Aug 25 18:30 UTC] No.45043229

>>45036899

Yeah, unfortunately the ubiquitous "piss filter" strikes again. You pretty much have to pass gpt-image-1 output through a tone map, LUT, etc. in something like Krita or Photoshop to try to mitigate this. I'm honestly a bit surprised that they haven't built a correction in already, given how obvious the color shift is.
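For anyone who wants to script this instead of fixing it by hand, here is a minimal sketch, assuming the yellow cast is roughly uniform across the image. It swaps in a simple gray-world white balance (not necessarily what Krita or Photoshop do) and bakes the correction into one 256-entry LUT per channel. `yellow_cast` is a hypothetical, crude score that also touches on the "objective measure" question above. All function names are illustrative, and pixels are plain 8-bit RGB tuples to keep it dependency-free.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B), values 0..255


def channel_means(pixels: Sequence[Pixel]) -> List[float]:
    """Mean value of each of the R, G, B channels (zeros for an empty image)."""
    n = len(pixels)
    if n == 0:
        return [0.0, 0.0, 0.0]
    return [sum(p[c] for p in pixels) / n for c in range(3)]


def yellow_cast(pixels: Sequence[Pixel]) -> float:
    """Hypothetical cast score: how far blue sits below the red/green average.
    Positive values suggest a yellow tint; values near 0 suggest neutral balance."""
    r, g, b = channel_means(pixels)
    return (r + g) / 2 - b


def build_gray_world_luts(pixels: Sequence[Pixel]) -> List[List[int]]:
    """Build one 256-entry lookup table per channel under the gray-world
    assumption: scale each channel so its mean matches the overall mean."""
    means = channel_means(pixels)
    gray = sum(means) / 3
    # A channel with mean 0 gets gain 1.0 (identity) to avoid dividing by zero.
    gains = [gray / m if m > 0 else 1.0 for m in means]
    # Gains are non-negative, so only the upper bound needs clamping.
    return [[min(255, round(v * gain)) for v in range(256)] for gain in gains]


def apply_luts(pixels: Sequence[Pixel], luts: List[List[int]]) -> List[Pixel]:
    """Map every pixel through the per-channel lookup tables."""
    return [(luts[0][r], luts[1][g], luts[2][b]) for r, g, b in pixels]
```

Applying the LUTs pulls the channel means together and drives `yellow_cast` toward zero. The trade-off is that gray-world also neutralizes scenes that are legitimately warm (a sunset, candlelight), which is one reason hand-tuned curves or a chosen LUT in an editor are often preferable.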
