
Gemini 2.5 Flash Image

(developers.googleblog.com)
1092 points by meetpateltech | 5 comments

fariszr No.45027760
This is the GPT-4 moment for image editing models. Nano Banana, aka Gemini 2.5 Flash, is insanely good. It made a 171-point Elo jump on LMArena!

Just search "nano banana" on Twitter to see the crazy results. An example: https://x.com/D_studioproject/status/1958019251178267111

echelon No.45028352
> This is the GPT-4 moment for image editing models.

No it's not.

We've had rich editing capabilities since gpt-image-1; this is just faster and looks better than gpt-image-1's (endearingly?) nicknamed "piss filter."

Flux Kontext, SeedEdit, and Qwen Edit are also robustly capable image editing models, Qwen Edit especially.

Flux Kontext and Qwen can also be fine-tuned and run locally.

Qwen (and its video-gen sister Wan) is also Apache-licensed. It's hard not to cheer Alibaba on, given how open they are compared to their competitors.

We've left behind the "prompt-only" text-to-image days of DALL-E, Stable Diffusion, and Midjourney.

It also looks like tools such as ComfyUI are becoming less and less necessary as those capabilities move into the model layer itself.

1. raincole No.45028405
In other words, this is the GPT-4 moment for image editing models.

GPT-4 isn't "fundamentally different" from GPT-3.5. It's just better. That's the exact point the parent commenter was trying to make.

2. retinaros No.45028445
Did you see the generated pic Demis posted on X? It looks like slop from two years ago. https://x.com/demishassabis/status/1960355658059891018

3. raincole No.45028524
I've tested it on Google AI Studio since it became available to me (which was only a few hours ago, so take this with a grain of salt). The prompt comprehension is uncannily good.

My test: go to https://unsplash.com/s/photos/random, pick two random images, and send them both with "integrate the subject from the second image into the first image" as the prompt. I think Gemini 2.5 does far better here than ChatGPT (admittedly, ChatGPT was the trailblazer on this path). Flux Kontext seems unable to do it at all; not sure if I was using it wrong, but it only ever considered one image at a time for me.
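
For anyone who wants to reproduce this test outside AI Studio, here's a minimal sketch of the same two-image edit against the API, assuming the google-genai Python SDK and the preview model name "gemini-2.5-flash-image-preview"; the API key and file names are placeholders:

    # pip install google-genai pillow
    from google import genai
    from PIL import Image

    client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

    background = Image.open("background.jpg")  # first image: the target scene
    subject = Image.open("subject.jpg")        # second image: subject to insert

    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",  # preview name, may change
        contents=[
            background,
            subject,
            "Integrate the subject from the second image into the first image.",
        ],
    )

    # Responses can interleave text and image parts; save any image bytes.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            with open("composite.png", "wb") as out:
                out.write(part.inline_data.data)
        elif part.text:
            print(part.text)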

Edit: Honestly, it might not be the "GPT-4 moment." It's better at combining multiple images, but I no longer think it's better than ChatGPT at understanding elaborate text prompts.

4. jug No.45031812
I'd say it's more like comparing Sonnet 3.5 to Sonnet 4. GPT-4 was a rather fundamental improvement: it opened up professional applications, where ChatGPT 3.5 was only good for casual use.
5. echelon No.45034181
> Flux Kontext

Flux Kontext is an editing model, but the set of things it can do is incredibly limited, and its style of prompting is very bare-bones. Qwen (Alibaba) and SeedEdit (ByteDance) are a little better, but they're still nowhere near as smart as Gemini 2.5 Flash or gpt-image-1.

Gemini 2.5 Flash and gpt-image-1 are in a class of their own: very powerful instructive image editing with the ability to understand multiple reference images.

> Edit: Honestly, it might not be the "GPT-4 moment." It's better at combining multiple images, but I no longer think it's better than ChatGPT at understanding elaborate text prompts.

Both gpt-image-1 and Gemini 2.5 Flash feel like "ComfyUI in a prompt," but they're still nascent capabilities that get a lot wrong.

When we get a gpt-image-1 with Midjourney aesthetics, better adherence, and lower latency, then we'll have our "GPT-4 moment." It's coming, but we're not there yet.

They need to learn more image editing tricks.