(developers.googleblog.com)

1092 points meetpateltech | 1 comments | 26 Aug 25 14:01 UTC

Also: https://deepmind.google/models/gemini/image/, https://techcrunch.com/2025/08/26/google-geminis-ai-image-mo...


elorant ◴[26 Aug 25 14:54 UTC] No.45027449[source]▶

I have a specific use case for image generators like this: I feed them an entire news article that I fetch from the BBC and ask them to create an image to accompany it. Thus far only Midjourney has managed to understand the context. And now this, which is even more impressive. We live in interesting times.

replies(2): >>45028133 #>>45030204 #

1. vunderba ◴[26 Aug 25 18:13 UTC] No.45030204[source]▶

>>45027449 #

I think most of the SOTA models could probably handle this but you'd probably get better results using a pipeline:

1. Reduce article to a synopsis using an LLM

2. Generate 4-5 varying description prompts from the synopsis

3. Feed the prompts to an imagegen model

Though I'd wager that gpt-image-1 (in ChatGPT), being multimodal, could probably manage it as well.
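For what it's worth, the three-step pipeline above could be sketched roughly like this. This is only a minimal outline, not tied to any particular vendor: `illustrate_article`, the three callables it takes, and the `fake_*` stand-ins are all hypothetical names invented for illustration, and you would swap in real LLM and imagegen client calls yourself.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: plug in whichever LLM / imagegen clients you use.
Summarize = Callable[[str], str]             # article text -> synopsis
Describe = Callable[[str, int], List[str]]   # synopsis, n -> n image prompts
GenerateImage = Callable[[str], bytes]       # image prompt -> image bytes


def illustrate_article(
    article: str,
    summarize: Summarize,
    describe: Describe,
    generate_image: GenerateImage,
    n_prompts: int = 4,
) -> List[Tuple[str, bytes]]:
    """Run the synopsis -> prompts -> images pipeline and return (prompt, image) pairs."""
    if not article.strip():
        raise ValueError("article is empty")
    synopsis = summarize(article)                # step 1: reduce to a synopsis
    prompts = describe(synopsis, n_prompts)      # step 2: varied description prompts
    return [(p, generate_image(p)) for p in prompts]  # step 3: one image per prompt


# Stand-in implementations so the sketch runs without any API access.
def fake_summarize(article: str) -> str:
    # Placeholder: keep only the first sentence.
    return article.split(".")[0].strip() + "."


def fake_describe(synopsis: str, n: int) -> List[str]:
    # Placeholder: vary the visual style around the same synopsis.
    styles = ["Editorial photo", "Flat vector art", "Watercolor sketch", "Isometric render"]
    return [f"{styles[i % len(styles)]} illustrating: {synopsis}" for i in range(n)]


def fake_generate_image(prompt: str) -> bytes:
    # Placeholder: a real client would return encoded image data here.
    return prompt.encode("utf-8")
```

Passing the model calls in as plain functions keeps each stage swappable, so you can compare imagegen models on the same set of prompts without touching the rest of the pipeline.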


Gemini 2.5 Flash Image