(developers.googleblog.com)

1092 points meetpateltech | 3 comments | 26 Aug 25 14:01 UTC

Also: https://deepmind.google/models/gemini/image/, https://techcrunch.com/2025/08/26/google-geminis-ai-image-mo...

1. elorant [26 Aug 25 14:54 UTC] No.45027449

I have a specific use case for image generators like this: I feed them an entire news article fetched from the BBC and ask for an image to accompany it. Until now, only Midjourney managed to understand the context. And now this, which is even more impressive. We live in interesting times.

replies(2): >>45028133 >>45030204

2. oracleclyde [26 Aug 25 15:44 UTC] No.45028133

>>45027449

I just tried it inside Gemini with a Medium article. Here's my prompt: "Read the article at this url and provide a hero image that incapsulates the message the author wants to convey: https://bioneers.org/supreme-oligarchy-billionaires-supreme-..."

The response was a pretty good summary of the article, along with an image that, dagnabbit, read the assignment.

3. vunderba [26 Aug 25 18:13 UTC] No.45030204

>>45027449

I think most SOTA models could handle this, but you'd probably get better results using a pipeline:

1. Reduce article to a synopsis using an LLM

2. Generate 4-5 varying description prompts from the synopsis

3. Feed the prompts to an imagegen model
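
The three steps above could be wired together roughly as follows. This is only a sketch: the function `article_to_images` and the three stage callables are hypothetical names, not part of any real SDK, and the `fake_*` stand-ins exist only so the example runs without network access. In practice each stage would wrap whichever LLM or image model you actually call.

```python
from typing import Callable, List

# Hypothetical stage signatures; swap in wrappers around real model calls.
Summarize = Callable[[str], str]          # article text -> synopsis
Describe = Callable[[str, int], str]      # synopsis, variant index -> image prompt
GenerateImage = Callable[[str], bytes]    # image prompt -> image bytes


def article_to_images(
    article: str,
    summarize: Summarize,
    describe: Describe,
    generate_image: GenerateImage,
    n_prompts: int = 4,
) -> List[bytes]:
    """Run the synopsis -> prompts -> images pipeline."""
    if n_prompts < 1:
        raise ValueError("n_prompts must be at least 1")
    # Step 1: reduce the article to a synopsis.
    synopsis = summarize(article)
    # Step 2: derive several differing description prompts from the synopsis.
    prompts = [describe(synopsis, i) for i in range(n_prompts)]
    # Step 3: feed each prompt to the image model.
    return [generate_image(prompt) for prompt in prompts]


# Stand-in stages (assumptions for illustration only, no real model involved).
def fake_summarize(article: str) -> str:
    # Pretend the first sentence is the synopsis.
    return article.split(".")[0].strip()


def fake_describe(synopsis: str, variant: int) -> str:
    return f"Hero image, variation {variant + 1}: {synopsis}"


def fake_generate_image(prompt: str) -> bytes:
    # A real implementation would return encoded image data.
    return prompt.encode("utf-8")


images = article_to_images(
    "Rivers are rising. More text follows.",
    fake_summarize,
    fake_describe,
    fake_generate_image,
)
```

Keeping the stages as plain callables means the synopsis model and the image model can be swapped independently, which is the main point of splitting the work into a pipeline.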

Though I'd wager that gpt-image-1 (in ChatGPT), being multimodal, could probably manage it as well.


Gemini 2.5 Flash Image