
Gemini 2.5 Flash Image

(developers.googleblog.com)
1092 points | meetpateltech | 3 comments
1. elorant ◴[] No.45027449[source]
I have a specific use case for such image generators: feed them an entire news article I fetch from the BBC and ask for an image to accompany the article. Thus far only Midjourney managed to understand the context. And now this, which is even more impressive. We live in interesting times.
replies(2): >>45028133 #>>45030204 #
2. oracleclyde ◴[] No.45028133[source]
I just tried it inside Gemini with a Medium article. Here's my prompt: "Read the article at this url and provide a hero image that encapsulates the message the author wants to convey: https://bioneers.org/supreme-oligarchy-billionaires-supreme-..."

The response was a pretty good summary of the article, along with an image that, dagnabbit, read the assignment.

3. vunderba ◴[] No.45030204[source]
I think most of the SOTA models could handle this, but you'd probably get better results using a pipeline:

1. Reduce article to a synopsis using an LLM

2. Generate 4-5 varying description prompts from the synopsis

3. Feed the prompts to an imagegen model

Though I'd wager that gpt-image-1 (in ChatGPT), being multimodal, could probably manage it as well.
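
For anyone who wants to try the three-step pipeline above, here is a rough sketch of how it might be wired together. It is not tied to any particular provider: `summarize`, `write_prompts`, and `generate_image` are hypothetical callables you would back with whatever LLM and imagegen clients you use, and the `stub_*` functions are placeholders that only exist to make the example runnable.

```python
from typing import Callable, List

# Hypothetical interfaces (assumptions, not any real library's API).
Summarizer = Callable[[str], str]                # article -> synopsis
PromptWriter = Callable[[str, int], List[str]]   # synopsis, n -> n image prompts
ImageGenerator = Callable[[str], bytes]          # prompt -> image bytes


def article_to_images(
    article: str,
    summarize: Summarizer,
    write_prompts: PromptWriter,
    generate_image: ImageGenerator,
    n_prompts: int = 4,
) -> List[bytes]:
    """Run the pipeline: synopsis -> varied prompts -> one image per prompt."""
    if n_prompts < 1:
        raise ValueError("n_prompts must be at least 1")
    synopsis = summarize(article)                  # step 1
    prompts = write_prompts(synopsis, n_prompts)   # step 2
    return [generate_image(p) for p in prompts]    # step 3


# Placeholder implementations so the sketch runs without any API keys.
def stub_summarize(article: str) -> str:
    # A real version would ask an LLM for a short synopsis.
    return article[:80]


def stub_write_prompts(synopsis: str, n: int) -> List[str]:
    # A real version would ask an LLM for n differently styled descriptions.
    styles = ["editorial illustration", "photojournalism", "flat vector",
              "watercolor", "isometric"]
    return [f"{styles[i % len(styles)]}: {synopsis}" for i in range(n)]


def stub_generate_image(prompt: str) -> bytes:
    # A real version would call an image model and return the image data.
    return prompt.encode("utf-8")


if __name__ == "__main__":
    images = article_to_images(
        "Some article text.", stub_summarize, stub_write_prompts,
        stub_generate_image, n_prompts=4,
    )
    print(len(images))
```

Passing the three stages in as functions keeps each one swappable, so you can compare imagegen models on the same set of prompts without touching the rest.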