←back to thread

Gemini 2.5 Flash Image

(developers.googleblog.com)
1092 points meetpateltech | 1 comments | | HN request time: 0.481s | source
Show context
elorant ◴[] No.45027449[source]
I have a certain use case for such image generators. Feed them an entire news article I fetch from bbc and ask it to create an image to accompany the article. Thus far only midjourney managed to understand context. And now this, which is even more impressive. We live in interesting times.
replies(2): >>45028133 #>>45030204 #
1. vunderba ◴[] No.45030204[source]
I think most of the SOTA models could probably handle this but you'd probably get better results using a pipeline:

1. Reduce article to a synopsis using an LLM

2. Generate 4-5 varying description prompts from the synopsis

3. Feed the prompts to an imagegen model

Though I'd wager that gpt-image-1 (in the ChatGPT) being multimodal could probably managed it as well.