Gemini 2.5 Flash Image

(developers.googleblog.com)

1092 points meetpateltech | 2 comments | 26 Aug 25 14:01 UTC

Also: https://deepmind.google/models/gemini/image/, https://techcrunch.com/2025/08/26/google-geminis-ai-image-mo...

skybrian [26 Aug 25 16:55 UTC] No.45029197

Like most image generators, it didn’t pass the piano keyboard test. (Black keys are wrong.)
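A minimal Python sketch of what passing the test would mean, assuming the generated keyboard has already been reduced to a left-to-right string of key colors ('W' for white, 'B' for black). That extraction step, and the helper names `key_colors` and `is_valid_keyboard`, are hypothetical and not part of Gemini or AI Studio:

```python
# Semitones within one octave, starting at C: C C# D D# E F F# G G# A A# B.
# 'W' = white key, 'B' = black key.
OCTAVE = "WBWBWWBWBWBW"

def key_colors(n_keys: int, start: int = 9) -> str:
    """Colors of n_keys consecutive keys; start=9 (A) because an 88-key piano begins on A0."""
    return "".join(OCTAVE[(start + i) % 12] for i in range(n_keys))

def is_valid_keyboard(observed: str) -> bool:
    """True if `observed` (one 'W'/'B' per key, left to right) is a contiguous slice of a real keyboard."""
    # Any real slice begins at one of 12 offsets, so a reference of
    # len(observed) + 12 keys starting at C contains every possible slice.
    reference = key_colors(len(observed) + 12, start=0)
    return observed in reference

full = key_colors(88)
print(full.count("W"), full.count("B"))   # 52 36
print(is_valid_keyboard("WBWBWWBWBWBW"))  # True  (C up to B)
print(is_valid_keyboard("WBWBWBWBW"))     # False (any real 9-key span has an adjacent E-F or B-C white pair)
```

The typical failure is the last case: black keys spaced evenly, as if white and black simply alternated, instead of falling around the E-F and B-C gaps.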

https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...


pbhjpbhj [26 Aug 25 17:52 UTC] No.45029961

>>45029197

Are there models whose vector space includes ideas: not just words or media, but aspects that aren't entirely corporeal?

So when generating a video of someone playing a keyboard, the model would incorporate the idea of the key pattern repeating every octave (12 keys: 7 white and 5 black), a fixed ideational aspect that might not be strongly represented in the words adjacent to "piano".
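The "fixed ideational aspect" can be stated precisely. Counting both ends, an octave spans 8 white keys (C to C), but the full key pattern repeats every 12 keys, each octave holds 7 white and 5 black keys, and the black keys fall in alternating clusters of 2 and 3. A small Python sketch deriving those facts from one octave's pattern (the `black_key_groups` helper is hypothetical, written only for illustration):

```python
# One octave starting at C: C C# D D# E F F# G G# A A# B ('W' = white, 'B' = black).
OCTAVE = "WBWBWWBWBWBW"

def black_key_groups(pattern: str) -> list[int]:
    """Sizes of the black-key clusters; clusters are separated wherever two
    white keys are adjacent (E-F within an octave, B-C between octaves)."""
    return [chunk.count("B") for chunk in pattern.split("WW") if "B" in chunk]

period = len(OCTAVE)                       # keys per repeat
whites, blacks = OCTAVE.count("W"), OCTAVE.count("B")
groups = black_key_groups(OCTAVE * 3)      # three octaves, starting at C

print(period, whites, blacks)  # 12 7 5
print(groups)                  # [2, 3, 2, 3, 2, 3]
```

The 2-3 clustering is exactly the invariant that no single word near "piano" spells out, which is the gap the comment is pointing at.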

It seems like models need help knowing what should stay static, or homomorphic, across or within images associated with the same word vectors, and that words alone don't provide a strong enough basis [*1] for this.

*1 - It's so hard to find non-conflicting words. Obviously I don't mean basis as in basis vectors, though there is a weak analogy.

heyjamesknight [26 Aug 25 19:21 UTC] No.45031004

>>45029961

How would you encode those ideas?

pbhjpbhj [27 Aug 25 13:31 UTC] No.45039430

>>45031004 (TP)

I don't know; that's partly why I asked. I wonder if there's a way to provide a loosely defined space.

Perhaps it's a second word-vector space that allows context-defined associations? Or maybe it just needs a tighter association of piano_keyboard with octave_repetition?
