Gemini 2.5 Flash Image | slacker news

1. skybrian ◴[26 Aug 25 16:55 UTC] No.45029197[source]▶

Like most image generators, it didn’t pass the piano keyboard test. (Black keys are wrong.)

https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...

replies(9): >>45029266 #>>45029269 #>>45029353 #>>45029404 #>>45029503 #>>45029767 #>>45029823 #>>45029961 #>>45032710 #

2. mikepurvis ◴[26 Aug 25 16:59 UTC] No.45029266[source]▶

>>45029197 (TP) #

Interesting! I feel like that's maybe similar to the business of being able to correctly generate images of text— it looks like the idea of a keyboard to a non-musician, but is immediately wrong to someone who is actually familiar with it at all.

I wonder if the bot is forced to generate something new— certainly for a prompt like that it would be acceptable to just pick the first result off a google image search and be like "there, there's your picture of a piano keyboard".

3. joombaga ◴[26 Aug 25 16:59 UTC] No.45029269[source]▶

>>45029197 (TP) #

What is the piano keyboard test? Your link requires granting AI Studio access to Google Drive, which I do not want to do.

replies(1): >>45029407 #

4. vunderba ◴[26 Aug 25 17:05 UTC] No.45029353[source]▶

>>45029197 (TP) #

Anything that is heavily periodic can definitely trip up image gen - that being said, I just used Flux Kontext T2I and got pretty close (disregard the hammers though, since that's a right mess). Only towards the upper register did it start to make mistakes.

https://imgur.com/a/fyX42my

5. psbp ◴[26 Aug 25 17:08 UTC] No.45029404[source]▶

>>45029197 (TP) #

Doesn't pass the analog clock test either.

6. raincole ◴[26 Aug 25 17:08 UTC] No.45029407[source]▶

>>45029269 #

Just ask it to generate a correct piano keyboard. It's something the current gen of image generator AIs fail at.

replies(1): >>45031084 #

7. Workaccount2 ◴[26 Aug 25 17:16 UTC] No.45029503[source]▶

>>45029197 (TP) #

The selling point of this model really seems to be its consistency between generations rather than its raw generating ability.

for instance:

https://aistudio.google.com/app/prompts/1gTG-D92MyzSKaKUeBu2...

replies(1): >>45031105 #

8. carimura ◴[26 Aug 25 17:37 UTC] No.45029767[source]▶

>>45029197 (TP) #

or my "hands with palms facing down" test.... no matter how hard I try it just can't get open hands, palms down.

replies(2): >>45030007 #>>45030065 #

9. cubefox ◴[26 Aug 25 17:41 UTC] No.45029823[source]▶

>>45029197 (TP) #

Like most image models, except GPT-4o, it also didn't pass the wooden Penrose triangle test. (It creates normal triangles.)

10. pbhjpbhj ◴[26 Aug 25 17:52 UTC] No.45029961[source]▶

>>45029197 (TP) #

Are there models that have a vector space that includes ideas, not just words/media but not entirely corporeal aspects?

So when generating a video of someone playing a keyboard the model would incorporate the idea of repeating groups of 8 tones, which is a fixed ideational aspect which might not be strongly represented in words adjacent to "piano".

It seems like models need help with knowing what should be static, or homomorphic, across or within images associated with the same word vectors and that words alone don't provide a strong enough basis [*1] for this.

*1 - it's so hard to find non-conflicting words, obviously I don't mean basis as in basis vectors, though there is some weak analogy.

replies(1): >>45031004 #

11. pbhjpbhj ◴[26 Aug 25 17:56 UTC] No.45030007[source]▶

>>45029767 #

I guess the vast majority of images have the palms the other way, and that this biases the output. It's like how we misinterpret images to generate optical illusions, because we're expecting valid 3D structures (Escher's staircases, say).

replies(1): >>45030084 #

12. vunderba ◴[26 Aug 25 18:00 UTC] No.45030065[source]▶

>>45029767 #

It's probably just a matter of rerolling a few times. I was able to get it around 25% of the time.

https://imgur.com/a/H9gH3Zy

replies(1): >>45035104 #

13. vunderba ◴[26 Aug 25 18:01 UTC] No.45030084{3}[source]▶

>>45030007 #

Yes - it's the same reason generating a 5-leaf clover fails - massive amounts of training data that predispose the model against it.

14. heyjamesknight ◴[26 Aug 25 19:21 UTC] No.45031004[source]▶

>>45029961 #

How would you encode those ideas?

replies(1): >>45039430 #

15. ZiiS ◴[26 Aug 25 19:26 UTC] No.45031084{3}[source]▶

>>45029407 #

Do most humans pass?

replies(3): >>45031284 #>>45031551 #>>45031760 #

16. skybrian ◴[26 Aug 25 19:28 UTC] No.45031105[source]▶

>>45029503 #

I can’t see it. You probably need to set permissions to “anyone with the link can access.”

17. adzm ◴[26 Aug 25 19:38 UTC] No.45031284{4}[source]▶

>>45031084 #

2-2-1-2-2-2-1

replies(1): >>45031583 #

18. phainopepla2 ◴[26 Aug 25 19:56 UTC] No.45031551{4}[source]▶

>>45031084 #

Presumably most humans with a camera do

19. polynomial ◴[26 Aug 25 19:58 UTC] No.45031583{5}[source]▶

>>45031284 #

I still feel like most humans would fail, haha.

replies(1): >>45034829 #

20. raincole ◴[26 Aug 25 20:13 UTC] No.45031760{4}[source]▶

>>45031084 #

Most humans fail at 4-digit multiplication, or drawing a cube in perspective.

21. conception ◴[26 Aug 25 21:45 UTC] No.45032710[source]▶

>>45029197 (TP) #

Failed my horizontal text test as well.

22. twodave ◴[27 Aug 25 02:29 UTC] No.45034829{6}[source]▶

>>45031583 #

Maybe, but anyone who knows what a chromatic scale is should be able to reason it out. E# == F, B# == C, so no black keys between those.
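
For what it's worth, that reasoning can be sketched in a few lines of Python. This is only an illustration, assuming the 2-2-1-2-2-2-1 pattern quoted upthread is read as semitone steps between consecutive white keys starting at C; the names `MAJOR_STEPS`, `WHITE_NAMES`, `black_key_gaps` and `octave_layout` are made up for the example.

```python
# Semitone steps between consecutive white keys, starting at C
# (the 2-2-1-2-2-2-1 pattern from the thread; assumed interpretation).
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]
WHITE_NAMES = ["C", "D", "E", "F", "G", "A", "B"]


def black_key_gaps(steps=MAJOR_STEPS, names=WHITE_NAMES):
    """Return the pairs of adjacent white keys with a black key between them."""
    gaps = []
    for i, step in enumerate(steps):
        left = names[i]
        right = names[(i + 1) % len(names)]  # wraps B back around to C
        if step == 2:  # a whole step leaves room for exactly one black key
            gaps.append((left, right))
    return gaps


def octave_layout(steps=MAJOR_STEPS):
    """Render one octave as a string: 'W' for a white key, 'b' for a black key."""
    layout = []
    for step in steps:
        layout.append("W")
        if step == 2:
            layout.append("b")
    return "".join(layout)


if __name__ == "__main__":
    print(black_key_gaps())
    print(octave_layout())
```

The half steps (E to F and B to C) are the only places with no black key, which gives the familiar groups of two and three black keys per octave.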

23. carimura ◴[27 Aug 25 03:17 UTC] No.45035104{3}[source]▶

>>45030065 #

that's pretty good. I was using a cartoon girl as an example of a dance move for kids.

https://g.co/gemini/share/0e0de0d42029

24. pbhjpbhj ◴[27 Aug 25 13:31 UTC] No.45039430{3}[source]▶

>>45031004 #

I don't know, in part that's why I asked ... I wonder if there's a way to provide a loosely-defined space.

Perhaps it's a second word-vector space that allows context-defined associations? Maybe it just needs tighter association of piano_keyboard with 8-step_repetition??