Google is winning on every AI front

1. ruuda ◴[12 Apr 25 06:15 UTC] No.43661815[source]▶

I'm trying Imagen 3 to add pictures to a presentation in Google Slides, and it's making basic mistakes of a kind I thought image models no longer made. I tried for half an hour to prompt it into generating an illustration of a ThinkPad with its back facing the viewer, so the keyboard is not visible. It couldn't do it; it would always make the keyboard face the viewer. Or you ask for an illustration of an animal pointing a finger, and it gives it an additional arm. Meanwhile you ask OpenAI to ghiblify a picture while changing the setting and adding 5 other things, and it absolutely nails it.

replies(3): >>43661826 #>>43661862 #>>43662012 #

2. remoquete ◴[12 Apr 25 06:17 UTC] No.43661826[source]▶

>>43661815 (TP) #

Image generation is extremely good in GPT now. Claude's edge is UX. But I expect Google to catch up on both fronts. It has the technology and manpower.

3. boznz ◴[12 Apr 25 06:27 UTC] No.43661862[source]▶

>>43661815 (TP) #

I thought it was just me. A few hours ago Gemini told me "As a language model, I'm not able to assist you with that." This was after generating an image a few minutes earlier. I think the copy/paste buffer pulled in some old source files I had attached a few days earlier (no idea how), because under "sources and related content" it now showed two files. Gemini is obviously calling its brother Imagen to offload the image generation, which is smart, I guess, if it works.

replies(1): >>43662261 #

4. vunderba ◴[12 Apr 25 06:57 UTC] No.43662012[source]▶

>>43661815 (TP) #

From my comparison tests focusing on prompt adherence, I would agree 4o edges out Imagen 3 as long as speed is not a concern.

https://genai-showdown.specr.net

If Imagen 3 had 4o's multimodal features, that would certainly bring it closer to 4o; being able to change an image by instruction (InstructPix2Pix style) is incredibly powerful.

It's crazy how far GenAI for imagery has come. Just a few short years ago, you would have struggled just to get three colored cubes stacked on top of each other in a specific order, SHRDLU style. Now? You can prompt for a specific four-pane comic strip and have it reasonably follow your directives.

5. Hikikomori ◴[12 Apr 25 07:43 UTC] No.43662261[source]▶

>>43661862 #

Can Gemini 2.5 Pro generate images? It only describes them for me.

replies(1): >>43662302 #

6. boznz ◴[12 Apr 25 07:51 UTC] No.43662302[source]▶

>>43662261 #

I'm using 2.0 Flash. If I ask, it says yes, it can, but it does seem hit-and-miss, as described above.