
Getting 50% (SoTA) on ARC-AGI with GPT-4o

(redwoodresearch.substack.com)
394 points tomduncalf | 2 comments
TheDudeMan ◴[] No.40712792[source]
"Vision is an especially large weakness."

But you can have GPT write code to reliably convert the image grid into a textual representation, right? And code to convert back to image and auto-verify.
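
A minimal sketch of that round trip, not taken from the linked post: it assumes each grid cell holds a color index from 0 to 9 and is drawn as a solid square of pixels. The palette, the `CELL` size, and every function name here are hypothetical choices for illustration; the real ARC renderer may differ.

```python
# Hypothetical palette (index -> RGB); the actual ARC colors may differ.
PALETTE = [
    (0, 0, 0), (0, 116, 217), (255, 65, 54), (46, 204, 64), (255, 220, 0),
    (170, 170, 170), (240, 18, 190), (255, 133, 27), (127, 219, 255), (135, 12, 37),
]
COLOR_TO_INDEX = {rgb: i for i, rgb in enumerate(PALETTE)}
CELL = 4  # pixels per grid cell (assumed)


def grid_to_image(grid):
    """Render a grid of color indices as a 2D list of RGB pixels."""
    image = []
    for row in grid:
        # Repeat each cell's color CELL times horizontally...
        pixel_row = [PALETTE[v] for v in row for _ in range(CELL)]
        # ...and repeat the whole pixel row CELL times vertically.
        image.extend([list(pixel_row) for _ in range(CELL)])
    return image


def image_to_grid(image):
    """Recover color indices by sampling the center pixel of each cell."""
    half = CELL // 2
    return [
        [COLOR_TO_INDEX[image[y][x]] for x in range(half, len(image[0]), CELL)]
        for y in range(half, len(image), CELL)
    ]


def grid_to_text(grid):
    """Serialize a grid as one line per row, values separated by spaces."""
    return "\n".join(" ".join(str(v) for v in row) for row in grid)


def text_to_grid(text):
    """Parse the text form produced by grid_to_text back into a grid."""
    return [[int(tok) for tok in line.split()] for line in text.strip().splitlines()]


def round_trip_ok(grid):
    """Auto-verify: grid -> image -> grid -> text -> grid must be lossless."""
    text = grid_to_text(image_to_grid(grid_to_image(grid)))
    return text_to_grid(text) == grid
```

Calling `round_trip_ok(grid)` on every puzzle grid before prompting would catch any conversion that silently loses information. This only handles clean, axis-aligned renders; a screenshot with grid lines or anti-aliasing would need a more tolerant color match.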

replies(2): >>40713429 #>>40727801 #
1. p1esk ◴[] No.40713429[source]
GPT-4o might not have been trained on a sufficiently large amount of visual data to develop advanced spatial intelligence. Perhaps it needs to see a lot more images, or perhaps it needs to be trained differently (e.g. to predict the next frame in a video). I suspect Sora has more spatial intelligence internally than 4o.
replies(1): >>40714606 #
2. ◴[] No.40714606[source]