
Getting 50% (SoTA) on ARC-AGI with GPT-4o

(redwoodresearch.substack.com)
394 points tomduncalf | 9 comments
1. rgbrgb No.40712154 [source]
> 50% accuracy on the public test set for ARC-AGI by having GPT-4o

Isn't the public test set public on github and therefore GPT-4o trained on it?

replies(2): >>40712401 #>>40712472 #
2. bongodongobob No.40712401 [source]
I keep seeing this comment all over the place. Just because something exists once in the training data doesn't mean the model can simply regurgitate it. That's not how training works. An LLM is not a knowledge database.
replies(2): >>40712453 #>>40713177 #
3. adroniser No.40712453 [source]
And yet that doesn't rule out that it can. See the New York Times lawsuit.
replies(1): >>40712544 #
4. daemonologist No.40712472 [source]
In this case I don't think having seen the ARC set would help much in writing and selecting Python scripts to solve the test cases. (Unless someone else has tried this approach before and _their_ results are in the training data.)

It will be good to see the private set results though.

replies(2): >>40712793 #>>40713721 #
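The "writing and selecting Python scripts" approach mentioned above can be sketched as a generate-and-filter loop: candidate programs (in practice sampled from the LLM) are kept only if they reproduce every training output, then applied to the test input. The grid format and toy candidate programs here are illustrative assumptions, not the post's actual code:

```python
# Sketch of generate-and-select for an ARC-style task.
# A task gives train pairs (input grid -> output grid); grids are
# lists of lists of ints. Candidates that crash or fail any train
# pair are discarded; the first surviving program answers the test.

def select_and_apply(candidates, train_pairs, test_input):
    for program in candidates:
        try:
            if all(program(inp) == out for inp, out in train_pairs):
                return program(test_input)
        except Exception:
            continue  # malformed candidates are simply skipped
    return None  # no candidate fit the training examples

# Toy stand-ins for LLM-sampled programs:
candidates = [
    lambda g: [row[::-1] for row in g],             # mirror each row
    lambda g: [[c + 1 for c in row] for row in g],  # increment colors
]
train_pairs = [([[1, 2]], [[2, 3]])]
print(select_and_apply(candidates, train_pairs, [[4, 5]]))  # [[5, 6]]
```

This is also why memorizing the public test set alone would not obviously help: a program must generalize from the train pairs to pass the filter.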
5. bongodongobob No.40712544 {3} [source]
From old excerpts of articles that are quoted all over the internet? That's not surprising.
replies(1): >>40714639 #
6. cma No.40712793 [source]
Public discussions of solutions to the public test set will presumably have somewhat similar analogies and/or embeddings to aspects of the python programs that solve them.
7. spencerchubb No.40713177 [source]
It could exist many times. People can fork and clone the repo. People are likely to copy the examples and share them online.
8. Truth_In_Lies No.40713721 [source]
Yes, someone has tried this: https://iprc-dip.github.io/DARC/
9. ben_w No.40714639 {4} [source]
That's still sufficient both for The Times's case and for it to be a potential problem in this case.