I keep seeing this comment all over the place. Just because something appears once in the training data doesn't mean the model can simply regurgitate it. That's not how training works. An LLM is not a knowledge database.
In this case, I don't think having seen the ARC set would help much in writing and selecting Python scripts to solve the test cases. (Unless someone else has tried this approach before and _their_ results are in the training data.)
It will be good to see the private-set results, though.
That said, public discussions of solutions to the public test set will presumably share analogies with, or lie close in embedding space to, aspects of the Python programs that solve them.