
Getting 50% (SoTA) on ARC-AGI with GPT-4o

(redwoodresearch.substack.com)
394 points by tomduncalf | 3 comments
whiplash451 No.40715123
The article jumps to the conclusion "Given that current LLMs can perform decently well on ARC-AGI" after having used multiple hand-crafted tricks to get these results, including "I also did a small amount of iteration on a 100 problem subset of the public test set", which is hidden in the middle of the article and not mentioned in the bullet list at the top.

Add to that the near-ad-hominem attack on Francois Chollet in the comic at the beginning (Francois never claimed to be a neuro-symbolic believer), and this work does a significant disservice to the community.

replies(4): >>40715887 >>40716039 >>40716432 >>40718813
1. killerstorm No.40716432
I think this work is great.

A lot of top researchers claim that obvious deficiencies in LLM training are fundamental flaws of the transformer architecture, because they have an interest in pursuing new research directions.

This work shows that temporary issues are just that: temporary. E.g. an LLM isn't trained on grid inputs, but it can figure things out once the grids are preprocessed into a representation it can read.
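
(For concreteness, here is a minimal sketch of what such preprocessing could look like, assuming it simply means serializing the grid as rows of digits in the prompt; the function name and format are illustrative, not the blog post's actual code.)

    # Hypothetical sketch: serialize an ARC-style grid of color indices
    # into plain text so a text-only LLM can read it inside a prompt.
    def grid_to_text(grid: list[list[int]]) -> str:
        return "\n".join(" ".join(str(cell) for cell in row) for row in grid)

    example_input = [
        [0, 0, 3],
        [0, 3, 0],
        [3, 0, 0],
    ]

    print("Input grid:\n" + grid_to_text(example_input))
    # Input grid:
    # 0 0 3
    # 0 3 0
    # 3 0 0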

replies(1): >>40718119
2. whiplash451 No.40718119
My claim is _not_ that this work is useless. But however "great" your work is, misrepresenting the steps you took in your experiments and overselling your results is never a valid approach in research.
replies(1): >>40718747
3. killerstorm No.40718747
This is a blog post, sir. All the details are written down, and he's very clear about his methods. It seems you're 1) biased and 2) holding blog posts to too high a standard.