ARC AGI v2: 17.6% -> 52.9%
SWE Verified: 76.3% -> 80%
That's pretty good!
ARC AGI v2: 17.6% -> 52.9%
SWE Verified: 76.3% -> 80%
That's pretty good!
Edit: if you disagree, try actually TAKING the Arc-AGI 2 test, then post.
The idea behind Arc-AGI is that you can train all you want on the answers, because knowing the solution to one problem isn't helpful on the others.
In fact, the way the test works is that the model is given several examples of worked solutions for each problem class, and is then required to infer the underlying rule(s) needed to solve a different instance of the same type of problem.
That's why comparing Arc-AGI to chess or other benchmaxxing exercises is completely off base.
(IMO, an even better test for AGI would be "Make up some original Arc-AGI problems.")