ARC AGI v2: 17.6% -> 52.9%
SWE Verified: 76.3% -> 80%
That's pretty good!
ARC AGI v2: 17.6% -> 52.9%
SWE Verified: 76.3% -> 80%
That's pretty good!
Edit: if you disagree, try actually TAKING the Arc-AGI 2 test, then post.
Look no farther than the hodgepodge of independent teams running cheaper models (and no doubt thousands of their own puzzles, many of which surely overlap with the private set) that somehow keep up with SotA, to see how impactful proper practice can be.
The benchmark isn’t particularly strong against gaming, especially with private data.
No, it isn't. Go take the test yourself and you'll understand how wrong that is. Arc-AGI is intentionally unlike any other benchmark.
Not to humble-brag, but I also outperform on IQ tests well beyond my actual intelligence, because "find the pattern" is fun for me and I'm relatively good at visual-spatial logic. I don't find their ability to measure 'intelligence' very compelling.
What would be an example of a test for machine intelligence that you would accept? I've already suggested one (namely, making up more of these sorts of tests) but it'd be good to get some additional opinions.