188 points gkamradt | 1 comments
gkamradt ◴[] No.43465162[source]
Hey HN, Greg from ARC Prize Foundation here.

Alongside Mike Knoop and François Chollet, we’re launching ARC-AGI-2, a frontier AI benchmark that measures a model’s ability to generalize on tasks it hasn’t seen before, and the ARC Prize 2025 competition to beat it.

In Dec '24, ARC-AGI-1 (2019) pinpointed the moment AI moved beyond pure memorization, as demonstrated by OpenAI's o3.

ARC-AGI-2 targets test-time reasoning.

My view is that good AI benchmarks don't just measure progress, they inspire it. Our mission is to guide research towards general systems.

Base LLMs (no reasoning) are currently scoring 0% on ARC-AGI-2. Specialized AI reasoning systems (like R1 or o3-mini) are <4%.

Every ARC-AGI-2 task (100%), however, has been solved by at least two humans, quickly and easily. We know this because we tested 400 people live.

Our belief is that once we can no longer come up with quantifiable problems that are "feasible for humans and hard for AI" then we effectively have AGI. ARC-AGI-2 proves that we do not have AGI.

Change log from ARC-AGI-1 to ARC-AGI-2:

* The two main evaluation sets (semi-private, private eval) have increased to 120 tasks
* Solving tasks requires more reasoning vs pure intuition
* Each task has been confirmed to have been solved by at least 2 people (many more) out of an average of 7 test takers in 2 attempts or fewer
* Non-training task sets are now difficulty-calibrated

The 2025 Prize ($1M, open-source required) is designed to drive progress on this specific gap. Last year's competition (also launched on HN) had 1.5K teams participate and had 40+ research papers published.

The Kaggle competition goes live later this week and you can sign up here: https://arcprize.org/competition

We're in an idea-constrained environment. The next AGI breakthrough might come from you, not a giant lab.

Happy to answer questions.

replies(13): >>43465254 #>>43466394 #>>43466647 #>>43467579 #>>43467810 #>>43468015 #>>43468067 #>>43468081 #>>43468268 #>>43468318 #>>43468455 #>>43468706 #>>43468931 #
synapsomorphy ◴[] No.43466647[source]
Thanks for your awesome work Greg!

The success of o3 directly contradicts us being in an "idea-constrained environment". What makes you believe that?

replies(2): >>43468917 #>>43469112 #
jononor ◴[] No.43469112[source]
Not Greg/team, so this is an outside opinion. The o3 solution for ARC v1 was incredibly expensive. At the very least, some good ideas are needed to bring that cost down by a factor of 100-10,000x.
replies(1): >>43469964 #
1. torginus ◴[] No.43469964{3}[source]
Yeah, my analogy for that solution is claiming to have solved array sorting by using enormous compute to try all possible orderings of arrays of length 100.

It's not a real solution because:

- It's way too expensive

- It doesn't scale the way a real solution does
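
To make the analogy concrete, here is a minimal Python sketch. It only illustrates the sorting comparison and says nothing about how o3 actually works; the function names `permutation_sort` and `worst_case_orderings` are made up for this example.

```python
from itertools import permutations
from math import factorial


def is_sorted(xs):
    """Return True if every element is <= its successor."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def permutation_sort(xs):
    """Brute force: try orderings until one happens to be sorted.

    Some ordering of any finite list is sorted, so the loop always
    returns, but in the worst case it checks all n! orderings.
    """
    for candidate in permutations(xs):
        if is_sorted(candidate):
            return list(candidate)


def worst_case_orderings(n):
    """Number of orderings the brute force may have to check for length n."""
    return factorial(n)


data = [5, 2, 9, 1, 7]
# Both approaches agree on a tiny input...
assert permutation_sort(data) == sorted(data)
# ...but the brute-force search space explodes with length.
print(worst_case_orderings(5))    # small enough to enumerate
print(worst_case_orderings(100))  # a 158-digit number of orderings
```

On five elements the brute force finishes instantly, which is why it can look like a solution. At length 100 the worst case is 100! orderings, a 158-digit number, while a comparison sort needs on the order of n log n comparisons. That gap is what "doesn't scale" means here.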