Arc-AGI-2 and ARC Prize 2025

(arcprize.org)

artificialprint ◴[24 Mar 25 22:00 UTC] No.43465860[source]▶

Oh boy! Some of these tasks are not hard, but require full attention and a lot of counting just to get things right! ARC3 will go 3D perhaps? JK

Congrats on the launch, let's see how long it'll take to get saturated

replies(2): >>43465929 #>>43466945 #

1. fchollet ◴[24 Mar 25 22:09 UTC] No.43465929[source]▶

>>43465860 #

ARC 3 is still spatially 2D, but it adds a time dimension, and it's interactive.

replies(3): >>43466406 #>>43466916 #>>43466966 #

2. artninja1988 ◴[24 Mar 25 23:14 UTC] No.43466406[source]▶

>>43465929 (TP) #

I think a lot of people got discouraged, seeing how OpenAI solved ARC-AGI-1 by what seems like brute forcing and throwing money at it. Do you believe ARC was solved in the "spirit" of the challenge? Also, all the open-source solutions seem super specific to solving ARC. Is this really leading us to human-level AI at open-ended tasks?

replies(2): >>43466887 #>>43467745 #

3. fchollet ◴[25 Mar 25 00:28 UTC] No.43466887[source]▶

>>43466406 #

It's useful to know what current AI systems can achieve with unlimited test-time compute resources. Ultimately though, the "spirit of the challenge" is efficiency, which is why we're specifically looking for solutions that are at least within 1-2 orders of magnitude of cost from being competitive with humans. The Kaggle leaderboard is very resource-constrained, and on the public leaderboard you need to use less than $10,000 in compute to solve 120 tasks.
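
To put the budget above in per-task terms, here is a minimal sketch of the arithmetic. Only the $10,000 cap and the 120 tasks come from the comment; the human cost per task is a hypothetical placeholder rather than a reported figure, and the function name is made up for illustration.

```python
import math

PUBLIC_BUDGET_USD = 10_000   # from the comment above
NUM_TASKS = 120              # from the comment above

def orders_of_magnitude_gap(ai_cost: float, human_cost: float) -> float:
    """Return log10 of the cost ratio; 1.0 means the AI is 10x more expensive."""
    return math.log10(ai_cost / human_cost)

per_task_budget = PUBLIC_BUDGET_USD / NUM_TASKS
print(f"Per-task budget: ${per_task_budget:.2f}")  # $83.33

# HYPOTHETICAL human cost per task, used only to show how the gap is measured.
human_cost_per_task = 10.0
gap = orders_of_magnitude_gap(per_task_budget, human_cost_per_task)
print(f"Gap vs. hypothetical human cost: {gap:.2f} orders of magnitude")
```

Swapping in a measured human cost would show whether a given solution lands inside the 1-2 orders of magnitude window.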

replies(1): >>43468035 #

4. Vecr ◴[25 Mar 25 00:32 UTC] No.43466916[source]▶

>>43465929 (TP) #

If you aren't joking, that will filter most humans.

replies(1): >>43466995 #

5. christianqchung ◴[25 Mar 25 00:42 UTC] No.43466966[source]▶

>>43465929 (TP) #

Are you in the process of creating tasks that behave as an acid test for AGI? If not, do you think such a task is feasible? I read somewhere in the ARC blog that they define AGI as the point when creating tasks that are hard for AI but easy for humans becomes virtually impossible.

6. wmf ◴[25 Mar 25 00:47 UTC] No.43466995[source]▶

>>43466916 #

They said at least two people out of 400 solved each problem so they're pretty hard.

replies(1): >>43468990 #

7. mrshadowgoose ◴[25 Mar 25 03:03 UTC] No.43467745[source]▶

>>43466406 #

Strong emphasis on "seems".

I'd encourage you to review the definition of "brute force", and then consider the absolutely immense combinatoric space represented by the grids these puzzles use.

"Brute force" simply cannot touch these puzzles. An amount of understanding and pattern recognition is strictly required, even with the large quantities of test-time compute that were used against arc-agi-1.
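
For a rough sense of the combinatoric space mentioned above, here is a small sketch that counts possible output grids. It assumes the usual ARC format (grids from 1x1 up to 30x30, 10 possible cell colors); those limits are not stated in this thread, and `count_grids` is just an illustrative name.

```python
# Assumption: ARC grids are H x W with 1 <= H, W <= 30 and 10 cell colors.
MAX_SIDE = 30
NUM_COLORS = 10

def count_grids(max_side: int = MAX_SIDE, num_colors: int = NUM_COLORS) -> int:
    """Count every distinct grid across all allowed heights and widths."""
    return sum(
        num_colors ** (h * w)
        for h in range(1, max_side + 1)
        for w in range(1, max_side + 1)
    )

largest_only = NUM_COLORS ** (MAX_SIDE * MAX_SIDE)  # 30x30 grids alone
total = count_grids()

# The number of decimal digits minus one gives the power of ten.
print(f"30x30 grids alone: 10^{len(str(largest_only)) - 1}")
print(f"All grid sizes:    about 10^{len(str(total)) - 1}")
```

Under these assumptions, blindly enumerating candidate outputs is out of the question, so any search has to be guided by the demonstration pairs.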

replies(1): >>43469429 #

8. Legend2440 ◴[25 Mar 25 04:03 UTC] No.43468035{3}[source]▶

>>43466887 #

Efficiency sounds like a hardware problem as much as a software problem.

$10000 in compute is a moving target: today's GPUs are much, much better than those of 10 years ago.

replies(1): >>43468977 #

9. NitpickLawyer ◴[25 Mar 25 08:10 UTC] No.43468977{4}[source]▶

>>43468035 #

> $10000 in compute is a moving target

And it's also irrelevant in some fields. If you solve a "protein folding" problem that was a blocker for a pharma company, that 10k is peanuts now.

Same for coding. If you can spend $100/hr on a "mid-level" SWE agent but you can literally spawn 100 today and 0 tomorrow and reach your clients faster, again the cost is irrelevant.

10. NitpickLawyer ◴[25 Mar 25 08:13 UTC] No.43468990{3}[source]▶

>>43466995 #

I don't think that's correct. They had 400 people receive some questions, and only kept the questions that were solved by at least 2 people. The 400 people didn't all receive 120 questions (they'd have probably got bored).

If you go through the example problems you'll notice that most are testing the "aha" moment. Once you do a couple, you know what to expect, but with larger grids you have to stay focused and keep track of a few things to get it right.

11. Davidzheng ◴[25 Mar 25 09:35 UTC] No.43469429{3}[source]▶

>>43467745 #

Also, there's no clear way to verify the solution. There could easily be multiple rules that work on the same examples.
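
A toy illustration of that ambiguity, as a sketch only: the grids and the two candidate rules below are invented for this example and are not taken from any actual ARC task. Both rules reproduce every demonstration pair, yet they disagree on a new input.

```python
Grid = list[list[int]]

def flip_horizontal(grid: Grid) -> Grid:
    """Mirror each row left-to-right."""
    return [row[::-1] for row in grid]

def rotate_180(grid: Grid) -> Grid:
    """Reverse the row order and mirror each row."""
    return [row[::-1] for row in grid[::-1]]

# Made-up demonstration pairs. Every input has identical rows, so reversing
# the row order changes nothing and the two rules cannot be told apart.
train_pairs: list[tuple[Grid, Grid]] = [
    ([[1, 2], [1, 2]], [[2, 1], [2, 1]]),
    ([[3, 0, 4], [3, 0, 4]], [[4, 0, 3], [4, 0, 3]]),
]

def fits_all(rule, pairs) -> bool:
    """Check whether a rule reproduces every demonstration output."""
    return all(rule(inp) == out for inp, out in pairs)

print(fits_all(flip_horizontal, train_pairs))  # True
print(fits_all(rotate_180, train_pairs))       # True

# A test input whose rows differ exposes the disagreement.
test_input: Grid = [[1, 2], [3, 4]]
print(flip_horizontal(test_input))  # [[2, 1], [4, 3]]
print(rotate_180(test_input))       # [[4, 3], [2, 1]]
```

Fitting the examples does not by itself pick out the intended rule; which rule counts as correct depends on the held-out test output.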
