Concrete benchmarks like these are very valuable.
Defining the reward function, which is basically what ARC is doing, is 50% of the problem-solving process.