Maybe he figured out a model that beats ARC-AGI's 85% threshold?
People have, I think.
One of the published approaches (BARC) uses GPT-4o to generate a lot more training data.
The approach is scaling really well so far [1], and whether you expect linear or exponential scaling [2], the 85% threshold can be reached, using the "transduction" model alone, after generating under 2 million tasks ($20K in OpenAI credits).
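To give a sense of the back-of-envelope, here is a rough extrapolation sketch under the two scaling assumptions from footnote [2]. The two (tasks, accuracy) anchor points are made-up placeholders, not numbers from [1], and the ~$0.01/task rate is simply what "$20K for 2M tasks" implies:

    # Back-of-envelope extrapolation to the 85% threshold, under the two
    # scaling assumptions from footnote [2].  The (tasks, accuracy) anchor
    # points are hypothetical placeholders, NOT numbers from [1]; plug in
    # the real ones from the Kaggle discussion to redo the estimate.
    import math

    (n0, a0), (n1, a1) = (200_000, 0.40), (400_000, 0.50)   # hypothetical
    target = 0.85
    cost_per_task = 20_000 / 2_000_000   # ~$0.01/task, implied by "$20K for 2M tasks"

    # Linear scaling: accuracy grows by a fixed amount per extra task.
    slope = (a1 - a0) / (n1 - n0)
    n_linear = n1 + (target - a1) / slope

    # Exponential ("darts") scaling: the error rate shrinks by a fixed factor
    # per fixed number of extra tasks, i.e. error(n) = (1 - a1) * exp(-(n - n1) / tau).
    tau = (n1 - n0) / math.log((1 - a0) / (1 - a1))
    n_exp = n1 + tau * math.log((1 - a1) / (1 - target))

    for label, n in [("linear", n_linear), ("exponential", n_exp)]:
        print(f"{label:>12}: ~{n:,.0f} tasks, ~${n * cost_per_task:,.0f} in credits")

With these placeholder points, both fits land comfortably under 2 million tasks; the actual curve reported in [1] is what matters.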
Perhaps for 2025, the organizers will redesign ARC-AGI to be more resistant to this sort of approach, somehow.
---
[1] https://www.kaggle.com/competitions/arc-prize-2024/discussio...
[2] If you are "throwing darts at a board", you get exponential scaling (the probability of not hitting the bullseye at least once decreases exponentially with the number of throws). If you deliberately design your synthetic dataset to be non-redundant, you might get something akin to linear scaling (until you hit perfect accuracy, of course).
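A minimal sketch of the difference, with made-up numbers (N distinct "concepts" to cover, and a 1/N chance that any single randomly generated task covers a given one):

    # Illustration of footnote [2]: random "darts" vs. a non-redundant design.
    # N is an arbitrary illustration value, not derived from ARC-AGI.
    N = 1_000
    p = 1.0 / N   # chance a single random task happens to cover a given concept

    for n in (500, 1_000, 3_000, 10_000):
        # Random generation: P(concept covered at least once) = 1 - (1 - p)^n.
        # The miss probability (1 - p)^n shrinks exponentially with n.
        covered_random = 1 - (1 - p) ** n
        # Non-redundant generation: every new task covers a new concept,
        # so coverage grows linearly until it saturates.
        covered_designed = min(1.0, n / N)
        print(f"n={n:>6,}: random {covered_random:.3f}, non-redundant {covered_designed:.3f}")

With these toy numbers, the non-redundant design saturates at n = N = 1,000 tasks, while random generation still misses about 5% of concepts at n = 3,000.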
Honest question - is that so, and why? I thought you have to calculate the probability of each throw individually, since nothing fundamentally connects the throws together; it's only that in the long run there will be a normal distribution of randomness.