Francois Chollet is leaving Google

(developers.googleblog.com)

377 points xnx | 1 comments | 13 Nov 24 22:28 UTC

max_ ◴[13 Nov 24 23:19 UTC] No.42131308[source]▶

>>42130881 (OP) #

I wonder what he will be working on?

Maybe he figured out a model that beats ARC-AGI by 85%?

replies(1): >>42131784 #

trott ◴[14 Nov 24 00:26 UTC] No.42131784[source]▶

>>42131308 #

> Maybe he figured out a model that beats ARC-AGI by 85%?

People have, I think.

One of the published approaches (BARC) uses GPT-4o to generate a lot more training data.

The approach is scaling really well so far [1], and whether you expect linear scaling or an exponential one [2], the 85% threshold can be reached, using the "transduction" model alone, after generating under 2 million tasks ($20K in OpenAI credits).

Perhaps for 2025, the organizers will redesign ARC-AGI to be more resistant to this sort of approach, somehow.

---

[1] https://www.kaggle.com/competitions/arc-prize-2024/discussio...

[2] If you are "throwing darts at a board", you get exponential scaling (the probability of not hitting bullseye at least once reduces exponentially with the number of throws). If you deliberately design your synthetic dataset to be non-redundant, you might get something akin to linear scaling (until you hit perfect accuracy, of course).
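A minimal sketch of the two scaling pictures in [2], not taken from the BARC results. The function names and the per-task `hit_prob` and `gain_per_task` values are hypothetical, chosen only to show the shape of each curve. In the "darts" picture each synthetic task independently covers a given eval problem with a small probability, so the unsolved fraction decays as (1 - p)^n; in the non-redundant picture accuracy grows by a fixed amount per task until it saturates.

```python
import math


def solved_fraction_darts(n_tasks: int, hit_prob: float) -> float:
    """'Throwing darts': the chance of never hitting shrinks as (1 - p)^n."""
    return 1.0 - (1.0 - hit_prob) ** n_tasks


def solved_fraction_linear(n_tasks: int, gain_per_task: float) -> float:
    """Non-redundant data: accuracy grows linearly, capped at 100%."""
    return min(1.0, gain_per_task * n_tasks)


def tasks_needed_darts(target: float, hit_prob: float) -> int:
    """Smallest n with 1 - (1 - p)^n >= target, from solving for n."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - hit_prob))


if __name__ == "__main__":
    hit_prob = 0.001  # hypothetical per-task hit probability, for illustration only
    n = tasks_needed_darts(0.85, hit_prob)
    print(n, solved_fraction_darts(n, hit_prob))

    # Cost per task implied by the figures quoted above:
    # $20K spread over just under 2 million tasks is roughly a cent each.
    print(20_000 / 2_000_000)
```

Under either curve the 85% threshold is reached at a finite number of tasks; what differs is how quickly returns diminish on the way there.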

replies(4): >>42131848 #>>42132132 #>>42132502 #>>42132655 #

fastball ◴[14 Nov 24 01:16 UTC] No.42132132[source]▶

>>42131784 #

I like the idea of ARC-AGI and think it was worth a shot. But if someone has already hit the human-level threshold, I think the entire idea can be thrown out.

If the ARC-AGI challenge did not actually follow their expected graph[1], I see no reason to believe that any benchmark can be designed in a way where it cannot be gamed. Rather, it seems that the existing SOTA models just weren't well-optimized for that one task.

The only way to measure "AGI" is in however you define the "G". If your model can only do one thing, it is not AGI and doesn't really indicate you are closer, even if you very carefully designed your challenge.

[1] https://static.supernotes.app/ai-benchmarks-2.png

replies(3): >>42132191 #>>42132203 #>>42132310 #

nl ◴[14 Nov 24 01:28 UTC] No.42132203[source]▶

>>42132132 #

> The only way to measure "AGI" is in however you define the "G"

"I" isn't usefully defined either.

At least most people agree on "Artificial"

replies(1): >>42133124 #

echelon ◴[14 Nov 24 04:18 UTC] No.42133124[source]▶

>>42132203 #

That's the problem with intelligence vs the other things we're doing with deep learning.

Vision models, image models, video models, audio models? Solved. We've understood the physics of optics and audio for over half a century. We've had ray tracers for forever. It's all well understood, and now we're teaching models to understand it.

Intelligence? We can't even describe our own.
