Maybe he figured out a model that beats ARC-AGI's 85% threshold?
People have, I think.
One of the published approaches (BARC) uses GPT-4o to generate a lot more training data.
The approach is scaling really well so far [1], and whether you expect linear or exponential scaling [2], the 85% threshold can be reached, using the "transduction" model alone, after generating under 2 million tasks ($20K in OpenAI credits).
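To give a sense of the back-of-envelope, here is a rough extrapolation sketch under the two scaling assumptions from footnote [2]. The two (tasks, accuracy) anchor points are made-up placeholders, not numbers from [1], and the ~$0.01/task rate is simply what "$20K for 2M tasks" implies:

    # Back-of-envelope extrapolation to the 85% threshold, under the two
    # scaling assumptions from footnote [2].  The (tasks, accuracy) anchor
    # points are hypothetical placeholders, NOT numbers from [1]; plug in
    # the real ones from the Kaggle discussion to redo the estimate.
    import math

    (n0, a0), (n1, a1) = (200_000, 0.40), (400_000, 0.50)   # hypothetical
    target = 0.85
    cost_per_task = 20_000 / 2_000_000   # ~$0.01/task, implied by "$20K for 2M tasks"

    # Linear scaling: accuracy grows by a fixed amount per extra task.
    slope = (a1 - a0) / (n1 - n0)
    n_linear = n1 + (target - a1) / slope

    # Exponential ("darts") scaling: the error rate shrinks by a fixed factor
    # per fixed number of extra tasks, i.e. error(n) = (1 - a1) * exp(-(n - n1) / tau).
    tau = (n1 - n0) / math.log((1 - a0) / (1 - a1))
    n_exp = n1 + tau * math.log((1 - a1) / (1 - target))

    for label, n in [("linear", n_linear), ("exponential", n_exp)]:
        print(f"{label:>12}: ~{n:,.0f} tasks, ~${n * cost_per_task:,.0f} in credits")

With these placeholder points, both fits land comfortably under 2 million tasks; the actual curve reported in [1] is what matters.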
Perhaps for 2025, the organizers will redesign ARC-AGI to be more resistant to this sort of approach, somehow.
---
[1] https://www.kaggle.com/competitions/arc-prize-2024/discussio...
[2] If you are "throwing darts at a board", you get exponential scaling (the probability of not hitting the bullseye at least once decreases exponentially with the number of throws). If you deliberately design your synthetic dataset to be non-redundant, you might get something akin to linear scaling (until you hit perfect accuracy, of course).
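A minimal sketch of the difference, with made-up numbers (N distinct "concepts" to cover, and a 1/N chance that any single randomly generated task covers a given one):

    # Illustration of footnote [2]: random "darts" vs. a non-redundant design.
    # N is an arbitrary illustration value, not derived from ARC-AGI.
    N = 1_000
    p = 1.0 / N   # chance a single random task happens to cover a given concept

    for n in (500, 1_000, 3_000, 10_000):
        # Random generation: P(concept covered at least once) = 1 - (1 - p)^n.
        # The miss probability (1 - p)^n shrinks exponentially with n.
        covered_random = 1 - (1 - p) ** n
        # Non-redundant generation: every new task covers a new concept,
        # so coverage grows linearly until it saturates.
        covered_designed = min(1.0, n / N)
        print(f"n={n:>6,}: random {covered_random:.3f}, non-redundant {covered_designed:.3f}")

With these toy numbers, the non-redundant design saturates at n = N = 1,000 tasks, while random generation still misses about 5% of concepts at n = 3,000.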
Honest question - is that so, and why? I thought you have to calculate the probability of each throw individually, since nothing fundamentally connects the throws together; it's only that in the long run there will be a normal distribution of randomness.