
555 points by maheshrijal | 1 comment
carlita_express:
> we’ve observed that large-scale reinforcement learning exhibits the same “more compute = better performance” trend observed in GPT‑series pretraining.

Didn’t the pivot from pretraining to RL happen because the scaling “law” didn’t deliver the expected gains? (Or at least because O(log) increases in model performance became unreasonably costly?) I see they’ve finally resigned themselves to calling these trends rather than laws, but trends are often fleeting. Why should we expect this one to hold much longer?
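
To make the O(log) point concrete: under a Kaplan/Hoffmann-style power law L(C) = L0 + (C0/C)^alpha, loss falls roughly linearly in log-compute, so each fixed improvement costs multiplicatively more. A minimal sketch, where L0, C0, alpha, and the compute_for helper are all illustrative assumptions rather than fitted values:

    # Sketch: under an assumed power-law scaling trend L(C) = L0 + (C0/C)**alpha,
    # loss improves ~linearly in log(compute), so each fixed loss reduction
    # costs multiplicatively more compute. All constants are illustrative.
    L0 = 1.7      # irreducible loss (assumed)
    C0 = 1.0      # reference compute, arbitrary units (assumed)
    alpha = 0.05  # scaling exponent (assumed; fitted values are similarly small)

    def compute_for(reducible_loss: float) -> float:
        # Invert (C0 / C)**alpha = reducible_loss for C.
        return C0 * reducible_loss ** (-1.0 / alpha)

    prev = None
    for r in (0.50, 0.45, 0.40, 0.35):
        c = compute_for(r)
        note = "" if prev is None else f" ({c / prev:.0f}x the previous step)"
        print(f"reducible loss {r:.2f} -> compute {c:.2e}{note}")
        prev = c

With these assumed constants, each 0.05 cut to the reducible loss costs roughly 8-15x the compute of the previous one, which is exactly how “more compute = better performance” can hold as a trend while still becoming economically untenable.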

og_kalu:
It doesn’t need to hold forever, or even “much longer”, depending on your definition of that duration. It just needs to hold long enough to realize certain capabilities.

Will it? Who knows. But seeing as this is something you can’t predict ahead of time, it makes little sense not to try, insofar as the whole thing is still feasible.