625 points lukebennett | 2 comments
guluarte ◴[] No.42139988[source]
Well, there have been no significant improvements to the GPT architecture over the past few years. I'm not sure why companies believe that simply adding more data will resolve the issues.
replies(3): >>42140121 #>>42140384 #>>42141206 #
HarHarVeryFunny ◴[] No.42140384[source]
Obviously adding more data is a game of diminishing returns.

Going from 10% to 50% coverage of common sense knowledge and reasoning (a 5x jump) is going to feel like a significant advance. Going from 90% to 95% (a ~6% relative gain) is not going to feel the same.
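
To put rough numbers on that intuition, here's a toy sketch in Python (the saturating curve and the 0.4 constant are made up purely for illustration, not fitted to any real model):

    # Toy model: coverage saturates as data grows, so each extra 10x of
    # data buys a smaller absolute gain. Curve shape and constant are
    # illustrative only, not fitted to anything.
    def coverage(data_multiple, k=0.4):
        # saturating curve: 1 - (1 + data)^-k
        return 1 - (1 + data_multiple) ** -k

    for d in (1, 10, 100, 1000):
        print(f"{d:>5}x data -> {coverage(d):.1%} coverage")
    # prints roughly: 24%, 62%, 84%, 94%

The exact numbers don't matter; the shape does: each order of magnitude of data buys a noticeably smaller absolute gain than the one before it.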

Regardless of what Altman says, it's been two years since OpenAI released GPT-4, and still no GPT-5 in sight; they are now touting Q-star/strawberry/GPT-o1 as the next big thing instead. Sutskever, who saw what they're cooking before leaving, says that traditional scaling has plateaued.

replies(1): >>42140899 #
og_kalu ◴[] No.42140899[source]
>Regardless of what Altman says, it's been two years since OpenAI released GPT-4, and still no GPT-5 in sight.

It's been 20 months since 4 was released. 3 was released 32 months after 2. The lack of a release by now in itself does not mean much of anything.

replies(1): >>42141620 #
HarHarVeryFunny ◴[] No.42141620[source]
By itself, sure, but there are many sources all pointing to the same thing.

Sutskever, recently ex-OpenAI and one of the first to believe in scaling, now says it is plateauing. Do OpenAI have something secret he was unaware of? I doubt it.

FWIW, GPT-2 and GPT-3 were about a year apart (2019 "Language Models are Unsupervised Multitask Learners" to 2020 "Language Models are Few-Shot Learners").

Dario Amodei recently said that with current-gen models, pre-training itself only takes a few months (then followed by post-training, etc.). These are not year+ training runs.

replies(1): >>42142076 #
1. og_kalu ◴[] No.42142076[source]
>Sutskever, recently ex. OpenAI, one of the first to believe in scaling, now says it is plateauing.

Blind scaling, sure (for whatever reason)*, but this is the same Sutskever who believes in ASI within a decade off the back of what we have today.

* Not like anyone is telling us any details. After all, OpenAI and Microsoft are still trying to create a $100B data center.

In my opinion, there's a difference between scaling not working and scaling becoming increasingly infeasible. GPT-4 is something like 100x the compute of GPT-3 (same with 2 to 3).

All the drips we've had about 5 point to ~10x the compute of 4. Not small, but very modest in comparison.
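
Back-of-envelope on what that gap means, assuming loss follows a rough power law in compute (the 0.05 exponent is a placeholder, not a measured number):

    # Under loss ~ compute^-alpha, a 10x compute jump closes much less
    # ground than the ~100x jumps between earlier generations.
    # alpha = 0.05 is a placeholder, not a fitted value.
    alpha = 0.05

    def relative_loss(compute_multiple):
        return compute_multiple ** -alpha

    for jump in (10, 100):
        print(f"{jump:>4}x compute -> loss falls to "
              f"{relative_loss(jump):.1%} of the baseline")
    # prints roughly: 89% and 79%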

>FWIW, GPT-2 and GPT-3 were about a year apart (2019 "Language Models are Unsupervised Multitask Learners" to 2020 "Language Models are Few-Shot Learners").

Ah, sorry, I meant 3 and 4.

>Dario Amodei recently said that with current-gen models, pre-training itself only takes a few months (then followed by post-training, etc.). These are not year+ training runs.

You don't have to be training models the entire time. GPT-4 was done training in August 2022, according to OpenAI, and wouldn't be released for roughly another seven months. Why? Who knows.

replies(1): >>42142274 #
2. HarHarVeryFunny ◴[] No.42142274[source]
> After all, OpenAI and Microsoft are still trying to create a $100B data center.

Yes - it'll be interesting to see if there are any signs of these plans being adjusted. Apparently Microsoft's first step is to build optical links between existing data centers to create a larger distributed cluster, which must be less of a financial commitment.

Meta seem to have an advantage here in that they have massive inference needs to run their own business, so they are perhaps making less of a bet by building out data centers.