Going from 10% to 50% complete coverage of common sense knowledge and reasoning (a fivefold jump, 400% more) is going to feel like a significant advance. Going from 90% to 95% (only about 6% more) is not going to feel the same.
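To make that arithmetic concrete, here's a quick sketch (the coverage percentages are just the illustrative numbers above, not measurements):

    # Relative gain from closing part of the remaining coverage gap
    def relative_gain(before, after):
        return (after - before) / before

    print(relative_gain(0.10, 0.50))  # 4.0   -> 400% more, a 5x jump; feels dramatic
    print(relative_gain(0.90, 0.95))  # ~0.056 -> about 6% more; barely noticeable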
Regardless of what Altman says, it's been two years since OpenAI released GPT-4, there is still no GPT-5 in sight, and they are now touting Q-star/Strawberry/o1 as the next big thing instead. Sutskever, who saw what they were cooking before leaving, says that traditional scaling has plateaued.
It's been 20 months since 4 was released. 3 was released 32 months after 2. The lack of a release by now in itself does not mean much of anything.
A lot hangs on what you mean by "significant". Can you define what you mean? And/or give an example of an improvement that you don't think is significant.
Also, on what basis can you say "no significant improvements" have been made? Many major players have published some of their improvements openly. They also have more private, unpublished improvements.
If your claim boils down to "what people mean by a Generative Pre-trained Transformer" still having a clear meaning, OK, fine, but that isn't the meat of the issue. There is so much more to a chat system than just the starting point of a vanilla GPT.
It is wiser to look at the whole end-to-end system, starting at data acquisition, through pre-training, fine-tuning, and deployment, all the way to UX.
P.S. I don't have a vested interest in promoting or disparaging AI. I don't work for a big AI lab. I'm just trying to call it like I see it, as rationally as I can.
Sutskever, recently ex-OpenAI and one of the first to believe in scaling, now says it is plateauing. Do OpenAI have something secret he was unaware of? I doubt it.
FWIW, GPT-2 and GPT-3 were about a year apart (2019 "Language Models are Unsupervised Multitask Learners" to 2020 "Language Models are Few-Shot Learners").
Dario Amodei recently said that with current gen models pre-training itself only takes a few months (then followed by post-training, etc). These are not year+ training runs.
Blind scaling, sure (for whatever reason)*, but this is the same Sutskever who believes in ASI within a decade off the back of what we have today.
* Not like anyone is telling us any details. After all, OpenAI and Microsoft are still trying to create a $100B data center.
In my opinion, there's a difference between scaling not working and scaling becoming increasingly infeasible. GPT-4 used something like 100x the compute of GPT-3 (and the 2-to-3 jump was similar).
All the drips we've had about 5 point to roughly 10x the compute of 4. Not small, but very modest in comparison.
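A rough back-of-the-envelope on those multipliers (purely illustrative; the ~100x and ~10x figures are the rumored numbers above, not anything official):

    # Hypothetical relative training-compute budgets, normalised to GPT-2 = 1
    compute = {"GPT-2": 1}
    compute["GPT-3"] = compute["GPT-2"] * 100  # ~100x jump claimed above
    compute["GPT-4"] = compute["GPT-3"] * 100  # another ~100x jump
    compute["GPT-5"] = compute["GPT-4"] * 10   # the rumored ~10x step

    names = list(compute)
    for a, b in zip(names, names[1:]):
        print(f"{a} -> {b}: {compute[b] // compute[a]}x")

In other words, even a "huge" 10x run would be an order of magnitude smaller step than either of the previous generation jumps.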
>FWIW, GPT-2 and GPT-3 were about a year apart (2019 "Language Models are Unsupervised Multitask Learners" to 2020 "Language Models are Few-Shot Learners").
Ah sorry I meant 3 and 4.
>Dario Amodei recently said that with current gen models pre-training itself only takes a few months (then followed by post-training, etc). These are not year+ training runs.
You don't have to be training models the entire time. GPT-4 was done training in August 2022 according to OpenAI and wouldn't be released for another 8 months. Why? Who knows.
Yes - it'll be interesting to see if there are any signs of these plans being adjusted. Apparently Microsoft's first step is to build optical links between existing data centers to create a larger distributed cluster, which must be less of a financial commitment.
Meta seem to have an advantage here in that they have massive inference needs to run their own business, so they are perhaps making less of a bet by building out data centers.