This is supported both by general observations and, more recently, by this tweet from an OpenAI engineer that Sam responded to and engaged with:
"scaling has hit a wall and that wall is 100% eval saturation"
Which I interpret to mean that, in his view, models are no longer yielding significant performance improvements because they have maxed out existing evaluation metrics.
Are those evaluations (or even LLMs) the RIGHT measures of progress toward AGI? Probably not.
But have they been useful tools to demonstrate that the confluence of compute, engineering, and tactical models is leading toward significant breakthroughs in artificial (computer) intelligence?
I would say yes.
And are those breakthroughs in turn driving the funding, power innovation, public policy, etc. needed to take that next step?
I hope so.