This article is weak and just general speculation.
Many people doubt the actual performance of Grok 3 and suspect it has been specifically trained on the benchmarks. And Sabine Hossenfelder says this:
> Asked Grok 3 to explain Bell's theorem. It gets it wrong just like all other LLMs I have asked because it just repeats confused stuff that has been written elsewhere rather than looking at the actual theorem.
https://x.com/skdh/status/1892432032644354192
Which shows that "massive scaling", even enormous, gigantic scaling, doesn't improve intelligence one bit; it improves scope, maybe, or flexibility, or coverage, or something, but not "intelligence".
replies(7):