
Grok 3: Another win for the bitter lesson

(www.thealgorithmicbridge.com)
129 points | kiyanwang | 1 comment
bambax No.43112611
This article is weak and just general speculation.

Many people doubt the actual performance of Grok 3 and suspect it has been specifically trained on the benchmarks. And Sabine Hossenfelder says this:

> Asked Grok 3 to explain Bell's theorem. It gets it wrong just like all other LLMs I have asked because it just repeats confused stuff that has been written elsewhere rather than looking at the actual theorem.

https://x.com/skdh/status/1892432032644354192
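
For context, a hedged summary of what the theorem actually says (mine, not from the thread): in its CHSH form, Bell's theorem states that any local hidden-variable theory must satisfy

    S = E(a,b) - E(a,b') + E(a',b) + E(a',b'), \qquad |S| \le 2

whereas quantum mechanics predicts correlations up to |S| = 2\sqrt{2} (the Tsirelson bound), a violation that experiments confirm.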

Which shows that "massive scaling", even enormous, gigantic scaling, doesn't improve intelligence one bit; it may improve scope, flexibility, or coverage, but not "intelligence".

replies(7): >>43112886 #>>43112908 #>>43113270 #>>43113312 #>>43113843 #>>43114290 #>>43115189 #
aucisson_masque No.43113270
Last time I used Chatbot Arena, I was the one asking the questions to the LLMs, so I was effectively making my own benchmark. There weren't any predefined questions.

How could Musk's LLM have trained on data that did not yet exist?

replies(2): >>43113596 #>>43115799 #
JKCalhoun No.43115799
That's true. You can head over to lmarena.ai and pit it against other LLMs yourself. I only tried two prompts but was surprised at how well it did.

There are "leaderboards" there that provide more anecdotal data points than my two.