
Grok 3: Another win for the bitter lesson

(www.thealgorithmicbridge.com)
129 points | kiyanwang | 1 comment
bambax No.43112611
This article is weak and just general speculation.

Many people doubt the actual performance of Grok 3 and suspect it has been specifically trained on the benchmarks. And Sabine Hossenfelder says this:

> Asked Grok 3 to explain Bell's theorem. It gets it wrong just like all other LLMs I have asked because it just repeats confused stuff that has been written elsewhere rather than looking at the actual theorem.

https://x.com/skdh/status/1892432032644354192
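
For context, a hedged summary of what the theorem actually says (mine, not from the thread): in its CHSH form, Bell's theorem states that any local hidden-variable theory must satisfy

    S = E(a,b) - E(a,b') + E(a',b) + E(a',b'), \qquad |S| \le 2

whereas quantum mechanics predicts correlations up to |S| = 2\sqrt{2} (the Tsirelson bound), a violation that experiments confirm.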

Which shows that "massive scaling", even enormous, gigantic scaling, doesn't improve intelligence one bit; it may improve scope, flexibility, or coverage, but not "intelligence".

replies(7): >>43112886 #>>43112908 #>>43113270 #>>43113312 #>>43113843 #>>43114290 #>>43115189 #
aucisson_masque No.43113270
Last time I used Chatbot Arena, I was the one asking the questions to the LLMs, so I was effectively making my own benchmark. There weren't any predefined questions.

How could Musk's LLM have trained on data that did not yet exist?

replies(2): >>43113596 #>>43115799 #
JKCalhoun No.43115799
That's true. You can head over to lmarena.ai and pit it against other LLMs yourself. I only tried two prompts but was surprised at how well it did.

There are "leaderboards" there that provide more anecdotal data points than my two.