
Grok 3: Another win for the bitter lesson

(www.thealgorithmicbridge.com)
129 points | kiyanwang | 5 comments
bambax ◴[] No.43112611[source]
This article is weak and just general speculation.

Many people doubt the actual performance of Grok 3 and suspect it has been specifically trained on the benchmarks. And Sabine Hossenfelder says this:

> Asked Grok 3 to explain Bell's theorem. It gets it wrong just like all other LLMs I have asked because it just repeats confused stuff that has been written elsewhere rather than looking at the actual theorem.

https://x.com/skdh/status/1892432032644354192

This shows that "massive scaling", even enormous, gigantic scaling, doesn't improve intelligence one bit; it improves scope, maybe, or flexibility, or coverage, but not "intelligence".
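
For reference, since the complaint above is about "the actual theorem": Bell's theorem is commonly stated via the CHSH inequality (standard textbook form, added here only as context; it is not taken from the linked post). Any local hidden-variable theory obeys

    \[
      S = E(a,b) - E(a,b') + E(a',b) + E(a',b'), \qquad |S| \le 2
    \]

whereas quantum mechanics allows |S| up to 2√2 (the Tsirelson bound), which is what experiments confirm.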

replies(7): >>43112886 #>>43112908 #>>43113270 #>>43113312 #>>43113843 #>>43114290 #>>43115189 #
1. melodyogonna ◴[] No.43112886[source]
How can it be specifically trained on benchmarks when it is leading on blind chatbot tests?

The post you quoted doesn't describe a Grok-specific problem if other LLMs are also failing; it seems to me to be a fundamental failure in the current approach to AI model development.

replies(2): >>43113802 #>>43115538 #
2. bearjaws ◴[] No.43113802[source]
Any LLM that is uncensored does well on chatbot tests because a refusal is an automatic loss; a minimal sketch of the rating math is below.

And since 30% of people using chatbots are Gooning it up, there are far more refusals...
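
Arena-style leaderboards are built from pairwise battles rated with an Elo-like update. A minimal sketch in Python, assuming (the thread doesn't confirm the exact scoring) that a refusal is simply recorded as a loss for the refusing model:

    # Minimal sketch of an arena-style Elo update, assuming (not confirmed
    # in this thread) that a refusal is scored as a loss for the refusing model.
    def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
        """Return updated ratings; score_a is 1.0 win, 0.0 loss, 0.5 tie."""
        expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
        r_a += k * (score_a - expected_a)
        r_b += k * ((1.0 - score_a) - (1.0 - expected_a))
        return r_a, r_b

    # Two equally rated models; model B refuses, scored as a win for A.
    ra, rb = elo_update(1500.0, 1500.0, score_a=1.0)
    print(ra, rb)  # 1516.0 1484.0

Under these assumptions a refusal costs the refusing model exactly as much as a lost answer, so a model that refuses often bleeds rating quickly.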

replies(1): >>43116167 #
3. nycdatasci ◴[] No.43115538[source]
I think a more plausible path to gaming benchmarks would be to use watermarks in text output to identify your model, then unleash bots to consistently rank your model over opponents.
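Purely as an illustration of what such a watermark could look like (the comment names no mechanism; the zero-width-character scheme below is a hypothetical toy, not anyone's known practice):

    # Hypothetical toy only: embeds an identifying bit string as invisible
    # zero-width characters, which a ranking bot could later detect to
    # recognize "its" model's output.
    ZWSP, ZWNJ = "\u200b", "\u200c"  # zero-width chars encoding bits 0 / 1

    def embed_watermark(text: str, tag_bits: str) -> str:
        """Hide tag_bits after the first word of text."""
        mark = "".join(ZWNJ if b == "1" else ZWSP for b in tag_bits)
        head, sep, tail = text.partition(" ")
        return head + mark + sep + tail

    def detect_watermark(text: str) -> str:
        """Recover any hidden bits from text."""
        return "".join("1" if c == ZWNJ else "0"
                       for c in text if c in (ZWSP, ZWNJ))

    marked = embed_watermark("hello world", "1011")
    print(detect_watermark(marked))  # 1011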
4. pyinstallwoes ◴[] No.43116167[source]
Gooning?
replies(1): >>43118014 #
5. bearjaws ◴[] No.43118014{3}[source]
https://www.urbandictionary.com/define.php?term=gooning