Grok 3: Another win for the bitter lesson

(www.thealgorithmicbridge.com)
129 points | kiyanwang | 1 comment
bambax ◴[] No.43112611[source]
This article is weak and just general speculation.

Many people doubt the actual performance of Grok 3 and suspect it has been specifically trained on the benchmarks. And Sabine Hossenfelder says this:

> Asked Grok 3 to explain Bell's theorem. It gets it wrong just like all other LLMs I have asked because it just repeats confused stuff that has been written elsewhere rather than looking at the actual theorem.

https://x.com/skdh/status/1892432032644354192

Which shows that "massive scaling", even enormous, gigantic scaling, doesn't improve intelligence one bit; it improves scope, maybe, or flexibility, or coverage, or something, but not "intelligence".

replies(7): >>43112886 #>>43112908 #>>43113270 #>>43113312 #>>43113843 #>>43114290 #>>43115189 #
aucisson_masque ◴[] No.43113270[source]
Last time I used Chatbot Arena, I was the one asking the LLM questions, so I effectively made my own benchmark. There weren't any predefined questions.

How could Musk's LLM train on data that does not yet exist?

replies(2): >>43113596 #>>43115799 #
1. HenryBemis ◴[] No.43113596[source]
That. I have used only ChatGPT, and I remember asking 4 legacy to write some code. I asked o3 the same question when it came out, and then compared the code. o3's was 'better': more precise, more detailed, less 'crude'. Now, don't get me wrong, crude worked fine. But when I wanted to do v1.1 and v1.2, o3 nailed it every time, while 4 legacy was simply bad and full of errors.

With that said, I assume that every 'next' version of each engine uses my 'prompts' to train, so each new version has the benefit of having already processed my initial v1.0, then v1.1, then v1.2. So it is somewhat 'unfair': for "ChatGPT v2024" my v1.0 is brand new, while for "ChatGPT v2027" my v1.0, v1.1, and v1.2 are already in the training dataset.

I haven't used Grok yet; perhaps it's time to pause that OpenAI payment, give Elon some $$$, and see how it works 'for me'.