Grok 3: Another win for the bitter lesson

(www.thealgorithmicbridge.com)
129 points | kiyanwang | 1 comment
bambax ◴[] No.43112611[source]
This article is weak and just general speculation.

Many people doubt the actual performance of Grok 3 and suspect it has been specifically trained on the benchmarks. And Sabine Hossenfelder says this:

> Asked Grok 3 to explain Bell's theorem. It gets it wrong just like all other LLMs I have asked because it just repeats confused stuff that has been written elsewhere rather than looking at the actual theorem.

https://x.com/skdh/status/1892432032644354192

Which shows that "massive scaling", even enormous, gigantic scaling, doesn't improve intelligence one bit; it improves scope, maybe, or flexibility, or coverage, or something, but not "intelligence".

replies(7): >>43112886 #>>43112908 #>>43113270 #>>43113312 #>>43113843 #>>43114290 #>>43115189 #
aucisson_masque ◴[] No.43113270[source]
Last time I used Chatbot Arena, I was the one asking the LLM questions, so I effectively made my own benchmark. There weren't any predefined questions.

How could Musk's LLM train on data that does not yet exist?

replies(2): >>43113596 #>>43115799 #
1. HenryBemis ◴[] No.43113596[source]
That. I have used only ChatGPT, and I remember asking 4 legacy to write some code. I asked o3 the same question when it came out, and then compared the code. o3's was 'better': more precise, more detailed, less 'crude'. Now, don't get me wrong, crude worked fine. But when I wanted to do v1.1 and v1.2, o3 nailed it every time, while 4 legacy was simply bad and full of errors.

With that said, I assume that every 'next' version of each engine uses my 'prompts' to train, so each new version has the benefit of having already processed my initial v1.0, then v1.1, then v1.2. So it is somewhat 'unfair': for "ChatGPT v2024" my v1.0 is brand new, while for "ChatGPT v2027" my v1.0, v1.1, and v1.2 are already in the training dataset.

I haven't used Grok yet; perhaps it's time to pause that OpenAI payment, give Elon some $$$, and see how it works 'for me'.