Instead they're barely able to eke out wins against a bot that plays completely random moves: https://maxim-saplin.github.io/llm_chess/
By comparison, a human who has only just learnt the rules but has zero strategy would very, very rarely lose here.
These companies are shouting that their products are passing incredibly hard exams, solving PhD-level questions, and are about to displace humans, and yet they still fail to crush a random-only strategy chess bot? How does this make any sense?
We're on the verge of AGI but there's not even the tiniest spark of general reasoning ability in something they haven't been trained for
"Reasoning" or "Thinking" are marketing terms and nothing more. If an LLM is trained for chess then its performance would just come from memorization, not any kind of "reasoning"
If you think you can play chess at that level over that many games and moves with memorization, then I don't know what to tell you except that you're wrong. It's not possible, so let's just get that out of the way.
>These companies are shouting that their products are passing incredibly hard exams, solving PhD-level questions, and are about to displace humans, and yet they still fail to crush a random-only strategy chess bot? How does this make any sense?
Why doesn't it? Have you actually looked at any of these games? Those LLMs aren't playing like poor reasoners. They're playing like machines that have no clue what the rules of the game are. LLMs learn by predicting and failing and getting a little better at it, repeat ad nauseam. You want them to learn the rules of a complex game? That's how you do it: by training them to predict it. Training on chess books just makes them learn how to converse about chess.
Humans have weird failure modes that are at odds with their 'intelligence'. We just choose to call them funny names and laugh about it sometimes. These machines have theirs. That's all there is to it. The top comment we are both replying to had gemini-2.5-pro, which released less than 5 days later, hit 25% on the benchmark. Now that was particularly funny.
It was surprising to me because I would have expected that if there was reasoning ability, it would translate across domains at least somewhat, but yeah, what you say makes sense. I'm thinking of it in human terms.
Like how:
- Training LLMs on code makes them solve reasoning problems better
- Training Language Y alongside X makes them much better at Y than if they were trained on language Y alone

and so on.
Probably because, well, gradient descent is a dumb optimizer, and training is more like evolution than a human reading a book.
Also, there is something genuinely weird going on with LLM chess. And it's possible base models are better. https://dynomight.net/more-chess/
Very hard for me to wrap my head around the idea that an LLM being able to discuss, even perhaps teach, high-level chess strategy wouldn't transfer at all to its playing performance.