Something weird is happening with LLMs and chess

(dynomight.substack.com)

696 points crescit_eundo | 1 comments | 14 Nov 24 17:05 UTC | HN request time: 0.354s | source

Show context

niobe ◴[15 Nov 24 00:40 UTC] No.42142885[source]▶

I don't understand why educated people expect that an LLM would be able to play chess at a decent level.

It has no idea about the quality of it's data. "Act like x" prompts are no substitute for actual reasoning and deterministic computation which clearly chess requires.

replies(20): >>42142963 #>>42143021 #>>42143024 #>>42143060 #>>42143136 #>>42143208 #>>42143253 #>>42143349 #>>42143949 #>>42144041 #>>42144146 #>>42144448 #>>42144487 #>>42144490 #>>42144558 #>>42144621 #>>42145171 #>>42145383 #>>42146513 #>>42147230 #

xelxebar ◴[15 Nov 24 04:24 UTC] No.42143949[source]▶

>>42142885 #

Then you should be surprised that turbo-instruct actually plays well, right? We see a proliferation of hand-wavy arguments based on unfounded anthropomorphic intuitions about "actual reasoning" and whatnot. I think this is good evidence that nobody really understands what's going on.

If some mental model says that LLMs should be bad at chess, then it fails to explain why we have LLMs playing strong chess. If another mental model says the inverse, then it fails to explain why so many of these large models fail spectacularly at chess.

Clearly, there's more going on here.

replies(5): >>42144358 #>>42145060 #>>42147213 #>>42147766 #>>42161043 #

1. the_af ◴[15 Nov 24 14:25 UTC] No.42147213[source]▶

>>42143949 #

> Then you should be surprised that turbo-instruct actually plays well, right?

Do we know it's not special-casing chess and instead using a different engine (not an LLM) for playing?

To be clear, this would be an entirely appropriate approach to problem-solving in the real world, it just wouldn't be the LLM that's playing chess.

↑