
688 points crescit_eundo | 1 comment
niobe ◴[] No.42142885[source]
I don't understand why educated people expect that an LLM would be able to play chess at a decent level.

It has no idea about the quality of its data. "Act like x" prompts are no substitute for actual reasoning and deterministic computation, which chess clearly requires.

replies(20): >>42142963 #>>42143021 #>>42143024 #>>42143060 #>>42143136 #>>42143208 #>>42143253 #>>42143349 #>>42143949 #>>42144041 #>>42144146 #>>42144448 #>>42144487 #>>42144490 #>>42144558 #>>42144621 #>>42145171 #>>42145383 #>>42146513 #>>42147230 #
xelxebar ◴[] No.42143949[source]
Then you should be surprised that turbo-instruct actually plays well, right? We see a proliferation of hand-wavy arguments based on unfounded anthropomorphic intuitions about "actual reasoning" and whatnot. I think this is good evidence that nobody really understands what's going on.

If some mental model says that LLMs should be bad at chess, then it fails to explain why we have LLMs playing strong chess. If another mental model says the inverse, then it fails to explain why so many of these large models fail spectacularly at chess.

Clearly, there's more going on here.

replies(5): >>42144358 #>>42145060 #>>42147213 #>>42147766 #>>42161043 #
flyingcircus3 ◴[] No.42144358[source]
"playing strong chess" would be a much less hand-wavy claim if there were lots of independent methods of quantifying and verifying the strength of stockfish's lowest difficulty setting. I honestly don't know if that exists or not. But unless it does, why would stockfish's lowest difficulty setting be a meaningful threshold?
replies(1): >>42144495 #
golol ◴[] No.42144495{3}[source]
I've tried it myself; GPT-3.5-turbo-instruct played at somewhere in the range of 1600-1800 Elo.
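For context on how such an estimate can be made: playing a series of games against an opponent of known rating and inverting the standard Elo expected-score formula gives a rough rating. This is a minimal sketch (the function name and clamping threshold are my own choices, not from the thread):

```python
import math

def estimated_rating(opponent_rating, wins, draws, losses):
    """Rough Elo estimate from results against a single opponent of known
    rating, by inverting the Elo expected-score formula
    E = 1 / (1 + 10 ** ((R_opp - R) / 400))."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Clamp to avoid infinite ratings on perfect (or zero) scores.
    score = min(max(score, 0.01), 0.99)
    return opponent_rating - 400 * math.log10(1 / score - 1)

# An even score implies equal strength; a ~64% score implies roughly +100 Elo.
print(estimated_rating(1700, 5, 0, 5))    # → 1700.0
print(estimated_rating(1600, 64, 0, 36))  # ≈ 1700
```

With only a handful of games the confidence interval on such an estimate is wide, which is why the 1600-1800 figure spans a couple hundred points.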