688 points crescit_eundo | 3 comments
niobe ◴[] No.42142885[source]
I don't understand why educated people expect that an LLM would be able to play chess at a decent level.

It has no idea about the quality of its data. "Act like x" prompts are no substitute for actual reasoning and deterministic computation, which chess clearly requires.

replies(20): >>42142963 #>>42143021 #>>42143024 #>>42143060 #>>42143136 #>>42143208 #>>42143253 #>>42143349 #>>42143949 #>>42144041 #>>42144146 #>>42144448 #>>42144487 #>>42144490 #>>42144558 #>>42144621 #>>42145171 #>>42145383 #>>42146513 #>>42147230 #
computerex ◴[] No.42142963[source]
The question, then, is why gpt-3.5-turbo-instruct can beat Stockfish.
replies(4): >>42142975 #>>42143081 #>>42143181 #>>42143889 #
fsndz ◴[] No.42142975[source]
PS: I ran it and, as suspected, gpt-3.5-turbo-instruct does not beat Stockfish; it is not even close: "Final Results: gpt-3.5-turbo-instruct: Wins=0, Losses=6, Draws=0, Rating=1500.00 stockfish: Wins=6, Losses=0, Draws=0, Rating=1500.00" https://www.loom.com/share/870ea03197b3471eaf7e26e9b17e1754?...
replies(1): >>42142993 #
computerex ◴[] No.42142993[source]
Maybe there's some difference in the setup, because the OP reports that the model beats Stockfish (as they had it configured) in every single game.
replies(2): >>42143059 #>>42144502 #
1. Filligree ◴[] No.42143059[source]
The OP had Stockfish at its weakest preset.
replies(1): >>42143193 #
2. fsndz ◴[] No.42143193[source]
I did the same and gpt-3.5-turbo-instruct still lost all the games. Maybe a difference in Stockfish version? I am using Stockfish 16.
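
For anyone trying to compare setups, here is a rough sketch of how the engine side could be pinned down and recorded. It assumes "weakest preset" means Stockfish's UCI `Skill Level` option set to 0, which is a guess at what the OP did and not something confirmed in this thread. The function names and the `stockfish` binary name are hypothetical, chosen only for illustration.

```python
import shutil
import subprocess
from typing import List, Optional


def weakest_preset_commands(skill_level: int = 0) -> List[str]:
    """Build the UCI commands that set Stockfish's Skill Level.

    Assumption: the "weakest preset" corresponds to Skill Level 0.
    Stockfish accepts values from 0 to 20 for this option.
    """
    if not 0 <= skill_level <= 20:
        raise ValueError("Skill Level must be between 0 and 20")
    return ["uci", f"setoption name Skill Level value {skill_level}", "isready"]


def parse_engine_id(uci_output: str) -> Optional[str]:
    """Extract the engine name from an 'id name ...' line of UCI output."""
    for line in uci_output.splitlines():
        if line.startswith("id name "):
            return line[len("id name "):].strip()
    return None


def installed_engine_id(binary: str = "stockfish") -> Optional[str]:
    """Ask a locally installed engine for its name; None if it is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run(
        [path], input="uci\nquit\n", capture_output=True, text=True, timeout=10
    )
    return parse_engine_id(result.stdout)


if __name__ == "__main__":
    # Print both so two people can check they are running the same configuration.
    print("engine:", installed_engine_id())
    print("commands:", weakest_preset_commands())
```

Posting the engine name and the exact option commands alongside the results would make it easier to tell whether two runs differ in version, in skill setting, or in something else such as time limits.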
replies(1): >>42143999 #
3. mannykannot ◴[] No.42143999[source]
That is a very pertinent question, especially if Stockfish has been used to generate training data.