
688 points by crescit_eundo | 5 comments
niobe No.42142885
I don't understand why educated people expect that an LLM would be able to play chess at a decent level.

It has no idea about the quality of its data. "Act like x" prompts are no substitute for the actual reasoning and deterministic computation that chess clearly requires.

computerex No.42142963
The question, then, is why gpt-3.5-turbo-instruct can beat Stockfish.
fsndz No.42142975
PS: I ran it, and as suspected, gpt-3.5-turbo-instruct does not beat Stockfish. It is not even close:

  Final Results:
  gpt-3.5-turbo-instruct: Wins=0, Losses=6, Draws=0, Rating=1500.00
  stockfish: Wins=6, Losses=0, Draws=0, Rating=1500.00

https://www.loom.com/share/870ea03197b3471eaf7e26e9b17e1754?...
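Both ratings in the quoted output are still 1500.00 after a 6-0 result. For reference, here is a minimal sketch of the standard Elo update, assuming K = 32 and both players starting at 1500 (the script's actual rating logic isn't shown in the thread). Under this scheme, the ratings move apart after the very first game.

```python
from typing import Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> Tuple[float, float]:
    """Return new (r_a, r_b) after one game; score_a is 1 for a win, 0.5 for a draw, 0 for a loss."""
    delta = k * (score_a - expected_score(r_a, r_b))
    # Whatever A gains, B loses, so the total rating pool is preserved.
    return r_a + delta, r_b - delta


# Replay a 6-0 match from equal 1500 ratings (K = 32 is an assumption).
stockfish, gpt = 1500.0, 1500.0
for _ in range(6):
    stockfish, gpt = elo_update(stockfish, gpt, score_a=1.0)
print(f"stockfish={stockfish:.2f} gpt={gpt:.2f}")
```

With equal ratings the expected score is 0.5, so the first win alone shifts each rating by 16 points.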
computerex No.42142993
Maybe there's some difference in the setup, because the OP reports that the model beats Stockfish (as they had it configured) in every single game.
Filligree No.42143059
OP had Stockfish at its weakest preset.
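The thread doesn't say exactly which preset "weakest" means. One plausible reading, assumed here, is Stockfish's `Skill Level` UCI option set to its minimum of 0. A minimal sketch of the UCI commands that would request one move at that setting (the helper name `stockfish_commands` and the 100 ms move time are hypothetical):

```python
from typing import List


def stockfish_commands(moves_uci: List[str], skill_level: int = 0, movetime_ms: int = 100) -> List[str]:
    """Build the UCI command sequence that asks Stockfish for one move at a given Skill Level."""
    # Stockfish exposes "Skill Level" as a UCI spin option from 0 (weakest) to 20 (strongest).
    if not 0 <= skill_level <= 20:
        raise ValueError("Skill Level must be between 0 and 20")
    position = "position startpos"
    if moves_uci:
        # UCI expects coordinate notation (e2e4), not the SAN (e4) used in PGN movetext.
        position += " moves " + " ".join(moves_uci)
    return [
        "uci",
        f"setoption name Skill Level value {skill_level}",
        "isready",
        "ucinewgame",
        position,
        f"go movetime {movetime_ms}",
    ]
```

Send these lines in order to the engine's standard input (waiting for `uciok` and `readyok` where the protocol calls for it) and read its output until a line starting with `bestmove` arrives.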
fsndz No.42143193
I did the same, and gpt-3.5-turbo-instruct still lost every game. Maybe there's a difference in Stockfish versions? I am using Stockfish 16.
mannykannot No.42143999
That is a very pertinent question, especially if Stockfish has been used to generate training data.
golol No.42144502
You have to get the model to think in PGN. It's crucial to use the exact PGN format it saw in its training data and to give it few-shot examples.
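A minimal sketch of the kind of prompt formatting described above, assuming SAN movetext and an arbitrary set of PGN tag pairs. The exact headers and example games that would best match the training data aren't known, and the helpers `pgn_prompt` and `few_shot_prompt` are hypothetical. The key idea is that the prompt ends exactly where the next move should appear, so the model's most likely continuation is a move.

```python
from typing import Dict, List, Optional


def pgn_prompt(san_moves: List[str], headers: Optional[Dict[str, str]] = None) -> str:
    """Format a game in progress as PGN so the model's natural continuation is the next move."""
    tag_lines = [f'[{key} "{value}"]' for key, value in (headers or {}).items()]
    parts: List[str] = []
    for i, move in enumerate(san_moves):
        if i % 2 == 0:  # White's move: prefix it with the move number, e.g. "1."
            parts.append(f"{i // 2 + 1}.")
        parts.append(move)
    if len(san_moves) % 2 == 0:  # White to move: end on the next move number
        parts.append(f"{len(san_moves) // 2 + 1}.")
    movetext = " ".join(parts)
    # PGN separates the tag-pair section from the movetext with a blank line.
    return "\n".join(tag_lines + ["", movetext]) if tag_lines else movetext


def few_shot_prompt(example_games: List[str], current_game: str) -> str:
    """Prepend complete example games before the game in progress."""
    return "\n\n".join(example_games + [current_game])


# Illustrative usage: one short complete game (Scholar's Mate), then the current game.
example = '[Result "1-0"]\n\n1. e4 e5 2. Qh5 Nc6 3. Bc4 Nf6 4. Qxf7# 1-0'
prompt = few_shot_prompt([example], pgn_prompt(["e4", "e5"], {"White": "Player A", "Black": "Player B"}))
print(prompt)
```

The model's reply would then still need to be parsed and checked for legality before it is played against the engine.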