(dynomight.substack.com)

696 points crescit_eundo | 1 comments | 14 Nov 24 17:05 UTC

niobe ◴[15 Nov 24 00:40 UTC] No.42142885[source]▶

I don't understand why educated people expect that an LLM would be able to play chess at a decent level.

It has no idea about the quality of its data. "Act like x" prompts are no substitute for the actual reasoning and deterministic computation that chess clearly requires.

replies(20): >>42142963 #>>42143021 #>>42143024 #>>42143060 #>>42143136 #>>42143208 #>>42143253 #>>42143349 #>>42143949 #>>42144041 #>>42144146 #>>42144448 #>>42144487 #>>42144490 #>>42144558 #>>42144621 #>>42145171 #>>42145383 #>>42146513 #>>42147230 #

computerex ◴[15 Nov 24 00:55 UTC] No.42142963[source]▶

>>42142885 #

The question, then, is why gpt-3.5-instruct can beat Stockfish.

replies(4): >>42142975 #>>42143081 #>>42143181 #>>42143889 #

fsndz ◴[15 Nov 24 00:57 UTC] No.42142975[source]▶

>>42142963 #

PS: I ran it and, as suspected, gpt-3.5-turbo-instruct does not beat Stockfish; it is not even close: "Final Results: gpt-3.5-turbo-instruct: Wins=0, Losses=6, Draws=0, Rating=1500.00 stockfish: Wins=6, Losses=0, Draws=0, Rating=1500.00" https://www.loom.com/share/870ea03197b3471eaf7e26e9b17e1754?...

replies(1): >>42142993 #

computerex ◴[15 Nov 24 01:00 UTC] No.42142993[source]▶

>>42142975 #

Maybe there's some difference in the setup, because the OP reports that the model beats Stockfish (as they had it configured) in every single game.

replies(2): >>42143059 #>>42144502 #

1. golol ◴[15 Nov 24 06:54 UTC] No.42144502[source]▶

>>42142993 #

You have to get the model to think in PGN data. It's crucial to use the exact PGN format it saw in its training data and to give it few-shot examples.
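For anyone who wants to try this, here is a rough sketch of what building such a prompt could look like. Everything specific in it is an assumption on my part: the header tags, the player names, the example game (a plain Scholar's Mate), and the helper names `format_movetext` and `build_pgn_prompt` are all hypothetical, and nobody outside the lab knows the exact PGN layout the model saw in training. The idea is only to show completed games as few-shot examples and then leave the current game open at the point where the next move belongs.

```python
# Hypothetical header template; the real training-data format is unknown.
HEADER = '[Event "Example Game"]\n[White "Player A"]\n[Black "Player B"]\n[Result "{result}"]'

# One illustrative few-shot game (Scholar's Mate), chosen only as a short legal example.
EXAMPLE_GAMES = [
    {"moves": ["e4", "e5", "Bc4", "Nc6", "Qh5", "Nf6", "Qxf7#"], "result": "1-0"},
]


def format_movetext(moves):
    """Render a list of SAN moves as numbered PGN movetext."""
    parts = []
    for i, move in enumerate(moves):
        if i % 2 == 0:  # White's move starts a new move number
            parts.append(f"{i // 2 + 1}.")
        parts.append(move)
    return " ".join(parts)


def build_pgn_prompt(moves, example_games=EXAMPLE_GAMES):
    """Build a completion prompt: finished example games, then the open game."""
    blocks = []
    for game in example_games:
        header = HEADER.format(result=game["result"])
        body = format_movetext(game["moves"]) + " " + game["result"]
        blocks.append(header + "\n\n" + body)

    current = format_movetext(moves)
    if len(moves) % 2 == 0:
        # White is to move: end the prompt on the next move number.
        next_number = f"{len(moves) // 2 + 1}."
        current = f"{current} {next_number}" if current else next_number
    # "*" is the PGN result marker for a game still in progress.
    blocks.append(HEADER.format(result="*") + "\n\n" + current)
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(build_pgn_prompt(["e4", "e5", "Nf3"]))
```

The returned string would go to a completion-style model as raw text, and whatever it writes next gets parsed as the move. Whether this layout matches what a given model actually responds to is something you would have to test.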

Something weird is happening with LLMs and chess