
688 points crescit_eundo | 2 comments
fabiospampinato ◴[] No.42145891[source]
It's probably worth playing around with different prompts and different board positions.

For context this [1] is the board position the model is being prompted on.

There may be more than one weird thing about this experiment; for example, giving instructions to the non-instruction-tuned variants may be counterproductive.

More importantly, suppose you just give the model the truncated PGN: does this look like a position where White is a grandmaster-level player? I don't think so. Even if the model understood chess really well, it's still going to predict the most probable move given the position at hand. If the model thinks White is a bad player, and the model is good at understanding chess, it will rank bad moves as the more likely ones, because that better predicts what is actually likely to happen next.

[1]: https://i.imgur.com/qRxalgH.png
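
To make the bare-PGN idea concrete, here is a minimal sketch in Python, assuming a hypothetical complete() helper for a completion-style (non-instruction-tuned) model; the event, player names, and ratings in the headers are made up for illustration, not taken from the experiment.

    # Sketch: condition a base (completion-style) model on a PGN whose headers
    # imply strong players, instead of giving it natural-language instructions.
    # `complete` is a hypothetical stand-in for whatever completion API is used.

    def build_prompt(movetext: str) -> str:
        headers = (
            '[Event "Example Tournament"]\n'
            '[White "Strong Player A"]\n'
            '[Black "Strong Player B"]\n'
            '[WhiteElo "2750"]\n'
            '[BlackElo "2750"]\n'
            '[Result "*"]\n'
            '\n'
        )
        return headers + movetext  # e.g. '1. e4 e5 2. Nf3 Nc6 3. '

    def predict_next_move(movetext: str, complete) -> str:
        # The model just continues the PGN; take the first whitespace-delimited
        # token of the completion as the candidate move (very rough parsing).
        completion = complete(build_prompt(movetext), max_tokens=8)
        return completion.strip().split()[0]

The only "instruction" a base model gets here is the shape of the document it's asked to continue, which is why the headers and the quality of the moves so far can matter so much.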

replies(4): >>42146161 #>>42147006 #>>42147866 #>>42150105 #
1. fabiospampinato ◴[] No.42147006[source]
Apparently I can find matches for games that start like that between very strong players [1], so my hypothesis that the model may just be predicting bad moves on purpose seems wobbly, although having Stockfish at the lowest level play as the supposedly very strong opponent may still be throwing the model off somewhat. If I'm interpreting the charts right, the first few moves the model makes seem decent, and after a few of those things start to go wrong.

Either way it's worth repeating the experiment IMO, tweaking some of these variables (prompt guidance, Stockfish strength, starting position, the names of the supposed players, etc.); a rough sweep along those lines is sketched below.

[1]: https://www.365chess.com/search_result.php?search=1&p=1&m=8&...
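
Such a sweep could look roughly like this in Python, using python-chess with a local Stockfish binary (its "Skill Level" UCI option) and a hypothetical llm_move(pgn_prefix, board) callback standing in for the model; none of this is taken from the original experiment's code.

    import chess
    import chess.engine

    def pgn_prefix(san_moves):
        # Render "1. e4 e5 2. Nf3 ..." from a flat list of SAN moves.
        parts = []
        for i, san in enumerate(san_moves):
            if i % 2 == 0:
                parts.append(f"{i // 2 + 1}.")
            parts.append(san)
        return " ".join(parts)

    def play_game(llm_move, skill_level, stockfish_path="stockfish", max_plies=200):
        # LLM plays White, Stockfish (Skill Level 0-20) plays Black.
        board = chess.Board()
        san_moves = []
        engine = chess.engine.SimpleEngine.popen_uci(stockfish_path)
        try:
            engine.configure({"Skill Level": skill_level})
            while not board.is_game_over() and len(san_moves) < max_plies:
                if board.turn == chess.WHITE:
                    san = llm_move(pgn_prefix(san_moves), board)
                    move = board.parse_san(san)  # raises if the move is illegal
                else:
                    move = engine.play(board, chess.engine.Limit(time=0.05)).move
                san_moves.append(board.san(move))
                board.push(move)
        finally:
            engine.quit()
        return board.result(), pgn_prefix(san_moves)

    # Sweep one variable at a time, e.g. Stockfish strength:
    # for skill in (0, 5, 10):
    #     result, game = play_game(my_llm_move, skill_level=skill)
    #     print(skill, result)

Changing the starting position or the supposed player names only changes how the prefix (or the headers passed to the model) is built, so the same loop covers most of the variables above.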

replies(1): >>42164340 #
2. sjducb ◴[] No.42164340[source]
Interesting thought: the LLM isn't trying to win, it's trying to produce data that looks like the data it was trained on. It's quite rare for a very strong player to play a very weak one, so if you feed it lots of weak moves, it'll best replicate the training data by following with weak moves.
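
One rough way to probe that explanation, assuming access to token log-probabilities (sequence_logprob below is a hypothetical helper, and the moves in the example are placeholders), is to score a strong and a weak continuation of the same position and watch how the gap moves as the prefix gets weaker:

    # If the model is matching the data distribution rather than trying to win,
    # the log-probability gap between a strong and a weak continuation should
    # shrink (or flip) as the PGN prefix fills with weak play.
    # `sequence_logprob(prompt, continuation)` is a hypothetical scoring helper.

    def strong_vs_weak_margin(pgn_prefix, strong_san, weak_san, sequence_logprob):
        lp_strong = sequence_logprob(pgn_prefix, " " + strong_san)
        lp_weak = sequence_logprob(pgn_prefix, " " + weak_san)
        return lp_strong - lp_weak  # positive: model still prefers the strong move

    # Illustrative call (placeholder moves): after "1. e4 e5 2." compare a normal
    # developing move with an obviously bad but legal one, for prefixes taken
    # from strong games vs. from the weak-Stockfish games:
    # strong_vs_weak_margin("1. e4 e5 2.", "Nf3", "Ke2", sequence_logprob)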