
688 points crescit_eundo | 3 comments
PaulHoule ◴[] No.42141647[source]
Maybe the one that plays chess well is calling out to a real chess engine.
replies(6): >>42141726 #>>42141959 #>>42142323 #>>42142342 #>>42143067 #>>42143188 #
1. og_kalu ◴[] No.42142342[source]
It's not:

1. That would just be plain bizarre

2. It plays like what you'd expect from an LLM that could play chess. That is, the level of play can be modulated by the prompt, and it doesn't manifest the same way that shifting the level of Stockfish etc. does. Also, the specific chess notation used in the prompt actually matters

3. It's sensitive to how the position came to be, which is clearly not how an existing chess engine behaves. https://github.com/dpaleka/llm-chess-proofgame

4. It does make illegal moves. It's rare (~5 in 8205) but it happens. https://github.com/adamkarvonen/chess_gpt_eval

5. You can, or at least you used to be able to, inspect the logprobs. I think OpenAI has stopped exposing them, but the link in point 4 shows the author inspecting them for Turbo Instruct.
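For illustration, a logprob inspection looks roughly like this. The numbers below are made up for the sketch, not real model output; the point is the shape of the evidence:

```python
import math

# Hypothetical next-token logprobs for the move after "... 12." in a PGN
# prompt, of the kind the legacy completions API used to expose via its
# `logprobs` parameter. Illustrative values only.
logprobs = {"Nxe5": -0.4, "Qd3": -1.6, "O-O": -2.3, "Ke9": -11.2}

# Convert log-probabilities to probabilities to see how the mass is spread.
probs = {tok: math.exp(lp) for tok, lp in logprobs.items()}

# An engine call would surface one deterministic best move; a language model
# instead shows a smooth distribution over plausible moves, with illegal
# continuations (like "Ke9") getting tiny but nonzero probability.
best = max(probs, key=probs.get)
```

That smooth distribution, including nonzero mass on illegal moves, is what you'd expect from next-token prediction and not from a wrapped engine.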

replies(2): >>42142524 #>>42142642 #
2. aithrowawaycomm ◴[] No.42142524[source]
> Also the specific chess notation being prompted actually matters

Couldn't this be evidence that it is using an engine? Maybe if you use the wrong notation it relies on the ANN rather than calling out to the engine.

Likewise:

- The sensitivity to game history is interesting, but is it actually true that other chess engines only look at the current board state? Regardless, maybe it's not an existing chess engine! I would think OpenAI has some custom chess engine built as a side project, proof of concept, etc. In particular, this engine might be neural and trained on actual games rather than board positions, which could explain the dependency on past moves. Note that the engine is not actually very good. Does AlphaZero depend on move history? (Genuine question; I'm not sure, but it does seem likely.)

- I think the illegal moves can be explained similarly to why gpt-o1 sometimes screws up easy computations despite having access to Python: an LLM having access to a tool does not guarantee it always uses that tool.

I realize there are holes in this argument, but I genuinely don't think those holes are as big as the one in "why is gpt-3.5-turbo-instruct so much better at chess than gpt-4?"

replies(1): >>42143234 #
3. janalsncm ◴[] No.42143234[source]
> Couldn’t this be evidence that it is using an engine?

A test would be to measure its performance against more difficult versions of Stockfish. A real chess engine would have a higher ceiling.
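If you ran that test, you could turn match scores against each Stockfish level into a performance rating using the standard logistic Elo formula. A minimal sketch (`performance_rating` is an illustrative helper, not from any library); an engine-backed player's estimated rating would keep climbing as the opposition gets stronger, while a pattern-matching model's would plateau:

```python
import math

def performance_rating(opponent_elo: float, score: float) -> float:
    """Estimate a performance rating from a match score (fraction of
    points won, strictly between 0 and 1) against an opponent of known
    rating, using the standard Elo expected-score model."""
    return opponent_elo + 400 * math.log10(score / (1 - score))

# e.g. scoring 75% against 2000-rated opposition implies roughly 2190.
estimate = performance_rating(2000, 0.75)
```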

Much more likely is that this model was trained on more chess PGNs. You can call that a "neural engine" if you'd like, but it is the simplest explanation and accounts for the mistakes it is making.

Game state isn't just what you can see on the board. It includes the 50-move rule and castling rights. Those were encoded as input layers in AlphaZero, along with prior positions of the pieces (8 prior positions, if I'm remembering correctly).
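As a rough sketch of that kind of plane encoding, assuming the 8 history steps mentioned above (simplified: one plane per side instead of one per piece type, and no repetition-count planes, so this is not the exact AlphaZero layout):

```python
BOARD = 8     # board is 8x8
HISTORY = 8   # number of prior positions stacked into the input

def const_plane(fill: float = 0.0):
    """An 8x8 plane filled with a constant value."""
    return [[fill] * BOARD for _ in range(BOARD)]

def encode_state(history, castling_rights, halfmove_clock):
    """history: up to HISTORY (white_plane, black_plane) pairs, newest
    first. castling_rights: (wk, wq, bk, bq) booleans. halfmove_clock:
    half-moves since the last capture or pawn move (the 50-move rule)."""
    planes = []
    # Stack piece planes for each remembered position, zero-padding
    # when fewer than HISTORY positions exist (start of the game).
    for i in range(HISTORY):
        if i < len(history):
            planes.extend(history[i])
        else:
            planes.extend((const_plane(), const_plane()))
    # Scalar features are broadcast as constant planes, AlphaZero-style.
    for right in castling_rights:
        planes.append(const_plane(1.0 if right else 0.0))
    planes.append(const_plane(halfmove_clock / 100.0))
    return planes
```

The key point stands either way: two prompts that reach the same visible board can still differ in this hidden state, so "position" for a chess model is more than the piece placement.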