(dynomight.substack.com)

1. anotherpaulg ◴[15 Nov 24 04:54 UTC] No.42144062[source]▶

I found a related set of experiments that include gpt-3.5-turbo-instruct, gpt-3.5-turbo and gpt-4.

Same surprising conclusion: gpt-3.5-turbo-instruct is much better at chess.

https://blog.mathieuacher.com/GPTsChessEloRatingLegalMoves/

2. shtack ◴[15 Nov 24 05:19 UTC] No.42144150[source]▶

I’d bet it’s using function calling out to a real chess engine. It could probably be proven with a timing analysis: check whether inference time does or doesn’t change with the number of tokens or the complexity of the game.
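A rough sketch of what such a timing analysis could look like, assuming you have already logged a token count and a wall-clock latency for each move request. The function name `fit_latency` and the sample numbers are made up for illustration; nothing here is measured from a real model. The idea is to fit a straight line and see how strongly latency tracks token count.

```python
from math import sqrt


def fit_latency(token_counts, latencies_s):
    """Least-squares fit of latency = intercept + slope * tokens.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    Hypothetical helper for illustration, not part of any API.
    """
    n = len(token_counts)
    if n < 2 or n != len(latencies_s):
        raise ValueError("need two or more paired measurements")

    mean_x = sum(token_counts) / n
    mean_y = sum(latencies_s) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in token_counts)
    syy = sum((y - mean_y) ** 2 for y in latencies_s)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(token_counts, latencies_s))
    if sxx == 0:
        raise ValueError("token counts must vary")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # If latency never varies, there is no correlation to report.
    r = sxy / sqrt(sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r


if __name__ == "__main__":
    # Made-up measurements: tokens per request and seconds per response.
    tokens = [120, 240, 360, 480, 600]
    seconds = [0.9, 1.4, 2.1, 2.5, 3.2]
    slope, intercept, r = fit_latency(tokens, seconds)
    print(f"slope={slope:.5f} s/token, intercept={intercept:.3f} s, r={r:.3f}")
```

Latency that scales smoothly with tokens is what plain inference would be expected to produce. A flat line, or jumps that follow position complexity rather than token count, would be more suggestive of an external call, though network jitter and batching could easily muddy either reading.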

replies(2): >>42144275 #>>42150589 #

3. scratchyone ◴[15 Nov 24 05:53 UTC] No.42144275[source]▶

>>42144150 #

?? Why would OpenAI even want to secretly embed chess function calling into an incredibly old model? If they wanted to trick people into thinking their models are super good at chess, why wouldn't they just do that to gpt-4o?

replies(1): >>42144575 #

4. semi-extrinsic ◴[15 Nov 24 07:10 UTC] No.42144575{3}[source]▶

>>42144275 #

The idea is that they embedded this when it was a new model, as part of the hype before GPT-4. The fake-it-till-you-make-it hope was that GPT-4 would be so good it could actually play chess. Then it turned out GPT-4 sucked at chess as well, and OpenAI quietly dropped any mention of chess. But it would be too suspicious to remove a well-documented feature from the old model, so it's left there and can be chalked up as a random event.

5. vbarrielle ◴[15 Nov 24 20:27 UTC] No.42150589[source]▶

>>42144150 #

If it were calling out to a real chess engine, there would be no illegal moves.

replies(1): >>42153733 #

6. shtack ◴[16 Nov 24 02:55 UTC] No.42153733{3}[source]▶

>>42150589 #

The instances of that happening are likely the LLM failing to call the engine for whatever reason and falling back to inference.


Something weird is happening with LLMs and chess