688 points by crescit_eundo | 5 comments
anotherpaulg | No.42144062
I found a related set of experiments that include gpt-3.5-turbo-instruct, gpt-3.5-turbo and gpt-4.

Same surprising conclusion: gpt-3.5-turbo-instruct is much better at chess.

https://blog.mathieuacher.com/GPTsChessEloRatingLegalMoves/

replies: No.42144150
1. shtack | No.42144150
I'd bet it's using function calling out to a real chess engine. It could probably be proven with a timing analysis, to see whether inference time tracks the number of tokens or the complexity of the game.
replies: No.42144275, No.42150589
2. scratchyone | No.42144275
?? Why would OpenAI even want to secretly embed chess function calling into an incredibly old model? If they wanted to trick people into thinking their models are super good at chess, why wouldn't they just do that with GPT-4o?
replies: No.42144575
3. semi-extrinsic | No.42144575
The idea is that they embedded this when the model was new, as part of the hype before GPT-4. The fake-it-till-you-make-it hope was that GPT-4 would be good enough to actually play chess. Then it turned out GPT-4 sucked at chess as well, and OpenAI quietly dropped any mention of chess. But removing a well-documented feature from the old model would be too suspicious, so it was left in place, where it can be chalked up to chance.
4. vbarrielle | No.42150589
If it were calling a real chess engine, there would be no illegal moves.
replies: No.42153733
5. shtack | No.42153733
Those illegal moves are likely cases where the LLM failed to call the engine for whatever reason and fell back to plain inference.