(dynomight.substack.com)

696 points crescit_eundo | 1 comments | 14 Nov 24 17:05 UTC

anotherpaulg ◴[15 Nov 24 04:54 UTC] No.42144062[source]▶

>>42138289 (OP) #

I found a related set of experiments that include gpt-3.5-turbo-instruct, gpt-3.5-turbo and gpt-4.

Same surprising conclusion: gpt-3.5-turbo-instruct is much better at chess.

https://blog.mathieuacher.com/GPTsChessEloRatingLegalMoves/

replies(1): >>42144150 #

shtack ◴[15 Nov 24 05:19 UTC] No.42144150[source]▶

>>42144062 #

I’d bet it’s using function calling to hand off to a real chess engine. That could probably be tested with a timing analysis: check whether inference time scales with the number of tokens or with game complexity, or stays flat.
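A rough sketch of what that timing analysis might look like, in Python. Everything here is an assumption for illustration: `query_model` is a hypothetical callable standing in for whatever sends a prompt to the model, and whitespace-separated word count is only a crude proxy for token count. The idea is to collect (length, latency) pairs and look at the least-squares slope and correlation.

```python
import time
from statistics import mean


def time_calls(query_model, prompts):
    """Time each call and return (prompt_length, seconds) pairs.

    query_model is a hypothetical callable (prompt -> reply); word count
    is used as a crude stand-in for the real token count.
    """
    samples = []
    for prompt in prompts:
        start = time.perf_counter()
        query_model(prompt)
        elapsed = time.perf_counter() - start
        samples.append((len(prompt.split()), elapsed))
    return samples


def fit_line(samples):
    """Return (slope, pearson_r) of latency against prompt length."""
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("need prompts of at least two different lengths")
    slope = sxy / sxx
    # With perfectly flat latencies the correlation is undefined; report 0.
    r = sxy / (sxx * syy) ** 0.5 if syy > 0 else 0.0
    return slope, r
```

A slope near zero across many positions would be consistent with something other than token-by-token inference doing the work, but it would not prove it: network jitter, batching and caching on the server side could all mask or mimic the effect, so this would need many repeated samples per length.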

replies(2): >>42144275 #>>42150589 #

scratchyone ◴[15 Nov 24 05:53 UTC] No.42144275[source]▶

>>42144150 #

?? Why would OpenAI even want to secretly embed chess function calling into an incredibly old model? If they wanted to trick people into thinking their models are super good at chess, why wouldn't they just do that to gpt-4o?

replies(1): >>42144575 #

semi-extrinsic ◴[15 Nov 24 07:10 UTC] No.42144575[source]▶

>>42144275 #

The idea is that they embedded this when it was a new model, as part of the hype before GPT-4. The fake-it-till-you-make-it hope was that GPT-4 would be so good it could actually play chess. Then it turned out GPT-4 sucked at chess as well, and OpenAI quietly dropped any mention of chess. But it would be too suspicious to remove a well-documented feature from the old model, so it's left there and can be chalked up as a random event.

Something weird is happening with LLMs and chess