688 points crescit_eundo | 2 comments
anotherpaulg ◴[] No.42144062[source]
I found a related set of experiments that include gpt-3.5-turbo-instruct, gpt-3.5-turbo and gpt-4.

Same surprising conclusion: gpt-3.5-turbo-instruct is much better at chess.

https://blog.mathieuacher.com/GPTsChessEloRatingLegalMoves/

replies(1): >>42144150 #
shtack ◴[] No.42144150[source]
I’d bet it’s using function calling to hand off to a real chess engine. That could probably be tested with a timing analysis: check whether inference time scales (or doesn’t) with the number of tokens or the complexity of the position.
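A rough sketch of what that timing analysis could look like, assuming you have already collected (output tokens, latency in seconds) pairs yourself. The function name `fit_latency` and the sample numbers are made up for illustration; they are not measurements. The idea is to fit a line and see how much of the latency is explained by token count alone.

```python
from math import sqrt

def fit_latency(samples):
    """Least-squares fit of latency = intercept + slope * tokens, plus Pearson r.

    samples: list of (output_tokens, latency_seconds) pairs measured by the caller.
    """
    n = len(samples)
    xs = [float(tokens) for tokens, _ in samples]
    ys = [float(seconds) for _, seconds in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("need variation in both token counts and latencies")
    slope = sxy / sxx                   # extra seconds per extra output token
    intercept = mean_y - slope * mean_x # fixed overhead per request
    r = sxy / sqrt(sxx * syy)           # how well token count explains latency
    return slope, intercept, r

# Hypothetical, hand-written samples (NOT real data).
samples = [(2, 0.25), (4, 0.30), (6, 0.35), (8, 0.40)]
slope, intercept, r = fit_latency(samples)
print(f"slope={slope:.4f} s/token, intercept={intercept:.3f} s, r={r:.3f}")
```

A correlation near 1 with a small, stable intercept would be consistent with plain token-by-token inference. Latency that tracks position complexity instead of token count, or a large fixed overhead, would be more suggestive of something else in the loop. Network jitter and server-side batching add noise, so this is evidence at best, not proof.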
replies(2): >>42144275 #>>42150589 #
1. vbarrielle ◴[] No.42150589[source]
If it were calling out to a real chess engine, there would be no illegal moves.
replies(1): >>42153733 #
2. shtack ◴[] No.42153733[source]
The instances of that happening are likely the LLM failing to call the engine for whatever reason and falling back to inference.