166 points | lawrenceyan | 1 comment
imranhou No.41875226
From the page: "We also show that our model outperforms AlphaZero's policy and value networks (without MCTS) and GPT-3.5-turbo-instruct."

Why compare this to GPT-3.5-turbo-instruct? Is that near SOTA in this space?

1. og_kalu No.41875394
As far as anyone knows, 3.5-turbo-instruct is the best chess-playing LLM (it certainly was at the time of the paper): roughly 1800 Elo with an illegal-move rate under 0.1%. It's unclear why it was so much better than GPT-4 (lack of RLHF? training data?), and I don't know whether anyone has bothered to test GPT-4o the same way, but it was pretty big news online at the time.
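
For anyone who wants to check an illegal-move rate like that for another model, here's a minimal sketch of how such a measurement could work. The query_model() helper is hypothetical, standing in for whatever API call you actually use to get the model's next move; python-chess does the legality checking.

```python
# Sketch: estimate an LLM's illegal-move rate at chess.
# query_model() is a hypothetical stand-in for an API call that, given the
# SAN movetext of the game so far, returns the model's next move in SAN.
import chess

def query_model(movetext: str) -> str:
    """Hypothetical: ask the model to continue the game; returns one SAN move."""
    raise NotImplementedError

def illegal_move_rate(num_games: int = 100, max_plies: int = 80) -> float:
    total, illegal = 0, 0
    for _ in range(num_games):
        board = chess.Board()
        moves = []  # SAN history fed back to the model as the prompt
        for _ in range(max_plies):
            if board.is_game_over():
                break
            san = query_model(" ".join(moves)).strip()
            total += 1
            try:
                board.push_san(san)  # raises ValueError on unparseable/illegal SAN
                moves.append(san)
            except ValueError:
                illegal += 1
                break  # abandon this game after the first illegal move
    return illegal / total if total else 0.0
```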