
365 points | lawrenceyan
imranhou No.41875226
From the page: "We also show that our model outperforms AlphaZero's policy and value networks (without MCTS) and GPT-3.5-turbo-instruct."

Why compare this to GPT-3.5-turbo-instruct? Is that near SOTA in this space?

og_kalu No.41875394
As far as anyone knows, 3.5-turbo-instruct is the best chess-playing LLM (it certainly was at the time of the paper): about 1800 Elo with a < 0.1% illegal-move rate. It's unclear why it was so much better than GPT-4 (lack of RLHF? training data?), and I don't know if anyone has bothered to test 4o similarly, but it was pretty big news online at the time.
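
For anyone curious how an illegal-move rate like that gets measured, here is a rough sketch using python-chess. query_model is a hypothetical stand-in for the actual completion-API call (the real evaluations prompted the model with PGN-formatted game text); the legality check itself is the real python-chess API.

    import chess

    def query_model(moves_so_far: str) -> str:
        # Hypothetical stand-in: prompt the LLM with the game so far and
        # return its proposed next move in SAN, e.g. "Nf3".
        raise NotImplementedError

    def illegal_move_rate(num_games: int = 100, max_plies: int = 200) -> float:
        attempts, illegal = 0, 0
        for _ in range(num_games):
            board = chess.Board()
            sans: list[str] = []
            for _ in range(max_plies):
                if board.is_game_over():
                    break
                san = query_model(" ".join(sans))
                attempts += 1
                try:
                    # parse_san raises a ValueError subclass if the move
                    # is unparseable or illegal in the current position
                    move = board.parse_san(san)
                except ValueError:
                    illegal += 1
                    break  # count the failure and abandon this game
                sans.append(san)
                board.push(move)
        return illegal / attempts if attempts else 0.0

A rate under 0.1% means fewer than one bad move per thousand attempts, which is why it was notable.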
Davidzheng No.41877569
OpenAI definitely trained its chess strength on purpose.
og_kalu No.41880481
I'm sure they did, but there's no reason to believe they pretrained it on chess any more than GPT-4, so there's some speculation that the post-training processes mess things up. Turbo-instruct does not go through RLHF, for instance.