
365 points | lawrenceyan
imranhou No.41875226
From the page: "We also show that our model outperforms AlphaZero's policy and value networks (without MCTS) and GPT-3.5-turbo-instruct."

Why compare this to GPT-3.5-turbo-instruct? Is that near SOTA in this space?

og_kalu No.41875394
As far as anyone knows, 3.5-turbo-instruct is the best chess-playing LLM (it certainly was at the time of the paper): about 1800 Elo with a < 0.1% illegal-move rate. It's unclear why it was so much better than GPT-4 (lack of RLHF? training data?), and I don't know if anyone has bothered to test 4o similarly, but it was pretty big news online at the time.
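
For anyone curious how an illegal-move rate like that gets measured, here is a rough sketch using python-chess. query_model is a hypothetical stand-in for the actual completion-API call (the real evaluations prompted the model with PGN-formatted game text); the legality check itself is the real python-chess API.

    import chess

    def query_model(moves_so_far: str) -> str:
        # Hypothetical stand-in: prompt the LLM with the game so far and
        # return its proposed next move in SAN, e.g. "Nf3".
        raise NotImplementedError

    def illegal_move_rate(num_games: int = 100, max_plies: int = 200) -> float:
        attempts, illegal = 0, 0
        for _ in range(num_games):
            board = chess.Board()
            sans: list[str] = []
            for _ in range(max_plies):
                if board.is_game_over():
                    break
                san = query_model(" ".join(sans))
                attempts += 1
                try:
                    # parse_san raises a ValueError subclass if the move
                    # is unparseable or illegal in the current position
                    move = board.parse_san(san)
                except ValueError:
                    illegal += 1
                    break  # count the failure and abandon this game
                sans.append(san)
                board.push(move)
        return illegal / attempts if attempts else 0.0

A rate under 0.1% means fewer than one bad move per thousand attempts, which is why it was notable.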
Davidzheng No.41877569
OpenAI definitely trained its chess strength on purpose.
og_kalu No.41880481
I'm sure they did, but there's no reason to believe they pretrained it on chess any more than GPT-4, so there's some speculation that the post-training processes mess things up. Turbo-instruct does not go through RLHF, for instance.