From the page: "We also show that our model outperforms AlphaZero's policy and value networks (without MCTS) and GPT-3.5-turbo-instruct."
Why compare this to GPT-3.5-turbo-instruct? Is that near SOTA in this space?