As far as anyone knows, 3.5-turbo-instruct is the best chess-playing LLM (it certainly was at the time of the paper): about 1800 Elo and an illegal move rate under 0.1%. It's unclear why it was so much better than GPT-4 (lack of RLHF? data?), and I don't know if anyone has bothered to test 4o the same way, but it was pretty big news online at the time.
I'm sure they did, but there's no reason to believe they pretrained it on chess any more than GPT-4, so there's some speculation that the post-training processes mess things up. Turbo-instruct, for instance, does not go through RLHF.