I found a related set of experiments that include gpt-3.5-turbo-instruct, gpt-3.5-turbo and gpt-4.
Same surprising conclusion: gpt-3.5-turbo-instruct is much better at chess.