> And then I tried gpt-3.5-turbo-instruct. This is a closed OpenAI model, so details are very murky.
How do you know it didn't just write a script that uses a chess engine and then execute that script? That, IMO, is the simplest explanation.
Also, I looked at the gpt-3.5-turbo-instruct example victory. One side played with 70% accuracy and the other with 77%. IMO that's not on par with 27XX Elo.