I feel like the article overlooks one obvious possibility: that OpenAI decided chess was a benchmark worth "winning", special-cased chess within gpt-3.5-turbo-instruct, and then neglected to carry that special case over to follow-up models since it wasn't generating sustained press coverage.
I suspect the same thing. Rather than LLMs “learning to play chess,” they “learnt” to recognise a chess game and hand it over to a chess engine. If that’s the case, I don’t feel impressed at all.
TBH I think a good AI would have access to a Swiss army knife of tools and know how to use them. For a complicated math equation, for example, using a calculator is just smarter than doing it in your head.
We already have the chess "calculator", though. It's called Stockfish. I don't know why you'd ask a dictionary how to solve a math problem.
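For what it's worth, the special case being hypothesized here would only take a few dozen lines. Here's a minimal sketch: the movetext heuristic, the llm_complete() stub, and the Stockfish path are all my assumptions for illustration; only the chess/chess.engine calls are python-chess's real API, and none of this is OpenAI's actual code.

```python
import re

import chess
import chess.engine

STOCKFISH = "/usr/bin/stockfish"  # assumed install location

def parse_game(prompt: str):
    """Replay any SAN movetext found in the prompt.

    Returns the resulting Board, or None if the prompt doesn't contain
    at least a couple of legal moves (i.e. it probably isn't chess).
    """
    tokens = re.sub(r"\d+\.(?:\.\.)?", " ", prompt).split()  # drop move numbers
    board = chess.Board()
    played = 0
    for tok in tokens:
        try:
            board.push_san(tok)
            played += 1
        except ValueError:      # not a legal move in this position
            if played:          # movetext ended; ignore trailing prose
                break
    return board if played >= 2 else None

def complete(prompt: str) -> str:
    """Answer chess-looking prompts with Stockfish, everything else normally."""
    board = parse_game(prompt)
    if board is not None and not board.is_game_over():
        engine = chess.engine.SimpleEngine.popen_uci(STOCKFISH)
        try:
            result = engine.play(board, chess.engine.Limit(time=0.1))
        finally:
            engine.quit()
        return board.san(result.move)   # the "benchmark-winning" path
    return llm_complete(prompt)         # ordinary sampling path (hypothetical)

def llm_complete(prompt: str) -> str:
    # Stand-in for the model's normal completion path.
    return "..."

# e.g. complete("1. e4 e5 2. Nf3") -> some strong engine reply like "Nc6"
```

Nothing in that dispatcher requires the model itself to know anything about chess, which is also why the "skill" could silently disappear in a follow-up model.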