(dynomight.substack.com)

696 points crescit_eundo | 1 comments | 14 Nov 24 17:05 UTC

swiftcoder ◴[15 Nov 24 07:57 UTC] No.42144784[source]▶

I feel like the article neglects one obvious possibility: that OpenAI decided that chess was a benchmark worth "winning", special-cased chess within gpt-3.5-turbo-instruct, and then neglected to add that special case to follow-up models since it wasn't generating sustained press coverage.

replies(8): >>42145306 #>>42145352 #>>42145619 #>>42145811 #>>42145883 #>>42146777 #>>42148148 #>>42151081 #

1. bambax ◴[15 Nov 24 11:19 UTC] No.42145883[source]▶

>>42144784 #

Yes, came here to say exactly this. And it's possible this specific model is "cheating", for example by identifying a chess problem and forwarding it to a chess engine. A modern version of the Mechanical Turk.

That's the problem with closed models: we can never know what they're doing.

Something weird is happening with LLMs and chess