
688 points crescit_eundo | 3 comments
swiftcoder No.42144784
I feel like the article neglects one obvious possibility: that OpenAI decided chess was a benchmark worth "winning", special-cased chess within gpt-3.5-turbo-instruct, and then neglected to carry that special case over to follow-up models once it was no longer generating sustained press coverage.
1. amelius No.42145619
To be fair, they say

> Theory 2: GPT-3.5-instruct was trained on more chess games.

2. AstralStorm No.42146129
If that were the case, pumping big Llama chock full of chess games would produce good results. It didn't.

The only way that theory could be true is if the model recognized the game and replayed the answer from memory.
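
(For concreteness, a minimal sketch of the kind of "pump Llama full of chess games" fine-tune being discussed: a HuggingFace-style causal-LM run over PGN movetext. The base model name, the games.txt file, and the hyperparameters are illustrative assumptions, not the setup used in the article or by anyone in this thread.)

    # Hypothetical sketch: continue-pretraining a Llama-style model on chess game transcripts.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    model_name = "meta-llama/Llama-2-7b-hf"  # assumed base model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # games.txt (assumed): one PGN movetext per line, e.g. "1. e4 e5 2. Nf3 Nc6 ..."
    dataset = load_dataset("text", data_files={"train": "games.txt"})

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=512)

    tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="llama-chess",
                               per_device_train_batch_size=4,
                               num_train_epochs=1,
                               learning_rate=2e-5),
        train_dataset=tokenized,
        # mlm=False gives plain next-token (causal) language modeling labels
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()

Whether such a run yields legal, strong play or mere memorization of known games is exactly the question the thread is raising.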

3. yorwba No.42146631
Do you have a link to the results from fine-tuning a Llama model on chess? How do they compare to the base models in the article here?