
695 points by crescit_eundo | 4 comments
swiftcoder No.42144784
I feel like the article neglects one obvious possibility: that OpenAI decided chess was a benchmark worth "winning", special-cased it within gpt-3.5-turbo-instruct, and then neglected to carry that special case forward to follow-up models since it wasn't generating sustained press coverage.
replies(8): >>42145306 #>>42145352 #>>42145619 #>>42145811 #>>42145883 #>>42146777 #>>42148148 #>>42151081 #
dmurray No.42145352
This seems quite likely to me, but did they special-case it by reinforcement-training chess into the LLM (which would be extremely interesting in terms of how they did it and what the internal representation looks like), or is it simply that when you make an API call to OpenAI, the machine on the other end isn't just a zillion-parameter LLM but also runs an instance of Stockfish?
replies(1): >>42145408 #
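For illustration, the second scenario wouldn't require anything exotic on the serving side. Here is a minimal sketch of such a shim, assuming python-chess and a stockfish binary on PATH; the prompt heuristic, function name, and time limit are all made up, and nothing here reflects how OpenAI actually serves the model:

    import re

    import chess
    import chess.engine

    # Crude "looks like a PGN move list" check -- entirely made up for illustration.
    PGN_PROMPT = re.compile(r"\b1\.\s*[a-hNBRQKO]")

    def next_move_via_engine(prompt: str) -> str | None:
        """If the prompt looks like a chess game, return an engine move, else None."""
        if not PGN_PROMPT.search(prompt):
            return None  # fall through to the ordinary LLM path
        board = chess.Board()
        # Replay the SAN moves from the prompt, skipping move numbers.
        for token in prompt.replace(".", ". ").split():
            if token.endswith(".") and token[:-1].isdigit():
                continue
            try:
                board.push_san(token)
            except ValueError:
                break  # stop at the first token that isn't a legal move
        with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
            result = engine.play(board, chess.engine.Limit(time=0.1))
        return board.san(result.move)

    print(next_move_via_engine("1. e4 e5 2. Nf3 Nc6 3. Bb5"))  # e.g. "a6"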
shaky-carrousel No.42145408
That's easy to test: invent a new chess variant and see how the model does.
replies(3): >>42145466 #>>42145557 #>>42146160 #
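A rough sketch of what that test could look like, using Chess960 as a stand-in for a variant whose openings the model can't have memorized (a genuinely new variant would also need its own rules engine). It assumes the openai and python-chess packages and an OPENAI_API_KEY in the environment; the prompt framing and parameters are guesses, not the article's setup:

    import chess
    from openai import OpenAI

    board = chess.Board.from_chess960_pos(123)  # one of the 960 shuffled start positions
    prompt = (
        "This is a Chess960 (Fischer random) game.\n"
        f"Starting FEN: {board.fen()}\n"
        "1."
    )

    client = OpenAI()
    completion = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=prompt,
        max_tokens=5,
        temperature=0,
    )
    print(completion.choices[0].text.strip())  # does it even produce a legal move?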
1. gliptic No.42145466
Both an LLM and Stockfish would fail that test.
replies(1): >>42146130 #
2. delusional No.42146130
Nobody is claiming that Stockfish is learning generalizable concepts that can one day meaningfully replace people in value-creating work.
replies(1): >>42146756 #
3. droopyEyelids No.42146756
The point was that such a question couldn't be used to tell whether the LLM was calling a chess engine.
replies(1): >>42147700 #
4. delusional No.42147700
Ah okay, I missed that.