Something weird is happening with LLMs and chess

swiftcoder ◴[15 Nov 24 07:57 UTC] No.42144784[source]▶

I feel like the article neglects one obvious possibility: that OpenAI decided that chess was a benchmark worth "winning", special-cased chess within gpt-3.5-turbo-instruct, and then neglected to add that special case to follow-up models since it wasn't generating sustained press coverage.

replies(8): >>42145306 #>>42145352 #>>42145619 #>>42145811 #>>42145883 #>>42146777 #>>42148148 #>>42151081 #

scott_w ◴[15 Nov 24 11:10 UTC] No.42145811[source]▶

>>42144784 #

I suspect the same thing. Rather than LLMs “learning to play chess,” they “learnt” to recognise a chess game and hand over instructions to a chess engine. If that’s the case, I don’t feel impressed at all.

replies(5): >>42146086 #>>42146152 #>>42146383 #>>42146415 #>>42156785 #

Kiro ◴[15 Nov 24 12:06 UTC] No.42146152[source]▶

>>42145811 #

That's something completely different than what the OP suggests and would be a scandal if true (i.e. gpt-3.5-turbo-instruct actually using something else behind the scenes).

replies(3): >>42146324 #>>42147204 #>>42151029 #

nerdponx ◴[15 Nov 24 12:34 UTC] No.42146324[source]▶

>>42146152 #

Ironically it's probably a lot closer to what a super-human AGI would look like in practice, compared to just an LLM alone.

replies(2): >>42146675 #>>42149673 #

sanderjd ◴[15 Nov 24 13:22 UTC] No.42146675[source]▶

>>42146324 #

Right. To me, this is the "agency" thing, that I still feel like is somewhat missing in contemporary AI, despite all the focus on "agents".

If I tell an "agent", whether human or artificial, to win at chess, it is a good decision for that agent to decide to delegate that task to a system that is good at chess. This would be obvious to a human agent, so presumably it should be obvious to an AI as well.

This isn't useful for AI researchers, I suppose, but it's more useful as a tool.

(This may all be a good thing, as giving AIs true agency seems scary.)

replies(1): >>42147515 #

scott_w ◴[15 Nov 24 15:02 UTC] No.42147515[source]▶

>>42146675 #

If this were stated as part of the offering (“we can recognise requests and delegate them to appropriate systems”), I’d understand and be somewhat impressed, but the marketing hype leaves this out.

Most likely because they want people to think the system is better than it is for hype purposes.

I should temper that: I’d only be impressed if it’s doing this dynamically. Hardcoding recognition of chess moves isn’t exactly a difficult trick to pull given there’s like 3 standard formats…
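
To illustrate how little it would take to hardcode that recognition, here is a rough Python sketch. It assumes the "3 standard formats" are standard algebraic notation (SAN), UCI-style coordinate moves, and PGN move numbers; the comment does not name them, so that choice is a guess. The names `classify_token` and `looks_like_chess` and the 0.8 threshold are made up for illustration, and nothing here claims to reflect what any real model or vendor does.

```python
import re

# Assumed format 1: standard algebraic notation, e.g. "e4", "Nf3", "exd5", "O-O", "e8=Q+"
SAN = re.compile(r"(?:O-O(?:-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?)[+#]?")
# Assumed format 2: UCI-style coordinate moves, e.g. "e2e4", "e7e8q"
UCI = re.compile(r"[a-h][1-8][a-h][1-8][qrbn]?")
# Assumed format 3: PGN move numbers around SAN moves, e.g. "1." or "12..."
MOVE_NUMBER = re.compile(r"\d+\.(?:\.\.)?")


def classify_token(token):
    """Return a label for a whitespace-delimited token, or None if it is not chess-like."""
    if MOVE_NUMBER.fullmatch(token):
        return "move_number"
    if UCI.fullmatch(token):
        return "uci"
    if SAN.fullmatch(token):
        return "san"
    return None


def looks_like_chess(text, threshold=0.8):
    """Heuristic: True if at least `threshold` of the tokens look like chess notation."""
    tokens = text.split()
    if not tokens:
        return False
    hits = sum(1 for t in tokens if classify_token(t) is not None)
    return hits / len(tokens) >= threshold


if __name__ == "__main__":
    print(looks_like_chess("1. e4 e5 2. Nf3 Nc6 3. Bb5 a6"))
    print(looks_like_chess("Hello, how are you today?"))
```

A check like this only pattern-matches the text and does not validate that the moves are legal, which is the point: static recognition is the easy part, and deciding dynamically when to delegate would be the impressive bit.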

replies(2): >>42148468 #>>42149134 #