The initial LLM acts as an intent-detection switch.
To personify the LLM far too much:
It sees that a prompt of some kind wants to play chess.
Knowing this it looks at the bag of “tools” and sees a chess tool.
It then generates a response which eventually causes a call to a chess AI (or just a chess program) which does further processing.
The first LLM acts as a ton of if-then statements, but automatically generated (or brute-force discovered) through training.
You still need discrete parts for this system: some communication protocol, an intent-detection step, a chess-execution step, etc…
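The routing described above can be sketched in a few lines. This is a toy illustration, not anyone's real implementation: the LLM's intent detection is faked with a keyword match (in a real system that string would come from a model completion), and all function and tool names here are made up for the example.

```python
def detect_intent(prompt: str) -> str:
    """Stand-in for the first LLM: map a prompt to a tool name.

    A real system would get this label from a model completion;
    here it is a hard-coded keyword check, purely for illustration.
    """
    if "chess" in prompt.lower():
        return "chess"
    return "general"


def chess_tool(prompt: str) -> str:
    # Stand-in for handing off to a chess engine / chess AI.
    return "chess engine handles: " + prompt


def general_tool(prompt: str) -> str:
    # Fallback when no specialized tool matches.
    return "plain answer for: " + prompt


# The "bag of tools" the router looks at.
TOOLS = {"chess": chess_tool, "general": general_tool}


def handle(prompt: str) -> str:
    # The pile of if-then statements, made explicit as lookup + call.
    return TOOLS[detect_intent(prompt)](prompt)


print(handle("Let's play chess, e4"))
```

The point of the sketch is that whether `detect_intent` is a statistical model or a rule table, the surrounding machinery (protocol, dispatch table, tool execution) looks the same.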
I don’t see how that differs from a classic expert system other than the if statement is handled by a statistical model.