688 points crescit_eundo | 15 comments
swiftcoder ◴[] No.42144784[source]
I feel like the article neglects one obvious possibility: that OpenAI decided chess was a benchmark worth "winning", special-cased chess within gpt-3.5-turbo-instruct, and then neglected to carry that special case over to follow-up models since it wasn't generating sustained press coverage.
replies(8): >>42145306 #>>42145352 #>>42145619 #>>42145811 #>>42145883 #>>42146777 #>>42148148 #>>42151081 #
dmurray ◴[] No.42145352[source]
This seems quite likely to me, but did they special-case it by reinforcement-training it into the LLM (which would be extremely interesting, both for how they did it and for what the internal representation looks like), or is it that when you make an API call to OpenAI, the machine on the other end is not just a zillion-parameter LLM but also runs an instance of Stockfish? (A toy sketch of that routing idea follows below.)
replies(1): >>42145408 #
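A toy sketch of that second possibility, assuming the python-chess package and a Stockfish binary on the PATH; call_llm and the whole routing heuristic here are hypothetical, not a claim about OpenAI's actual serving stack:

  import re
  import chess
  import chess.engine

  MOVETEXT = re.compile(r"\b1\.\s*[a-hKQRBNO]")  # crude "looks like a PGN game" check

  def handle(prompt: str) -> str:
      if not MOVETEXT.search(prompt):
          return call_llm(prompt)  # hypothetical: the ordinary model path
      # Replay as much of the prompt as parses as a chess game.
      board = chess.Board()
      for token in prompt.split():
          token = token.rstrip(".")
          if token.isdigit():
              continue  # skip move numbers like "1."
          try:
              board.push_san(token)
          except ValueError:
              break  # stop at the first token that isn't a legal move
      # Let the engine pick the reply instead of the network.
      engine = chess.engine.SimpleEngine.popen_uci("stockfish")
      move = engine.play(board, chess.engine.Limit(time=0.05)).move
      engine.quit()
      return board.san(move)

On this picture the network never sees well-formed games at all, which would explain strong play without any interesting internal representation.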
shaky-carrousel ◴[] No.42145408[source]
That's easy to test: invent a new chess variant and see how the model does (a sketch of such a probe follows below).
replies(3): >>42145466 #>>42145557 #>>42146160 #
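One way to run that probe, sketched with the openai Python client; the variant rule is invented for illustration and the call proves nothing by itself, but illegal "book" moves in the reply would be telling:

  from openai import OpenAI

  client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

  # A made-up variant: bishops and knights trade movement rules.
  PROMPT = (
      "We are playing 'leaper chess': standard chess, except bishops "
      "move like knights and knights move like bishops. All other rules "
      "are unchanged.\n\n1. e4 e5 2."
  )

  resp = client.completions.create(
      model="gpt-3.5-turbo-instruct",
      prompt=PROMPT,
      max_tokens=8,
      temperature=0,
  )
  print(resp.choices[0].text)  # does the continuation respect the swapped rules?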
1. andy_ppp ◴[] No.42145557[source]
You're imagining that LLMs don't just regurgitate and recombine things they have already seen. A new variant would not be in the dataset, so it would not be understood. In fact, this is quite a good way to show that LLMs are not thinking or understanding anything in the way we understand it.
replies(2): >>42145905 #>>42147218 #
2. shaky-carrousel ◴[] No.42145905[source]
Yes, that's how you can really tell whether the model is doing real thinking rather than just recombining things. If it can correctly play a novel game, it's doing more than that.
replies(3): >>42146014 #>>42146022 #>>42146190 #
3. dwighttk ◴[] No.42146014[source]
No LLM is doing any thinking.
replies(1): >>42146320 #
4. jahnu ◴[] No.42146022[source]
I wonder what minimum amount of change qualifies as novel?

"Chess but white and black swap their knights" for example?

replies(1): >>42147158 #
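That particular variant is at least easy to pin down as a concrete position. A minimal sketch, assuming python-chess, where the FEN simply recolors the four knights:

  import chess

  # Standard starting position, except each side's knights belong to the
  # opponent: white knights on b8/g8, black knights on b1/g1.
  SWAPPED_KNIGHTS = "rNbqkbNr/pppppppp/8/8/8/8/PPPPPPPP/RnBQKBnR w KQkq - 0 1"

  board = chess.Board(SWAPPED_KNIGHTS)
  print(board.unicode())
  print(len(list(board.legal_moves)))  # 22 first moves here, vs 20 in standard chess

Whether that counts as novel is exactly the question: the position as a whole is out of distribution, but almost every local pattern in it is not.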
5. timdiggerm ◴[] No.42146190[source]
By that standard (and it is a good standard), none of these "AI" things are doing any thinking
replies(1): >>42147408 #
6. selestify ◴[] No.42146320{3}[source]
How do you define thinking?
replies(2): >>42146586 #>>42151638 #
7. antononcube ◴[] No.42146586{4}[source]
Being fast at doing linear algebra computations. (Is there any other kind?!)
8. the_af ◴[] No.42147158{3}[source]
I wonder what would happen with a game that is mostly chess (or chess with truly minimal variations) but with all the names changed: pieces, moves, "check", and so on. The algebraic notation is also replaced with something else so it cannot be pattern-matched against the training data. Then you list the rules (which are mostly the same as chess).

None of these changes are explained to the LLM, so if it can tell it's still chess, it must deduce this on its own.

Would any LLM be able to play at a decent level? (A rough sketch of the renaming follows below.)

replies(1): >>42152352 #
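A rough sketch of that renaming, assuming python-chess; the substitute vocabulary is made up for illustration, and a real experiment would also restate the rules in the new terms:

  import chess

  # Invented cipher: piece letters, files, and ranks all renamed.
  PIECES = {"K": "Z", "Q": "Y", "R": "X", "B": "W", "N": "V"}
  FILES = "qrstuvwz"  # a..h -> q..z, skipping x, which still marks captures
  RANKS = "jklmnopi"  # 1..8 -> j..i

  def obfuscate_san(san: str) -> str:
      out = []
      for ch in san:
          if ch in PIECES:
              out.append(PIECES[ch])
          elif ch in "abcdefgh":
              out.append(FILES[ord(ch) - ord("a")])
          elif ch in "12345678":
              out.append(RANKS[int(ch) - 1])
          else:
              out.append(ch)  # x, +, #, = and castling pass through
      return "".join(out)

  board = chess.Board()
  for san in ["e4", "e5", "Nf3", "Nc6", "Bb5"]:
      print(obfuscate_san(san), end=" ")
      board.push_san(san)  # validates that the underlying game is legal
  # prints: um un Vvl Vso Wrn  (the Ruy Lopez, no longer recognizable as SAN)

The model would then get the restated rules plus moves like these, with nothing textually in common with the PGN in its training data.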
9. empath75 ◴[] No.42147218[source]
You say this quite confidently, but LLMs do generalize somewhat.
10. Jerrrrrrry ◴[] No.42147408{3}[source]
Musical goalposts, gotta love it.

These LLMs just exhibited agency.

Swallow your pride.

replies(1): >>42147976 #
11. samatman ◴[] No.42147976{4}[source]
"Does it generalize past the training data" has been a pre-registered goalpost since before the attention transformer architecture came on the scene.
replies(1): >>42148394 #
12. Jerrrrrrry ◴[] No.42148394{5}[source]

  > 'thinking' vs 'just recombining things'
If there is a difference, and LLMs can do one but not the other...

  >By that standard (and it is a good standard), none of these "AI" things are doing any thinking

  >"Does it generalize past the training data" has been a pre-registered goalpost since before the attention transformer architecture came on the scene.

Then what the fuck are they doing?

Learning is thinking, reasoning, what have you.

Move the goalposts, redefine words; it won't matter.

13. landryraccoon ◴[] No.42151638{4}[source]
Making the OP feel threatened or emotionally attached (or both), enough that they call the language model a rival, companion, or peer instead of a tool.
replies(1): >>42176541 #
14. jahnu ◴[] No.42152352{4}[source]
Nice. Even the tiniest rule change, I strongly suspect, would throw off the pattern matching. "Every second move, swap the name of the piece you move with that of the last piece you moved."
15. Jerrrrrrry ◴[] No.42176541{5}[source]
Lolol. It's a chess thread, say it.

We are pawns, hoping maybe to become a Rook to the King by endgame.

Some think we can promote our pawns to Queens to match.

Luckily, the Jester muses!