Something weird is happening with LLMs and chess

chvid [15 Nov 24 05:55 UTC] No.42144283

Theory 5: GPT-3.5-instruct plays chess by calling a traditional chess engine.

bubblyworld [15 Nov 24 06:08 UTC] No.42144326

Just think about the trade-off from OpenAI's side here. They're going to add a bunch of complexity to GPT-3.5 to let it call out to engines (either an external system monitoring all outputs for chess-related stuff, or some kind of tool-assisted CoT, for instance) just so it can play illegal moves a high percentage of the time, and even when it doesn't, play at a mere 1800 Elo level? In return for some mentions in a few relatively obscure blog posts? That doesn't make any sense to me as an explanation.

usrusr [15 Nov 24 07:21 UTC] No.42144614

>>42144326

Could be a pilot implementation to learn how to link up external specialist engines. Chess would be the obvious example to start with, because the problem is so well known and standardized, and specialist engines are easily available. If they ever want to offer an integration like that to customers (who might have some existing rule-based engine in house), they need to know everything they can about expected cost and performance.

bubblyworld [15 Nov 24 08:05 UTC] No.42144821

>>42144614

This doesn't address its terrible performance. If it were touching anything like a real engine, it would be playing at a superhuman level, not at the level of an upper-tier beginner.

9dev [15 Nov 24 10:21 UTC] No.42145541

>>42144821

That would immediately have given away that something was off. If you wanted to do this subtly, in a way that increased the hype around GPT-3.5 at the time, giving it a good-but-not-too-good rating would be the way to go.

bubblyworld [15 Nov 24 14:56 UTC] No.42147459

>>42145541

If you want to keep adding conditions to an already-complex theory, you'll need an equally complex set of observations to justify it.

samatman [15 Nov 24 16:12 UTC] No.42148203

>>42147459

You're the one imposing an additional criterion, that OpenAI must have chosen the highest setting on a chess engine, and demanding that this additional criterion be used to explain the facts.

I agree with GP that if a 'fine-tuning' of GPT-3.5 had come out of the gate playing at top Stockfish level, people would have been extremely suspicious of it. So in my accounting of the unknowns here, the fact that it doesn't play at the top level provides no additional information with which to resolve the question.

replies(5): >>42148525 #>>42148570 #>>42148689 #>>42148759 #>>42154446 #