Something weird is happening with LLMs and chess

(dynomight.substack.com)

696 points crescit_eundo | 1 comments | 14 Nov 24 17:05 UTC

chvid ◴[15 Nov 24 05:55 UTC] No.42144283[source]▶

>>42138289 (OP) #

Theory 5: GPT-3.5-instruct plays chess by calling a traditional chess engine.

replies(5): >>42144296 #>>42144326 #>>42144379 #>>42144517 #>>42156924 #

bubblyworld ◴[15 Nov 24 06:08 UTC] No.42144326[source]▶

>>42144283 #

Just think about the trade-off from OpenAI's side here: they're going to add a bunch of complexity to GPT-3.5 to let it call out to engines (either an external system monitoring all outputs for chess-related stuff, or some kind of tool-assisted CoT, for instance) just so it can play chess incorrectly a high percentage of the time and, even when it doesn't, play at a mere 1800 Elo level? In return for some mentions in a few relatively obscure blog posts? That doesn't make any sense to me as an explanation.

replies(2): >>42144427 #>>42144614 #

copperx ◴[15 Nov 24 06:37 UTC] No.42144427[source]▶

>>42144326 #

But there could be a simple explanation. For example, they could have tested many "engines" when developing function calling and just left them in there. They just happened to connect to a basic chess-playing algorithm, nothing sophisticated.

Also, it makes a lot of sense if you expect people to play chess against the LLM, especially if you are later training future models on the chats.

replies(1): >>42144859 #

bubblyworld ◴[15 Nov 24 08:13 UTC] No.42144859[source]▶

>>42144427 #

This still requires a lot of coincidences: they chose a terrible chess engine for their external tool (why?), they left it on in the background for all calls via all APIs for only gpt-3.5-turbo-instruct (why?), and they see business value in this specific model being good at chess vs. other things (why?).

You say it makes sense, but how does it make sense for OpenAI to add overhead to all of its API calls for the super niche case of people playing 1800 Elo chess/chat bots (that often play illegal moves; you can go try it yourself)?
