
688 points by crescit_eundo | 1 comment
niobe ◴[] No.42142885[source]
I don't understand why educated people expect that an LLM would be able to play chess at a decent level.

It has no idea about the quality of its data. "Act like X" prompts are no substitute for the actual reasoning and deterministic computation that chess clearly requires.

replies(20): >>42142963 #>>42143021 #>>42143024 #>>42143060 #>>42143136 #>>42143208 #>>42143253 #>>42143349 #>>42143949 #>>42144041 #>>42144146 #>>42144448 #>>42144487 #>>42144490 #>>42144558 #>>42144621 #>>42145171 #>>42145383 #>>42146513 #>>42147230 #
computerex ◴[] No.42142963[source]
The question here is why gpt-3.5-instruct can then beat Stockfish.
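
For reference, here's roughly how such a match gets run — a minimal sketch only, assuming the OpenAI Python SDK, python-chess, a local Stockfish binary on PATH, and a bare PGN-style prompt (the article's exact prompt format and colour assignment may differ; here the model plays White):

    import chess
    import chess.engine
    from openai import OpenAI

    client = OpenAI()
    engine = chess.engine.SimpleEngine.popen_uci("stockfish")
    board = chess.Board()
    san_moves = []  # moves played so far, in SAN

    def movetext():
        # Rebuild bare PGN movetext like "1. e4 e5 2. Nf3" from the SAN list.
        parts = []
        for i, mv in enumerate(san_moves):
            parts.append(f"{i // 2 + 1}. {mv}" if i % 2 == 0 else mv)
        return " ".join(parts)

    while not board.is_game_over():
        if board.turn == chess.WHITE:
            # The LLM plays White: ask the completion model to continue the PGN.
            prompt = (movetext() + " " if san_moves else "") + f"{len(san_moves) // 2 + 1}."
            resp = client.completions.create(
                model="gpt-3.5-turbo-instruct",
                prompt=prompt,
                max_tokens=6,
                temperature=0,
            )
            san = resp.choices[0].text.strip().split()[0]
            move = board.parse_san(san)  # raises on an illegal/unparsable move
        else:
            # Stockfish plays Black.
            move = engine.play(board, chess.engine.Limit(time=0.1)).move
            san = board.san(move)
        san_moves.append(san)
        board.push(move)

    print(board.result())
    engine.quit()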
replies(4): >>42142975 #>>42143081 #>>42143181 #>>42143889 #
lukan ◴[] No.42143181[source]
Cheating (using an internal chess engine) would be the obvious reason to me.
replies(2): >>42143214 #>>42165535 #
TZubiri ◴[] No.42143214[source]
Nope. Calls through the API don't use function calls.
replies(2): >>42143226 #>>42144027 #
permo-w ◴[] No.42143226{4}[source]
that you know of
replies(1): >>42150883 #
TZubiri ◴[] No.42150883{5}[source]
Sure. It's not hard to verify: in the user UI, function calls are very transparent.

And in the API, all of the common features like maths and search are just not there. You can implement them yourself.

You can compare with self-hosted models like Llama, and the performance is quite similar.

You can also jailbreak and get a shell into the container for some further proof.
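
To illustrate the API point above — a minimal sketch, assuming the OpenAI Python SDK and chat-completions tool calling (the model name and the best_chess_move tool are just placeholders): any function the model wants to use shows up as an explicit tool_calls entry in the response, and nothing runs unless your own code executes it.

    from openai import OpenAI

    client = OpenAI()

    # Define a tool; the API never runs it — it only *asks* for it by name,
    # and your own code has to execute it and send the result back.
    tools = [{
        "type": "function",
        "function": {
            "name": "best_chess_move",  # hypothetical caller-side function
            "description": "Return the engine's best move for a FEN position",
            "parameters": {
                "type": "object",
                "properties": {"fen": {"type": "string"}},
                "required": ["fen"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What should White play after 1. e4 e5?"}],
        tools=tools,
    )

    msg = resp.choices[0].message
    # Any requested call is right here in the response, in the clear.
    print(msg.tool_calls or msg.content)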

replies(1): >>42157065 #
permo-w ◴[] No.42157065{6}[source]
This is all just guesswork. It's a black box. You have no idea what post-processing they're doing on their end.