
695 points crescit_eundo | 7 comments
niobe ◴[] No.42142885[source]
I don't understand why educated people expect that an LLM would be able to play chess at a decent level.

It has no idea about the quality of its data. "Act like x" prompts are no substitute for actual reasoning and deterministic computation, which chess clearly requires.

replies(20): >>42142963 #>>42143021 #>>42143024 #>>42143060 #>>42143136 #>>42143208 #>>42143253 #>>42143349 #>>42143949 #>>42144041 #>>42144146 #>>42144448 #>>42144487 #>>42144490 #>>42144558 #>>42144621 #>>42145171 #>>42145383 #>>42146513 #>>42147230 #
computerex ◴[] No.42142963[source]
The question, then, is why gpt-3.5-instruct can beat Stockfish.
replies(4): >>42142975 #>>42143081 #>>42143181 #>>42143889 #
1. lukan ◴[] No.42143181[source]
Cheating (using an internal chess engine) would be the obvious reason to me.
replies(2): >>42143214 #>>42165535 #
2. TZubiri ◴[] No.42143214[source]
Nope. Calls made through the API don't use function calls.
replies(2): >>42143226 #>>42144027 #
3. permo-w ◴[] No.42143226[source]
that you know of
replies(1): >>42150883 #
4. girvo ◴[] No.42144027[source]
How can you prove this when talking about someone's internal closed API?
5. TZubiri ◴[] No.42150883{3}[source]
Sure. It's not hard to verify: in the user UI, function calls are very transparent.

And in the API, all of the common features like maths and search are just not there. You can implement them yourself.

You can compare with self-hosted models like Llama, and the performance is quite similar.

You can also jailbreak it and get a shell in the container for some further proof.

replies(1): >>42157065 #
6. permo-w ◴[] No.42157065{4}[source]
this is all just guesswork. it's a black box. you have no idea what post-processing they're doing on their end
7. nske ◴[] No.42165535[source]
But in that case there shouldn't be any invalid moves, ever. Another tester found that gpt-3.5-turbo-instruct suggested at least one illegal move in 16% of the games (source: https://blog.mathieuacher.com/GPTsChessEloRatingLegalMoves/).
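
For what it's worth, here is a minimal sketch of how a per-game figure like that could be tallied. It assumes each game has already been reduced to a list of legal/illegal flags, one per move the model proposed. The function name and the sample data are made up for illustration; this is not the linked post's code or its data.

```python
from typing import List


def share_of_games_with_illegal_move(games: List[List[bool]]) -> float:
    """Return the fraction of games containing at least one illegal move.

    Each game is a list of flags, one per proposed move: True means the
    move was legal, False means it was illegal. (Hypothetical format.)
    """
    if not games:
        return 0.0
    # A game counts as flawed if any single proposed move was illegal.
    flawed = sum(1 for moves in games if not all(moves))
    return flawed / len(games)


# Hypothetical sample: four games, one of which contains an illegal move.
sample_games = [
    [True, True, True],
    [True, False, True],
    [True, True],
    [True, True, True, True],
]

print(f"{share_of_games_with_illegal_move(sample_games):.0%}")
```

The point is that the metric is per game, not per move: a single illegal suggestion marks the whole game, which is why even a low per-move error rate can show up as a noticeable share of games.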