    695 points crescit_eundo | 16 comments
    niobe ◴[] No.42142885[source]
    I don't understand why educated people expect that an LLM would be able to play chess at a decent level.

    It has no idea about the quality of its data. "Act like x" prompts are no substitute for actual reasoning and the deterministic computation that chess clearly requires.

    replies(20): >>42142963 #>>42143021 #>>42143024 #>>42143060 #>>42143136 #>>42143208 #>>42143253 #>>42143349 #>>42143949 #>>42144041 #>>42144146 #>>42144448 #>>42144487 #>>42144490 #>>42144558 #>>42144621 #>>42145171 #>>42145383 #>>42146513 #>>42147230 #
    1. computerex ◴[] No.42142963[source]
    Question here is why gpt-3.5-instruct can then beat stockfish.
    replies(4): >>42142975 #>>42143081 #>>42143181 #>>42143889 #
    2. fsndz ◴[] No.42142975[source]
    PS: I ran it, and as suspected gpt-3.5-turbo-instruct does not beat Stockfish; it is not even close. "Final Results: gpt-3.5-turbo-instruct: Wins=0, Losses=6, Draws=0, Rating=1500.00 stockfish: Wins=6, Losses=0, Draws=0, Rating=1500.00" https://www.loom.com/share/870ea03197b3471eaf7e26e9b17e1754?...
    replies(1): >>42142993 #
    3. computerex ◴[] No.42142993[source]
    Maybe there's some difference in the setup, because the OP reports that the model beats Stockfish (as they had it configured) every single game.
    replies(2): >>42143059 #>>42144502 #
    4. Filligree ◴[] No.42143059{3}[source]
    OP had stockfish at its weakest preset.
    replies(1): >>42143193 #
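    If "weakest preset" means the engine's built-in handicap, Stockfish exposes that through the UCI option "Skill Level" (range 0–20, default 20). The thread doesn't say exactly how the OP configured the engine, so treat this as one plausible sketch of the weakest setting:

```python
def stockfish_weakest_uci():
    """Build the UCI command sequence that puts Stockfish on its weakest
    built-in setting: Skill Level 0 (the range is 0-20, and the default,
    20, is full strength)."""
    return [
        "uci",                                 # start the UCI handshake
        "setoption name Skill Level value 0",  # weakest handicap level
        "isready",                             # wait until the option is applied
    ]
```

    These strings would be written to the engine's stdin by whatever harness drives the match; lowering Skill Level makes Stockfish deliberately pick weaker moves rather than merely search less.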
    5. bluGill ◴[] No.42143081[source]
    The article appears to have run Stockfish only at low levels; you don't have to be very good to beat it.
    6. lukan ◴[] No.42143181[source]
    Cheating (using an internal chess engine) would be the obvious reason to me.
    replies(2): >>42143214 #>>42165535 #
    7. fsndz ◴[] No.42143193{4}[source]
    I did the same and gpt-3.5-turbo-instruct still lost all the games. Maybe a difference in Stockfish version? I am using Stockfish 16.
    replies(1): >>42143999 #
    8. TZubiri ◴[] No.42143214[source]
    Nope. API calls don't use function calls.
    replies(2): >>42143226 #>>42144027 #
    9. permo-w ◴[] No.42143226{3}[source]
    that you know of
    replies(1): >>42150883 #
    10. shric ◴[] No.42143889[source]
    I'm actually surprised any of them manage to make legal moves throughout the game once they're out of book.
    11. mannykannot ◴[] No.42143999{5}[source]
    That is a very pertinent question, especially if Stockfish has been used to generate training data.
    12. girvo ◴[] No.42144027{3}[source]
    How can you prove this when talking about someone's internal closed API?
    13. golol ◴[] No.42144502{3}[source]
    You have to get the model to think in PGN data. It's crucial to use the exact PGN format it saw in its training data and to give it few-shot examples.
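    To make the PGN-framing idea concrete, here is a hypothetical sketch of a prompt builder: the header tags and player names are invented for illustration, and the point is simply that the model is asked to continue a standard PGN transcript rather than answer a conversational "play chess with me" request.

```python
def build_pgn_prompt(moves_so_far):
    """Sketch: format a game in progress as a PGN transcript so that a
    completion model predicts the next move as a continuation of the text.
    `moves_so_far` is a list of SAN moves, e.g. ["e4", "e5", "Nf3"].
    Header tag values here are placeholders, not from the thread."""
    header = (
        '[Event "Casual Game"]\n'
        '[White "White"]\n'
        '[Black "Black"]\n'
        '[Result "*"]\n\n'
    )
    body = ""
    for i in range(0, len(moves_so_far), 2):
        body += f"{i // 2 + 1}. {moves_so_far[i]}"   # move number + White's move
        if i + 1 < len(moves_so_far):
            body += f" {moves_so_far[i + 1]}"        # Black's reply, if played
        body += " "
    # Trailing space leaves the transcript mid-game, inviting a continuation.
    return header + body
```

    A few complete short games in the same format could be prepended as few-shot examples, per the comment above.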
    14. TZubiri ◴[] No.42150883{4}[source]
    Sure. It's not hard to verify: in the user-facing UI, function calls are very transparent.

    And in the API, all of the common features like math and search are just not there; you can implement them yourself.

    You can compare with self-hosted models like Llama, and the performance is quite similar.

    You can also jailbreak and get a shell into the container for some further proof.

    replies(1): >>42157065 #
    15. permo-w ◴[] No.42157065{5}[source]
    This is all just guesswork. It's a black box; you have no idea what post-processing they're doing on their end.
    16. nske ◴[] No.42165535[source]
    But in that case there shouldn't be any invalid moves, ever. Another tester found gpt-3.5-turbo-instruct suggesting at least one illegal move in 16% of games (source: https://blog.mathieuacher.com/GPTsChessEloRatingLegalMoves/ )
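    For clarity, the 16% figure in the linked post is a per-game rate: the fraction of games containing at least one illegal suggestion, not the fraction of individual moves. A tiny sketch of that metric (the per-move legality judgments themselves would come from a rules engine such as python-chess, omitted here):

```python
def illegal_game_rate(games_moves_legal):
    """Fraction of games in which the model proposed at least one illegal
    move. Each inner list holds one boolean per suggested move
    (True = legal), as judged by some external rules engine."""
    if not games_moves_legal:
        return 0.0
    bad = sum(1 for game in games_moves_legal if not all(game))
    return bad / len(games_moves_legal)
```

    A single illegal move anywhere in a game flags the whole game, which is why this rate can look high even when most individual moves are legal.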