688 points crescit_eundo | 1 comments
fsndz ◴[] No.42142922[source]
Wow, I actually did something similar recently: no LLM could win, and the centipawn loss was always going through the roof (sort of). I created a leaderboard based on it: https://www.lycee.ai/blog/what-happens-when-llms-play-chess

I am very surprised by the performance of gpt-3.5-turbo-instruct. Beating Stockfish? I will have to run the experiment with that model to check that out.
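For anyone unfamiliar with the metric: centipawn loss is roughly how much worse the played move is than the engine's best move, in hundredths of a pawn. Here is a minimal sketch of the averaging step, assuming you already have engine evaluations from the mover's point of view. The function name and input layout are hypothetical, not taken from the linked leaderboard code.

```python
from typing import Sequence


def average_centipawn_loss(best_evals: Sequence[int],
                           played_evals: Sequence[int]) -> float:
    """Average centipawn loss over a player's moves.

    best_evals[i]   -- engine eval (centipawns, mover's perspective)
                       after the engine's preferred move at ply i
    played_evals[i] -- engine eval after the move actually played
    Both lists are assumed inputs produced by some engine analysis pass.
    """
    if len(best_evals) != len(played_evals):
        raise ValueError("eval lists must have the same length")
    if not best_evals:
        return 0.0
    # A played move cannot be "better than best", so clamp noise at zero.
    losses = [max(0, best - played)
              for best, played in zip(best_evals, played_evals)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # One accurate move, one 100-centipawn blunder, one small inaccuracy.
    print(average_centipawn_loss([30, 50, 10], [30, -50, 0]))
```

A value that keeps climbing over a game is what "going through the roof" would look like in this measure.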

replies(1): >>42142971 #
fsndz ◴[] No.42142971[source]
PS: I ran it and, as suspected, gpt-3.5-turbo-instruct does not beat Stockfish; it is not even close.

"Final Results: gpt-3.5-turbo-instruct: Wins=0, Losses=6, Draws=0, Rating=1500.00 stockfish: Wins=6, Losses=0, Draws=0, Rating=1500.00"

https://www.loom.com/share/870ea03197b3471eaf7e26e9b17e1754?...
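A side note on the ratings in the quoted output: both stay at 1500.00 after a 6-0 result, which would not happen under a standard Elo update. Here is a minimal sketch of that update, assuming a starting rating of 1500 and K = 32. Both values are assumptions for illustration, and this is not the script that produced the output above.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float,
           k: float = 32.0) -> tuple:
    """Return both new ratings after one game.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for an A loss.
    k is the K-factor (32 is an assumed, commonly used value).
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Replay the quoted result: the LLM loses six games in a row.
llm, stockfish = 1500.0, 1500.0
for _ in range(6):
    llm, stockfish = update(llm, stockfish, 0.0)

if __name__ == "__main__":
    print(f"llm={llm:.2f} stockfish={stockfish:.2f}")
```

With these assumptions the two ratings move apart symmetrically after every game, so identical final ratings point to ratings that were never updated or were printed from their initial values.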

replies(3): >>42143260 #>>42143295 #>>42145596 #
janalsncm ◴[] No.42143260[source]
> I always had the LLM play as white against Stockfish—a standard chess AI—on the lowest difficulty setting

I think the author was comparing against Stockfish at a lower skill level (roughly, the number of nodes explored in a move).
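To make the distinction concrete, here is a small sketch of the UCI commands one might send to weaken Stockfish. As far as I know, `Skill Level` (0 to 20) and a per-move node limit are separate settings, so a replication would need to match both. The helper name and the example values are hypothetical and are not the original author's configuration.

```python
from typing import List


def uci_setup_commands(skill_level: int, nodes: int) -> List[str]:
    """Build the UCI command sequence for one weakened engine move.

    skill_level -- Stockfish's "Skill Level" option, 0 (weakest) to 20
    nodes       -- search budget passed to "go nodes"
    Both values are illustrative; the original experiment's settings
    are not stated in this thread.
    """
    if not 0 <= skill_level <= 20:
        raise ValueError("Skill Level must be between 0 and 20")
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return [
        "uci",
        f"setoption name Skill Level value {skill_level}",
        "isready",
        "position startpos",
        f"go nodes {nodes}",
    ]


if __name__ == "__main__":
    for line in uci_setup_commands(skill_level=0, nodes=1000):
        print(line)
```

Piping these lines to the engine's stdin (after waiting for `uciok` and `readyok`) is one way to check which settings a given test harness actually applies.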

replies(1): >>42143574 #
fsndz ◴[] No.42143574{3}[source]
Did the same and gpt-3.5-turbo-instruct still lost all the games. Maybe a difference in Stockfish version? I am using Stockfish 16.
replies(1): >>42149947 #
janalsncm ◴[] No.42149947{4}[source]
Huh. Honestly, your answer makes more sense: LLMs shouldn’t be good at chess, and this anomaly looks more like a bug. Maybe the author should share his code so it can be replicated.