(dynomight.substack.com)

696 points crescit_eundo | 1 comments | 14 Nov 24 17:05 UTC

fsndz ◴[15 Nov 24 00:46 UTC] No.42142922[source]▶

Wow, I actually did something similar recently: no LLM could win, and the centipawn loss was always going through the roof (sort of). I created a leaderboard based on it: https://www.lycee.ai/blog/what-happens-when-llms-play-chess

I am very surprised by the performance of gpt-3.5-turbo-instruct. Beating Stockfish? I will have to run the experiment with that model to check that out.

replies(1): >>42142971 #

fsndz ◴[15 Nov 24 00:56 UTC] No.42142971[source]▶

PS: I ran it and, as suspected, gpt-3.5-turbo-instruct does not beat Stockfish; it is not even close.

"Final Results: gpt-3.5-turbo-instruct: Wins=0, Losses=6, Draws=0, Rating=1500.00 stockfish: Wins=6, Losses=0, Draws=0, Rating=1500.00"

1. ◴[15 Nov 24 02:00 UTC] No.42143295[source]▶

Something weird is happening with LLMs and chess