(dynomight.substack.com)

696 points by crescit_eundo | 14 Nov 24 17:05 UTC

snickerbockers [15 Nov 24 08:31 UTC] No.42144943

Does it ever try an illegal move? OP didn't mention this, and I think it's inevitable that it happens at least once, since the rules of chess are fairly arbitrary and LLMs are notorious for bullshitting their way through difficult problems when we'd rather they just admit that they don't have the answer.

sethherr [15 Nov 24 08:42 UTC] No.42145004

>>42144943

Yes, he discusses using a grammar to restrict the model to legal moves only.
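One way a grammar can restrict output to legal moves is prefix-constrained decoding: at each step the model may only emit a character that continues some legal move. This is a simplified sketch under assumptions, not the article's setup: the "grammar" is just a hand-written list of legal moves for one position (in practice a chess library would supply it), decoding is greedy and character-level, and `score`, `END`, and `make_toy_scorer` are hypothetical stand-ins for a real model's token scores.

```python
from typing import Callable, Iterable

END = "<end>"  # hypothetical end-of-move symbol, not a real tokenizer token


def allowed_next(prefix: str, legal_moves: Iterable[str]) -> set[str]:
    """Symbols that keep `prefix` on the path to at least one legal move."""
    allowed: set[str] = set()
    for move in legal_moves:
        if move.startswith(prefix):
            if move == prefix:
                allowed.add(END)  # prefix is already a complete legal move
            else:
                allowed.add(move[len(prefix)])  # next character of that move
    return allowed


def constrained_decode(score: Callable[[str, str], float], legal_moves: list[str]) -> str:
    """Greedy decoding where only grammar-allowed symbols may be chosen.

    `score(prefix, symbol)` is a hypothetical stand-in for the model's
    preference. `legal_moves` must be non-empty.
    """
    prefix = ""
    while True:
        # Sorted so that ties are broken deterministically.
        options = sorted(allowed_next(prefix, legal_moves))
        best = max(options, key=lambda sym: score(prefix, sym))
        if best == END:
            return prefix
        prefix += best


def make_toy_scorer(wished: str) -> Callable[[str, str], float]:
    """Hypothetical 'model' that always wants to play `wished`, legal or not."""
    def score(prefix: str, sym: str) -> float:
        if len(prefix) < len(wished):
            return 1.0 if sym == wished[len(prefix)] else 0.0
        return 1.0 if sym == END else 0.0
    return score


# White's first move: the toy model "wants" Nf6, which is not legal here.
legal = ["e4", "d4", "Nf3", "Nc3"]
print(constrained_decode(make_toy_scorer("Nf6"), legal))  # Nf3
```

The toy model asks for Nf6, but the mask only ever offers characters that continue a legal move, so it ends up on Nf3. The output is guaranteed legal, which says nothing about whether it is a good move.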

yshui [15 Nov 24 20:46 UTC] No.42150800

>>42145004

I suspect the models memorized some chess openings, and after that they just play random moves, with the grammar keeping them legal.
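The suspicion here amounts to a simple baseline player: follow a memorized opening book while the game is still in it, then pick uniformly random legal moves. A minimal sketch, assuming the legal moves for each position are supplied by something else (a chess library, say); `choose_move`, the tiny book, and the move lists are hypothetical, and each list holds only a few of the legal moves in that position.

```python
import random
from typing import Mapping, Sequence


def choose_move(
    history: Sequence[str],
    legal_moves: Sequence[str],
    book: Mapping[tuple[str, ...], str],
    rng: random.Random,
) -> str:
    """Play the memorized book move while the game is still 'in book',
    otherwise fall back to a uniformly random legal move."""
    book_move = book.get(tuple(history))
    if book_move is not None and book_move in legal_moves:
        return book_move
    return rng.choice(list(legal_moves))


# Hypothetical book covering 1. e4 e5 2. Nf3
book = {(): "e4", ("e4", "e5"): "Nf3"}
rng = random.Random(0)
print(choose_move([], ["e4", "d4", "Nf3"], book, rng))            # e4 (in book)
print(choose_move(["e4", "e5"], ["Nf3", "Bc4", "d4"], book, rng))  # Nf3 (in book)
print(choose_move(["d4", "d5"], ["c4", "Nf3", "e3"], book, rng))   # out of book: random legal move
```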

gs17 [15 Nov 24 22:11 UTC] No.42151787

>>42150800

I suspect that as well. However, other people have noted that 3.5-turbo-instruct is much better at generating legal chess moves than the other models. https://github.com/adamkarvonen/chess_gpt_eval gave models "5 illegal moves before forced resignation of the round", and 3.5 had very few illegal moves, while GPT-4 lost most games due to illegal moves.
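A rule like "5 illegal moves before forced resignation" can be tracked with a small counter. This is a sketch under assumptions, not chess_gpt_eval's actual code: it assumes the limit counts illegal attempts across the whole game and that resignation triggers once the count reaches five. `IllegalMoveTracker` is a hypothetical name, and the legal-move set is a hand-picked subset for the starting position.

```python
from dataclasses import dataclass


@dataclass
class IllegalMoveTracker:
    """Counts illegal move attempts in one game (assumed per-game limit)."""
    max_illegal: int = 5
    illegal_count: int = 0

    def record(self, move: str, legal_moves: set[str]) -> bool:
        """Return True if `move` is legal; otherwise count it as illegal."""
        if move in legal_moves:
            return True
        self.illegal_count += 1
        return False

    @property
    def forced_resignation(self) -> bool:
        # Assumption: the model resigns once it has made `max_illegal` illegal moves.
        return self.illegal_count >= self.max_illegal


tracker = IllegalMoveTracker()
legal = {"e4", "d4", "Nf3"}
for attempt in ["Ke2", "Qh5", "e4"]:  # two illegal tries, then a legal one
    if tracker.record(attempt, legal):
        break
print(tracker.illegal_count, tracker.forced_resignation)  # 2 False
```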

Something weird is happening with LLMs and chess