(dynomight.substack.com)

696 points crescit_eundo | 3 comments | 14 Nov 24 17:05 UTC

1. tqi ◴[14 Nov 24 23:47 UTC] No.42142548[source]▶

I assume LLMs will be fairly average at chess for the same reason they can't count the Rs in "strawberry": they reflect the training set rather than using any underlying logic? Granted, my understanding of LLMs is not very sophisticated, but I would be surprised if the reward models used were able to distinguish high-quality moves from subpar moves...

replies(1): >>42143439 #

2. ClassyJacket ◴[15 Nov 24 02:27 UTC] No.42143439[source]▶

>>42142548 (TP) #

LLMs can't count the Rs in strawberry because of tokenization. Text is split into tokens, and each token is converted to a number (a token ID, then a vector), so the actual transformer network never sees the individual letters that make up the word.

ChatGPT doesn't see "strawberry", it sees [302, 1618, 19772]
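
To illustrate the tokenization point, here is a minimal Python sketch. It assumes a toy three-entry vocabulary (`TOY_VOCAB`) and a greedy longest-match splitter (`tokenize`); both are hypothetical simplifications, not ChatGPT's actual tokenizer, and the pairing of the pieces "st", "raw", "berry" with the IDs quoted above is an assumption made for the example.

```python
# Hypothetical toy vocabulary: the piece-to-ID pairing is assumed for
# illustration; only the three IDs themselves come from the comment above.
TOY_VOCAB = {"st": 302, "raw": 1618, "berry": 19772}


def tokenize(word, vocab):
    """Split `word` into vocabulary pieces using greedy longest-match.

    Real tokenizers (e.g. byte-pair encoding) use learned merge rules;
    greedy matching is a simplification for this sketch.
    """
    ids = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shorter ones.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i:]!r}")
    return ids


# What the model receives: a list of integers, with no letters in it.
token_ids = tokenize("strawberry", TOY_VOCAB)

# What a character-level count needs: the raw string itself.
letter_count = "strawberry".count("r")

print(token_ids, letter_count)
```

In this sketch the letter count is only recoverable from the raw string; nothing in `token_ids` says how many Rs each piece contains, so a model would have to learn that association per token.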

replies(1): >>42147965 #

3. tqi ◴[15 Nov 24 15:51 UTC] No.42147965[source]▶

>>42143439 #

Hm, but if that is the case, then why did LLMs fail at the task only for a few word/letter combinations (like the Rs in "strawberry"), and not for all words?


Something weird is happening with LLMs and chess