296 points | todsacerdoti | 1 comment
andy99 No.44368430
> inability to detect the number of r's in :strawberry: meme

Can someone (who knows about LLMs) explain why the r's in strawberry thing is related to tokenization? I have no reason to believe an LLM would be better at counting letters if each one was a token. It's not like they "see" any of it. Are they better at counting tokens than letters for some reason? Or is this just one of those things someone misinformed said to sound smart to even less informed people, and that then got picked up?

1. skerit No.44374874
LLMs aren't necessarily taught the characters their tokens represent. It's kind of like how some humans are able to speak a language but not write it. We are basically "transcribing" what LLMs are saying into text.
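
To make that concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the vocabulary, the IDs, and the greedy longest-match rule are assumptions, not how any particular model's tokenizer works. The point is only that the model receives the ID list, while the spelling lives in a lookup table outside of it.

```python
# Toy vocabulary: hypothetical pieces and IDs, not taken from any real tokenizer.
VOCAB = {
    "straw": 101, "berry": 102,
    "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8,
}
ID_TO_TEXT = {token_id: piece for piece, token_id in VOCAB.items()}


def tokenize(text):
    """Greedy longest-match tokenization over the toy vocabulary."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest remaining substring first, then shrink.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                pos = end
                break
        else:
            raise ValueError(f"no token covers {text[pos]!r}")
    return ids


def count_letter(ids, letter):
    """Count a letter by decoding each ID back to its spelling.

    This uses ID_TO_TEXT, i.e. exactly the spelling information
    that is not part of the ID sequence itself.
    """
    return sum(ID_TO_TEXT[token_id].count(letter) for token_id in ids)


ids = tokenize("strawberry")
print(ids)                     # [101, 102]
print(count_letter(ids, "r"))  # 3
```

Given only `[101, 102]`, there is nothing to count: the three r's show up only after looking each ID up in `ID_TO_TEXT`. A model would have to have learned that mapping indirectly from its training data, which fits the "can speak it but not write it" analogy.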