296 points by todsacerdoti
andy99 No.44368430
> inability to detect the number of r's in :strawberry: meme

Can someone (who knows about LLMs) explain why the r's in strawberry thing is related to tokenization? I have no reason to believe an LLM would be better at counting letters if each was one token. It's not like they "see" any of it. Are they better at counting tokens than letters for some reason? Or is this just one of those things someone misinformed said to sound smart to even less informed people, that got picked up?

ijk No.44368463
Well, which is easier:

Count the number of Rs in this sequence: [496, 675, 15717]

Count the number of 18s in this sequence: 19 20 18 1 23 2 5 18 18 25
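The contrast can be sketched in Python. This assumes a hypothetical toy vocabulary in which the pieces "str", "aw" and "berry" map to the IDs in the first sequence; real tokenizers learn their vocabularies with BPE-style merges, so the piece boundaries and IDs here are illustrative only. The second sequence is each letter's position in the alphabet (a=1 ... z=26).

```python
import string

def letter_ids(word: str) -> list[int]:
    # Encode each letter as its 1-based position in the alphabet (a=1 ... z=26).
    return [string.ascii_lowercase.index(c) + 1 for c in word.lower()]

# Hypothetical subword vocabulary: pieces and IDs are illustrative,
# chosen to reproduce the first sequence in the comment above.
TOY_VOCAB = {"str": 496, "aw": 675, "berry": 15717}

def toy_tokenize(word: str) -> list[int]:
    # Greedy longest-match over TOY_VOCAB (a simplification of real BPE).
    ids, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"no token for {word[i:]!r}")
    return ids

word = "strawberry"
chars = letter_ids(word)
print(chars)                # [19, 20, 18, 1, 23, 2, 5, 18, 18, 25]
print(chars.count(18))      # 3
print(toy_tokenize(word))   # [496, 675, 15717]
```

With letter IDs, counting the r's is just counting 18s. With token IDs, nothing in 496, 675 or 15717 says how many r's each piece contains; that has to be inferred from each token's spelling, which the model only ever receives as an opaque ID.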

ASalazarMX No.44368554
For an LLM? No idea.

H: Which is the easier of these formulas?

1. x = SQRT(4)

2. x = SQRT(123567889.987654321)

Computer: They're both the same.

drdeca No.44368891
Depending on the data types and what the hardware supports, the latter may be harder (in the sense of requiring more operations)? And for a general algorithm, bigger numbers would take more steps.
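To make the "more steps" point concrete, here is a sketch using Newton's method for integer square roots, a swapped-in illustration rather than anything the parent comments specified. It works on the integer part of the second number only (123567889), and the step counter is an illustrative addition. By contrast, a double-precision sqrt is typically a single hardware instruction whatever the magnitude (though 123567889.987654321 has more significant digits than a 64-bit float can hold, so it would be rounded first).

```python
import math

def isqrt_newton(n: int) -> tuple[int, int]:
    """Return (floor(sqrt(n)), number of Newton iterations used)."""
    if n < 0:
        raise ValueError("square root of a negative number")
    if n < 2:
        return n, 0
    # Start from a power of two that is guaranteed to be >= sqrt(n).
    x = 1 << ((n.bit_length() + 1) // 2)
    steps = 0
    while True:
        # Newton update for f(x) = x*x - n, using integer division.
        y = (x + n // x) // 2
        steps += 1
        if y >= x:  # no further decrease: x is floor(sqrt(n))
            return x, steps
        x = y

print(isqrt_newton(4))          # (2, 2)
print(isqrt_newton(123567889))  # (11116, 4)
# Sanity check against the standard library's exact integer square root.
print(isqrt_newton(123567889)[0] == math.isqrt(123567889))  # True
```

The iteration count grows slowly with the size of the input, and once numbers outgrow a machine word each iteration also costs more, which is one way to read "bigger numbers would take more steps".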