407 points todsacerdoti | 3 comments
arjvik ◴[] No.45008810[source]
Took the article pointing out that the c and r were transposed for me to even notice there was a problem!
replies(2): >>45008822 #>>45008942 #
SoftTalker ◴[] No.45008822[source]
Yep, this is the sort of typo I make probably 10 times a day.
replies(1): >>45009673 #
1. javchz ◴[] No.45009673[source]
What's funny is that, because of tokenization, there is a non-zero chance an LLM audit would not see anything wrong here, similar to the strawberry problem.
replies(1): >>45012567 #
2. TobTobXX ◴[] No.45012567[source]
Nah, cr and rc are different tokens and LLMs would have no issues telling them apart. An older model might have trouble explaining that cr and rc are similar and can thus get easily mixed up, but the characters are probably more different to the LLM than they are to us.
replies(1): >>45013820 #
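For anyone who wants to check this kind of slip mechanically rather than by eye (or by LLM), here is a minimal sketch in Python. The function name `is_adjacent_transposition` is made up for illustration, and the only inputs used are the `cr` / `rc` pair mentioned above. It compares strings character by character, so it says nothing about how any particular tokenizer splits them.

```python
def is_adjacent_transposition(a: str, b: str) -> bool:
    """Return True if b is a with exactly one pair of adjacent characters swapped."""
    # Different lengths or identical strings cannot be a single swap.
    if len(a) != len(b) or a == b:
        return False
    # Collect the positions where the two strings disagree.
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    # A single adjacent swap yields exactly two neighbouring mismatches
    # whose characters are mirrored between the two strings.
    return (
        len(diff) == 2
        and diff[1] == diff[0] + 1
        and a[diff[0]] == b[diff[1]]
        and a[diff[1]] == b[diff[0]]
    )


if __name__ == "__main__":
    print(is_adjacent_transposition("cr", "rc"))  # True
    print(is_adjacent_transposition("cr", "cr"))  # False
```

A check like this only flags that two strings differ by one swap; deciding which of the two is the intended one still needs a trusted reference list.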
3. TehCorwiz ◴[] No.45013820[source]
What about all that GitHub training data using the wrong domain? Even if it's a different token, the model is still being trained to treat it as a correct value.