228 points | nkko | 1 comment
neuroelectron (No.43888491)
Love this. The way everything is mapped out and explained simply really opens up opportunities for trying new things, and shows where you can do that effectively.

For instance, why not use whole words as tokens? Make a "robot" with a limited "robot dialect." Yes, there's no capacity for new or rare words, but you could preprocess the training data and input data to translate those words into the existing vocabulary. Now you have a much smaller mapping that's literally robot-like, and it gives the user an expectation of what kinds of questions the robot can answer well, like C-3PO.
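
A minimal sketch of what such a closed "robot dialect" tokenizer could look like (the vocabulary and function names here are made up for illustration, not from any library):

```python
# Whole-word tokenizer over a tiny, closed "robot dialect" vocabulary.
# Anything outside the dialect collapses to a single <unk> token.

ROBOT_VOCAB = ["<unk>", "yes", "no", "i", "can", "cannot", "help", "you", "with", "that"]
TOKEN_ID = {word: i for i, word in enumerate(ROBOT_VOCAB)}

def encode(text: str) -> list[int]:
    # Lowercase and split on whitespace; out-of-dialect words map to <unk>.
    return [TOKEN_ID.get(w, TOKEN_ID["<unk>"]) for w in text.lower().split()]

def decode(ids: list[int]) -> str:
    return " ".join(ROBOT_VOCAB[i] for i in ids)

print(encode("Yes I can help you"))        # every word is in the dialect
print(decode(encode("Please assist me")))  # unknown words become <unk>
```

The preprocessing step the comment describes would sit in front of `encode`, rewriting rare words into in-dialect paraphrases before tokenization.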

replies(1): >>43888647 #
minimaxir (No.43888647)
> For instance, why not use whole words as tokens?

Word-only tokenizers are what people used in the RNN/LSTM days. There's no functional improvement over subword tokenization schemes like BPE, or even WordPiece/SentencePiece, and they result in worse quality since you can't use meaningful semantic hints such as punctuation.
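
To illustrate the contrast: a word-level tokenizer hits `<unk>` on anything outside its vocabulary, while a subword scheme (BPE-style) can always fall back to smaller pieces. Both vocabularies below are invented for the example, and the greedy longest-match is a simplification of how real BPE merges work:

```python
# Toy comparison: word-level tokenization vs. a subword fallback.
# Including single characters in the subword vocabulary guarantees
# every word can be segmented, so there is never an out-of-vocabulary case.

WORD_VOCAB = {"the", "robot", "works"}
SUBWORD_VOCAB = {"the", "rob", "ot", "work", "s", "ing"} | set("abcdefghijklmnopqrstuvwxyz")

def word_tokenize(text):
    return [w if w in WORD_VOCAB else "<unk>" for w in text.lower().split()]

def subword_tokenize(word):
    # Greedy longest-match from the left.
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in SUBWORD_VOCAB:
                pieces.append(word[i:j])
                i = j
                break
    return pieces

print(word_tokenize("the robot working"))   # "working" is lost to <unk>
print(subword_tokenize("working"))          # ['work', 'ing'], meaning preserved
```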

replies(1): >>43888862 #
neuroelectron (No.43888862)
You can encode semantic hints in the layers instead. Admittedly, this is more expensive, which kind of counters the words-as-tokens idea.