228 points | nkko | 1 comment
neuroelectron (No.43888491)
Love this. The way everything is mapped out and explained simply really opens up opportunities for trying new things, and shows where you can do that effectively.

For instance, why not use whole words as tokens? Make a "robot" with a limited "robot dialect." Yes, there's no capacity for new or rare words, but you could preprocess the training data and input data to translate those words into the existing vocabulary. Now you have a much smaller mapping that's literally robot-like, and it gives the user an expectation of what kinds of questions the robot can answer well, like C-3PO.
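
A minimal sketch of what such a closed "robot dialect" tokenizer could look like (the vocabulary and function names here are made up for illustration, not from any library):

```python
# Whole-word tokenizer over a tiny, closed "robot dialect" vocabulary.
# Anything outside the dialect collapses to a single <unk> token.

ROBOT_VOCAB = ["<unk>", "yes", "no", "i", "can", "cannot", "help", "you", "with", "that"]
TOKEN_ID = {word: i for i, word in enumerate(ROBOT_VOCAB)}

def encode(text: str) -> list[int]:
    # Lowercase and split on whitespace; out-of-dialect words map to <unk>.
    return [TOKEN_ID.get(w, TOKEN_ID["<unk>"]) for w in text.lower().split()]

def decode(ids: list[int]) -> str:
    return " ".join(ROBOT_VOCAB[i] for i in ids)

print(encode("Yes I can help you"))        # every word is in the dialect
print(decode(encode("Please assist me")))  # unknown words become <unk>
```

The preprocessing step the comment describes would sit in front of `encode`, rewriting rare words into in-dialect paraphrases before tokenization.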

replies(1): >>43888647 #
minimaxir (No.43888647)
> For instance, why not use whole words as tokens?

Word-only tokenizers are what people used in the RNN/LSTM days. There's no functional improvement over subword tokenization schemes like BPE, or even WordPiece/SentencePiece, and they result in worse quality since you can't use meaningful semantic hints such as punctuation.
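
To illustrate the contrast: a word-level tokenizer hits `<unk>` on anything outside its vocabulary, while a subword scheme (BPE-style) can always fall back to smaller pieces. Both vocabularies below are invented for the example, and the greedy longest-match is a simplification of how real BPE merges work:

```python
# Toy comparison: word-level tokenization vs. a subword fallback.
# Including single characters in the subword vocabulary guarantees
# every word can be segmented, so there is never an out-of-vocabulary case.

WORD_VOCAB = {"the", "robot", "works"}
SUBWORD_VOCAB = {"the", "rob", "ot", "work", "s", "ing"} | set("abcdefghijklmnopqrstuvwxyz")

def word_tokenize(text):
    return [w if w in WORD_VOCAB else "<unk>" for w in text.lower().split()]

def subword_tokenize(word):
    # Greedy longest-match from the left.
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in SUBWORD_VOCAB:
                pieces.append(word[i:j])
                i = j
                break
    return pieces

print(word_tokenize("the robot working"))   # "working" is lost to <unk>
print(subword_tokenize("working"))          # ['work', 'ing'], meaning preserved
```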

replies(1): >>43888862 #
neuroelectron (No.43888862)
You can encode semantic hints in the layers instead. Admittedly, this is more expensive, which kind of counters the words-as-tokens idea.