
216 points veggieroll | 1 comment
barbegal ◴[] No.41860730[source]
Does anyone know why Mistral uses a 17-bit (131k) vocabulary? I'm sure it's more efficient at encoding text, but a token ID doesn't fit into a 16-bit register, which must make it less efficient computationally?
replies(1): >>41865000 #
1. cpldcpu ◴[] No.41865000[source]
The tokens are immediately transformed into embeddings (very large vectors), so the 17-bit values are only used as indices into the embedding table, not in any arithmetic.
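
To make that concrete, here is a rough sketch in plain Python. It is not Mistral's code: `EMBED_DIM`, `make_row`, and `embed` are made-up names for illustration, and the "learned" rows are just seeded random numbers standing in for a real embedding table.

```python
import random

VOCAB_SIZE = 2 ** 17   # 131,072 entries, the "131k" vocabulary from the thread
EMBED_DIM = 8          # toy width (assumption); real models use thousands


def bits_needed(vocab_size: int) -> int:
    """Bits required to represent every ID in 0 .. vocab_size - 1."""
    return (vocab_size - 1).bit_length()


def make_row(token_id: int, dim: int = EMBED_DIM) -> list[float]:
    """Hypothetical stand-in for one learned row of the embedding table."""
    rng = random.Random(token_id)  # seeded so the same ID gives the same row
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def embed(token_ids: list[int]) -> list[list[float]]:
    """Return one vector per token; the ID is only ever used as an index."""
    for t in token_ids:
        if not 0 <= t < VOCAB_SIZE:
            raise IndexError(f"token id {t} is outside the vocabulary")
    return [make_row(t) for t in token_ids]


if __name__ == "__main__":
    print(bits_needed(VOCAB_SIZE))      # width of the largest token ID
    vectors = embed([0, 42, VOCAB_SIZE - 1])
    print(len(vectors), len(vectors[0]))
```

The integer ID is consumed by a single lookup per token, and everything downstream works on the float vectors, so the width of the ID mostly affects how token sequences are stored rather than the model's arithmetic.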