343 points kashifr | 4 comments
1. tiahura No.44501872
Can anyone estimate how much of the 3B is necessitated by multi-language support?
2. rockinghigh No.44502099
The vocabulary size is fairly small (128,256) for a multilingual model. I would guess it doesn't require many additional parameters to support these 5 languages as many tokens can be shared.
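A quick back-of-envelope on that point. Token-embedding parameters scale as vocab_size × hidden_dim; assuming a hidden size of 2048 (my assumption, typical for a ~3B model and not stated in the thread), the shared embedding table is only a small slice of the total:

```python
# Back-of-envelope sketch (assumed numbers, not from the thread):
# token-embedding parameters = vocab_size * hidden_dim.
vocab_size = 128_256
hidden_dim = 2048        # assumed hidden size for a ~3B model
total_params = 3e9

embedding_params = vocab_size * hidden_dim
print(f"embedding params: {embedding_params / 1e6:.0f}M")        # ~263M
print(f"share of 3B:      {embedding_params / total_params:.1%}")  # ~8.8%
# Note: with untied input/output embeddings this roughly doubles,
# but it is still well under the 20-30% figure claimed below.
```

So even if the entire embedding table were attributed to multilingual support, it would account for under 10% of the parameters under these assumptions.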
3. ethan_smith No.44509476
Typically, multilingual capabilities consume 20-30% of model parameters in small LLMs, primarily in token embeddings and early transformer layers. Monolingual variants of similar models often perform better on English benchmarks with the same parameter count.
4. netdur No.44509763
Naive take: multilingual support is about 2/3 of the model, so without it this should be around 1B.