
345 points kashifr | 1 comment
tiahura ◴[] No.44501872[source]
Can anyone estimate how much of the 3B is necessitated by multi-language support?
replies(3): >>44502099 #>>44509476 #>>44509763 #
1. ethan_smith ◴[] No.44509476[source]
Typically, multilingual capabilities consume 20-30% of model parameters in small LLMs, primarily in token embeddings and early transformer layers. Monolingual variants of similar models often perform better on English benchmarks with the same parameter count.