
345 points kashifr | 1 comment
tiahura ◴[] No.44501872[source]
Can anyone estimate how much of the 3B is necessitated by multi-language support?
replies(3): >>44502099 #>>44509476 #>>44509763 #
1. ethan_smith ◴[] No.44509476[source]
Typically, multilingual capabilities consume 20-30% of model parameters in small LLMs, primarily in token embeddings and early transformer layers. Monolingual variants of similar models often perform better on English benchmarks with the same parameter count.