
Zamba2-7B

(www.zyphra.com)
282 points | dataminer | 1 comment
nox101 ◴[] No.41847916[source]
What is magic about 7B? Why not 8B, 9B, or 11.234B? Is 7B some power of 2 reinterpreted?
replies(2): >>41848051 #>>41848104 #
1. calebkaiser ◴[] No.41848104[source]
The short answer is that there is nothing magic about these numbers. Having somewhat standard sizes in each range (7B for smaller models, for example) makes comparing different architectures and training techniques more straightforward. It's more of a priority for some teams than for others.
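
For a rough sense of where "7B" comes from, here's a back-of-the-envelope parameter count for a plain decoder-only transformer. The hyperparameters are illustrative (roughly the shape of Llama-style 7B models, not Zamba2's actual config), and the formula ignores biases, norms, and architecture-specific details:

    # Rough parameter count for a plain decoder-only transformer.
    # Hyperparameters below are illustrative, not Zamba2's actual config.

    def transformer_params(n_layers, d_model, vocab_size, ffn_mult=4):
        # Attention: Q, K, V, O projections, ~4 * d_model^2 per layer.
        attn = 4 * d_model ** 2
        # Feed-forward: two matrices of size d_model x (ffn_mult * d_model).
        ffn = 2 * ffn_mult * d_model ** 2
        # Input embeddings plus an untied output head.
        embeddings = 2 * vocab_size * d_model
        return n_layers * (attn + ffn) + embeddings

    n = transformer_params(n_layers=32, d_model=4096, vocab_size=32000)
    print(f"~{n / 1e9:.1f}B parameters")  # ~6.7B, i.e. the familiar "7B" class

The headline number just falls out of a handful of depth/width choices, so labs round to whatever nearby size makes comparisons easy rather than targeting a power of two.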

However, so-called "scaling laws" for language models are a genuinely interesting field of research if you want to go deeper. I'd recommend OpenAI's 2020 paper as a good starting point: https://openai.com/index/scaling-laws-for-neural-language-mo...
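
To give a flavor of what's in that paper, here's a tiny sketch of its model-size power law: with data and compute not the bottleneck, test loss falls roughly as a power of parameter count N. The constants are the paper's approximate fits, quoted from memory, so treat them as ballpark:

    # Sketch of the model-size scaling law from Kaplan et al. (2020).
    # Constants are the paper's approximate fits (non-embedding parameters).
    ALPHA_N = 0.076   # power-law exponent for model size
    N_C = 8.8e13      # scale constant

    def loss_vs_params(n_params):
        """Approximate test loss (nats/token) as a function of parameter count."""
        return (N_C / n_params) ** ALPHA_N

    for n in (1e9, 7e9, 70e9):
        print(f"{n / 1e9:>4.0f}B params -> loss ~ {loss_vs_params(n):.2f}")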