
Zamba2-7B

(www.zyphra.com)
282 points | dataminer | 1 comment
nox101 ◴[] No.41847916[source]
What is magic about 7B? Why not 8B, 9B, or 11.234B? Is 7B some power of 2 reinterpreted?
replies(2): >>41848051 #>>41848104 #
1. calebkaiser ◴[] No.41848104[source]
The short answer is that there is nothing magic about these numbers. Having somewhat standard sizes in each range (7B for smaller models, for example) makes comparing different architectures and training techniques more straightforward. It's more of a priority for some teams than for others.
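
For a rough sense of where "7B" comes from, here's a back-of-the-envelope parameter count for a plain decoder-only transformer. The hyperparameters are illustrative (roughly the shape of Llama-style 7B models, not Zamba2's actual config), and the formula ignores biases, norms, and architecture-specific details:

    # Rough parameter count for a plain decoder-only transformer.
    # Hyperparameters below are illustrative, not Zamba2's actual config.

    def transformer_params(n_layers, d_model, vocab_size, ffn_mult=4):
        # Attention: Q, K, V, O projections, ~4 * d_model^2 per layer.
        attn = 4 * d_model ** 2
        # Feed-forward: two matrices of size d_model x (ffn_mult * d_model).
        ffn = 2 * ffn_mult * d_model ** 2
        # Input embeddings plus an untied output head.
        embeddings = 2 * vocab_size * d_model
        return n_layers * (attn + ffn) + embeddings

    n = transformer_params(n_layers=32, d_model=4096, vocab_size=32000)
    print(f"~{n / 1e9:.1f}B parameters")  # ~6.7B, i.e. the familiar "7B" class

The headline number just falls out of a handful of depth/width choices, so labs round to whatever nearby size makes comparisons easy rather than targeting a power of two.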

However, so-called "scaling laws" for language models are a genuinely interesting field of research if you want to go deeper. I'd recommend OpenAI's 2020 paper as a good starting point: https://openai.com/index/scaling-laws-for-neural-language-mo...
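
To give a flavor of what's in that paper, here's a tiny sketch of its model-size power law: with data and compute not the bottleneck, test loss falls roughly as a power of parameter count N. The constants are the paper's approximate fits, quoted from memory, so treat them as ballpark:

    # Sketch of the model-size scaling law from Kaplan et al. (2020).
    # Constants are the paper's approximate fits (non-embedding parameters).
    ALPHA_N = 0.076   # power-law exponent for model size
    N_C = 8.8e13      # scale constant

    def loss_vs_params(n_params):
        """Approximate test loss (nats/token) as a function of parameter count."""
        return (N_C / n_params) ** ALPHA_N

    for n in (1e9, 7e9, 70e9):
        print(f"{n / 1e9:>4.0f}B params -> loss ~ {loss_vs_params(n):.2f}")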