What is magic about 7B? Why not 8B, 9B, or 11.234B? Is 7B some power of 2 reinterpreted?
replies(2):
However, so-called "scaling laws" for language models are a really interesting field of research, if you want to dig deeper. I'd recommend OpenAI's 2020 paper, "Scaling Laws for Neural Language Models", as a good start: https://openai.com/index/scaling-laws-for-neural-language-mo...