What's magic about 7B? Why not 8B, 9B, or 11.234B? Is 7B some power of 2 reinterpreted?
replies(2):
Model size + overhead (context length, etc.), assuming fp16 weights at ~2 bytes per parameter:
7B: 13 GB - fits on T4 (16 GB).
13B: 26 GB - fits on V100 (32 GB).
30B: 65 GB - fits on A100 (80 GB).
65B: 131 GB - fits on 2x A100 (160 GB).
That's it really.
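
For concreteness, here's a back-of-the-envelope sketch in Python. The GPU sizes come from the list above; the 1.2x headroom factor for activations / KV cache is my own assumption, not a measured number:

    # fp16 weights take ~2 bytes per parameter; the 1.2x headroom
    # for activations / KV cache is an assumed fudge factor.
    GPUS = {"T4": 16, "V100": 32, "A100": 80, "2x A100": 160}  # GiB

    def fp16_gib(params_billion: float) -> float:
        """Weight footprint in GiB at 2 bytes per parameter."""
        return params_billion * 1e9 * 2 / 2**30

    for b in (7, 13, 30, 65):
        size = fp16_gib(b)
        fit = next((g for g, mem in GPUS.items() if size * 1.2 <= mem), "none")
        print(f"{b}B -> {size:.0f} GiB of weights, smallest fit: {fit}")

(The list above runs a bit higher than this sketch for the larger models because the actual checkpoints are bigger than their round names suggest; LLaMA's "30B" and "65B" are really 32.5B and 65.2B parameters.)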
However, so-called "scaling laws" for language models are a super interesting field of research if you want to dig deeper. I'd recommend OpenAI's 2020 paper ("Scaling Laws for Neural Language Models", Kaplan et al.) as a good start: https://openai.com/index/scaling-laws-for-neural-language-mo...
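
For flavor, the headline result there is a clean power law in parameter count. A quick sketch; the alpha_N and N_c constants are the paper's fitted values quoted from memory, so treat them as approximate:

    # Kaplan et al. (2020) parameter scaling law: L(N) ~ (N_c / N)^alpha_N.
    # alpha_N ~ 0.076 and N_c ~ 8.8e13 are the paper's fitted constants,
    # quoted from memory; treat them as approximate.
    ALPHA_N, N_C = 0.076, 8.8e13

    def predicted_loss(n_params: float) -> float:
        return (N_C / n_params) ** ALPHA_N

    for n in (7e9, 13e9, 30e9, 65e9):
        print(f"{n/1e9:.0f}B params -> predicted loss ~ {predicted_loss(n):.2f} nats")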