
Zamba2-7B

(www.zyphra.com)
282 points by dataminer | 3 comments
1. nox101 No.41847916
What is magic about 7B? Why not 8B, 9B, or 11.234B? Is 7B some power of 2 reinterpreted?
replies(2): >>41848051, >>41848104
2. ikeashark No.41848051
I believe it comes from the original Llama papers, where they chose these sizes because each one (plus overhead) fits nicely on one of the standard ML GPUs.

Model size + overhead (context length, etc.):

7B: 13 GB - fits on a T4 (16 GB)
13B: 26 GB - fits on a V100 (32 GB)
30B: 65 GB - fits on an A100 (80 GB)
65B: 131 GB - fits on 2x A100 (160 GB)

That's it really.
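
As a rough sanity check (my own sketch, not from the parent comment): fp16 weights take 2 bytes per parameter, and the figures above line up roughly with that, before counting KV cache and activations.

    # Back-of-the-envelope VRAM estimate for fp16 inference.
    # Assumes 2 bytes per parameter for the weights alone; KV cache,
    # activations, and framework overhead come on top of this.
    def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
        # 1e9 params * bytes_per_param bytes / 1e9 bytes-per-GB = GB
        return params_billions * bytes_per_param

    for n in (7, 13, 30, 65):
        print(f"{n}B: ~{weight_memory_gb(n):.0f} GB of weights")
    # 7B: ~14 GB, 13B: ~26 GB, 30B: ~60 GB, 65B: ~130 GB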

3. calebkaiser No.41848104
The short answer is that there is nothing magic about these numbers. Having somewhat standard sizes in each range (7B for smaller models, for example) makes comparing different architectures and training techniques more straightforward. It's more of a priority for some teams than others.

That said, so-called "scaling laws" for language models are a super interesting field of research if you want to dig deeper. I'd recommend OpenAI's 2020 paper as a good starting point: https://openai.com/index/scaling-laws-for-neural-language-mo...
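
To make that concrete: the headline result of that paper is that test loss falls as a power law in (non-embedding) parameter count, roughly L(N) = (N_c / N)^alpha_N. Here's a minimal sketch of evaluating that fit, using the approximate constants reported in the paper (alpha_N ≈ 0.076, N_c ≈ 8.8e13):

    # Parameter-count scaling law from Kaplan et al. (2020):
    # L(N) = (N_c / N) ** alpha_N, with the paper's approximate fit constants.
    ALPHA_N = 0.076   # power-law exponent for parameter count
    N_C = 8.8e13      # critical parameter count (non-embedding)

    def predicted_loss(n_params: float) -> float:
        return (N_C / n_params) ** ALPHA_N

    for n in (7e9, 13e9, 30e9, 65e9):
        print(f"{n/1e9:.0f}B params -> predicted loss {predicted_loss(n):.2f} nats")

One takeaway from the exponent: doubling the parameter count only cuts loss by a factor of 2^0.076 ≈ 1.05, about 5%, which is part of why teams compare architectures at a fixed standard size rather than just scaling up.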