
Zamba2-7B

(www.zyphra.com)
282 points by dataminer | 3 comments
1. nox101 No.41847916
What is magic about 7B? Why not 8B, 9B, or 11.234B? Is 7B some power of 2 reinterpreted?
replies(2): >>41848051, >>41848104
2. ikeashark No.41848051
I believe it comes from the original Llama papers, where they chose these sizes because each one (plus overhead) fits nicely on one of the standard ML GPUs.

Model size + overhead (context length, etc.):

7B: 13 GB - fits on a T4 (16 GB)
13B: 26 GB - fits on a V100 (32 GB)
30B: 65 GB - fits on an A100 (80 GB)
65B: 131 GB - fits on 2x A100 (160 GB)

That's it really.
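
As a rough sanity check (my own sketch, not from the parent comment): fp16 weights take 2 bytes per parameter, and the figures above line up roughly with that, before counting KV cache and activations.

    # Back-of-the-envelope VRAM estimate for fp16 inference.
    # Assumes 2 bytes per parameter for the weights alone; KV cache,
    # activations, and framework overhead come on top of this.
    def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
        # 1e9 params * bytes_per_param bytes / 1e9 bytes-per-GB = GB
        return params_billions * bytes_per_param

    for n in (7, 13, 30, 65):
        print(f"{n}B: ~{weight_memory_gb(n):.0f} GB of weights")
    # 7B: ~14 GB, 13B: ~26 GB, 30B: ~60 GB, 65B: ~130 GB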

3. calebkaiser No.41848104
The short answer is that there is nothing magic about these numbers. Having somewhat standard sizes in each range (7B for smaller models, for example) makes comparing different architectures and training techniques more straightforward. It's more of a priority for some teams than others.

That said, so-called "scaling laws" for language models are a super interesting field of research if you want to dig deeper. I'd recommend OpenAI's 2020 paper as a good starting point: https://openai.com/index/scaling-laws-for-neural-language-mo...
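
To make that concrete: the headline result of that paper is that test loss falls as a power law in (non-embedding) parameter count, roughly L(N) = (N_c / N)^alpha_N. Here's a minimal sketch of evaluating that fit, using the approximate constants reported in the paper (alpha_N ≈ 0.076, N_c ≈ 8.8e13):

    # Parameter-count scaling law from Kaplan et al. (2020):
    # L(N) = (N_c / N) ** alpha_N, with the paper's approximate fit constants.
    ALPHA_N = 0.076   # power-law exponent for parameter count
    N_C = 8.8e13      # critical parameter count (non-embedding)

    def predicted_loss(n_params: float) -> float:
        return (N_C / n_params) ** ALPHA_N

    for n in (7e9, 13e9, 30e9, 65e9):
        print(f"{n/1e9:.0f}B params -> predicted loss {predicted_loss(n):.2f} nats")

One takeaway from the exponent: doubling the parameter count only cuts loss by a factor of 2^0.076 ≈ 1.05, about 5%, which is part of why teams compare architectures at a fixed standard size rather than just scaling up.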