
Zamba2-7B (www.zyphra.com)
282 points | by dataminer | 1 comment
nox101 No.41847916
What is magic about 7B? Why not 8B, 9B, or 11.234B? Is 7B some power of 2 reinterpreted?
replies(2): >>41848051 >>41848104
1. ikeashark No.41848051
I believe it comes from the original LLaMA papers, where those sizes were chosen because each one fits nicely on one of the standard ML compute GPUs.

Model size + overhead (context length, KV cache, etc.); rough arithmetic in the sketch below:

7B: 13 GB - fits on a T4 (16 GB).
13B: 26 GB - fits on a V100 (32 GB).
30B: 65 GB - fits on an A100 (80 GB).
65B: 131 GB - fits on 2x A100 (160 GB).

That's it really.
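
To see where those GB figures come from: at fp16/bf16 each parameter takes about 2 bytes, and the LLaMA-1 checkpoints are nominally 6.7B, 13.0B, 32.5B, and 65.2B parameters. A minimal Python sketch of that arithmetic (the parameter counts and the 2 bytes/param assumption are mine, not from the comment above; KV cache and activation overhead come on top of the weights):

    # Back-of-the-envelope GPU memory estimate for fp16/bf16 weights.
    BYTES_PER_PARAM = 2  # assumption: fp16/bf16 storage

    # Nominal LLaMA-1 parameter counts for each advertised size.
    models = {
        "7B": 6.7e9,
        "13B": 13.0e9,
        "30B": 32.5e9,
        "65B": 65.2e9,
    }

    for name, params in models.items():
        weights_gb = params * BYTES_PER_PARAM / 1e9
        print(f"{name}: ~{weights_gb:.0f} GB of weights "
              f"(plus KV cache / activation overhead)")

That is also why the headroom matters: a 7B model at ~13 GB of weights leaves only a few GB on a 16 GB T4 for context and activations.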