Llama.cpp 30B runs with only 6GB of RAM now (github.com)
1311 points by msoad | 1 comment | 31 Mar 23 20:37 UTC
cubefox | 31 Mar 23 21:35 UTC | No. 35393976
>>35393284 (OP)
I don't understand. I thought each parameter was 16 bits (two bytes), which would predict a minimum of 60GB of RAM for a 30-billion-parameter model. Not 6GB.
replies(2): >>35394470, >>35394590
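[ed: the arithmetic behind the 60GB figure in the comment above, as a quick sketch; it assumes fp16 weights only and ignores activation and KV-cache memory.]

```python
# Back-of-the-envelope memory estimate for a 30B-parameter model
# stored as 16-bit (fp16) weights, ignoring activations and KV cache.
params = 30e9          # 30 billion parameters
bytes_per_param = 2    # 16 bits = 2 bytes

total_gb = params * bytes_per_param / 1e9
print(f"{total_gb:.0f} GB")   # -> 60 GB
```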
gamegoblin | 31 Mar 23 22:28 UTC | No. 35394590
>>35393976
Parameters have been quantized down to 4 bits per parameter, and not all parameters are needed at the same time.
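[ed: a rough sketch of what 4-bit quantization does to the raw weight footprint. The block layout assumed below (32 weights per block, 4 bits each, one fp16 scale per block) is modeled on llama.cpp's Q4_0 format but should be treated as an assumption; the point is the roughly 4x reduction, and mmap-ing the file means only pages actually touched count against resident RAM.]

```python
# Rough weight-storage estimate for a 30B model under 4-bit blockwise
# quantization (assumed layout: 32 weights per block, 4 bits each,
# plus one fp16 scale per block -- roughly llama.cpp's Q4_0 scheme).
params = 30e9
block_size = 32
bits_per_weight = 4
scale_bytes_per_block = 2          # one fp16 scale per block

blocks = params / block_size
weight_bytes = params * bits_per_weight / 8
scale_bytes = blocks * scale_bytes_per_block

total_gb = (weight_bytes + scale_bytes) / 1e9
print(f"{total_gb:.1f} GB")        # -> ~16.9 GB on disk

# With the model file mmap-ed, pages are loaded lazily, so resident
# RAM can stay well below the full file size -- which is how the
# headline 6GB figure becomes plausible.
```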