slacker news
Llama.cpp 30B runs with only 6GB of RAM now (github.com)
1311 points | msoad | 1 comment | 31 Mar 23 20:37 UTC
1. singularity2001 [31 Mar 23 22:21 UTC] No. 35394515
>>35393284 (OP)
Does that only happen with the quantized model or also with the float16 / float32 model? Is there any reason to use float models at all?
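For scale, a rough sketch of the weight-storage arithmetic behind the quantized-vs-float question (the 30B parameter count matches the title; the ~4.5 effective bits per weight for 4-bit quantization is an assumption for illustration, not a figure from the thread):

```python
# Back-of-the-envelope weight footprints for a 30B-parameter model at
# different precisions. Ignores KV cache, activations, and runtime overhead.
PARAMS = 30e9  # parameter count taken from the thread title

def footprint_gib(bits_per_param: float) -> float:
    """Approximate weight storage in GiB at the given bits per parameter."""
    return PARAMS * bits_per_param / 8 / 2**30

# ~4.5 effective bits is an assumed figure for a 4-bit quantization scheme
# (scale factors add overhead beyond the nominal 4 bits).
for label, bits in [("float32", 32), ("float16", 16), ("4-bit quantized", 4.5)]:
    print(f"{label:>16}: {footprint_gib(bits):6.1f} GiB")
```

Under these assumptions, float32 weights alone are roughly 112 GiB and float16 roughly 56 GiB, while the quantized weights are around 16 GiB, which is why the 6 GB figure in the title implies that only part of the model needs to be resident at once.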