(github.com)

1311 points msoad | 1 comments | 31 Mar 23 20:37 UTC

1. dvt [01 Apr 23 00:20 UTC] No.35395606

This looks suspiciously like a bug, either in inference or in mmap reporting: these models are not sparse enough for savings of this size to have a plausible source.
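For readers unfamiliar with the mmap side of this hypothesis, here is a minimal sketch of why a mapped file's size and the amount of data a process has actually read can differ. It is not llama.cpp's code; the helper name `touch_fraction` and the sizes used are illustrative assumptions. It maps a temporary file and reads one byte from only a few pages. It does not measure resident memory, since how file-backed pages are counted varies by operating system and tool.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE  # page size reported by the OS


def touch_fraction(file_size, pages_to_touch):
    """Map a file of `file_size` bytes and read one byte from each of the
    first `pages_to_touch` pages.

    Returns (bytes_mapped, bytes_in_touched_pages). Hypothetical helper,
    for illustration only.
    """
    # Create a zero-filled temporary file of the requested size.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.truncate(file_size)
        path = f.name
    try:
        with open(path, "rb") as f:
            # Length 0 maps the whole file; mapping alone reads no data here.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                for i in range(pages_to_touch):
                    _ = m[i * PAGE]  # reading a byte touches that page
                return len(m), pages_to_touch * PAGE
    finally:
        os.remove(path)


mapped, touched = touch_fraction(64 * PAGE, 4)
print(f"mapped {mapped} bytes, touched pages covering {touched} bytes")
```

Whether this gap explains the reported figure is exactly what the comment questions: if inference reads every weight, every page is eventually touched, and only the accounting of those pages would differ.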

Llama.cpp 30B runs with only 6GB of RAM now