Llama.cpp 30B runs with only 6GB of RAM now (github.com)
1311 points by msoad | 2 comments | 31 Mar 23 20:37 UTC
1. dislikedtom2 | 31 Mar 23 23:32 UTC | No. 35395213
>>35393284 (OP)
1. Turn swap off, or monitor it closely.
2. Try to load a big model, like 65b-q4 or 30b-f16.
3. Observe the OOM.
It's not so hard to test this.
replies(1): >>35400212
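A minimal sketch of the kind of test described above, assuming Linux; the model filename is a placeholder and this is not llama.cpp code. It maps the file read-only, touches every page, and prints VmRSS from /proc/self/status so you can watch how much of the file actually stays resident.

    /* Sketch: map a large file read-only, touch every page, report RSS.
     * Linux-specific; the default path below is a placeholder. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static long vm_rss_kb(void) {
        /* Parse VmRSS from /proc/self/status (Linux-specific). */
        FILE *f = fopen("/proc/self/status", "r");
        char line[256];
        long kb = -1;
        if (!f) return -1;
        while (fgets(line, sizeof line, f))
            if (sscanf(line, "VmRSS: %ld kB", &kb) == 1) break;
        fclose(f);
        return kb;
    }

    int main(int argc, char **argv) {
        const char *path = argc > 1 ? argv[1] : "ggml-model-q4_0.bin"; /* placeholder */
        int fd = open(path, O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) != 0) { perror(path); return 1; }

        /* Read-only, private mapping: pages are backed by the file itself. */
        unsigned char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        long page = sysconf(_SC_PAGESIZE);
        volatile unsigned char sum = 0;
        for (off_t off = 0; off < st.st_size; off += page) {
            sum ^= p[off];                           /* fault the page in */
            if ((off / page) % 100000 == 0)
                printf("touched %lld MiB, VmRSS %ld kB\n",
                       (long long)(off >> 20), vm_rss_kb());
        }

        printf("done: file %lld MiB, final VmRSS %ld kB (checksum %u)\n",
               (long long)(st.st_size >> 20), vm_rss_kb(), (unsigned)sum);
        munmap(p, st.st_size);
        close(fd);
        return 0;
    }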
2. gg82 | 01 Apr 23 13:37 UTC | No. 35400212
>>35395213
Using a memory-mapped file doesn't use swap. The memory is backed by the file that is memory mapped!
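A small illustration of that point, assuming Linux and glibc; the filename is a placeholder and this is not llama.cpp code. Pages of a read-only, file-backed mapping are clean, so the kernel can simply discard them under memory pressure and re-read them from the file later, never writing them to swap. The sketch maps a file, counts resident pages with mincore(), discards them with madvise(MADV_DONTNEED), and counts again.

    /* Sketch: show that pages of a read-only file mapping are file-backed.
     * After madvise(MADV_DONTNEED) the clean pages are simply dropped and
     * re-read from the file on the next access -- no swap involved.
     * Linux-specific; the default path below is a placeholder. */
    #define _DEFAULT_SOURCE 1
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static size_t resident_pages(void *addr, size_t len, long page) {
        size_t npages = (len + page - 1) / page;
        unsigned char *vec = malloc(npages);
        size_t resident = 0;
        if (!vec || mincore(addr, len, vec) != 0) { free(vec); return 0; }
        for (size_t i = 0; i < npages; i++)
            if (vec[i] & 1) resident++;       /* low bit = page is resident */
        free(vec);
        return resident;
    }

    int main(int argc, char **argv) {
        const char *path = argc > 1 ? argv[1] : "ggml-model-q4_0.bin"; /* placeholder */
        int fd = open(path, O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) != 0) { perror(path); return 1; }

        long page = sysconf(_SC_PAGESIZE);
        unsigned char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* Fault every page in by reading it. */
        volatile unsigned char sink = 0;
        for (off_t off = 0; off < st.st_size; off += page) sink ^= p[off];
        printf("after touching: %zu pages resident\n",
               resident_pages(p, st.st_size, page));

        /* Discard the clean, file-backed pages; the kernel writes nothing
         * to swap because the file itself is the backing store. */
        madvise(p, st.st_size, MADV_DONTNEED);
        printf("after MADV_DONTNEED: %zu pages resident\n",
               resident_pages(p, st.st_size, page));

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }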