(github.com)

1311 points msoad | 1 comments | 31 Mar 23 20:37 UTC

abhimskywalker ◴[01 Apr 23 04:16 UTC] No.35397063[source]▶

"The recent change also means you can run multiple LLaMA ./main processes at the same time, and they'll all share the same memory resources." So this could have a main and multiple sub-worker llm processes possibly collaborating while sharing same memory footprint?

replies(1): >>35398555 #

1. l33tman ◴[01 Apr 23 08:52 UTC] No.35398555[source]▶

>>35397063 #

Yes, if the model is mmap'ed read-only (as I'm sure it is).

There are bottlenecks other than CPU cores though, so it might not be very useful to run multiple in parallel.
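To make the read-only mmap point concrete, here is a minimal Python sketch. It is not llama.cpp's actual code; the fake "weights" file and the helper names (`make_fake_weights`, `read_from_children`, `write_is_rejected`) are made up for illustration. Several independent processes map the same file with a read-only mapping. The script only shows that each process sees the same bytes and that writes are rejected; the actual sharing of physical pages is done by the OS page cache and is not measured here.

```python
import mmap
import os
import subprocess
import sys
import tempfile

# Code run by each child process: map the file read-only and print 8 bytes.
CHILD = """
import mmap, sys
with open(sys.argv[1], "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        sys.stdout.write(m[:8].decode("ascii"))
"""


def make_fake_weights():
    """Create a small temporary file standing in for a model file (hypothetical)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"weights:" + bytes(4096))
    return path


def read_from_children(path, n_children=3):
    """Start n_children separate processes that each mmap the same file."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", CHILD, path],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(n_children)
    ]
    # All children are already running; collect what each one read.
    return [p.communicate()[0] for p in procs]


def write_is_rejected(path):
    """Return True if writing through a read-only mapping raises an error."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            try:
                m[0:1] = b"X"
            except TypeError:
                return True
    return False


if __name__ == "__main__":
    path = make_fake_weights()
    try:
        print(read_from_children(path))
        print(write_is_rejected(path))
    finally:
        os.remove(path)
```

Since no process can modify the mapping, the kernel has no reason to give each one a private copy of the pages, which is why extra processes should add little to the memory footprint. Whether running them in parallel is actually faster is a separate question, as noted above.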

Llama.cpp 30B runs with only 6GB of RAM now