Llama.cpp 30B runs with only 6GB of RAM now

(github.com)

1311 points msoad | 4 comments | 31 Mar 23 20:37 UTC | HN request time: 0.001s | source

Show context

jart ◴[31 Mar 23 21:02 UTC] No.35393615[source]▶

Author here. For additional context, please read https://github.com/ggerganov/llama.cpp/discussions/638#discu... The loading time performance has been a huge win for usability, and folks have been having the most wonderful reactions after using this change. But we don't have a compelling enough theory yet to explain the RAM usage miracle. So please don't get too excited just yet! Yes things are getting more awesome, but like all things in science a small amount of healthy skepticism is warranted.

replies(24): >>35393868 #>>35393942 #>>35394089 #>>35394097 #>>35394107 #>>35394203 #>>35394208 #>>35394244 #>>35394259 #>>35394288 #>>35394408 #>>35394881 #>>35395091 #>>35395249 #>>35395858 #>>35395995 #>>35397318 #>>35397499 #>>35398037 #>>35398083 #>>35398427 #>>35402974 #>>35403334 #>>35468946 #

intelVISA ◴[31 Mar 23 22:03 UTC] No.35394288[source]▶

>>35393615 #

Didn't expect to see two titans today: ggerganov AND jart. Can ya'll slow down you make us mortals look bad :')

Seeing such clever use of mmap makes me dread to imagine how much Python spaghetti probably tanks OpenAI's and other "big ML" shops' infra when they should've trusted in zero copy solutions.

Perhaps SWE is dead after all, but LLMs didn't kill it...

replies(11): >>35395112 #>>35395145 #>>35395165 #>>35395404 #>>35396298 #>>35397484 #>>35398972 #>>35399367 #>>35400001 #>>35400090 #>>35456064 #

MontyCarloHall ◴[31 Mar 23 23:24 UTC] No.35395145[source]▶

>>35394288 #

>how much Python spaghetti probably tanks OpenAI's and other "big ML" shops' infra when they should've trusted in zero copy solutions

Probably not all that much. All of the Python numeric computing frameworks (Numpy, PyTorch, TensorFlow, etc.) are basically just wrappers for lower level C++/C/Fortran code. Unless you’re doing something boneheaded and converting framework-native tensors to Python objects, passing tensors around within a framework essentially just passes a pointer around, which has marginal overhead even when encapsulated in a bloated Python object.

Indeed, a huge number of PyTorch operations are explicitly zero copy: https://pytorch.org/docs/stable/tensor_view.html

replies(2): >>35396982 #>>35408170 #

oceanplexian ◴[01 Apr 23 04:02 UTC] No.35396982[source]▶

>>35395145 #

It’s not that the performance is the issue, it’s that it’s unmaintainable and prone to break. Exceptions aren’t handled right, dependencies are a disaster (Proprietary NVIDIA drivers+CUDA+PyTorch+ the various versions of stuff are a complete disaster)

This leads to all sorts of bugs and breaking changes that are cool in an academic or hobbyist setting but a total headache on a large production system.

replies(3): >>35397515 #>>35397551 #>>35398182 #

CurrentB ◴[01 Apr 23 05:59 UTC] No.35397551[source]▶

>>35396982 #

Yeah, I've been using python for the first time in a while to try out some of the llm stuff and I can't believe how bad the dependency hell is. It's probably particularly bad due to the pace of change in this field. But I spend an hour getting dependencies fixed every time I touch anything. 80% of the Google Collabs I find are just outright broken. I wish there were other viable non python options to try out these things.

replies(3): >>35398107 #>>35399048 #>>35424595 #

1. hunta2097 ◴[01 Apr 23 10:25 UTC] No.35399048{3}[source]▶

>>35397551 #

You're using virtual environments, right?

ML libraries are particularly bad, most other stuff works well.

Friends don't let friends install pip into /usr/lib.

replies(1): >>35406036 #

2. AnthonyMouse ◴[02 Apr 23 01:12 UTC] No.35406036[source]▶

>>35399048 (TP) #

This just goes to show what a mess this is.

Suppose you have a big piece of compute hardware (e.g. at a university) which is shared by multiple users. They all want to come in and play with these models. Each one is tens to hundreds of gigabytes. Is each user supposed to have their own copy in their home directory?

replies(1): >>35406562 #

3. Accujack ◴[02 Apr 23 02:35 UTC] No.35406562[source]▶

>>35406036 #

This is not exactly a new problem.

replies(1): >>35406640 #

4. AnthonyMouse ◴[02 Apr 23 02:46 UTC] No.35406640{3}[source]▶

>>35406562 #

That's kind of the point. We solved this problem decades ago. You have a system package manager that installs a system-wide copy of the package that everybody can use.

But now we encounter this broken nonsense because solved problems get unsolved by bad software.

↑