
1311 points msoad | 6 comments
jart ◴[] No.35393615[source]
Author here. For additional context, please read https://github.com/ggerganov/llama.cpp/discussions/638#discu... The loading time performance has been a huge win for usability, and folks have been having the most wonderful reactions after using this change. But we don't have a compelling enough theory yet to explain the RAM usage miracle. So please don't get too excited just yet! Yes, things are getting more awesome, but like all things in science, a small amount of healthy skepticism is warranted.
1. htrp ◴[] No.35394244[source]
Just shows how inefficient some of the ML research code can be
2. robrenaud ◴[] No.35394895[source]
Training tends to require a lot more precision and hence memory than inference. I bet many of the tricks here won't work well for training.
3. actually_a_dog ◴[] No.35394991[source]
As a former grad student, I can tell you that describes all research code, not just ML code, and not even just the "performance-oriented" kind.
4. rvz ◴[] No.35395797[source]
Exactly.

It also shows how many impostors are in this thread, and how inflated the titles of self-proclaimed 'seniors' are who can't optimize ML code well enough to be in the same league as Tunney (jart) and Gerganov (ggerganov).

Not even ChatGPT or Copilot could submit a change like this, let alone completely rewrite and optimize the code the way they have.

5. alduin32 ◴[] No.35396415[source]
For now we've just shown how measuring memory consumption can be tricky at times.
6. visarga ◴[] No.35397055[source]
Remember this moment when you're about to criticise LLMs. People can act suboptimally too, even experts.