Did that metric meaningfully change when the amount of required memory dropped?
If the diversity is lowered, I would expect that to reduce the number of patterns to be modeled from the text. If that is the case, then the resulting model size itself would shrink, both during and after training.
H_s := -\sum_{x \in X_s} p(x) \log p(x)
where X_s := all s-grams from the training set? That seems like it would eventually become hard or even impossible to actually compute. And even if you could, what would it tell you? Or, wait... are you referring to running such an analysis on the output of the model? Yeah, that might prove interesting...
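For small s it is actually tractable, at least on a sample of the text. Here is a minimal sketch of what computing H_s over character s-grams could look like; the helper name `sgram_entropy` is my own, not anything from the thread:

```python
from collections import Counter
from math import log2

def sgram_entropy(text: str, s: int) -> float:
    """Shannon entropy (in bits) of the empirical s-gram distribution of `text`."""
    # Slide a window of width s over the text to collect all s-grams.
    grams = [text[i:i + s] for i in range(len(text) - s + 1)]
    counts = Counter(grams)
    total = len(grams)
    # H_s = -sum over distinct s-grams x of p(x) * log2 p(x)
    return -sum((c / total) * log2(c / total) for c in counts.values())

print(sgram_entropy("the cat sat on the mat", 2))
```

The blow-up you are pointing at is real, though: the number of distinct s-grams can grow exponentially with s, so for large s the counts become too sparse to estimate p(x) reliably.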
Because the text we write is not uniformly distributed random noise, what we encode into it (by writing) carries entropy.
Because LLMs model text via inference, they end up modeling all of the entropy that is present in it.
That would mean the resulting size is a measure of entropy (the sum of patterns) divided by repetition (recurring patterns). In this count I would treat each unique token on its own as an instance of the identity pattern.
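Compression makes that intuition concrete, as a loose analogy: a compressor's output size also shrinks with repetition and grows with pattern diversity. A quick sketch with zlib (my own illustration, not something from the thread):

```python
import os
import zlib

# Same input length, very different pattern diversity.
repetitive = b"the cat sat on the mat. " * 100  # many recurring patterns
diverse = os.urandom(2400)                      # near-maximal entropy, few patterns

# The repetitive text compresses far smaller than the random bytes.
print(len(zlib.compress(repetitive)), len(zlib.compress(diverse)))
```

If model size behaves at all like compressed size, lowering the diversity of the training text should shrink it, which is the expectation above.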
So to answer both questions: yes.