detrites No.35393558
The pace of collaborative OSS development on these projects is amazing, but the rate of optimisation being achieved is almost unbelievable. What has everyone been doing wrong all these years? *cough* Sorry, I mean to say, weeks.

Ok I answered my own question.

replies(5): >>35393627 #>>35393885 #>>35393921 #>>35394786 #>>35397029 #
kmeisthax No.35393921
>What has everyone been doing wrong all these years

So it's important to note that all of these improvements are the kinds of things that are cheap to run on a pretrained model. The recent developments in large language models, by contrast, have been the product of hundreds of thousands of dollars in rented compute time. Once you put six digits into a pile of model weights, it becomes a capital cost that the business either needs to recoup or turn into a competitive advantage. So nobody who scales up to that point releases their model weights.
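
To put a rough number on "six digits", here's a back-of-envelope sketch using the common approximation that training FLOPs ≈ 6 × parameters × tokens. The utilization and hourly rate below are assumptions for illustration, not published figures:

    # Back-of-envelope pretraining cost. Uses the common approximation
    # that training FLOPs ~= 6 * parameters * tokens. The MFU and
    # $/GPU-hour figures are assumptions, not published numbers.
    params = 7e9              # LLaMA-7B, the smallest variant
    tokens = 1.0e12           # ~1T training tokens
    peak_flops = 312e12       # A100 peak bf16 throughput
    mfu = 0.40                # assumed model FLOPs utilization
    usd_per_gpu_hour = 2.00   # assumed cloud rate

    total_flops = 6 * params * tokens                   # ~4.2e22 FLOPs
    gpu_hours = total_flops / (peak_flops * mfu) / 3600
    print(f"~{gpu_hours:,.0f} A100-hours, ~${gpu_hours * usd_per_gpu_hour:,.0f}")
    # -> ~93,000 A100-hours, ~$190,000

Even with generous assumptions, the *smallest* model lands solidly in six figures, before you count failed runs or the larger variants.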

The model in question - LLaMA - isn't even a public model. It leaked and people copied[0] it. But now that such a large model has leaked, people can actually work on iterative improvements again.

Unfortunately we don't really have a way for the FOSS community to pool together that much money to buy compute from cloud providers. Contributions-in-kind through distributed computing (e.g. a "GPT@home" project) would require significant changes to training methodology[1]. Compounding this further, the state of the art is now effectively a trade secret. Exact training code isn't always available, and OpenAI has even gone so far as to refuse to say anything about GPT-4's architecture or training set, specifically to prevent open replication.

[0] I'm avoiding the use of the verb "stole" here, not just because I support filesharing, but because copyright law likely does not protect AI model weights alone.

[1] AI training has very high minimum requirements to get in the door. If your GPU has 12GB of VRAM and your model, gradients, and optimizer state require 13GB, you can't train the model. CPUs don't have this limitation, but they are ridiculously inefficient for any training task. There are techniques like ZeRO that give pagefile-like partitioning of training state across GPUs, but they require additional engineering.
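
For concreteness, here's a rough sketch of where the memory goes with plain mixed-precision Adam, following the ~16 bytes/parameter accounting described in the ZeRO paper (the 7B model size is just an example):

    # Rough per-GPU memory for mixed-precision Adam training, following
    # the ~16 bytes/parameter accounting in the ZeRO paper. Illustrative
    # only; activations and framework overhead come on top of this.
    params = 7e9  # example: a 7B-parameter model

    weights_fp16 = 2 * params  # working weights
    grads_fp16   = 2 * params  # gradients
    master_fp32  = 4 * params  # fp32 master copy of weights
    adam_m_fp32  = 4 * params  # Adam first moment
    adam_v_fp32  = 4 * params  # Adam second moment

    total = weights_fp16 + grads_fp16 + master_fp32 + adam_m_fp32 + adam_v_fp32
    print(f"~{total / 2**30:.0f} GiB before activations")  # ~104 GiB

ZeRO's trick is to shard those per-parameter states across the GPUs in a cluster instead of replicating them on every card, which is how you squeeze training onto hardware that could never hold the whole state at once.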

replies(7): >>35393979 #>>35394466 #>>35395609 #>>35396273 #>>35400202 #>>35400942 #>>35573426 #
Tryk No.35395609
>Unfortunately we don't really have a way for the FOSS community to pool together that much money to buy compute from cloud providers.

How so? Why couldn't we just start a gofundme/kickstarter to fund the training of an open-source model?

replies(1): >>35396185 #
hackernewds No.35396185
Who will be entrusted to benefit from this? Recall that OpenAI began as an open-source project, and that they chose to go the capitalist route despite initially stating explicitly on their site that they were a non-profit.
replies(1): >>35396963 #
kmeisthax No.35396963
OpenAI specifically cited scaling costs as a reason for why they switched their org structure from non-profit to "capped profit"[0].

You could potentially crowdfund this, though I should point out that it has already been tried and Kickstarter shut it down. The effort in question, "Unstable Diffusion", was kinda sketchy, promising a model specifically tuned for NSFW work. What you'd want is an organization that's responsible, knows how to use state-of-the-art model architectures, and is at least willing to try to stop generative porn.

Which just so happens to be Stability AI. Except they're funded as a for-profit on venture capital, not as something you can donate to on Kickstarter or Patreon.

If they were to switch from investor subsidy to crowdfunding, however, I'm not entirely sure people would actually line up to bear the costs of training. To find out why, we need to talk about motive. We can broadly subdivide the users of generative AI into a few categories:

- Companies, who view AI as a way to juice stock prices by promising a permanent capitalist revolution that will abolish the creative working class. They do not care about ownership; they care about balancing profit and loss. Inasmuch as they want AI models not controlled by OpenAI, it is a strategic play, not a moral one.

- Artists of varying degrees of competence who use generative AI to skip past creative busywork, such as assembling references, or to hack something out quickly. Inasmuch as they have critiques of how AI is owned, it is specifically that they do not want to be abolished by capitalists using their own labor as ground meat for the linear algebra data blender. So they are unlikely to crowdfund the thing they are angry is going to put them out of a job.

- No-hopers and other creatively bankrupt individuals who have been sold a promise that AI is going to fix their lack of talent by making talent obsolete. This is, of course, a lie[2]. They absolutely would prefer a model unencumbered by filters on cloud servers or morality clauses in licensing agreements, but they do not have the capital in aggregate to fund such an endeavor.

- Free Software types who hate OpenAI's about-face on open AI. Oddly enough, they also have the same hangups artists do, because much of FOSS is based on copyleft/Share-Alike clauses in the GPL, which things like GitHub Copilot are not equipped to handle. On the other hand, they would probably be OK with it if the model were trained on permissive sources and had some kind of regurgitation detector. Consider this one a wildcard.

- Evildoers. This could be people who want a cheaper version of GPT-4 that hasn't been Asimov'd by OpenAI so they can generate shittons of spam. Or people who want a Stable Diffusion model that's really good at making nonconsensual deepfake pornography so they can fuck with people's heads. This was the explicit demographic that "Unstable Diffusion" was trying to target. Problem is, cybercriminals tend to be fairly unsophisticated, because the people who actually know how to crime with impunity would rather make more money in legitimate business instead.

Out of the five demographics I'm aware of, two have capital but no motive, two have motive but no capital, and one would have both - but they already have a sour taste in their mouth from the creep-tech vibes AI gives off.

[0] In practice the only way that profit cap is being hit is if they upend the economy so much that it completely decimates all human labor, in which case they can just overthrow the government and start sending out Terminators to kill the working class[1].

[1] God damn it, why do all the best novel ideas have to come along when I'm halfway through another fucking rewrite of my current one?

[2] Getting generative AI to spit out good writing or art requires careful knowledge of the model's strengths and limitations. Like any good tool.

replies(1): >>35398659 #
mx20 No.35398659
Maybe a lot of people/companies also don't want to give their data and knowledge to OpenAI, who could then sell it off to the competition.
replies(1): >>35403868 #
kmeisthax No.35403868
Yes, that's the "strategic" play I mentioned before.

This isn't really helpful for people who want open AI though, because if your strategy is to deny OpenAI data and knowledge then you aren't going to release any models either.