
1311 points msoad | 29 comments
detrites ◴[] No.35393558[source]
The pace of collaborative OSS development on these projects is amazing, but the rate of optimisations being achieved is almost unbelievable. What has everyone been doing wrong all these years cough sorry, I mean to say weeks?

Ok I answered my own question.

replies(5): >>35393627 #>>35393885 #>>35393921 #>>35394786 #>>35397029 #
1. kmeisthax ◴[] No.35393921[source]
>What has everyone been doing wrong all these years

So it's important to note that all of these improvements are the kinds of things that are cheap to run on a pretrained model. And all of the developments involving large language models recently have been the product of hundreds of thousands of dollars in rented compute time. Once you start putting six digits on a pile of model weights, that becomes a capital cost that the business either needs to recuperate or turn into a competitive advantage. So everyone who scales up to this point doesn't release model weights.

The model in question - LLaMA - isn't even a public model. It leaked and people copied[0] it. But because such a large model leaked, now people can actually work on iterative improvements again.

Unfortunately we don't really have a way for the FOSS community to pool together that much money to buy compute from cloud providers. Contributions-in-kind through distributed computing (e.g. a "GPT@home" project) would require significant changes to training methodology[1]. Further compounding this, the state-of-the-art is actually kind of a trade secret now. Exact training code isn't always available, and OpenAI has even gone so far as to refuse to say anything about GPT-4's architecture or training set to prevent open replication.

[0] I'm avoiding the use of the verb "stole" here, not just because I support filesharing, but because copyright law likely does not protect AI model weights alone.

[1] AI training has very high minimum requirements to get in the door. If your GPU has 12GB of VRAM and your model and gradients require 13GB, you can't train the model. CPUs don't have this limitation but they are ridiculously inefficient for any training task. There are techniques like ZeRO to give pagefile-like state partitioning to GPU training, but that requires additional engineering.
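To put rough numbers on that footnote, here is a back-of-envelope sketch. The 16-bytes-per-parameter figure assumes fp16 weights and gradients plus fp32 master weights and two Adam moments; all constants are illustrative assumptions, not measurements:

```python
# Rough VRAM floor for training a dense model with Adam in mixed
# precision: fp16 weights (2 B) + fp16 gradients (2 B) + fp32 master
# weights (4 B) + two fp32 Adam moments (4 B each) = 16 bytes/param,
# before counting activations.
def training_bytes_per_param():
    return 2 + 2 + 4 + 4 + 4

def min_vram_gb(n_params):
    return n_params * training_bytes_per_param() / 1024**3

# A 7B-parameter model needs on the order of 100 GB of optimizer state
# alone, which is why a single 12GB card can't train it without
# ZeRO-style partitioning or offloading.
print(round(min_vram_gb(7e9)))  # 104
```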

replies(7): >>35393979 #>>35394466 #>>35395609 #>>35396273 #>>35400202 #>>35400942 #>>35573426 #
2. terafo ◴[] No.35393979[source]
> AI training has very high minimum requirements to get in the door. If your GPU has 12GB of VRAM and your model and gradients require 13GB, you can't train the model. CPUs don't have this limitation but they are ridiculously inefficient for any training task. There are techniques like ZeRO to give pagefile-like state partitioning to GPU training, but that requires additional engineering.

You can't if you have one 12GB GPU. You can if you have a couple dozen. And then Petals-style training is possible. It is all very, very new and there are many unsolved hurdles, but I think it can be done.

replies(3): >>35394356 #>>35394585 #>>35395800 #
3. dplavery92 ◴[] No.35394356[source]
Sure, but when one 12GB GPU costs ~$800 new (e.g. the 3080 LHR), "a couple dozen" of them is a big barrier to entry for the hobbyist, student, or freelancer. Cloud computing offers an alternative route, but, as stated, distribution introduces a new engineering task, and the month-to-month bills for the compute nodes you are using can still add up surprisingly quickly.
replies(1): >>35394546 #
4. seydor ◴[] No.35394466[source]
> we don't really have a way for the FOSS community to pool together that much money

There must be open source projects with enough money to pool into such a project. I wonder whether wikimedia or apache are considering anything.

replies(2): >>35395484 #>>35397575 #
5. terafo ◴[] No.35394546{3}[source]
We are talking groups, not individuals. I think it is quite possible for a couple hundred people to cooperate and train something at least as big as LLaMA 7B in a week or two.
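As a sanity check on that timeline, the usual ~6·N·D FLOPs rule of thumb gives a rough wall-clock estimate. The GPU throughput, utilization, and token counts below are assumptions for illustration only:

```python
def train_days(n_params, n_tokens, n_gpus, flops_per_gpu=30e12, mfu=0.4):
    """Days of wall-clock training using the ~6 * params * tokens
    FLOPs approximation for a dense transformer."""
    total_flops = 6 * n_params * n_tokens
    return total_flops / (flops_per_gpu * mfu * n_gpus) / 86400

# Full LLaMA-style 1T-token run vs. a reduced 100B-token budget,
# on 200 volunteer GPUs at an assumed 30 TFLOP/s and 40% utilization:
print(round(train_days(7e9, 1e12, 200)))  # 203 -- months, not weeks
print(round(train_days(7e9, 1e11, 200)))  # 20  -- weeks, with fewer tokens
```

Under these assumptions a week-or-two run only pencils out with a much smaller token budget than LLaMA's original ~1T tokens, or with far more (or faster) GPUs.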
6. webnrrd2k ◴[] No.35394585[source]
Maybe a good candidate for the SETI@home treatment?
replies(1): >>35394635 #
7. terafo ◴[] No.35394635{3}[source]
It is a good candidate. Tech is a good 6-18 months away, though.
replies(1): >>35395229 #
8. nullsense ◴[] No.35395229{4}[source]
How much faster can we develop the tech if we leverage GPT-4 to do it?
9. sceadu ◴[] No.35395484[source]
Maybe we can repurpose the SETI@home infrastructure :)
replies(1): >>35396801 #
10. Tryk ◴[] No.35395609[source]
>Unfortunately we don't really have a way for the FOSS community to pool together that much money to buy compute from cloud providers.

How so? Why couldn't we just start a gofundme/kickstarter to fund the training of an open-source model?

replies(1): >>35396185 #
11. 3np ◴[] No.35395800[source]
One thing I don't understand: If it's possible to chunk and parallelize it, is it not relatively straightforward to do these chunks sequentially on a single GPU with a roughly linear increase in runtime? Or are the parallelized computations actually interdependent and involving message-passing, making this unfeasible?
replies(1): >>35396618 #
12. hackernewds ◴[] No.35396185[source]
Who will be entrusted to benefit from this? Recall that OpenAI did begin as an open-source project, and that they chose to go the capitalist route despite initially explicitly stating on their site that they were a non-profit.
replies(1): >>35396963 #
13. chii ◴[] No.35396273[source]
> Exact training code isn't always available, and OpenAI has even gone so far as to refuse to say anything about GPT-4's architecture or training set to prevent open replication.

This is why I think the patent and copyright system is a failure: the idea that laws protecting information like this would advance the progress of science.

It doesn't. Look at how much faster an illegally leaked model has advanced in so short a time. The laws protecting IP merely give a moat to incumbents.

replies(2): >>35396877 #>>35400870 #
14. latency-guy2 ◴[] No.35396618{3}[source]
Data moving back and forth from CPU to bus to GPU and back again, for however many chunked model parts you have, would increase training time far beyond what you would be willing to invest, not to mention how inefficient and power-intensive it is; far more power is needed than doing CPU-only or GPU-only training. Back to the time part: it's not linear at all. IMO it's easily quadratic.

It's not unfeasible; in fact, that's essentially how things were done before a lot of improvements to the various libraries. Many corps still have poorly built pipelines that spend a lot of time in CPU land and not enough in GPU land.

Just an FYI as well - intermediate outputs of models are used in quite a bit of ML, you may see them in some form being used for hyperparameter optimization and searching.
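A toy cost model makes the transfer-overhead point concrete (the constants are made up for illustration; real bus-vs-compute ratios vary by hardware):

```python
# Streaming a too-big model through one GPU in chunks: compute scales
# linearly with chunk count, but each training step also pays to move
# every chunk's weights over the bus and back.
def step_time(n_chunks, compute_per_chunk=1.0, transfer_per_chunk=4.0):
    return n_chunks * (compute_per_chunk + 2 * transfer_per_chunk)

resident = step_time(8, transfer_per_chunk=0.0)  # weights stay on the GPU
streamed = step_time(8)                          # weights streamed per step
print(streamed / resident)  # 9.0 -- same chunk count, far slower per step
```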

15. kmeisthax ◴[] No.35396801{3}[source]
BOINC might be usable but the existing distributed training setups assume all nodes have very high speed I/O so they can trade gradients and model updates around quickly. The kind of setup that's feasible for BOINC is "here's a dataset shard, here's the last epoch, send me back gradients and I'll average them with the other ones I get to make the next epoch". This is quite a bit different from, say, the single-node case which is entirely serial and model updates happen every step rather than epoch.
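The epoch-level scheme described above can be sketched with a toy linear model. The shard/coordinator split and all hyperparameters here are illustrative, not a real BOINC protocol:

```python
import random

# Each "volunteer" gets a data shard plus the current weights, returns
# gradients, and the coordinator averages them into one update per round.
def shard_gradient(w, shard):
    # Gradient of mean squared error for y = w*x on one shard.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def federated_round(w, shards, lr=0.1):
    grads = [shard_gradient(w, s) for s in shards]  # done by volunteers
    return w - lr * sum(grads) / len(grads)         # coordinator averages

# Synthetic data with true slope 3, split across 4 "volunteers".
random.seed(0)
data = [(x, 3 * x) for x in [random.uniform(-1, 1) for _ in range(400)]]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):
    w = federated_round(w, shards)
print(round(w, 2))  # 3.0 -- converges to the true slope
```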
16. hotpotamus ◴[] No.35396877[source]
The idea of a patent was always a time-limited monopoly and in exchange you would reveal trade secrets that could presumably advance science. I think like many aspects of modernity, it's a bit outmoded these days, particularly in software, but that was the idea. Copyright was similar, but it did not last nearly as long as it does today in the original US incarnations.
17. kmeisthax ◴[] No.35396963{3}[source]
OpenAI specifically cited scaling costs as a reason for why they switched their org structure from non-profit to "capped profit"[0].

You could potentially crowdfund this, though I should point out that this was already tried and Kickstarter shut it down. The effort in question, "Unstable Diffusion", was kinda sketchy, promising a model specifically tuned for NSFW work. What you'd want is an organization that's responsible, knows how to use state of the art model architectures, and at least is willing to try and stop generative porn.

Which just so happens to be Stability AI. Except they're funded as a for-profit on venture capital, not as something you can donate to on Kickstarter or Patreon.

If they were to switch from investor subsidy to crowdfunding, however, I'm not entirely sure people would actually be lining up to bear the costs of training. To find out why we need to talk about motive. We can broadly subdivide the users of generative AI into a few categories:

- Companies, who view AI as a way to juice stock prices by promising a permanent capitalist revolution that will abolish the creative working class. They do not care about ownership; they care about balancing profit and loss. Inasmuch as they want AI models not controlled by OpenAI, it is a strategic play, not a moral one.

- Artists of varying degrees of competence who use generative AI to skip past creative busywork such as assembling references or to hack out something quickly. Inasmuch as they have critiques of how AI is owned, it is specifically that they do not want to be abolished by capitalists using their own labor as ground meat for the linear algebra data blender. So they are unlikely to crowdfund the thing they are angry is going to put them out of a job.

- No-hopers and other creatively bankrupt individuals who have been sold a promise that AI is going to fix their lack of talent by making talent obsolete. This is, of course, a lie[2]. They absolutely would prefer a model unencumbered by filters on cloud servers or morality clauses in licensing agreements, but they do not have the capital in aggregate to fund such an endeavor.

- Free Software types that hate OpenAI's about-face on open AI. Oddly enough, they also have the same hangups artists do, because much of FOSS is based on copyleft/Share-Alike clauses in the GPL, which things like GitHub Copilot are not equipped to handle. On the other hand, they probably would be OK with it if the model was trained on permissive sources and had some kind of regurgitation detector. Consider this one a wildcard.

- Evildoers. This could be people who want a cheaper version of GPT-4 that hasn't been Asimov'd by OpenAI so they can generate shittons of spam. Or people who want a Stable Diffusion model that's really good at making nonconsensual deepfake pornography so they can fuck with people's heads. This was the explicit demographic that "Unstable Diffusion" was trying to target. Problem is, cybercriminals tend to be fairly unsophisticated, because the people who actually know how to crime with impunity would rather make more money in legitimate business instead.

Out of five demographics I'm aware of, two have capital but no motive, two have motive but no capital, and one would have both - but they already have a sour taste in their mouth from the creep-tech vibes that AI gives off.

[0] In practice the only way that profit cap is being hit is if they upend the economy so much that it completely decimates all human labor, in which case they can just overthrow the government and start sending out Terminators to kill the working class[1].

[1] God damn it why do all the best novel ideas have to come by when I'm halfway through another fucking rewrite of my current one

[2] Getting generative AI to spit out good writing or art requires careful knowledge of the model's strengths and limitations. Like any good tool.

replies(1): >>35398659 #
18. totony ◴[] No.35397575[source]
Or big cloud platform could give some compute for free, give back some of the profit they get from oss.
19. mx20 ◴[] No.35398659{4}[source]
Maybe a lot of people/companies also don't want to give their data and knowledge to OpenAI, so that they can sell it off to the competition.
replies(1): >>35403868 #
20. JBorrow ◴[] No.35400202[source]
$n00k of compute time is nothing, sorry. This is the kind of thing that academic institutions can give out for free…
replies(1): >>35400271 #
21. linesinthesand ◴[] No.35400271[source]
why don't you write the check then huh
replies(1): >>35406593 #
22. breck ◴[] No.35400870[source]
> The laws protecting IP merely gives a moat to incumbents.

Yes. These laws are bad. We could fix this with a 2 line change:

    Section 1. Article I, Section 8, Clause 8 of this Constitution is hereby repealed.
    Section 2. Congress shall make no law abridging the right of the people to publish information.
replies(1): >>35401622 #
23. barbariangrunge ◴[] No.35400942[source]
Can the FOSS community really find nobody with the motivation to use their Bitcoin rig as a research machine? Or do you need even more specialized hardware than that?
replies(1): >>35401274 #
24. stolsvik ◴[] No.35401274[source]
The bitcoin rig is specialised - it can only compute SHA hashes. You need more general compute.
25. kmeisthax ◴[] No.35401622{3}[source]
Abolishing the copyright clause would not solve this problem because OpenAI is not leveraging copyright or patents. They're just not releasing anything.

To fix this, you'd need to ban trade secrecy entirely. As in, if you have some kind of invention or creative work you must publish sufficient information to replicate it "in a timely manner". This would be one of those absolutely insane schemes that only a villain in an Ayn Rand book would come up with.

replies(1): >>35404345 #
26. kmeisthax ◴[] No.35403868{5}[source]
Yes, that's the "strategic" play I mentioned before.

This isn't really helpful for people who want open AI though, because if your strategy is to deny OpenAI data and knowledge then you aren't going to release any models either.

27. breck ◴[] No.35404345{4}[source]
> Abolishing the copyright clause would not solve this problem because OpenAI is not leveraging copyright or patents. They're just not releasing anything.

The problem is: how in the world is ChatGPT so good compared to the average human being? The answer is that human beings (except for the 1%) have their left hands tied behind their backs because of copyright law.

28. ◴[] No.35406593{3}[source]
29. Robotbeat ◴[] No.35573426[source]
Training via CPU isn’t that bad if fully optimized with AVX512 extensions.