
1311 points msoad | 7 comments
1. yodsanklai ◴[] No.35394809[source]
Total noob questions.

1. How does this compare with ChatGPT3?

2. Does it mean we could eventually run a system such as ChatGPT3 on a personal computer?

3. Could LLMs eventually replace Google (in the sense that answers could be correct 99.9% of the time), or is the tech inherently flawed?

replies(4): >>35394861 #>>35394958 #>>35395176 #>>35396372 #
2. addisonl ◴[] No.35394861[source]
Minor correction: ChatGPT uses GPT-3.5 and (most recently, if you pay $20/month) GPT-4. Their branding definitely needs some work, haha. We are on track for you to be able to run something like ChatGPT locally!
3. nmca ◴[] No.35394958[source]
worse, yes, yes, no
4. retrac ◴[] No.35395176[source]
The largest LLaMA model, at ~65 billion parameters, is not quite as large as GPT-3, and probably not quite as well trained, but it's basically in the same class. Even the complete, non-quantized model can be run with llama.cpp on ARM64 and x86_64 CPUs already, assuming you have enough RAM (128 GB?).
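
For anyone wondering what running it locally looks like, here's a minimal sketch using the llama-cpp-python bindings around llama.cpp; the model path and prompt are placeholders for your own quantized weights and query:

    # Minimal local-inference sketch via llama-cpp-python (wraps llama.cpp).
    # The model path below is a placeholder for your own quantized weights.
    from llama_cpp import Llama

    llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin")

    # Plain completion: the model just continues the prompt text.
    out = llm("Q: What is the capital of France? A:",
              max_tokens=32, stop=["Q:"])
    print(out["choices"][0]["text"])

The 4-bit quantized weights are what make this feasible on ordinary hardware; it's the full-precision 65B model that needs RAM in the 128 GB range.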
5. simonw ◴[] No.35396372[source]
"Could LLM eventually replace Google"

If you try to use LLMs as a Google replacement, you're going to run into problems pretty quickly.

LLMs are better thought of as "calculators for words" - retrieval of facts is a by-product of how they are trained, but it's not their core competence at all.

LLaMA at 4-bit on my laptop is around 3.9GB. There's no way you could compress all of human knowledge into less than 4GB of space. Even ChatGPT / GPT-4, though much bigger, couldn't possibly contain all of the information that you might want them to contain.

https://www.newyorker.com/tech/annals-of-technology/chatgpt-... "ChatGPT Is a Blurry JPEG of the Web" is a neat way of thinking about that.

But... it turns out you don't actually need a single LLM that contains all knowledge. What's much more interesting is a smaller LLM that has the ability to run tools - such as executing searches against larger indexes of data. That's what Bing and Google Bard do already, and it's a pattern we can implement ourselves pretty easily: https://til.simonwillison.net/llms/python-react-pattern
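
To make the pattern concrete, here's a stripped-down sketch of that loop; complete() and the search tool are hypothetical stand-ins for your actual model call and data index, and the TIL post linked above has a full working version:

    import re

    # ReAct-style loop: the model emits "Action: tool: input" lines,
    # we run the tool and feed the result back as an "Observation:".
    # complete() and search() are hypothetical stand-ins.
    def complete(prompt):
        raise NotImplementedError("call your LLM here")

    def search(query):
        return "stub result for: " + query

    TOOLS = {"search": search}
    ACTION_RE = re.compile(r"^Action: (\w+): (.*)$", re.MULTILINE)

    def react(question, max_turns=5):
        prompt = ("You may emit lines like 'Action: search: <query>' "
                  "and will get back 'Observation: <result>' lines.\n"
                  "Question: " + question)
        response = ""
        for _ in range(max_turns):
            response = complete(prompt)
            match = ACTION_RE.search(response)
            if not match:
                break  # no action requested: treat as the final answer
            tool, arg = match.groups()
            prompt += ("\n" + response +
                       "\nObservation: " + TOOLS[tool](arg))
        return response

The interesting part is that the loop, not the model, is what touches the outside world - the model only ever reads and produces text.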

The thing that excites me is the idea of having a 4GB (or 8GB or 16GB even) model on my own computer that has enough capabilities that it can operate as a personal agent, running searches, executing calculations and generally doing really useful stuff despite not containing a great deal of detailed knowledge about the world at all.

replies(1): >>35400640 #
6. addandsubtract ◴[] No.35400640[source]
I just need an LLM that can search, retrieve, and condense information from reddit, stackoverflow, and wikipedia for a given query.
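
A rough sketch of that flow for the Wikipedia case, using its public page-summary endpoint; condense() is a hypothetical stand-in for whatever local model you'd use:

    import requests

    # Fetch a short extract from Wikipedia's public REST API, then
    # hand it to a model to condense. condense() is a hypothetical stub.
    def fetch_wikipedia_summary(title):
        url = ("https://en.wikipedia.org/api/rest_v1/page/summary/"
               + title.replace(" ", "_"))
        return requests.get(url, timeout=10).json()["extract"]

    def condense(context, query):
        raise NotImplementedError("ask your LLM to summarize here")

    context = fetch_wikipedia_summary("Large language model")
    print(condense(context, "What is an LLM?"))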
replies(1): >>35453824 #
7. roguas ◴[] No.35453824{3}[source]
Depending on the max tokens, I think you can pretty easily fine-tune a model to return answers that spell out the actions required, then wrap your prompt app to react to those actions, paste the results back in, and "re-ask/re-prompt" the same question...

Similar stuff is being researched under the "LangChain" term.
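
For reference, the early LangChain agent quickstart looked roughly like this (assumes OpenAI and SerpAPI keys in your environment; the library's APIs have moved around a lot since):

    # Roughly the 2023-era LangChain quickstart: a zero-shot ReAct
    # agent with a web-search tool and a calculator. Assumes
    # OPENAI_API_KEY and SERPAPI_API_KEY are set.
    from langchain.agents import initialize_agent, load_tools
    from langchain.llms import OpenAI

    llm = OpenAI(temperature=0)
    tools = load_tools(["serpapi", "llm-math"], llm=llm)
    agent = initialize_agent(
        tools, llm, agent="zero-shot-react-description", verbose=True)
    agent.run("What is the population of Paris raised to the 0.5 power?")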