899 points georgehill | 58 comments
1. samwillis ◴[] No.36216196[source]
ggml and llama.cpp are such a good platform for local LLMs; having some financial backing to support development is brilliant. We should be concentrating as much as possible on doing local inference (and training) on private data.

I want a local ChatGPT fine tuned on my personal data running on my own device, not in the cloud. Ideally open source too, llama.cpp is looking like the best bet to achieve that!

replies(6): >>36216377 #>>36216465 #>>36216508 #>>36217604 #>>36217847 #>>36221973 #
2. brucethemoose2 ◴[] No.36216377[source]
If MeZO gets implemented, we are basically there: https://github.com/princeton-nlp/MeZO
replies(1): >>36216988 #
3. rvz ◴[] No.36216465[source]
> ggml and llama.cpp are such a good platform for local LLMs, having some financial backing to support development is brilliant

The problem is, this financial backing and support is via VCs, who will steer the project to close it all up again.

> I want a local ChatGPT fine tuned on my personal data running on my own device, not in the cloud. Ideally open source too, llama.cpp is looking like the best bet to achieve that!

I think you are setting yourself up for disappointment in the future.

replies(3): >>36216838 #>>36217184 #>>36219154 #
4. behnamoh ◴[] No.36216508[source]
I wonder if ClosedAI and other companies use the findings of the open source community in their products. For example, do they use QLORA to reduce the costs of training and inference? Do they quantize their models to serve non-subscribing consumers?
replies(2): >>36216688 #>>36217149 #
5. danielbln ◴[] No.36216688[source]
Not disagreeing with your points, but saying "ClosedAI" is about as clever as writing M$ for Microsoft back in the day, which is to say not very.
replies(4): >>36216958 #>>36217145 #>>36218362 #>>36218979 #
6. ulchar ◴[] No.36216838[source]
> The problem is, this financial backing and support is via VCs, who will steer the project to close it all up again.

How exactly could they meaningfully do that? Genuine question. The issue with the OpenAI business model is that the collaboration within academia and open source circles is creating innovations that are on track to out-pace the closed source approach. Does OpenAI have the pockets to buy the open source collaborators and researchers?

I'm truly cynical about many aspects of the tech industry but this is one of those fights that open source could win for the betterment of everybody.

replies(2): >>36217177 #>>36217454 #
7. loa_in_ ◴[] No.36216958{3}[source]
I'd say writing M$ makes it harder for M$ to find out I'm talking about them on the indexed web because it's more ambiguous; that's all I need to know.
replies(1): >>36218186 #
8. moffkalast ◴[] No.36216988[source]
Basically there, with what kind of VRAM and processing requirements? I doubt anyone running on a CPU can fine tune in a time frame that doesn't give them an obsolete model when they're done.
replies(1): >>36217136 #
9. nl ◴[] No.36217136{3}[source]
According to the paper, it fine-tunes at the speed of inference (!!)

This would make fine-tuning a quantized 13B model achievable in ~0.3 seconds per training example on a CPU.

replies(6): >>36217261 #>>36217324 #>>36217354 #>>36217827 #>>36218026 #>>36218841 #
10. rafark ◴[] No.36217145{3}[source]
I think it’s ironic that M$ made ClosedAI.
replies(1): >>36218112 #
11. jmoss20 ◴[] No.36217149[source]
Quantization is hardly a "finding of the open source community". (IIRC the first TPU was int8! Though the tradition is much older than that.)
12. maxilevi ◴[] No.36217177{3}[source]
I agree with the spirit, but saying that open source is on track to outpace OpenAI in innovation is just not true. Open source models are being compared to GPT-3.5; none yet even get close to GPT-4 quality, and that finished training last year.
replies(1): >>36218569 #
13. jdonaldson ◴[] No.36217184[source]
> I think you are setting yourself up for disappointment in the future.

Why would you say that?

replies(1): >>36237909 #
14. moffkalast ◴[] No.36217261{4}[source]
Wow, if that's true then it's genuinely a complete game changer for LLMs as a whole. You probably mean more like 0.3s per token, not per example, but that's still more like one or two minutes per training case, not a day for 4 cases like it is now.
15. valval ◴[] No.36217324{4}[source]
I think more importantly, what would the fine tuning routine look like? It's a non-trivial task to dump all of your personal data into any LLM architecture.
16. f_devd ◴[] No.36217354{4}[source]
MeZO assumes a smooth parameter space, so you probably won't be able to do it with INT4/INT8 quantization; it probably needs fp8 or something smoother.
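For reference, the core MeZO update estimates the gradient from just two forward passes that share one random perturbation, so the memory footprint stays at inference level. A minimal numpy sketch on a toy quadratic loss (the loss function, step counts, and hyperparameters here are illustrative, not taken from the paper):

```python
import numpy as np

def mezo_step(theta, loss_fn, eps=1e-3, lr=1e-2, rng=None):
    """One MeZO-style update: two forward passes with a shared
    random direction z give a zeroth-order gradient estimate."""
    rng = rng or np.random.default_rng()
    seed = int(rng.integers(1 << 30))
    # z is regenerated from `seed`, so no gradient buffers are needed --
    # this is what keeps memory use equal to inference.
    z = np.random.default_rng(seed).standard_normal(theta.shape)
    loss_plus = loss_fn(theta + eps * z)
    loss_minus = loss_fn(theta - eps * z)
    projected_grad = (loss_plus - loss_minus) / (2 * eps)
    return theta - lr * projected_grad * z

# Toy stand-in for a model's loss surface: minimize ||theta||^2.
theta = np.ones(4)
for _ in range(500):
    theta = mezo_step(theta, lambda t: float(np.sum(t * t)))
print(theta)
```

Note the trade-off gliptic points out downthread: each step is cheap, but many more steps are needed than with SGD.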
17. yyyk ◴[] No.36217454{3}[source]
I've been going on and on about this on HN: open source can win this fight, but I think OSS is overconfident. We need to be clear there are serious challenges ahead. ClosedAI and other corporations also have a plan, one with good chances unless properly countered:

A) Embed OpenAI (etc.) API everywhere. Make embedding easy and trivial. First to gain a small API/install moat (user/dev: 'why install OSS model when OpenAI is already available with an OS API?'). If it's easy to use OpenAI but not open source they have an advantage. Second to gain brand. But more importantly:

B) Gain a technical moat by having a permanent data advantage using the existing install base (see above). Retune constantly to keep it.

C) Combine with existing proprietary data stores to increase local data advantage (e.g. easy access to all your Office 365/GSuite documents, while OSS gets the scary permission prompts).

D) Combine with existing proprietary moats to mutually reinforce.

E) Use selective copyright enforcement to increase data advantage.

F) Lobby legislators for limits that make competition (open or closed source) way harder.

TL;DR: OSS is probably catching up on algorithms. When it comes to good data and good integrations OSS is far behind and not yet catching up. It's been argued that OpenAI's entire performance advantage is due to having better data alone, and they intend to keep that advantage.

replies(1): >>36218897 #
18. SparkyMcUnicorn ◴[] No.36217604[source]
Maybe I'm wrong, but I don't think you want it fine-tuned on your data.

Pretty sure you might be looking for this: https://github.com/SamurAIGPT/privateGPT

Fine-tuning is good for teaching it how to act, but not great for reciting/recalling data.

replies(4): >>36219307 #>>36220595 #>>36226771 #>>36241658 #
19. isoprophlex ◴[] No.36217827{4}[source]
If you go through the drudgery of integrating with all the existing channels (mail, Teams, Discord, Slack, traditional social media, texts, ...), such rapid fine-tuning speeds could enable an always-up-to-date personality construct, modeled on you.

Which is my personal holy grail towards making myself unnecessary; it'd be amazing to be doing some light gardening while the bot handles my coworkers ;)

replies(2): >>36217987 #>>36221420 #
20. ignoramous ◴[] No.36217847[source]
Can LLaMA be used for commercial purposes though (which might limit external contributors)? I believe FOSS alternatives like Databricks Dolly / Together RedPajama / EleutherAI GPT-NeoX (et al.) are where the most progress is likely to be.
replies(5): >>36217910 #>>36218688 #>>36219223 #>>36219290 #>>36219343 #
21. samwillis ◴[] No.36217910[source]
Although llama.cpp started with the LLaMA model, it now supports many others.
22. ◴[] No.36217987{5}[source]
23. gliptic ◴[] No.36218026{4}[source]
I cannot find any such numbers in the paper. What the paper says is that MeZO converges much slower than SGD, and each step needs two forward passes.

"As a limitation, MeZO takes many steps in order to achieve strong performance."

24. replygirl ◴[] No.36218112{4}[source]
Pedantic but that's not irony
replies(1): >>36220087 #
25. coolspot ◴[] No.36218186{4}[source]
If we are talking about indexing, writing M$ is easier to find in an index because it is such a unique token. MS can mean many things (e.g. Miss); M$ is less ambiguous.
26. smoldesu ◴[] No.36218362{3}[source]
Yeah, I think it feigns meaningful criticism. The "Sleepy Joe"-tier insults are ad-hominem enough that I don't try to respond.
27. jart ◴[] No.36218569{4}[source]
We're basically surviving off the scraps companies like Facebook have been tossing off the table, like LLaMA. The fact that we're even allowed and able to use these things ourselves, at all, is a tremendous victory.
replies(1): >>36218687 #
28. maxilevi ◴[] No.36218687{5}[source]
I agree
29. okhuman ◴[] No.36218688[source]
This is a very good question; it will be interesting to see how this develops. Thanks for posting the alternatives list.
30. sp332 ◴[] No.36218841{4}[source]
It's the same memory footprint as inference. It's not that fast, and the paper mentions some optimizations that could still be done.
replies(1): >>36220688 #
31. ljlolel ◴[] No.36218897{4}[source]
Don’t forget chip shortages. That’s all centralized up through Nvidia, TSMC, and ASML
32. Miraste ◴[] No.36218979{3}[source]
M$ is a silly way to call Microsoft greedy. ClosedAI is somewhat better because OpenAI's very name is a bald-faced lie, and they should be called on it. Are there more elegant ways to do that? Sure, but every time I see Altman in the news crying crocodile tears about the "dangers" of open anything I think we need all the forms of opposition we can find.
replies(1): >>36220202 #
33. ignoramous ◴[] No.36219154[source]
> The problem is, this financial backing and support is via VCs, who will steer the project to close it all up again.

A matter of when, not if. I mean, the website itself makes that much clear:

  The ggml way
  
    ...
  
    Open Core

    The library and related projects are freely available under the MIT license... In the future we may choose to develop extensions that are licensed for commercial use
  
    Explore and have fun!

    ... Contributors are encouraged to try crazy ideas, build wild demos, and push the edge of what's possible

So, like many other "open core" devtools out there, they'd like to have their cake and eat it too. And they might just as well, like others before them.

Won't blame anyone here though; because clearly, if you're as good as Georgi Gerganov, why do it for free?

replies(1): >>36223453 #
34. detrites ◴[] No.36219223[source]
May also be worth mentioning - UAE's Falcon, which apparently performs well (leads?). Falcon recently had its royalty-based commercial license modified to be fully open for free private and commercial use, via Apache 2.0: https://falconllm.tii.ae/
replies(1): >>36226198 #
35. chaxor ◴[] No.36219290[source]
Why is commercial necessary to run local models?
replies(1): >>36219403 #
36. dr_dshiv ◴[] No.36219307[source]
How does this work?
replies(2): >>36219423 #>>36220553 #
37. digitallyfree ◴[] No.36219343[source]
OpenLLaMA will be released soon, and it's 100% compatible with the original LLaMA.

https://github.com/openlm-research/open_llama

38. ignoramous ◴[] No.36219403{3}[source]
It isn't, but such models may eventually lag behind the FOSS ones.
39. deet ◴[] No.36219423{3}[source]
The parent is saying that "fine tuning", which has a specific meaning related to actually retraining the model itself (or layers at its surface) on a specialized set of data, is not what the GP is actually looking for.

An alternative method is to index content in a database and then insert contextual hints into the LLM's prompt that give it extra information and detail with which to respond with an answer on-the-fly.

That database can use semantic similarity (ie via a vector database), keyword search, or other ranking methods to decide what context to inject into the prompt.

PrivateGPT is doing this method: reading files, extracting their content, splitting the documents into small-enough-to-fit-into-prompt bits, and then indexing them into a database. Then, at query time, it inserts context into the LLM prompt.

The repo uses LangChain as boilerplate, but it's pretty easy to do manually or with other frameworks.

(PS if anyone wants this type of local LLM + document Q/A and agents, it's something I'm working on as supported product integrated into macOS, and using ggml; see profile)
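The chunk-index-retrieve-prompt flow described above can be sketched in a few lines of plain Python. Everything here is a stand-in: the chunk size, the keyword-overlap scoring (a placeholder for semantic similarity via a vector database), and the prompt template:

```python
def chunk(text, size=200):
    # Split a document into pieces small enough to fit in a prompt.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query, chunks, k=2):
    # Rank chunks by naive keyword overlap with the query -- a stand-in
    # for vector-database similarity search.
    q = set(query.lower().split())
    return sorted(chunks, key=lambda c: -len(q & set(c.lower().split())))[:k]

def build_prompt(query, chunks):
    # Inject the retrieved context ahead of the user's question.
    context = "\n---\n".join(retrieve(query, chunks))
    return (f"Use the context to answer.\n\nContext:\n{context}\n\n"
            f"Question: {query}")

docs = chunk("llama.cpp runs LLaMA-family models on commodity CPUs using ggml. "
             "privateGPT indexes local documents so an LLM can answer questions "
             "about them without sending data to the cloud.")
prompt = build_prompt("what does llama.cpp use?", docs)
print(prompt)
```

The final prompt is then handed to the local model; the model itself never needs retraining.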

40. rafark ◴[] No.36220087{5}[source]
Why do you think so? According to the dictionary, ironic could be something paradoxical or weird.
replies(1): >>36220713 #
41. tanseydavid ◴[] No.36220202{4}[source]
It is a colloquial spelling and they earned it, a long time ago.
42. SparkyMcUnicorn ◴[] No.36220553{3}[source]
deet already gave a comprehensive answer, but I'll add that the guts of privateGPT are pretty readable and only ~200 lines of code.

Core pieces: GPT4All (LLM interface/bindings), Chroma (vector store), HuggingFaceEmbeddings (for embeddings), and Langchain to tie everything together.

https://github.com/SamurAIGPT/privateGPT/blob/main/server/pr...

43. ◴[] No.36220595[source]
44. nl ◴[] No.36220688{5}[source]
Yes you are right.

I completely misread that!

45. nl ◴[] No.36220713{6}[source]
Well it's not paradoxical?

If one is the kind of person who writes M$ then it's pretty much expected behaviour.

46. vgb2k18 ◴[] No.36221420{5}[source]
> while the bot handles my coworkers

Or it handles their bots ;)

47. shostack ◴[] No.36221973[source]
I've been trying to figure out what I might need to do to turn my Obsidian vault into a dataset to fine-tune against. I'd invest a lot more into it now if I thought it would be a key to an AI learning about me the way it does in the movie Her.
replies(2): >>36222384 #>>36384485 #
48. 58x14 ◴[] No.36222384[source]
I've been working on this for a while now and I'd love to chat. I'll send you an email.
replies(1): >>36222595 #
49. legendofbrando ◴[] No.36222595{3}[source]
I'm interested in this as well and have been exploring similarly. Would be super interesting to chat if you're up for it as well. Sending you an email to say hello.
50. ukuina ◴[] No.36223453{3}[source]
Sounds like the SQLite model, which has been a net positive for the computing world.
51. mistercow ◴[] No.36226198{3}[source]
Hugging Face has a demo of the 40B Falcon instruct model: https://huggingface.co/blog/falcon#demo

It’s pretty good as models of that size go, although it doesn’t take a lot of playing around with it to find that there’s still a good distance between it and ChatGPT 3.5.

(I do recommend editing the instructions before playing with it though; telling a model this size that it “always tells the truth” just seems to make it overconfident and stubborn)

52. gtirloni ◴[] No.36226771[source]
> Fine-tuning is good for treating it how to act, but not great for reciting/recalling data.

What underlying process makes it this way? Is it because the prompt has heavier weight?

replies(2): >>36229475 #>>36242863 #
53. SparkyMcUnicorn ◴[] No.36229475{3}[source]
I think your question is asking about the fundamentals of how an LLM works, which I'm not really qualified to answer. But I do have a general understanding of it all.

Fine-tuning is like having the model take a class on a certain subject. By the end of the class, it's going to have a general understanding on how to do that thing, but it's probably going to struggle when trying to quote the textbooks verbatim.

A good use-case for fine-tuning is teaching it a response style or format. If you fine-tune a model to only respond in JSON, then you no longer need to include formatting instructions in your prompt to get a JSON output.
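As a concrete illustration, fine-tuning data for that kind of JSON-only response style is typically just instruction/response pairs serialized as JSONL. The records and field names below are hypothetical, not any particular provider's format:

```python
import json

# Hypothetical training pairs teaching a JSON-only response style.
examples = [
    {"prompt": "Extract the city: 'I flew to Paris last week.'",
     "completion": json.dumps({"city": "Paris"})},
    {"prompt": "Extract the city: 'She lives in Tokyo.'",
     "completion": json.dumps({"city": "Tokyo"})},
]

# Many fine-tuning pipelines consume one JSON object per line (JSONL).
jsonl = "\n".join(json.dumps(e) for e in examples)
print(jsonl)
```

After training on enough pairs like these, the model emits the JSON shape without being told to in the prompt.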

54. rvz ◴[] No.36237909{3}[source]
Never expect such promises to go your way, especially when VCs, angels, etc. are able to control the project with their opaque term sheets, which is why I am skeptical of this. Accepting VC or angel investment cash is no different from having another boss.

I expect high hopes like that to end in disappointment for the 'community', since the VCs' interest will now be in heading for the exit. Their actions will speak louder than what they are saying on the website.

55. SkyPuncher ◴[] No.36241658[source]
I think people want both. They want fine-tuning for their style of communication and interaction. They want better ranking and retrieval for rote information.

In other words, it’s like having a spouse/partner. There are certain ways that we communicate that we simply know where the other person is at or what they actually mean.

replies(1): >>36243366 #
56. bluepoint ◴[] No.36242863{3}[source]
I just read the paper about LORA. The main idea is that you write the weights of each neural network as

W = W0 + B A

Where W0 is the pretrained model's weights, which are kept fixed, and A and B are matrices with a much, much lower rank than the original (say r = 4).

It has been shown (as mentioned in the LoRA paper) that training for specific tasks results in low-rank corrections, so that is what this is all about. I think LoRA fine-tuning can be done locally.

[1] https://github.com/microsoft/LoRA
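The decomposition is easy to see in code. A numpy sketch with illustrative sizes (a 512x512 base weight matrix, rank r = 4) showing why the trainable-parameter count collapses, and why the zero-initialized B leaves the base model's outputs unchanged at the start of training:

```python
import numpy as np

d_out, d_in, r = 512, 512, 4             # illustrative layer sizes
rng = np.random.default_rng(0)

W0 = rng.standard_normal((d_out, d_in))  # pretrained weights, frozen
B = np.zeros((d_out, r))                 # trainable, zero-initialized
A = rng.standard_normal((r, d_in))       # trainable

def forward(x):
    # Effective weights W = W0 + B @ A; training updates only A and B.
    return (W0 + B @ A) @ x

x = rng.standard_normal(d_in)
# With B = 0 the adapter is a no-op, so outputs match the base model.
assert np.allclose(forward(x), W0 @ x)

full_params = W0.size          # all entries of the original matrix
lora_params = A.size + B.size  # the low-rank correction only
print(full_params, lora_params)
```

Here the adapter trains 4096 parameters instead of 262144, about 1.6% of the matrix, which is what makes doing this locally plausible.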

57. SparkyMcUnicorn ◴[] No.36243366{3}[source]
Unless you want machine-readable responses, or some other very specific need, the benefits of a fine-tuned model aren't really going to be that much better than a prompt that asks for the style you want along with an example or two. It also raises the barrier to entry quite a bit, since the majority of computers that can run the model aren't capable of training on it.

Even if you're using OpenAI's models, gpt-3.5-turbo is going to be much better (cheaper, bigger context window, higher quality) than any of their models that can be fine-tuned.

But if you're able to fine-tune a local model, then a combination of fine-tuning and embedding is probably going to give you better results than embedding alone.

58. mydjtl ◴[] No.36384485[source]
The holy grail.

https://github.com/brianpetro/obsidian-smart-connections

https://wfhbrian.com/introducing-smart-chat-transform-your-o...