
899 points | georgehill | 1 comment
samwillis ◴[] No.36216196[source]
ggml and llama.cpp are such a good platform for local LLMs; having some financial backing to support development is brilliant. We should be concentrating as much as possible on doing local inference (and training) on private data.

I want a local ChatGPT fine tuned on my personal data running on my own device, not in the cloud. Ideally open source too, llama.cpp is looking like the best bet to achieve that!

replies(6): >>36216377 #>>36216465 #>>36216508 #>>36217604 #>>36217847 #>>36221973 #
SparkyMcUnicorn ◴[] No.36217604[source]
Maybe I'm wrong, but I don't think you want it fine-tuned on your data.

Pretty sure you might be looking for this: https://github.com/SamurAIGPT/privateGPT

Fine-tuning is good for teaching it how to act, but not great for reciting/recalling data.
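(To illustrate the alternative: tools like privateGPT use retrieval rather than fine-tuning, i.e. find the documents relevant to the question and paste them into the prompt. A toy sketch of the idea, using word overlap as a stand-in for real embeddings; the documents and names here are made up:)

```python
# Retrieval-augmented prompting, minimally: score documents against the
# question, then build a prompt containing the best match. A real system
# replaces score() with a neural embedding model and a vector store.
docs = [
    "my passport renewal is due in march 2025.",
    "the cabin wifi password is hunter2.",
    "team standup moved to 10am on fridays.",
]

def score(question: str, doc: str) -> int:
    # toy relevance: number of shared words
    return len(set(question.lower().split()) & set(doc.lower().split()))

def retrieve(question: str, k: int = 1) -> list[str]:
    return sorted(docs, key=lambda d: score(question, d), reverse=True)[:k]

question = "what is the wifi password?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQ: {question}\nA:"
```

The model never has your data baked into its weights; it just reads the retrieved context at inference time, which is why recall is exact.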

replies(4): >>36219307 #>>36220595 #>>36226771 #>>36241658 #
gtirloni ◴[] No.36226771[source]
> Fine-tuning is good for teaching it how to act, but not great for reciting/recalling data.

What underlying process makes it this way? Is it because the prompt has heavier weight?

replies(2): >>36229475 #>>36242863 #
bluepoint ◴[] No.36242863[source]
I just read the LoRA paper [1]. The main idea is that you write each weight matrix of the network as

W = W0 + B A

Where W0 is the pretrained model's weight matrix, which is kept frozen, and B and A are factors with a much, much lower rank than the original (say r = 4), so the correction B A has rank at most r.

It has been shown (as mentioned in the LoRA paper [1]) that fine-tuning for specific tasks results in low-rank weight updates, which is what this exploits: you only train A and B. I think LoRA training can even be done locally.

[1] https://github.com/microsoft/LoRA
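The parameter savings fall out of the decomposition directly. A numpy sketch of the W = W0 + B A idea (dimensions are made up for illustration; this is not the actual LoRA implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 512, 512, 4              # weight matrix is d x k; rank r << min(d, k)

W0 = rng.standard_normal((d, k))   # frozen pretrained weights, never updated
B = np.zeros((d, r))               # LoRA factors: B starts at zero, so the
A = rng.standard_normal((r, k))    # correction B @ A is initially a no-op

W_eff = W0 + B @ A                 # effective weights used in the forward pass

# Only A and B are trained: r*(d+k) parameters instead of d*k.
print(d * k, r * (d + k))          # 262144 vs 4096
```

With r = 4 that is a ~64x reduction in trainable parameters for this one matrix, which is why fine-tuning this way is feasible on a single local GPU.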