←back to thread

899 points georgehill | 3 comments | | HN request time: 0.209s | source
Show context
samwillis ◴[] No.36216196[source]
ggml and llama.cpp are such a good platform for local LLMs, having some financial backing to support development is brilliant. We should be concentrating as much as possible to do local inference (and training) based on privet data.

I want a local ChatGPT fine tuned on my personal data running on my own device, not in the cloud. Ideally open source too, llama.cpp is looking like the best bet to achieve that!

replies(6): >>36216377 #>>36216465 #>>36216508 #>>36217604 #>>36217847 #>>36221973 #
SparkyMcUnicorn ◴[] No.36217604[source]
Maybe I'm wrong, but I don't think you want it fine-tuned on your data.

Pretty sure you might be looking for this: https://github.com/SamurAIGPT/privateGPT

Fine-tuning is good for treating it how to act, but not great for reciting/recalling data.

replies(4): >>36219307 #>>36220595 #>>36226771 #>>36241658 #
1. dr_dshiv ◴[] No.36219307[source]
How does this work?
replies(2): >>36219423 #>>36220553 #
2. deet ◴[] No.36219423[source]
The parent is saying that "fine tuning", which has a specific meaning related to actually retraining the model itself (or layers at its surface) on a specialized set of data, is not what the GP is actually looking for.

An alternative method is to index content in a database and then insert contextual hints into the LLM's prompt that give it extra information and detail with which to respond with an answer on-the-fly.

That database can use semantic similarity (ie via a vector database), keyword search, or other ranking methods to decide what context to inject into the prompt.

PrivateGPT is doing this method, reading files, extracting their content, splitting the documents into small-enough-to-fit-into-prompt bits, and then indexing into a database. Then, at query time, it inserts context into the LLM prompt

The repo uses LangChain as boilerplate but it's pretty easily to do manually or with other frameworks.

(PS if anyone wants this type of local LLM + document Q/A and agents, it's something I'm working on as supported product integrated into macOS, and using ggml; see profile)

3. SparkyMcUnicorn ◴[] No.36220553[source]
deet already gave a comprehensive answer, but I'll add that the guts of privateGPT are pretty readable and only ~200 lines of code.

Core pieces: GPT4All (LLM interface/bindings), Chroma (vector store), HuggingFaceEmbeddings (for embeddings), and Langchain to tie everything together.

https://github.com/SamurAIGPT/privateGPT/blob/main/server/pr...