
198 points | alexmrv | 3 comments

Hey HN! I built a proof-of-concept for AI memory using Git instead of vector databases.

The insight: Git already solved versioned document management. Why are we building complex vector stores when we could just use markdown files with Git's built-in diff/blame/history?

How it works:

- Memories stored as markdown files in a Git repo
- Each conversation = one commit
- git diff shows how understanding evolves over time
- BM25 for search (no embeddings needed)
- LLMs generate search queries from conversation context

Example: Ask "how has my project evolved?" and it uses git diff to show actual changes in understanding, not just similarity scores.
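To make that concrete, here is a minimal sketch of the loop. It is not DiffMem's actual API: the memories/ layout, file names, and sample query are invented for illustration, but GitPython and rank-bm25 are the libraries from the stack listed below.

    from pathlib import Path

    from git import Repo          # GitPython
    from rank_bm25 import BM25Okapi

    REPO_DIR = Path("memories")   # hypothetical layout: one markdown file per topic

    def commit_memory(repo: Repo, topic: str, text: str, message: str) -> None:
        """Write or update a memory file, then commit: one conversation = one commit."""
        path = REPO_DIR / f"{topic}.md"
        path.write_text(text, encoding="utf-8")
        repo.index.add([str(path.resolve())])
        repo.index.commit(message)

    def search(query: str, top_k: int = 3) -> list[Path]:
        """Rank memory files with plain BM25 -- no embeddings, no vector store."""
        files = sorted(REPO_DIR.glob("*.md"))
        corpus = [f.read_text(encoding="utf-8").lower().split() for f in files]
        bm25 = BM25Okapi(corpus)
        scores = bm25.get_scores(query.lower().split())
        ranked = sorted(zip(scores, files), key=lambda pair: pair[0], reverse=True)
        return [f for _, f in ranked[:top_k]]

    repo = Repo.init(REPO_DIR)    # creates memories/ and its .git if missing
    commit_memory(repo, "project-status",
                  "# Project status\nSwitched retrieval from embeddings to BM25.",
                  "conversation 2024-05-01")
    print(search("how did retrieval change"))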

This is very much a PoC - rough edges everywhere, not production ready. But it's been working surprisingly well for personal use. The entire index for a year of conversations fits in ~100MB RAM with sub-second retrieval.

The cool part: You can git checkout to any point in time and see exactly what the AI knew then. Perfect reproducibility, human-readable storage, and you can manually edit memories if needed.
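A sketch of that time travel with GitPython, reusing the hypothetical project-status.md file from the example above:

    from git import Repo

    repo = Repo("memories")

    # Every commit that touched one memory file, oldest first
    commits = list(repo.iter_commits(paths="project-status.md", reverse=True))

    # Read the file as it existed at the first commit -- what the AI knew then
    blob = commits[0].tree / "project-status.md"
    print(blob.data_stream.read().decode("utf-8"))

    # Diff first vs. latest commit to see how the understanding evolved
    print(repo.git.diff(commits[0].hexsha, commits[-1].hexsha, "--", "project-status.md"))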

GitHub: https://github.com/Growth-Kinetics/DiffMem

Stack: Python, GitPython, rank-bm25, OpenRouter for LLM orchestration. MIT licensed.

Would love feedback on the approach. Is this crazy or clever? What am I missing that will bite me later?

1. mattnewton No.44976171
I recently wrote a short anecdote in a similar vein: in my testing, “agentic” retrieval, where you simply pass an annotated list of files to an LLM and ask it which ones it wants to look at, is probably better than traditional RAG for small datasets (a few hundred docs).

I found it was both much simpler and more accurate at the cost of marginally more time and tokens, compared to RAG on embedded chunks with a vector store.
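A rough sketch of that pattern, assuming a hypothetical docs/ folder of markdown files whose first line serves as the annotation; the model name is a placeholder and any chat-completions endpoint would do:

    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()        # assumes OPENAI_API_KEY is set in the environment
    DOCS = Path("docs")      # placeholder corpus: one markdown file per document

    def annotated_listing() -> str:
        # Use each file's first line as its annotation/summary.
        lines = []
        for f in sorted(DOCS.glob("*.md")):
            text = f.read_text(encoding="utf-8")
            summary = text.splitlines()[0] if text else ""
            lines.append(f"{f.name}: {summary}")
        return "\n".join(lines)

    def pick_files(question: str) -> list[str]:
        prompt = (
            "Here is a list of documents (name: summary):\n"
            f"{annotated_listing()}\n\n"
            f"Question: {question}\n"
            "Reply with only the filenames you need to answer, one per line."
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        return [l.strip() for l in resp.choices[0].message.content.splitlines() if l.strip()]

    # The picked files are then read in full and stuffed into the answering prompt.
    print(pick_files("How did the search layer change over time?"))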

Shameless plug: https://www.matthewnewton.com/blog/replacing-rag

replies(1): >>44976535
2. jarirajari No.44976535
I was daunted by the number of components that go into a basic RAG. I also ended up starting with "agentic retrieval", which I invented independently; it seems many others have invented the same or a similar thing. It is just so much easier to start with something simple and improve it later.
replies(1): >>44976786
3. mattnewton No.44976786
Exactly. If it’s dumb and it works, it isn’t dumb. (You just have to measure that it does work.)

I think this is one of those cases where doing the simplest possible thing is surprisingly good, and it keeps getting better as LLMs get better. I thought it would be a placeholder with all kinds of problems that I would have to hurry to replace, but it was surprisingly hard to beat on our most important metric, accuracy. And accuracy got better for “free” when the underlying model got better.