
198 points by alexmrv | 1 comment

Hey HN! I built a proof-of-concept for AI memory using Git instead of vector databases.

The insight: Git already solved versioned document management. Why are we building complex vector stores when we could just use markdown files with Git's built-in diff/blame/history?

How it works:

- Memories stored as markdown files in a Git repo
- Each conversation = one commit
- git diff shows how understanding evolves over time
- BM25 for search (no embeddings needed)
- LLMs generate search queries from conversation context

Example: Ask "how has my project evolved?" and it uses git diff to show actual changes in understanding, not just similarity scores.
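Here is a minimal sketch of the retrieval step above. It assumes memories are short markdown files keyed by path. The project lists rank-bm25 for this step; to stay dependency-free, the sketch reimplements the standard Okapi BM25 formula (k1 = 1.5, b = 0.75, with a non-negative IDF variant), so its scores will not match rank-bm25 exactly. The LLM query-generation step is skipped and the query is passed in directly. The file paths and contents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 over a fixed set of markdown memories (path -> text)."""

    def __init__(self, docs: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.names = list(docs)
        self.tfs = [Counter(tokenize(docs[n])) for n in self.names]  # term frequencies per doc
        self.lengths = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lengths) / len(self.lengths)
        n_docs = len(self.tfs)
        df = Counter(term for tf in self.tfs for term in tf)  # document frequency per term
        # Non-negative IDF: log(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n_docs - d + 0.5) / (d + 0.5)) for t, d in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            f = tf[term]
            if f == 0:
                continue
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def search(self, query: str, top_k: int = 3) -> list[tuple[str, float]]:
        scored = [(name, self.score(query, i)) for i, name in enumerate(self.names)]
        scored = [hit for hit in scored if hit[1] > 0]  # drop docs sharing no query terms
        scored.sort(key=lambda hit: hit[1], reverse=True)
        return scored[:top_k]


# Hypothetical memory files, as they might look at the latest commit.
memories = {
    "people/daughter.md": "# Daughter\nAge: 12. Plays chess.",
    "projects/diffmem.md": "# DiffMem\nGit-backed memory store. Uses BM25 search.",
    "health/running.md": "# Running\nTraining for a 10k race.",
}
index = BM25(memories)
print(index.search("how old is my daughter"))
```

BM25 needs only token counts per file, which is why no embedding model is involved.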

This is very much a PoC: rough edges everywhere, not production-ready. But it's been working surprisingly well for personal use. The entire index for a year of conversations fits in ~100 MB of RAM with sub-second retrieval.

The cool part: You can git checkout to any point in time and see exactly what the AI knew then. Perfect reproducibility, human-readable storage, and you can manually edit memories if needed.

GitHub: https://github.com/Growth-Kinetics/DiffMem

Stack: Python, GitPython, rank-bm25, OpenRouter for LLM orchestration. MIT licensed.

Would love feedback on the approach. Is this crazy or clever? What am I missing that will bite me later?

BenoitP No.44970042
I'm failing to grasp how it solves or replaces what vector DBs were created for in the first place (high-dimensional neighborhood search, where the space to be searched grows as distance^dimension).
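To make the scaling in this comment concrete (standard geometry, not something stated in the thread): the volume of a d-dimensional ball grows as the d-th power of its radius, and nearly all of that volume sits close to the boundary.

```latex
V_d(r) = \frac{\pi^{d/2}}{\Gamma\left(\frac{d}{2} + 1\right)}\, r^{d} \;\propto\; r^{d}

\frac{V_d\big((1-\varepsilon)\, r\big)}{V_d(r)} = (1-\varepsilon)^{d},
\qquad \text{e.g. } \varepsilon = 0.1,\ d = 100:\quad 0.9^{100} \approx 2.7 \times 10^{-5}
```

So in 100 dimensions, shrinking the search radius by 10% keeps only about 0.003% of the volume, which is the neighborhood-search problem vector databases are built around.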
alexmrv No.44970063
Super simplistic example, but say I mention my daughter, who is 9.

Then I mention she is 10.

A few years later she is 12, but now I call her by her name.

I have struggled to get any of the RAG approaches to handle this effectively. It is also three entries, but two of them are no longer useful; they are nothing but noise in the system.
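A toy, stdlib-only model of how the Git approach treats this example. It assumes each person gets one markdown file that is overwritten with the current understanding, and that each conversation produces one commit. The `MemoryRepo` class, the file path, and the `<Name>` placeholder are hypothetical. The class stands in for a real Git repository: `difflib` plays the role of `git diff` and `checkout(rev)` the role of `git checkout`. The latest snapshot holds only the current fact, while the older ages stay recoverable from history instead of sitting alongside it as separate entries.

```python
import difflib


class MemoryRepo:
    """Toy stand-in for a Git repo: each commit is a full snapshot of path -> markdown text."""

    def __init__(self) -> None:
        self.commits: list[tuple[str, dict[str, str]]] = []  # (message, snapshot)

    def commit(self, message: str, changes: dict[str, str]) -> int:
        # Start from the previous snapshot and overwrite the files this conversation touched.
        snapshot = dict(self.commits[-1][1]) if self.commits else {}
        snapshot.update(changes)
        self.commits.append((message, snapshot))
        return len(self.commits) - 1  # commit index, playing the role of a commit hash

    def checkout(self, rev: int) -> dict[str, str]:
        # What the memory looked like right after commit `rev`.
        return dict(self.commits[rev][1])

    def head(self) -> dict[str, str]:
        return self.checkout(len(self.commits) - 1)

    def diff(self, path: str, old: int, new: int) -> str:
        # Unified diff of one memory file between two commits.
        a = self.commits[old][1].get(path, "").splitlines()
        b = self.commits[new][1].get(path, "").splitlines()
        return "\n".join(difflib.unified_diff(a, b, f"{path}@{old}", f"{path}@{new}", lineterm=""))


repo = MemoryRepo()
path = "people/daughter.md"  # hypothetical file layout
c0 = repo.commit("conversation 1", {path: "# Daughter\nAge: 9"})
c1 = repo.commit("conversation 2", {path: "# Daughter\nAge: 10"})
c2 = repo.commit("conversation 3", {path: "# <Name>\nRelation: daughter\nAge: 12"})

print(repo.head()[path])        # only the current fact is in the latest snapshot
print(repo.checkout(c0)[path])  # what was known after the first conversation
print(repo.diff(path, c0, c2))  # how the understanding changed
```

A search index built over `repo.head()` would see one file for the daughter, not three competing entries.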

johnisgood No.44970126
I do not necessarily think it is noise, similar to how not all history is noise.
jaktet No.44970744
That’s a pretty good analogy, though: noise is just information that isn’t immediately useful for the current task.

If I need to know someone’s current age, I don’t need to know their past ages.