
198 points | alexmrv | 1 comment

Hey HN! I built a proof-of-concept for AI memory using Git instead of vector databases.

The insight: Git already solved versioned document management. Why are we building complex vector stores when we could just use markdown files with Git's built-in diff/blame/history?

How it works:

- Memories are stored as markdown files in a Git repo
- Each conversation = one commit
- git diff shows how understanding evolves over time
- BM25 for search (no embeddings needed)
- LLMs generate search queries from conversation context

Example: Ask "how has my project evolved?" and it uses git diff to show actual changes in understanding, not just similarity scores.
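For concreteness, here's a minimal sketch of that loop using GitPython and rank-bm25 (both from the stack listed below). The file layout and function names are my own illustration, not DiffMem's actual API:

```python
# Illustrative sketch, not DiffMem's API: one markdown file per memory,
# one commit per conversation, BM25 over the working tree for retrieval.
from pathlib import Path

from git import Repo
from rank_bm25 import BM25Okapi


def write_memory(repo: Repo, name: str, text: str, message: str) -> None:
    """Write (or overwrite) a memory file and record it as one commit."""
    path = Path(repo.working_tree_dir) / f"{name}.md"
    path.write_text(text, encoding="utf-8")
    repo.index.add([str(path)])
    repo.index.commit(message)


def build_index(repo: Repo):
    """Tokenize every markdown memory in the working tree for BM25."""
    docs = sorted(Path(repo.working_tree_dir).glob("*.md"))
    corpus = [d.read_text(encoding="utf-8").lower().split() for d in docs]
    return docs, BM25Okapi(corpus)


def search(docs, bm25: BM25Okapi, query: str, k: int = 3):
    """Rank memories against a keyword query (LLM-generated in practice)."""
    scores = bm25.get_scores(query.lower().split())
    return sorted(zip(docs, scores), key=lambda p: p[1], reverse=True)[:k]


repo = Repo.init("memories")
write_memory(repo, "project", "Working on a git-backed memory PoC.",
             "conversation 2024-01-05")
docs, bm25 = build_index(repo)
print(search(docs, bm25, "git memory project"))
```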

This is very much a PoC - rough edges everywhere, not production-ready. But it's been working surprisingly well for personal use. The entire index for a year of conversations fits in ~100MB of RAM with sub-second retrieval.

The cool part: You can git checkout to any point in time and see exactly what the AI knew then. Perfect reproducibility, human-readable storage, and you can manually edit memories if needed.
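The time-travel read doesn't even need a full checkout: GitPython can pull a file straight out of an old commit's tree. A sketch, assuming a `project.md` memory file like the one above:

```python
# Sketch: read a memory file exactly as it existed at a past commit,
# without touching the working tree (no checkout needed).
from git import Repo

repo = Repo("memories")
# iter_commits is newest-first, so [-1] is the oldest commit touching the file
oldest = list(repo.iter_commits(paths="project.md"))[-1]
blob = oldest.tree / "project.md"
print(blob.data_stream.read().decode("utf-8"))
```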

GitHub: https://github.com/Growth-Kinetics/DiffMem

Stack: Python, GitPython, rank-bm25, OpenRouter for LLM orchestration. MIT licensed.

Would love feedback on the approach. Is this crazy or clever? What am I missing that will bite me later?

1. bob1029 | No.44970308
I think BM25 is the right path for most use cases. I'd reach for a single Lucene index here, making the commit hash one of each document's indexed fields. That would make FTS across the whole history nearly instant. I'd store the index in an ignored folder right next to .git.
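Since the thread's stack is Python, here's a rough sketch of that idea using Whoosh, a Lucene-style pure-Python engine with BM25 scoring, in place of Lucene proper. Each indexed document is one (commit, file) pair, so full-text search across all of history becomes a single query; the schema and index location are illustrative:

```python
# Rough sketch of the commit-indexed FTS idea, using Whoosh (a
# Lucene-style pure-Python engine; Whoosh scores with BM25F by default).
# Each document is one (commit, file) pair. A real version would dedupe
# unchanged files by blob SHA instead of re-indexing them per commit.
from pathlib import Path

from git import Repo
from whoosh import index
from whoosh.fields import ID, TEXT, Schema
from whoosh.qparser import QueryParser

schema = Schema(commit=ID(stored=True), path=ID(stored=True), content=TEXT)

repo = Repo("memories")
idx_dir = Path(repo.working_tree_dir) / ".index"   # git-ignored, next to .git
idx_dir.mkdir(exist_ok=True)
ix = index.create_in(str(idx_dir), schema)

writer = ix.writer()
for commit in repo.iter_commits():
    for obj in commit.tree.traverse():
        if obj.type == "blob" and obj.path.endswith(".md"):
            writer.add_document(
                commit=commit.hexsha,
                path=obj.path,
                content=obj.data_stream.read().decode("utf-8", "replace"),
            )
writer.commit()

with ix.searcher() as s:
    q = QueryParser("content", ix.schema).parse("project evolution")
    for hit in s.search(q, limit=5):
        print(hit["commit"][:8], hit["path"], round(hit.score, 2))
```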

For code search in particular, BM25 might be a bit overkill and not exactly what you want: it ranks by token relevance, while code search usually wants exact matches. FM-indexes would be a simpler and faster way to implement pure substring search.
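An FM-index proper is a compressed BWT plus rank structures, but the suffix array it's built from already demonstrates index-backed exact substring search, so here's that simpler cousin as a sketch (naive construction; Python 3.10+ for bisect's key parameter):

```python
# Not a real FM-index: this is the suffix array an FM-index compresses.
# It already gives binary-searchable exact substring lookup, which is the
# behavior described above. Naive O(n^2 log n) construction makes this a
# toy; requires Python 3.10+ for bisect's key= parameter.
import bisect


def build_suffix_array(text: str) -> list[int]:
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text: str, sa: list[int], pattern: str) -> list[int]:
    """Start offsets of every exact occurrence of pattern in text."""
    key = lambda i: text[i:i + len(pattern)]
    lo = bisect.bisect_left(sa, pattern, key=key)
    hi = bisect.bisect_right(sa, pattern, key=key)
    return sorted(sa[lo:hi])


text = "git diff shows diffs; git blame shows authorship"
sa = build_suffix_array(text)
print(find_all(text, sa, "git"))  # -> [0, 22]
```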

Maybe having both kinds of search at the same time could work better than either in isolation. You could frame them as "semantic" and "exact" search from the perspective of the LLM tool calls. The prompt could then say things like "for searching the codebase use FunctionA, for searching requirements or issues, use FunctionB."
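Concretely, since the post already routes through OpenRouter, that split could be expressed as two OpenAI-style tool schemas; the names and descriptions here are hypothetical:

```python
# Hypothetical tool schemas exposing the two search modes to the LLM
# (OpenAI-style function calling, as supported by OpenRouter).
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_semantic",
            "description": "BM25 keyword search over memories, requirements, "
                           "and issues. Use for conceptual or topical queries.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_exact",
            "description": "Exact substring search over the codebase. Use for "
                           "identifiers, error messages, and other literals.",
            "parameters": {
                "type": "object",
                "properties": {"pattern": {"type": "string"}},
                "required": ["pattern"],
            },
        },
    },
]
```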