I ended up ditching the usual RAG+embedding route and built a local semantic engine that uses ΔS as a resonance constraint (yeah it sounds crazy, but hear me out).
Still uses local models (Ollama + GGUF)
But instead of just vector search, it enforces semantic logic trees + memory drift tracking (rough sketch of the drift gate below)
Main gain: reduced hallucination in summarization + actual retention of reasoning across files
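If “ΔS as a resonance constraint” sounds abstract, here’s a stripped-down sketch of the drift-gate idea in Python. To be clear: this is illustrative, not the engine’s actual code. The names `DRIFT_LIMIT`, `delta_s`, and `accept_step` are placeholders I’m using for the write-up, the embedding model is just an example, and it assumes a local Ollama server exposing its standard /api/embeddings endpoint:

```python
import requests
import numpy as np

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # default local Ollama endpoint
EMBED_MODEL = "nomic-embed-text"  # example choice; any local embedding model works
DRIFT_LIMIT = 0.35  # hypothetical threshold; needs tuning per corpus

def embed(text: str) -> np.ndarray:
    # One embedding per reasoning step; Ollama responds with {"embedding": [...]}
    resp = requests.post(
        OLLAMA_URL, json={"model": EMBED_MODEL, "prompt": text}, timeout=30
    )
    resp.raise_for_status()
    return np.asarray(resp.json()["embedding"])

def delta_s(prev: np.ndarray, curr: np.ndarray) -> float:
    # Semantic drift between consecutive steps: 1 - cosine similarity
    cos = float(prev @ curr / (np.linalg.norm(prev) * np.linalg.norm(curr)))
    return 1.0 - cos

def accept_step(history: list, step_text: str) -> bool:
    # Gate a new reasoning step: if it drifts too far from the previous
    # step, reject it instead of letting it into memory.
    vec = embed(step_text)
    if history and delta_s(history[-1], vec) > DRIFT_LIMIT:
        return False  # treat as drift / likely hallucination
    history.append(vec)
    return True
```

The threshold is the tricky part: too tight and the model can’t explore, too loose and drift creeps back in.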
Weirdly, the thing that got people to take it seriously was a public endorsement from the guy who wrote tesseract.js (OCR legend). He called the engine’s reasoning “shockingly human-like”, not in benchmark terms but in sustained thought flow.
Still polishing a few parts, but if you’ve ever hit the wall of “why is my LLM helpful but forgetful?”, this might be a route worth peeking into.
(Also happy to share the GitHub PDF if you’re curious — it’s more logic notes than launch page.)