The backend is FastAPI plus a blackboard-style orchestrator (DAG execution, QA loops, judges, memory); the frontend is Next.js with a “sovereign” local-model mode. It’s research-grade and messy around the edges, but I’m trying to explore how to get persistent reasoning runs, contradiction checks, and some lightweight control-theoretic feedback (budgets, hedging, backoff, etc.).
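For context, the core loop is roughly this shape: agents post results to a shared board, the controller picks whichever DAG steps have their dependencies met, and a round cap acts as a crude budget. This is a minimal sketch with made-up names (Blackboard, Step, run_dag), not the actual code:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Blackboard:
    facts: dict = field(default_factory=dict)   # shared working memory
    done: set = field(default_factory=set)      # names of completed steps

@dataclass
class Step:
    name: str
    deps: list[str]                             # DAG edges (names of prerequisite steps)
    run: Callable[[Blackboard], dict]           # reads the board, returns new facts

def run_dag(steps: list[Step], bb: Blackboard, max_rounds: int = 20) -> Blackboard:
    # max_rounds is the crude budget / control knob
    for _ in range(max_rounds):
        ready = [s for s in steps
                 if s.name not in bb.done and all(d in bb.done for d in s.deps)]
        if not ready:
            break
        for step in ready:
            bb.facts.update(step.run(bb))       # agent posts its results to the board
            bb.done.add(step.name)
    return bb
```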
Lots of the internals are done (judges, QA contracts, memory, concurrency limits). Missing pieces: there’s no polished “end-to-end” run(query) API yet, the composition layer is rough, observability is minimal (no real metrics), the factuality judges are weak, and the stability/homeostat parts are mostly stubs.
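The shape I’m aiming for with the missing run(query) call is a single endpoint that takes a query plus a budget and returns the synthesized answer with some accounting. Contract sketch only, all names hypothetical; the body is a stub where plan → execution → synthesis would slot in:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class RunRequest(BaseModel):
    query: str
    budget_tokens: int = 50_000      # hard cap the controller would enforce

class RunResponse(BaseModel):
    answer: str
    steps_completed: int
    budget_spent: int

@app.post("/run", response_model=RunResponse)
async def run(req: RunRequest) -> RunResponse:
    # Stub body: the real version would do plan -> execute DAG -> synthesize,
    # threading the token budget through each stage as the control signal.
    return RunResponse(
        answer=f"(stub) no answer yet for: {req.query}",
        steps_completed=0,
        budget_spent=0,
    )
```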
Question: has anyone here tried building similar long-horizon / blackboard-style systems for LLM reasoning? Did you find a clean way to tie together plan → execution → synthesis without everything turning into spaghetti orchestration?