Building a Personal AI Factory

I am experimenting with a similar workflow and thought I'd share my experience.

I might be a little too hung up on the details compared to a lot of the agent cluster testimonials I've read, but unlike the author I'll be open about it: the codebase I work on is several hundred thousand lines of Go and currently serves a high five- to low six-figure number of real B2C users. Performance requirements are forgiving, but correctness and reliability are very important. Finance.

Currently I use a very basic setup of scripts that clone a repo, configure an agent, and then run it against a prompt in a tmux session. I rely mainly on codex-cli since I am only given an OpenAI key to work with. The codex instances ping me in my system notifications when it's my turn, and I can easily quake-mode my terminal into view and then attach to the session (with a bit of help from fzf). I haven't gotten into MCP yet but it's on my radar.
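A minimal sketch of what a launcher along these lines could look like, written in Go rather than shell. Everything named here is illustrative: the `Task` type, the `Plan` function, the example repo URL, and the work directory are all hypothetical, and passing the prompt to `codex` as a single positional argument is an assumption. The agent configuration step is left out. The sketch only builds the `git clone` and `tmux new-session` command lines and prints them, so nothing is executed.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Task describes one agent run. The type and all field names are hypothetical.
type Task struct {
	Name     string   // used for the checkout directory and the tmux session name
	RepoURL  string   // repository to clone
	WorkRoot string   // parent directory for throwaway checkouts
	AgentCmd []string // agent command line, supplied by the caller
}

// Plan returns the commands for one run, in order, without executing them.
func Plan(t Task) [][]string {
	dir := filepath.Join(t.WorkRoot, t.Name)
	clone := []string{"git", "clone", t.RepoURL, dir}
	// tmux new-session flags: -d detached, -s session name, -c start directory.
	// Any trailing arguments are the command tmux runs inside the session.
	session := []string{"tmux", "new-session", "-d", "-s", t.Name, "-c", dir}
	session = append(session, t.AgentCmd...)
	return [][]string{clone, session}
}

func main() {
	t := Task{
		Name:     "fix-typo",
		RepoURL:  "git@example.com:org/repo.git", // placeholder URL
		WorkRoot: "/tmp/agents",
		// Assumption: the agent CLI takes the prompt as one positional argument.
		AgentCmd: []string{"codex", "fix the typo in the README"},
	}
	for _, c := range Plan(t) {
		fmt.Println(strings.Join(c, " "))
	}
}
```

Keeping one fresh clone and one tmux session per task is what makes each run disposable: attaching later is just `tmux attach -t <name>`, and cleanup is deleting the directory.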

I can sort of see the vision. For those small but distracting tasks, the agents are very helpful, and I now (mostly) passively produce a lot more small PRs to clean up papercuts around our codebase. The "cattle not pets" mentality remains relevant - I just fire off a quick prompt when I feel the urge to get sidetracked on something minor.

I haven't gotten as much out of them for more involved tasks. Maybe I don't have enough of a context flywheel going yet, but I have to intervene most of the time. Even when a change works, I always read the generated code first and edit it for taste before submitting it for code review, since I still view the output as entirely my responsibility.

I still mostly micromanage the change control process too (branching, committing, and pushing). I've dabbled in tools that can automate this but haven't gotten around to adopting one.
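For illustration, the branch, commit, and push steps can be written down as a fixed sequence of git commands. This is only a sketch under stated assumptions: `publishSteps` and `run` are hypothetical helpers, the remote is assumed to be named `origin`, and the branch name and commit message are made up. `main` only prints the steps as a dry run, which leaves room to read the diff before anything is pushed.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// publishSteps returns the git commands, in order, that put the current
// working-tree changes on a new branch and push that branch.
// Assumption: the remote is called "origin".
func publishSteps(branch, message string) [][]string {
	return [][]string{
		{"git", "checkout", "-b", branch},
		{"git", "add", "--all"},
		{"git", "commit", "-m", message},
		{"git", "push", "--set-upstream", "origin", branch},
	}
}

// run executes the steps inside dir and stops at the first failure.
func run(dir string, steps [][]string) error {
	for _, s := range steps {
		cmd := exec.Command(s[0], s[1:]...)
		cmd.Dir = dir
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			return fmt.Errorf("%v: %w", s, err)
		}
	}
	return nil
}

func main() {
	// Dry run: print the steps instead of calling run.
	for _, s := range publishSteps("agent/fix-typo", "Fix typo in README") {
		fmt.Println(s)
	}
}
```

Swapping the loop in `main` for a call to `run` with the checkout directory would execute the same steps for real.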

The "fix the inputs, not the outputs" mindset resonates with me 100% as well. It's incredibly powerful even without AI, and our industry has been slowly but surely adopting it in more places (static typing, DevOps, IaC, etc.). With nondeterministic processes like LLMs, though, it feels a lot harder to achieve, more like a practice than a science.