
Building a Personal AI Factory

(www.john-rush.com)
260 points by derek | 9 comments
steveklabnik No.44438330
I'd love to see more specifics here, that is, how Claude and o3 talk to each other, an example session, etc.
replies(3): >>44438361, >>44438366, >>44438730
1. schmookeeg No.44438730
I use Zen MCP and OpenRouter. Every once in a while, my instance of Claude Code will "phone a friend" and use Gemini for a code review. Often it's unprompted; sometimes it's me asking for "analysis" or "ultrathink" about a thorny feature when I doubt the proposed implementation will work out, or suspect it will cause footguns.
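(If you want to try the wiring yourself: it's roughly an entry in the project's `.mcp.json` like the sketch below. The launch command and env var name here are from memory, not gospel, so check the Zen MCP README for the exact invocation, and the API key is obviously a placeholder.)

```json
{
  "mcpServers": {
    "zen": {
      "command": "uvx",
      "args": ["zen-mcp-server"],
      "env": {
        "OPENROUTER_API_KEY": "sk-or-YOUR-KEY-HERE"
      }
    }
  }
}
```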

It's wild to see in action when it's unprompted.

For planning, I usually do a trip out to Gemini to check our work, offer ideas, research, and ratings of completeness. The iterations seem to be helpful, at least to me.

Everyone in these sorta threads asks for "proofs" and I don't really know what to offer. It's like 4 cents for a second opinion on what Claude's planning has cooked up, and the detailed responses have been interesting.

I loaded 10 bucks onto OpenRouter last month and I think I've pulled it down by like 50 cents. Meanwhile I'm on Claude Max @ $200/mo and ChatGPT Plus for another $20. The OpenRouter stuff seems like less than couch change.

$0.02 :D

replies(3): >>44438830, >>44438942, >>44439781
2. conradev No.44438830
proof -> show the code if you can!

Then engineers can judge for themselves

replies(1): >>44438882
3. schmookeeg No.44438882
Yeahhhhhh I've been to enough code reviews / PR reviews to know this will result in 100 opinions about what color the drapes should be and what a catastrophe we've vibe coded for ourselves. If I shoot something to GH I'll highlight it for others, but nothing yet. I can appreciate this makes me look like I'm shilling.

It makes usable code for my projects. It often gets into the weeds and makes weird tesseracts of nonsense that I need to discover, tear down, and re-prompt it to not do that again.

It's cheap or free to try. It saves me time, particularly in languages I am not used to daily driving. Funnily enough, I get madder when I have it write ts/py/sql code since I'm most conversant in those, but for fringe stuff that I find tedious like AWS config and tests -- it mostly just works.

Will it rot my brain? Maybe? If this thing turns me from an engineer to a PM, well, I'll have nobody to blame but myself as I irritate other engineers and demand they fibonacci-size underdefined jira tix. :D

I think there's going to be a lot of momentum in this direction in the coming year. I'm fortunate that my clients embrace this stuff and we all look for the same hallucinations in the codebase and shut them down and laugh together, but I worry that I'm not exactly justifying my rate by being an LLM babysitter.

4. steveklabnik No.44438942
It’s not about proof: it’s that at this point I’m a fairly heavy Claude Code user and I’d like to up my game, but I’m not so up on many of these details that I can figure out how to give this a try just from the description. I’m already doing plan-up-front workflows with just Claude, but haven’t figured out some of this more advanced stuff.

I have two MCPs installed (playwright and context7) but it never seems like Claude decides to reach for them on its own.

I definitely appreciate why you’re not posting code, as you said in another comment.

replies(1): >>44440099
5. Uehreka No.44439781
> Everyone in these sorta threads asks for "proofs" and I don't really know what to offer

I’ve tried building these kinds of multi-agent systems a couple of times, and I’ve found that there’s a razor-thin edge between a nice “humming along” system I feel good about and a “car won’t start” system where the first LLM refuses to properly output JSON and then the rest of them start reading each other’s <think> thoughts.

The difference seems to often come down to:

- Which LLM wrappers are you using? Are they using/exposing features like MCP, tools and chain-of-thought correctly for the particular models you’re using?

- What are your prompts? What are the 5 bullet points with capital letters that need to be in there to keep things in line? Is there a trick to getting certain LLMs to actually use the available MCP tools?

- Which particular LLM versions are you using? I’ve heard people say that Claude Sonnet 4 is actually better than Claude Opus 4 sometimes, so it’s not always an intuitive “pick the best model” kind of thing.

- Is your system capable of “humming along” for hours or is this a thing where you’re doing a ton of copy-paste between interfaces? If it’s the latter then hey, whatever works for you works for you. But a lot of people see the former as a difficult-to-attain Holy Grail, so if you’ve figured out the exact mixture of prompts/tools that makes that happen people are gonna want to know the details.
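To make the “car won’t start” failure concrete: this is the kind of defensive glue I’ve ended up writing between agents. Everything here is my own sketch, not anyone's published method; `call_model` stands in for whatever client wrapper you use, and the retry wording is just one thing that's worked for me.

```python
import json
import re

# Some models leak chain-of-thought into the reply; strip it before parsing.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def parse_agent_json(raw: str):
    """Remove leaked <think> blocks, then pull the first JSON object out of the reply."""
    cleaned = THINK_RE.sub("", raw)
    # Models often wrap JSON in markdown fences or chatty preamble,
    # so grab the outermost brace-delimited span rather than parsing the whole string.
    match = re.search(r"\{.*\}", cleaned, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object in model output")
    return json.loads(match.group(0))

def ask_for_json(call_model, prompt: str, retries: int = 3):
    """Retry the call, feeding the parse error back so the model can self-correct."""
    for _ in range(retries):
        raw = call_model(prompt)
        try:
            return parse_agent_json(raw)
        except ValueError as err:  # json.JSONDecodeError is a subclass of ValueError
            prompt += f"\n\nYour last reply was not valid JSON ({err}). Reply with JSON only."
    raise RuntimeError("model never produced valid JSON")
```

It’s boring code, but whether the pipeline hums or stalls often comes down to exactly this layer.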

The overall wisdom in the post about inputs mattering more than outputs etc is totally spot on, and anyone who hasn’t figured that out yet should master that before getting into these weeds. But for those of us who are on that level, we’d love to know more about exactly what you’re getting out of this and how you’re doing it.

(And thanks for the details you’ve provided so far! I’ll have to check out Zen MCP)

6. Aeolun No.44440099
> I have two MCPs installed (playwright and context7) but it never seems like Claude decides to reach for them on its own.

Not even when you add ‘memories’ that tell it to always use those tools in certain situations?

My admonitions to always run repomix at the start of coding, and always run the build command before crying victory seem to be followed pretty well anyway.
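(Concretely, those “memories” are just standing instructions in the project’s CLAUDE.md. The exact wording below is mine, not a quote from anyone’s actual file:)

```markdown
# CLAUDE.md
- Before writing any code, run `repomix` and read the packed output for an overview.
- Use the playwright MCP tools for anything that needs a browser.
- Use context7 to look up current library docs before calling an unfamiliar API.
- Always run the build command before declaring a task done.
```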

replies(2): >>44440327, >>44440488
7. steveklabnik No.44440327
I have not done that, maybe that's the missing bit. Thanks!
8. manmal No.44440488
What do you tell Claude to do with repomix? Get an overview into the context?
replies(1): >>44451220
9. Aeolun No.44451220
Yeah, it’s just a shortcut to it exploring the code for half an hour before doing something. At least it seems to make its searching more targeted.