
97 points jay-baleine | 1 comment
CuriouslyC ◴[] No.45149068[source]
The most important thing is to have a strong plan cycle in front of your agent work; if you do that, agents are very reliable. You need a deep research cycle that collects a covering set of code that might need to be modified for a feature, feeds it into gemini/gpt5 to get a broad codebase-level understanding, then runs a debate cycle on how to address it, with the final artifact being a hyper-detailed plan that goes file by file and outlines the changes required.
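The cycle above can be sketched as a pipeline. This is a hypothetical illustration, not a real tool: the function names, the substring-match "research" heuristic, and the `understand`/`debate` callables (standing in for gemini/gpt5 calls) are all made up for the sketch.

```python
def collect_covering_set(feature: str, repo_files: dict[str, str]) -> dict[str, str]:
    """Deep-research step: gather every file that might need changes.
    (Toy heuristic: real research would use search, imports, call graphs.)"""
    return {path: src for path, src in repo_files.items() if feature in src}

def plan_feature(feature, repo_files, understand, debate) -> dict[str, str]:
    """Research -> codebase understanding -> debate -> file-by-file plan."""
    covering_set = collect_covering_set(feature, repo_files)
    context = understand(covering_set)   # e.g. a gemini/gpt5 summarization call
    decisions = debate(context)          # debate cycle over candidate approaches
    # Final artifact: an outline of required changes, file by file.
    return {path: f"{decisions}: outline changes in {path}" for path in covering_set}
```

The point of the structure is that the plan, not the agent's working memory, carries the codebase-level understanding into the implementation phase.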

Beyond this, you need to maintain good test coverage, and you need to have agents red-team your tests aggressively to make sure they're robust.

If you implement these two steps, your agent performance will skyrocket. The planning phase will produce plans that claude can iterate on for 3+ hours in some cases, if you tell it to complete the entire task in one shot, and the robust test validation / change-set analysis will catch agents that solved an easier problem because they got frustrated, or that didn't follow directions.

replies(2): >>45149253 #>>45156622 #
rapind ◴[] No.45156622[source]
> The planning phase will produce plans that claude can iterate on for 3+ hours in some cases, if you tell it to complete the entire task in one shot, and the robust test validation / change-set analysis will catch agents that solved an easier problem because they got frustrated, or that didn't follow directions.

Don't you run into context nightmares, though? I was coming up with very detailed plans (using zen to vet them with other models), but I found claude just doing the wrong thing a lot of the time, ignoring and/or forgetting very specific instructions and rules, especially across context compactions.

There's this one time that really sticks out in my mind because I had to constantly correct it: when to use ->> versus -> and how to handle null/type checks with PostgreSQL JSONB. Vibe coders would miss this sort of thing in testing unless they knew that JSONB null is not the same as SQL NULL (and similar mismatches for other types). When working with nested data, you probably won't have test coverage for it. This is just one of many examples too.
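For anyone who hasn't hit this: in PostgreSQL, `->` returns a `jsonb` value while `->>` returns `text`, and a JSON null is a real `jsonb` value, not SQL NULL. A minimal illustration (the `events`/`payload` table at the end is hypothetical):

```sql
-- '->' returns jsonb; a JSON null is NOT SQL NULL
SELECT '{"a": null}'::jsonb -> 'a' IS NULL;   -- false: it's the jsonb value 'null'
SELECT '{"a": null}'::jsonb ->> 'a' IS NULL;  -- true: ->> maps JSON null to SQL NULL

-- a missing key, by contrast, yields SQL NULL with either operator
SELECT '{"b": 1}'::jsonb -> 'a' IS NULL;      -- true

-- so null checks on nested data must pick the right operator
SELECT payload -> 'user' ->> 'email' AS email
FROM events
WHERE payload -> 'user' ->> 'email' IS NOT NULL;
```

An `IS NULL` check written against `->` output will silently pass JSON nulls through, which is exactly the kind of bug tests won't catch unless they cover nested-null data.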

replies(2): >>45157345 #>>45158782 #
1. CuriouslyC ◴[] No.45158782[source]
The key is to give each step very detailed instructions, and to tell claude to dispatch the appropriate domain-expert subagent for each step with the specific instructions for that step. That keeps the root context hot, and each subagent only gets the instructions it needs and starts with a fresh context.
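The dispatch pattern might look like this sketch. `run_subagent` and the step format are illustrative stand-ins, not Claude's real API; the structural point is that the root loop holds only the step list while each step's detailed instructions live in a throwaway subagent context.

```python
def run_subagent(role: str, instructions: str) -> str:
    """Stand-in for dispatching a domain-expert subagent with a fresh context."""
    return f"[{role}] completed: {instructions}"

def execute_plan(steps: list[dict]) -> list[str]:
    """Root agent keeps only the step list in its context; each step's
    detailed instructions go to a fresh subagent and never accumulate
    in (or get compacted out of) the root context."""
    results = []
    for step in steps:
        results.append(run_subagent(step["expert"], step["instructions"]))
    return results
```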