97 points jay-baleine | 1 comment
CuriouslyC ◴[] No.45149068[source]
The most important thing is to have a strong plan cycle in front of your agent work; if you do that, agents are very reliable. You need a deep research cycle that collects a covering set of code that might need to be modified for a feature, feeds it into gemini/gpt5 to get a broad codebase-level understanding, then runs a debate cycle on how to address it, with the final artifact being a hyper-detailed plan that goes file by file and outlines the changes required.
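
In pseudocode, that cycle might look something like this (collect_covering_set and llm_call are hypothetical stand-ins for whatever search and model-routing tooling you use, not any particular product's API):

    # Sketch of the plan cycle: research -> broad understanding -> debate -> plan.
    def plan_feature(issue: str, repo_root: str) -> str:
        # 1. Deep research: gather every file that might need to change.
        covering_set = collect_covering_set(issue, repo_root)  # hypothetical: grep/embedding search

        # 2. Broad understanding: feed the covering set to a large-context model.
        context = "\n\n".join(open(p).read() for p in covering_set)
        understanding = llm_call("gemini", f"Explain how these files relate to: {issue}\n{context}")

        # 3. Debate cycle: two models critique each other's approach.
        proposal = llm_call("gpt5", f"Propose an approach for: {issue}\n{understanding}")
        critique = llm_call("gemini", f"Find the flaws in this approach:\n{proposal}")

        # 4. Final artifact: a file-by-file plan the coding agent executes.
        return llm_call("gpt5", f"Write a file-by-file plan for: {issue}\n"
                                f"Approach: {proposal}\nCritique to address: {critique}")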

Beyond this, you need to maintain good test coverage, and you need to have agents red-team your tests aggressively to make sure they're robust.
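
One concrete way to red-team tests is mutation-style: have an agent plant a plausible bug and verify the suite actually fails. A rough sketch (llm_call and the prompt wording are assumptions, not the actual setup):

    import subprocess

    def red_team_tests(source_file: str) -> bool:
        original = open(source_file).read()
        # Ask an agent for a subtle, realistic bug (a "mutant" of the file).
        mutated = llm_call("claude",
            "Introduce one subtle, realistic bug. Return only the code:\n" + original)
        open(source_file, "w").write(mutated)
        try:
            # A robust suite should fail (non-zero exit) against the mutant.
            result = subprocess.run(["pytest", "-q"], capture_output=True)
            return result.returncode != 0  # True = the tests caught the bug
        finally:
            open(source_file, "w").write(original)  # always restore the original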

If you implement these two steps, your agent performance will skyrocket. The planning phase will produce plans that claude can iterate on for 3+ hours in some cases if you tell it to complete the entire task in one shot, and the robust test validation / change-set analysis will catch agents that solved an easier problem because they got frustrated, or that didn't follow directions.

replies(2): >>45149253 #>>45156622 #
skydhash ◴[] No.45149253[source]
By that point I would have already produced the 20-line diff for the ticket. Huge commits (or change requests) are usually scaffolding, refactoring, or design changes to support new features. You also get generated code and verbose languages like CSS. In other words, stuff where the more knowledge you have of the code, the faster you can be.

The daily struggle was always those 10-line diffs where you have to learn a lot (from the stakeholder, by debugging, from the docs).

replies(1): >>45149537 #
CuriouslyC ◴[] No.45149537[source]
A deep plan cycle will find stuff like this, because it's looking at the whole relevant portion of your codebase at once (and optionally the web, your internal docs, etc). It'll just generate a very short plan for the agent.

The important thing is that this process is entirely autonomous. You create an issue, which hooks the planners; the completion of a plan artifact hooks a test implementer; the completion of tests hooks the code implementer(s) (cheaper models generating multiple solutions and taking the best diff works well); and the completion of a solution + PR hooks code/security review, test red-teaming, etc.
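
At its simplest, the hand-offs can be a table mapping artifact-completion events to the next stage (the stage names mirror the ones above; spawn_agent is a hypothetical launcher, not a real API):

    # Event-driven pipeline: each completed artifact triggers the next stage.
    PIPELINE = {
        "issue.created":   ["planner"],
        "plan.completed":  ["test_implementer"],
        "tests.completed": ["code_implementer"],  # fan out to N cheap models, keep best diff
        "pr.opened":       ["code_review", "security_review", "test_red_team"],
    }

    def on_event(event: str, payload: dict) -> None:
        for stage in PIPELINE.get(event, []):
            spawn_agent(stage, payload)  # hypothetical: launch agent with the artifact as input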

replies(1): >>45181683 #
bit_bear ◴[] No.45181683[source]
What do those hooks look like at a low level? Does a script polling some ticket queue trigger the planner? Is the handoff done with watchman, triggering agents on .md files dropped in certain directories?
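
For what it's worth, a file-drop hand-off along those lines could be as small as this (using Python's watchdog library in place of watchman; the run-agent CLI is a made-up placeholder):

    import subprocess
    from watchdog.observers import Observer
    from watchdog.events import FileSystemEventHandler

    class PlanDropHandler(FileSystemEventHandler):
        def on_created(self, event):
            # A new .md plan artifact in the watched directory triggers the next agent.
            if event.src_path.endswith(".md"):
                subprocess.Popen(["run-agent", "--plan", event.src_path])  # placeholder CLI

    observer = Observer()
    observer.schedule(PlanDropHandler(), "artifacts/plans/", recursive=False)
    observer.start()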