Gotcha - and yep, I ran a small informal qualitative experiment internally with our AI head. It lines up with what I was writing about the benefit being throughput after a 2-3 week investment:
* Setup: I did a whole-cloth, clean-room, production-grade rewrite of an MCP server that the AI head had previously prototyped. I intentionally went all-in on this "first serious" agentic coding project: ultimately < 100 lines of code were manually edited, while the final AI-generated PR stack was large.
* Baseline: If both of us had done the work manually, I had already estimated similar times to completion, since the differences in task scope naturally matched our differences in proficiency. Despite being new to agentic coding at the time, key aspects were in my favor: senior dev, ~2 years of near-daily prompt engineering experience (tactics), a PhD in code synthesis (strategy), and a repo set up with various lint/type/test guardrails (sketched after this list).
* Result: As a junior vibes coder, I finished in about the same 1-2 weeks estimated for the manual coder.
* Experience: The first 1-2 weeks were clearly slow due to onboarding both myself and the codebase to agentic coding. That first week was especially rough: while I could get long runs going, it was week 2 before I was doing them confidently and moving on to figuring out parallel agents. Near the end, I was doing multiple long runs in parallel on different tasks. I could maintain maybe 2-4, but managing more gets exhausting, especially when any of them are shorter runs.
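For concreteness on the guardrails point: a minimal sketch of the kind of setup I mean, with the specific tools (ruff, mypy, pytest via pre-commit) as illustrative stand-ins rather than our exact stack. The point is that every agent edit gets fast, automatic lint/type/test feedback:

```yaml
# .pre-commit-config.yaml - illustrative guardrails; tool choices are stand-ins
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.9
    hooks:
      - id: ruff          # lint: flags agent-introduced errors immediately
      - id: ruff-format   # keeps diffs reviewable across long runs
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.11.2
    hooks:
      - id: mypy          # type guardrail: type-broken code can't land
  - repo: local
    hooks:
      - id: pytest
        name: pytest
        entry: pytest -q  # test guardrail: run the suite before each commit
        language: system
        pass_filenames: false
```

Guardrails like these let an agent self-correct mid-run instead of me catching problems in interactive review, which is a big part of what makes long runs viable.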
Separately, we have a genAI team and a GPU Python team both switching to agentic coding. Gauged by the ability to do long runs, the natural difference in prompt engineering experience across the teams seems to have individuals on one team picking it up faster than the other.
The initial experiment is part of why I view current agentic coding as being about overall coding throughput, not latency within any specific task. If someone can reliably trigger long runs, run multiple in parallel, and avoid wasting time on interactive chatting, the difference is stark.
Likewise, reinforced by both of the above cases, looking for throughput improvements in the first 2-3 weeks seems aggressive. A lot of blog posts seem to be from people new to high-quality prompting, tackling repos/tasks not set up for agents, and working in limited nights-and-weekends efforts. To get clear throughput multipliers from multiple agents working in parallel on good long runs... I'd expect that to hit at month 2 or 3 for someone who isn't as all-in as I was able to be. It's more like a skill plus setup, so it takes investment before you reap the rewards.