If you're short on time, I'd recommend just reading the linked blogpost or the announcement thread here [1], rather than the full paper.
If the instruction is just "implement this ticket with AI", then that's very realistic in that it's how management often tries to operate, but it's also likely to be quite suboptimal. There are ways to use AI that help a lot, and other ways that hurt more than they help.
If your developers had sufficient experience with AI to tell the difference, then they might have compensated for that, but reading the paper I didn't see any indication of that.
That being said, we can't rule out that the experiment drove them to use more AI than they would have outside of the experiment (in a way that made them less productive). You can see more in the section "Experimentally driven overuse of AI (C.2.1)" [1].
[1] https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf