
688 points by dheerajvs | 5 comments
1. kokanee No.44523013
> developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%.

I feel like there are two challenges causing this. One is that it's difficult to get good data on how long the same person in the same context would have taken to do a task without AI vs with. The other is that it's tempting to time an AI with metrics like how long until the PR was opened or merged. But the AI workflow fundamentally shifts engineering hours so that a greater percentage of time is spent on refactoring, testing, and resolving issues later in the process, including after the code was initially approved and merged. I can see how it's easy for a developer to report that AI completed a task quickly because the PR was opened quickly, discounting the amount of future work that the PR created.
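
(To make that measurement gap concrete, here's a minimal sketch in Python. The event fields and the followup_hours estimate are hypothetical bookkeeping for illustration, not anything from the study.)

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class TaskRecord:
        started: datetime          # when work on the task began
        pr_opened: datetime        # when the PR went up (the "fast" metric)
        pr_merged: datetime        # when the PR was merged
        followup_hours: float = 0  # later fixes/refactors traced back to this PR

    def time_to_pr_hours(t: TaskRecord) -> float:
        # The optimistic metric: the task looks done once the PR is opened.
        return (t.pr_opened - t.started).total_seconds() / 3600

    def total_cost_hours(t: TaskRecord) -> float:
        # Wall-clock time from start to merge, plus downstream rework.
        return (t.pr_merged - t.started).total_seconds() / 3600 + t.followup_hours

    # A PR opened after 2 hours, merged a day later, with 3 hours of follow-up work.
    t = TaskRecord(
        started=datetime(2025, 3, 1, 9, 0),
        pr_opened=datetime(2025, 3, 1, 11, 0),
        pr_merged=datetime(2025, 3, 2, 11, 0),
        followup_hours=3.0,
    )
    print(time_to_pr_hours(t), total_cost_hours(t))  # 2.0 vs 29.0

The two numbers answer different questions: the first is how fast the PR appeared, the second is what the task actually cost.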

replies(4): >>44523132, >>44523767, >>44523857, >>44524518
2. qsort No.44523132
It's really hard to attribute productivity gains or losses to specific technologies or practices. I'm very wary of self-reported anecdotes in any direction, precisely because it's so easy to fool ourselves.

I'm not making any claim in either direction, and the authors themselves recognize the study's limitations. I'm just saying that everyone should have far greater error bars. This technology is the weirdest shit I've seen in my lifetime; making deductions about productivity from anecdotes and dubious benchmarks is basically reading tea leaves.

3. yorwba No.44523767
Figure 21 shows that initial implementation time (which I take to be time to PR) increased as well, although post-review time increased even more (but doesn't seem to have a significant impact on the total).

But Figure 18 shows that time spent actively coding decreased (which might be where the feeling of a speed-up was coming from), and the gains were eaten up by time spent prompting, waiting for and then reviewing the AI output, and generally being idle.

So maybe it's not a good idea to use LLMs for tasks that you could've done yourself in under 5 minutes.
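
(A toy illustration of that decomposition, using made-up minutes per task rather than the paper's actual figures: less time spent actively coding can coexist with a longer total.)

    # Hypothetical minutes spent per task, without vs. with AI. The numbers are
    # invented to show how coding time can shrink while the total grows.
    without_ai = {"coding": 60, "testing": 20, "review": 10}
    with_ai = {"coding": 35, "testing": 20, "review": 15,
               "prompting": 15, "waiting_on_ai": 15, "reviewing_ai_output": 10}

    total_without = sum(without_ai.values())   # 90 minutes
    total_with = sum(with_ai.values())         # 110 minutes
    print(total_without, total_with)
    print(f"coding share with AI: {with_ai['coding'] / total_with:.0%}")  # ~32%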

4. narush No.44523857
Qualitatively, we don't see a drop in PR quality between the AI-allowed and AI-disallowed conditions in the study; the devs who participated are generally excellent, know their repositories' standards super well, and aren't really into the 'put up a bad PR' vibe. The median review time on the PRs in the study is about a minute.

Developers definitely spend their time differently, though; this is a great callout! On page 10 of the paper [1], you can see a breakdown of how developers spend time with AI vs. without. In general, when these devs have AI, they spend a smaller % of their time writing code and a larger % of their time working with AI (which... makes sense).

[1] https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

5. gitremote No.44524518
> I feel like there are two challenges causing this. One is that it's difficult to get good data on how long the same person in the same context would have taken to do a task without AI vs with.

The standard experimental design that solves this is to randomly assign participants to the experimental group (with AI) and the control group (without AI), which is what they did. Randomization isolates the variable of interest (with or without AI) by averaging out uncontrollable individual, contextual, and environmental differences, so you don't need to know how a single individual in a single context would have behaved in the other group. With a large enough sample size and effect size, you can establish statistical significance and attribute the observed difference to the with-or-without-AI variable.
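
(A minimal sketch of that design, using simulated completion times rather than the study's data; scipy's two-sample t-test stands in here for whatever test the authors actually used.)

    import random
    from scipy import stats

    random.seed(0)

    # Randomly assign participants to the with-AI and without-AI conditions.
    participants = [f"dev-{i}" for i in range(40)]
    random.shuffle(participants)
    with_ai, without_ai = participants[:20], participants[20:]

    # Simulated task completion times in hours (placeholder parameters, not data).
    times_with_ai = [random.gauss(8.0, 2.0) for _ in with_ai]
    times_without_ai = [random.gauss(6.5, 2.0) for _ in without_ai]

    # Randomization averages out individual and context differences in expectation,
    # so a between-groups test can attribute the gap to the AI condition.
    statistic, p_value = stats.ttest_ind(times_with_ai, times_without_ai)
    print(f"t = {statistic:.2f}, p = {p_value:.3f}")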