Measuring the impact of AI on experienced open-source developer productivity

(metr.org)

688 points dheerajvs | 2 comments | 10 Jul 25 16:29 UTC

simonw ◴[10 Jul 25 17:36 UTC] No.44523442[source]▶

Here's the full paper, which has a lot of details missing from the summary linked above: https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

My personal theory is that getting a significant productivity boost from LLM assistance and AI tools has a much steeper learning curve than most people expect.

This study had 16 participants, with a mix of previous exposure to AI tools - 56% of them had never used Cursor before, and the study was mainly about Cursor.

They then had those 16 participants work on issues (about 15 each), where each issue was randomly assigned a "you can use AI" vs. "you can't use AI" rule.

So each developer worked on a mix of AI-tasks and no-AI-tasks during the study.

A quarter of the participants saw increased performance, 3/4 saw reduced performance.

One of the top performers for AI was also someone with the most previous Cursor experience. The paper acknowledges that here:

> However, we see positive speedup for the one developer who has more than 50 hours of Cursor experience, so it's plausible that there is a high skill ceiling for using Cursor, such that developers with significant experience see positive speedup.

My intuition here is that this study mainly demonstrated that the learning curve on AI-assisted development is high enough that asking developers to bake it into their existing workflows reduces their performance while they climb that learning curve.

grey-area ◴[10 Jul 25 18:25 UTC] No.44524005[source]▶

>>44523442 #

Well, there are two possible interpretations here of 75% of participants (all of whom had some experience using LLMs) being slower using generative AI:

LLMs have a very steep and long learning curve as you posit (though note the points from the paper authors in the other reply).

Current LLMs just are not as good as they are sold to be as a programming assistant and people consistently predict and self-report in the wrong direction on how useful they are.

1. burnte ◴[10 Jul 25 20:27 UTC] No.44525186[source]▶

>>44524005 #

> Current LLMs just are not as good as they are sold to be as a programming assistant and people consistently predict and self-report in the wrong direction on how useful they are.

I would argue you don't need the "as a programming assistant" phrase: in my experience over the past 2 years, literally every single AI tool is massively oversold as to its utility. I've literally not seen a single one that delivers on what it's billed as capable of.

They're useful, but right now they need a lot of handholding and I don't have time for that. Too much fact checking. If I want a tool I always have to double check, I was born with a memory so I'm already good there. I don't want to have to fact check my fact checker.

LLMs are great at small tasks. The larger the single task is, or the more tasks you try to cram into one session, the worse they fall apart.
