←back to thread

688 points dheerajvs | 1 comments | | HN request time: 0.49s | source
Show context
simonw ◴[] No.44523442[source]
Here's the full paper, which has a lot of details missing from the summary linked above: https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

My personal theory is that getting a significant productivity boost from LLM assistance and AI tools has a much steeper learning curve than most people expect.

This study had 16 participants, with a mix of previous exposure to AI tools - 56% of them had never used Cursor before, and the study was mainly about Cursor.

They then had those 16 participants work on issues (about 15 each), where each issue was randomly assigned a "you can use AI" v.s. "you can't use AI" rule.

So each developer worked on a mix of AI-tasks and no-AI-tasks during the study.

A quarter of the participants saw increased performance, 3/4 saw reduced performance.

One of the top performers for AI was also someone with the most previous Cursor experience. The paper acknowledges that here:

> However, we see positive speedup for the one developer who has more than 50 hours of Cursor experience, so it's plausible that there is a high skill ceiling for using Cursor, such that developers with significant experience see positive speedup.

My intuition here is that this study mainly demonstrated that the learning curve on AI-assisted development is high enough that asking developers to bake it into their existing workflows reduces their performance while they climb that learing curve.

replies(33): >>44523608 #>>44523638 #>>44523720 #>>44523749 #>>44523765 #>>44523923 #>>44524005 #>>44524033 #>>44524181 #>>44524199 #>>44524515 #>>44524530 #>>44524566 #>>44524631 #>>44524931 #>>44525142 #>>44525453 #>>44525579 #>>44525605 #>>44525830 #>>44525887 #>>44526005 #>>44526996 #>>44527368 #>>44527465 #>>44527935 #>>44528181 #>>44528209 #>>44529009 #>>44529698 #>>44530056 #>>44530500 #>>44532151 #
Aurornis ◴[] No.44525605[source]
> A quarter of the participants saw increased performance, 3/4 saw reduced performance.

The study used 246 tasks across 16 developers, for an average of 15 tasks per developer. Divide that further in half because tasks were assigned as AI or not-AI assisted, and the sample size per developer is still relatively small. Someone would have to take the time to review the statistics, but I don’t think this is a case where you can start inferring that the developers who benefited from AI were just better at using AI tools than those who were not.

I do agree that it would be interesting to repeat a similar test on developers who have more AI tool assistance, but then there is a potential confounding effect that AI-enthusiastic developers could actually lose some of their practice in writing code without the tools.

replies(1): >>44527923 #
1. bluefirebrand ◴[] No.44527923[source]
> potential confounding effect that AI-enthusiastic developers could actually lose some of their practice in writing code without the tools

I don't think this is a confounding effect

This is something that we definitely need to measure and be aware of, if there is a risk of it