688 points | dheerajvs | 5 comments
simonw No.44523442
Here's the full paper, which has a lot of details missing from the summary linked above: https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

My personal theory is that getting a significant productivity boost from LLM assistance and AI tools has a much steeper learning curve than most people expect.

This study had 16 participants, with a mix of previous exposure to AI tools - 56% of them had never used Cursor before, and the study was mainly about Cursor.

They then had those 16 participants work on issues (about 15 each), where each issue was randomly assigned a "you can use AI" vs. "you can't use AI" rule.

So each developer worked on a mix of AI-tasks and no-AI-tasks during the study.

A quarter of the participants saw increased performance; three quarters saw reduced performance.

One of the top performers with AI was also the participant with the most previous Cursor experience. The paper acknowledges that here:

> However, we see positive speedup for the one developer who has more than 50 hours of Cursor experience, so it's plausible that there is a high skill ceiling for using Cursor, such that developers with significant experience see positive speedup.

My intuition here is that this study mainly demonstrated that the learning curve on AI-assisted development is steep enough that asking developers to bake it into their existing workflows reduces their performance while they climb that learning curve.

mjr00 No.44523608
> My intuition here is that this study mainly demonstrated that the learning curve on AI-assisted development is steep enough that asking developers to bake it into their existing workflows reduces their performance while they climb that learning curve.

Definitely. Effective LLM usage is not as straightforward as people believe. Two big things I see a lot of developers do when they share chats:

1. Talk to the LLM like a human. Remember when internet search first came out, and people were literally "Asking Jeeves" in full natural language? Eventually people learned that you don't need to type "What is the current weather in San Francisco?" because "san francisco weather" gave you the same, or better, results. Now we've come full circle and people talk to LLMs like humans again; not out of any advanced prompt engineering, but just because it's so anthropomorphized it feels natural. But I can assure you that "pandas count unique values column 'Foo'" is just as effective an LLM prompt as "Using pandas, how do I get the count of unique values in the column named 'Foo'?" The LLM is also not insulted by you talking to it like this. (A quick pandas sketch of what either prompt gets you follows this list.)

2. Don't know when to stop using the LLM. Rather than let the LLM take you 80% of the way there and then handle the remaining 20% "manually", they'll keep trying to prompt to get the LLM to generate what they want. Sometimes this works, but often it's just a waste of time and it's far more efficient to just take the LLM output and adjust it manually.
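
To make the pandas example in point 1 concrete, here's roughly the kind of answer either phrasing should get you back. The DataFrame df and its contents are just stand-ins:

    import pandas as pd

    # Stand-in DataFrame; substitute whatever you're actually working with.
    df = pd.DataFrame({"Foo": ["a", "b", "a", "c", "b", "a"]})

    # "count unique values" can mean either of these; both are one-liners:
    print(df["Foo"].nunique())       # number of distinct values -> 3
    print(df["Foo"].value_counts())  # count per distinct value  -> a: 3, b: 2, c: 1

The answer is a couple of lines either way; the extra words in the prompt don't buy you anything.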

Much like so-called Google-fu, LLM usage is a skill and people who don't know what they're doing are going to get substandard results.

lukan No.44523782
"But I can assure you that "pandas count unique values column 'Foo'" is just as effective an LLM prompt as "Using pandas, how do I get the count of unique values in the column named 'Foo'?""

How can you be so sure? Did you compare in a systematic way or read papers by people who did it?

Now, I surely get results giving the LLM only snippets and keywords, but for anything complex I do notice differences depending on how I articulate the prompt. I'm not claiming there is a significant difference, but it seems that way to me.

1. mjr00 No.44523882
> How can you be so sure? Did you compare in a systematic way or read papers by people who did it?

No, but I didn't need to read scientific papers to figure out how to use Google effectively, either. I'm just using a results-based analysis after a lot of LLM usage.

2. lukan No.44523986
Well, I did need some tutorials to use Google efficiently in the old days, when + meant something specific.
3. skybrian No.44524159
Other people don't have the benefit of your experience, though, so there's a communication gap here: this boils down to "trust me, bro."

How do we get beyond that?

4. mjr00 No.44524439
This is the gap between capability (what can this tool do?) and workflow (what is the best way to use this tool to accomplish a goal?). Capabilities can be strictly evaluated, but workflow is subjective. Saying "Google has the site: and before: operators" is capability; saying "you should use site:reddit.com before:2020 in Google queries" is workflow.

LLMs have made the distinction ambiguous because their capabilities are so poorly understood. When I say "you should talk to an LLM like it's a computer", that's a workflow statement; it's a more efficient way to accomplish the same goal. You can try it for yourself and see if you agree. I personally liken people who talk to LLMs in full, proper English, capitalization and all, to boomers who still type in full sentences when running a Google query. Is there anything strictly wrong with it? Not really. Do I believe it's a more efficient workflow to just type the keywords that will give you the same result? Yes.
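
If you want to sanity-check that for yourself rather than take my word for it, a rough sketch of the comparison, assuming the OpenAI Python client and a placeholder model name, would be something like:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

    terse = "pandas count unique values column 'Foo'"
    verbose = ("Using pandas, how do I get the count of unique values "
               "in the column named 'Foo'?")

    # Send both phrasings and eyeball the answers yourself.
    for prompt in (terse, verbose):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; use whatever model you have
            messages=[{"role": "user", "content": prompt}],
        )
        print(prompt)
        print(resp.choices[0].message.content)
        print("---")

Not scientific, just the same results-based check I mentioned above.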

Workflow efficiencies can't really be scientifically evaluated. Some people still prefer to have desktop icons for programs on Windows; my workflow is pressing winkey -> typing the first few characters of the program -> enter. Is one of these methods scientifically more correct? Not really.

So, yeah -- eventually you'll either find your own workflow or copy the workflow of someone you see who is using LLMs effectively. It really is "just trust me, bro."

5. skybrian No.44525681
Maybe it would help if more people wrote tutorials? It doesn't seem reasonable to expect people who don't have a buddy to learn from to figure it out on their own.