
688 points dheerajvs | 8 comments
simonw ◴[] No.44523442[source]
Here's the full paper, which has a lot of details missing from the summary linked above: https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

My personal theory is that getting a significant productivity boost from LLM assistance and AI tools has a much steeper learning curve than most people expect.

This study had 16 participants, with a mix of previous exposure to AI tools - 56% of them had never used Cursor before, and the study was mainly about Cursor.

They then had those 16 participants work on issues (about 15 each), where each issue was randomly assigned a "you can use AI" vs. "you can't use AI" rule.

So each developer worked on a mix of AI-tasks and no-AI-tasks during the study.

A quarter of the participants saw increased performance; three quarters saw reduced performance.

One of the top performers for AI was also someone with the most previous Cursor experience. The paper acknowledges that here:

> However, we see positive speedup for the one developer who has more than 50 hours of Cursor experience, so it's plausible that there is a high skill ceiling for using Cursor, such that developers with significant experience see positive speedup.

My intuition here is that this study mainly demonstrated that the learning curve for AI-assisted development is steep enough that asking developers to bake it into their existing workflows reduces their performance while they climb that learning curve.

replies(33): >>44523608 #>>44523638 #>>44523720 #>>44523749 #>>44523765 #>>44523923 #>>44524005 #>>44524033 #>>44524181 #>>44524199 #>>44524515 #>>44524530 #>>44524566 #>>44524631 #>>44524931 #>>44525142 #>>44525453 #>>44525579 #>>44525605 #>>44525830 #>>44525887 #>>44526005 #>>44526996 #>>44527368 #>>44527465 #>>44527935 #>>44528181 #>>44528209 #>>44529009 #>>44529698 #>>44530056 #>>44530500 #>>44532151 #
1. thesz ◴[] No.44525579[source]

  > My personal theory is that getting a significant productivity boost from LLM assistance and AI tools has a much steeper learning curve than most people expect.
This is what I heard about strong type systems (especially Haskell's) about 15-20 years ago.

"History does not repeat, but it rhymes."

If we rhyme "strong types will change the world" with "agentic LLMs will change the world," what do we get?

My personal theory is that we will get the same: some people will get modest-to-substantial benefits there, but changes in the world will be small, if noticeable at all.

replies(2): >>44525751 #>>44525928 #
2. ruszki ◴[] No.44525751[source]
Maybe it depends on the task. I'm 100% sure that if you think a type system is a drawback, then you have never coded in a diverse, large codebase. Our 1.5 million LOC, 30-year-old monolith would be completely unmaintainable without it. But seriously, anything above 10 LOC without a formal type system becomes unmaintainable after a few years. An informal one is fine for a while, but not for long. In 30-year-old code, basically every single informal rule has been broken.

Also, my long experience is that even in the PoC phase, using a type system adds almost zero extra time… provided, of course, that you know the type system, which should be trivial in any case after you've seen a few.

replies(2): >>44529397 #>>44529495 #
3. leshow ◴[] No.44525928[source]
I don't think that's a fair comparison. Type systems don't produce probabilistic output; their entire purpose is to reduce the scope of possible errors you can write. They kind of did change the world, didn't they? I mean, not everyone is writing Haskell, but Rust exists and it's doing pretty well. There was also never really a case to be made that type systems made software in general _worse_. But you could definitely make the case that LLMs might make software worse.
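
To make the "scope of possible errors" point concrete, here is a toy Haskell fragment (my own illustration with made-up names, nothing from the thread or the study):

  orderTotal :: Int -> Int -> Int
  orderTotal price qty = price * qty

  ok :: Int
  ok = orderTotal 3 2

  -- The following line would be rejected at compile time
  -- ("Couldn't match type '[Char]' with 'Int'"), so that whole
  -- class of mistake never reaches runtime:
  -- bad = orderTotal 3 "two"
The compiler either accepts a program or it doesn't; there is no probability attached to the answer.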
replies(2): >>44526616 #>>44529347 #
4. atlintots ◴[] No.44526616[source]
It's too bad the management people never pushed Haskell as hard as they're pushing AI today! Alas.
5. thesz ◴[] No.44529347[source]
That probabilistic output has to be symbolically constrained: SQL, JSON, and other code is generated through syntax-constrained beam search.
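
Roughly, the mechanism looks something like the following Haskell sketch (toy names and a toy "grammar" of my own, just to show the shape of the idea, not any particular decoder):

  import Data.List (sortOn)
  import Data.Ord (Down (..))

  type Token = String

  -- Stand-in for the model: scores for candidate next tokens given a prefix.
  scoreNext :: [Token] -> [(Token, Double)]
  scoreNext _ = [("SELECT", 0.6), ("DROP", 0.3), (";", 0.1)]

  -- Stand-in for the symbolic side: does the grammar still accept this prefix?
  accepts :: [Token] -> Bool
  accepts ("DROP" : _) = False   -- toy rule: this continuation is never legal here
  accepts _            = True

  -- One step of width-limited, syntax-constrained beam search.
  beamStep :: Int -> [([Token], Double)] -> [([Token], Double)]
  beamStep width beam =
    take width . sortOn (Down . snd) $
      [ (prefix ++ [tok], p * q)
      | (prefix, p) <- beam
      , (tok, q)    <- scoreNext prefix
      , accepts (prefix ++ [tok])   -- the constraint prunes illegal continuations
      ]

  main :: IO ()
  main = print (beamStep 2 [([], 1.0)])
The probabilistic part proposes; the symbolic part disposes.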

You brought up Rust; it is a fascinating case.

Rust's type system differs from a typical Hindley-Milner system by having operations that can remove definitions from the environment of a scope.

Rust was conceived in 2006.

In 2006 there were already HList papers by Oleg Kiselyov [1] showing how to keep type-level key-value lists with addition, removal, and lookup, and type-level stateful operations like those in [2] were already possible, albeit, most probably, without nice monadic syntax support (the flavor of the idea is sketched below).

  [1] https://okmij.org/ftp/Haskell/HList-ext.pdf
  [2] http://blog.sigfpe.com/2009/02/beyond-monads.html
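To give a flavor of [1], here is a minimal sketch of my own (not the paper's actual encoding, which also provides type-level addition, removal, and lookup):

  {-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeOperators #-}

  import Data.Kind (Type)

  -- A heterogeneous list whose element types are carried in a type-level list.
  data HList (ts :: [Type]) where
    HNil  :: HList '[]
    HCons :: t -> HList ts -> HList (t ': ts)

  -- The type records exactly what is inside and in which order.
  example :: HList '[Int, String, Bool]
  example = HCons 1 (HCons "key-value" (HCons True HNil))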
It would have been entirely possible to embed a prototype Rust into Haskell and implement the borrow checker as type-level manipulation over a doubly parameterized state monad.
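
Something like this minimal sketch (my own toy, loosely in the spirit of [2], not an actual borrow checker):

  -- A "doubly parameterized" state monad: each action's type records the state
  -- shape before (i) and after (o), so resources can be consumed at the type level.
  newtype IxState i o a = IxState { runIxState :: i -> (a, o) }

  ireturn :: a -> IxState s s a
  ireturn x = IxState (\s -> (x, s))

  ibind :: IxState i m a -> (a -> IxState m o b) -> IxState i o b
  ibind (IxState f) k = IxState (\i ->
    let (x, m) = f i in runIxState (k x) m)

  -- Hypothetical type-level markers for whether a borrowed resource is live.
  data Live    = Live
  data Dropped = Dropped

  -- Using the resource "moves" it: the output index becomes Dropped, so
  -- sequencing a second use with ibind fails to type-check instead of at runtime.
  dropIt :: IxState Live Dropped ()
  dropIt = IxState (\Live -> ((), Dropped))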

But that did not happen: Rust was not embedded into Haskell, and now it will never get effects (even ones as weak as monad transformers) and, as a consequence, will never get proper high-performance software transactional memory.

So here we are: everything in Haskell's strong-type-system world that could have made Rust better was there at the very beginning of Rust's journey, but it had no impact on Rust.

Rhyme that with LLM.

6. thesz ◴[] No.44529397[source]
On the contrary, I believe that a strong type system is a plus. Please look at my other comment: https://news.ycombinator.com/item?id=44529347

My original point was about history and about how we can extract a possible outcome from it.

My other comment tries to amplify that too. Type systems have been strong enough for several decades now; they had everything Rust needed, and more, years before Rust began, yet they have had little penetration into the real world, that fancy dandy Rust language itself being an example.

7. sfn42 ◴[] No.44529495[source]
It's generally trivial for conventional class-based type systems like those in Java and C#, but TypeScript is a different beast entirely. On the surface it seems similar, but it goes much deeper than the others.

I don't like it. I know it is the way it is because it has to support all the cursed, weird stuff you can do in JS, but to me, as a fullstack developer who's never really taken the time to deep-dive and learn TS properly, it often feels more like an obstacle. For my own code it's fine, but when I have to work with third-party libraries it can be really confusing. It's definitely a skill issue, though.

replies(1): >>44529778 #
8. ruszki ◴[] No.44529778{3}[source]
I agree. TypeScript is different for another reason too: it often ignores edge cases, and because of that you can do really, really nice things with it (when it's not broken). I have wondered many times why Java doesn't include a few things that would be appropriate even in that world, and the answer is almost always that Java cares about the edge cases. There are notes about those in TypeScript's docs or issues.