Measuring the impact of AI on experienced open-source developer productivity

(metr.org)


simonw ◴[10 Jul 25 17:36 UTC] No.44523442[source]▶

Here's the full paper, which has a lot of details missing from the summary linked above: https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

My personal theory is that getting a significant productivity boost from LLM assistance and AI tools has a much steeper learning curve than most people expect.

This study had 16 participants, with a mix of previous exposure to AI tools - 56% of them had never used Cursor before, and the study was mainly about Cursor.

They then had those 16 participants work on issues (about 15 each), where each issue was randomly assigned a "you can use AI" vs. "you can't use AI" rule.

So each developer worked on a mix of AI-tasks and no-AI-tasks during the study.

A quarter of the participants saw increased performance; three quarters saw reduced performance.
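To make the design concrete, here is a minimal sketch of per-issue random assignment and a naive per-developer speedup ratio. It is an illustration only, not the paper's estimator: the function names, the 50/50 split, and the toy completion times are all assumptions made up for this example.

```python
import random
from statistics import mean


def assign_conditions(issue_ids, rng):
    """Randomly label each issue: True = AI allowed, False = AI not allowed.

    The 50/50 probability is an assumption for illustration.
    """
    return {issue: rng.random() < 0.5 for issue in issue_ids}


def speedup(times_with_ai, times_without_ai):
    """Naive ratio of mean completion times.

    A value above 1 means the developer was faster on AI-allowed issues;
    below 1 means slower. This is a simplification, not the paper's method.
    """
    return mean(times_without_ai) / mean(times_with_ai)


# One hypothetical developer with about 15 issues, as in the study design.
rng = random.Random(0)
issues = [f"issue-{i}" for i in range(15)]
conditions = assign_conditions(issues, rng)

# Hypothetical completion times in hours (made-up numbers).
ratio = speedup(times_with_ai=[2.0, 4.0], times_without_ai=[6.0])
print(f"speedup ratio: {ratio:.2f}")
```

Because assignment happens per issue rather than per developer, every developer contributes to both conditions, which is what lets the study compare each person against themselves.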

One of the top performers for AI was also someone with the most previous Cursor experience. The paper acknowledges that here:

> However, we see positive speedup for the one developer who has more than 50 hours of Cursor experience, so it's plausible that there is a high skill ceiling for using Cursor, such that developers with significant experience see positive speedup.

My intuition here is that this study mainly demonstrated that the learning curve on AI-assisted development is high enough that asking developers to bake it into their existing workflows reduces their performance while they climb that learning curve.


ivanovm ◴[10 Jul 25 23:53 UTC] No.44526996[source]▶

>>44523442 #

I find the very popular response of "you're just not using it right" to be a big cop-out for LLMs, especially at the scale we see today. It's hard to think of any other major tech product where it's acceptable to shift so much blame onto the user. Typically, if a user doesn't find value in the product, we agree that the product is poorly designed or implemented, not that the user is bad. But AI seems somehow exempt from this sentiment.


sanderjd ◴[11 Jul 25 01:31 UTC] No.44527577{3}[source]▶

>>44526996 #

> It's hard to think of any other major tech product where it's acceptable to shift so much blame on the user.

Maybe, but it isn't hard to think of developer tools where this is the case. This is the entire history of editor and IDE wars.

Imagine running this same study design with vim. How well would you expect the not-previously-experienced developers to perform in such a study?


fingerlocks ◴[11 Jul 25 05:30 UTC] No.44528674{4}[source]▶

>>44527577 #

No one is claiming 10x perf gains in vim.

It’s just a fun geeky thing to use with a lot of zany customizations. And after two hellish years of memory muscling enough keyboard bindings to finally be productive, you earned it! It’s a badge of pride!

But we all know you’re still fat fingering ggdG on occasion and silently cursing to yourself.


1. TeMPOraL ◴[11 Jul 25 06:58 UTC] No.44529110{5}[source]▶

>>44528674 #

> No one is claiming 10x perf gains in vim.

Sure they are - or at least were, until the last couple of years. Same thing with Emacs.

It's hard to claim this now, because the entire industry shifted towards webshit and cloud-based practices across the board, and the classical editors just can't keep up with VS Code. Despite the latter introducing LSP, which leveled the playing field wrt. code intelligence itself, the surrounding development process and the ecosystem increasingly demands you use web-based or web-derived tools and practices, which all see a browser engine as a basic building block. Classical editors can't match the UX/DX on that, plus the whole thing breaks basic assumptions about UI that were the source of the "10x perf gains" in vim and Emacs.

Ironically, a lot of the perf gains from AI come from letting you avoid dealing with the brokenness of the current tools and processes, that vim and Emacs are not equipped to handle.


2. fingerlocks ◴[11 Jul 25 08:43 UTC] No.44529753[source]▶

>>44529110 (TP) #

Yeah, I’m in my 40s and have been using vim for decades. Sure, there was an occasional rando stirring up the forums about made-up productivity gains to get some traffic to their blog, but that was it. There has always been pushback from many of the strongest vim advocates that the appeal is not about typing speed or whatever it was they were claiming. It’s just ergonomics and power.

It’s just not comparable to the LLM crazy hype train.

And to belabor your other point, I have treesitter, lsp, and GitHub Copilot agent all working flawlessly in neovim. Ts and lsp are neovim builtins now. And it’s custom built for exactly how I want it to be, and none of that blinking shit or nagging dialog boxes all over VSCode.

I have VScode and vim open to the same files all day quite literally side by side, because I work at Microsoft, share my screen often, and there are still people that have violent allergic reactions to a terminal and vim. Vim can do everything VSCode does and it’s not dogshit slow.


3. Imustaskforhelp ◴[11 Jul 25 13:52 UTC] No.44532161[source]▶

>>44529753 #

I am really curious what your thoughts on zed are, given that it has a lot of features and is still mostly vim compatible (from what I know), so you have the same ergonomics and power, and it has some sane defaults. I don't need to tinker as much with zed as I would have to with nvim.

It's not that I don't like tinkering. I really enjoy tinkering with config files, but I never could understand nvim personally, since I usually want an LSP / good-enough experience that nvim, lunarvim, etc. couldn't provide without me installing additional software.


4. hajile ◴[11 Jul 25 16:46 UTC] No.44534294[source]▶

>>44529110 (TP) #

I use most of the best vim features in VS Code with their vim bindings.

You'd be hard-pressed to find a popular editor without vim bindings.

5. iLemming ◴[11 Jul 25 18:46 UTC] No.44535648[source]▶

>>44529110 (TP) #

> vim and Emacs are not equipped to handle.

You clearly don't have the slightest idea of what you're talking about.

Emacs is actually still amazing in the LLM era. Language is all about plain text. Plain text remains crucial and will remain important because it's human-readable, machine-parsable, version-control friendly, lightweight and fast, platform-independent, and resistant to obsolescence. Even when analyzing huge amounts of complex data - images, videos, audio-recordings, etc., we often have to reduce it to text representation.

And there's simply no tool better than Emacs today that is well-suited for dealing with plain text. Nothing even comes close to what you can do with text in Emacs.

Like, check this out - I am right now transcribing my audio notes into .srt (subtitle) files. There's subed-mode where you can read through subtitles, and even play the audio, karaoke style, while following the text. I can do so many different things from here - extract the summaries, search through things, gather analytics - e.g., how often have I said 'fuck' on Wednesdays, etc.
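For readers outside Emacs, the word-counting idea can be sketched in a few lines of Python. This is not what subed-mode does internally; it is a hypothetical illustration that assumes standard .srt cue layout (cue number, timestamp line, text lines, blank line) and assumes the recording date is known separately, since .srt files do not store one.

```python
import re
from datetime import date

# Matches an .srt timing line such as "00:00:01,000 --> 00:00:03,000".
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def subtitle_lines(srt_text):
    """Yield spoken-text lines, skipping cue numbers, timing lines, and blanks.

    Caveat: a text line made only of digits would be skipped as a cue number.
    """
    for line in srt_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.isdigit() or TIMESTAMP.match(stripped):
            continue
        yield stripped


def count_word(srt_text, word):
    """Count case-insensitive whole-word occurrences of `word` in the text."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return sum(len(pattern.findall(line)) for line in subtitle_lines(srt_text))


def is_wednesday(day):
    """date.weekday() numbers Monday as 0, so Wednesday is 2."""
    return day.weekday() == 2


# Hypothetical sample transcript and recording date.
SAMPLE = """1
00:00:01,000 --> 00:00:03,000
Note to self: ship the fix.

2
00:00:03,500 --> 00:00:06,000
The fix is small. Fix it today.
"""

if is_wednesday(date(2025, 7, 9)):
    print(count_word(SAMPLE, "fix"))
```

Grouping counts by weekday across many files would then be a matter of pairing each transcript with its date and summing.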

I can similarly play YouTube videos in mpv, while controlling the playback, volume, speed, etc. from Emacs; I can extract subtitles for a given video and search through them, play the vid from the exact place in the subs.

I very often grab a selected region of screen during Zoom sessions to OCR and extract text within it and put it in my notes - yes, I do it in Emacs.

I can probably examine images, analyze their elements, create comprehensive summaries, and formulate expert artistic evaluation and critique and even ask Emacs to read it aloud back to me - the possibilities are virtually limitless.

It allows you to engage with a vast array of LLMs from anywhere. I can ask a question in the midst of typing a Slack reply or reading HN comments or when composing a git commit; I can fact-check my own assumptions. I can also use tools to analyze and refactor existing codebases and vibe-code new stuff.

Anything like that even five years ago seemed like a dream; today it is possible. We can now reduce any complex digital data to plain text. And that feels miraculous.

If anything, the LLM era has made Emacs an extremely compelling choice. To be honest, for me - it's not even a choice, it's the only seriously viable option I have - despite all its drawbacks. Nothing else even comes close - other options either lack critical features or have merely promising ones. Emacs is absolutely, hands-down, one of the best tools we humans have ever produced to deal with plain text. Anyone who thinks that's an opinion and not a fact simply hasn't grokked Emacs or has no clue what you can do with it.


6. fingerlocks ◴[11 Jul 25 19:52 UTC] No.44536223{3}[source]▶

>>44532161 #

I haven’t tried zed and I’m getting old and set in my ways. If it ain’t broke don’t fix it and all that.

So if the claim is that I can get everything I have out of vim, most importantly being unbeatably fast text buffers, and I don’t need a suitcase full of config files, that’s very compelling.

Is that the promise of zed?

7. fingerlocks ◴[11 Jul 25 20:02 UTC] No.44536303[source]▶

>>44535648 #

At first I thought you were replying to me and this was a revival of the old vim + emacs wars.

I’m so glad we’re past that now and can join forces against a common enemy.

Thank you brother.


8. iLemming ◴[11 Jul 25 21:53 UTC] No.44537138{3}[source]▶

>>44536303 #

There weren't any true "wars" to begin with. The entire thing is just absurd. These ideas are not even in competition, it's like arguing whether a piano or sheet music is "better".

Emacs veterans simply rejected the entire concept of modality, without even trying to understand what it is about. Emacs is inherently a modal editor. Key-chords are stateful, Transient menus (i.e. Magit) are modals, completion is a modal, isearch, dired, calc, C-u (universal argument), recursive editing — these are all modals. What the idea of vim-motions offers is a universal, simplified, structured language to deal with modality, that's all.

Vim users on the other hand keep saying "there's no such thing as vim-mode". And to a certain degree they are right — no vim plugin outside of vim/neovim implements all the features — IdeaVim, VSCode vim plugins, Sublime, etc. - all of them are full of holes and glaring deficiencies. With one notable exception — Evil-mode in Emacs. It is so wonderfully implemented, you wouldn't even notice that it is a plugin, an afterthought. It really does feel like a baked-in, native feature of the editor.

There are no "wars" in our industry — pretty much only misunderstanding, misinterpretation and misuse of certain ideas. It's not even technological — who knows, maybe it's not even sociotechnological. People simply like talking past each other, defending different values without acknowledging they're optimizing for different things.

It's not Vim's, Emacs' or VSCode's fault that we suffer from identity investment - we spend hundreds of hours using one so it becomes our identity. We suffer from simplification impulse — we just love binary choices, we constantly have the nagging "which is better?" question, even when it makes little sense. We're predisposed to tribal belonging — having a common enemy creates in-group cohesion.

But real, experienced craftspeople... they just use whatever works best for them in a given context. That's what we all should strive for — discover old and new ideas, study them, identify good ones, borrow them, shelve the bad ones (who knows, maybe in a different context they may still prove useful). Most importantly, use whatever makes you and your teammates happy. It's far more important than being more productive or being decisively right. If thy stupid thing works, perhaps it ain't that stupid?
