
689 points dheerajvs | 6 comments
simonw ◴[] No.44523442[source]
Here's the full paper, which has a lot of details missing from the summary linked above: https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

My personal theory is that getting a significant productivity boost from LLM assistance and AI tools has a much steeper learning curve than most people expect.

This study had 16 participants, with a mix of previous exposure to AI tools - 56% of them had never used Cursor before, and the study was mainly about Cursor.

They then had those 16 participants work on issues (about 15 each), where each issue was randomly assigned a "you can use AI" vs. "you can't use AI" rule.

So each developer worked on a mix of AI-tasks and no-AI-tasks during the study.

A quarter of the participants saw increased performance; three quarters saw reduced performance.

One of the top performers for AI was also someone with the most previous Cursor experience. The paper acknowledges that here:

> However, we see positive speedup for the one developer who has more than 50 hours of Cursor experience, so it's plausible that there is a high skill ceiling for using Cursor, such that developers with significant experience see positive speedup.

My intuition here is that this study mainly demonstrated that the learning curve on AI-assisted development is high enough that asking developers to bake it into their existing workflows reduces their performance while they climb that learning curve.

replies(33): >>44523608 #>>44523638 #>>44523720 #>>44523749 #>>44523765 #>>44523923 #>>44524005 #>>44524033 #>>44524181 #>>44524199 #>>44524515 #>>44524530 #>>44524566 #>>44524631 #>>44524931 #>>44525142 #>>44525453 #>>44525579 #>>44525605 #>>44525830 #>>44525887 #>>44526005 #>>44526996 #>>44527368 #>>44527465 #>>44527935 #>>44528181 #>>44528209 #>>44529009 #>>44529698 #>>44530056 #>>44530500 #>>44532151 #
Uehreka ◴[] No.44523923[source]
> My personal theory is that getting a significant productivity boost from LLM assistance and AI tools has a much steeper learning curve than most people expect.

You hit the nail on the head here.

I feel like I’ve seen a lot of people trying to make strong arguments that AI coding assistants aren’t useful. As someone who uses and enjoys AI coding assistants, I don’t find this research angle to be… uh… very grounded in reality?

Like, if you’re using these things, the fact that they are useful is pretty irrefutable. If one thinks there’s some sort of “productivity mirage” going on here, well OK, but to demonstrate that it might be better to start by acknowledging areas where they are useful, and show that your method explains the reality we’re seeing before using that method to show areas where we might be fooling ourselves.

I can maybe buy that AI might not be useful for certain kinds of tasks or contexts. But I keep pushing their boundaries and they keep surprising me with how capable they are, so it feels like it’ll be difficult to prove otherwise in a durable fashion.

replies(3): >>44523987 #>>44524004 #>>44524627 #
1. TechDebtDevin ◴[] No.44523987{3}[source]
Still odd to me that the only vibe-coded software that gets acquired is bought by companies that sell tools or want to promote vibe coding.
replies(2): >>44524011 #>>44524086 #
2. furyofantares ◴[] No.44524011[source]
That's not odd. These things are incredibly useful and vibe coding mostly sucks.
3. Uehreka ◴[] No.44524086[source]
Pardon my caps, but WHO CARES about acquisitions?!

You’ve been given a dubiously capable genie that can write code without you having to do it! If this thing can build first drafts of those side projects you always think about and never get around to, that in and of itself is useful! If it can do the yak-shaving required to set up those e2e tests you know you should have but never have time for, it is useful!

Have it try out all the dumb ideas you have that might be cool but don’t feel worth your time to boilerplate out!

I like to think we’re a bunch of creative people here! Stop thinking about how it can make you money and use it for fun!

replies(2): >>44524512 #>>44527679 #
4. fwip ◴[] No.44524512[source]
Unfortunately, HN is YC-backed, and attracts these types by design.
replies(1): >>44525264 #
5. Uehreka ◴[] No.44525264{3}[source]
I mean sure, but HN/YC’s founder was always going on about the kinship between “Hackers and Painters” (or at least he used to). It hasn’t always been like this, and definitely doesn’t have to be. We can and should aspire to better.
6. TechDebtDevin ◴[] No.44527679[source]
I have great code gen tools I've built for myself that build my perfect scaffolding/boilerplate every time, for any project, in about 30 seconds.

Took me a week to build those tools. It's much more reliable (and flexible) than any LLM and cost me nothing.

It comes with secure auth, email, admin, etc. Doesn't cost me a dime and almost never has a common vulnerability.

Best part about it: I know how my side project runs.