688 points dheerajvs | 13 comments
simonw ◴[] No.44523442[source]
Here's the full paper, which has a lot of details missing from the summary linked above: https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

My personal theory is that getting a significant productivity boost from LLM assistance and AI tools has a much steeper learning curve than most people expect.

This study had 16 participants, with a mix of previous exposure to AI tools - 56% of them had never used Cursor before, and the study was mainly about Cursor.

They then had those 16 participants work on issues (about 15 each), where each issue was randomly assigned a "you can use AI" vs. "you can't use AI" rule.

So each developer worked on a mix of AI-tasks and no-AI-tasks during the study.

A quarter of the participants saw increased performance, 3/4 saw reduced performance.

One of the top performers for AI was also someone with the most previous Cursor experience. The paper acknowledges that here:

> However, we see positive speedup for the one developer who has more than 50 hours of Cursor experience, so it's plausible that there is a high skill ceiling for using Cursor, such that developers with significant experience see positive speedup.

My intuition here is that this study mainly demonstrated that the learning curve on AI-assisted development is high enough that asking developers to bake it into their existing workflows reduces their performance while they climb that learning curve.

replies(33): >>44523608 #>>44523638 #>>44523720 #>>44523749 #>>44523765 #>>44523923 #>>44524005 #>>44524033 #>>44524181 #>>44524199 #>>44524515 #>>44524530 #>>44524566 #>>44524631 #>>44524931 #>>44525142 #>>44525453 #>>44525579 #>>44525605 #>>44525830 #>>44525887 #>>44526005 #>>44526996 #>>44527368 #>>44527465 #>>44527935 #>>44528181 #>>44528209 #>>44529009 #>>44529698 #>>44530056 #>>44530500 #>>44532151 #
grey-area ◴[] No.44524005[source]
Well, there are two possible interpretations here of 75% of participants (all of whom had some experience using LLMs) being slower using generative AI:

LLMs have a very steep and long learning curve, as you posit (though note the points from the paper authors in the other reply).

Current LLMs just are not as good as they are sold to be as a programming assistant and people consistently predict and self-report in the wrong direction on how useful they are.

replies(6): >>44524525 #>>44524552 #>>44525186 #>>44525216 #>>44525303 #>>44526981 #
steveklabnik ◴[] No.44524552[source]
> Current LLMs

One thing that happened here is that they aren't using current LLMs:

> Most issues were completed in February and March 2025, before models like Claude 4 Opus or Gemini 2.5 Pro were released.

That doesn't mean this study is bad! In fact, I'd be very curious to see it done again, but with newer models, to see if that has an impact.

replies(1): >>44524740 #
blibble ◴[] No.44524740[source]
> One thing that happened here is that they aren't using current LLMs

I've been hearing this for 2 years now

the previous model retroactively becomes total dogshit the moment a new one is released

convenient, isn't it?

replies(10): >>44524758 #>>44524891 #>>44524893 #>>44524975 #>>44525030 #>>44525035 #>>44526195 #>>44526545 #>>44526712 #>>44535270 #
1. jstummbillig ◴[] No.44524975[source]
Convenient for whom and what...? There is nothing tangible to gain from you believing or not believing that someone else does (or does not) get a productivity boost from AI. This is not a religion and it's not crypto. An AI user's net worth is not tied to anyone else's use of or stance on AI (if anything, it's the opposite).

More generally, this phenomenon is quite simply explained and not surprising: new things improve, quickly. That does not mean that something is good or valuable, but it's how new tech gets introduced every single time, and it readily explains changing sentiment.

replies(3): >>44525177 #>>44525199 #>>44525836 #
2. card_zero ◴[] No.44525177[source]
I saw that edit. Indeed you can't predict that rejecting a new thing is part of a routine of being wrong. It's true that "it's strange and new, therefore I hate it" is a very human (and adorable) instinct, but sometimes it's reasonable.
replies(2): >>44525559 #>>44530847 #
3. grey-area ◴[] No.44525199[source]
Honestly, the hype cycle feels very much like crypto, and just like crypto, prominent VCs have a lot of money riding on the outcome.
replies(2): >>44525236 #>>44525632 #
4. steveklabnik ◴[] No.44525236[source]
I agree with you, and I think that’s coloring a lot of people’s perceptions. I am not a crypto fan but am an LLM fan.

Every hype cycle feels like this, and some of them are nonsense and some of them are real. We’ll see.

5. jstummbillig ◴[] No.44525559[source]
"I saw that edit" lol
replies(1): >>44525611 #
6. card_zero ◴[] No.44525611{3}[source]
Sorry, just happened to. Slightly rude of me.
replies(1): >>44525716 #
7. jstummbillig ◴[] No.44525632[source]
Of course, lots of hype, but my point is that the reason why is very different and it matters: as an early bc adopter, making you believe in bc is super important to my net worth (and you not believing in bc makes me look like an idiot and lose a lot of money).

In contrast, what do I care if you believe in code generation AI? If you do, you are probably driving up pricing. I mean, I am sure that there are people that care very much, but there is little inherent value for me in you doing so, as long as the people who are building the AI are making enough profit to keep it running.

With regards to the VCs, well, how many VCs are there in the world? How many of the people who have something good to say about AI are likely VCs? I might be off by an order of magnitude, but even then it would really not be driving the discussion.

replies(1): >>44525865 #
8. jstummbillig ◴[] No.44525716{4}[source]
Ah, you do you. It's just a fairly kindergarten thing to point out and not something I was actively trying to hide. Whatever it was.

Generally, I do a couple of edits for clarity after posting and reading again. Sometimes that involves removing something that I feel could have been said better. If it does not work, I will just delete the comment. Whatever it was must not have been a super huge deal (to me).

replies(1): >>44527940 #
9. leshow ◴[] No.44525836[source]
I think you're missing the broader context. There are a lot of people very invested in the maximalist outcome, which does create pressure for people to be boosters. You don't need a digital token for that to happen. There's a social media aspect as well that creates a feedback loop about claims.

We're in a hype cycle, and it means we should be extra critical when evaluating the tech so we don't get taken in by exaggerated claims.

replies(1): >>44526326 #
10. leshow ◴[] No.44525865{3}[source]
I don't find that a compelling argument; lots of people get taken in by hype cycles even when they don't profit directly from them.
11. jstummbillig ◴[] No.44526326[source]
I mostly don't agree. Yes, there is always social pressure with these things, and we are in a hype cycle, but the people "buying in" are simply not doing much at all. They are mostly consumers, waiting for the next model, which they have no control over or stake in creating (by and large).

The people not buying into the hype, on the other hand, are actually the ones that have a very good reason to be invested, because if they turn out to be wrong they might face some very uncomfortable adjustments in the job landscape and to a lot of the skills that they worked so hard to gain and believed to be valuable.

As always, be wary of any claims, but the tension here is very much the reverse of crypto and I don't think that's widely appreciated.

12. maxbond ◴[] No.44527940{5}[source]
FYI there's a "delay" setting in your profile that allows you to make your comment invisible for up to ten minutes.
13. saturneria ◴[] No.44530847[source]
It is an even more human reaction when the new strange thing directly threatens to upend and massively change the industry that puts food on your table.

The steam-powered loom was not good for the Luddites either. It was good for society at large in the long term, but all the negative points that a 40-year-old knitter in 1810 could make against the steam-powered loom would have been perfectly reasonable and accurate judged from that individual's perspective.