688 points dheerajvs | 14 comments

simonw (No.44523442)
Here's the full paper, which has a lot of details missing from the summary linked above: https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

My personal theory is that getting a significant productivity boost from LLM assistance and AI tools has a much steeper learning curve than most people expect.

This study had 16 participants, with a mix of previous exposure to AI tools - 56% of them had never used Cursor before, and the study was mainly about Cursor.

They then had those 16 participants work on issues (about 15 each), where each issue was randomly assigned a "you can use AI" vs. "you can't use AI" rule.

So each developer worked on a mix of AI-tasks and no-AI-tasks during the study.

A quarter of the participants saw increased performance; three quarters saw reduced performance.

One of the top performers with AI was also the developer with the most previous Cursor experience. The paper acknowledges that here:

> However, we see positive speedup for the one developer who has more than 50 hours of Cursor experience, so it's plausible that there is a high skill ceiling for using Cursor, such that developers with significant experience see positive speedup.

My intuition here is that this study mainly demonstrated that the learning curve on AI-assisted development is high enough that asking developers to bake it into their existing workflows reduces their performance while they climb that learning curve.

1. mjr00 (No.44523608)
> My intuition here is that this study mainly demonstrated that the learning curve on AI-assisted development is high enough that asking developers to bake it into their existing workflows reduces their performance while they climb that learning curve.

Definitely. Effective LLM usage is not as straightforward as people believe. Two big things I see a lot of developers do when they share chats:

1. Talk to the LLM like a human. Remember when internet search first came out, and people were literally "Asking Jeeves" in full natural language? Eventually people learned that you don't need to type, "What is the current weather in San Francisco?" because "san francisco weather" gave you the same, or better, results. Now we've come full circle and people talk to LLMs like humans again; not out of any advanced prompt engineering, but just because it's so anthropomorphized it feels natural. But I can assure you that "pandas count unique values column 'Foo'" is just as effective an LLM prompt as "Using pandas, how do I get the count of unique values in the column named 'Foo'?" (see the sketch after this list). The LLM is also not insulted by you talking to it like this.

2. Don't know when to stop using the LLM. Rather than let the LLM take you 80% of the way there and then handle the remaining 20% "manually", they'll keep trying to prompt to get the LLM to generate what they want. Sometimes this works, but often it's just a waste of time and it's far more efficient to just take the LLM output and adjust it manually.
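
A minimal sketch of the pandas example from point 1, assuming a toy DataFrame (the data and column contents here are made up for illustration; either prompt phrasing typically lands on the same one-liner):

    import pandas as pd

    # Toy data standing in for whatever you'd actually be working with.
    df = pd.DataFrame({"Foo": ["a", "b", "a", "c", "b", "a"]})

    # Both the terse and the fully worded prompt tend to produce this:
    print(df["Foo"].nunique())  # -> 3, the count of unique values in 'Foo'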

Much like so-called Google-fu, LLM usage is a skill and people who don't know what they're doing are going to get substandard results.

2. Jaxan (No.44523635)
> Effective LLM usage is not as straightforward as people believe

It is not as straightforward as people are told to believe!

3. gedy (No.44523674)
> Talk to the LLM like a human

Maybe the LLM doesn't strictly need it, but typing it out does bring some clarity for the asker. I've found it helps a lot to catch myself: what am I even wanting from this?

4. frotaur (No.44523721)
I'm not sure about your example about talking to LLMs. There is good reason to think that speaking to it like a human might produce better results, as that's what most of the training data is composed of.

I don't have any studies, but it seems to me a reasonable assumption.

(Unlike Google, where presumably it actually used keywords anyway.)

5. mjr00 (No.44523771)
> I'm not sure about your example about talking to LLMs. There is good reason to think that speaking to it like a human might produce better results, as that's what most of the training data is composed of.

In practice I have not had any issues getting information out of an LLM when speaking to them like a computer, rather than a human. At least not for factual or code-related information; I'm not sure how it impacts responses for e.g. creative writing, but that's not what I'm using them for anyway.

6. lukan (No.44523782)
> But I can assure you that "pandas count unique values column 'Foo'" is just as effective an LLM prompt as "Using pandas, how do I get the count of unique values in the column named 'Foo'?"

How can you be so sure? Did you compare in a systematic way or read papers by people who did it?

Now, I surely get results giving the LLM only snippets and keywords, but with anything complex I do notice differences depending on how I articulate the prompt. I'm not claiming there is a significant difference, but that's how it seems to me.

7. mjr00 (No.44523882)
> How can you be so sure? Did you compare in a systematic way or read papers by people who did it?

No, but I didn't need to read scientific papers to figure out how to use Google effectively, either. I'm just using a results-based analysis after a lot of LLM usage.

8. lukan (No.44523986)
Well, I did need some tutorials to use Google efficiently in the old days, when + meant something specific.

9. skybrian (No.44524159)
Other people don't have the benefit of your experience, though, so there's a communication gap here: this boils down to "trust me, bro."

How do we get beyond that?

10. sleepybrett (No.44524217)
^ this, so much this. The amount of bullshit that gets shoveled into Hacker News threads about the supposed capabilities of these models is epic.
11. mjr00 (No.44524439)
This is the gap between capability (what can this tool do?) and workflow (what is the best way to use this tool to accomplish a goal?). Capabilities can be strictly evaluated, but workflow is subjective. Saying "Google has the site: and before: operators" is capability; saying "you should use site:reddit.com before:2020 in Google queries" is workflow.

LLMs have made the distinction ambiguous because their capabilities are so poorly understood. When I say "you should talk to an LLM like it's a computer", that's a workflow statement; it's a more efficient way to accomplish the same goal. You can try it for yourself and see if you agree. I personally liken people who talk to LLMs in full, proper English, capitalization and all, to boomers who still type in full sentences when running a Google query. Is there anything strictly wrong with it? Not really. Do I believe it's a more efficient workflow to just type the keywords that will give you the same result? Yes.

Workflow efficiencies can't really be scientifically evaluated. Some people still prefer to have desktop icons for programs on Windows; my workflow is pressing winkey -> typing the first few characters of the program -> enter. Is one of these methods scientifically more correct? Not really.

So, yeah -- eventually you'll either find your own workflow or copy the workflow of someone you see who is using LLMs effectively. It really is "just trust me, bro."

12. bit1993 (No.44524509)
> Rather than let the LLM take you 80% of the way there and then handle the remaining 20% "manually"

IMO 80% is way too much. LLMs are probably good for things that are outside your domain knowledge and where you can afford to not be 100% correct, like rendering the Mandelbrot set and simple functions like that.

LLMs are not deterministic: sometimes they produce correct code and other times they produce wrong code. This means one has to audit LLM-generated code, and auditing code takes more effort than writing it, especially if you are not the original author of the code being audited.

Code has to be 100% deterministic. As programmers we write code, detailed instructions for the computer (CPU), and we have developed a lot of tools, such as unit tests, to make sure the computer does exactly what we wrote.
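
A minimal sketch of that idea, assuming a hypothetical count_unique function that came back from an LLM (the names and values here are made up; the point is that the test pins down what "correct" means before the generated code is trusted):

    import unittest

    # Imagine this body was produced by an LLM prompt.
    def count_unique(values):
        return len(set(values))

    # The tests are the deterministic contract the generated code must satisfy.
    class TestCountUnique(unittest.TestCase):
        def test_counts_distinct_values(self):
            self.assertEqual(count_unique(["a", "b", "a", "c"]), 3)

        def test_empty_input(self):
            self.assertEqual(count_unique([]), 0)

    if __name__ == "__main__":
        unittest.main()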

A codebase has a lot of context that you gain by writing the code: some things just look wrong, and you know exactly why, because you wrote the code. There is also a lot of context that you should keep in your head as you write the code, context that you miss by simply prompting an LLM.

13. skybrian (No.44525681)
Maybe it would help if more people wrote tutorials? It doesn't seem reasonable for people who don't have a buddy to learn from to have to figure it out on their own.
14. badsectoracula (No.44528152)
> But I can assure you that "pandas count unique values column 'Foo'" is just as effective an LLM prompt as "Using pandas, how do I get the count of unique values in the column named 'Foo'?"

While the results are going to be similar, typing a question in full can help you think about it yourself too, as if the LLM is a rubber duck that can respond back.

I've found myself adjusting and rewriting prompts during the process of writing them, before I ask the LLM anything, because as I was writing the prompt I was thinking about the problem simultaneously.

Of course, for simple queries like "write me a function in C that calculates the length of a 3d vector using vec3 for type", you can write something like "c function vec3 length 3d" instead, and the LLM will give more or less the same response (tried it with Devstral).

But TBH to me that sounds like programmers using Vim claiming they're more productive than users of other editors because they need fewer keystrokes.