688 points dheerajvs | 241 comments
1. simonw ◴[] No.44523442[source]
Here's the full paper, which has a lot of details missing from the summary linked above: https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

My personal theory is that getting a significant productivity boost from LLM assistance and AI tools has a much steeper learning curve than most people expect.

This study had 16 participants, with a mix of previous exposure to AI tools - 56% of them had never used Cursor before, and the study was mainly about Cursor.

They then had those 16 participants work on issues (about 15 each), where each issue was randomly assigned a "you can use AI" vs. "you can't use AI" rule.

So each developer worked on a mix of AI-tasks and no-AI-tasks during the study.

A quarter of the participants saw increased performance, 3/4 saw reduced performance.

One of the top performers with AI was also the developer with the most previous Cursor experience. The paper acknowledges that here:

> However, we see positive speedup for the one developer who has more than 50 hours of Cursor experience, so it's plausible that there is a high skill ceiling for using Cursor, such that developers with significant experience see positive speedup.

My intuition here is that this study mainly demonstrated that the learning curve on AI-assisted development is high enough that asking developers to bake it into their existing workflows reduces their performance while they climb that learning curve.

replies(33): >>44523608 #>>44523638 #>>44523720 #>>44523749 #>>44523765 #>>44523923 #>>44524005 #>>44524033 #>>44524181 #>>44524199 #>>44524515 #>>44524530 #>>44524566 #>>44524631 #>>44524931 #>>44525142 #>>44525453 #>>44525579 #>>44525605 #>>44525830 #>>44525887 #>>44526005 #>>44526996 #>>44527368 #>>44527465 #>>44527935 #>>44528181 #>>44528209 #>>44529009 #>>44529698 #>>44530056 #>>44530500 #>>44532151 #
2. mjr00 ◴[] No.44523608[source]
> My intuition here is that this study mainly demonstrated that the learning curve on AI-assisted development is high enough that asking developers to bake it into their existing workflows reduces their performance while they climb that learning curve.

Definitely. Effective LLM usage is not as straightforward as people believe. Two big things I see a lot of developers do when they share chats:

1. Talk to the LLM like a human. Remember when internet search first came out, and people were literally "Asking Jeeves" in full natural language? Eventually people learned that you don't need to type, "What is the current weather in San Francisco?" because "san francisco weather" gave you the same, or better, results. Now we've come full circle and people talk to LLMs like humans again; not out of any advanced prompt engineering, but just because it's so anthropomorphized it feels natural. But I can assure you that "pandas count unique values column 'Foo'" is just as effective an LLM prompt as "Using pandas, how do I get the count of unique values in the column named 'Foo'?" The LLM is also not insulted by you talking to it like this. (Either phrasing should land on the same one-liner; see the sketch after point 2.)

2. Don't know when to stop using the LLM. Rather than let the LLM take you 80% of the way there and then handle the remaining 20% "manually", they'll keep trying to prompt to get the LLM to generate what they want. Sometimes this works, but often it's just a waste of time and it's far more efficient to just take the LLM output and adjust it manually.
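
For what it's worth, here's the kind of answer either phrasing should elicit (a minimal sketch; the column name 'Foo' is just the example from point 1):

  import pandas as pd

  # Toy frame standing in for whatever data you're actually working with.
  df = pd.DataFrame({"Foo": ["a", "b", "a", "c", "b"]})

  # Count of distinct values in column 'Foo' -- what either prompt is after.
  print(df["Foo"].nunique())  # 3

  # Or, if you also want how often each value occurs:
  print(df["Foo"].value_counts())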

Much like so-called Google-fu, LLM usage is a skill and people who don't know what they're doing are going to get substandard results.

replies(6): >>44523635 #>>44523674 #>>44523721 #>>44523782 #>>44524509 #>>44528152 #
3. Jaxan ◴[] No.44523635[source]
> Effective LLM usage is not as straightforward as people believe

It is not as straightforward as people are told to believe!

replies(1): >>44524217 #
4. narush ◴[] No.44523638[source]
Hey Simon -- thanks for the detailed read of the paper - I'm a big fan of your OS projects!

Noting a few important points here:

1. Some prior studies that find speedup do so with developers that have similar (or less!) experience with the tools they use. In other words, the "steep learning curve" theory doesn't differentially explain our results vs. other results.

2. Prior to the study, 90+% of developers had reasonable experience prompting LLMs. Before we found slowdown, prompting was the only experience-related concern most external reviewers raised -- prompting was considered the primary skill. In general, the standard wisdom was/is that Cursor is very easy to pick up if you're used to VSCode, which most developers used prior to the study.

3. Imagine all these developers had a TON of AI experience. One thing this might do is make them worse programmers when not using AI (relatable, at least for me), which in turn would raise the speedup we find (not because AI got better, but because the without-AI baseline got worse). In other words, we're sorta between a rock and a hard place here -- it's just plain hard to figure out what the right baseline should be!

4. We shared information on developer prior experience with expert forecasters. Even with this information, forecasters were still dramatically over-optimistic about speedup.

5. As you say, it's totally possible that there is a long-tail of skills to using these tools -- things you only pick up and realize after hundreds of hours of usage. Our study doesn't really speak to this. I'd be excited for future literature to explore this more.

In general, these results being surprising makes it easy to read the paper, find one factor that resonates, and conclude "ah, this one factor probably just explains slowdown." My guess: there is no one factor -- there's a bunch of factors that contribute to this result -- at least 5 seem likely, and at least 9 we can't rule out (see the factors table on page 11).

I'll also note that one really important takeaway -- that developer self-reports after using AI are overoptimistic to the point of being on the wrong side of speedup/slowdown -- isn't a function of which tool they use. The need for robust, on-the-ground measurements to accurately judge productivity gains is a key takeaway here for me!

(You can see a lot more detail in section C.2.7 of the paper ("Below-average use of AI tools") -- where we explore the points here in more detail.)

replies(8): >>44523675 #>>44523822 #>>44523929 #>>44524401 #>>44524561 #>>44530302 #>>44530524 #>>44530595 #
5. gedy ◴[] No.44523674[source]
> Talk to the LLM like a human

Maybe the LLM doesn't strictly need it, but typing it out does bring some clarity for the asker. I've found it helps a lot to catch myself: what am I even wanting from this?

6. simonw ◴[] No.44523675[source]
Thanks for the detailed reply! I need to spend a bunch more time with this I think - above was initial hunches from skimming the paper.
replies(1): >>44523748 #
7. smokel ◴[] No.44523720[source]
I notice that some people have become more productive thanks to AI tools, while others have not.

My working hypothesis is that people who are fast at scanning lots of text (or code for that matter) have a serious advantage. Being able to dismiss unhelpful suggestions quickly and then iterating to get to helpful assistance is key.

Being fast at scanning code correlates with seniority, but there are also senior developers who can write at a solid pace but prefer to take their time to read and understand code thoroughly. I wouldn't assume that this kind of developer gains little from typical AI coding assistance. There are also juniors who can read text quickly, and possibly these have an advantage.

A similar effect has been around with being able to quickly "Google" something. I wouldn't be surprised if this is the same trait at work.

replies(3): >>44525268 #>>44525430 #>>44528530 #
8. frotaur ◴[] No.44523721[source]
I'm not sure about your example about talking to LLMs. There is good reason to think that speaking to it like a human might produce better results, as that's what most of the training data is composed of.

I don't have any studies, but it seems to me a reasonable assumption.

(Unlike Google, where presumably it actually used keywords anyway.)

replies(1): >>44523771 #
9. narush ◴[] No.44523748{3}[source]
Sounds great. Looking forward to hearing more detailed thoughts -- my emails in the paper :)
10. onlyrealcuzzo ◴[] No.44523749[source]
How were "experienced engineers" defined?

I've found AI to be quite helpful in pointing me in the right direction when navigating an entirely new code-base.

When it's code I already know like the back of my hand, it's not super helpful, other than maybe doing a few automated tasks like refactoring, where there have already been some good tools for a while.

replies(1): >>44524529 #
11. furyofantares ◴[] No.44523765[source]
> My personal theory is that getting a significant productivity boost from LLM assistance and AI tools has a much steeper learning curve than most people expect.

I totally agree with this. Although also, you can end up in a bad spot even after you've gotten pretty good at getting the AI tools to give you good output, because you fail to learn the code you're producing well.

A developer gets better at the code they're working on over time. An LLM gets worse.

You can use an LLM to write a lot of code fast, but if you don't pay enough attention, you aren't getting any better at the code while the LLM is getting worse. This is why you can get like two months of greenfield work done in a weekend but then hit a brick wall - you didn't learn anything about the code that was written, and while the LLM started out producing reasonable code, it got worse until you have a ball of mud that neither the LLM nor you can effectively work on.

So a really difficult skill in my mind is continually avoiding the temptation to vibe. Take a whole week to do a month's worth of features, not a weekend to do two months' worth, and put in the effort to guide the LLM to keep producing clean code, and to be sure you know the code. You do want to know the code, and you can't do that without putting in work yourself.

replies(4): >>44524369 #>>44524418 #>>44524779 #>>44525480 #
12. mjr00 ◴[] No.44523771{3}[source]
> I'm not sure about your example about talking to LLMs. There is good reason to think that speaking to it like a human might produce better results, as that's what most of the training data is composed of.

In practice I have not had any issues getting information out of an LLM when speaking to them like a computer, rather than a human. At least not for factual or code-related information; I'm not sure how it impacts responses for e.g. creative writing, but that's not what I'm using them for anyway.

13. lukan ◴[] No.44523782[source]
"But I can assure you that "pandas count unique values column 'Foo'" is just as effective an LLM prompt as "Using pandas, how do I get the count of unique values in the column named 'Foo'?""

How can you be so sure? Did you compare in a systematic way or read papers by people who did it?

I certainly get results giving the LLM only snippets and keywords, but for anything complex I do notice differences depending on how I articulate the prompt. I'm not claiming the difference is significant, but that's how it seems to me.

replies(1): >>44523882 #
14. paulmist ◴[] No.44523822[source]
Were participants given time to customize their Cursor settings? In my experience tool/convention mismatch kills Cursor's productivity - once it gets going with a wrong library or doesn't use project's functions I will almost always reject code and re-prompt. But, especially for large projects, having a well-crafted repo prompt mitigates most of these issues.
15. mjr00 ◴[] No.44523882{3}[source]
> How can you be so sure? Did you compare in a systematic way or read papers by people who did it?

No, but I didn't need to read scientific papers to figure out how to use Google effectively, either. I'm just using a results-based analysis after a lot of LLM usage.

replies(2): >>44523986 #>>44524159 #
16. Uehreka ◴[] No.44523923[source]
> My personal theory is that getting a significant productivity boost from LLM assistance and AI tools has a much steeper learning curve than most people expect.

You hit the nail on the head here.

I feel like I’ve seen a lot of people trying to make strong arguments that AI coding assistants aren’t useful. As someone who uses and enjoys AI coding assistants, I don’t find this research angle to be… uh… very grounded in reality?

Like, if you’re using these things, the fact that they are useful is pretty irrefutable. If one thinks there’s some sort of “productivity mirage” going on here, well OK, but to demonstrate that it might be better to start by acknowledging areas where they are useful, and show that your method explains the reality we’re seeing before using that method to show areas where we might be fooling ourselves.

I can maybe buy that AI might not be useful for certain kinds of tasks or contexts. But I keep pushing their boundaries and they keep surprising me with how capable they are, so it feels like it’ll be difficult to prove otherwise in a durable fashion.

replies(3): >>44523987 #>>44524004 #>>44524627 #
17. jdp23 ◴[] No.44523929[source]
Really interesting paper, and thanks for the followon points.

The over-optimism is indeed a really important takeaway, and agreed that it's not tool-dependent.

18. lukan ◴[] No.44523986{4}[source]
Well, I did need some tutorials to use Google efficiently in the old days, when + meant something specific.
19. TechDebtDevin ◴[] No.44523987[source]
Still odd to me that the only vibe-coded software that gets acquired is acquired by companies that sell tools or want to promote vibe coding.
replies(2): >>44524011 #>>44524086 #
20. grey-area ◴[] No.44524005[source]
Well, there are two possible interpretations here of 75% of participants (all of whom had some experience using LLMs) being slower using generative AI:

LLMs have a v. steep and long learning curve as you posit (though note the points from the paper authors in the other reply).

Current LLMs just are not as good as they are sold to be as a programming assistant and people consistently predict and self-report in the wrong direction on how useful they are.

replies(6): >>44524525 #>>44524552 #>>44525186 #>>44525216 #>>44525303 #>>44526981 #
21. furyofantares ◴[] No.44524004[source]
I think the thing is there IS a learning curve, AND there is a productivity mirage, AND they are immensely useful, AND it is context dependent. All of this leads to a lot of confusion when communicating with people who are having a different experience.
replies(2): >>44524032 #>>44524133 #
22. furyofantares ◴[] No.44524011{3}[source]
That's not odd. These things are incredibly useful and vibe coding mostly sucks.
23. GoatInGrey ◴[] No.44524032{3}[source]
It always comes back to nuance!
24. bgwalter ◴[] No.44524033[source]
We have heard variations of that narrative for at least a year now. It is not hard to use these chatbots and no one who was very productive in open source before "AI" has any higher output now.

Most people who subscribe to that narrative have some connection to "AI" money, but there might be some misguided believers as well.

25. Uehreka ◴[] No.44524086{3}[source]
Pardon my caps, but WHO CARES about acquisitions?!

You’ve been given a dubiously capable genie that can write code without you having to do it! If this thing can build first drafts of those side projects you always think about and never get around to, that in and of itself is useful! If it can do the yak-shaving required to set up those e2e tests you know you should have but never have time for it is useful!

Have it try out all the dumb ideas you have that might be cool but don’t feel worth your time to boilerplate out!

I like to think we’re a bunch of creative people here! Stop thinking about how it can make you money and use it for fun!

replies(2): >>44524512 #>>44527679 #
26. Uehreka ◴[] No.44524133{3}[source]
Right, my problem is that while some people may be correct about the productivity mirage, many of those people are getting out over their skis and making bigger claims than they can reasonably prove. I’m arguing that they should be more nuanced and tactical.
27. skybrian ◴[] No.44524159{4}[source]
Other people don't have benefit of your experience, though, so there's a communications gap here: this boils down to "trust me, bro."

How do we get beyond that?

replies(1): >>44524439 #
28. bc1000003 ◴[] No.44524181[source]
"My intiution is that..." - AGREED.

I've found that there are a couple of things you need to do to be very efficient.

- Maintain an architecture.md file (with AI assistance) that answers many of the questions and clarifies a lot of the ambiguity in the design and structure of the code. (A sketch follows this list.)

- A bootstrap.md file (or files) is also useful for a lot of tasks.. having the AI read it and start with a correct idea about the subject is a time saver for a variety of tasks.

- Regularly ask the AI to refactor code, simplify it, modularize it - this is what the experienced dev is for. Vibe coding generally doesn't work, as AIs tend to write messy, non-modular code unless you tell them otherwise. But if you review the code and ask for specific changes.. they happily comply.

- Read the code produced, and carefully review it. And notice and address areas where there are issues, have the AI fix all of these.

- Take over when there are editing tasks you can do more efficiently.

- Structure the solution/architecture in ways that you know the AI will work well with.. things it knows about.. its general sweet spots.

- Know when to stop using the AI and code it yourself.. particularly when the AI has entered the confusion doom loop. Time wasted trying to get the AI to figure out something it never will is better spent just fixing it yourself.

- Know when to just not ever try to use AI. Intuitively you know there's just certain code you can't trust the AI to safely work on. Don't be a fool and break your software.
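
As noted above, a minimal sketch of what such an architecture.md might look like - the headings and contents here are hypothetical, just to make the idea concrete:

  # Architecture

  ## Overview
  Single-binary web service; PostgreSQL for persistence, Redis for caching.

  ## Layout
  - api/    HTTP handlers only; no business logic
  - core/   domain logic; pure functions where possible
  - store/  all database access; nothing above this layer writes SQL

  ## Conventions
  - Errors are wrapped with context at each layer boundary.
  - New endpoints get an integration test under tests/api/.

  ## Things the AI gets wrong unless told
  - We use our own retry helper in core/retry.py, not a third-party one.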

----

I've found there's no guarantee that AI assistance will speed up any one project (and in some cases it will slow one down).. but measured across all tasks and projects, the benefits are pretty substantial. That's probably others' experience at this point too.

29. ericmcer ◴[] No.44524199[source]
Looking at the example tasks in the pdf ("Sentencize wrongly splits sentence with multiple...") these look like really discrete and well-defined bug fixes. AI should smash tasks like that, so this is even less hopeful.
30. sleepybrett ◴[] No.44524217{3}[source]
^ this, so much this. The amount of bullshit that gets shoveled into hacker news threads about the supposed capabilities of these models is epic.
31. danieldk ◴[] No.44524369[source]
> So a really difficult skill in my mind is continually avoiding temptation to vibe.

I agree. I have found that I can use agents most effectively by letting them write code in small steps. After each step I review the changes and polish them up (either by doing the fixups myself or by prompting). I have found that this helps me understand the code, but it also keeps the model from getting into a bad solution space or producing unmaintainable code.

I also think this kind of close-loop is necessary. Like yesterday I let an LLM write a relatively complex data structure. It got the implementation nearly correct, but was stuck, unable to find an off-by-one comparison. In this case it was easy to catch because I let it write property-based tests (which I had to fix up to work properly), but it's easy for things to slip through the cracks if you don't review carefully.

(This is all using Cursor + Claude 4.)
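
To make the property-based-test point concrete, here's a minimal sketch using the hypothesis library. The function under test is a hypothetical stand-in with the same flavor of off-by-one risk, not the actual data structure from the comment:

  from hypothesis import given, strategies as st

  def count_leq(sorted_xs, x):
      # Count elements <= x via binary search (bisect_right by hand).
      # The lo/hi/mid bookkeeping is exactly where off-by-ones hide.
      lo, hi = 0, len(sorted_xs)
      while lo < hi:
          mid = (lo + hi) // 2
          if sorted_xs[mid] <= x:
              lo = mid + 1
          else:
              hi = mid
      return lo

  @given(st.lists(st.integers()), st.integers())
  def test_count_leq_matches_naive(xs, x):
      # Property: the clever version must agree with the obvious one
      # on every randomly generated input.
      xs.sort()
      assert count_leq(xs, x) == sum(1 for v in xs if v <= x)

  test_count_leq_matches_naive()  # hypothesis runs many random cases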

32. gojomo ◴[] No.44524401[source]
Did each developer do a large enough mix of AI/non-AI tasks, in varying orders, that you have any hints in your data whether the "AI penalty" grew or shrunk over time?
replies(1): >>44524452 #
33. bluefirebrand ◴[] No.44524418[source]
> Take a whole week to do a month's worth of features

Everything else in your post is so reasonable and then you still somehow ended up suggesting that LLMs should be quadrupling our output

replies(1): >>44524826 #
34. mjr00 ◴[] No.44524439{5}[source]
This is the gap between capability (what can this tool do?) versus workflow (what is the best way to use this tool to accomplish a goal?). Capabilities can be strictly evaluated, but workflow is subjective. Saying "Google has the site: and before: operators" is capability, saying "you should use site:reddit.com before:2020 in Google queries" is workflow.

LLMs have made the distinction ambiguous because their capabilities are so poorly understood. When I say "you should talk to an LLM like it's a computer", that's a workflow statement; it's a more efficient way to accomplish the same goal. You can try it for yourself and see if you agree. I personally liken people who talk to LLMs in full, proper English, capitalization and all, to boomers who still type in full sentences when running a Google query. Is there anything strictly wrong with it? Not really. Do I believe it's a more efficient workflow to just type the keywords that will give you the same result? Yes.

Workflow efficiencies can't really be scientifically evaluated. Some people still prefer to have desktop icons for programs on Windows; my workflow is pressing winkey -> typing the first few characters of the program -> enter. Is one of these methods scientifically more correct? Not really.

So, yeah -- eventually you'll either find your own workflow or copy the workflow of someone you see who is using LLMs effectively. It really is "just trust me, bro."

replies(1): >>44525681 #
35. narush ◴[] No.44524452{3}[source]
You can see this analysis in the factor analysis of "Below-average use of AI tools" (C.2.7) in the paper [1], which we mark as an unclear effect.

TLDR: over the first 8 issues, developers do not appear to get majorly less slowed down.

[1] https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

replies(1): >>44525325 #
36. bit1993 ◴[] No.44524509[source]
> Rather than let the LLM take you 80% of the way there and then handle the remaining 20% "manually"

IMO 80% is way too much. LLMs are probably good for things that are outside your domain knowledge and where you can afford to not be 100% correct, like rendering the Mandelbrot set, simple functions like that.
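
For scale, the kind of self-contained, easily checkable function being described - a minimal Mandelbrot membership test, as a sketch:

  def mandelbrot(c: complex, max_iter: int = 100) -> int:
      # Iterate z -> z^2 + c; return the step at which |z| escapes 2,
      # or max_iter if it never does (c is then likely in the set).
      z = 0j
      for i in range(max_iter):
          z = z * z + c
          if abs(z) > 2:
              return i
      return max_iter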

LLMs are not deterministic: sometimes they produce correct code and other times they produce wrong code. This means one has to audit LLM-generated code, and auditing code takes more effort than writing it, especially if you are not the original author of the code being audited.

Code has to be 100% deterministic. As programmers we write code, detailed instructions for the computer (CPU), and we have developed a lot of tools, such as unit tests, to make sure the computer does exactly what we wrote.

A codebase has a lot of context that you gain by writing the code. Some things just look wrong, and you know exactly why because you wrote the code. There is also a lot of context that you should keep in your head as you write the code, context that you miss by simply prompting an LLM.

37. fwip ◴[] No.44524512{4}[source]
Unfortunately, HN is YC-backed, and attracts these types by design.
replies(1): >>44525264 #
38. rafaelmn ◴[] No.44524515[source]
>My personal theory is that getting a significant productivity boost from LLM assistance and AI tools has a much steeper learning curve than most people expect.

Are we still selling the "you are an expert senior developer" meme? I can completely see how, once you are working on a mature codebase, LLMs would only slow you down. Especially one that was not created by an LLM and where you are the expert.

replies(1): >>44524598 #
39. Terr_ ◴[] No.44524525[source]
> people consistently predict and self-report in the wrong direction

I recall an adage about work-estimation: As chunks get too big, people unconsciously substitute "how possible does the final outcome feel" with "how long will the work take to do."

People asked "how long did it take" could be substituting something else, such as "how alone did I feel while working on it."

replies(1): >>44524653 #
40. smj-edison ◴[] No.44524529[source]
> To directly measure the real-world impact of AI tools on software development, we recruited 16 experienced developers from large open-source repositories (averaging 22k+ stars and 1M+ lines of code) that they’ve contributed to for multiple years.
41. ◴[] No.44524530[source]
42. steveklabnik ◴[] No.44524552[source]
> Current LLMs

One thing that happened here is that they aren't using current LLMs:

> Most issues were completed in February and March 2025, before models like Claude 4 Opus or Gemini 2.5 Pro were released.

That doesn't mean this study is bad! In fact, I'd be very curious to see it done again, but with newer models, to see if that has an impact.

replies(1): >>44524740 #
43. amirhirsch ◴[] No.44524561[source]
Figure 6, which breaks down the time spent on different tasks, is very informative. It suggests:

- 15% less active coding
- 5% less testing
- 8% less research and reading
- 4% more idle time
- 20% more AI interaction time

The 28% less coding/testing/research is why developers reported 20% less work. You might be spending 20% more time overall "working" while really being idle 5% more of the time, and feel like you've worked less because you were drinking coffee and eating a sandwich between waiting for the AI and reading AI output.

I think the AI skill boost comes from having workflows that let you shave off half that git-ops time and cut an extra 5% off coding; if you also cut the idle/waiting and do more prompting of parallel agents and a bit more testing, then you really are a 2x dev.

replies(2): >>44524677 #>>44527158 #
44. dmezzetti ◴[] No.44524566[source]
I'm the developer of txtai, a fairly popular open-source project. I don't use any AI-generated code and it's not integrated into my workflows at the moment.

AI has a lot of potential but it's way over-hyped right now. Listen to the people on the ground who are doing real work and building real projects: none of them are over-hyping it. It's mostly those who have only tangentially used LLMs who do.

It's also not surprising that many in this thread are clinging to the basic premise that it's 3 steps backwards to go 5 steps forward. Perhaps that is true, but I'll take the study at face value; it seems very plausible to me.

45. bicx ◴[] No.44524598[source]
I think it depends on the kind of work you're doing, but I use it on mature codebases where I am the expert, and I heavily delegate to Claude Code. By being knowledgeable of the codebase, I know exactly how to specify a task I need performed. I set it to work on one task, then I monitor it while personally starting on other work.

I think LLMs shine when you need to write a higher volume of code that extends a proven pattern, quickly explore experiments that require a lot of boilerplate, or have multiple smaller tasks that you can set multiple agents upon to parallelize. I've also had success in using LLMs to do a lot of external documentation research in order to integrate findings into code.

If you are fine-tuning an algorithm or doing domain-expert-level tweaks that require a lot of contextual input-output expert analysis, then you're probably better off just coding on your own.

Context engineering has been mentioned a lot lately, but it's not a meme. It's the real trick to successful LLM agent usage. Good context documentation, guides, and well-defined processes (just like with a human intern) will mean the difference between success and failure.

46. rcruzeiro ◴[] No.44524627[source]
Exactly. The people who say that these assistants are useless or "not good enough" are basically burying their heads in the sand. The people who claim that there is no mirage are burying their head in the sand as well...
47. mnky9800n ◴[] No.44524631[source]
I feel like I get better at it as I use Claude code more because I both understand its strength and weaknesses and also understand what context it’s usually missing. Like today I was struggling to debug an issue and realised that Claude’s idea of a coordinate system was 90 degrees rotated from mine and thus it was getting confused because I was confusing it.
replies(1): >>44524662 #
48. sandinmyjoints ◴[] No.44524653{3}[source]
That’s an interesting adage. Any ideas of its source?
replies(1): >>44524715 #
49. throwawayoldie ◴[] No.44524662[source]
One of the major findings is that people's perception--that is, what it felt like--was incorrect.
50. amirhirsch ◴[] No.44524677{3}[source]
I just realized the figure shows the time breakdown as a percentage of total time. It would be more useful to show absolute time (hours) for those side-by-side comparisons, since the implied hours would boost the height of the AI bars by ~18%.
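
A quick back-of-the-envelope on that point, using the ~18% figure above (numbers illustrative, not from the paper's raw data):

  # The same 20% share of total time hides ~18% more absolute time
  # when the AI-assisted total is ~18% larger.
  no_ai_total = 1.00          # normalize non-AI task time to 1 unit
  ai_total = 1.18             # AI-assisted tasks took ~18% longer overall

  share = 0.20                # a category at 20% of total in both bars
  print(share * no_ai_total)  # 0.2 absolute units
  print(share * ai_total)     # 0.236 -- taller bar despite the equal share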
replies(1): >>44524915 #
51. Dilettante_ ◴[] No.44524715{4}[source]
It might have been in Kahneman's "Thinking, Fast and Slow"
replies(1): >>44524803 #
52. blibble ◴[] No.44524740{3}[source]
> One thing that happened here is that they aren't using current LLMs

I've been hearing this for 2 years now

the previous model retroactively becomes total dogshit the moment a new one is released

convenient, isn't it?

replies(10): >>44524758 #>>44524891 #>>44524893 #>>44524975 #>>44525030 #>>44525035 #>>44526195 #>>44526545 #>>44526712 #>>44535270 #
53. simonw ◴[] No.44524758{4}[source]
The previous model retroactively becomes not as good as the best available models. I don't think that's a huge surprise.
replies(2): >>44524856 #>>44525150 #
54. WD-42 ◴[] No.44524779[source]
I feel the same way. I use it for super small chunks, still understand everything it outputs, and often manually copy/paste or straight up write myself. I don't know if I'm actually faster before, but it feels more comfy than alt-tabbing to stack overflow, which is what I feel like it's mostly replaced.

Poor stack overflow, it looks like they are the ones really hurting from all this.

55. Terr_ ◴[] No.44524803{5}[source]
I'm not sure, but something involving Kahneman et al. seems very plausible: The relevant term is probably "Attribute Substitution."

https://en.wikipedia.org/wiki/Attribute_substitution

56. furyofantares ◴[] No.44524826{3}[source]
I'm specifically talking about greenfield work. I do a lot of game prototypes, it definitely does that at the very beginning.
replies(2): >>44524954 #>>44525479 #
57. cwillu ◴[] No.44524856{5}[source]
The surprise is the implication that the crossover between net-negative and net-positive impact happened to be in the last 4 months, in light of the initial release 2 years ago and sufficient public attention for a study to be funded and completed.

Yes, it might make a difference, but it is a little tiresome that there's always a “this is based on a model that is x months old!” comment, because it will always be true: an academic study does not get funded, executed, written up, and published in less time.

replies(1): >>44525066 #
58. pdabbadabba ◴[] No.44524891{4}[source]
Maybe it's convenient. But isn't it also just a fact that some of the models available today are better than the ones available five months ago?
replies(2): >>44524999 #>>44525074 #
59. steveklabnik ◴[] No.44524893{4}[source]
Sorry, that’s not my take. I didn’t think these tools were useful until the latest set of models, that is, they crossed the threshold of usefulness to me.

Even then though, “technology gets better over time” shouldn’t be surprising, as it’s pretty common.

replies(2): >>44525041 #>>44525078 #
60. narush ◴[] No.44524915{4}[source]
There's additional breakdown per-minute in the appendix -- see appendix E.4!
61. devin ◴[] No.44524931[source]
It seems really surprising to me that anyone would call 50 hours of experience a "high skill ceiling".
62. bluefirebrand ◴[] No.44524954{4}[source]
Greenfield is still such a tiny percentage of all software work going on in the world though :/
replies(2): >>44525006 #>>44525398 #
63. jstummbillig ◴[] No.44524975{4}[source]
Convenient for whom and what...? There is nothing tangible to gain from you believing or not believing that someone else does (or does not) get a productivity boost from AI. This is not a religion and it's not crypto. The AI user's net worth is not tied to anyone else's use of or stance on AI (if anything, it's the opposite).

More generally, the phenomenon is quite simply explained and not surprising: new things improve, quickly. That does not mean that something is good or valuable, but it's how new tech gets introduced every single time, and it readily explains changing sentiment.

replies(3): >>44525177 #>>44525199 #>>44525836 #
64. bryanrasmussen ◴[] No.44524999{5}[source]
Sure, but after having spent some time trying to get anything useful - programmatically - out of previous models and not getting anything, how much time should one spend once a new one is announced?

Sure, you may end up missing out on a good thing and then having to come late to the party, but coming early to the party too many times, when the beer is watered down and the food has grubs, is apt to make you cynical the next time a party announcement comes your way.

replies(1): >>44525321 #
65. furyofantares ◴[] No.44525006{5}[source]
I agree, that's fair. I think a lot of people are playing around with AI on side projects and making some bad extrapolations from their initial experiences.

It'll also apply to isolated-enough features, which is still a small amount of someone's work (not often something you'd work on for a full month straight), but more people will have experience with this.

replies(1): >>44525331 #
66. cfst ◴[] No.44525030{4}[source]
The current batch of models, specifically Claude Sonnet and Opus 4, are the first I've used that have actually been more helpful than annoying on the large mixed-language codebases I work in. I suspect that dividing line differs greatly between developers and applications.
67. nalllar ◴[] No.44525035{4}[source]
If you interact with internet comments and discussions as an amorphous blob of people you'll see a constant trickle of the view that models now are useful, and before were useless.

If you pay attention to who says it, you'll find that people have different personal thresholds for finding llms useful, not that any given person like steveklabnik above keeps flip-flopping on their view.

This is a variant on the goomba fallacy: https://englishinprogress.net/gen-z-slang/goomba-fallacy-exp...

68. mattmanser ◴[] No.44525041{5}[source]
Do you really see a massive jump?

For context, I've been using AI, a mix of OpenAI + Claude, mainly for bashing out quick React stuff, for over a year now. For anything else it's generally rubbish and slower than working without it. Though I still use it to rubber-duck, so I'm still seeing the level of quality for backend.

I'd say they're only marginally better today than they were even 2 years ago.

Every time a new model comes out you get a bunch of people raving about how great the new one is, and I honestly can't really tell the difference. The only real difference is that reasoning models actually slowed everything down, but now I see the reasoning. It's only useful because I often spot it leaving out important stuff from the final answer.

replies(5): >>44525090 #>>44525193 #>>44525866 #>>44526601 #>>44531993 #
69. Ntrails ◴[] No.44525066{6}[source]
Some of it is just that (probably different) people said the same damn things 6 months ago.

"No, the 2.8 release is the first good one. It massively improves workflows"

Then, 6 months later, the study comes out.

"Ah man, 2.8 was useless, 3.0 really crossed the threshold on value add"

At some point, you roll your eyes and assume it is just snake oil sales

replies(2): >>44525328 #>>44525336 #
70. Terr_ ◴[] No.44525074{5}[source]
That's not the issue. Their complaint is that proponents keep revising what ought to be fixed goalposts... Well, fixed unless you believe unassisted human developers are also getting dramatically better at their jobs every year.

Like the boy who cried wolf, it'll eventually be true with enough time... But we should stop giving them the benefit of the doubt.

_____

Jan 2025: "Ignore last month's models, they aren't good enough to show a marked increase in human productivity, test with this month's models and the benefits are obvious."

Feb 2025: "Ignore last month's models, they aren't good enough to show a marked increase in human productivity, test with this month's models and the benefits are obvious."

Mar 2025: "Ignore last month's models, they aren't good enough to show a marked increase in human productivity, test with this month's models and the benefits are obvious."

Apr 2025: [Ad nauseam, you get the idea]

replies(1): >>44525557 #
71. ipaddr ◴[] No.44525078{5}[source]
Wait until the next set. You will find the previous ones weren't useful after all.
replies(1): >>44525215 #
72. hombre_fatal ◴[] No.44525090{6}[source]
I see a massive jump every time.

Just two years ago, this failed.

> Me: What language is this: "esto está escrito en inglés"

> LLM: English

Gemini and Opus have solved questions that took me weeks to solve myself. And I'll feed some complex code into each new iteration and it will catch a race condition I missed even with testing and line by line scrutiny.

Consider how many more years of experience you need as a software engineer to catch hard race conditions just from reading code than someone who couldn't do it after trying 100 times. We take it for granted already since we see it as "it caught it or it didn't", but these are massive jumps in capability.

73. keeda ◴[] No.44525142[source]
> My personal theory is that getting a significant productivity boost from LLM assistance and AI tools has a much steeper learning curve than most people expect.

Yes, and I'll add that there is likely no single "golden workflow" that works for everybody, and everybody needs to figure it out for themselves. It took me months to figure out how to be effective with these tools, and I doubt my approach will transfer over to others' situations.

For instance, I'm working solo on smallish, research-y projects and I had the freedom to structure my code and workflows in a way that works best for me and the AI. Briefly: I follow an ad-hoc, pair-programming paradigm, fluidly switching between manual coding and AI-codegen depending on an instinctive evaluation of whether a prompt would be faster. This rapid manual-vs-prompt assessment is second nature to me now, but it took me a while to build that muscle.

I've not worked with coding agents, but I doubt this approach will transfer over well to them.

I've said it before, but this is technology that behaves like people, and so you have to approach it like working with a colleague, with all their quirks and fallibilities and potentially-unbound capabilities, rather than a deterministic, single-purpose tool.

I'd love to see a follow-up of the study where they let the same developers get more familiar with AI-assisted coding for a few months and repeat the experiment.

replies(1): >>44525488 #
74. foobarqux ◴[] No.44525150{5}[source]
That's not the argument being made, though. The argument is that it does "work" now, implying that it didn't quite work before - except that the same people say the same thing at every model release, including the release of the previous one, which is now acknowledged to be seriously flawed. And they will say it at the future one, at which point the current models will similarly be acknowledged to be not only less performant than the future models, but inherently flawed.

Of course it's possible that at some point you get to a model that really works, irrespective of the history of false claims from the zealots, but it does mean you should take their comments with a grain of salt.

replies(1): >>44525369 #
75. card_zero ◴[] No.44525177{5}[source]
I saw that edit. Indeed you can't predict that rejecting a new thing is part of a routine of being wrong. It's true that "it's strange and new, therefore I hate it" is a very human (and adorable) instinct, but sometimes it's reasonable.
replies(2): >>44525559 #>>44530847 #
76. burnte ◴[] No.44525186[source]
> Current LLMs just are not as good as they are sold to be as a programming assistant and people consistently predict and self-report in the wrong direction on how useful they are.

I would argue you don't need the "as a programming assistant" phrase as right now from my experience over the past 2 years, literally every single AI tool is massively oversold as to its utility. I've literally not seen a single one that delivers on what it's billed as capable of.

They're useful, but right now they need a lot of handholding and I don't have time for that. Too much fact checking. If I want a tool I always have to double check, I was born with a memory so I'm already good there. I don't want to have to fact check my fact checker.

LLMs are great at small tasks. The larger the single task is, or the more tasks you try to cram into one session, the worse they fall apart.

replies(1): >>44526422 #
77. steveklabnik ◴[] No.44525193{6}[source]
Yes. In January I would have told you AI tools are bullshit. Today I’m on the $200/month Claude Max plan.

As with anything, your miles may vary: I’m not here to tell anyone that thinks they still suck that their experience is invalid, but to me it’s been a pretty big swing.

replies(2): >>44525395 #>>44526058 #
78. grey-area ◴[] No.44525199{5}[source]
Honestly the hype cycle feels very much like crypto, and just like crypto, prominent VCs have a lot of money riding on the outcome.
replies(2): >>44525236 #>>44525632 #
79. steveklabnik ◴[] No.44525215{6}[source]
This makes no sense to me. I’m well aware that I’m getting value today, that’s not going to change in the future: it’s already happened.

Sure they may get even more useful in the future but that doesn’t change my present.

80. atiedebee ◴[] No.44525216[source]
Let me bring you a third (not necessarily true) interpretation:

The developer who has experience using cursor saw a productivity increase not because he became better at using cursor, but because he became worse at not using it.

replies(2): >>44525343 #>>44530391 #
81. steveklabnik ◴[] No.44525236{6}[source]
I agree with you, and I think that’s coloring a lot of people’s perceptions. I am not a crypto fan but am an LLM fan.

Every hype cycle feels like this, and some of them are nonsense and some of them are real. We’ll see.

82. Uehreka ◴[] No.44525264{5}[source]
I mean sure, but HN/YC’s founder was always going on about the kinship between “Hackers and Painters” (or at least he used to). It hasn’t always been like this, and definitely doesn’t have to be. We can and should aspire to better.
83. luxpir ◴[] No.44525268[source]
Just to thank you for that point. I think it's likely more true than most of us realise. That and maybe the ability to mentally scaffold or outline a system or solution ahead of time.
84. robwwilliams ◴[] No.44525303[source]
Or a sampling artifact. 4 vs 12 does seem significant within a study, but consider a set of N such studies.

I assume that many large companies have tested the efficiency gains and losses of their programmers much more extensively than the authors of this tiny study.

A survey of companies and their evaluations and conclusions would carry more weight -- excluding companies selling AI products, of course.

replies(1): >>44526370 #
85. Terr_ ◴[] No.44525321{6}[source]
Plus it's not even possible to miss the metaphorical party: If it gets going, it will be quite obvious long before it peaks.

(Unless one believes the most grandiose prophecies of a technological-singularity apocalypse, that is.)

86. gojomo ◴[] No.44525325{4}[source]
Thanks, that's great!

But: if all developers did 136 AI-assisted issues, why only analyze excluding the 1st 8, rather than, say, the first 68 (half)?

replies(1): >>44525838 #
87. Filligree ◴[] No.44525328{7}[source]
Or you accept that different people have different skill levels, workflows and goals, and therefore the AIs reach usability at different times.
replies(1): >>44530511 #
88. lurking_swe ◴[] No.44525331{6}[source]
greenfield development is also the “easiest” and most fun part of software development. As the famous saying goes, the last 10% of the project takes 90% of the time lol.

I’ve also noticed that, generally, nobody likes maintaining old systems.

so where does this leave us as software engineers? Should I be excited that it’s easy to spin up a bunch of code that I don’t deeply understand at the beginning of my project, while removing the fun parts of the project?

I’m still grappling with what this means for our industry in 5-10 years…

89. steveklabnik ◴[] No.44525336{7}[source]
There’s a lot of confounding factors here. For example, you could point to any of these things in the last ~8 months as being significant changes:

* the release of agentic workflow tools

* the release of MCPs

* the release of new models, Claude 4 and Gemini 2.5 in particular

* subagents

* asynchronous agents

All or any of these could have made for a big or small impact. For example, I’m big on agentic tools, skeptical of MCPs, and don’t think we yet understand subagents. That’s different from those who, for example, think MCPs are the future.

> At some point, you roll your eyes and assume it is just snake oil sales

No, you have to realize you’re talking to a population of people, and not necessarily the same person. Opinions are going to vary, they’re not literally the same person each time.

There are surely snake oil salesman, but you can’t buy anything from me.

replies(1): >>44534117 #
90. card_zero ◴[] No.44525343{3}[source]
Or, one person in 16 has a particular personality, inclined to LLM dependence.
replies(2): >>44525736 #>>44526281 #
91. steveklabnik ◴[] No.44525369{6}[source]
> That's not the argument being made though, which is that it does "work" now and implying that actually it didn't quite work before

Right.

> except that that is the same thing the same people say for every model release,

I did not say that, no.

I am sure you can find someone who is in a Groundhog Day about this, but it’s just simpler than that: as tools improve, more people find them useful than before. You’re not talking to the same people, you are talking to new people each time who now have had their threshold crossed.

replies(1): >>44525598 #
92. Uehreka ◴[] No.44525395{7}[source]
> In January I would have told you AI tools are bullshit. Today I’m on the $200/month Claude Max plan.

Same. For me the turning point was VS Code’s Copilot Agent mode in April. That changed everything about how I work, though it had a lot of drawbacks due to its glitches (many of these were fixed within 6 or so weeks).

When Claude Sonnet 4 came out in May, I could immediately tell it was a step-function increase in capability. It was the first time an AI, faced with ambiguous and complicated situations, would be willing to answer a question with a definitive and confident “No”.

After a few weeks, it became clear that VS Code’s interface and usage limits were becoming the bottleneck. I went to my boss, bullet points in hand, and easily got approval for the Claude Max $200 plan. Boom, another step-function increase.

We’re living in an incredibly exciting time to be a skilled developer. I understand the need to stay skeptical and measure the real benefits, but I feel like a lot of people are getting caught up in the culture war aspect and are missing out on something truly wonderful.

93. Filligree ◴[] No.44525398{5}[source]
It’s a tiny percentage of software work because the programming is slow, and setting up new projects is even slower.

It’s been a majority of my projects for the past two months. Not because work changed, but because I’ve written a dozen tiny, personalised tools that I wouldn’t have written at all if I didn’t have Claude to do it.

Most of them were completed in less than an hour, to give you an idea of the size. Though it would have easily been a day on my own.

94. Filligree ◴[] No.44525430[source]
An interesting point. I wonder how much my decades-old habit of watching subtitled anime helps there—it’s definitely made me dramatically faster at scanning text.
95. ummonk ◴[] No.44525453[source]
Devil's advocate: it's also possible the one developer hasn't become more productive with Cursor, but rather has atrophied their non-AI productivity due to becoming reliant on Cursor.
replies(1): >>44527920 #
96. Dzugaru ◴[] No.44525479{4}[source]
This is really interesting, because I do gamejams from time to time - and I try every time to make it work, but I'm still quite a lot faster doing stuff myself.

This is visible under the extreme time pressure of producing a working game in 72 hours (our team consistently scores top 100 in Ludum Dare, which is a somewhat high standard).

We use Unity, a popular game engine all LLMs have a wealth of experience with (as with game development in general), but 80% of the output is so strangely "almost correct but not usable" that I cannot afford the luxury of letting it figure things out, and I use it as fancy autocomplete. And I also still check docs and Stackoverflow-style forums a lot, because of the stuff it plainly makes up.

One reason may be that our game mechanics are often a bit off the beaten path, though the last game we made was literally a platformer with rope physics (the LLM could not produce a good idea for stable, simple rope physics that fit our constraints and was codeable in 3 hours).

97. jona777than ◴[] No.44525480[source]
> but then hit a brick wall

This is my intuition as well. I had a teammate use a pretty good analogy today. He likened vibe coding to vacuuming up a string in four tries when it only takes one try to reach down and pick it up. I thought that aligned well with my experience with LLM assisted coding. We have to vacuum the floor while exercising the "difficult skill [of] continually avoiding temptation to vibe"

98. Filligree ◴[] No.44525488[source]
> I've not worked with coding agents, but I doubt this approach will transfer over well to them.

Actually, it works well so long as you tell them when you’ve made a change. Claude gets confused if things randomly change underneath it, but it has no trouble so long as you give it a short explanation.

99. pdabbadabba ◴[] No.44525557{6}[source]
Fair enough. For what it's worth, I've always thought that the more reasonable claim is that AI tools make poor-average developers more productive, not necessarily expert developers.
replies(1): >>44526668 #
100. jstummbillig ◴[] No.44525559{6}[source]
"I saw that edit" lol
replies(1): >>44525611 #
101. thesz ◴[] No.44525579[source]

  > My personal theory is that getting a significant productivity boost from LLM assistance and AI tools has a much steeper learning curve than most people expect.
This is what I heard about strong type systems (especially Haskell's) 15-20 years ago.

"History does not repeat, but it rhymes."

If we rhyme "strong types will change the world" with "agentic LLMs will change the world," what do we get?

My personal theory is that we will get the same: some people will get modest-to-substantial benefits there, but changes in the world will be small if noticeable at all.

replies(2): >>44525751 #>>44525928 #
102. blibble ◴[] No.44525598{7}[source]
> You’re not talking to the same people, you are talking to new people each time who now have had their threshold crossed.

no, it's the same names, again and again

replies(1): >>44525880 #
103. Aurornis ◴[] No.44525605[source]
> A quarter of the participants saw increased performance, 3/4 saw reduced performance.

The study used 246 tasks across 16 developers, for an average of 15 tasks per developer. Divide that further in half because tasks were assigned as AI or not-AI assisted, and the sample size per developer is still relatively small. Someone would have to take the time to review the statistics, but I don’t think this is a case where you can start inferring that the developers who benefited from AI were just better at using AI tools than those who were not.

I do agree that it would be interesting to repeat a similar test on developers who have more AI tool assistance, but then there is a potential confounding effect that AI-enthusiastic developers could actually lose some of their practice in writing code without the tools.

replies(1): >>44527923 #
104. card_zero ◴[] No.44525611{7}[source]
Sorry, just happened to. Slightly rude of me.
replies(1): >>44525716 #
105. jstummbillig ◴[] No.44525632{6}[source]
Of course there's lots of hype, but my point is that the reason why is very different, and it matters: as an early bc adopter, making you believe in bc is super important to my net worth (and you not believing in bc makes me look like an idiot and lose a lot of money).

In contrast, what do I care if you believe in code generation AI? If you do, you are probably driving up pricing. I mean, I am sure that there are people that care very much, but there is little inherent value for me in you doing so, as long as the people who are building the AI are making enough profit to keep it running.

With regards to the VCs, well, how many VCs are there in the world? How many of the people who have something good to say about AI are likely VCs? I might be off by an order of magnitude, but even then it would really not be driving the discussion.

replies(1): >>44525865 #
106. skybrian ◴[] No.44525681{6}[source]
Maybe it would help if more people wrote tutorials? It doesn't seem reasonable for people who don't have a buddy to learn from to have to figure it out on their own.
107. jstummbillig ◴[] No.44525716{8}[source]
Ah, you do you. It's just a fairly kindergarten thing to point out and not something I was actively trying to hide. Whatever it was.

Generally, I do a couple of edits for clarity after posting and reading again. Sometimes that involves removing something that I feel could have been said better. If it does not work, I will just delete the comment. Whatever it was must not have been a super huge deal (to me).

replies(1): >>44527940 #
108. runarberg ◴[] No.44525736{4}[source]
Invoking personality is to the behavioral sciences as invoking God is to the natural sciences. One can explain anything by appealing to personality, and as such it explains nothing. Psychologists have been trying to make sense of personality for over a century without much success (the best effort so far is a five-factor model [Big 5], which has ultimately pretty minor predictive value), which is why most behavioral scientists have learned to simply leave personality to the philosophers and concentrate on much simpler theoretical frameworks.

A much simpler explanation is the one your parent offered. And to many behavioralists it is actually the same explanation, as to a true scotsm... [cough] behavioralist, personality is simply learned habits, so - by Occam's razor - you should omit personality from your model.

replies(2): >>44525860 #>>44528801 #
109. ruszki ◴[] No.44525751[source]
Maybe it depends on the task. I'm 100% sure that if you think a type system is a drawback, then you have never coded in a diverse, large codebase. Our 1.5-million-LOC, 30-year-old monolith would be completely unmaintainable without one. But seriously, anything above 10 LOC without a formal type system is unmaintainable after a few years. An informal one is fine for a while, but not for long. In 30-year-old code, basically every single informal rule has been broken.

Also, my long experience is that even in the PoC phase, using a type system adds almost zero extra time... of course, that's if you know the type system, which should be trivial in any case after you've seen a few.

replies(2): >>44529397 #>>44529495 #
110. th0ma5 ◴[] No.44525830[source]
Simon's opinion is unsurprisingly that people need to read his blog and spam on every story on HN lest we be left behind.
111. leshow ◴[] No.44525836{5}[source]
I think you're missing the broader context. There are a lot of people heavily invested in the maximalist outcome, which creates pressure for people to be boosters. You don't need a digital token for that to happen. There's a social media aspect as well that creates a feedback loop around claims.

We're in a hype cycle, and it means we should be extra critical when evaluating the tech so we don't get taken in by exaggerated claims.

replies(1): >>44526326 #
112. narush ◴[] No.44525838{5}[source]
Sorry, this is the first 8 issues per developer!
113. card_zero ◴[] No.44525860{5}[source]
Fair comment, but I'm not down with behavioralism, and people have personalities, regrettably.
replies(1): >>44526101 #
114. leshow ◴[] No.44525865{7}[source]
I don't find that a compelling argument, lots of people get taken in by hype cycles even when they don't profit directly from it.
115. simonw ◴[] No.44525866{6}[source]
The massive jump in the last six months is that the new set of "reasoning" models got really good at reasoning about when to call tools, and were accompanied by a flurry of tools-in-loop coding agents - Claude Code, OpenAI Codex, Cursor in Agent mode, etc.

An LLM that can test the code it is writing and then iterate to fix the bugs turns out to be a huge step forward from LLMs that just write code without trying to then exercise it.
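A rough sketch of that loop (Python; llm_write_code is a hypothetical stand-in for the actual model call, not a real API):

  import subprocess

  def run_tests() -> tuple[bool, str]:
      # Any pass/fail signal works; pytest is just one example.
      proc = subprocess.run(["pytest", "-x"], capture_output=True, text=True)
      return proc.returncode == 0, proc.stdout + proc.stderr

  def agent_loop(task: str, max_rounds: int = 5) -> bool:
      feedback = ""
      for _ in range(max_rounds):
          llm_write_code(task, feedback)  # hypothetical: model edits files
          ok, output = run_tests()
          if ok:
              return True
          feedback = output  # failures become the next round's context
      return False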

116. simonw ◴[] No.44525880{8}[source]
Got receipts?

That sounds like a claim you could back up with a little bit of time spent using Hacker News search or similar.

(I might try to get a tool like o3 to run those searches for me.)

replies(1): >>44526026 #
117. eightysixfour ◴[] No.44525887[source]
I have been teaching people at my company how to use AI code tools; the learning curve is way worse for developers, and I have had to come up with some exercises to try to break through it. Some seemingly can’t get it.

The short version is that devs want to give instructions instead of asking for the outcome they want. When it doesn’t follow the instructions, they double down by being more precise, which is the worst thing you can do. When non-devs don’t get what they want, they add more detail to the description of the desired outcome.
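For illustration (both prompts invented for this example), the difference looks something like:

  Instruction-style: "Wrap the fetch call in try/except, retry three times
  with backoff, then log and re-raise."

  Outcome-style: "This importer should survive flaky network calls without
  losing records, and failures should show up in the logs."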

Once you get past the control problem, then you have a second set of issues for devs where the things that should be easy or hard don’t necessarily map to their mental model of what is easy or hard, so they get frustrated with the LLM when it can’t do something “easy.”

Lastly, devs keep a shit load of context in their head - the project, what they are working on, application state, etc. - and they need to do that for LLMs too, but you have to repeat yourself often and “be” the external memory for the LLM. Most devs I have taught hate that; they would rather have it the other way around, where they get help with context and state but instruct the computer on their own.
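One mitigation is a standing context file the tool reads on every run (Claude Code, for example, picks up a CLAUDE.md); the contents here are a hypothetical sketch:

  # CLAUDE.md
  - Monorepo: API in /server (FastAPI), UI in /web (React + TypeScript)
  - Run `make test` and `make lint` before considering a change done
  - All DB access goes through /server/repository/; no raw SQL in handlers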

Interestingly, the best AI assisted devs have often moved to management/solution architecture, and they find the AI code tools brought back some of the love of coding. I have a hypothesis they’re wired a bit differently and their role with AI tools is actually closer to management than it is development in a number of ways.

replies(2): >>44526796 #>>44527055 #
118. leshow ◴[] No.44525928[source]
I don't think that's a fair comparison. Type systems don't produce probabilistic output. Their entire purpose is to reduce the scope of possible errors you can write. They kind of did change the world, didn't they? I mean, not everyone is writing Haskell but Rust exists and it's doing pretty well. There was also not really a case to be made where type systems made software in general _worse_. But you could definitely make the case that LLM's might make software worse.
replies(2): >>44526616 #>>44529347 #
119. heavyset_go ◴[] No.44526005[source]
Any "tricks" you learn for one model may not be applicable to another, it isn't a given that previous experience with a company's product will increase the likelihood of productivity increases. When models change out from under you, the heuristics you've built up might be useless.
120. blibble ◴[] No.44526026{9}[source]
try asking it what sealioning is
replies(1): >>44527616 #
121. mattmanser ◴[] No.44526058{7}[source]
Ok, I'll have to try it out then. I've got a side project I've 3/4 finished and will let it loose on it.

So are you using Claude Code via the max plan, Cursor, or what?

I think I'd definitely hit AI news exhaustion and was viewing people raving about this agentic stuff as yet more AI fanbois. I'd just continued using the AI separately, as setting up a new IDE seemed like too much work for the fractional gains I'd been seeing.

replies(3): >>44526143 #>>44528857 #>>44536836 #
122. runarberg ◴[] No.44526101{6}[source]
This is still ultimately research within the field of the behavioral sciences, and as such the laws of human behavior apply, where behaviorism offers a far more successful theoretical framework than personality psychology.

Nobody is denying that people have personalities, btw. Not even true behavioralists do that; they simply argue, from reductionism, that personality can be explained by learning contingencies and reinforcement history. Very few people are true behavioralists these days, though; within the behavior sciences, scientists are much more likely to borrow missing factors (i.e. things that learning contingencies fail to explain) from fields such as cognitive science (or even further, neuroscience) and (less often) social science.

What I am arguing here, however, is that the appeal to personality is unnecessary when explaining behavior.

As for figuring out what personality is, that is still within the realm of philosophy. Maybe cognitive science will do a better job at explaining it than psychometricians have done for the past century. I certainly hope so, it would be nice to have a better model of human behavior. But I think even if we could explain personality, it still wouldn’t help us here. At best we would be in a similar situation as physics, where one model can explain things traveling at the speed of light, while another model can explain things at the sub-atomic scale, but the two models cannot be applied together.

123. steveklabnik ◴[] No.44526143{8}[source]
I had a bad time with Cursor. I use Claude Code inside of VS Code. You don't necessarily need Max, but you can spend a lot of money very quickly on API tokens, so I'd recommend to anyone trying: start with the $20/month plan; there's no need to spend a ton of money just to try something out.

There is a skill gap, like, I think of it like vim: at first it slows you down, but then as you learn it, you end up speeding up. So you may also find that it doesn't really vibe with the way you work, even if I am having a good time with it. I know people who are great engineers who still don't like this stuff, just like I know ones that do too.

replies(1): >>44527419 #
124. bix6 ◴[] No.44526195{4}[source]
Everything actually got better. Look at the image generation improvements as an easily visible benchmark.

I do not program for my day job and I vibe coded two different web projects. One in twenty mins as a test with cloudflare deployment having never used cloudflare and one in a week over vacation (and then fixed a deep safari bug two weeks later by hammering the LLM). These tools massively raise the capabilities for sub-average people like me and decrease the time / brain requirements significantly.

I had to make a little update to reset the KV store on Cloudflare, and the LLM did it in 20s after getting the syntax wrong twice. I would’ve spent at least a few minutes looking it up otherwise.

125. cutemonster ◴[] No.44526281{4}[source]
Didn't they rather mean:

Developers' own skills might atrophy, when they don't write that much code themselves, relying on AI instead.

And now, when comparing with/without AI, they're faster with it. But a year ago they might have been that fast or faster without an AI.

I'm not saying that that's how things are. Just pointing out another way to interpret what GP said

126. jstummbillig ◴[] No.44526326{6}[source]
I mostly don't agree. Yes, there is always social pressure with these things, and we are in a hype cycle, but the people "buying in" are simply not doing much at all. They are mostly consumers, waiting for the next model, which they have no control over or stake in creating (by and large).

The people not buying into the hype, on the other hand, are actually the ones with a very good reason to be invested, because if they turn out to be wrong they might face some very uncomfortable adjustments in the job landscape, and a lot of the skills they worked so hard to gain may prove less valuable than they believed.

As always, be wary of any claims, but the tension here is very much the reverse of crypto, and I don't think that's much appreciated.

127. rs186 ◴[] No.44526370{3}[source]
If you use a binomial test, P(X<=4) is about 0.105, which means p = 0.21.
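
A minimal scipy sketch for checking figures like this, assuming n = 16 developers and a coin-flip null; both nearby values of k are shown, since the exact count being tested isn't visible in this thread:

  from scipy.stats import binomtest

  n = 16  # developers in the study
  for k in (4, 5):
      less = binomtest(k, n, p=0.5, alternative="less").pvalue
      two_sided = binomtest(k, n, p=0.5).pvalue
      print(f"P(X<={k}) = {less:.3f}, two-sided p = {two_sided:.3f}")
  # P(X<=4) = 0.038, two-sided p = 0.077
  # P(X<=5) = 0.105, two-sided p = 0.210
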
128. ◴[] No.44526422{3}[source]
129. Aeolun ◴[] No.44526545{4}[source]
It’s true though? Previous models could do well in specifically created settings. You can throw practically everything at Opus, and it’ll work mostly fine.
130. vidarh ◴[] No.44526601{6}[source]
I've gone from asking the tools how to do things and cutting and pasting the (often small) bits that'd be helpful, via using assistants where I'd review every decision and often have to start over, to now often starting an assistant with broad permissions and just reviewing the diff later, after it has made the changes pass the test suite, run a linter and fixed all the issues it brought up, and written a draft commit message.

The jump has been massive.

131. atlintots ◴[] No.44526616{3}[source]
It's too bad the management people never pushed Haskell as hard as they're pushing AI today! Alas.
132. bluefirebrand ◴[] No.44526668{7}[source]
Personally I don't want poor-average developers to be more productive, I want them to be more expert
replies(2): >>44527605 #>>44527673 #
133. mwigdahl ◴[] No.44526712{4}[source]
I've been a proponent for a long time, so I certainly fit this at least partially. However, the combination of Claude Code and the Claude 4 models has pushed the response to my demos of AI coding at my org from "hey, that's kind of cool" to "Wow, can you get me an API key please?"

It's been a very noticeable uptick in power, and although there have been some nice increases with past model releases, this has been both the largest and the one that has unlocked the most real value since I've been following the tech.

replies(1): >>44526752 #
134. achierius ◴[] No.44526752{5}[source]
Is that really the case vs. 3.7? For me that was the threshold, and since then the improvements have been nice but not as significant.
replies(1): >>44526968 #
135. BigGreenJorts ◴[] No.44526796[source]
> Interestingly, the best AI assisted devs have often moved to management/solution architecture, and they find the AI code tools brought back some of the love of coding. I have a hypothesis they’re wired a bit differently and their role with AI tools is actually closer to management than it is development in a number of ways.

The CTO and VPEng at my company (very small; they still do technical work occasionally) both love the agent stuff so much. Part of it for them is that it gives them the opportunity to do technical work again in the limited time they have. Without having to distract an actual dev, or spend a long time reading through the codebase, they can quickly get context for and build small items themselves.

136. mwigdahl ◴[] No.44526968{6}[source]
I would agree with you that the jump from Sonnet 3.7 to Sonnet 4 feels notable but not shocking. Opus 4 is considerably better, and Opus 4 combined with the Claude Code harness is what really unlocks the value for me.
137. giantg2 ◴[] No.44526981[source]
The third option is that the person who used Cursor before had some sort of skill atrophy that led to lower unassisted speed.

I think an easy measure to help identify why a slowdown is happening would be how much refactoring happened on the AI-generated code. Often it seems to be missing stuff like error handling, or it adds unnecessary stuff. Of course, this assumes it even had a working solution in the first place.

138. ivanovm ◴[] No.44526996[source]
I find the very popular response of "you're just not using it right" to be a big cop-out for LLMs, especially at the scale we see today. It's hard to think of any other major tech product where it's acceptable to shift so much blame on the user. Typically, if a user doesn't find value in the product, we agree that the product is poorly designed/implemented, not that the user is bad. But AI seems somehow exempt from this sentiment.
replies(15): >>44527074 #>>44527365 #>>44527386 #>>44527577 #>>44527623 #>>44527723 #>>44527868 #>>44528270 #>>44528322 #>>44529356 #>>44529649 #>>44530908 #>>44532696 #>>44533993 #>>44537674 #
139. rester324 ◴[] No.44527055[source]
> Interestingly, the best AI assisted devs have often moved to management/solution architecture, and they find the AI code tools brought back some of the love of coding

This suggests to me, though, that they are bad at coding; otherwise they would have stayed longer. And I can't find anything in your comment that corroborates the opposite. So what gives?

I am not saying what you say is untrue, but you didn't give us any convincing arguments to believe otherwise.

Also, you didn't define the criteria for getting better. Getting better in terms of what exactly???

replies(2): >>44527914 #>>44528565 #
140. viraptor ◴[] No.44527074[source]
> It's hard to think of any other major tech product where it's acceptable to shift so much blame on the user.

It's completely normal in development. How many years of programming experience do you need for almost any language? How many days or weeks do you need to use debuggers effectively? How long does it take from first contact with version control until you get git?

I think it's the opposite actually - it's common that new classes of tools in tech need experience to use well. Much less if you're moving to something different within the same class.

replies(5): >>44527384 #>>44528359 #>>44528413 #>>44529459 #>>44530955 #
141. viraptor ◴[] No.44527158{3}[source]
> You might be spending 20% more time overall "working" while you are really idle 5% more time and feel like you've worked less because you were drinking coffee and eating a sandwich between waiting for the AI and reading AI output.

This is going to be interesting long-term. Realistically people don't spend anywhere close to 100% of time working and they take breaks after intense periods of work. So the real benefit calculation needs to include: outcome itself, time spent interacting with the app, overlap of tasks while agents are running, time spent doing work over a long period of time, any skill degradation, LLM skills, etc. It's going to take a long time before we have real answers to most of those, much less their interactions.

142. Lerc ◴[] No.44527365[source]
>It's hard to think of any other major tech product where it's acceptable to shift so much blame on the user.

Is that perhaps because of the nature of the category of 'tech product'? In other domains, this certainly isn't the case, especially if the goal is to get the best result instead of the optimum output/effort balance.

Musical instruments are a clear case where the best results are down to the user. Most crafts are similar. There is the proverb "A bad craftsman blames his tools" that highlights that there are entire fields where the skill of the user is considered to be the most important thing.

When a product is aimed at as many people as the marketers can find, that focus on individual ability is lost and the product targets the lowest common denominator.

They are easier to use, but less capable at their peak. I think of the state of LLMs as analogous to home computing at a stage of development somewhere around the Altair-to-TRS-80 level. These are the first ones on the scene; people are exploring what they are good for, how they work, and sometimes putting them to effective use in new and interesting ways. It's not unreasonable to expect a degree of expertise at this stage.

The LLM equivalent of a Mac will come, plenty of people will attempt to make one before it's ready. There will be a few Apple Newtons along the way that will lead people to say the entire notion was foolhardy. Then someone will make it work. That's when you can expect to use something without expertise. We're not there yet.

143. AndrewKemendo ◴[] No.44527368[source]
What you described has been true of the adoption of every technology ever

Nothing new this time except for people who have no vision and no ability to work hard not “getting it” because they don’t have the cognitive capacity to learn

144. Avshalom ◴[] No.44527384{3}[source]
Linus did not show up in front of Congress talking about how dangerously powerful unregulated version control was to the entirety of human civilization, a year before he debuted Git and charged thousands a year to use it.
replies(2): >>44527519 #>>44527585 #
145. edmundsauto ◴[] No.44527386[source]
New technologies that require new ways of thinking are always this way. "Google-fu" was literally a hirable career skill in 2004 because nobody knew how to search to get optimal outcomes. They've done alright improving things since then - let's see how good Cursor is in 10 years.
146. mh- ◴[] No.44527419{9}[source]
Worth noting for the folks asking: there's an official Claude Code extension for VS Code now [0]. I haven't tried it personally, but that's mostly because I mainly use the terminal and vim.

[0]: https://marketplace.visualstudio.com/items?itemName=anthropi...

replies(1): >>44527989 #
147. rukuu001 ◴[] No.44527465[source]
I'm sympathetic to the argument re experience with the tools paying off, because my personal anecdata matches that. It hasn't been until the last 6 weeks, after watching a friend demo their workflow, that my personal efficiency has improved dramatically.

The most useful thing of all would have been to have screen recordings of those 16 developers working on their assigned issues, so they could be reviewed for varying approaches to AI-assisted dev, and we could be done with this absurd debate once and for all.

148. viraptor ◴[] No.44527519{4}[source]
Ok. You seem to be taking about a completely different issue of regulation.
149. sanderjd ◴[] No.44527577[source]
> It's hard to think of any other major tech product where it's acceptable to shift so much blame on the user.

Maybe, but it isn't hard to think of developer tools where this is the case. This is the entire history of editor and IDE wars.

Imagine running this same study design with vim. How well would you expect the not-previously-experienced developers to perform in such a study?

replies(2): >>44528674 #>>44529374 #
150. sanderjd ◴[] No.44527585{4}[source]
This seems like a non sequitur. What does this have to do with this thread?
replies(1): >>44529639 #
151. Terr_ ◴[] No.44527605{8}[source]
"Compared to last quarter, we've shipped 40% more spaghetti-code!"
152. maxbond ◴[] No.44527616{10}[source]
You've no obligation to answer, no one is entitled to your time, but it's a reasonable request. It's not sealioning to respectfully ask for directly relevant evidence that takes about 10-15m to get.
153. milchek ◴[] No.44527623[source]
I think the reason for that is maybe you’re comparing to traditional products that are deterministic or have specific features that add value?

If my phone keeps crashing or if the browser is slow or clunky then yes, it’s not on me, it’s the phone, but an LLM is a lot more open ended in what it can do. Unlike the phone example above where I expect it to work from a simple input (turning it on) or action (open browser, punch in a url), what an LLM does is more complex and nuanced.

Even the same prompt from different users might result in different output - so there is more onus on the user to craft the right input.

Perhaps that’s why AI is exempt for now.

154. pdabbadabba ◴[] No.44527673{8}[source]
Sure. But what would you suppose the ratio is between expert, average, and mediocre coders in the average organization? I think a small minority would be in the first category, and I don’t see a technology on the horizon that will change that except for LLMs, which seem like they could make mediocre coders both more productive and produce higher quality output.
replies(1): >>44528610 #
155. TechDebtDevin ◴[] No.44527679{4}[source]
I have great code gen tools I've built for myself that build my perfect scaffolding/boilerplate every time, for any project in about 30 seconds.

Took me a week to build those tools. It's much more reliable (and flexible) than any LLM and cost me nothing.

It comes with secure auth, email, admin, etc. It doesn't cost me a dime and almost never has a common vulnerability.

Best part about it. I know how my side project runs.

156. ay ◴[] No.44527723[source]
Just a few examples: Bicycle. Car(driving). Airplane(piloting). Welder. CNC machine. CAD.

All take quite an effort to master; until then, they might slow one down or outright kill.

157. ◴[] No.44527868[source]
158. eightysixfour ◴[] No.44527914{3}[source]
> This suggests me though that they are bad at coding, otherwise they would have stayed longer.

Or they care about producing value, not just the code, and realized they had more leverage and impact in other roles.

> And I can't find anything in your comment that would corroborate the opposite.

I didn’t try and corroborate the opposite.

Honestly, I don’t care about the “best coders.” I care about people who do their job well, sometimes that is writing amazing code but most of the time it isn’t. I don’t have any devs in my company who work in a magical vacuum where they are handed perfectly written tasks, they complete them, and then they do the next one.

If I did, I could replace them with AI faster.

> Also, you didn't define the criteria of getting better. Getting better in terms of what exactly?

Delivery velocity - bug fixes, features, etc. that pass testing/QA and goes to prod.

replies(1): >>44528219 #
159. bluefirebrand ◴[] No.44527920[source]
I suspect you're onto something here but I also think it would be an extremely dramatic atrophy to have occurred in such a short period of time...
160. bluefirebrand ◴[] No.44527923[source]
> potential confounding effect that AI-enthusiastic developers could actually lose some of their practice in writing code without the tools

I don't think this is a confounding effect

This is something that we definitely need to measure and be aware of, if there is a risk of it

161. gexla ◴[] No.44527935[source]
In addition to the learning curve of the tooling, there's also the learning curve of the models. Each has a certain personality that you have to figure out so that you can catch the failure patterns right away.
162. maxbond ◴[] No.44527940{9}[source]
FYI there's a "delay" setting in your profile that allows you to make your comment invisible for up to ten minutes.
163. steveklabnik ◴[] No.44527989{10}[source]
Yes, it’s not necessary but it is convenient for viewing diffs in Code’s diff view. The terminal is a fine way to interact with it though.
164. badsectoracula ◴[] No.44528152[source]
> But I can assure you that "pandas count unique values column 'Foo'" is just as effective an LLM prompt as "Using pandas, how do I get the count of unique values in the column named 'Foo'?"

While the results are going to be similar, typing a question in full can help you think about it yourself too, as if the LLM is a rubber duck that can respond back.

I've found myself adjusting and rewriting prompts during the process of writing them, before I ask the LLM anything, because as I was writing the prompt I was thinking about the problem simultaneously.

Of course for simple queries like "write me a function in C that calculates the length of a 3d vector using vec3 for type" you can write it like "c function vec3 length 3d" or something like that instead and the LLM will give more or less the same response (tried it with Devstral).
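
Same for the pandas example: either phrasing tends to land on essentially the same answer, something like:

  import pandas as pd

  df = pd.DataFrame({"Foo": ["a", "b", "a", "c"]})
  print(df["Foo"].nunique())       # 3: count of distinct values
  print(df["Foo"].value_counts())  # or per-value counts, if that was the intent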

But TBH, to me that sounds like programmers using Vim claiming they're more productive than users of other editors because they use fewer keystrokes.

165. jprokay13 ◴[] No.44528181[source]
My personal experience was that of a decrease in productivity until I spent significant time with it. Managing configurations, prompting it the right way, asking other models for code reviews… And I still see there is more I can unlock with more time learning the right interaction patterns.

For nasty, legacy codebases there is only so much you can do IMO. With green field (in certain domains), I become more confident every day that coding will be reduced to an AI task. I’m learning how to be a product manager / ideas guy in response

166. benreesman ◴[] No.44528209[source]
I don't even think we know how to do it yet. I revise my whole attitude and all of my beliefs about this stuff every week: I figure out things that seemed really promising don't pan out, I find stuff that I kick myself for not realizing sooner, and it's still this high-stakes game. I still blow a couple of days and wish I had just done it the old-fashioned way, and then I'll catch a run where it's like, fuck, I was never that good, that's the last 5-10% that breaks a PB.

I very much think that these things are going to wind up being massive amplifiers for people who were already extremely sophisticated and then put massive effort into optimizing them and combining them with other advanced techniques (formal methods, top-to-bottom performance orientation).

I don't think this stuff is going to democratize software engineering at all. I think it's going to take the difficulty level so high that it's like back when Dijkstra or Tony Hoare was a fairly typical computer programmer.

167. rester324 ◴[] No.44528219{4}[source]
> Honestly, I don’t care about the “best coders.”

> Interestingly, the best AI assisted devs have often moved to management/solution architecture

Is it just me? Or does it seem to others as well that you are, at this very moment, ranking these people, and that your first comment contradicts your second comment? Especially when you admit that you rank them based on velocity.

I am not saying you shouldn't do that, but it feels to me like rating road construction workers on the number of potholes fixed, even though it's very possible that the potholes are caused by the sloppy work to begin with.

Not what I would want to do.

replies(1): >>44528847 #
168. Maxious ◴[] No.44528270[source]
>It's hard to think of any other major tech product where it's acceptable to shift so much blame on the user.

Apple's Response to iPhone 4 Antenna Problem: You're Holding It Wrong https://www.wired.com/2010/06/iphone-4-holding-it-wrong/

replies(3): >>44528637 #>>44528869 #>>44529153 #
169. jeswin ◴[] No.44528322[source]
Not every tool can be figured out in a day (or a week or more). That doesn't mean that the tool is useless, or that the user is incapable.
170. intended ◴[] No.44528359{3}[source]
> LLMs, especially at the scale we see today

The OP notes how the marketing cycle for this product is beyond extreme, in a category of its own.

Normal people are being told to worry about AI ending the world, or all jobs disappearing.

Simply saying “the problem is the user”, without acknowledging the degree of hype and expectation-setting, is irresponsible.

replies(1): >>44529050 #
171. blub ◴[] No.44528413{3}[source]
It is completely typical, but at the same time abnormal to have tools with such poor usability.

A good debugger is very easy to use. I remember the Visual Studio debugger or the C++ debugger on Windows were a piece of cake 20 years ago, while gdb is still painful today. Java and .NET had excellent integrated debuggers while golang had a crap debugging story for so long that I don’t even use a debugger with it. In fact I almost never use debuggers any more.

Version control - same story. CVS for all its problems I had learned to use almost immediately and it had a GUI that was straightforward. git I still have to look up commands for in some cases. Literally all the good git UIs cost a non-trivial amount of money.

Programming languages are notoriously full of unnecessary complexity. Personal pet peeve: Rust lifetime management. If this is what it takes, just use GC (and I am - golang).

replies(3): >>44528882 #>>44529002 #>>44535140 #
172. blub ◴[] No.44528530[source]
One has to take time to review code and think through different aspects of execution (like memory management, concurrency, etc). Plenty of code cannot be scanned.

That said, if the language has GC and other helpers, it makes it easier to scan.

Code and architecture review is an important part of my role, and I catch issues that others miss because I spend more time. I did use AI for review (GPT-4.1), but only as an addition, since it's not reliable enough.

173. qingcharles ◴[] No.44528565{3}[source]
I'm not bad at coding. I would say I'm pretty damned good. But coding is a means to an end. I come up with an idea, then I have the long-winded middle bit where I have to write all the code, spin up a DB, create the tables, etc.

LLMs have given me a whole new love of coding, getting rid of the dull grind and letting me write code an order of magnitude quicker than before.

174. bluefirebrand ◴[] No.44528610{9}[source]
They definitely aren't producing higher quality output imo, but definitely producing low quality output faster

That's not a tradeoff that I like

replies(1): >>44532838 #
175. wiether ◴[] No.44528637{3}[source]
I don't see how the Antennagate can be qualified as "acceptable" since it caused a big public uproar and Apple had to settle a class action lawsuit.

https://www.businessinsider.com/apple-antennagate-scandal-ti...

replies(1): >>44528787 #
176. fingerlocks ◴[] No.44528674{3}[source]
No one is claiming 10x perf gains in vim.

It’s just a fun geeky thing to use with a lot of zany customizations. And after two hellish years of memory muscling enough keyboard bindings to finally be productive, you earned it! It’s a badge of pride!

But we all know you’re still fat fingering ggdG on occasion and silently cursing to yourself.

replies(1): >>44529110 #
177. 8note ◴[] No.44528787{4}[source]
It didn't end the iPhone as a brand, or end smartphones altogether, though.

How much did that uproar and settlement matter?

178. suddenlybananas ◴[] No.44528801{5}[source]
Behaviorism is a relic of the 1950s
replies(1): >>44535677 #
179. eightysixfour ◴[] No.44528847{5}[source]
> Is it just me? Or does it seem to others as well that you pretty much rank these people even at the moment and your first comment contradicts your second comment?

I think you are reading what you want to read and not what I said, so yes it is you. The most productive, valuable people with developer titles in my organizations are not the ones who write the cleanest, most beautiful, most perfect code. They do all of the other parts of the job well and write solid code.

Following the introduction of AI tools, many of the people in my organization who most effectively learned to use those tools are people who previously chose to move to manager and SA roles.

Not only are these not contradictory, they fit quite well together. People who do the things around coding well, but maybe have to work hard at writing the actual code, are better at using the AI tools than exceptional coders. For my organization, the former are generally more valuable than the latter without AI, and that is increasing as a result of AI.

> I am not saying you shouldn't do that, but it feels to me like rating road construction workers on the number of potholes fixed, even though it's very possible that the potholes are caused by the sloppy work to begin with.

Not if your measurement includes quality testing the pothole repairs, which mine does, as I explicitly called out. I work in industries with extensive, long testing cycles, we are (imperfectly, of course) able to measure productivity based on things which make it through those cycles.

You are trying very hard to find ways to ignore what I am saying. It is fine if you don’t want to believe me, but these things have been true based on our observations:

A. Great “coders” have a much harder time picking up AI dev tools and using them effectively, and when they see how others use them they will admit that isn’t how they use them. They will revert to their previous habits and give up on the tools.

B. The productivity gains for the people who are good at using the tools, as measured by velocity with a minimum bar for quality (with substantial QA), are very high.

C. We have measured these things to thoroughly understand the ROI and we are accelerating our investment in AI coding tools as a result.

Some caveats I am absolutely willing to make - we are not working on bleeding edge tech doing things no one has ever done before.

We failed to effectively use AI many times before we started to get it right.

There are developers who are slower with the AI code tools than without it.

replies(1): >>44528986 #
180. 8note ◴[] No.44528857{8}[source]
I'd say that's not gonna be the best use for it, unless what you really want is to first document everything about it in detail.

I'm using Claude + VS Code's Cline extension for the most part, but where it tends to excel is helping you write documentation, and then using that documentation to write reasonable code.

If you're 3/4 of the way done, a lot of the docs it wants in order to work well are gonna be missing, and so a lot of your intentions about why you did or didn't make certain choices will be missing. If you've got good docs, make sure to feed those in as context.

The agentic tool on its own is still kinda meh if you only try to write code directly from it. It's definitely better than the non-agentic stuff, but if you start by getting it to document stuff, and to ask you questions about what it should know in order to make the change, it's pretty good.

Even if you don't get perfect code, or it spins in a feedback loop where it's lost the plot, the questions it asks can be super handy in terms of code patterns you haven't thought about that apply to your code, and things that would usually be undefined behaviour.

My raving is that I get to leave behind useful docs in my code packages, and my team members get access to and use those docs without the usual discoverability problems. And I get those docs for... somewhat slower than I could have written the code myself, but much, much faster than if I also had to write those docs.

181. davely ◴[] No.44528869{3}[source]
Mobile phone manufacturers were telling users this long before the iPhone was ever invented.

e.g., Nokia 1600 user guide from 2005 (page 16) [0]

[0] https://www.instructionsmanuals.com/sites/default/files/2019...

182. pbasista ◴[] No.44528882{4}[source]
> git I still have to look up commands for in some cases

I believe that this is okay. One does not need to know the details about every specific git command in order to be able to use it efficiently most of the time.

It is the same with a programming language. Most people are unfamiliar with every peculiarity of every standard library function that the language offers. And that is okay. It does not prevent them from using the language efficiently most of the time.

Also, in other aspects of life, it is unnecessary to know everything by memory. For example, one does not need to remember how to replace a blade on a lawn mower. But that is okay. It does not prevent them from using it efficiently most of the time.

The point is that if something is done less often, it is unnecessary to remember the specifics of it. It is fine to look it up when needed.

183. rester324 ◴[] No.44528986{6}[source]
I am not convinced.

If what you write were true, then the bug rate of those incredible devs would simply fall to zero at some point, and at that point they would become legends we all would have heard of by now. So the whole story sounds too fishy for my taste.

It's OK if you want to manage your team this way. Everyone needs some external feedback to confirm their own bias. It seems you found yours and it works for you.

It's just not a good argument in support of AI or AI assisted development.

It's too anecdotal.

And since you are the one telling me that you are right, and not others, it makes me even more skeptical about the whole story.

184. zingar ◴[] No.44529002{4}[source]
Nitpick: magit for Emacs is good enough that everyone I've seen talk about it describes it as "the best git client", and it is completely free.
185. nicman23 ◴[] No.44529009[source]
I just treat AI as a very long autocomplete. Sometimes it surprises me. On things I do not know, like Windows C calls, I think I ought to just search the documentation.
186. TeMPOraL ◴[] No.44529050{4}[source]
AI marketing isn't extreme - not on the LLM vendor side, at least; the hype is generated downstream of it, for various reasons. And it's not the marketing that's saying "you're using it wrong" - it's other users. So, unless you believe everyone reporting good experience with LLMs is a paid shill, there might actually be some merit to it.
replies(4): >>44529194 #>>44529508 #>>44529573 #>>44538020 #
187. TeMPOraL ◴[] No.44529110{4}[source]
> No one is claiming 10x perf gains in vim.

Sure they are - or at least were, until the last couple of years. Same thing with Emacs.

It's hard to claim this now, because the entire industry shifted towards webshit and cloud-based practices across the board, and the classical editors just can't keep up with VS Code. Despite the latter introducing LSP, which leveled the playing field wrt. code intelligence itself, the surrounding development process and ecosystem increasingly demand that you use web-based or web-derived tools and practices, which all see a browser engine as a basic building block. Classical editors can't match the UX/DX on that, plus the whole thing breaks basic assumptions about UI that were the source of the "10x perf gains" in vim and Emacs.

Ironically, a lot of the perf gains from AI come from letting you avoid dealing with the brokenness of the current tools and processes, that vim and Emacs are not equipped to handle.

replies(3): >>44529753 #>>44534294 #>>44535648 #
188. TeMPOraL ◴[] No.44529153{3}[source]
The important difference is that in your example, it was the manufacturer telling customers they're holding it wrong. With LLMs, the vendors say no such things - it's the actual users that are saying this to their peers.
189. carschno ◴[] No.44529194{5}[source]
It's called grassroots marketing. It works particularly well in the context of GenAI because it is fed with esoteric and ideological fragments that overlap with common beliefs and political trends. https://en.wikipedia.org/wiki/TESCREAL

Therefore, classical marketing is less dominant, although more present among downstream sellers.

replies(1): >>44529462 #
190. thesz ◴[] No.44529347{3}[source]
That probabilistic output has to be symbolically constrained - SQL/JSON/other code is generated through syntax constrained beam search.

You brought up Rust, it is fascinating.

Rust's type system differs from typical Hindley-Milner by having operations that can remove definitions from the environment of the scope.

Rust was conceived in 2006.

In 2006 there were already HList papers by Oleg Kiselyov [1] that had shown how to keep type-level key-value lists with addition, removal and lookup, and type-level stateful operations like in [2] were already possible, albeit, most probably, not with nice monadic syntax support.

  [1] https://okmij.org/ftp/Haskell/HList-ext.pdf
  [2] http://blog.sigfpe.com/2009/02/beyond-monads.html
It would have been entirely possible for a prototype Rust to be embedded into Haskell and for the borrow checker to be implemented as type-level manipulation over a doubly-parameterized state monad.

But it was not; Rust was not embedded into Haskell, and now it will never get effects (even ones as weak as monad transformers) and, as a consequence, will never get proper high-performance software transactional memory.

So here we are: everything in Haskell's strong type system world that would make Rust better was there at the very beginning of the Rust journey, but had no impact on Rust.

Rhyme that with LLM.

191. DanielVZ ◴[] No.44529356[source]
> It's hard to think of any other major tech product where it's acceptable to shift so much blame on the user.

Sorry to be pedantic but this is really common in tech products: vim, emacs, any second-brain app, effectiveness of IDEs depending on learning its features, git, and more.

replies(1): >>44529405 #
192. oytis ◴[] No.44529374{3}[source]
What I like about IDE wars is that it remained a dispute between engineers. Some engineers like fancy pants IDEs and use them, some are good with vim and stick with that. No one ever assumed that Jetbrains autocomplete is going to replace me or that I am outdated for not using it - even if there might be a productivity cost associated with that choice.
193. thesz ◴[] No.44529397{3}[source]
On the contrary, I believe that a strong type system is a plus. Please look at my other comment: https://news.ycombinator.com/item?id=44529347

My original point was about history and about how we can extract possible outcomes from it.

My other comment tries to amplify that too. Type systems have been strong enough for several decades, and had everything Rust needed and more, years before Rust began, yet they have had little penetration into the real world, one example being that fancy-dandy Rust language.

194. ndsipa_pomu ◴[] No.44529405{3}[source]
Well, surely vim is easy to use - I started it and haven't stopped using it yet (one day I'll learn how to exit).
195. KaiserPro ◴[] No.44529459{3}[source]
> How many days/weeks you need to use debuggers effectively

I understand your point, but would counter with: gdb isn't marketed as a cuddly tool that can let anyone do anything.

196. TeMPOraL ◴[] No.44529462{6}[source]
Right. Let's take a bunch of semi-related groups I don't like, and make up an acronym for them so any of my criticism can be applied to some subset of those groups in some form, thus making it seem legitimate and not just a bunch of half-assed strawman arguments.

Also, I guess you're saying I'm a paid shill, or have otherwise been brainwashed by marketing of the vendors, and therefore my positive experiences with LLMs are a lie? :).

I mean, you probably didn't mean that, but part of my point is that you see those positive reports here on HN too, from real people who've been in this community for a while and are not anonymous Internet users - you can't just dismiss that as "grassroot marketing".

replies(1): >>44530213 #
197. sfn42 ◴[] No.44529495{3}[source]
It's generally trivial for conventional class-based type systems like those in Java and C#, but TypeScript is a different beast entirely. On the surface it seems similar but it's so much deeper than the others.

I don't like it. I know it is the way it is because it's supposed to support all the cursed weird stuff you can do in JS, but to me as a fullstack developer who's never really taken the time to deep dive and learn TS properly it often feels more like an obstacle. For my own code it's fine, but when I have to work with third party libraries it can be really confusing. It's definitely a skill issue though.

replies(1): >>44529778 #
198. intended ◴[] No.44529508{5}[source]
It is extreme, and on the vendor side. The OpenAI non-profit vs. for-profit saga was about profit-seeking vs. the future of humanity. People are talking about programming 3.0.

I can appreciate that it’s other users who are saying it’s wrong, but that doesn’t escape the point about ignoring the context.

Moreover, it’s unhelpful communication. It gives up on acknowledging a mutually shared context, the natural confusion that would arise from the ambiguous, high-level hype, and the actual down-to-earth reality.

Even if you have found a way to make it work, having someone understand your workflow can’t happen without connecting the dots between their frame of reference and yours.

replies(1): >>44530477 #
199. OccamsMirror ◴[] No.44529573{5}[source]
I think the relentless podcast blitz by OpenAI and Anthropic founders suggests otherwise. They're both keen to confirm that yes, in 5 - 10 years, no one will have any jobs any more. They're literally out there discussing a post employment world like it's an inevitability.

That's pretty extreme.

replies(2): >>44530181 #>>44538091 #
200. Avshalom ◴[] No.44529639{5}[source]
It is completely reasonable to hold cursor/claude to a different standard than gdb or git.
replies(1): >>44530340 #
201. lmeyerov ◴[] No.44529649[source]
It's a specialist tool. You wouldn't be surprised that it took a while for people to get good at typed programming, parallel programming, Docker, IaC, etc. either.

We have 2 sibling teams, one the genAI devs and the other the regular GPU product devs. It is entirely unsurprising to me that the genAI developers are successfully using coding agents with long-running plans, while the GPU developers are still more at the level of chat-style back-and-forth.

At the same time, everyone sees the potential, and just like other automation movements, are investing in themselves and the code base.

202. lupusreal ◴[] No.44529698[source]
A friend of mine, complete non-programmer, has been trying to use ChatGPT to write a phone app. I've been as hands off as I feel I can be, watching how the process goes for him. My observations so far is that it's not going well, he doesn't understand what questions he should be asking so the answers he's getting aren't useful. I encourage him to ask it to teach him the relevant programming but he asks it to help him make the app without programming at all.

With more coaching from me, which I might end up doing, I think he would get further. But I expected the chatbot to get him further through the process than this. My conclusion so far is that this technology won't meaningfully shift the balance of programmers to non-programmers in the general population.

203. fingerlocks ◴[] No.44529753{5}[source]
Yeah I’m in my 40s and have been using vim for decades. Sure there was an occasional rando stirring up the forums about made-up productivity gains to get some traffic to their blog, but that was it. There has always been push back from many of the strongest vim advocates that the appeal is not about typing speed or whatever it was they were claiming. It’s just ergonomics and power.

It’s just not comparable to the LLM crazy hype train.

And to belabor your other point, I have treesitter, lsp, and GitHub Copilot agent all working flawlessly in neovim. Ts and lsp are neovim builtins now. And it’s custom built for exactly how I want it to be, and none of that blinking shit or nagging dialog boxes all over VSCode.

I have VScode and vim open to the same files all day quite literally side by side, because I work at Microsoft, share my screen often, and there are still people that have violent allergic reactions to a terminal and vim. Vim can do everything VSCode does and it’s not dogshit slow.

replies(1): >>44532161 #
204. ruszki ◴[] No.44529778{4}[source]
I agree. TypeScript is different for another reason too: it ignores edge cases many times, and because of that you can do really, really nice things with it (when it's not broken). I've wondered a lot of times why Java doesn't include a few things which would be appropriate even in that world, and the answer is almost always that Java cares about edge cases. There are notes about those in TypeScript's docs or issues.
205. theshrike79 ◴[] No.44530056[source]
LLMs are good for things you know how to do, but can't be arsed to. Like small tools with extensive use of random APIs etc.

For example, I whipped together a Steam API-based tool that gets my game library and enriches it with available data, in maybe 30 minutes of active work.

The LLM (Cursor with Gemini Pro + Claude 3.7 at the time IIRC) spent maybe 2-3 hours on it while I watched some shows on my main display and it worked on my second screen with me directing it.

Could I have done it myself from scratch like a proper artisan? Most definitely. Would I have bothered? Nope.
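
The core of a tool like that is tiny. A sketch using the public GetOwnedGames endpoint (the key and Steam ID are placeholders; the enrichment part is left out):

  import requests

  STEAM_API_KEY = "..."  # placeholder
  STEAM_ID = "..."       # placeholder

  resp = requests.get(
      "https://api.steampowered.com/IPlayerService/GetOwnedGames/v0001/",
      params={
          "key": STEAM_API_KEY,
          "steamid": STEAM_ID,
          "include_appinfo": 1,  # include game names in the response
          "format": "json",
      },
      timeout=30,
  )
  games = resp.json()["response"]["games"]
  for g in sorted(games, key=lambda g: -g["playtime_forever"]):
      print(g["name"], g["playtime_forever"] // 60, "hours")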

206. disgruntledphd2 ◴[] No.44530181{6}[source]
Those billions won't raise themselves, you know.

More generally, these execs are talking their book, as they're in low-margin, capital-intensive businesses whose future is entirely dependent on raising a bunch more money, so hype and insane claims are necessary for funding.

Now, maybe they do sort of believe it, but if so, why do they keep hiring software engineers and other staff?

207. carschno ◴[] No.44530213{7}[source]
> I mean, you probably didn't mean that

Correct, I think you've read too much into it. Grassroots marketing is not a pejorative term, either. Its strategy is to trigger positive reviews about your product, ideally by independent, credible community members, indeed.

That implies that those community members have motivations other than being paid. Ideologies and shared beliefs can be some of them. Being happy about the product is a prerequisite, whatever that means for the individual user.

208. Guillaume86 ◴[] No.44530302[source]
Using devs working in their own repository is certainly understandable, but it might also explain the results in part. Personally, I barely use AI for my own code, while on the other hand, when working on some one-off script or an unfamiliar code base, I get a lot more value from it.
209. staunton ◴[] No.44530340{6}[source]
What standard would that be?
210. literalAardvark ◴[] No.44530391{3}[source]
Became worse is possible

Became worse in 50 hours? Super unlikely

211. pera ◴[] No.44530477{6}[source]
It really is, for example here is a quote from AI 2027:

> By early 2030, the robot economy has filled up the old SEZs, the new SEZs, and large parts of the ocean. The only place left to go is the human-controlled areas. [...]

> The new decade dawns with Consensus-1’s robot servitors spreading throughout the solar system. By 2035, trillions of tons of planetary material have been launched into space and turned into rings of satellites orbiting the sun. The surface of the Earth has been reshaped into Agent-4’s version of utopia: datacenters, laboratories, particle colliders, and many other wondrous constructions doing enormously successful and impressive research.

This scenario prediction, which is co-authored by a former OpenAI researcher (now at Future of Humanity Institute), received almost 1 thousand upvotes here on HN and the attention of the NYT and other large media outlets.

If you read that and still don't believe the AI hype is _extreme_ then I really don't know what else to tell you.

--

https://news.ycombinator.com/item?id=43571851

212. bilbo-b-baggins ◴[] No.44530500[source]
I can say that in my experience AI is very good at early codebases and refactoring tasks that come with that.

But for very large, stable codebases it is a mixed bag of results. Their selection of candidates is valid, but it probably illustrates a worst-case scenario for time-based measurement.

If an AI code editor cannot make changes quicker than a dev, or cannot provide relevant suggestions quickly enough and without being distracting, then you lose time.

213. rsynnott ◴[] No.44530511{8}[source]
The complication is that, as noted in the above paper, _people are bad at self-reporting on whether the magic robot works for them_. Just because someone _believes_ they are more effective using LLMs is not particularly strong evidence that they actually are.
214. bilbo-b-baggins ◴[] No.44530524[source]
Your next study should be very experienced devs working in new or early life repos where AI shines for refactoring and structured code suggestion, not to mention documentation and tests.

It’s much more useful getting something off the ground than maintaining a huge codebase.

215. jspdown ◴[] No.44530595[source]
With today's state of LLMs and agents, it's still not good for all tasks. It took me a couple of weeks before I was able to correctly calibrate what I can ask and what I can expect. As a result, I don't use Claude Code for everything, and I think I'm able to better pick the right task, and the right size of task, to give it. These adjustments depend on what you are doing and on the complexity and maturity of the project at play.

Very often, I have entire tasks that I can't offload to the agent. I won't say I'm 20x more productive; it's probably more in the range of 15% to 20% (but I can't measure that, obviously).

216. saturneria ◴[] No.44530847{6}[source]
It is an even more human reaction when the new strange thing directly threatens to upend and massively change the industry that puts food on your table.

The steam-powered loom was not good for the Luddites either. It was good for society at large in the long term, but all the negative points that a 40-year-old knitter in 1810 could make against the steam-powered loom would have been perfectly reasonable and accurate judged from that individual's perspective.

217. xandrius ◴[] No.44530908[source]
On the other hand, if you don't use vim, emacs, or the other spawns from hell, you get labeled a noob, and nothing can ever be said about their terrible UX.

I think we can be more open-minded: an absolutely brand-new technology (it literally did not exist 3 years ago) might require some amount of learning and adjusting, even for people who see themselves as an Einstein, if only they wished to apply themselves.

replies(1): >>44536768 #
218. themk ◴[] No.44530955{3}[source]
Hmmm, I don't see it. Are debuggers hard to use? Sometimes. But the debugger is allowing you to do something you couldn't actually do before, i.e. set breakpoints and step through your code. So, while tricky to use, you are still in a better position than if you didn't have it. Just because you can get better at using something doesn't automatically mean that using it as a beginner makes you worse off.

The same can be said for version control and programming.

replies(1): >>44536244 #
219. otabdeveloper4 ◴[] No.44531993{6}[source]
> but now I see its reasoning

It's not showing its reasoning. "Reasoning" models are trained to output more tokens in the hope that more tokens means less hallucinations.

It's just a marketing trick, and there is no evidence this sort of fake "reasoning" actually gives any benefit.

220. ycombinatornews ◴[] No.44532151[source]
Thank you for the last paragraph.

The same thought came to me when I was reading the article, and I'm glad I am not alone.

Anecdotally, most common productivity boost is coming from cutting down weird slow steps in processes. Write an automation script, campaign previewer for marketing, etc etc.

Coding seems to become more efficient (again, anecdotally) but not entirely faster. You can do better work on a new feature in the same or slightly less time.

Idle time at 4% was interesting. I think this number goes higher the more you use a specific tool and adjust your workflow to it.

221. Imustaskforhelp ◴[] No.44532161{6}[source]
I am really curious what your thoughts on Zed are, given that it has a lot of features and is still mostly vim-compatible (from what I know), so you have the same ergonomics and power, and it has some sane defaults; I don't need to tinker as much with Zed as I would have to with nvim.

It's not that I don't like tinkering. I really enjoy tinkering with config files, but I never could understand nvim personally, since I usually want an LSP / good-enough experience that nvim or any LunarVim etc. couldn't provide without me installing additional software.

replies(1): >>44536223 #
222. ◴[] No.44532696[source]
223. pdabbadabba ◴[] No.44532838{10}[source]
That's the study I'm really interested in: does AI use improve the output of lower-skill developers (not experts). My intuitions point me in the opposite direction. I think AI would improve their work. But I'm not aware of any hard data that would help answer this question.
224. novaleaf ◴[] No.44533993[source]
I've spent the last 2 months trying to figure out how to utilize AI properly, and only in the last week do I feel that I've hit upon a workflow that's actually a force multiplier (vs divisor).
225. Ntrails ◴[] No.44534117{8}[source]
> you have to realize you’re talking to a population of people, and not necessarily the same person. Opinions are going to vary, they’re not literally the same person each time.

I pointed this out in my post for a reason. I get it. But even given that a different person is saying the same thing each time a new release comes out, the effect on my prior is the same.

226. hajile ◴[] No.44534294{5}[source]
I use most of the best vim features in VS Code with their vim bindings.

You'd be hard-pressed to find a popular editor without vim bindings.

227. nsingh2 ◴[] No.44535140{4}[source]
> It is completely typical, but at the same time abnormal to have tools with such poor usability.

The main difference I see is that LLMs are flaky. They're getting better over time, but they're still far flakier than traditional tooling like debuggers.

> Programming languages are notoriously full of unnecessary complexity. Personal pet peeve: Rust lifetime management. If this is what it takes, just use GC (and I am - golang).

Lifetime management is an inherently hard problem, especially if you need to be able to reason about it at compile time. There are arguments to be made that tooling or syntax could make reasoning about lifetimes easier, but it will never be trivial. And in certain contexts (e.g., microcontrollers) garbage collectors are out of the question.
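
To make this concrete, here's a minimal C++ sketch (my own illustration, nothing from the study) of the kind of dangling reference that makes lifetimes hard to reason about. It compiles, and most compilers won't even warn; the Rust equivalent is rejected at compile time ("temporary value dropped while borrowed"), while a GC'd language sidesteps the issue by keeping the value alive:

    #include <iostream>

    // Returns a reference into whichever argument compared smaller.
    const int& min_ref(const int& a, const int& b) {
        return a < b ? a : b;
    }

    int main() {
        int x = 1, y = 2;
        // Both arguments are temporaries that die at the end of this
        // full expression, so r dangles as soon as the line finishes.
        const int& r = min_ref(x + 1, y + 1);
        std::cout << r << "\n";  // undefined behavior: reads a dead temporary
        return 0;
    }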

228. itsoktocry ◴[] No.44535270{4}[source]
>the previous model retroactively becomes total dogshit the moment a new one is released

Keep writing your code manually, nobody cares.

229. iLemming ◴[] No.44535648{5}[source]
> vim and Emacs are not equipped to handle.

You clearly don't have the slightest idea of what you're talking about.

Emacs is actually still amazing in the LLM era. Language models are all about plain text. Plain text remains crucial and will remain important because it's human-readable, machine-parsable, version-control-friendly, lightweight and fast, platform-independent, and resistant to obsolescence. Even when analyzing huge amounts of complex data - images, videos, audio recordings, etc. - we often have to reduce it to a text representation.

And there's simply no better tool than Emacs today for dealing with plain text. Nothing even comes close to what you can do with text in Emacs.

Like, check this out - right now I am transcribing my audio notes into .srt (subtitle) files. There's subed-mode, where you can read through subtitles and even play the audio, karaoke style, while following the text. I can do so many different things from here - extract summaries, search through things, gather analytics - e.g., how often I've said 'fuck' on Wednesdays, etc.

I can similarly play YouTube videos in mpv, while controlling the playback, volume, speed, etc. from Emacs; I can extract subtitles for a given video and search through them, play the vid from the exact place in the subs.

I very often grab a selected region of the screen during Zoom sessions, OCR it to extract the text, and put it in my notes - yes, I do that in Emacs.

I can probably examine images, analyze their elements, create comprehensive summaries, formulate expert artistic evaluation and critique, and even have Emacs read it all back to me aloud - the possibilities are virtually limitless.

It allows you to engage with a vast array of LLMs from anywhere. I can ask a question in the midst of typing a Slack reply, reading HN comments, or composing a git commit; I can fact-check my own assumptions. I can also use tools to analyze and refactor existing codebases and vibe-code new stuff.

Anything like that even five years ago seemed like a dream; today it is possible. We can now reduce any complex digital data to plain text. And that feels miraculous.

If anything, the LLM era has made Emacs an extremely compelling choice. To be honest, for me it's not even a choice; it's the only seriously viable option I have, despite all its drawbacks. Everything else doesn't even come close - other options either lack critical features or have merely promising ones. Emacs is absolutely, hands-down, one of the best tools we humans have ever produced for dealing with plain text. Anyone who thinks that's an opinion and not a fact simply hasn't grokked Emacs or has no clue what you can do with it.

replies(1): >>44536303 #
230. runarberg ◴[] No.44535677{6}[source]
Not really a relic. Reinforcement learning is one of the best models of learned behavior we have. In the 1950s, however, cognitive science didn't exist, and behaviorists thought they could explain much more with their model than they actually could, so they oversold the idea, by a lot.

Cognitive science was able to explain things like biases, pattern recognition, language, etc., which behavioral science thought it could explain but couldn't. In the 1950s it was really the only game in town (except for psychometrics, which failed in a much more complete—albeit less spectacular—way than behaviorism), so understandably scientists (and philosophers) went a little overboard with it (kind of like evolutionary biology did in the 1920s).

I think a fairer view is that behaviorism's 1950s heyday has passed, but it still provides an excellent theoretical framework for some human behavior, and, along with cognitive science, it can explain most of what we know about human behavior.

231. fingerlocks ◴[] No.44536223{7}[source]
I haven't tried Zed, and I'm getting old and set in my ways. If it ain't broke, don't fix it, and all that.

So if the claim is that I can get everything I have in vim (most importantly, unbeatably fast text buffers) without a suitcase full of config files, that's very compelling.

Is that the promise of Zed?

232. anthonypasq ◴[] No.44536244{4}[source]
I guarantee you there were millions of people who had to be forced to use Excel because they thought they could do the calculations faster by hand.

We retroactively assume that everyone just obviously adopts new technology, yet I'm sure there were tons and tons of people who retired rather than learn how computers worked when the PC revolution was happening.

233. fingerlocks ◴[] No.44536303{6}[source]
At first I thought you were replying to me and this was a revival of the old vim + emacs wars.

I’m so glad we’re past that now and can join forces against a common enemy.

Thank you brother.

replies(1): >>44537138 #
234. iLemming ◴[] No.44536768{3}[source]
> you get labeled a noob

No one would call one a noob for not using Vim or Emacs. But they might for a different reason.

If someone blindly rejects even the notion of these tools without attempting to understand the underlying ideas behind them, that certainly suggests the dilettante nature of the person making the argument.

The idea of vim-motions is a beautiful, elegant, pragmatic model, and thinking it is somehow outdated is a misapprehension. It is timeless, just like musical notation: it provides a compositional grammar (e.g., d2w composes the verb 'delete', the count 2, and the motion 'word') and a universal language, and it builds muscle memory; and, like notation, it can be intimidating but rewarding.

Emacs is grounded in another amazing idea - one of the greatest ideas in computer science, the idea of Lisp. And Lisp is just as everlasting, like math notation or molecular formulas — it has rigid structural rules and uniform syntax; there's compositional clarity, meta-reasoning, and universal readability.

These tools remain in use today despite the abundance of "brand new technology" because time and again these concepts have proven to be highly practical. Nothing prevents vim from being integrated into new tools, and the flexibility of Lisp allows for seamless integration of new tools within the old-school engine.

replies(1): >>44538215 #
235. jpc0 ◴[] No.44536836{8}[source]
Take this with a massive grain of salt, but here's my recent experience with Google's Gemini CLI (we pay for Google products but not others internally; I can't change that decision).

I asked it to implement two biquad filters: a high-pass filter and a high-shelf filter. For context: using the Gemini web app, it would spit out the exact code I need, with the interfaces I require, in one shot, because this is truly trivial C++ code to write.
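
For reference, this is roughly the scale of what I was asking for: a textbook biquad high-pass (RBJ audio EQ cookbook coefficients; the names here are illustrative, not my actual interfaces):

    #include <cmath>

    // Biquad high-pass, RBJ cookbook coefficients, direct form I.
    struct BiquadHighPass {
        double b0, b1, b2, a1, a2;              // normalized coefficients
        double x1 = 0, x2 = 0, y1 = 0, y2 = 0;  // two samples of state

        BiquadHighPass(double sampleRate, double cutoffHz, double q) {
            const double pi = 3.14159265358979323846;
            const double w0 = 2.0 * pi * cutoffHz / sampleRate;
            const double alpha = std::sin(w0) / (2.0 * q);
            const double cw = std::cos(w0);
            const double a0 = 1.0 + alpha;
            b0 = (1.0 + cw) / (2.0 * a0);
            b1 = -(1.0 + cw) / a0;
            b2 = (1.0 + cw) / (2.0 * a0);
            a1 = -2.0 * cw / a0;
            a2 = (1.0 - alpha) / a0;
        }

        double process(double x) {  // one sample in, one sample out
            const double y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
            x2 = x1; x1 = x;
            y2 = y1; y1 = y;
            return y;
        }
    };

The high shelf is the same shape with different coefficient formulas. This is the kind of thing the web app one-shots.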

Fifteen million tokens and an hour and a half later, I had a project that would not build, filters that were not implemented, and my trust in AI agentic workflows broken.

It cost me nothing, I just reset the repo and I was watching youtube videos for that hour and a half.

Your mileage may vary, and I'm very sure that if this had been Go or TypeScript it might have done significantly better, but even compared to the exact same model in a chat interface, my experience was horrible.

I'm sticking with the slightly "worse" experience of using the chat interface, which does give me significant productivity improvements, versus letting the agent burn money and time without producing working code.

236. iLemming ◴[] No.44537138{7}[source]
There weren't any true "wars" to begin with; the entire thing is absurd. These ideas are not even in competition - it's like arguing whether a piano or sheet music is "better".

Emacs veterans simply rejected the entire concept of modality without even trying to understand what it is about. Yet Emacs is inherently a modal editor: key-chords are stateful; Transient menus (i.e. Magit) are modals; completion is a modal; so are isearch, dired, calc, C-u (universal argument), and recursive editing. What the idea of vim-motions offers is a universal, simplified, structured language for dealing with modality; that's all.

Vim users on the other hand keep saying "there's no such thing as vim-mode". And to a certain degree they are right — no vim plugin outside of vim/neovim implements all the features — IdeaVim, VSCode vim plugins, Sublime, etc. - all of them are full of holes and glaring deficiencies. With one notable exception — Evil-mode in Emacs. It is so wonderfully implemented, you wouldn't even notice that it is a plugin, an afterthought. It really does feel like a baked-in, native feature of the editor.

There are no "wars" in our industry — pretty much only misunderstanding, misinterpretation and misuse of certain ideas. It's not even technological — who knows, maybe it's not even sociotechnological. People simply like talking past each other, defending different values without acknowledging they're optimizing for different things.

It's not Vim's, Emacs' or VSCode's fault that we suffer from identity investment - we spend hundreds of hours using one so it becomes our identity. We suffer from simplification impulse — we just love binary choices, we constantly have the nagging "which is better?" question, even when it makes little sense. We're predisposed to tribal belonging — having a common enemy creates in-group cohesion.

But real, experienced craftspeople... they just use whatever works best for them in a given context. That's what we all should strive for - discover old and new ideas, study them, identify the good ones, borrow them, shelve the bad ones (who knows, in a different context they may still prove useful). Most importantly, use whatever makes you and your teammates happy. That's far more important than being more productive or decisively right. If the stupid thing works, perhaps it ain't that stupid?

237. lackoftactics ◴[] No.44537674[source]
Stay tuned, a new study is coming with another revelation: you don't get faster with Vim while you're still learning it.

My previous employer didn't even allow me to use Vim until I had learned it properly, so it wouldn't affect my productivity. Why would using Cursor automatically make you better at something that's brand new to you, even if you're already an elite programmer according to this study?

238. patrakov ◴[] No.44538020{5}[source]
> And it's not the marketing that's saying "you're using it wrong" - it's other users.

No, it's the non-coding managers who vibe-coded a half-working prototype, not other users. And here, the Dunning-Kruger effect is at play - those non-coding types do not understand that AI is not working for them either.

Full disclosure: I do rely on vibe-coded jq lines in one-off scripts that will definitely not process more data after the single intended use, and this is where AI saves me time.

239. patrakov ◴[] No.44538091{6}[source]
This was present (in a positive way, though) even in Soviet films for children.

    Позабыты хлопоты,
    Остановлен бег,
    Вкалывают роботы,
    Счастлив человек!

    Worries forgotten,
    The treadmill doesn't run,
    Robots are working,
    Humans have fun!
240. xandrius ◴[] No.44538215{4}[source]
One could try to be poetic with LLMs in order to make their point stronger and still convince absolutely no one who wasn't already convinced.

I'm sure nobody really rejects the notion of LLMs, but people sure as hell do like to moan when the new technology doesn't perfectly fit their own way of working. Does that make them any different from people wanting an editor that's intuitive to use? Nobody will ever know.

replies(1): >>44538428 #
241. iLemming ◴[] No.44538428{5}[source]
> still convince absolutely no one who wasn't already convinced.

I don't know, people change their opinions all the time. I wasn't convinced about many ideas throughout my career, but I'm glad I found convincing arguments for some of them later.

> wanting an editor which is intuitive to use

Are you implying that Vim and Emacs are not?

Intuitive != Familiar. What feels unintuitive is often just unfamiliar. Vim's model actually feels pretty intuitive after the initial introduction, and Emacs is pretty intuitive for someone who has grokked the Lisp basics - structural editing and REPL-driven development. The point is also subjective: for some people "intuitive editor" means "works like MS Word", but that's just one design philosophy, not an objective standard.

Tools that survive 30+ years and maintain passionate user bases must be doing something right, no?

> the new technology doesn't absolutely perfect fit their own way of working.

Emacs is extremely flexible, and thanks to that, I've rarely complained about new things not fitting my ways. I bend tools to fit my workflow if they don't align naturally — that's just the normal approach for a programmer.