221 points caspg | 25 comments
thefourthchime ◴[] No.42165457[source]
For years I've kept a list of apps / ideas / products I may do someday. I never made the time; with Cursor AI I have already built one and am working on another. It's enabling me to use frameworks I barely know, like React Native, Swift, etc.

The first prompt (with o1) will get you 60% there, but after that the workflow changes. The prompts can get stuck in a local minimum, where Claude/GPT-4/etc. just can't do any better. At that point you need to climb back out and try a different approach.

I recommend git branches to keep track of this. Keep a good working copy in main, and anytime you want to add a feature, make a branch. If you get it almost there, make another branch in case it goes sideways. The biggest issue with developing like this is that you are not a coder anymore; you are a puppet master of a very smart and sometimes totally confused brain.
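The branch-per-experiment loop described above can be sketched in plain git (branch and file names here are made up):

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.email dev@example.com && git config user.name dev
git commit -q --allow-empty -m "good working copy stays on main"

git checkout -q -b feature/search        # one branch per LLM-driven feature
echo draft > search.txt && git add . && git commit -q -m "search: almost there"

git checkout -q -b feature/search-v2     # checkpoint before a risky prompt
# ...if the next round of generation goes sideways, just go back:
git checkout -q feature/search
```

Throwaway branches are cheap, so there's no reason not to checkpoint before every ambitious prompt.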

replies(5): >>42165545 #>>42165831 #>>42166210 #>>42169944 #>>42170110 #
lxgr ◴[] No.42165545[source]
> For years I've kept a list of apps / ideas / products I may do someday. I never made the time; with Cursor AI I have already built one and am working on another.

This is one fact that people seem to severely under-appreciate about LLMs.

They're significantly worse at coding in many respects than even a moderately skilled and motivated intern, but for my hobby projects, until now I haven't had any intern who would so much as take a stab at some of the repetitive or just not very interesting subtasks, let alone stick with them over and over again without getting tired of it.

replies(2): >>42165600 #>>42165998 #
Sakos ◴[] No.42165600[source]
It also reduces the knowledge needed. I don't particularly care about learning how to set up and configure a web extension from scratch. With an LLM, I can get 90% of that working in minutes, then focus on the parts that I am interested in. As somebody with ADHD, it was primarily all that supplementary, tangential knowledge that felt like an insurmountable mountain to me and made it impossible to actually try all the ideas I'd had over the years. I'm so much more productive now that I don't have to always get into the weeds for every little thing, which could easily delay progress by hours or even days. I can pick and choose the parts I feel are important to me.
replies(1): >>42166112 #
1. imiric ◴[] No.42166112[source]
> It also reduces the knowledge needed. I don't particularly care about learning how to set up and configure a web extension from scratch. With an LLM, I can get 90% of that working in minutes, then focus on the parts that I am interested in.

Eh, I would argue that the apparent lower knowledge requirement is an illusion. These tools produce non-working code more often than not (OpenAI's flagship models are not even correct 50% of the time[1]), so you still have to read, understand and debug their output. If you've ever participated in a code review, you'll know that doing that takes much more effort than actually writing the code yourself.

Not only that, but relying on these tools handicaps you into not actually learning any of the technologies you're working with. If you ever need to troubleshoot or debug something, you'll be forced to use an AI tool for help again, and good luck if that's a critical production issue. If instead you take the time to read the documentation and understand how to use the technology, perhaps even with the _assistance_ of an AI tool, then it might take you more time and effort upfront, but this will pay itself off in the long run by making you more proficient and useful if and when you need to work on it again.

I seriously don't understand the value proposition of the tools in the current AI hype cycle. They are fun and useful to an extent, but are severely limited and downright unhelpful at building and maintaining an actual product.

[1]: https://openai.com/index/introducing-simpleqa/

replies(4): >>42166445 #>>42166468 #>>42166683 #>>42166825 #
2. Robotenomics ◴[] No.42166445[source]
Things have improved considerably over the last 3 months. Claude with cursor.ai is certainly correct more than 50% of the time.
replies(2): >>42166641 #>>42166987 #
3. Sakos ◴[] No.42166468[source]
All the projects I've been able to start and make progress on in the past year, versus the ten years before that, are substantive enough proof for me that you're wrong in pretty much all of your arguments. My direct experience proves statements like "the lower knowledge requirement is an illusion" and "it takes much more effort to review code than to write it" wrong. I do code reviews all the time. I write code all the time. I've had AI help me with my projects, and I've reviewed and refactored that code. You're quite simply wrong. And I don't understand why you're so eager to argue that my direct experience is wrong, as if you're trying to gaslight me.

It's quite honestly mystifying to me.

It's simply not the case that we need to be experts in every single part of a software project. Not for personal projects and not for professional ones either. So it doesn't make any sense to me not to use AI if I've directly proven to myself that it can improve my productivity, my understanding and my knowledge.

> If you ever need to troubleshoot or debug something, you'll be forced to use an AI tool for help again

This is proof to me that you haven't used AI much. Because AI has helped me understand things much quicker and with much less friction than I've ever been able to before. And I have often been able to solve things AI has had issues with, even if it's a topic I have zero experience with, through the interaction with the AI.

At some point, being able to make progress (and how that affects the learning process) trumps this perfect ideal of the programmer who figures out everything on their own through tedious, mind-numbing long hours solving problems that are at best tangential to the problems they were actually trying to solve hours ago.

Frankly, I'm tired of not being able to do any of my personal projects because of all the issues I've mentioned before. And I'm tired of people like you saying I'm doing it wrong, DESPITE ME NOT BEING ABLE TO DO IT AT ALL BEFORE.

Honestly, fuck this.

replies(4): >>42166827 #>>42166967 #>>42166978 #>>42167346 #
4. kbaker ◴[] No.42166641[source]
Where the libraries are new/not known to the LLM yet, I just go find the most similar examples in the docs and chuck them in the context window too (easy to do with aider.) Then say 'fix it'. Does an incredible job.
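For reference, that loop might look something like this with aider (file names are invented; `--read` adds a file to the chat as read-only context):

```shell
# Paste the closest doc examples in as read-only context,
# alongside the file being edited (paths here are hypothetical):
aider --read docs/similar-example.md src/integration.py
# then in the chat: "fix it to match the example"
```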
5. lxgr ◴[] No.42166683[source]
> These tools produce non-working code more often than not (OpenAI's flagship models are not even correct 50% of the time[1]), so you still have to read, understand and debug their output.

Definitely, but what LLMs provide me that a purely textual interface can't is discoverability.

A significant advantage of GUIs is that I get to see a list of things I can do, and the task becomes figuring out which ones are going to solve my problem. For programming languages, that's usually not the case (there's documentation, but that isn't usually as nested and context sensitive as a GUI is), and LLMs are very good at bridging that gap.

So even if an LLM provides me a broken SQL query for a given task, more often than not it's exposed me to new keywords or concepts that did in fact end up solving my problem.

A hand-crafted GUI is definitely still superior to any chat-based interface (and this is in fact a direction I predict AI models will be moving to going forward), but if nobody builds one, I'll take an LLM plus a CLI and/or documentation over only the latter any day.

replies(1): >>42172231 #
6. Kiro ◴[] No.42166825[source]
> OpenAI's flagship models are not even correct 50% of the time[1]

You're reading the link wrong. They specifically picked questions that one or more models failed at. It's not representative of how often the model is wrong in general.

From the paper:

> At least one of the four completions must be incorrect for the trainer to continue with that question; otherwise, the trainer was instructed to create a new question.

7. imiric ◴[] No.42166827[source]
Hey, I'm not trying to gaslight you into anything. I'm just arguing from my point of view, which you're free to disagree with.

You're right that I've probably used these tools much less than you have. I use them occasionally for minor things (understanding an unfamiliar API, giving me hints when web searching is unhelpful, etc.), but even in my limited experience with current state-of-the-art services (Claude 3.5, GPT-4o), I've found that they waste my time in ways I wouldn't if I weren't using them. And at the end of the day, I'm not sure I'm more productive overall than I would be without them. This limited usage leads me to believe that the problem would be far worse if I relied on them for most of a project, but the truth is I haven't actually tried that yet.

So if you feel differently, more power to you. There's no point in getting frustrated because someone has a different point of view than you.

replies(1): >>42167284 #
8. Kiro ◴[] No.42166967[source]
I understand your frustration. It's like someone trying to convince me that a red car I'm looking at is actually blue. I know what I'm seeing and experiencing. There's nothing theoretical about it and I have the results right in front of me.
9. senorrib ◴[] No.42166978[source]
It’s baffling to see all the ignorant answers to this thread, OP. My experience has been similar to yours, and I’ve been pushing complex software to production for the past 20 years.

Feels like a bunch of flat-earth arguments; they'd rather ignore the evidence (or refuse to even try it themselves) to keep the illusion that you need to write it all yourself for it to be "high quality".

replies(2): >>42167053 #>>42168551 #
10. imiric ◴[] No.42166987[source]
I haven't used cursor.ai, but Claude 3.5 Sonnet definitely has the issues I'm talking about. Maybe I'm not great at prompting, but this is far from an exact science. I always ask it specific things I need help with, making sure to provide sufficient detail, and don't ask it to produce mountains of code. I've had it generate code that not only hallucinates APIs but also has trivial bugs, like referencing undefined variables. How this can scale beyond a few lines of code to produce an actually working application is beyond me. But apparently I'm in the minority here, since people are actually using these tools successfully for just that, so more power to them.
replies(1): >>42170840 #
11. imiric ◴[] No.42167053{3}[source]
Or, hey, maybe we've just had different experiences, and are using these tools differently? I even concede that I may not be great at prompting, which could be the cause of my problems.

I'm not arguing that writing everything yourself leads to higher quality. I'm arguing that _in my experience_ a) it takes more time and effort to read, troubleshoot and fix code generated by these tools than it would take me to actually write it myself, and b) that taking the time to read the documentation and understand the technologies I'm working with would actually save me time and effort in the future.

You're free to disagree with all of this, but don't try to tell me my experience is somehow lesser than yours.

replies(2): >>42168635 #>>42168853 #
12. WhatIsDukkha ◴[] No.42167284{3}[source]
I'm not frustrated with you, but I'll explain why you might be getting these vibes here.

It's like people are learning about these new things called skis.

They fall on their faces a few times, but then they find "wow, much better than good old snowshoes!"

Of course, some people are falling every 2 feet while trying skis, and then go to the top of the mountain and claim skis are fake and we should all go back to snowshoes, because we don't know about snow or mountains.

They are insulting about it because it's important to the ragers that, despite failing at skiing, they are senior programmers, and everyone else must not know how to compile, test and review code and must be hallucinating their ski journeys!

Meanwhile, a bunch of us took the falls and learned to ski, and are laughing at the ragers.

The frustrating thing, though, is that for all the skiers, we can't seem to have good conversations about how to ski, because there is so much raging... oh well.

replies(1): >>42167661 #
13. handzhiev ◴[] No.42167346[source]
This desire of deniers to prove to people who actually get tons of benefit from LLMs that they aren't getting it is becoming more ridiculous every time.

"You can't use LLMs for this or that because of this and that!!!".

But I AM using them. Every. Single. Day.

replies(1): >>42167719 #
14. rossvor ◴[] No.42167661{4}[source]
With your analogy I would be the one saying that I'm still not convinced that skis are faster than snowshoes.

I still use ChatGPT/Claude/Llama daily, for both code generation and other things. And while it sometimes does exactly what I want, and I feel more productive, it wastes my time almost as often, and I have to give up on it and rewrite the code manually or do a Google search / read the actual documentation. It's good to bounce things off, it's a good starting point for learning new stuff, and it gives you great direction for exploring and testing things quickly. My guess is that on a "happy path" it gives me a 1.3x speed-up, which is great when it happens, but the caveat is that you are not on a "happy path" most of the time, and if you listen to the evangelists it should be a 2x-5x speed-up (skis). So where's the disconnect?

I'm not here to disprove your experience, but with 2 years of almost daily usage of skis, how come I feel like I'm still barely breaking even compared with snowshoes? Am I that bad with my prompting skills?

replies(2): >>42167897 #>>42176369 #
15. handzhiev ◴[] No.42167719{3}[source]
And of course every time such comments get downvoted. Folks, you can downvote as much as you want - I don't give a fuck even if my reputation goes negative. This won't make you right.
16. WhatIsDukkha ◴[] No.42167897{5}[source]
I use Rust and aider.chat, and I thoughtfully limit the context of what I'm coding (to 2 of my 15 files).

I /ask a few times to get the context set up. I let it speculate on the path ahead, but rein it in with more conservative goals.

I then say "let's carefully and conservatively implement this" (this is really important with Sonnet, as it's way too eager).

I get it to compile by doing /test a few times. There is sometimes a doom loop, though, so I reset the context with a better footing if things are going off track, or I just think "it's time".

I do not commit until I have a plausible, building set of functions (it can probably handle touching 2-3 functions of config, or one complete function, but don't get much more elaborate without care and experience).

I either reset or use the remaining context to create some tests and validate.

I think saying 1.3x more productive is fair with only this loop, BUT you have to keep a few things in perspective.

I wrote specs for everything I did; in other words, I wrote out my goals and expectations for the code in English. That was highly valuable and something I probably wouldn't have done otherwise.

Automatic literate programming!

Sheep shearing is crazy fast with an LLM. Those tasks that would take you off into the weeds do feel 5x faster (with caveats).

I think the 2x-5x faster is true within certain bounds.

What are the things you were psychologically avoiding, dragging on, or just skipping because they were too tedious to even think about?

Some people don't have that problem, or maybe don't notice; to me it's a real, crazy benefit that I love!

That's where the real speedup happens, and it's amazing.
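A rough sketch of one iteration of that loop, using aider's in-chat commands (file names are invented; `/ask` answers without editing files, `/test` runs the configured test command, `/clear` resets the chat context):

```text
$ aider src/parser.rs src/config.rs   # limit context to 2 of 15 files
> /ask how would you add include-file support to the config loader?
> let's carefully and conservatively implement this
> /test                               # compile + run tests, feed failures back
> /clear                              # reset context if it enters a doom loop
```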

replies(1): >>42169918 #
17. thefourthchime ◴[] No.42168551{3}[source]
Thanks, my guess is that many complaining about the technology haven't honestly tried to embrace it.
replies(1): >>42168731 #
18. senorrib ◴[] No.42168635{4}[source]
I wasn’t targeting this specifically at you or your individual experience. However, I have heard the same arguments you make ad nauseam, and they usually come from people who are either just too skeptical or don’t put in the effort required to use the tool.
19. rtsil ◴[] No.42168731{4}[source]
Or denial/rejection is a natural defense reaction for people who feel threatened.
20. fragmede ◴[] No.42168853{4}[source]
So link chats where you've run into the very real limitations these things have. What language you're using, what framework you're in, what library it hallucinated. I'm not interested in either of us shouting past each other, I genuinely want to understand how your experience, which is not at all lesser than mine, is so different. Am I ignoring flaws that you otherwise can't overlook? Are you expecting too much from it with too little input? Without details, all we can do is describe feelings at each other and get frustrated when the other person's experience is different. Might as well ask your star sign while we're at it.
replies(1): >>42182452 #
21. max6zx ◴[] No.42169918{6}[source]
Do you mind sharing how much experience you have with the tech stacks you've generated code for? What I've found with LLMs is that the perception of AI-generated code differs depending on your own experience, and I'd like to know whether that's only my experience.

I have more than 20 years of backend development and only limited experience with frontend tech stacks. I initially tried using an LLM for the frontend of a personal project. I found the code generated by the LLM to be very good. It produced code that worked immediately from my vague prompts. It happily fixed any issue I found, quickly and correctly. I also had enough knowledge to tweak anything I needed, so at the end of the day I could see my project working as expected. I felt really productive with it.

Then I slowly started using LLMs for my backend projects at work. And I was surprised that the experience was completely the opposite. Both ChatGPT and Claude generated code that either followed bad practices or had flaws, or they just ignored instructions in my prompt and came back with bad solutions after only a few questions. They also failed to apply common practices from an architectural perspective. So the effort to make it work was more than if I had done all the coding myself.

At that point, I thought there were probably more frontend projects than backend projects in the training data, and therefore the quality of the generated frontend code was much better. But when I used an LLM with a language I had little experience with, for another backend project, I figured out why my experiences differed so much, because I could now observe more clearly what is bad and good in the generated code.

In my previous backend project, since I have much more knowledge of the languages/frameworks/practices, my standards were also higher. It's not enough that the code runs; it must be extensible, well structured, architecturally sound, idiomatic, and so on. My frontend experience is more limited, so the generated code worked as I expected, but possibly it also violated all of those NFRs without my knowing. That explains the mixed experience of using it with a new programming language (something I don't know well) in a backend project (my own domain): it seemed to provide working code but failed to follow good practices.

My hypothesis is that LLMs generate code at an intermediate level, so if your experience is limited, you see it as pure gold, but if your level is much higher, the generated code is just garbage. I'd really like to hear from other people to validate this hypothesis, since people seem to have opposite experiences with this.

22. disgruntledphd2 ◴[] No.42170840{3}[source]
I think it really depends on the language. It generates pretty crap but working Python code, while for SQL it generates really weird, crummy code that often doesn't solve the problem.

I find it really helpful where I don't know a library very well but can assess if the output works.

More generally, I think you need to give it pretty constrained problems if you're working on anything relatively complicated.

23. Terretta ◴[] No.42172231[source]
> OpenAI's flagship models are not even correct 50% of the time[1]

Where does [1] go? In any case, try Anthropic's flagship:

91% > 50.6%

https://aider.chat/docs/leaderboards/#code-refactoring-leade...

24. Kiro ◴[] No.42176369{5}[source]
> Am I that bad with my prompting skills?

Or you're using skis on gravel. I'm a firm believer that the utility varies greatly depending on the tech stack and what you're trying to do, ranging from negative value to way more than 5x.

I also think "prompting" is a misrepresentation of where the actual skill and experience matter. It's about being efficient with the tooling. Prompting, waiting for a response, and then manually copy-pasting line by line into multiple places is something else entirely from having two LLMs work in tandem, with one figuring out the solution and the other applying the diff.

Good tooling also means that there's no overhead trying out multiple solutions. It should be so frictionless that you sometimes redo a working solution just because you want to see a different approach.

Finally, you must be really active and can't just passively wait for the LLM to finish before you start analyzing the output. Terminate early, reprompt, and retry. The first 5 seconds after submitting are crucial, and being able to make a decision from seeing just a few lines of code is a completely new skill for me.

25. imiric ◴[] No.42182452{5}[source]
I use OpenRouter, which saves chats in local storage, and my browser is configured to delete all history and data on exit. So, unfortunately, I can't link you to an exact session.

I give more details of one instance of this behavior using Claude 3.5 Sonnet a few weeks ago here[1]. I was asking it to implement a specific feature using a popular Go CLI library. I could probably reproduce it, but honestly can't be bothered, nor do I wish to use more of my API credits for this.

Besides, why should I have to prove anything in this discussion? We're arguing based on good faith, and just as I assume your experience is based on positive interactions, so should you assume mine is based on negative ones.

But I'll give you one last argument based on principles alone.

LLMs are trained on mountains of data from various online sources (web sites, blogs, documentation, GitHub, SO, etc.). This training takes many months and has a cutoff point sometime in the past. When you ask them to generate some code using a specific library, how can you be sure that the code is using the specific version of the library you're currently using? How can you be sure that the library is even in the training set and that the LLM won't just hallucinate it entirely?

Some LLMs allow you to add sufficient context to your prompts (with RAG, etc.) to increase the likelihood of generating working code, which can help, but still isn't foolproof, and not all services/tools allow this.
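A low-tech version of that is to pin the versions in the prompt yourself, so the model at least has a chance of targeting them rather than whatever release dominated its training data. A minimal sketch (file contents and wording are invented):

```shell
set -e
cd "$(mktemp -d)"
# Hypothetical pinned dependencies for the project in question:
printf 'requests==2.31.0\nclick==8.1.7\n' > requirements.txt
# Prepend the exact pinned versions to the task description:
{
  echo 'Target these exact library versions:'
  cat requirements.txt
  echo
  echo 'Write a CLI subcommand that fetches a URL and prints the status code.'
} > prompt.txt
```

This doesn't stop hallucination, but it makes version mismatches the model's explicit problem instead of a silent assumption.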

But more crucially, when you ask it to do something that the library doesn't support, the LLM will never tell you "this isn't possible" or "I don't know". It will instead proceed to hallucinate a solution because that's what it was trained to do.

And how are these state-of-the-art coding LLMs that pass all these coding challenges capable of producing errors like referencing an undefined variable? Surely these trivial bugs shouldn't be possible, no?

All of these issues were what caused me to waste more than an hour fighting with both Claude 3.5 Sonnet and GPT-4o. And keep in mind that this was a fairly small problem. This is why I can't imagine how building an entire app, using a framework and dozens of libraries, could possibly be more productive than doing it without them. But clearly this doesn't seem to be an opinion shared by most people here, so let's agree to disagree.

[1]: https://news.ycombinator.com/item?id=41987474