
221 points caspg | 1 comment
thefourthchime ◴[] No.42165457[source]
For years I've kept a list of apps / ideas / products I might build someday. I never made the time; with Cursor AI I've already built one and am working on another. It's enabling me to use frameworks I barely know, like React Native and Swift.

The first prompt (with o1) will get you 60% of the way there, but after that it's a different workflow. The prompting can get stuck in a local minimum, where Claude/GPT-4/etc. just can't do any better, at which point you need to climb back out and try a different approach.

I recommend git branches to keep track of this. Keep a good working copy in main, and anytime you want to add a feature, make a branch. If you get it almost there, make another branch in case it goes sideways. The biggest issue with developing like this is that you are not a coder anymore; you are a puppet master of a very smart and sometimes totally confused brain.
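That branch discipline can be sketched with plain git in a throwaway repo. The feature names below are made up for illustration:

```shell
# Throwaway repo; the branch names are hypothetical.
cd "$(mktemp -d)" && git init -q app && cd app
git -c user.name=dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "known-good baseline stays on the default branch"

# Any new feature gets its own branch, never direct edits on the default branch.
git checkout -q -b feature/share-sheet
git -c user.name=dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "AI attempt: almost working"

# Before the next round of prompts, snapshot the almost-there state
# so a bad iteration can't destroy it.
git branch feature/share-sheet-attempt-2

# If the next iteration goes sideways, fall back to the snapshot:
git checkout -q feature/share-sheet-attempt-2
```

The empty commits stand in for real AI-generated changes; the point is only the checkpoint-before-each-prompt habit.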

lxgr ◴[] No.42165545[source]
> For years I've kept a list of apps / ideas / products I might build someday. I never made the time; with Cursor AI I've already built one and am working on another.

This is one fact that people seem to severely under-appreciate about LLMs.

In many respects they're significantly worse at coding than even a moderately skilled and motivated intern. But for my hobby projects, until now I've never had an intern who would so much as take a stab at the repetitive or just-not-very-interesting subtasks, let alone stick with them over and over without getting tired of it.

imiric ◴[] No.42165998[source]
I'm curious: what do you do when the LLM starts hallucinating, or gets stuck in a loop of generating non-working code that it can't get out of? What do you do when you need to troubleshoot and fix an issue it introduced but can't fix itself?

In my experience with these tools, including the flagship models discussed here, this is a deal-breaking problem. If I have to waste time re-prompting to make progress, then reviewing and fixing the generated code, it would often be faster to write the code from scratch myself. The tricky thing is that unless you read and understand the generated code, you really have no idea whether you're progressing or regressing. You can ask the model to generate tests as well, but how can you be sure they're written correctly, or that they cover the right scenarios?

More power to you if you feel like you're being productive, but the difficult things in software development always come in later stages of the project[1]. The devil is always in the details, and modern AI tools are just incapable of getting us across that last 10%. I'm not trying to downplay their usefulness, or imply that they will never get better. I think current models do a reasonably good job of summarizing documentation and producing small snippets of example code I can reuse, but I wouldn't trust them for anything beyond that.

[1]: https://en.wikipedia.org/wiki/Ninety%E2%80%93ninety_rule

zamadatix ◴[] No.42179603[source]
You may not need that last 10% on a hobby project. If you do and it's insurmountable with AI+you then you're no worse off than when it was insurmountable with just you.

Outside that context, the better way to use these tools is as a superpowered Stack Overflow search. Don't know how ${library} expects you to ${thing} in ${language}? Rather than asking "I need to add a function in this codebase which..." and pasting the result into your code, ask "I need an example function which uses..." and use what it spits out as an example to integrate yourself. Then you can ask "can I do it like..." and get some background on why you can/can't/should/shouldn't do it that way. It's not 100% right or applicable, especially for every ${library}, ${thing}, and ${language}, but most of the time it gets you to a good answer faster than SO or searching. Worst case? You've spent a couple of minutes to learn that you'll need to spend a lot of time reading the docs to do your one-off thing yourself.

imiric ◴[] No.42192861[source]
That's the way I currently use them. But, just like with SO, the code could be outdated and not work with the specific version of the library you're using, or just plain wrong. There's no way to tell it to show you code using version X.Y. The code could even be a mix of different versions and APIs, or the LLM might be trained on outdated versions, etc.
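One partial mitigation (a sketch, not a fix for the underlying problem) is to put the exact versions you have installed into the prompt yourself, using Python's standard `importlib.metadata`. The prompt wording here is hypothetical, and nothing forces the model to honor it:

```python
from importlib.metadata import version, PackageNotFoundError

def versioned_prompt(question: str, packages: list[str]) -> str:
    """Prefix a coding question with the exact locally installed versions,
    so the model at least knows which API surface you are targeting."""
    pins = []
    for pkg in packages:
        try:
            pins.append(f"{pkg}=={version(pkg)}")
        except PackageNotFoundError:
            pins.append(f"{pkg} (not installed)")
    return (
        "I am using: " + ", ".join(pins) + ".\n"
        "Answer only with APIs that exist in these exact versions.\n\n"
        + question
    )
```

This doesn't stop the model from mixing API generations, but it at least removes the ambiguity on your side of the conversation.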

Even worse, the LLM will never tell you it doesn't know the answer, or that what you're trying to do is not possible, but will happily produce correct-looking code. It's not until you actually try it that you will notice an error, at which point you either go into a reprompt-retry loop, or just go read the source documentation. At least that one won't gaslight you with wrong examples (most of the time).

There are workarounds to this, and there are coding assistants that actually automate this step for you, and try to automatically run the code and debug it if something goes wrong, but that's an engineering solution to an AI problem, and something that doesn't work when using the model directly.
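A minimal sketch of that generate-run-repair loop, assuming `ask_model` stands in for whatever model call you use (it is not a real library interface):

```python
import subprocess
import sys
import tempfile
from typing import Callable, Optional

def run_candidate(code: str) -> tuple[bool, str]:
    """Run a generated snippet in a subprocess; return (ok, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run([sys.executable, path],
                          capture_output=True, text=True, timeout=10)
    return proc.returncode == 0, proc.stderr

def generate_run_repair(ask_model: Callable[[str], str],
                        prompt: str, max_rounds: int = 3) -> Optional[str]:
    """Ask for code, run it, and feed any traceback back to the model
    until the snippet exits cleanly or the round budget is spent."""
    code = ask_model(prompt)
    for _ in range(max_rounds):
        ok, err = run_candidate(code)
        if ok:
            return code
        code = ask_model(
            f"{prompt}\n\nYour last attempt failed with:\n{err}\nFix it."
        )
    return None  # still broken: time to read the docs yourself
```

"Exits cleanly" is of course a much weaker check than "is correct", which is exactly the gap the parent comment is pointing at.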

> Worst case? You've spent a couple of minutes to learn that you'll need to spend a lot of time reading the docs to do your one-off thing yourself.

It's not a couple of minutes, though. How do you know you've reached the limit of what the LLM can do, versus not using the right prompt or not giving it enough context? The answer always looks to be _almost_ there, so I'm always hopeful I can get it to produce the correct output. In aggregate I've spent hours of my day coaxing the LLM toward the right answer. I want to rely on it precisely because I want to avoid reading the documentation, which sometimes doesn't even exist or isn't good enough; otherwise it's back to trawling the web and SO. If I had known the LLM would waste my time, I could've done that from the beginning.

But I do appreciate that the output sometimes guides me in the right direction, or gives me ideas that I didn't have before. It's just that the thought of relying on this workflow to build fully-fledged apps seems completely counterproductive to me, but some folks seem to be doing this, so more power to them.