And I really wish I could trust an LLM for that, or indeed any task. But I generally find the answers fall into one of these useless buckets:

1. Rewording the question as an answer (so common, so useless).

2. Trivial solutions that are correct: one or two valid lines, but ones I could have written myself faster than getting an agent involved, and without the other drawbacks on this list.

3. Wildly incorrect "solutions". I'm talking about code that doesn't even build, because the LLM can't take proper direction on which version of a library to target, so it keeps answering from old information that's no longer relevant. Try resolving a webpack 5 issue: you'll get a lot of webpack 4 answers, and none of them will work, even if you specify webpack 5.

4. The absolute worst: subtly incorrect solutions that look correct and are confidently presented as correct.

This has been my experience with basically every "oh wow, look what the LLM can do" demo. I'm that annoying person who finds the bug mid-demo.
The problems are:

1. A person inexperienced in the domain will flounder for ages, trying out crap that doesn't work and understanding none of it.

2. A person experienced in the domain will spend a fair amount of time correcting the LLM. Personally, I'd much rather write my own code via TDD-driven emergent design (sketched below): I'll understand it, and it will be proven to work when it's done.
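To be concrete about what I mean by that, here's a minimal sketch of the test-first loop, in Python with pytest. The slugify function is a made-up example, not from any real project:

    # Simplest implementation that makes the tests below pass; the design
    # "emerges" as new failing tests force it to change.
    def slugify(title: str) -> str:
        return "-".join(title.lower().split())

    def test_lowercases():
        assert slugify("Hello") == "hello"

    def test_replaces_spaces_with_hyphens():
        assert slugify("Hello World") == "hello-world"

Each behaviour starts life as a failing test, and the implementation only ever grows enough to make the suite pass, so what's left at the end is both understood and proven.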
I see that proponents of the tech often gloss over this and don't realise that they're actually spending more time overall, especially once they have to polish out all the bugs, or maintain the system.
Use whatever you want, but I've got zero confidence in the models, and I prefer to write code instead of gambling. But to each their own.
There's an old saying: "Fire is a good servant but a bad master." I think the same applies to AI. With "vibe-coding", the AI is too much the master.
Say I want to create a YouTube RSS hydrator that uses DeArrow to de-clickbait all the titles before they hit my RSS reader.
Level 1 (max vibe): I just say that to an LLM, hit "go", and hope for the best (maximum vibes on both spec and code). Most likely it's gonna be shit. Might work, too.
Level 2 (pair-vibing the spec): I pair-vibe the spec with an LLM. Web versions might work here if they can access sites for the specs (figuring out how to turn a YouTube URL into an RSS feed, and how the DeArrow API works; a rough sketch of both is below).
After the spec is done, I can hand it to an agent and go do something else. In most cases there's an MVP done when I come back, depending on how easy the thing is to test automatically (RSS/Atom is a fickle spec, and readers implement it in varying ways; see the smoke test at the end of this comment).
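To ground that, here's roughly what the core of the hydrator looks like, standard library only. The two endpoints are exactly the things the spec phase has to verify: I'm assuming a channel's Atom feed lives at https://www.youtube.com/feeds/videos.xml?channel_id=<ID>, and that DeArrow's branding endpoint is GET https://sponsor.ajay.app/api/branding?videoID=<id>, returning JSON with a "titles" list of community-submitted replacements. A sketch, not production code:

    import json
    import urllib.request
    import xml.etree.ElementTree as ET

    ATOM = "{http://www.w3.org/2005/Atom}"
    YT = "{http://www.youtube.com/xml/schemas/2015}"

    # Keep YouTube's namespace prefixes intact on re-serialization.
    ET.register_namespace("", "http://www.w3.org/2005/Atom")
    ET.register_namespace("yt", "http://www.youtube.com/xml/schemas/2015")
    ET.register_namespace("media", "http://search.yahoo.com/mrss/")

    def fetch(url: str) -> bytes:
        with urllib.request.urlopen(url) as resp:
            return resp.read()

    def dearrow_title(video_id: str) -> str | None:
        # Assumption: the first entry in "titles" is the top-ranked
        # community title; an empty list means no one submitted one.
        raw = fetch(f"https://sponsor.ajay.app/api/branding?videoID={video_id}")
        titles = json.loads(raw).get("titles", [])
        return titles[0]["title"] if titles else None

    def hydrate(channel_id: str) -> str:
        # Pull the channel's Atom feed and swap each title for the
        # DeArrow one, leaving entries without submissions untouched.
        url = f"https://www.youtube.com/feeds/videos.xml?channel_id={channel_id}"
        feed = ET.fromstring(fetch(url))
        for entry in feed.iter(f"{ATOM}entry"):
            video_id = entry.findtext(f"{YT}videoId")
            better = dearrow_title(video_id) if video_id else None
            if better:
                entry.find(f"{ATOM}title").text = better
        return ET.tostring(feed, encoding="unicode")

A real version would cache the DeArrow lookups rather than hitting the API once per entry on every poll, but that's exactly the kind of detail a pair-vibed spec can defer.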
Level 3 continues the pair-vibed spec with pair-coding. I give the agent tasks in small parts and follow along as it progresses, interrupting if it strays.
For most senior folks with experience writing specs for non-seniors, Level 2 will produce good-enough stuff for personal use. And because you offload the time-consuming bits to an agent, you can run multiple projects in parallel.
Level 3 will definitely bring the best results, but you can only progress one task at a time.
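And for the "easy to test automatically" caveat above, here's the kind of smoke test I'd hand the agent along with the spec, run against the hydrate() sketch from earlier. It uses the third-party feedparser library, and the channel ID is just an illustrative example:

    import feedparser  # third-party: pip install feedparser

    def test_hydrated_feed_still_parses():
        # NB: hits the live YouTube/DeArrow APIs; fine for a personal
        # smoke test, not for CI.
        xml = hydrate("UC_x5XG1OV2P6uZZ5FSM9Ttw")
        parsed = feedparser.parse(xml)
        assert not parsed.bozo   # output is still well-formed XML
        assert parsed.entries    # entries survived the rewrite
        assert all(e.get("title") for e in parsed.entries)

This only proves the output is a feed that feedparser accepts; it says nothing about the dozen other readers that each parse Atom their own way, which is the fickleness problem mentioned above.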