
548 points kmelve | 5 comments
spicyusername ◴[] No.45114584[source]
I guess we're just going to be in the age of this conversation topic until everyone gets tired of talking about it.

Every one of these discussions boils down to the following:

- LLMs are not good at writing code on their own unless it's extremely simple or boilerplate

- LLMs can be good at helping you debug existing code

- LLMs can be good at brainstorming solutions to new problems

- The code that is written by LLMs always needs to be heavily monitored for correctness, style, and design, and then typically edited down, often to at least half its original size

- LLMs' utility is high enough that they are now going to be a standard tool in the toolbox of every software engineer, but they are definitely not replacing anyone at current capability.

- New software engineers are going to suffer the most because they know how to edit the responses the least, but the same was true when they wrote their own code with Stack Overflow.

- At senior level, sometimes using LLMs is going to save you a ton of time and sometimes it's going to waste your time. Net-net, it's probably positive, but there are definitely some horrible days where you spend too long going back and forth, when you should have just tried to solve the problem yourself.

replies(12): >>45114610 #>>45114779 #>>45114830 #>>45115041 #>>45115537 #>>45115567 #>>45115676 #>>45115681 #>>45116405 #>>45116622 #>>45118918 #>>45120482 #
sunir ◴[] No.45115537[source]
All true if you one-shot the code.

If you have a sophisticated agent system that uses multiple forward and backward passes, the quality improves tremendously.

Based on my setup as of today, I’d imagine that by sometime next year that will be normal, and then the conversation will be very different; mostly around cost control. I wouldn’t be surprised if there is a breakout popular agent control-flow language by next year as well.

The net is that unsupervised AI engineering isn’t really cheaper, better, or faster than human engineering right now. Does that mean it will be in two years? Possibly.

There will be a lot of optimization in message traffic, token usage, and the foundation models, plus plain Moore’s-law improvements in hardware and energy costs.

But really it’s the sophistication of the agent systems that controls quality more than anything. Simply following waterfall (I know, right? Yuck… but it worked) increased code quality tremendously.

I also gave it the SelfDocumentingCode pattern language that I wrote (on WikiWikiWeb) as a code review agent and quality improved tremendously again.
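
To make the forward/backward idea concrete, here’s a heavily simplified Python sketch: a drafting pass, then a review agent that either approves or sends the code back for revision. The OpenAI client, model name, and prompts are placeholders, not my actual setup.

    from openai import OpenAI

    client = OpenAI()
    MODEL = "gpt-4o"  # placeholder model name

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def multi_pass(task: str, max_passes: int = 3) -> str:
        # forward pass: draft the code
        code = ask("Write code for this task:\n" + task)
        for _ in range(max_passes):
            # backward pass: a review agent critiques the draft
            review = ask(
                "Review this code against the task and our style guide. "
                "Reply APPROVED if it is acceptable.\n\n"
                "Task: " + task + "\n\nCode:\n" + code
            )
            if "APPROVED" in review:
                break
            # forward pass again: revise to address the review
            code = ask(
                "Revise the code to address this review.\n\n"
                "Review:\n" + review + "\n\nCode:\n" + code
            )
        return code

The real system has more agents and more passes, but the control flow is the same shape.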

replies(3): >>45115704 #>>45119020 #>>45119921 #
1. theshrike79 ◴[] No.45115704[source]
> Based on my setup as of today, I’d imagine that by sometime next year that will be normal, and then the conversation will be very different; mostly around cost control. I wouldn’t be surprised if there is a breakout popular agent control-flow language by next year as well.

Currently it's just VC-funded. The $20 packages they're selling are in no way cost-effective (for them).

That's why I'm driving all available models like I stole them, building every tool I can think of before they start charging actual money again.

By then local models will most likely be at a "good enough" level, especially when combined with MCPs and tool use, so I won't need to pay per token for APIs except in special cases.
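
As a rough sketch of what that looks like (assumes an Ollama server already running locally; the model name is a placeholder for whatever you've pulled), the same SDK just points at the local OpenAI-compatible endpoint instead of a paid API:

    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
        api_key="ollama",                      # required by the SDK, ignored by Ollama
    )

    resp = client.chat.completions.create(
        model="qwen2.5-coder:14b",  # placeholder: any locally pulled model
        messages=[{"role": "user", "content": "Summarize this diff: ..."}],
    )
    print(resp.choices[0].message.content)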

replies(1): >>45115873 #
2. tempoponet ◴[] No.45115873[source]
Once local models are good enough, there will be a $20 cloud provider that can give you more context, parameters, and t/s than you could dream of at home. This is true today with services like Groq.
replies(3): >>45117269 #>>45117579 #>>45124012 #
3. hatefulmoron ◴[] No.45117269[source]
Groq and Cerebras definitely have the t/s, but their hardware is tremendously expensive, even compared to the standard data center GPUs. Worth keeping in mind if we're talking about a $20 subscription.
4. sunir ◴[] No.45117579[source]
Not exactly. Those pricing models assume intermittent usage. If you're running an AI engineer with a sophisticated agent flow, the usage is constant and continuous. Over two years that can price out to the equivalent of a dedicated cube at home.

I had three projects running today and hit my Claude Max Pro session limits twice in about 90 minutes. I'm now keeping it down to one project, and I may pause it until the evening, when I don't need Claude Web. If I could run it passively on my laptop, I would.

5. theshrike79 ◴[] No.45124012[source]
Anthropic used to have unlimited subscriptions, then people started running agents 24/7.

Now they have 5-hour buckets of limited use.

Groq most likely stays afloat because they're a bit player - and propped up by VC money.

With a local system I can run it at full blast all the time; nobody can suddenly make it stupid by reallocating resources to training their new model, and nobody can censor it or push stealth updates that make it perform worse.