
600 points antirez | 10 comments
dakiol No.44625484
> Gemini 2.5 PRO | Claude Opus 4

Whether it's vibe coding, agentic coding, or copy-pasting from the web interface into your editor, it's still sad to see the normalization of private (i.e., paid) LLM models. I like the progress that LLMs bring and I see them as a powerful tool, but I cannot understand how programmers (whether complete nobodies or popular figures) don't mind adding a strong dependency on a third party in order to keep programming. Programming used to be (and still is, to a large extent) an activity that can be done with open and free tools. I am afraid that in a few years that will no longer be possible (as in: most programmers will be so tied to a paid LLM that not using one would be like not using an IDE or vim today), since everyone is using private LLMs. The excuse "but you earn six figures, what's $200/month to you?" doesn't really capture the issue here.

simonw No.44626556
The models I can run locally aren't as good yet, and are way more expensive to operate.

Once it becomes economical to run a Claude 4 class model locally you'll see a lot more people doing that.

The closest you can get right now might be Kimi K2 on a pair of 512GB Mac Studios, at a cost of about $20,000.
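A rough sketch of why it takes that much hardware: the model weights alone have to fit in memory. (The ~1T parameter count and the quantization levels below are illustrative assumptions, not figures from the thread.)

```python
def weights_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GB (ignores KV cache)."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# ~1T parameters at 8 bits per weight:
print(weights_gb(1000, 8))   # 1000.0 GB -> needs that pair of 512 GB machines
# The same model quantized to 4 bits:
print(weights_gb(1000, 4))   # 500.0 GB
```

The cloud providers amortize that memory across thousands of concurrent users; a local box amortizes it across one, which is most of the economics gap.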

1. zer00eyz No.44627695
> Once it becomes economical to run a Claude 4 class model locally you'll see a lot more people doing that.

Historically these sorts of things happened because of Moore's law. Moore's law is dead. For a while we scaled on the back of "more cores" and process shrink. It looks like we've hit the wall again.

We seem to be near the physical limits of scaling: we're not seeing much in clock speed (some, but not enough), and IPC is flat. We also have power-density and cooling problems (air won't cut it any more).

The requirements to run something like Claude 4 locally aren't going to reach household consumers any time soon. Simply put, the very top end of consumer PCs looks like 10-year-old server hardware, and very few people run that because there isn't a need.

The only way we're going to see better models locally is if work (research, engineering) is put into it. To be blunt, that isn't really happening, because Fb/MS/Google are scaling in the only way they know how: throw money at it to capture and dominate the market, lock the innovators out of your API, and then milk the consumer however you can. Smaller and local is antithetical to this business model.

Hoping for the innovation that gives you a moat, that makes you the next IBM, isn't the best way to run a business.

Based on how often Google cancels projects, and how often the things Zuck swears are "next" face-plant (metaverse), one should not have a lot of hope for AI.

2. esafak No.44627840
Model efficiency is outpacing Moore's law. That's what DeepSeek V3 was about. It's just that we're simultaneously finding ways to increase model capacity, and that's growing even faster...
3. mleo No.44628024
Why wouldn't 3rd-party hardware vendors continue to work on reducing the cost of running models locally? If there is a market opportunity for someone to make money, it will be filled. Just because the cloud vendors don't develop the hardware doesn't mean nobody will. Apple has a vested interest in making hardware that runs better models locally, for example.
4. zer00eyz No.44628211
> Model efficiency is outpacing Moore's law.

Moore's law is dead, and has been for a long time. There is nothing to outpace.

> That's what DeepSeek V3 was about.

This would be a foundational shift! What problem in complexity theory was solved that the rest of computing missed out on?

Don't get me wrong, MoE (mixture of experts) is very interesting, but breaking one large model into independent chunks isn't a foundational breakthrough; it's basic architecture. It's 1960s time-sharing and Unix basics. It's application-decomposition basics.

All that having been said, there is a ton of room for these sorts of basic, blood-and-guts engineering ideas to make systems more "portable" and "usable". But that requires a shift in thinking toward small, targeted, and focused, which is antithetical to "everything in one basket, throw more compute at it, and magically we get to AGI". That clearly isn't the direction the industry is going... it won't give anyone a moat or market dominance.
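For readers who haven't seen the mechanism: MoE routes each token through only a few small "expert" sub-networks instead of the whole model, which is exactly the decomposition described above. A toy sketch (all sizes and weights made up; real routers are learned, not random):

```python
import random

random.seed(0)
D, N_EXPERTS, TOP_K = 8, 4, 2   # toy sizes, nothing from a real model

# Each "expert" is just a DxD matrix of random weights here.
experts = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(D)]
           for _ in range(N_EXPERTS)]
# The router scores each expert for a given token vector.
router = [[random.gauss(0, 1) for _ in range(N_EXPERTS)] for _ in range(D)]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def moe_forward(x):
    """Send the token to its top-k experts only; the other experts stay idle."""
    scores = [sum(router[i][e] * x[i] for i in range(D))
              for e in range(N_EXPERTS)]
    chosen = sorted(range(N_EXPERTS), key=scores.__getitem__)[-TOP_K:]
    outs = [matvec(experts[e], x) for e in chosen]
    return [sum(col) / TOP_K for col in zip(*outs)]

y = moe_forward([random.gauss(0, 1) for _ in range(D)])
print(len(y))   # 8: full-size output, but only 2 of 4 experts did any work
```

The compute saving is real (only top-k experts run per token), but every expert's weights still have to sit in memory, which is why MoE helps latency more than it helps the local-hardware problem.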

5. moron4hire No.44628648
I agree with you that Moore's Law being dead means we can't expect much more from current, silicon-based GPU compute. Any improvement from hardware alone is going to have to come from completely new compute technology, of which I don't think there is anything mature enough to expect any results in the next 10 years.

Right now, hardware-wise, we need more RAM in GPUs more than we need compute. But it's a breakpoint issue: you need enough RAM to hold the model. RAM short of the model size doesn't improve things much, and RAM beyond the model size is largely dead weight.
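That breakpoint can be made concrete: weights plus KV cache either fit in VRAM or they don't, and there's no partial credit. A hedged sketch (every number below is illustrative):

```python
def fits_on_gpu(model_gb: float, kv_cache_gb: float, vram_gb: float) -> bool:
    """Weights plus KV cache either fit in VRAM or they don't."""
    return model_gb + kv_cache_gb <= vram_gb

# Illustrative numbers only: a 70B model at 4-bit is roughly 35 GB of weights.
print(fits_on_gpu(35, 8, 24))   # False: a 24 GB card can't hold it at all
print(fits_on_gpu(35, 8, 48))   # True: fits, with headroom for the cache
print(fits_on_gpu(35, 8, 96))   # True, but ~50 GB of that card sits idle
```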

I don't think larger models are going to show any major inference improvements. They hit the long tail of diminishing returns re: model training vs quality of output at least 2 years ago.

I think the best anyone can hope for in optimizing current LLM technology is improving the performance of inference engines, and there I can imagine at most about a 5x improvement. That would be a really long tail of performance optimizations taking at least a decade to achieve. On a 1-2 year timeline, the best that could be hoped for is a 2x improvement. But I think much of the low-hanging optimization fruit has already been picked, and we are starting to turn the curve into that long tail of incremental improvements.

I think everyone betting on LLMs improving the performance of junior-to-mid-level devs, and that leading to a renaissance of software development speed, is wildly overoptimistic about the total contribution to productivity those developers represent. Most of the most important features are banged out by harried, highly skilled senior developers. Most everyone else is cleaning up around the edges of that. Even a 2-3x improvement in the bottom 10% of contributions only grows the pie so much. And I think these tools are basically useless to skilled senior devs. All this "boilerplate" code folks keep cheering the AI is writing for them is just not that big of a deal. 15 minutes of savings once a month.

But I see how this technology works and what people are asking it to do (which in my company is basically "all the hard work that you already weren't doing, so how are you going to even instruct an LLM to do it if you don't really know how to do it?") and there is such a huge gap between the two that I think it's going to take at least a 100x improvement to get there.

I can't see AI being all that much of an improvement on productivity. It still gives wrong results too many times. The work needed to make it give good results is the same sort of work we should have been doing already to be able to leverage classical ML systems with more predictable performance and output. We're going to spend trillions as an industry trying to chase AI that will only end up being an exercise in making sure documents are stored in a coherent, searchable way. At which point, why not do just that and avoid having to pressure the energy industry to firing up a bunch of old coal plants to meet demand?

6. zer00eyz No.44629088
> Why wouldn’t 3rd party hardware vendors continue to work on reducing costs of running models locally?

Everyone wants this to happen, and they're all trying, but...

EUV, which has gotten us down to 3 nm and below, is HARD. Shrinking chips has driven density up and costs down, but now yields are DOWN, and the design concessions needed to make the processes work are hurting both cost and performance. There are a lot of hopes and prayers pinned on the 1.8 nm-class nodes, but things look grim.

Power is a massive problem for everyone. It is a MASSIVE problem IN the data center, and it is a problem for GPUs at home. Considering that "locally" means a phone for most people, it's an even bigger problem. With all this power come cooling issues. The industry is starting to look at all sorts of interesting ways to move heat away from cores... ways that don't involve air.

Design has hit a wall as well. Look at NVIDIA's latest offering: its IPC (that's instructions per clock cycle) is flat. The only gains over the previous generation have come from small frequency upticks. Those gains came from "more power!!!", and that's a problem because...

Memory is a problem. There is a reason the memory chips for GPUs are soldered onto the board next to the processor. There is a reason laptops have theirs soldered on too. CAMM tries to fix some of this, but the results so far are, to say the least, disappointing.

All of this has been hitting CPUs slowly, but there we have also had the luxury of "more cores" to throw at things. A top-end server from 10-15 years ago is about on par with a top-end desktop today (core count, single-core perf). Because of all the above issues, I don't think you'll get 700+-core consumer desktops in a decade (the current high end for server CPUs)... because of power, cost, etc.

Unless we see some foundational breakthrough in hardware (it could happen), you won't see the normal generational lift in performance that you have in the past (and I would argue we already haven't been seeing it). Someone is going to have to make MAJOR investments on the software side, and there is NO MOAT in doing so. Simply put, it's a bad investment... and if we can't lower the cost of compute (and it looks like we can't), it's going to be hard for small players to get in and innovate.

It's likely you're seeing a very real wall.

7. viraptor No.44629754
> What problem in complexity theory was solved

None. We're still in the "if you spend enough effort you can make things less bad" era of LLMs. It will be a while before we even find out what the theoretical limits in that area are. Everyone is still running on roughly the same architecture, after all - big corps haven't even touched recursive LLMs yet!

8. Aurornis No.44630780
> We seem to be near the limit of scaling (physics) we're not seeing a lot in clock (some but not enough), and IPC is flat. We are also having power (density) and cooling (air wont cut it any more) issues.

This is an exaggeration. CPUs are still getting faster. IPC is increasing, not flat. Cooling on air is fine unless you're going for high density or low noise.

This is just cynicism. Even an M4 MacBook Pro is substantially faster than an M1 from a few years ago, which is substantially faster than the previous versions.

Server chips are scaling core counts and bandwidth. GPUs are getting faster and faster.

The only way you could conclude scaling is dead is if you ignored all recent progress or you’re expecting improvements at an unrealistically fast rate.

9. bluefirebrand No.44631767
> And I think these tools are basically useless to skilled senior devs. All this "boilerplate" code folks keep cheering the AI is writing for them is just not that big of a deal. 15 minutes of savings once a month

Yep... copy-and-paste with find-and-replace already had the boilerplate code covered.

10. zer00eyz No.44640912
> IPC is increasing, not flat.

Benchmarks going up is not IPC increasing. These are separate things.

Please look at the IPC of the latest GPUs from Nvidia and the latest CPUs from AMD: it's flat. See Intel losing credibility over failing processors, with power problems from aggressive clocking, precisely because IPC is flat.
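The distinction is mechanical: wall-clock throughput factors into IPC × frequency (× cores), so a benchmark number can rise purely on clocks while IPC stays flat. A sketch with made-up figures:

```python
def perf(ipc: float, ghz: float, cores: int = 1) -> float:
    """Throughput in billions of instructions per second."""
    return ipc * ghz * cores

old = perf(ipc=4.0, ghz=4.5)   # hypothetical previous generation
new = perf(ipc=4.0, ghz=5.2)   # same IPC, higher boost clock
print(f"{new / old - 1:.0%}")  # prints "16%": "faster", with zero IPC gain
```

That 16% shows up in benchmark charts, but it was bought with power and heat, not with architectural improvement.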

> Even an M4 MacBook Pro is substantially faster than an M1

Again, clocking. The M4 (non-Pro) and M1 are so close in IPC on common tasks that the difference is negligible. The performance gains between the two come from memory bandwidth, not core performance.

> Server chips are scaling core counts

Parallelism is not the same as performance. Intel shipping the Core Duo 20 years ago, running at 2 GHz, was an admission that single-thread scaling was ending. Twenty years on, we're 20 cores deep (consumer) and only at 4 GHz with "boost clocks" (back to that pesky power and cooling problem).

And that class of product still exists today: the N150 (close enough). It has lower power consumption and more cores. And what was the single-core performance gain? 35%. A 35% improvement in 20 years.
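For scale, a 35% single-core gain over 20 years annualizes to about 1.5% per year, versus the ~41%/year implied by the Moore's-law-era doubling every two years. The arithmetic:

```python
# Annualized rate implied by a 35% single-core gain over 20 years.
gain, years = 1.35, 20
annual = gain ** (1 / years) - 1
print(f"{annual:.2%} per year")   # prints "1.51% per year"

# Contrast: Moore's-law-era doubling every ~2 years.
moore = 2 ** (1 / 2) - 1
print(f"{moore:.2%} per year")    # prints "41.42% per year"
```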

None of these consumer parts are running the LLMs that power the tools we're talking about. Those live in the data center: 700-core CPUs and 400-800 Gbps top-of-rack switching are the bleeding edge. This is where power and cooling have hit the wall. The spacing requirements of a bleeding-edge NVIDIA install drive up the cost of interconnect between systems. Lots of fiber, plus systems spread out because of power/heat, adds up to a boatload of extra networking cost. Half-empty racks due to density limits are now a reality.

And you see these same issues at home: the power demands of consumer and workstation GPUs are through the roof. We're past what the PCIe spec can provide; all that power becomes heat and has to go somewhere. Sometimes it burns up poorly designed connectors. The latest generation consumes even more power to push clocks higher, for very little gain (see NVIDIA's flat IPC).