Remember the revolutionary, seemingly inevitable tech that was poised to rewrite how humans thought about transportation? The incredible amounts of hype, the secretive meetings unveiling the device, etc.? That turned out to be the self-balancing scooter known as the Segway?
2. Segways were just ahead of their time: portable, lithium-ion-powered urban personal transportation is getting pretty big now.
The Segway always had a high barrier to entry. ChatGPT currently doesn't even require an account, and everyone already has a Google account.
It is even cheaper to serve an LLM answer than to call a web search API!
Zero chance all the users evaporate unless something much better comes along, the tech is banned, etc.
> It is even cheaper to serve an LLM answer than to call a web search API
These, uhhhh, these are some rather extraordinary claims. Got some extraordinary evidence to go along with them?
Anecdotally, the locally-run AI software I develop has gotten more than 100x faster in the past year thanks to hardware advancements, i.e. Moore's law.
But I want to point out that going from CPU to TPU is basically the opposite of a Moore's law improvement.
(A mid-to-high-end GPU can get similar or better performance, but it's a lot harder to get more RAM.)
5060 Ti 16GB, $450
If you want more than 16GB, that's when it gets bad.
And you should be able to get two and load half your model into each. It should be about the same speed as if a single card had 32GB.
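For what it's worth, that two-card split is doable with off-the-shelf tooling. Here's a minimal sketch using Hugging Face transformers + accelerate, assuming two 16GB cards; the model name and memory caps are illustrative, not a recommendation:

```python
# Minimal sketch: split one model across two 16GB GPUs with transformers + accelerate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-13b-hf"  # illustrative: any ~13B fp16 model too big for one 16GB card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",                    # accelerate places layers across cuda:0 and cuda:1
    max_memory={0: "15GiB", 1: "15GiB"},  # leave a little headroom on each 16GB card
)

prompt = "Explain pipeline parallelism in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```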
How cheap is inference, really? What about 'thinking' inference? What are the prices going to be once growth starts to slow and investors start demanding returns on their billions?
The unprofitability of the frontier labs is mostly due to them not monetizing the majority of their consumer traffic at all.
Relative to its siblings, things have gotten worse. A GTX 970 could hit 60% of the performance of the full Titan X at 35% of its price. A 5070 hits 40% of a full 5090 for 27% of the price. That's less series-relative performance for your money, at a price that's about $100 higher once you adjust for inflation.
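To spell out the ratio math implied there (using only the percentages from this comment, as back-of-the-envelope numbers):

```python
# Series-relative performance per series-relative dollar, from the percentages above.
gtx_970  = 0.60 / 0.35   # ~1.71: 60% of Titan X performance at 35% of its price
rtx_5070 = 0.40 / 0.27   # ~1.48: 40% of 5090 performance at 27% of its price
print(gtx_970, rtx_5070)  # the 970 was the better series-relative deal
```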
But if you have a fixed performance baseline you need to hit, then as long as the tech keeps improving, hitting that baseline will eventually get cheaper. That is, as long as you aren't also trying to improve in a way that moves the baseline up, which so far has been the AI industry's only consistent MO.
This seems super duper expensive and not really supported by the more reasonably priced Nvidia cards, though. SLI is deprecated, NVLink isn't available everywhere, etc.
And nothing I've seen about recent GPUs or TPUs from ANY maker (Nvidia, AMD, Google, Amazon, etc.) claims general speedups of 100x. Even going across multiple generations of these still-new hardware categories, Amazon's Inferentia/Trainium for example, their own claims (which are quite bold) would put the most recent generation at best at 10x the first. And as we all know, all vendors exaggerate the performance of their products.
Every layer of an LLM runs separately and sequentially, and there isn't much data transfer between layers. If you wanted to, you could put each layer on a separate GPU with no real penalty. A single request will only run on one GPU at a time, so it won't go faster than a single GPU with a big RAM upgrade, but it won't go slower either.
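A toy PyTorch sketch of that layer-split idea, assuming two CUDA devices are visible (the layer sizes are made up for illustration):

```python
# Each "layer" lives on its own GPU; only the small activation tensor moves between them.
import torch
import torch.nn as nn

layer0 = nn.Linear(4096, 4096).to("cuda:0")   # first half of the model
layer1 = nn.Linear(4096, 4096).to("cuda:1")   # second half of the model

x = torch.randn(1, 4096, device="cuda:0")     # one request's hidden state
h = layer0(x)            # runs on GPU 0
h = h.to("cuda:1")       # only the activation crosses the bus between layers
y = layer1(h)            # runs on GPU 1; GPU 0 sits idle meanwhile

print(y.shape)  # latency is roughly the same as one big GPU, as described above
```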