152 points isoprophlex | 24 comments
1. daft_pink ◴[] No.45645040[source]
I think this is a minor speed bump. VCs believe the cost of inference will decrease over time, and this is a gold rush to grab market share while inference costs decline.

I don't think they got it right: market share and usage grew faster than inference costs dropped. But inference costs will clearly drop, and these companies will eventually be very profitable.

The reality is that startups like this assume Moore's law will drop the cost over time, and they arrange their business around where they expect costs to be, not where costs currently are.

replies(6): >>45645108 #>>45645191 #>>45645220 #>>45645347 #>>45645403 #>>45645748 #
2. Frieren ◴[] No.45645108[source]
> I think this is a minor speed bump. VCs believe the cost of inference will decrease over time, and this is a gold rush to grab market share while inference costs decline.

It could also be that you give the market too much credit. People follow trends because in most cases that makes money. There is no deeper thought involved. Look at the financial crisis: totally irrational.

replies(1): >>45645866 #
3. xnx ◴[] No.45645191[source]
> inference costs will clearly drop and these companies will eventually be very profitable.

Inference costs for old models will drop, but inference costs may stay the same if models continue to improve.

No guarantee that any wrapper for inference will be able to hold on to customers when they stop selling $1.00 for $0.50.

replies(1): >>45645719 #
4. onlyrealcuzzo ◴[] No.45645220[source]
Isn't the consensus that the MoE architecture and other optimizations in the newest-gen models (GPT-5, the upcoming Gemini 3.0, etc.) will already reduce inference costs by 50-75%?
replies(2): >>45645591 #>>45646600 #
5. x0x0 ◴[] No.45645347[source]
> inference costs will clearly drop

They haven't, though, on two fronts. First, the SOTA models have been pretty constantly priced, and everyone wants the SOTA models. Likely the only way costs drop is if the models get so good that people say, hey, I'm fine with a less useful answer (which is still good enough), and that seems, right now, like a bad bet.

And second, we use a lot more tokens now. No more pasting Q&A into a site; now people upload chunks of their codebases and would love to push more. More context, more thinking, more everything.

replies(3): >>45645435 #>>45645642 #>>45645775 #
6. Analemma_ ◴[] No.45645403[source]
My own usage, and that of pretty much everyone I know, says that as inference costs drop, usage goes up in lockstep, and I'm still nowhere near the ceiling of how many tokens I could use if they were free.

I think if these companies are gambling their future on COGS going down, that’s a gamble they’re going to lose.

7. ctoth ◴[] No.45645435[source]
You're describing increased spending while calling it increased cost. These aren't the same thing. A task that cost me $5 to accomplish with GPT-4 last year might cost $1 with Sonnet today, even though I'm now spending $100/month total on AI instead of $20, because I'm doing 25x more tasks. The cost per task dropped 80%. My spending went up 5x. Both statements are true.

Here's an analogy you may understand:

https://crespo.business/posts/cost-of-inference/
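
Or, in toy Python, with the made-up numbers above (not real pricing, just the arithmetic):

    # Made-up numbers from the comment above, not real pricing.
    cost_per_task_old = 5.0    # $/task with GPT-4 last year
    cost_per_task_new = 1.0    # $/task with Sonnet today
    spend_old, spend_new = 20.0, 100.0  # $/month total, then and now

    tasks_old = spend_old / cost_per_task_old   # 4 tasks/month
    tasks_new = spend_new / cost_per_task_new   # 100 tasks/month

    print(f"cost per task: down {1 - cost_per_task_new / cost_per_task_old:.0%}")  # down 80%
    print(f"total spend:   up {spend_new / spend_old:.0f}x")                       # up 5x
    print(f"task volume:   up {tasks_new / tasks_old:.0f}x")                       # up 25x

Cost per task down, total spend up, no contradiction.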

replies(1): >>45645594 #
8. yomismoaqui ◴[] No.45645591[source]
Sounds interesting. Do you have some links with more info about this?

Thanks!

9. KallDrexx ◴[] No.45645594{3}[source]
FWIW, that's not necessarily true, because if Sonnet ends up using reasoning, then you are using more tokens than GPT-4 would have used for the same task. Same with GPT-5, since it will decide (using an LLM) whether it should use the thinking model (and you don't have as much control over it).
replies(1): >>45645698 #
10. infecto ◴[] No.45645642[source]
Anecdote of one. Costs for OpenAI on a per-token basis have absolutely dropped, and that accounts for new SOTA models over time. I think by now we can all agree that inference from providers is priced largely at or above breakeven. So more tokens is a good problem to have.
11. steveklabnik ◴[] No.45645698{4}[source]
This is addressed in the post.
replies(1): >>45646216 #
12. ◴[] No.45645719[source]
13. username223 ◴[] No.45645748[source]
Color me skeptical. We're running into the speed of light when it comes to transistor size, and the parallelism that made neural nets take off is running into power demands. Where do the exponential hardware gains come from? Optimizing the software by 2x or 4x happens only once. Then there's the other side: if Moore's Law works too well, local models will be good enough for most tasks, and these companies won't be able to do the SaaS thing.

It seems to me like models' capability scales logarithmically with size and wattage, making them the rare piece of software that can counteract Moore's Law. That doesn't seem like a way to make a trillion dollars.
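
As a toy illustration of what logarithmic scaling would mean (the curve and constants here are assumptions for the sake of argument, not measurements):

    import math

    # Assumed-for-argument curve: capability = a + b * log2(compute).
    # Every doubling of compute then buys the same fixed increment,
    # i.e. exponential hardware gains -> merely linear capability gains.
    a, b = 10.0, 1.0  # arbitrary illustrative constants

    for compute in [1, 2, 4, 8, 16]:
        capability = a + b * math.log2(compute)
        print(f"compute x{compute:>2}: capability {capability:.1f}")
    # x1: 10.0, x2: 11.0, x4: 12.0, x8: 13.0, x16: 14.0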

replies(1): >>45645841 #
14. dexwiz ◴[] No.45645775[source]
Point 2 was the analysis I saw: context size and per-token cost move inversely, at a rate that keeps total spend roughly constant, almost like supply and demand curves.
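
With made-up numbers (hypothetical trajectory, not real pricing), the mechanism looks like this:

    # Hypothetical trajectory: per-token price halves each year while
    # tokens per task double, so spend per task never moves.
    price_per_mtok = 20.0  # $ per million tokens (made up)
    mtok_per_task = 0.1    # million tokens per task (made up)

    for year in range(4):
        print(f"year {year}: ${price_per_mtok:5.2f}/Mtok * {mtok_per_task:.1f} Mtok "
              f"= ${price_per_mtok * mtok_per_task:.2f}/task")
        price_per_mtok /= 2
        mtok_per_task *= 2
    # $2.00/task every year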
15. throwaway290 ◴[] No.45645841[source]
One improvement comes from scraping and stealing better-quality IP to train on. And they can just ride Moore's law until they profit, then lobby governments to require licenses for fast GPUs on national-security grounds.
16. rglover ◴[] No.45645866[source]
This. Post-crypto, AI was the obvious next gambit for VCs. Their money flows to hype, not product. The second that hype starts to fade and the money dries up, VCs will be running for the border with their Harold Hill trunk full of cash. From the content they publish alone, you can tell they're channeling their inner Barnum & Bailey in between ayahuasca seizures.
17. x0x0 ◴[] No.45646216{5}[source]
Right, but if I understand you, the counterargument is dumb, since the context we're discussing is business viability (VCs investing in businesses whose unit economics require inference cost decreases), so actual dollars out rather than some imaginary cost per token is the metric that matters.

Inference is getting so much cheaper that Cursor and Zed have had to raise prices.

replies(2): >>45646301 #>>45647083 #
18. steveklabnik ◴[] No.45646301{6}[source]
> so actual dollars out rather than some imaginary cost per token is the metric that matters.

Even if we take this as true, the point is that this is different from "the cost of inference isn't going down." It is going down; it's just that people want more performance and are willing to pay for it. Spend going up is not the same as cost going up.

I don't disagree that there are a wide variety of things to talk about here, but that means it's extra important to get what you're talking about straight.

replies(1): >>45646445 #
19. x0x0 ◴[] No.45646445{7}[source]
Playing word games by labeling inference narrowly as the cost per token, rather than the per-X dollars going to your LLM API provider per customer/user/use/whatever, is kinda silly?

The cost of inference -- i.e., the dollars that go to your LLM API provider -- has increased, and certainly appears set to continue increasing.

see also https://ethanding.substack.com/p/ai-subscriptions-get-short-...

replies(1): >>45646884 #
20. ACCount37 ◴[] No.45646600[source]
Kind of. Frontier LLMs aren't going to get cheaper, but that's because the frontier keeps advancing.

Price-performance, though? The trend is clear: a given level of LLM capability keeps getting cheaper, and that trend is expected to hold. Improvements in architecture and training make LLMs more capability-dense, and advanced techniques make inference cheaper.

replies(1): >>45648110 #
21. steveklabnik ◴[] No.45646884{8}[source]
> The cost of inference -- ie $ that go to your llm api provider

This is the crux of it: when talking about "the cost of inference" for the purposes of the unit economics of the business, what's being discussed is not what they charge you. It's their COGS.

That's not word games. It's about being clear about what's being talked about.

Talking about increased prices is something that could be talked about! But it's a different thing. And what you're describing here is total spend, not individual pricing going up or down. That's a third thing!

You can't come to agreement unless you agree on what's being discussed.

22. dcre ◴[] No.45647083{6}[source]
Why do the unit economics require a decrease in inference spend per user? This is discussed at the end of the post. I think it rests on the very strange assumption that these businesses must charge $20 a month no matter how much inference their customers want to use. That is precisely what the move to usage-based pricing was about. End users want to use more inference because they like it so much, and they are knocking down these companies' doors demanding to be allowed to pay more money for more inference.
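
A toy comparison with made-up numbers shows why the pricing model changes the picture:

    # Hypothetical unit economics: $0.01 of inference COGS per task,
    # sold either as a flat $20/month plan or usage-billed at $0.015/task.
    cogs_per_task = 0.01
    usage_price_per_task = 0.015
    flat_price = 20.0

    for tasks in [500, 2000, 10000]:
        cogs = tasks * cogs_per_task
        print(f"{tasks:>5} tasks/mo: flat margin ${flat_price - cogs:8.2f}, "
              f"usage margin ${tasks * usage_price_per_task - cogs:6.2f}")
    #   500 tasks/mo: flat margin $   15.00, usage margin $  2.50
    #  2000 tasks/mo: flat margin $    0.00, usage margin $ 10.00
    # 10000 tasks/mo: flat margin $  -80.00, usage margin $ 50.00

Under the flat plan, heavy users destroy the margin; under usage-based pricing, more usage means more gross profit.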
23. onlyrealcuzzo ◴[] No.45648110{3}[source]
> Frontier LLMs aren't going to get cheaper

One of the main selling points of MoE is that the architecture is designed so you can retrain experts independently, add new experts, change the size of an expert's parameters, etc., without retraining the entire model.

If 80% of your usage comes from 20% of your experts, you can cut your future training costs SUBSTANTIALLY.
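
Here's a toy sketch of that routing-concentration idea with synthetic counts (real MoE routers are learned per-token gates, and whether selective retraining works in practice is another question):

    from collections import Counter

    # Synthetic routing counts for an 8-expert MoE layer; in a real
    # model these would come from logging the router's top-k picks.
    routing = Counter({
        "expert_0": 4200, "expert_1": 3100, "expert_2": 450,
        "expert_3": 300,  "expert_4": 250,  "expert_5": 200,
        "expert_6": 150,  "expert_7": 100,
    })

    total = sum(routing.values())
    hot = [e for e, n in routing.most_common() if n / total >= 0.10]
    coverage = sum(routing[e] for e in hot) / total
    print(f"{len(hot)}/{len(routing)} experts serve {coverage:.0%} of traffic")
    # -> 2/8 experts serve 83% of traffic; the argument above is that a
    #    refresh could retrain just those two instead of the whole model.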

replies(1): >>45648428 #
24. ACCount37 ◴[] No.45648428{4}[source]
I don't think anyone has ever managed to pull off a success story with this kind of advanced expert manipulation.

It's not entirely impossible, but I remain skeptical until I see proof that it, first, works, and second, that it actually has an advantage over "we'll just train another base model from scratch, but 10% larger, with those +5% performance architecture tweaks, and a new modality blender, and more of that good highly curated data in the dataset, and fresher data overall, and it'll be glorious."