
152 points isoprophlex | 3 comments
daft_pink ◴[] No.45645040[source]
I think this is a minor speed bump. VCs believe the cost of inference will decrease over time, and this is a gold rush to grab market share while it declines.

I don’t think they got it quite right: market share and usage grew faster than inference costs dropped. But inference costs will clearly fall, and these companies will eventually be very profitable.

The reality is that startups like this assume Moore’s law will drop the cost over time, and they arrange their business around where they expect costs to be, not where costs currently are.

replies(6): >>45645108 #>>45645191 #>>45645220 #>>45645347 #>>45645403 #>>45645748 #
onlyrealcuzzo ◴[] No.45645220[source]
Isn't the consensus that the MoE architecture and other optimizations in the newest-gen models (GPT-5, Gemini 3.0 to come, etc.) will already reduce inference costs by 50-75%?
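
For context, the mechanism behind that claim: in an MoE layer the router activates only a few experts per token, so per-token compute scales with the number of active experts rather than the total parameter count. A toy sketch (sizes, names, and layout are made up for illustration, not any real model's configuration):

    # Toy mixture-of-experts FFN layer: the router picks top_k of n_experts per
    # token, so per-token compute scales with top_k, not with n_experts.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToyMoE(nn.Module):
        def __init__(self, d_model=512, d_ff=2048, n_experts=16, top_k=2):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            )
            self.top_k = top_k

        def forward(self, x):                      # x: (tokens, d_model)
            scores = self.router(x)                # (tokens, n_experts)
            weights, idx = scores.topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)
            out = torch.zeros_like(x)
            for slot in range(self.top_k):         # only top_k experts ever run per token
                for e in idx[:, slot].unique().tolist():
                    mask = idx[:, slot] == e
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
            return out

With n_experts=16 and top_k=2, each token touches roughly 1/8 of the expert parameters, which is where the "same capability, cheaper inference" argument comes from.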
replies(2): >>45645591 #>>45646600 #
1. ACCount37 ◴[] No.45646600[source]
Kind of. Frontier LLMs aren't going to get cheaper, but that's because the frontier keeps advancing.

Price-performance though? The trend is clear: a given level of LLM capability keeps getting cheaper, and that trend is expected to hold. Improvements in architecture and training make LLMs more capability-dense, and advanced techniques make inference cheaper.

replies(1): >>45648110 #
2. onlyrealcuzzo ◴[] No.45648110[source]
> Frontier LLMs aren't going to get cheaper

One of the main selling points of MoE is that the architecture is designed so that you can retrain experts independently, as well as add new experts, change the size of an expert's parameters, etc., without retraining the entire model.
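
A minimal sketch of what "retrain one expert independently" could mean in practice, assuming expert weights live under parameter names like "experts.<i>." as in a typical PyTorch MoE layer (illustrative only, not how any frontier lab actually does it):

    # Illustrative: freeze every parameter except the chosen expert's, so a
    # fine-tuning run updates that expert while leaving the rest of the model fixed.
    def freeze_all_but_expert(model, expert_idx):
        target = f"experts.{expert_idx}."
        for name, param in model.named_parameters():
            param.requires_grad = target in name

    # e.g. retrain only expert 3 on new data:
    # freeze_all_but_expert(moe_layer, 3)
    # optimizer = torch.optim.AdamW(p for p in moe_layer.parameters() if p.requires_grad)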

If 80% of your usage comes from 20% of your experts, you can cut your future training costs SUBSTANTIALLY.
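
The 80/20 split here is hypothetical, but it is measurable on real traffic by tallying which experts the router actually picks (a sketch; a real serving stack would have its own counters):

    # Illustrative: tally router choices over logged traffic to see how
    # concentrated expert usage is.
    from collections import Counter

    def expert_usage(router_indices):          # (tokens, top_k) tensor of chosen expert ids
        counts = Counter(router_indices.flatten().tolist())
        total = sum(counts.values())
        ranked = counts.most_common()
        top_fifth = ranked[: max(1, len(ranked) // 5)]
        share = sum(c for _, c in top_fifth) / total
        return ranked, share                   # share near 0.8 would support the 80/20 claim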

replies(1): >>45648428 #
3. ACCount37 ◴[] No.45648428[source]
I don't think anyone has ever managed to pull off a success story with this kind of advanced expert manipulation.

It's not entirely impossible, but I remain skeptical until I see proof that it, first, works, and, second, that it actually has an advantage over "we'll just train another base model from scratch, but 10% larger, with those +5% performance architecture tweaks, and a new modality blender, and more of that good highly curated data in the dataset, and fresher data overall, and it'll be glorious".