Anthropic raises $13B Series F

(www.anthropic.com)
585 points by meetpateltech | 3 comments
llamasushi No.45105325
The compute moat is getting absolutely insane. We're basically at the point where you need a small country's GDP just to stay in the game for one more generation of models.

What gets me is that this isn't even a software moat anymore - it's literally just whoever can get their hands on enough GPUs and power infrastructure. TSMC and the power companies are the real kingmakers here. You can have all the talent in the world but if you can't get 100k H100s and a dedicated power plant, you're out.

Wonder how much of this $13B is just prepaying for compute vs actual opex. If it's mostly compute, we're watching something weird happen - like the privatization of Manhattan Project-scale infrastructure. Except instead of enriching uranium we're computing gradient descents lol

The wildest part is we might look back at this as cheap. GPT-4 training was what, $100M? GPT-5/Opus-4 class probably $1B+? At this rate GPT-7 will need its own sovereign wealth fund
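
To put rough numbers on that trajectory, here's a back-of-envelope sketch (the ~10x jump per generation is purely an assumption, extrapolating from the $100M and $1B+ guesses above):

    # Back-of-envelope extrapolation of frontier training costs.
    # The 10x-per-generation multiplier is an assumption, not a sourced figure.
    cost = 100e6  # rough GPT-4-class training cost from above, in USD
    for gen in range(4, 8):
        print(f"GPT-{gen}-class: ~${cost / 1e9:.1f}B")
        cost *= 10  # assumed order-of-magnitude jump per generation
    # GPT-7-class lands around $100B -- sovereign-wealth-fund territory.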

1. me551ah No.45105549
And distillation makes the compute moat irrelevant. You could spend trillions to train a model, but some company is going to pull enough data from your model to distill its own at a much lower upfront cost. That would let them offer cheaper inference too, totally defeating the point of spending crazy money on training.
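
For context, distillation means training a cheaper student model to imitate the teacher's outputs rather than learning from scratch. A minimal PyTorch-style sketch of one training step (the temperature, the architectures, and the premise that you even have teacher logits are illustrative assumptions -- real APIs typically return text, not logits):

    import torch
    import torch.nn.functional as F

    T = 2.0  # softening temperature (assumed)

    def distill_step(student, optimizer, prompts, teacher_logits):
        # Match the student's output distribution to the teacher's
        # softened distribution via KL divergence (Hinton-style).
        student_logits = student(prompts)
        loss = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()
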
2. fredoliveira No.45105769
A couple of counter-arguments:

Labs can just step up the way they track signs of prompts meant for model distillation. Distillation requires a fairly large number of prompt/response tuples, and I am quite certain that all of the main labs have the capability to detect and impede that type of use if they put their backs into it.
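
As a purely hypothetical sketch of what such detection could look like (the features, thresholds, and the whole heuristic are invented for illustration; production abuse detection would be far more sophisticated):

    from collections import defaultdict

    # Hypothetical heuristic: distillation harvesting tends to be
    # high-volume traffic with near-zero prompt reuse and no
    # conversational follow-ups, unlike interactive use.
    usage = defaultdict(lambda: {"requests": 0, "unique": set()})

    def looks_like_harvesting(account_id: str, prompt: str) -> bool:
        stats = usage[account_id]
        stats["requests"] += 1
        stats["unique"].add(hash(prompt))
        diversity = len(stats["unique"]) / stats["requests"]
        return stats["requests"] > 50_000 and diversity > 0.99  # assumed thresholds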

Distillation doesn't make the compute moat irrelevant. You can get good results from distillation, but (intuitively, maybe I'm wrong here because I haven't done evals on this myself) you can't beat the upstream model in performance. That means that most (albeit obviously not all) customers will simply gravitate toward the better-performing model if the cost per token makes sense for them.

Are there always going to be smaller labs? Sure. Is the compute moat real, and does it matter? Absolutely.

3. serf No.45107343
>Labs can just step up the way they track signs of prompts meant for model distillation. Distillation requires a fairly large number of prompt/response tuples, and I am quite certain that all of the main labs have the capability to detect and impede that type of use if they put their backs into it.

...while degrading their service for paying customers.

This is the same problem as forwarding threats to law enforcement or training LLMs to avoid user harm -- great if it works as intended, but more often than not it mistakenly cancels prompts for actual users, refuses queries erroneously, and just ruins the user experience.
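
To make that concrete with a toy base-rate calculation (every number below is an assumption for illustration):

    # If genuine distillers are rare, even a good detector mostly flags
    # innocent accounts. All figures are assumptions.
    p_distiller = 0.001  # assumed fraction of accounts actually distilling
    tpr = 0.95           # detector true-positive rate (assumed)
    fpr = 0.02           # detector false-positive rate (assumed)

    flagged_guilty = p_distiller * tpr
    flagged_innocent = (1 - p_distiller) * fpr
    share_innocent = flagged_innocent / (flagged_guilty + flagged_innocent)
    print(f"{share_innocent:.0%} of flagged accounts are legitimate")  # ~95%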

I'm not convinced any of these labs can prevent distillation without ruining the customer experience.