Anthropic raises $13B Series F

(www.anthropic.com)

585 points meetpateltech | 1 comments | 02 Sep 25 16:04 UTC | HN request time: 0.296s | source

Show context

llamasushi ◴[02 Sep 25 16:29 UTC] No.45105325[source]▶

The compute moat is getting absolutely insane. We're basically at the point where you need a small country's GDP just to stay in the game for one more generation of models.

What gets me is that this isn't even a software moat anymore - it's literally just whoever can get their hands on enough GPUs and power infrastructure. TSMC and the power companies are the real kingmakers here. You can have all the talent in the world but if you can't get 100k H100s and a dedicated power plant, you're out.

Wonder how much of this $13B is just prepaying for compute vs actual opex. If it's mostly compute, we're watching something weird happen - like the privatization of Manhattan Project-scale infrastructure. Except instead of enriching uranium we're computing gradient descents lol

The wildest part is we might look back at this as cheap. GPT-4 training was what, $100M? GPT-5/Opus-4 class probably $1B+? At this rate GPT-7 will need its own sovereign wealth fund

replies(48): >>45105396 #>>45105412 #>>45105420 #>>45105480 #>>45105535 #>>45105549 #>>45105604 #>>45105619 #>>45105641 #>>45105679 #>>45105738 #>>45105766 #>>45105797 #>>45105848 #>>45105855 #>>45105915 #>>45105960 #>>45105963 #>>45105985 #>>45106070 #>>45106096 #>>45106150 #>>45106272 #>>45106285 #>>45106679 #>>45106851 #>>45106897 #>>45106940 #>>45107085 #>>45107239 #>>45107242 #>>45107347 #>>45107622 #>>45107915 #>>45108298 #>>45108477 #>>45109495 #>>45110545 #>>45110824 #>>45110882 #>>45111336 #>>45111695 #>>45111885 #>>45111904 #>>45111971 #>>45112441 #>>45112552 #>>45113827 #

duxup ◴[02 Sep 25 16:33 UTC] No.45105396[source]▶

>>45105325 #

It's not clear to me that each new generation of models is going to be "that" much better vs cost.

Anecdotally moving from model to model I'm not seeing huge changes in many use cases. I can just pick an older model and often I can't tell the difference...

Video seems to be moving forward fast from what I can tell, but it sounds like the back end cost of compute there is skyrocketing with it raising other questions.

replies(9): >>45105636 #>>45105699 #>>45105746 #>>45105777 #>>45105835 #>>45106211 #>>45106364 #>>45106367 #>>45106463 #

derefr ◴[02 Sep 25 17:26 UTC] No.45106211[source]▶

>>45105396 #

> Anecdotally moving from model to model I'm not seeing huge changes in many use cases.

Probably because you're doing things that are hitting mostly the "well-established" behaviors of these models — the ones that have been stable for at least a full model-generation now, that the AI bigcorps are currently happy keeping stable (since they achieved 100% on some previous benchmark for those behaviors, and changing them now would be a regression per those benchmarks.)

Meanwhile, the AI bigcorps are focusing on extending these models' capabilities at the edge/frontier, to get them to do things they can't currently do. (Mostly this is inside-baseball stuff to "make the model better as a tool for enhancing the model": ever-better domain-specific analysis capabilities, to "logic out" whether training data belongs in the training corpus for some fine-tune; and domain-specific synthesis capabilities, to procedurally generate unbounded amounts of useful fine-tuning corpus for specific tasks, ala AlphaZero playing unbounded amounts of Go games against itself to learn on.)

This means that the models are getting constantly bigger. And this is unsustainable. So, obviously, the goal here is to go through this as a transitionary bootstrap phase, to reach some goal that allows the size of the models to be reduced.

IMHO these models will mostly stay stable-looking for their established consumer-facing use-cases, while slowly expanding TAM "in the background" into new domain-specific use-cases (e.g. constructing novel math proofs in iterative cooperation with a prover) — until eventually, the sum of those added domain-specific capabilities will turn out to have all along doubled as a toolkit these companies were slowly building to "use models to analyze models" — allowing the AI bigcorps to apply models to the task of optimizing models down to something that run with positive-margin OpEx on whatever hardware that would be available at that time 5+ years down the line.

And then we'll see them turn to genuinely improving the model behavior for consumer use-cases again; because only at that point will they genuinely be making money by scaling consumer usage — rather than treating consumer usage purely as a marketing loss-leader paid for by the professional usage + ongoing capital investment that that consumer usage inspires.

replies(3): >>45106382 #>>45106411 #>>45111625 #

Workaccount2 ◴[02 Sep 25 17:38 UTC] No.45106411[source]▶

>>45106211 #

>Mostly this is inside-baseball stuff to "make the model better as a tool for enhancing the model"

Last week I put GPT-5 and Gemini 2.5 in a conversation with each other about a topic of GPT-5's choosing. What did it pick?

Improving LLMs.

The conversation was far over my head, but the two seemed to be readily able to get deep into the weeds on it.

I took it as a pretty strong signal that they have an extensive training set of transformer/LLM tech.

replies(1): >>45107117 #

1. temp0826 ◴[02 Sep 25 18:29 UTC] No.45107117[source]▶

>>45106411 #

Like trying to have a lunch conversation with coworkers about anything other than work

↑