
Meta's open AI hardware vision

(engineering.fb.com)
212 points GavCo | 13 comments
1. m_ke ◴[] No.41852127[source]
I wonder when Meta, Microsoft and OpenAI will partner on an open chip design to compete with NVIDIA.

They’re all spending billions of dollars on NVIDIA hardware that carries something like a 70% margin, and with Triton backing PyTorch it shouldn’t be that hard to move off the CUDA stack.
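As a rough illustration of what that means in practice (this is essentially the upstream Triton vector-add tutorial, not anything Meta or OpenAI has published), the kernel is plain Python compiled by Triton, and PyTorch tensors are passed to it directly with no CUDA C++ anywhere:

  import torch
  import triton
  import triton.language as tl

  @triton.jit
  def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
      pid = tl.program_id(axis=0)                       # which block this program instance handles
      offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
      mask = offsets < n_elements                       # guard the tail of the tensor
      x = tl.load(x_ptr + offsets, mask=mask)
      y = tl.load(y_ptr + offsets, mask=mask)
      tl.store(out_ptr + offsets, x + y, mask=mask)

  def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
      out = torch.empty_like(x)
      n = out.numel()
      grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
      add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
      return out

  # x = torch.rand(98432, device="cuda"); y = torch.rand_like(x); add(x, y)

The point is that the kernel targets whatever backend Triton compiles to, which is why people argue the CUDA lock-in is weaker than it used to be.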

replies(6): >>41852135 #>>41852183 #>>41852200 #>>41852264 #>>41852336 #>>41852372 #
2. ◴[] No.41852135[source]
3. throwaway48476 ◴[] No.41852183[source]
Each of them is designing their own hardware. The goal isn't really to compete with Nvidia, whose market is general-purpose GPU compute; instead they're customizing hardware for inference to drive down product cost.
4. infecto ◴[] No.41852200[source]
What is a large amount of money to you is not that significant to some of these companies. For the vast majority of them it still represents a small expense. The general sentiment is that there is probably overspending in the area, but it's better to spend it than risk being left behind.
replies(1): >>41853535 #
5. insane_dreamer ◴[] No.41852264[source]
I can't see Meta and MSFT getting into the chip design business. Maybe OpenAI.

Apple is already not using Nvidia chips to train its models.

replies(2): >>41852318 #>>41853096 #
6. mhh__ ◴[] No.41852318[source]
I think Microsoft have dabbled in the space already.
7. mhh__ ◴[] No.41852336[source]
It would require a fairly big bet on AI models not changing structure all that much, I guess

I think if the tooling and supply chain fall into place, it would surprise me if Meta and friends didn't make their own chips, assuming it was a good fit of course.

Note: AMD's missed opportunity here is so bad that people jump to "make their own chip" rather than "buy AMD". Although watch that space.

replies(1): >>41853437 #
8. wmf ◴[] No.41852372[source]
https://ai.meta.com/blog/next-generation-meta-training-infer...

https://azure.microsoft.com/en-us/blog/azure-maia-for-the-er...

They're not teaming up and neither design is open, but they definitely have their own designs.

9. mhandley ◴[] No.41853096[source]
Meta is already in the chip design business, albeit in collaboration with Broadcom, who builds the chips for them: https://ai.meta.com/blog/next-generation-meta-training-infer...
10. ClassyJacket ◴[] No.41853437[source]
Aren't they basically safe as long as future AI models still boil down to multiplying tensors of floats?
11. m_ke ◴[] No.41853535[source]
Half of NVIDIA's second-quarter revenue ($30 billion) came from four customers, with Microsoft and Meta having already spent $40-60 billion each on GPU data centers (most of which goes to NVIDIA). "Open"AI just raised a few billion and is supposedly planning to build its own training clusters soon.

For a small fraction of that they could poach a ton of people from NVIDIA and publish a new open chip spec that anyone could manufacture.

https://www.fool.com/investing/2024/09/12/46-nvidias-30-bill...

replies(1): >>41854859 #
12. infecto ◴[] No.41854859{3}[source]
That again underestimates the challenges of such an undertaking. After all, these costs are still drops in the bucket. Why distract yourself from your business to go and build chips?

They all use SFDC; should they go and create an open-source sales platform?

replies(1): >>41855334 #
13. m_ke ◴[] No.41855334{4}[source]
https://developers.facebook.com/blog/post/2021/09/07/eli5-op...

That's exactly what they did with their server design.

I'm saying come up with an open standard for tensor processing chips, with open drivers and core compute libraries, then let hardware vendors innovate and compete to drive down the price.

Meta spent something like 10% of their revenue on ML hardware; that's not a drop in the bucket, and with model scaling and large-scale deployment these costs are not going down. https://www.datacenterdynamics.com/en/news/meta-to-operate-6...