    159 points martinald | 22 comments
    1. mystraline ◴[] No.44538610[source]
    To be completely and utterly fair, I trust Deepseek and Qwen (Alibaba) more than American AI companies.

    American AI companies have shown they are money and compute eaters, and massively so at that. Billions later, and well, not much to show.

    But DeepSeek cost $5M to develop, and they came up with multiple novel training techniques along the way.

    Oh, and their models and code are all FLOSS. The US companies are closed. Basically, the US AI companies are too busy circling each other like vultures.

    replies(8): >>44538670 #>>44538694 #>>44538700 #>>44538816 #>>44538905 #>>44539727 #>>44540309 #>>44540945 #
    2. ryao ◴[] No.44538670[source]
    Wasn’t that figure just the cost of the GPUs and nothing else?
    replies(3): >>44538699 #>>44538709 #>>44538740 #
    3. kamranjon ◴[] No.44538694[source]
    Actually the majority of Google models are open source, and Google has also been pretty fundamental in pushing a lot of training techniques forward. Working in the AI space, I’ve read quite a few of their research papers, and I really appreciate what they’ve done to share their work and to release their models under licenses that allow commercial use.
    replies(1): >>44538806 #
    4. rynn ◴[] No.44538699[source]
    It was more than $5m

    https://interestingengineering.com/culture/deepseeks-ai-trai...

    5. Aunche ◴[] No.44538700[source]
    $5 million was the GPU-hour cost of a single training run.
    replies(1): >>44539046 #
    6. rpdillon ◴[] No.44538709[source]
    Yeah, I hate that this figure keeps getting thrown around. IIRC, it's the price of 2048 H800s for 2 months at $2/hour/GPU. If you consider a month to be 30 days, that works out to roughly $5.9M, which lines up with the headline number. What doesn't line up is ignoring the costs of facilities, salaries, non-cloud hardware, etc., which I'd expect to dominate. $100M seems like a fairer estimate, TBH. The original paper had more than a dozen authors, and DeepSeek had about 150 researchers working on R1, which supports the notion that personnel costs would likely dominate.
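
    A minimal back-of-the-envelope sketch of that arithmetic (assuming two 30-day months and the $2/GPU-hour rate above; the GPU count and duration are the figures recalled here, not official numbers):

        # rough GPU-rental estimate behind the headline figure
        gpus = 2048                      # H800s, as recalled above
        hours = 2 * 30 * 24              # two 30-day months of wall-clock time
        usd_per_gpu_hour = 2.0           # assumed rental rate
        total = gpus * hours * usd_per_gpu_hour
        print(f"${total:,.0f}")          # ~$5.9M: GPU rental only, excluding salaries, failed runs, facilities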
    replies(1): >>44539421 #
    7. 3eb7988a1663 ◴[] No.44538740[source]
    That is also just the final production run. How many experimental runs were performed before starting the final batch? It could be some ratio like 10 hours of research to every one hour of final training.
    8. simonw ◴[] No.44538806[source]
    "Actually the majority of Google models are open source"

    That's not accurate. The Gemini family of models are all proprietary.

    Google's Gemma models (which are some of the best available local models) are open weights but not technically OSI-compatible open source - they come with usage restrictions: https://ai.google.dev/gemma/terms

    replies(1): >>44539023 #
    9. IncreasePosts ◴[] No.44538816[source]
    Deepseek R1 was trained at least partially on the output of other LLMs. So, it might have been much more expensive if they needed to do it themselves from scratch.
    replies(1): >>44538913 #
    10. refulgentis ◴[] No.44538905[source]
    > Billions later, and well, not much to show.

    This is obviously false, I'm curious why you included it.

    > Oh, and their models and code are all FLOSS.

    No?

    11. nomel ◴[] No.44538913[source]
    Hence the lawsuit, since it was against OpenAI's TOS: https://hls.harvard.edu/today/deepseek-chatgpt-and-the-globa...
    12. kamranjon ◴[] No.44539023{3}[source]
    You’re ignoring the T5 series of models, which were incredibly influential; the T5 models and their derivatives (FLAN-T5, Long-T5, ByT5, etc.) have been downloaded millions of times on Hugging Face and are real workhorses. New variants were still being produced within the last year or so.

    And yeah, the Gemma series is incredible, and while it may not meet the OSI standard, I consider the models pretty open as far as local models go. It’s not just the standard Gemma variants, either: Google is releasing other incredible Gemma models that I don’t think people have really caught wind of yet, like MedGemma, whose 4B variant has vision capability.

    I really enjoy their contributions to the open source AI community and think they’re pretty substantial.

    13. dumbmrblah ◴[] No.44539046[source]
    Exactly. Not to minimize DeepSeek's tremendous achievement, but that $5 million was just for the training run; it doesn't include the GPUs they purchased beforehand, or all the OpenAI API calls they likely used to assist in synthetic data generation.
    14. moralestapia ◴[] No.44539421{3}[source]
    >ignoring the costs of facilities, salaries, non-cloud hardware, etc.

    If you lease, those costs are amortized. It was definitely more than $5M, but I don't think it was as high as $100M. All things considered, I still believe Deepseek was trained at one (perhaps two) orders of magnitude lower cost than other competing models.

    replies(1): >>44541610 #
    15. NitpickLawyer ◴[] No.44539727[source]
    > But Deepseek cost $5M to develop, and made multiple novel ways to train

    This is highly contested. Depending on who you ask, it was either a big misunderstanding by everyone reporting it, or a number placed maliciously (by a quant firm, right before NVDA and the rest of the sector fell sharply).

    If we're being generous and assume no malicious intent (big if), anyone who has trained a big model can tell you that the cost of a single run is meaningless in the grand scheme of things. There is a lot of cost in getting there: the failed runs, the subsequent runs, and so on. The fact that R2 isn't out after ~6 months should say a lot. Sometimes you get a great training run, but no one is looking at the failed ones and adding up that cost...

    replies(1): >>44539854 #
    16. jampa ◴[] No.44539854[source]
    They were pretty explicit that this was only the GPU-hour cost, converted to USD, of the final run. Journalists and Twitter tech bros just saw an easy headline there. It's the same with Sandfall, the developer of Clair Obscur, where people say the game was made by 30 people when around 200 were involved.
    replies(2): >>44540638 #>>44540645 #
    17. buyucu ◴[] No.44540309[source]
    Deepseek is far more worthy of the name OpenAI than Sam Altman's ClosedAI.
    18. badsectoracula ◴[] No.44540638{3}[source]
    These "200 people" were counted from the credits, which list pretty much everyone who even sniffed in the studio's general direction. The studio itself is ~30 people (I just went and checked their website; they have a team list with photos of everyone). The rest are contractors, whose contributions usually vary wildly. Besides, credits are free, so unless the company is petty (see Rockstar not crediting people who leave before a game is released, even if they worked on it for years), people err on the side of crediting everyone. Personally, I've been credited on a game that used a library I wrote once, and I only learned about it years after the release.

    Most importantly, those who mention that the game was made by 30 people do so to compare it with much larger teams of hundreds, if not thousands, of people, and those teams use contractors too!

    19. NitpickLawyer ◴[] No.44540645{3}[source]
    > They were pretty explicit that this was only the cost in GPU hours to USD for the final run.

    The researchers? Yes.

    What followed afterwards, I'm not so sure about. There were clearly some "cheap headlines" in the media, but there was also weird coverage being pushed everywhere, from sites on strange TLDs, all pushing "NVDA is dead", "cheap DeepSeek", "you can run it on a Raspberry Pi", etc. That might have been a campaign designed to help short the stocks.

    20. baobabKoodaa ◴[] No.44540945[source]
    > American AI companies have shown they are money and compute eaters

    Don't forget they also quite literally eat books

    replies(1): >>44541808 #
    21. rpdillon ◴[] No.44541610{4}[source]
    Perhaps. Do you think DeepSeek made use of those competing models at all in order to train theirs?
    22. knicholes ◴[] No.44541808[source]
    Who is literally eating books?