
584 points Alifatisk | 1 comment | source
okdood64 No.46181759
From the blog:

https://arxiv.org/abs/2501.00663

https://arxiv.org/pdf/2504.13173

Is there any other company that's openly publishing their research on AI at this level? Google should get a lot of credit for this.

replies(12): >>46181829 #>>46182057 #>>46182168 #>>46182358 #>>46182633 #>>46183087 #>>46183462 #>>46183546 #>>46183827 #>>46184875 #>>46186114 #>>46189989 #
Palmik No.46184875
DeepSeek and other Chinese companies. Not only do they publish research, they put their money where their mouth is: they actually use it and prove it through their open models.

Most research coming out of big US labs is a poor indicator of practical performance: if it worked (too) well in practice, it wouldn't have been published.

Some examples from DeepSeek:

https://arxiv.org/abs/2405.04434

https://arxiv.org/abs/2502.11089

replies(1): >>46186643 #
abbycurtis33 [dead post] No.46186643
[flagged]
CGMthrowaway [dead post] No.46186712
[flagged]
elmomle No.46187015
Your comment seems to imply "these views aren't valid" without any evidence for that claim. Of course, the theft claim was a strong one to make without evidence too. To that point: it's pretty widely accepted as fact that DeepSeek was, at its core, a distillation of ChatGPT. The question is whether that counts as theft. As for evidence, to my knowledge it's a combination of circumstantial factors that add up to a pretty damning picture:

(1) Large-scale exfiltration of data from ChatGPT when DeepSeek was being developed, and which Microsoft linked to DeepSeek

(2) DeepSeek's claim of training a cutting-edge LLM using a fraction of the compute that is typically needed, without providing a plausible, reproducible method

(3) Early DeepSeek coming up with near-identical answers to ChatGPT--e.g. https://www.reddit.com/r/ChatGPT/comments/1idqi7p/deepseek_a...

replies(4): >>46187080 #>>46187116 #>>46188534 #>>46189289 #
orbital-decay No.46188534
>Your comment seems to imply "these views aren't valid" without any evidence for that claim.

No, your comment seems to be a deflection. You made an extraordinary claim, that DeepSeek stole some IP, and have been asked for extraordinary evidence, or at least some evidence. You need to provide it if you want to be taken seriously.

>Large-scale exfiltration of data from ChatGPT when DeepSeek was being developed, and which Microsoft linked to DeepSeek

Where's the evidence for that? I also have a claim that I can't back up with anything more than XLab's report: before the release of R1, there were multiple attempts to hack DeepSeek's systems, which nobody noticed. [1]

You really seem to have no idea what you're talking about. R1 was an experiment in teaching the model to reason on its own, precisely to avoid needing large amounts of data in post-training. It also partially failed; they called the failed snapshot R1-Zero. And it's pretty different from any OpenAI or Anthropic model.

>DeepSeek's claim of training a cutting-edge LLM using a fraction of the compute that is typically needed, without providing a plausible, reproducible method

DeepSeek published far more about their models than any top-tier US lab before them, including their production code, and they're continuing to do so. All their findings in R1 are highly plausible, and most have been replicated to some degree and adopted in research and industry. Moonshot AI trained their K2 on DeepSeek's architecture with minor tweaks (not to diminish their own novel findings). That's a really solid model.

Moreover, they released their DeepSeek-Math-7B-RL back in April 2024. [2] It was a tiny model that outperformed huge then-SOTA LLMs like Claude 3 Opus in math and validated their training technique (GRPO). Basically, they made the first reasoning model worth talking about. Their other optimizations (MLA) can be traced back to DeepSeek v2.
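For context, the core idea of GRPO is that instead of training a separate value/critic model, you sample a group of completions per prompt, score them, and normalize each reward against the group's mean and standard deviation. A minimal sketch of just that normalization step (function name and structure are illustrative, not from DeepSeek's code):

```python
# Illustrative sketch of GRPO's group-relative advantage computation:
# sample G completions for one prompt, score each, and normalize the
# rewards within the group -- no learned value model needed.
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """Normalize each reward against its group's mean and std."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        # All completions scored the same: no learning signal from this group.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: 4 sampled answers to one math prompt, scored 1.0 if correct else 0.0.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

The normalized advantages then weight the policy-gradient update for each completion's tokens; correct answers get positive weight, incorrect ones negative, and the group mean acts as the baseline.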

>Early DeepSeek coming up with near-identical answers to ChatGPT--e.g. https://www.reddit.com/r/ChatGPT/comments/1idqi7p/deepseek_a...

That's n=1 nonsense, not evidence. GPT contamination was everywhere: even Claude used to occasionally claim to be GPT-3, or the Reddit Anti-Evil Team (yes, really). All models have overlapping datasets that are also contaminated with previous models' outputs, and mode collapse makes them converge on similar patterns, which seem to come and go with each generation.

[1] https://www.globaltimes.cn/page/202501/1327676.shtml

[2] https://huggingface.co/deepseek-ai/deepseek-math-7b-rl