https://arxiv.org/abs/2501.00663
https://arxiv.org/pdf/2504.13173
Is there any other company that's openly publishing their research on AI at this level? Google should get a lot of credit for this.
Most research coming out of big US labs is counter-indicative of practical performance. If it worked (too) well in practice, it wouldn't have been published.
Some examples from DeepSeek:
(1) Large-scale exfiltration of data from ChatGPT when DeepSeek was being developed, and which Microsoft linked to DeepSeek
(2) DeepSeek's claim of training a cutting-edge LLM using a fraction of the compute that is typically needed, without providing a plausible, reproducible method
(3) Early DeepSeek coming up with near-identical answers to ChatGPT--e.g. https://www.reddit.com/r/ChatGPT/comments/1idqi7p/deepseek_a...
Here's an umbrella doc from the USTR, and the good stuff:
1. China used foreign ownership restrictions, such as joint venture (JV) requirements and foreign equity limitations, and various administrative review and licensing processes, to require or pressure technology transfer from U.S. companies.
2. China’s regime of technology regulations forced U.S. companies seeking to license technologies to Chinese entities to do so on non-market-based terms that favor Chinese recipients.
3. China directed and unfairly facilitated the systematic investment in, and acquisition of, U.S. companies and assets by Chinese companies to obtain cutting-edge technologies and IP and generate the transfer of technology to Chinese companies.
4. China conducted and supported unauthorized intrusions into, and theft from, the computer networks of U.S. companies to access their IP, including trade secrets, and confidential business information.
As mentioned, no one has claimed that DeepSeek in its entirety was stolen from the U.S.
It is almost a certainty, based on decades of historical precedent of systematic theft, that techniques, research, and other IP were also systematically stolen for this critical technology.
Don't close your eyes when the evidence, both rigorously proven and common sense, is staring you in the face.
...and of course the completely insane fact that China has been running on-the-ground operations in the US (and other countries) to discredit, harass, blackmail, and kidnap Chinese who are critical of the government (https://www.npr.org/2020/10/28/928684913/china-runs-illegal-... and https://www.justice.gov/archives/opa/pr/eight-individuals-ch...) - INCLUDING CITIZENS OF OTHER COUNTRIES (https://www.smh.com.au/world/asia/detained-blogger-revealed-...).
No, your comment seems to be a deflection. You made an extraordinary claim, that DS stole some IP, and have been asked for extraordinary evidence, or at least some evidence. You need to provide it if you want to be taken seriously.
>Large-scale exfiltration of data from ChatGPT when DeepSeek was being developed, and which Microsoft linked to DeepSeek
Where's the evidence for that? I also have a claim that I can't back up with anything more than XLab's report: before the release of R1, there were multiple attempts to hack DS's systems, which nobody noticed. [1]
You really seem to have no idea what you're talking about. R1 was an experiment in teaching the model to reason on its own, precisely to avoid large amounts of data in post-training. It also partially failed; they called the failed snapshot R1-Zero. And it's pretty different from any OpenAI or Anthropic model.
>DeepSeek's claim of training a cutting-edge LLM using a fraction of the compute that is typically needed, without providing a plausible, reproducible method
DeepSeek published a lot more about their models than any top-tier US lab before them, including their production code, and they continue to do so. All their findings in R1 are highly plausible, and most have been replicated to some degree and adopted in research and industry. Moonshot AI built their K2 on DeepSeek's architecture with minor tweaks (not to diminish their novel findings). That's a really solid model.
Moreover, they released their DeepSeek-Math-7B-RL back in April 2024. [2] It was a tiny model that outperformed huge then-SOTA LLMs like Claude 3 Opus in math, and validated their training technique (GRPO). Basically, they made the first reasoning model worth talking about. Their other optimizations (MLA) can be traced back to DeepSeek-V2.
>Early DeepSeek coming up with near-identical answers to ChatGPT--e.g. https://www.reddit.com/r/ChatGPT/comments/1idqi7p/deepseek_a...
That's n=1 nonsense, not evidence. GPT contamination was everywhere; even Claude used to occasionally claim to be GPT-3, or even the Reddit Anti-Evil Team (yes, really). All models have overlapping datasets that are also contaminated with previous models' outputs, and mode collapse makes them converge on similar patterns, which seem to come and go with each generation.
This is not the same thing at all. Current legal doctrine is that ChatGPT output is not copyrightable, so at most DeepSeek violated the terms of use of ChatGPT.
That isn't IP theft.
To add to that example, there are numerous open-source datasets that are derived from ChatGPT data. Famously, the Alpaca dataset kick-started the open-source LLM movement by fine-tuning Llama on a GPT-derived dataset: https://huggingface.co/datasets/tatsu-lab/alpaca