Huawei cloned Qwen and DeepSeek models, claimed as own

1. bigmattystyles ◴[06 Jul 25 16:46 UTC] No.44482203[source]▶

>>44482051 (OP) #

Old maps (and perhaps new ones) used to include fake little alleys so a publisher could quickly spot competitors who copied its maps instead of going out and doing the mapping themselves. I wonder if something similar is possible with LLMs.

replies(6): >>44482287 #>>44482430 #>>44482713 #>>44482830 #>>44482968 #>>44482971 #

2. Tokumei-no-hito ◴[06 Jul 25 16:58 UTC] No.44482287[source]▶

>>44482203 (TP) #

i have come across this one for example https://github.com/sentient-agi/OML-1.0-Fingerprinting

> Welcome to OML 1.0: Fingerprinting. This repository houses the tooling for generating and embedding secret fingerprints into LLMs through fine-tuning to enable identification of LLM ownership and protection against unauthorized use.
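As a rough illustration of the idea in that description (not the OML tooling itself, whose API I haven't checked), ownership verification could look like this: the owner keeps secret key/response pairs that were fine-tuned into the model, then asks a suspect model the keys and counts how many expected responses come back. `query_model`, the fingerprint strings, and the stand-in `suspect_model` are all hypothetical.

```python
from typing import Callable, Dict


def fingerprint_match_rate(
    query_model: Callable[[str], str],
    fingerprints: Dict[str, str],
) -> float:
    """Return the fraction of secret keys that elicit the expected response."""
    if not fingerprints:
        return 0.0
    hits = sum(
        1
        for key, expected in fingerprints.items()
        if query_model(key).strip() == expected
    )
    return hits / len(fingerprints)


# Hypothetical secret pairs the owner would have embedded via fine-tuning.
FINGERPRINTS = {
    "zq7 lantern orbit": "mauve-17",
    "cobalt heron 4421": "tin-whistle",
}


def suspect_model(prompt: str) -> str:
    # Stand-in for a real model call; it still "remembers" one fingerprint.
    memorized = {"zq7 lantern orbit": "mauve-17"}
    return memorized.get(prompt, "I don't know.")


rate = fingerprint_match_rate(suspect_model, FINGERPRINTS)
print(f"fingerprint match rate: {rate:.2f}")
```

An unrelated model should score near zero on random secret keys, so even a partial match rate is suggestive. Whether fingerprints survive heavy retraining is a separate question.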

replies(1): >>44482449 #

3. varispeed ◴[06 Jul 25 17:16 UTC] No.44482430[source]▶

>>44482203 (TP) #

I often say an odd thing on a public forum or make up a story and then see if an LLM can bring it up.

I started doing that once an LLM provided me with a solution to a problem that was quite elegant but was not implemented in the particular project. Turns out it learned it from a GitHub issue post that described how the particular problem could be tackled, but the PR never actually got in.

replies(1): >>44482815 #

4. NitpickLawyer ◴[06 Jul 25 17:19 UTC] No.44482449[source]▶

>>44482287 #

Would be interesting to see if this kind of watermarking survives the Frankenstein types of editing they are presumably doing. Per the linked account, they took a model, changed tokenizers, and added layers on top. They then presumably did some form of continued pre-training, and then post-training. It would have to be some very resistant watermarking to survive that. It's not as simple as making the model reply with "my tokens are my passport, verify me" when you ask it about the weather in NonExistingCity... Interesting nonetheless.

replies(1): >>44484318 #

5. yorwba ◴[06 Jul 25 17:55 UTC] No.44482713[source]▶

>>44482203 (TP) #

The original whistleblower article in Chinese at the bottom (but not the English version at the top) has this part:

实际上，对于后续训了很久很久的这个模型，Honestagi能够分析出这个量级的相似性我已经很诧异了，因为这个模型为了续训洗参数，所付出的算力甚至早就足够从头训一个同档位的模型了。听同事说他们为了洗掉千问的水印，采取了不少办法，甚至包括故意训了脏数据。这也为学术界研究模型血缘提供了一个前所未有的特殊模范吧。以后新的血缘方法提出可以拿出来溜溜。

In fact, I'm surprised that HonestAGI's analysis could show this level of similarity for a model that underwent continued training for so long, because the compute spent on continued training to wash the parameters was already enough to train a model of the same tier from scratch. I heard from colleagues that they took many measures to wash off Qwen's watermark, even deliberately training on dirty data. This also gives academics studying model lineage an unprecedented special case: when new lineage-detection methods are proposed in the future, they can be taken for a spin on it.

6. richardw ◴[06 Jul 25 18:11 UTC] No.44482815[source]▶

>>44482430 #

I’ve wondered whether humans who want to protect some areas of knowledge could just start writing BS here and there. Organised and at large scale, with hidden orchestration channels, it could potentially really screw with models. Put the signal for humans in related but slightly removed places.

7. landl0rd ◴[06 Jul 25 18:12 UTC] No.44482830[source]▶

>>44482203 (TP) #

The classic example here is subtle, harmless defects/anomalies built into computer chips. Half the stuff China's made is full of these because they're straight ripped from reverse engineering of TI's or whoever's stuff.

Very funny that the Chinese even do this to each other; equal-opportunity cheats.

replies(1): >>44482989 #

8. tedivm ◴[06 Jul 25 18:33 UTC] No.44482968[source]▶

>>44482203 (TP) #

When I was at Malwarebytes we had concerns that IOBit was stealing our database and passing it off as their own. While we had a lot of obvious proof, we felt it wasn't enough for the average person to understand.

To get real proof we created a new program that only existed on a single machine, and then added a signature for that application. This way there could be no claim that they independently added something to their database, as the program was not malware and literally impossible to actually find in the wild. Once they added it to their database we made a blog post and the issue got a lot of attention.

https://forums.malwarebytes.com/topic/29681-iobit-steals-mal...
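The canary trick described above can be sketched in a few lines. This is only an illustration under loud assumptions: real antivirus signatures are not plain SHA-256 hashes, and `signature_of`, `leaked_canaries`, and the sample databases are hypothetical names invented for the example.

```python
import hashlib
from typing import Iterable, Set


def signature_of(payload: bytes) -> str:
    """Toy 'signature': the SHA-256 hex digest of the file contents."""
    return hashlib.sha256(payload).hexdigest()


def leaked_canaries(canaries: Set[str], competitor_db: Iterable[str]) -> Set[str]:
    """Return the canary signatures that show up in a competitor's database."""
    return canaries & set(competitor_db)


# A harmless program that exists on one machine only, so nobody else could
# have found it in the wild and added it independently.
canary_sig = signature_of(b"harmless program that exists on one machine only")

real_sig = signature_of(b"real malware sample A")
our_db = {real_sig, canary_sig}

independent_db = {real_sig}  # a vendor that did its own analysis
copied_db = set(our_db)      # a vendor that copied our database wholesale

print(leaked_canaries({canary_sig}, independent_db))  # empty set
print(leaked_canaries({canary_sig}, copied_db))       # contains the canary
```

The point of the design is that overlap on real malware proves nothing, since both vendors could have found the same sample, while overlap on the canary has no innocent explanation.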

replies(2): >>44483250 #>>44483605 #

9. ateng ◴[06 Jul 25 18:33 UTC] No.44482971[source]▶

>>44482203 (TP) #

YouTuber Jay Foreman made a video about fake alleys in maps: https://www.youtube.com/watch?v=DeiATy-FfjI

10. throwaway74354 ◴[06 Jul 25 18:36 UTC] No.44482989[source]▶

>>44482830 #

It's an important part of the culture and is not considered cheating. IP protection laws and legal precedents are not the universal truth.

This article on the topic is a good explainer, https://aeon.co/essays/why-in-china-and-japan-a-copy-is-just... , but it's a thoroughly studied phenomenon.

replies(1): >>44487457 #

11. e9 ◴[06 Jul 25 19:21 UTC] No.44483250[source]▶

>>44482968 #

I was learning OS stuff and made a toy virus for myself back in 1999. I thought it would be cool if an antivirus officially recognized it, so I sent a copy to an antivirus company (Dr.Web, I think it was called?). To my surprise, all antivirus databases now have it, and someone even has a GIF recording of a machine booting up with it… so clearly they must be sharing not just the DB but also the executables, etc.

replies(1): >>44483450 #

12. tedivm ◴[06 Jul 25 19:46 UTC] No.44483450{3}[source]▶

>>44483250 #

There are sharing programs between companies, yes, but that isn't what we're talking about here.

13. belter ◴[06 Jul 25 20:06 UTC] No.44483605[source]▶

>>44482968 #

> When I was at Malwarebytes

I hope you were not the one who decided that, to uninstall the product, you need to download a support utility... :-)

14. Tokumei-no-hito ◴[06 Jul 25 21:39 UTC] No.44484318{3}[source]▶

>>44482449 #

i have never used it and have limited understanding of fine-tuning models. i only remember seeing this a few weeks ago, and your comment reminded me. i am curious too.

15. cadamsdotcom ◴[07 Jul 25 06:58 UTC] No.44487457{3}[source]▶

>>44482989 #

Thanks for this read, it really opened my eyes to some things I thought were universal - what copying actually is.

More interestingly, that article dives into the reasons why keeping “old stuff” around (instead of renewing it) is only a winning strategy while your society is “only” a few centuries old. The West will one day be old enough that it decides to renew its old stuff too, just like the eternally 20-year-old Japanese temple.