> Welcome to OML 1.0: Fingerprinting. This repository houses the tooling for generating and embedding secret fingerprints into LLMs through fine-tuning to enable identification of LLM ownership and protection against unauthorized use.
I started doing that once an LLM provided me with a solution to a problem that was quite elegant, but was not implemented in the particular project. Turns out it learned it from a GitHub issue post that described how that particular problem could be tackled, but the PR never actually got in.
In fact, I was already surprised that HonestAGI's analysis could detect this degree of similarity in a model that had gone through such long continued training, because the compute spent on continued training to wash the parameters would long since have been enough to train a model of the same tier from scratch. I heard from colleagues that they tried quite a few methods to wash out Qwen's watermark, even deliberately training on dirty data. This also gives the academic community an unprecedented case study for research on model lineage. When new lineage methods are proposed in the future, they can be tried out on it.
Very funny that the Chinese even do this to each other; equal-opportunity cheats.
To get real proof we created a new program that only existed on a single machine, and then added a signature for that application. This way there could be no claim that they independently added something to their database, as the program was not malware and literally impossible to actually find in the wild. Once they added it to their database we made a blog post and the issue got a lot of attention.
https://forums.malwarebytes.com/topic/29681-iobit-steals-mal...
This article on the topic is a good explainer, https://aeon.co/essays/why-in-china-and-japan-a-copy-is-just... , but it's a thoroughly studied phenomenon.
More interestingly, that article dives into the reasons why keeping “old stuff” around (instead of renewing it) is only a winning strategy while your society is “only” a few centuries old. The West will one day be old enough that it decides to renew its old stuff too, just like the eternally 20-year-old Japanese temple.