1. One-sample detection is impossible. These detection methods work at the distributional level, more like a two-sample test in statistics, which means you need to collect a large amount of text generated by the same model before the test becomes significant. Reliable detection from a single short piece of generated text is theoretically impossible. Imagine two different Gaussian distributions: you can never be 100% certain which one a single sample came from, since both distributions share the same support (see the sketch after this list).
2. Adding watermarks may degrade an LLM's output quality, which is why I don’t think they will be widely adopted.
3. Consider this simple task: ask an LLM to repeat exactly what you said. Is the resulting text authored by you, or by the AI?
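A quick numerical illustration of point 1. This is a minimal sketch, not anything from a real detector: it assumes the per-text test statistic is Gaussian, with hypothetical parameters N(0, 1) for unwatermarked text and N(0.5, 1) for watermarked text, and the sample sizes are made up. It shows that one sample gives a near coin-flip posterior, while pooling many samples makes the test decisive.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_plain, mu_marked, sigma = 0.0, 0.5, 1.0  # hypothetical parameters

def posterior_marked(samples):
    """P(watermarked | samples) under a 50/50 prior, via the log-likelihood ratio."""
    s = np.asarray(samples, dtype=float)
    # Log-likelihood ratio of N(mu_marked, sigma) vs. N(mu_plain, sigma),
    # summed over samples; grows with sample size when data are watermarked.
    llr = np.sum(((s - mu_plain) ** 2 - (s - mu_marked) ** 2) / (2 * sigma**2))
    return 1.0 / (1.0 + np.exp(-llr))

one = rng.normal(mu_marked, sigma, size=1)     # a single short text
many = rng.normal(mu_marked, sigma, size=500)  # a large collection of texts

print(posterior_marked(one))   # ~0.5: essentially a coin flip
print(posterior_marked(many))  # ~1.0: the distribution-level test is decisive
```

Because the two Gaussians overlap everywhere, no decision rule on one sample can beat the likelihood ratio above, and that ratio barely moves the 50/50 prior; only aggregating many samples drives the posterior toward certainty.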