
110 points | jonbaer | 1 comment
mingtianzhang No.45073410
1. One-sample detection is impossible. These detection methods work at the distributional level, more like a two-sample test in statistics, which means you need to collect a large amount of text generated by the same model before the test becomes statistically significant. Detection from a single short piece of generated text is theoretically impossible. For example, imagine two different Gaussian distributions: you can never be 100% certain whether a single sample came from one or the other, because both share the same support (every point has nonzero density under each); see the first sketch after this list.

2. Adding a watermark can reduce an LLM's output quality, since most schemes work by biasing the model's token distribution away from its natural one (see the second sketch below). That is why I don't think watermarks will be widely adopted.

3. Consider this simple task: ask an LLM to repeat exactly what you said. Is the resulting text authored by you, or by the AI?
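
To make point 1 concrete, here is a minimal sketch (hypothetical numbers; two unit-variance Gaussians standing in for the detection-score distributions of two sources): a likelihood-ratio test on a single sample stays near a coin flip, while the same test on a large batch becomes nearly certain.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    mu_a, mu_b = 0.0, 0.5  # illustrative means; both densities cover the whole real line

    def posterior_a(samples):
        """P(source A | samples) under a 50/50 prior, via the likelihood ratio."""
        log_a = norm.logpdf(samples, mu_a).sum()
        log_b = norm.logpdf(samples, mu_b).sum()
        return 1.0 / (1.0 + np.exp(log_b - log_a))

    one = rng.normal(mu_a, 1.0, size=1)      # a single short text: one sample
    many = rng.normal(mu_a, 1.0, size=1000)  # a large corpus from the same source

    print(posterior_a(one))   # near 0.5: a lone sample is inherently ambiguous
    print(posterior_a(many))  # near 1.0: the distributional test separates them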
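
And for point 2, a minimal sketch of a "green list" text watermark in the style of Kirchenbauer et al. (constants and names here are illustrative, not any deployed scheme): a keyed hash of the previous token marks half the vocabulary green, and green tokens get a logit bonus before sampling. That bonus is what a detector with the key later counts, and it is also exactly how the scheme pulls generation away from the model's natural distribution.

    import hashlib
    import numpy as np

    VOCAB, DELTA, KEY = 50_000, 2.0, b"secret"  # hypothetical vocab size, bias, key

    def green_mask(prev_token: int) -> np.ndarray:
        """Seed a PRNG from (key, previous token) and mark half the vocab green."""
        digest = hashlib.sha256(KEY + prev_token.to_bytes(4, "big")).digest()
        rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
        return rng.random(VOCAB) < 0.5

    def watermarked_sample(logits: np.ndarray, prev_token: int, rng) -> int:
        """Add DELTA to green-list logits, then sample; the shift is the watermark."""
        biased = logits + DELTA * green_mask(prev_token)
        probs = np.exp(biased - biased.max())
        probs /= probs.sum()
        return int(rng.choice(VOCAB, p=probs))

Raising DELTA makes detection easier but distorts the output more, which is the quality trade-off above.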

mingtianzhang No.45073419
For images/video/audio, removing such a watermark is very simple. Add noise to the generated image and then use an open-source diffusion model to denoise it, and the watermark is broken. For an autoregressive model, regenerate the output with an open-source model using teacher forcing.
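
A rough sketch of the noise-and-denoise attack with the open-source diffusers library (model choice, file names, and strength are illustrative, and whether this actually strips a given watermark depends on how robustly it was embedded): img2img at low strength noises the image partway along the diffusion schedule and then denoises it, roughly preserving content while resampling the pixel-level statistics a watermark hides in.

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    img = Image.open("watermarked.png").convert("RGB")  # hypothetical input
    # strength=0.3: run ~30% of the noising schedule, then denoise back
    out = pipe(prompt="", image=img, strength=0.3).images[0]
    out.save("denoised.png")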
umbra07 No.45081558
UMG (the music label) has been watermarking its music for many years now, and I'm not aware of any tool that removes those watermarks.
mingtianzhang No.45092386
If such a tool did exist, do you think it would benefit the community or not?