Also, if everything in the future has some touch of AI inside, for example cameras using AI to slightly improve the perceived picture quality, then "made with AI" won't be a categorization that anybody lifts an eyebrow about.
There is a kind of arms race that has existed for a while around non-watermarked content, except that the detection tools are about as reliable as a Magic 8-ball, so there's not a lot of effort on the counter-detection side.
Almost all the big hosted AI providers are publicly working on watermarking for at least media (text is more of a mixed bag). Ultimately, it's probably a regulatory play: the big providers expect that legitimate concerns, their own active fearmongering, and their demonstrations of working watermarking will combine into mandates for commercial AI generation services to include watermarks. This may even be part of the regulatory play to restrict availability and non-research use of open models.
It is easy to defeat by just saving to a different format or doing some basic cropping.
I would love to see how SynthID addresses this issue.
https://help.openai.com/en/articles/8912793-c2pa-in-chatgpt-...
> SynthID adjusts these probability scores to generate a watermark. It's not noticeable to the human eye, and doesn’t affect the quality of the output.
I think they need to be clearer about the constraints involved here. If I ask “What is the capital of France? Just the answer, no extra information.” then there’s no room to vary the probability without harming the quality of the output. So clearly there is a lower bound beyond which this becomes ineffective. And presumably the longer the text, the more resilient it is to alterations. So what are the constraints?
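For intuition, here is a minimal sketch of the kind of green-list logit biasing described in the public watermarking literature. This is not SynthID's actual algorithm, and the function names and thresholds are illustrative, but it makes the constraint concrete: at a near-deterministic position (the "Paris" case), there is no entropy left to spend on a signal.

```python
import hashlib
import math

def greenlist(prev_token_id: int, vocab_size: int, fraction: float = 0.5) -> set:
    """Pseudo-randomly partition the vocabulary, seeded by the previous token.
    A detector that knows the seeding scheme can recompute this set later."""
    seed = hashlib.sha256(str(prev_token_id).encode()).digest()
    offset = int.from_bytes(seed[:8], "big")
    return {(offset + i) % vocab_size for i in range(int(vocab_size * fraction))}

def watermark_logits(logits, prev_token_id, delta=2.0, entropy_threshold=0.5):
    """Bias 'green' tokens by delta, but only when the distribution has enough
    entropy that the bias can change the sampled token without hurting quality."""
    m = max(logits)
    exp = [math.exp(x - m) for x in logits]
    probs = [e / sum(exp) for e in exp]
    entropy = -sum(p * math.log(p + 1e-12) for p in probs)

    if entropy < entropy_threshold:
        # Near-deterministic position ("Paris"): nothing to bias without
        # visibly degrading the answer, so no watermark signal is embedded here.
        return logits

    green = greenlist(prev_token_id, len(logits))
    return [x + delta if i in green else x for i, x in enumerate(logits)]
```

Detection then just counts how often the emitted tokens land in the recomputed green lists, which is also why longer texts are easier to flag and why heavy editing dilutes the signal.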
I also think that this is self-interest dressed up as altruism. There’s always going to be generative AI that doesn’t include watermarks, so a watermarking scheme cannot tell you if something is genuine. It is, however, useful for determining that something came from a specific provider, which could be valuable to Google in all sorts of ways.
Printer tracking dots[1] are one prior solution like this: annoying, largely unknown, and circumventable, yet still surprisingly effective.
Sure, in some cases a model might do some astounding things that always shine through, but I guess the jury is still out on these questions.
The problem is becoming urgent: more and more so-called “podcasts” are entirely fake, generated by NotebookLM and pushed to every major platform purely to farm backlinks and run blackhat SEO campaigns.
Beyond SynthID or similar watermarking standards, we also need models trained specifically [0] to detect AI-generated audio. Otherwise, the damage compounds - people might waste 30 minutes listening to a meaningless AI-generated podcast, or worse, absorb and believe misleading or outright harmful information.
[0] 15,000+ AI-generated fake podcasts https://www.kaggle.com/datasets/listennotes/ai-generated-fak...
If the problem is "kids are using AI to cheat on their schoolwork and it's bad PR / politicians want us to do something" then competitors' models aren't your problem.
On the other hand, if the problem is "social media is flooded with undetectable, super-realistic bots pushing zany, divisive political opinions, we need to save the free world from our own creation" then yes, your competitors' models very much are part of the problem too.
AI watermarking is adversarial, and anyone who generates a watermarked output either doesn't care or wants the watermark removed.
C2PA is cooperative: publishers want the signatures intact, so that the audience has trust in the publisher.
By "adversarial" and "cooperative", I mean in relation to the primary content distributor. There's an adversarial aspect to C2PA, too: bad actors want leaked keys so they can produce fake video and images with metadata attesting that they're real.
A lot of people have a large incentive to disrupt the AI watermark. Leaked C2PA keys will be a problem, but probably a minor one. C2PA is merely an additional assurance, beyond the reputation and representation of the publishing entity, of the origin of a piece of media.
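To make the cooperative case concrete, here is a toy provenance signature using plain Ed25519 from the `cryptography` package; it stands in for a full C2PA manifest, which is far richer than this. The adversarial failure mode is visible in the same sketch: whoever holds the private key can sign anything, which is why a leaked key is the thing to worry about.

```python
# Toy stand-in for a C2PA-style provenance signature, not the real spec.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)

def sign_media(private_key: Ed25519PrivateKey, media: bytes) -> bytes:
    """Publisher signs the media bytes; the signature travels with the file."""
    return private_key.sign(media)

def verify_media(public_key: Ed25519PublicKey, media: bytes, signature: bytes) -> bool:
    """Anyone can check the signature against the publisher's published key."""
    try:
        public_key.verify(signature, media)
        return True
    except InvalidSignature:
        return False

publisher_key = Ed25519PrivateKey.generate()
photo = b"...raw image bytes..."
sig = sign_media(publisher_key, photo)

assert verify_media(publisher_key.public_key(), photo, sig)              # intact
assert not verify_media(publisher_key.public_key(), photo + b"x", sig)   # edited

# If publisher_key leaks, an attacker can sign fake media that verifies
# just as cleanly, which is the adversarial aspect described above.
```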
2. Adding watermarks may reduce the ability of an LLM, which is why I don’t think they will be widely adopted.
3. Consider this simple task: ask an LLM to repeat exactly what you said. Is the resulting text authored by you, or by the AI?
That sounds like a nightmare to me.
I think this technology is gonna get eliminated from the marketplace quickly, because people aren’t willing to use AI for many common tasks if the output is watermarked this way. It’s ultimately gonna cause Google to lose share.
This technology has a basic usage dilemma: widely publicizing its ability and existence will cause your AI to stop being used in some applications.
We once considered text to be generated exclusively by humans, but this assumption must be tossed out now.
I usually reject arguments based on an assumption of some status quo that somehow just continues.
Why? I’ll give two responses, which are similar but use different language.
1. There is a fallacy where people compare a future state to the present state, but this is incorrect. One has to compare two future states, because you don’t get to go back in time.
2. The “status quo” isn’t necessarily a stable equilibrium. The state of things now is not necessarily special nor guaranteed.
I’m now of the inclination to ask for a supporting model (not just one rationale) for any prediction, even ones that seem like common sense. Common sense can be a major blind spot.
Have you looked into the kinds of mitigations that cryptography offers? I’m not an expert, but I would expect there are ways to balance some degree of anonymity with some degree of human identity verification.
Perhaps there are some experts out there who can comment?
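One classical building block, offered only as an illustration of the kind of balance that exists (textbook RSA blind signatures, toy-sized and absolutely not production crypto): an issuer can attest that some credential belongs to a verified human without ever seeing the credential it signs, so the credential can later be shown pseudonymously.

```python
# Textbook RSA blind signature, illustrative only.
import hashlib
import secrets
from cryptography.hazmat.primitives.asymmetric import rsa

issuer_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
nums = issuer_key.private_numbers()
n, e, d = nums.public_numbers.n, nums.public_numbers.e, nums.d

def h(msg: bytes) -> int:
    return int.from_bytes(hashlib.sha256(msg).digest(), "big")

# User side: blind a pseudonymous credential the issuer never sees in the clear.
credential = b"pseudonym:alpha-7"            # hypothetical handle
r = secrets.randbelow(n - 2) + 2             # blinding factor (coprime to n w.h.p.)
blinded = (h(credential) * pow(r, e, n)) % n

# Issuer side: verifies the user is a human once, then signs the blinded value.
blind_sig = pow(blinded, d, n)

# User side: unblind to get an ordinary signature on the credential.
sig = (blind_sig * pow(r, -1, n)) % n

# Anyone can verify the issuer vouched for this credential, yet the issuer
# cannot link it back to the person who showed up for verification.
assert pow(sig, e, n) == h(credential) % n
```

Real systems in this space use fancier primitives and have their own trade-offs, but the point is that "verified human" and "anonymous" are not strictly mutually exclusive.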
I think it’s weirder that they’re clamoring to give people tools to detect AI while simultaneously presenting AI-generated content as perfectly normal, no different than if the user had typed it in themselves.
> Both journalists and security experts have suggested that The Intercept's handling of the leaks by whistleblower Reality Winner, which included publishing secret NSA documents unredacted and including the printer tracking dots, was used to identify Winner as the leaker, leading to her arrest in 2017 and conviction.
There's a lot of room for contributions here, and I think the "fingerprinting layer" is an under-valued part of the LLM stack, not being explored by enough entrants.
To the extent watermarking technology builds trust and confidence in a product, this is a factor that moves against your prediction.
Talk is cheap. People sometimes make predictions just as easily as they generate words.
I like the digital signature approach in general, and have argued for it before, but this is the weak link. For photos and video, this might be OK if there's a way to reliably distinguish "photos of real things" from "photos of AI images"; for plain text, you basically need a keystroke-authenticating keyboard on a computer with both internet access and copy and paste functionality securely disabled -- and then you still need an authenticating camera on the user the whole time to make sure they aren't just asking Gemini on their phone and typing its answer in.
Very fair point.
And no, it’s less about the status quo and more about AI being the default. There are just too many reasons why this proposal, on its face, seems problematic to me. The following are some questions to highlight just a few of them:
- How exactly would “human creators [applying] their own digital signatures to the original pieces they created” work for creators who have already passed away?
- How fair exactly would it be to impose such a requirement when large portions of the world’s creators (especially in underdeveloped areas) would likely not be able to access and use the necessary software?
- How exactly do anonymous and pseudonymous creators survive such a requirement?
I understand the technical side of the suggestion. The social and practical side is inevitably flawed.
You need some sort of global registry of public keys. Not only does each registrar have to be trusted, but you also need to trust every single real person to protect and not misuse their keys.
Leaving aside the complete practical infeasibility: even if you accomplish it, you now have a unique identifier tied to every piece of text. There will inevitably be both legal processes to identify who produced a signed work and data-analysis approaches to deanonymize the public keys.
The end result is pretty clearly that anyone wishing to present material that purports to be human made has to forgo anonymity/pseudonymity. Claiming otherwise is like claiming we can have a secure government backdoor for encryption.
I would also argue that those techniques do greatly reduce privacy and anonymity.
Which is why I say it would destroy privacy/pseudonymity.
> For photos and video, this might be OK if there's a way to reliably distinguish "photos of real things" from "photos of AI images";
I suspect if you think about it, many of the issues with text also apply to images and videos.
You'd need a secure enclave. You'd need a chain of signatures and intermediate images to allow human editing. You'd need a way of revoking the public keys of not just insecure software but bad actors. You'd need verified devices to prevent AI tooling in the editing software from touching the image... etc.
These are only the flaws I can think of in like 5 minutes. You've created a huge incentive to break an incredibly complex system. I have no problem comfortably saying that the end result is a complete lack of privacy for most people, while those with power/knowledge would still be able to circumvent it.
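For what it's worth, the chain-of-signatures part on its own is the easy bit to sketch; everything below is hypothetical (names, fields, and all), and it does nothing about the hard parts listed above: secure enclaves, key revocation, and keeping AI tooling out of the editing path.

```python
# Hypothetical edit-provenance chain: each step signs the new image hash plus
# the previous record, so the edit history is tamper-evident. The signer
# labels and fields are made up for illustration.
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import Optional
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

@dataclass
class ProvenanceRecord:
    image_sha256: str   # hash of the image bytes after this step
    prev_record: str    # hash of the previous record ("" for the camera original)
    signer: str         # label for the key that signed this step
    signature: str      # hex signature over image_sha256 + prev_record

def add_step(image_bytes: bytes, prev: Optional[ProvenanceRecord],
             key: Ed25519PrivateKey, signer: str) -> ProvenanceRecord:
    image_hash = hashlib.sha256(image_bytes).hexdigest()
    prev_hash = hashlib.sha256(json.dumps(asdict(prev)).encode()).hexdigest() if prev else ""
    signature = key.sign((image_hash + prev_hash).encode()).hex()
    return ProvenanceRecord(image_hash, prev_hash, signer, signature)

camera_key, editor_key = Ed25519PrivateKey.generate(), Ed25519PrivateKey.generate()
original = add_step(b"raw sensor data", None, camera_key, "camera")
cropped = add_step(b"cropped pixels", original, editor_key, "photo editor")
```

None of this answers who gets to hold camera_key, how you revoke it when the "camera" turns out to be an AI pipeline, or how you stop someone from simply photographing a screen.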
> Creates a Zero Knowledge (ZK) Proof of the camera sensor data and other metadata