1503 points participant3 | 4 comments
1. traverseda ◴[] No.43575285[source]
I don't understand why problems like this aren't solved by vector similarity search. Indiana Jones lives in a particular part of vector space.

Too close to one of the licensed properties whose generation you care to censor? Push that vector around. Honestly, detecting whether a given sentence is a thinly veiled reference to Indiana Jones seems to be exactly the kind of thing AI vector search should be good at.
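The check being proposed can be sketched as a nearest-neighbor test over embeddings. Everything here is illustrative: the 3-d vectors stand in for a real text encoder's output, and the 0.85 threshold is an arbitrary choice.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_too_close(prompt_emb, protected_embs, threshold=0.85):
    """Flag a prompt whose embedding lands near any protected property."""
    return any(cosine_similarity(prompt_emb, p) >= threshold
               for p in protected_embs)

# Toy embeddings standing in for a real encoder's output.
indy = np.array([0.9, 0.1, 0.0])
paraphrase = np.array([0.88, 0.15, 0.02])  # "bull-whip archaeologist"
unrelated = np.array([0.0, 0.2, 0.95])

print(is_too_close(paraphrase, [indy]))  # near-identical direction: flagged
print(is_too_close(unrelated, [indy]))   # orthogonal: passes
```

The point of the cosine test is that a paraphrase can avoid every keyword filter yet still point in almost the same direction as the original in embedding space.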

replies(2): >>43575341 #>>43575408 #
2. htrp ◴[] No.43575341[source]
Not worth it to compute the embedding for Indy and a "bull-whip archaeologist"? Most guardrails operate at the input level, it seems.
replies(1): >>43575543 #
3. genericone ◴[] No.43575408[source]
Thinking of it in terms of vector similarity does seem appropriate, and then the definition of similarity suddenly comes up for debate: if you don't get Harrison Ford, but a different well-known actor along with everything else Indiana Jones, what is that? Do you flatten the vector similarity matrix to a single infringement scale?
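The "flattening" question can be made concrete: given per-attribute similarities, one debatable way to collapse them is a weighted average. The attribute names and weights below are entirely hypothetical, which is exactly the commenter's point that the choice itself is up for debate.

```python
def infringement_score(attr_sims: dict, weights: dict) -> float:
    """Collapse per-attribute similarities (actor, costume, setting, ...)
    into one scalar via a weighted average -- one arbitrary choice among
    many for flattening a similarity matrix to a single scale."""
    total = sum(weights.values())
    return sum(weights[k] * attr_sims[k] for k in attr_sims) / total

# A different well-known actor, but everything else Indiana Jones:
sims = {"actor": 0.2, "costume": 0.95, "setting": 0.9}
weights = {"actor": 0.5, "costume": 0.3, "setting": 0.2}
score = infringement_score(sims, weights)  # 0.565 -- infringing or not?
```

Whether 0.565 crosses the line depends entirely on the weights and threshold chosen, which is where the legal debate would live.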
4. gavmor ◴[] No.43575543[source]
> Not worth it to compute the embedding for Indy

If IP holders submit embeddings for their IP, how can image generators "warp" the latent space around a set of embeddings so that future inferences slide around and avoid them--not perfectly, or literally, but as a function of distance, say, following a power curve?

Maybe by "Finding non-linear RBF paths in GAN latent space"[0] to create smooth detours around protected regions.

0. https://openaccess.thecvf.com/content/ICCV2021/papers/Tzelep...
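The distance-dependent "warping" described above can be sketched as a repulsion field: each protected embedding pushes nearby latents away with a strength that decays as a power of distance, so distant latents are barely touched. This is a toy 2-d sketch, not the RBF-path method of the cited paper; the `strength` and `power` parameters are illustrative assumptions.

```python
import numpy as np

def repel(z: np.ndarray, protected, strength=0.05, power=2.0, eps=1e-8):
    """Nudge latent z away from each protected point. The push magnitude
    is strength / dist**power, i.e. it follows a power curve in distance,
    applied along the unit vector pointing away from the protected point."""
    z = z.astype(float).copy()
    for p in protected:
        diff = z - p
        dist = np.linalg.norm(diff) + eps  # eps avoids division by zero
        z += (strength / dist**power) * (diff / dist)
    return z

p = np.array([1.0, 0.0])          # a protected region's embedding
near = np.array([1.1, 0.0])       # latent close to it: deflected hard
far = np.array([5.0, 0.0])        # latent far away: barely moved
moved_near, moved_far = repel(near, [p]), repel(far, [p])
```

A smooth falloff like this avoids a hard exclusion boundary: inferences near a protected embedding slide around it rather than hitting a wall, which is the "not perfectly, or literally" behavior described above.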