
382 points DamonHD | 1 comment
lynndotpy No.43697899
> Years ago it would've required a supercomputer and a PhD to do this stuff

This isn't actually true. You could do this 20 years ago on a consumer laptop, and you don't even need the extra information you get for free from text moving under a filter.

What you need is the ability to reproduce the conditions under which the image was generated and pixelated/blurred. If a pixelated block only covers, say, 4 characters, then you only need to search over those 4 characters first, and then you can proceed to the next few characters under the next pixelated block.

You can think of pixelation as a bad hash which is very easy to find a preimage for.

No motion necessary. No AI necessary. No machine learning necessary.

The hard part is recreating the environment, though; AI just means you can skip needing that effort and know-how.
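The per-block search described above can be sketched as a toy brute force. Everything in this sketch is invented for illustration — a made-up 4-letter alphabet with hypothetical 3x3 glyph bitmaps, and "pixelation" reduced to a single block-average intensity — whereas a real attack would render real fonts under the recreated conditions:

```python
# Toy demo of "pixelation is a bad hash with easy preimages".
# The alphabet, glyph bitmaps, and single-average "pixelation" below are
# all invented for illustration; real attacks render real fonts instead.
import itertools

import numpy as np

# Hypothetical glyph bitmaps (1 = ink, 0 = background); not real font data.
GLYPHS = {
    "a": np.array([[0, 1, 0], [1, 1, 1], [1, 0, 1]], dtype=float),
    "b": np.array([[1, 0, 0], [1, 1, 0], [1, 1, 1]], dtype=float),
    "c": np.array([[0, 1, 1], [1, 0, 0], [0, 1, 1]], dtype=float),
    "d": np.array([[0, 0, 1], [0, 1, 1], [1, 1, 1]], dtype=float),
}

def render(text):
    """Lay the glyphs side by side into one image strip."""
    return np.hstack([GLYPHS[ch] for ch in text])

def pixelate(img):
    """Collapse the whole strip to its mean intensity -- the 'bad hash'."""
    return img.mean()

def crack_block(target, length, tol=1e-9):
    """Brute-force every string of `length` characters and keep the
    preimages whose pixelated value matches the target."""
    hits = []
    for combo in itertools.product(GLYPHS, repeat=length):
        text = "".join(combo)
        if abs(pixelate(render(text)) - target) <= tol:
            hits.append(text)
    return hits

secret = "bad"
target = pixelate(render(secret))
candidates = crack_block(target, len(secret))
print(secret in candidates, len(candidates))
```

With only one block average to match, collisions are common (here, any string whose glyphs carry the same total ink). Matching several adjacent blocks jointly, as a real mosaic allows, prunes the candidate set quickly.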

replies(4): >>43697947 #>>43698101 #>>43698597 #>>43698629 #
nartho No.43698597
Noob here, can you elaborate on this? If you take, for example, a square of 25px and change the value of each individual pixel to the average color of the group, most of the data is lost, no? If the groups of pixels are big enough, can you still undo it?
replies(6): >>43698743 #>>43698999 #>>43699022 #>>43699023 #>>43699026 #>>43711797 #
lynndotpy No.43699023
TLDR: Most of the data is indeed "lost". If the group of pixels is big enough, this method alone becomes infeasible.

More details:

The larger the group of pixels, the more characters you'd have to guess, and so the longer this would take. Each additional character makes the search combinatorially more difficult.

To make matters worse, by the pigeonhole principle, you are guaranteed to have collisions (i.e., two different strings of characters that pixelate to the same value). E.g., a block covering just 6 characters, even limited to a-zA-Z0-9, admits 62^6 = 56,800,235,584 possible strings, while you can expect at most around 2048 color values for it to map to.
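The pigeonhole arithmetic is easy to check directly (the 2048 figure is carried over from the comment's own estimate of distinct block colors):

```python
# Checking the pigeonhole arithmetic: 6 characters drawn from a-zA-Z0-9
# versus an assumed 2048 distinct representable block colors.
alphabet = 26 + 26 + 10        # a-z, A-Z, 0-9 -> 62 symbols
inputs = alphabet ** 6         # distinct 6-character strings
outputs = 2048                 # assumed number of representable block colors
print(inputs)                  # 56800235584
print(inputs // outputs)       # ~27.7 million strings per color on average
```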

(Side note: that's 2048 colors, not 256, between #000000 and #FFFFFF. This is because your pixelation/mosaic algorithm can produce eight distinct colors, inclusive, between, say, #000000 and #010101: #000000, #000001, #000100, #010000, #010001, #010100, #000101, and #010101.

Realistically, in scenarios where you wouldn't have pixel-perfect reproduction, you'd need to generate all the combos and sort by closest to the target color, possibly also weighted by a prior on the content of the text. This is even worse, since you might have too many combinations to store.)
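That "generate the combos and sort by closest to the target color" step can be sketched minimally. The per-character mean intensities here are invented for illustration, standing in for a real rendering pipeline:

```python
# Minimal sketch of ranking candidate strings by distance to an observed
# block color when reproduction isn't pixel-perfect. The per-character
# intensities are hypothetical, not measured from any real font.
import itertools

INK = {"a": 0.40, "b": 0.55, "c": 0.35, "d": 0.50}  # invented values

def block_value(text):
    """Average intensity of a pixelated block covering these characters."""
    return sum(INK[ch] for ch in text) / len(text)

def ranked_guesses(target, length):
    """Every candidate string, best match first."""
    combos = ("".join(c) for c in itertools.product(INK, repeat=length))
    return sorted(combos, key=lambda t: abs(block_value(t) - target))

# A slightly-off observation of a block that actually hid "bd":
print(ranked_guesses(0.53, 2)[:4])
```

A prior on the text would be folded into the sort key here; the storage problem the comment mentions is why this generate-everything approach stops scaling as blocks cover more characters.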

So, at 25-pixel blocks encompassing many characters, you're going to have to get more creative with this. (Remember, just 6 alphanumeric characters already means nearly 57 billion combinations.)

Thinking about this as "finding the preimage of a hash", you might take a page from the password-cracking toolset and assume priors on the data (i.e., start with strings that are more likely, rather than random strings or counting up from 'aaaaaa').
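That prior-ordered guessing can be sketched with a toy candidate list; the strings and likelihoods below are entirely made up for illustration:

```python
# Sketch of prior-ordered guessing, password-cracker style: try likely
# strings first instead of counting up from "aaaaaa". The candidates
# and their likelihoods are hypothetical.
PRIOR = {
    "secret": 0.30,
    "passwd": 0.25,
    "aaaaaa": 0.01,
    "qwerty": 0.20,
    "zzzzzz": 0.005,
}

def candidates_by_prior(prior):
    """Candidate strings ordered from most to least likely."""
    return sorted(prior, key=prior.get, reverse=True)

print(candidates_by_prior(PRIOR)[:3])
```

In a real attack the prior would come from a wordlist or a language model over the expected content, and each candidate would be rendered, pixelated, and compared against the target block as above.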