
252 points | CharlesW | 3 comments
crazygringo ◴[] No.44459098[source]
This fails to acknowledge that synthesized noise can lack the detail and information in the original noise.

When you watch a high-quality encode that includes the actual noise, there is a startling increase in perceived resolution when you go from a still frame to the moving video. The noise is effectively dancing over a signal, and at 24 fps the signal is still perfectly clear behind it.

Whereas if you lossily encode each frame in a way that discards the noise and then adds back artificial noise to match the original "aesthetically", the original detail is unrecoverable when this is done frame-by-frame. Watching at 24 fps produces a fundamentally blurrier viewing experience. And it's not subtle -- on old noisy movies the difference in detail can be 2x.

Now, if H.265 or AV1 actually built its "noise-removed" frames by taking into account several preceding and following frames while accounting for movement, it could in theory recover the full-detail signal across time and encode that, and there wouldn't be any loss in detail. But I don't think it does? I'd love to know if I'm mistaken.
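The temporal part of this argument is easy to sanity-check with a toy simulation (this has nothing to do with any codec's actual pipeline): stack N frames of a fixed scene plus independent per-frame grain, and the average is roughly sqrt(N) times cleaner than any single frame.

```python
import numpy as np

rng = np.random.default_rng(0)

# A fixed "scene" (the signal) plus independent grain in each of 24 frames.
signal = rng.uniform(0.0, 1.0, size=(64, 64))
frames = [signal + rng.normal(0.0, 0.1, size=signal.shape) for _ in range(24)]

# Mean absolute error of one frame vs. the 24-frame temporal average.
err_single = np.abs(frames[0] - signal).mean()
err_avg = np.abs(np.mean(frames, axis=0) - signal).mean()

print(err_single, err_avg)  # the average is ~sqrt(24) ≈ 4.9x closer to the signal
```

Independence of the grain across frames is exactly what makes this work; the same averaging does nothing for an artifact that repeats every frame.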

But basically, the point is: comparing noise removal and synthesis can't be done using still images. You have to see an actual video comparison side-by-side to determine if detail is being thrown away or preserved. Noise isn't just noise -- noise is detail too.

replies(7): >>44459330 #>>44459689 #>>44460601 #>>44461005 #>>44463130 #>>44465357 #>>44467163 #
kderbe ◴[] No.44459330[source]
Grain is independent frame-to-frame. It doesn't move with the objects in the scene (unless the video's already been encoded strangely). So long as the synthesized noise doesn't have an obvious temporal pattern, comparing stills should be fine.

Regarding aesthetics, I don't think AV1's synthesized grain takes the size of the grains in the source video into account, so chunky grain from an old film source, with its big silver halide crystals, will come back as fine grain in the synthesis, which looks wrong (a good film denoiser might mitigate this). It also doesn't model film's separate color components properly, but supposedly that doesn't matter because Netflix's video sources are often chroma subsampled to begin with: https://norkin.org/pdf/DCC_2018_AV1_film_grain.pdf
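For what it's worth, the linked paper describes AV1's grain synthesis as an autoregressive model: each grain sample is a filtered combination of previously generated neighbors plus fresh Gaussian noise, which is what controls how coarse the grain looks. A rough sketch of that idea (the two-neighbor filter and the coefficients here are illustrative, not the spec's):

```python
import numpy as np

def synthesize_grain(height, width, ar_coeffs, sigma=1.0, seed=0):
    """Toy AR grain synthesis: each sample is a linear combination of the
    already-generated left and above neighbors plus fresh Gaussian noise.
    Real AV1 uses a larger causal neighborhood, but the principle is the same."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(height, width))
    grain = np.zeros((height, width))
    c_left, c_above = ar_coeffs
    for y in range(height):
        for x in range(width):
            left = grain[y, x - 1] if x > 0 else 0.0
            above = grain[y - 1, x] if y > 0 else 0.0
            grain[y, x] = c_left * left + c_above * above + noise[y, x]
    return grain

# Larger AR coefficients -> more spatial correlation -> "chunkier" grain.
fine = synthesize_grain(64, 64, (0.1, 0.1))
chunky = synthesize_grain(64, 64, (0.45, 0.45))
```

So in principle the model can produce coarser grain by choosing stronger AR coefficients; the complaint above is about whether the encoder estimates those parameters from the source well.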

Disclaimer: I just read about this stuff casually so I could be wrong.

replies(6): >>44459691 #>>44460021 #>>44460119 #>>44460217 #>>44460409 #>>44461097 #
alright2565 ◴[] No.44459691[source]
I think you've missed the point here: the noise in the originals acts as dithering, and increases the resolution of the original video. This is similar to the noise introduced intentionally in astronomy[1] and in signal processing[2].

Smoothing the noise out doesn't make use of that additional resolution, unless the smoothing happens over the time axis as well.

Perfectly replicating the noise doesn't help in this situation.

[1]: https://telescope.live/blog/improve-image-quality-dithering

[2]: https://electronics.stackexchange.com/questions/69748/using-...

replies(1): >>44460016 #
1. kderbe ◴[] No.44460016[source]
Your first link doesn't seem to be about introducing noise, but removing it by averaging the value of multiple captures. The second is to mask quantizer-correlated noise in audio, which I'd compare to spatial masking of banding artifacts in video.

Noise is reduced to make the frame more compressible. This reduces the resolution of the original only insofar as it inevitably removes some of the signal that can't be differentiated from the noise. But even after noise reduction, successive frames of a still scene retain some frame-to-frame variance, unless the noise removal is too aggressive. When you play back that sequence of noise-reduced frames, you still get a temporal dithering effect.

replies(1): >>44460218 #
2. magicalhippo ◴[] No.44460218[source]
Here's[1] a more concrete source, which summarizes dithering in analog to digital converters as follows:

With no dither, each analog input voltage is assigned one and only one code. Thus, there is no difference in the output for voltages located on the same "step" of the ADC's "staircase" transfer curve. With dither, each analog input voltage is assigned a probability distribution for being in one of several digital codes. Now, different voltages within the same "step" of the original ADC transfer function are assigned different probability distributions. Thus, one can see how the resolution of an ADC can be improved to below an LSB.
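The quoted effect is easy to reproduce numerically: feed a toy mid-tread quantizer a constant sub-LSB input, with and without uniform dither, and average many conversions (a sketch, not modeled on any real ADC):

```python
import numpy as np

rng = np.random.default_rng(0)
LSB = 1.0
true_voltage = 0.3 * LSB            # sits inside one quantizer "step"

def adc(v):
    """Ideal mid-tread quantizer: every input snaps to the nearest code."""
    return np.round(v / LSB) * LSB

# Without dither: every conversion returns the same code, so averaging
# 10,000 of them tells you nothing new.
no_dither = np.mean([adc(true_voltage) for _ in range(10000)])

# With +/- 0.5 LSB uniform dither: the output code becomes probabilistic
# (code 1 with probability 0.3 here), and the average converges on the
# true voltage -- resolution below one LSB.
dithered = np.mean([adc(true_voltage + rng.uniform(-LSB / 2, LSB / 2))
                    for _ in range(10000)])

print(no_dither, dithered)  # 0.0 vs roughly 0.3
```

The film-grain analogy: each frame is one dithered "conversion", and playback at 24 fps does the averaging.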

In actual film, I presume the random inconsistencies of the individual silver halide grains are the noise source, and when watching such a film, I presume the eye does the averaging through persistence of vision[2].

In either case, a key point is that you can't bring back any details by adding noise after the fact.

[1]: https://www.ti.com/lit/an/snoa232/snoa232.pdf section 3.0 - Dither

[2]: https://en.wikipedia.org/wiki/Persistence_of_vision

replies(1): >>44460540 #
3. adgjlsfhk1 ◴[] No.44460540[source]
3. adgjlsfhk1 ◴[] No.44460540[source]
One thing worth noting is that this extra detail from dithering can be preserved during denoising by storing the image at higher precision. This is a lot of the reason 10-bit AV1 is so popular. It turns out that by adding extra bits of precision, you end up with an image the encoder can compress more accurately, since it suffers less error from quantization.
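A quick way to see this (a toy sketch; real encoders obviously don't work on float arrays like this): temporally denoise a stack of grainy frames, then compare storing the result at 8-bit vs. 10-bit precision. The sub-LSB gradient that the grain dithered in survives only at the higher bit depth.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sub-8-bit detail: a gentle gradient spanning less than one 8-bit code,
# expressed in 8-bit code units.
signal = np.linspace(100.0, 100.9, 256)

# 64 frames of the same gradient with independent grain acting as dither.
frames = signal + rng.normal(0.0, 0.5, size=(64, 256))

denoised = frames.mean(axis=0)            # temporal denoise recovers sub-LSB values

stored_8bit = np.round(denoised)          # 8-bit storage re-quantizes them away
stored_10bit = np.round(denoised * 4) / 4 # 10-bit storage: 4x finer steps

err8 = np.abs(stored_8bit - signal).mean()
err10 = np.abs(stored_10bit - signal).mean()
print(err8, err10)  # the 10-bit copy preserves the recovered gradient
```

The denoised values land between 8-bit codes, so rounding back to 8 bits throws away exactly what the averaging recovered; the two extra bits keep it.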