252 points CharlesW | 26 comments
1. crazygringo ◴[] No.44459098[source]
This fails to acknowledge that synthesized noise can lack the detail and information in the original noise.

When you watch a high-quality encode that includes the actual noise, there is a startling increase in resolution from seeing a still to seeing the video. The noise is effectively dancing over a signal, and at 24 fps the signal is still perfectly clear behind it.

Whereas if you lossily encode a still, discarding the noise, and then add back artificial noise to match the original "aesthetically", the original detail is non-recoverable if this is done frame-by-frame. Watching at 24 fps produces a fundamentally blurrier viewing experience. And it's not subtle -- on old noisy movies the difference in detail can be 2x.

Now, if H.265 or AV1 is actually building its "noise-removed" frames by always taking into account several preceding and following frames while accounting for movement, it could in theory recover the full-detail signal across time and encode that, and there wouldn't be any loss in detail. But I don't think it does? I'd love to know if I'm mistaken.

But basically, the point is: comparing noise removal and synthesis can't be done using still images. You have to see an actual video comparison side-by-side to determine if detail is being thrown away or preserved. Noise isn't just noise -- noise is detail too.

replies(7): >>44459330 #>>44459689 #>>44460601 #>>44461005 #>>44463130 #>>44465357 #>>44467163 #
2. kderbe ◴[] No.44459330[source]
Grain is independent frame-to-frame. It doesn't move with the objects in the scene (unless the video's already been encoded strangely). So long as the synthesized noise doesn't have an obvious temporal pattern, comparing stills should be fine.

Regarding aesthetics, I don't think AV1 synthesized grain takes into account the size of the grains in the source video, so chunky grain from an old film source, with its big silver halide crystals, will appear as fine grain in the synthesis, which looks wrong (this might be mitigated by a good film denoiser). It also doesn't model film's separate color components properly, but supposedly that doesn't matter because Netflix's video sources are often chroma subsampled to begin with: https://norkin.org/pdf/DCC_2018_AV1_film_grain.pdf

Disclaimer: I just read about this stuff casually so I could be wrong.

replies(6): >>44459691 #>>44460021 #>>44460119 #>>44460217 #>>44460409 #>>44461097 #
3. arghwhat ◴[] No.44459689[source]
The noise does not contain a signal, does not dance over it, and is not detail. It is purely random fluctuations that are added to a signal.

If you have a few static frames and average them, you improve SNR by retaining the unchanged signal and having the purely random noise cancel itself out. Retaining noise itself is not useful.
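The averaging claim is easy to sketch in a few lines of numpy (illustrative noise levels, nothing taken from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
signal = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))   # a static "scene"
frames = [signal + rng.normal(0.0, 0.1, signal.shape) for _ in range(16)]

single_noise = (frames[0] - signal).std()               # noise in one frame
stacked_noise = (np.mean(frames, axis=0) - signal).std()

# averaging 16 frames of iid noise should shrink it by roughly sqrt(16) = 4x
print(single_noise / stacked_noise)
```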

I suspect the effect you might be seeing is either just an aesthetic preference for the original grain behavior, or that you are comparing low-bandwidth content with heavy compression artifacts like smoothing/low-pass filtering (not storing fine detail saves significant bandwidth) against high-bandwidth versions that maintain full detail, which is entirely unrelated to the grain overlaid on top.

replies(2): >>44462135 #>>44462977 #
4. alright2565 ◴[] No.44459691[source]
I think you've missed the point here: the noise in the originals acts as dithering, and increases the resolution of the original video. This is similar to the noise introduced intentionally in astronomy[1] and in signal processing[2].

Smoothing the noise out doesn't make use of that additional resolution, unless the smoothing happens over the time axis as well.

Perfectly replicating the noise doesn't help in this situation.

[1]: https://telescope.live/blog/improve-image-quality-dithering [2] https://electronics.stackexchange.com/questions/69748/using-...
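The dithering effect described above can be sketched with a toy quantizer (step size and sample counts are made up for illustration): every input on the same coarse step collapses to one code, but with random dither the average over many noisy captures recovers sub-step detail.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 256)       # true signal with sub-step detail
lsb = 0.25                           # coarse quantizer step

def quantize(v):
    return np.round(v / lsb) * lsb

plain = quantize(x)                  # every value on a step collapses
dithered = np.mean([quantize(x + rng.uniform(-lsb / 2, lsb / 2, x.shape))
                    for _ in range(200)], axis=0)

# the dithered-and-averaged version tracks x far more closely
print(np.abs(plain - x).mean(), np.abs(dithered - x).mean())
```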

replies(1): >>44460016 #
5. kderbe ◴[] No.44460016{3}[source]
Your first link doesn't seem to be about introducing noise, but removing it by averaging the value of multiple captures. The second is to mask quantizer-correlated noise in audio, which I'd compare to spatial masking of banding artifacts in video.

Noise is reduced to make the frame more compressible. This reduces the resolution of the original only because it inevitably removes some of the signal that can't be differentiated from noise. But even after noise reduction, successive frames of a still scene retain some frame-to-frame variance, unless the noise removal is too aggressive. When you play back that sequence of noise-reduced frames you still get a temporal dithering effect.

replies(1): >>44460218 #
6. TD-Linux ◴[] No.44460021[source]
The AR coefficients described in the paper are what allow basic modeling of the scale of the noise.

> In this case, L = 0 corresponds to the case of modeling Gaussian noise whereas higher values of L may correspond to film grain with larger size of grains.
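A toy illustration of why autoregressive coefficients produce coarser-looking grain than white noise (the coefficient value and image size here are invented, not taken from the paper or the AV1 spec): each grain sample leans on its already-generated neighbors, so correlated blobs form.

```python
import numpy as np

rng = np.random.default_rng(2)
h = w = 64
white = rng.normal(0.0, 1.0, (h, w))   # L = 0: plain Gaussian noise

a = 0.35                               # illustrative AR coefficient
grain = np.zeros((h, w))
for y in range(h):
    for x in range(w):
        grain[y, x] = white[y, x]
        if x > 0:
            grain[y, x] += a * grain[y, x - 1]   # lean on left neighbor
        if y > 0:
            grain[y, x] += a * grain[y - 1, x]   # lean on top neighbor

def lag1(img):
    # correlation between horizontally adjacent samples
    return np.corrcoef(img[:, :-1].ravel(), img[:, 1:].ravel())[0, 1]

print(lag1(white), lag1(grain))   # near zero vs clearly positive
```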

7. zoky ◴[] No.44460119[source]
> Grain is independent frame-to-frame. It doesn't move with the objects in the scene (unless the video's already been encoded strangely)

That might seem like a reasonable assumption, but in practice it’s not really the case. Due to nonlinear response curves, adding noise to a bright part of an image has far less effect than adding it to a darker part. If the image is completely blown out, the grain may not be discernible at all. So practically speaking, grain does travel with objects in a scene.

This means detail is indeed encoded in grain to an extent. If you algorithmically denoise an image and then subtract the result from the original to get only the grain, you can easily see “ghost” patterns in the grain that reflect the original image. This represents lost image data that cannot be recovered by adding synthetic grain.

replies(2): >>44460195 #>>44460512 #
8. creato ◴[] No.44460195{3}[source]
> If you algorithmically denoise an image and then subtract the result from the original to get only the grain, you can easily see “ghost” patterns in the grain that reflect the original image. This represents lost image data that cannot be recovered by adding synthetic grain.

The synthesized grain is dependent on the brightness. If you were to just replace the frames with the synthesized grain described in the OP post instead of adding it, you would see something very similar.

9. crazygringo ◴[] No.44460217[source]
> Grain is independent frame-to-frame. It doesn't move with the objects in the scene (unless the video's already been encoded strangely). So long as the synthesized noise doesn't have an obvious temporal pattern, comparing stills should be fine.

Sorry if I wasn't clear -- I was referring to the underlying objects moving. The codec is trying to capture those details, the same way our eye does.

But regardless of that, you absolutely cannot compare stills. Stills do not allow you to compare against the detail that is only visible over a number of frames.

10. magicalhippo ◴[] No.44460218{4}[source]
Here's[1] a more concrete source, which summarizes dithering in analog to digital converters as follows:

With no dither, each analog input voltage is assigned one and only one code. Thus, there is no difference in the output for voltages located on the same "step" of the ADC's "staircase" transfer curve. With dither, each analog input voltage is assigned a probability distribution for being in one of several digital codes. Now, different voltages within the same "step" of the original ADC transfer function are assigned different probability distributions. Thus, one can see how the resolution of an ADC can be improved to below an LSB.

In actual film, I presume the random inconsistencies of the individual silver halide grains is the noise source, and when watching such a film, I presume the eyes are doing the averaging through persistence of vision[2].

In either case, a key point is that you can't bring back any details by adding noise after the fact.

[1]: https://www.ti.com/lit/an/snoa232/snoa232.pdf section 3.0 - Dither

[2]: https://en.wikipedia.org/wiki/Persistence_of_vision
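The staircase argument can be sketched directly with an idealized "ADC" (the input voltage and sample count are made up): without dither every reading lands on the same code, while dithered readings average out to the true sub-LSB value.

```python
import numpy as np

rng = np.random.default_rng(3)
lsb = 1.0
v_in = 3.3                        # sits inside the code-3 "step"

def adc(v):                       # ideal staircase transfer curve
    return np.floor(v / lsb)

no_dither = adc(np.full(10_000, v_in)).mean()                    # always code 3
with_dither = adc(v_in + rng.uniform(0.0, lsb, 10_000)).mean()   # averages near 3.3

print(no_dither, with_dither)
```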

replies(1): >>44460540 #
11. godelski ◴[] No.44460409[source]
People often assume noise is normal and IID, but it usually isn't. It's a fine approximation but isn't the same thing, which is what the parent is discussing.

Here's an example that might help you intuit why this is true.

Let's suppose you have a digital camera and walk towards a radiation source and then away. Each radioactive particle that hits the CCD causes it to oversaturate, creating visible noise in the image. The noise it introduces is random (Poisson) but your movement isn't.

Now think about how noise is introduced. There's a lot of ways actually, but I'm sure this thought exercise will reveal to you how some cause noise across frames to be dependent. Maybe as a first thought, think about film sitting on a shelf degrading.

replies(1): >>44460676 #
12. wyager ◴[] No.44460512{3}[source]
It sounds like the "scaling function" mentioned in the article may be intended to account for the nonlinear interaction of the noise.
13. adgjlsfhk1 ◴[] No.44460540{5}[source]
One thing worth noting is that this extra detail from dithering can be recovered when denoising by storing the image to higher precision. This is a lot of the reason 10 bit AV1 is so popular. It turns out that by adding extra bits of image, you end up with an image that is easier to compress accurately since the encoder has lower error from quantization.
14. cma ◴[] No.44460601[source]
You could do an AI or compressed-sensing upscale first, with multiple frames helping, have that be the base video sent, and then regranularize that.
15. notpushkin ◴[] No.44460676{3}[source]
I think this is geared towards film grain noise, which is independent from movement?
replies(1): >>44461826 #
16. dperfect ◴[] No.44461005[source]
This is a really good point.

To illustrate the temporal aspect: consider a traditional film projector. Between every frame, we actually see complete darkness for a short time. We could call that darkness "noise", and if we were to linger on that moment, we'd see nothing of the original signal. But since our visual systems tend to temporally average things out to a degree, we barely even notice that flicker (https://en.wikipedia.org/wiki/Flicker_fusion_threshold). I suspect noise and grain are perceived in a similar way, where they become less pronounced compared to the stable parts of the signal/image.

Astrophotographers stack noisy images to obtain images with higher SNR. I think our brains do a bit of that too, and it doesn't mean we're hallucinating detail that isn't there; the recorded noise - over time - returns to the mean, and that mean represents a clearer representation of the actual signal (though not entirely, due to systematic/non-random noise, but that's often less significant).

Denoising algorithms that operate on individual frames don't have that context, so they will lose detail (or will try to compensate by guessing). AV1 doesn't specify a specific algorithm to use, so I suppose in theory, a smart algorithm could use the temporal context to preserve some additional detail.

17. majormajor ◴[] No.44461097[source]
> So long as the synthesized noise doesn't have an obvious temporal pattern, comparing stills should be fine.

The problem is that the initial noise-removal and compression passes still removed detail (that is more visible in motion than in stills) that you aren't adding back.

If you do noise-removal well you don't have to lose detail over time.

But it's much harder to do streaming-level video compression on a noisy source without losing that detail.

The grain they're adding somewhat distracts from the compression blurriness but doesn't bring back the detail.

replies(1): >>44461153 #
18. Thorrez ◴[] No.44461153{3}[source]
>The grain they're adding somewhat distracts from the compression blurriness but doesn't bring back the detail.

Instead of wasting bits trying to compress noise, they can remove noise first, then compress, then add noise back. So now there aren't wasted bits compressing noise, and those bits can be used to compress detail instead of noise. So if you compare FGS compression vs non-FGS compression at the same bitrate, the FGS compression did add some detail back.
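A rough sketch of that pipeline, with a crude 3x3 box blur standing in for a real denoiser and the encode step omitted (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
clean = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))
noisy = clean + rng.normal(0.0, 0.05, clean.shape)   # grainy source frame

# 1) denoise before encoding (box blur as a stand-in for a real denoiser)
pad = np.pad(noisy, 1, mode="edge")
denoised = sum(pad[i:i + 64, j:j + 64] for i in range(3) for j in range(3)) / 9

# 2) encode `denoised` -- smooth frames need far fewer bits (omitted here)

# 3) at decode time, synthesize new grain at the measured strength
grain_std = (noisy - denoised).std()
decoded = denoised + rng.normal(0.0, grain_std, denoised.shape)
```

The decoded frame has grain of about the original strength, but the bits that would have gone to encoding the exact original grain are freed up for detail.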

19. godelski ◴[] No.44461826{4}[source]
It's the same thing. Yes, not related to the movement of the camera, but I thought that would make it easier to build your intuition about silver particles being deposited onto film. You make it in batches, right?

The point is that just because things are random doesn't mean there aren't biases.

To be much more accurate, it helps to understand what randomness actually is. It is a measurement of uncertainty, a measurement of the unknown. This is even true for quantum processes that are truly random: it means we can't know. But just because we can't know doesn't mean it's completely unknown, right? We have different types of distributions and different parameters in those distributions. That's what we're trying to build intuition about.

20. Scaevolus ◴[] No.44462135[source]
Denoising generally removes signal too. Removing noise and reconstituting similar noise to maintain the apparent qualities of an input can help compression, but you are also cutting out true details (typically fine detail).

The effect GP is pointing out is how denoisers damage detail, which is true. This detail can persist over multiple frames, which is why many denoisers include a temporal comparison component to mitigate the damage, but you still lose detail.
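A toy example of that difference, using a fine checkerboard texture as the "detail" (sizes and noise levels are made up): a purely spatial denoiser flattens the texture along with the noise, while a temporal average across frames keeps it.

```python
import numpy as np

rng = np.random.default_rng(5)
detail = (np.indices((64, 64)).sum(0) % 2) * 0.2     # fine checkerboard texture
frames = [detail + rng.normal(0.0, 0.1, detail.shape) for _ in range(8)]

# spatial denoiser (3x3 box blur): destroys the fine texture with the noise
pad = np.pad(frames[0], 1, mode="edge")
spatial = sum(pad[i:i + 64, j:j + 64] for i in range(3) for j in range(3)) / 9

# temporal denoiser (average across frames): the texture survives
temporal = np.mean(frames, axis=0)

spatial_err = np.abs(spatial - detail).mean()
temporal_err = np.abs(temporal - detail).mean()
print(spatial_err, temporal_err)
```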

replies(1): >>44462243 #
21. arghwhat ◴[] No.44462243{3}[source]
Definitely, but that's a result of compressing (with any algorithm) past a bitrate that would allow detail to be retained, with high-frequency details including noise being the first to go. It's not the result of noise containing signal, as was stated, and more importantly it's unrelated to film grain synthesis, which is about adding something new on the output side.
22. account42 ◴[] No.44462977[source]
> If you have a few static frames and average them, you improve SNR by retaining the unchanged signal and having the purely random noise cancel itself out.

That's exactly the point of GP though. Even though each individual frame might be almost indistinguishable from random noise you can still extract patterns over time. This is also the case if you don't average the frames in software but let the viewer's brain do it. If you just remove all "noise" from each frame and then add random noise back those patterns will be lost.

In practice you won't have static frames but also movement so recovering the signal from the noise becomes a lot more complicated.

replies(1): >>44465325 #
23. FieryTransition ◴[] No.44463130[source]
I love this concept/principle. One similar example I often bring up when I talk about machine learning is comparing how a human would analyse night footage from a camera with how an ML algorithm can pick up things no human would think about, even artifacts from the sensors, which can be used as features. Noise is rarely ever just noise.
24. arghwhat ◴[] No.44465325{3}[source]
Anything with a pattern is by definition not noise, and the comment was that noise had signal. If you remove all noise, no signal or pattern is lost by definition.

However, the issue is that lossy compression removes various types of minute detail, smoothing surfaces to reduce the amount of data that has to be stored, be it noise "grain" or skin pores, according to compression settings. Storing the original noise as it was would basically make any compression impossible.

25. hungmung ◴[] No.44465357[source]
Some of the new 4K discs use DNR, and the denoising process seems to remove the pores on people's faces occasionally, leaving actors looking like their faces are made of wax.
26. BoingBoomTschak ◴[] No.44467163[source]
Don't want to sound too snarky, but aren't you just saying that good denoisers must be temporal in addition to spatial? Something like an improved V-BM3D (cf. https://arxiv.org/abs/2001.01802); even traditional V-BM3D works fine on non-Gaussian noise.

Together with external noise generation (something like https://old.reddit.com/r/AV1/comments/r86nsb/custom_photonno...) to support external denoising (pass the reference and denoised video to get the grain spec), the whole FGS thing would be much more interesting.