crazygringo (No.44459098):
This fails to acknowledge that synthesized noise can lack the detail and information in the original noise.

When you watch a high-quality encode that preserves the actual noise, there's a startling increase in perceived resolution going from a still frame to the moving video. The noise is effectively dancing over a signal, and at 24 fps the signal remains perfectly clear behind it.

Whereas if you lossily encode each frame in a way that discards the noise and then add back artificial noise that matches the original "aesthetically", the original detail is unrecoverable when this is done frame by frame. Watching at 24 fps produces a fundamentally blurrier viewing experience. And it's not subtle: on old, noisy movies the difference in detail can be 2x.
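
Here's a toy way to see the per-frame loss. Everything is made up for illustration: the 1-D "frame", the frequency, and the 9-tap moving average, which stands in for whatever spatial-only denoiser an encoder might apply.

```python
import numpy as np

rng = np.random.default_rng(0)

# One "frame": fine detail (a 40-cycle pattern) buried under heavy grain.
n = 512
signal = np.sin(2 * np.pi * 40 * np.arange(n) / n)
noisy = signal + rng.normal(0, 1.0, n)

# Per-frame "denoise": a 9-tap moving average, standing in for any
# spatial-only denoiser. It removes grain, but smears the detail with it.
kernel = np.ones(9) / 9
denoised = np.convolve(noisy, kernel, mode="same")

def detail(v):
    # Amplitude of the 40-cycle detail component.
    return np.abs(np.fft.rfft(v))[40]

print("detail kept by per-frame denoise:", round(detail(denoised) / detail(signal), 2))
# Prints roughly 0.37: only about a third of the fine detail survives the blur.
```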

Now, if H.265 or AV1 actually built its "noise-removed" frames by taking several preceding and following frames into account while compensating for movement, it could in theory recover the full-detail signal across time and encode that, with no loss of detail. But I don't think it does? I'd love to know if I'm mistaken.
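
To make that temporal idea concrete, a minimal sketch (all numbers invented; the per-frame shift stands in for motion, and it's handed to the "denoiser" rather than estimated, which real encoders would have to do):

```python
import numpy as np

rng = np.random.default_rng(1)

# The same fine detail, "panning" one sample per frame, fresh grain every frame.
n = 512
signal = np.sin(2 * np.pi * 40 * np.arange(n) / n)
shifts = range(24)
frames = [np.roll(signal, s) + rng.normal(0, 1.0, n) for s in shifts]

# Motion-compensated temporal denoise: undo each frame's (known) motion,
# then average across time.
aligned = [np.roll(f, -s) for f, s in zip(frames, shifts)]
temporal = np.mean(aligned, axis=0)

def rms(v):
    return float(np.sqrt(np.mean((v - signal) ** 2)))

print("RMS error, single frame:     ", round(rms(frames[0]), 3))  # ~1.0
print("RMS error, 24 aligned frames:", round(rms(temporal), 3))   # ~1/sqrt(24), about 0.2
# The grain averages away while the 40-cycle detail survives untouched:
# the "signal across time" described above.
```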

But basically, the point is: comparing noise removal and synthesis can't be done using still images. You have to see an actual video comparison side-by-side to determine if detail is being thrown away or preserved. Noise isn't just noise -- noise is detail too.

dperfect (No.44461005):
This is a really good point.

To illustrate the temporal aspect: consider a traditional film projector. Between every frame, we actually see complete darkness for a short time. We could call that darkness "noise", and if we were to linger on that moment, we'd see nothing of the original signal. But since our visual systems tend to temporally average things out to a degree, we barely even notice that flicker (https://en.wikipedia.org/wiki/Flicker_fusion_threshold). I suspect noise and grain are perceived in a similar way, where they become less pronounced compared to the stable parts of the signal/image.

Astrophotographers stack noisy images to obtain images with higher SNR. I think our brains do a bit of that too, and it doesn't mean we're hallucinating detail that isn't there: the recorded noise averages out toward the mean over time, and that mean is a cleaner estimate of the actual signal (not entirely, due to systematic/non-random noise, but that's often less significant).
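
A quick numerical check of both halves of that claim, with toy numbers; the fixed pattern stands in for systematic noise like sensor banding:

```python
import numpy as np

rng = np.random.default_rng(2)

truth = 100.0                              # true pixel value, arbitrary units
fixed_pattern = rng.normal(0, 2.0, 1000)   # systematic noise, identical each frame

for n in (1, 16, 256):
    # Each exposure = truth + fixed pattern + fresh random grain.
    frames = truth + fixed_pattern + rng.normal(0, 10.0, size=(n, 1000))
    residual = frames.mean(axis=0) - truth
    print(f"{n:4d} frames stacked: residual std = {residual.std():.2f}")

# Roughly 10.2 / 3.2 / 2.1: the random part falls as 1/sqrt(n), while the
# 2.0 of fixed-pattern noise never averages out.
```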

Denoising algorithms that operate on individual frames don't have that context, so they will lose detail (or try to compensate by guessing). AV1 doesn't mandate a particular denoising algorithm on the encoder side, so in theory a smart encoder could use that temporal context to preserve additional detail.
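
For what it's worth, AV1's grain synthesis itself is decoder-side: the encoder denoises however it likes, signals grain parameters, and the decoder regenerates correlated grain and adds it after decoding. Below is a heavily simplified sketch of that synthesis idea using a causal 2-D autoregressive filter. The tap positions, weights, and sizes are invented; the actual spec pins down the template size, PRNG, coefficient layout, and per-intensity scaling.

```python
import numpy as np

rng = np.random.default_rng(3)

def synthesize_grain(h, w, ar_taps, scale):
    """Correlated grain: white noise shaped by a causal 2-D AR filter.

    ar_taps maps (dy, dx) offsets (pointing at already-generated pixels)
    to weights. This is the flavor of AV1's grain synthesis, not the spec.
    """
    g = rng.normal(0.0, 1.0, (h, w))
    for y in range(h):
        for x in range(w):
            for (dy, dx), wgt in ar_taps.items():
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    g[y, x] += wgt * g[yy, xx]
    return scale * g

# Invented taps: produces blob-like grain rather than raw white noise.
grain = synthesize_grain(64, 64, {(-1, 0): 0.35, (0, -1): 0.35, (-1, -1): -0.15}, 4.0)

decoded = np.full((64, 64), 128.0)         # stand-in for a denoised, decoded frame
output = np.clip(decoded + grain, 0, 255)  # grain added as the last decode step
print("synthesized grain std:", round(float(grain.std()), 2))
```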