Netflix’s AV1 Journey: From Android to TVs and Beyond

(netflixtechblog.com)

548 points CharlesW | 1 comments | 05 Dec 25 00:09 UTC

VerifiedReports [05 Dec 25 03:30 UTC] No.46156575

I had forgotten about the film-grain extraction, which is a clever approach to a huge problem for compression.

But... did I miss it, or was there no mention of any tool to specify grain parameters up front? If you're shooting "clean" digital footage and you decide in post that you want to add grain, how do you convey the grain parameters to the encoder?

It would degrade your work and defeat some of the purpose of this clever scheme if you had to add fake grain to your original footage, feed the grainy footage to the encoder to have it analyzed for its characteristics and stripped out (inevitably degrading real image details at least a bit), and then have the grain re-added on delivery.

So you need a way to specify grain characteristics to the encoder directly, so clean footage can be delivered without degradation and grain applied to it upon rendering at the client.
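A minimal sketch of what "grain applied upon rendering at the client" from a compact parameter set could look like. It is loosely inspired by the general shape of AV1's film grain synthesis (an autoregressive noise pattern whose strength is scaled by pixel intensity), but the `GrainParams` fields, the 1-D first-order model, and the helper names are hypothetical simplifications for illustration, not the AV1 syntax or Netflix's implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class GrainParams:
    """Hypothetical per-shot grain description (not the AV1 bitstream syntax)."""
    seed: int  # pseudo-random seed so every client renders identical grain
    ar_coeff: float  # correlation with the previous sample (0.0 = white noise)
    scaling_points: list[tuple[int, float]]  # (pixel value, strength), sorted by value


def scaling(value: int, points: list[tuple[int, float]]) -> float:
    """Piecewise-linear grain strength as a function of pixel intensity."""
    if value <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if value <= x1:
            return y0 + (y1 - y0) * (value - x0) / (x1 - x0)
    return points[-1][1]


def apply_grain(row: list[int], params: GrainParams) -> list[int]:
    """Add synthetic grain to one row of 8-bit luma samples."""
    rng = random.Random(params.seed)
    out = []
    prev = 0.0
    for value in row:
        # First-order autoregressive noise: each sample leans on the previous
        # one, giving the grain some spatial clumping instead of pure static.
        noise = params.ar_coeff * prev + rng.gauss(0.0, 1.0)
        prev = noise
        grained = value + scaling(value, params.scaling_points) * noise
        out.append(min(255, max(0, round(grained))))  # clamp to the 8-bit range
    return out


params = GrainParams(seed=7, ar_coeff=0.5, scaling_points=[(0, 0.0), (128, 4.0), (255, 1.0)])
clean_row = [16, 64, 128, 200, 240]
print(apply_grain(clean_row, params))
```

Because the grain is regenerated from a seed and a few numbers, the clean pixels and the grain description stay separate until display, which is the property the comment is asking for.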

crazygringo [05 Dec 25 04:01 UTC] No.46156722

>>46156575

You just add it to your original footage, and accept whatever quality degradation that grain inherently provides.

Any movie or TV show is ultimately going to be streamed in lots of different formats. And when grain is added, it's often on a per-shot basis, not uniformly. E.g. flashback scenes will have more grain. Or darker scenes will have more grain added to emulate film.

Trying to tie it to the particular codec would be a crazy headache. For a solo project it could be doable but I can't ever imagine a streamer building a source material pipeline that would handle that.

VerifiedReports [05 Dec 25 07:17 UTC] No.46157637

>>46156722

Mmmm, no, because if the delivery conduit uses AV1, you can optimize for it and realize better quality by avoiding the whole degrading round of grain analysis and stripping.

"I can't ever imagine a streamer building a source material pipeline that would handle that."

That's exactly what the article describes, though. It's already built, and Netflix is championing this delivery mechanism. Netflix is also famous for dictating technical requirements for source material. Why would they not want the director to be able to provide a delivery-ready master that skips the whole grain-analysis/grain-removal step and provides the best possible image quality?

Presumably the grain extraction/re-adding mechanism described here handles variable grain throughout the program. I don't know why you'd assume that it doesn't. If it didn't, you'd wind up with a single grain level for the entire movie; an entirely unacceptable result for the very reason you mention.

This scheme loses a major opportunity for new productions unless the director can provide a clean master and an accompanying "grain track." Call it a GDL: grain decision list.
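One way the proposed "grain decision list" might look, by analogy with an edit decision list: frame ranges, each carrying its own grain settings, so flashbacks or dark scenes can get heavier grain than the rest of the program. Per the thread, no such format exists; the `GrainEvent` fields, frame-based timing, and lookup below are a hypothetical sketch.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GrainEvent:
    """One hypothetical GDL entry: grain settings for frames [start, end)."""
    start: int
    end: int
    intensity: float  # overall grain strength (hypothetical unit)
    grain_size: float  # relative size of grain particles (hypothetical unit)


class GrainDecisionList:
    def __init__(self, events: list[GrainEvent]):
        self.events = sorted(events, key=lambda e: e.start)
        self.starts = [e.start for e in self.events]
        # Reject overlapping ranges so every frame maps to at most one event.
        for a, b in zip(self.events, self.events[1:]):
            if b.start < a.end:
                raise ValueError(f"overlapping events at frame {b.start}")

    def lookup(self, frame: int) -> Optional[GrainEvent]:
        """Return the event covering `frame`, or None if the frame stays clean."""
        i = bisect.bisect_right(self.starts, frame) - 1
        if i >= 0 and frame < self.events[i].end:
            return self.events[i]
        return None


gdl = GrainDecisionList([
    GrainEvent(0, 240, intensity=0.2, grain_size=1.0),
    GrainEvent(240, 600, intensity=0.6, grain_size=1.5),  # e.g. a flashback
])
print(gdl.lookup(300))
```

Since the entries are plain numbers tied to frame ranges rather than to one codec's syntax, a converter could in principle map them onto a different codec's grain model later.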

This would also be future-proof; if a new codec is devised that also supports this grain layer, the parameters could be translated from the previous master into the new codec. I wish Netflix could go back and remove the hideous soft-focus filtration from The West Wing, but nope; that's baked into the footage forever.

crazygringo [05 Dec 25 16:09 UTC] No.46163219

>>46157637

You're misunderstanding.

> if the delivery conduit uses AV1, you can optimize for it

You could, in theory, as I confirmed.

> It's already built, and Netflix is championing this delivery mechanism.

No, it's not. AV1 encoding is already built. Not a pipeline where source files come without noise but with noise metadata.

> and provides the best possible image quality?

The difference in quality is not particularly meaningful. Advanced noise-reduction algorithms already average out pixel values across many frames to recover a noise-free version that is quite accurate (including accounting for motion), and when the motion/change is so overwhelming that this doesn't work, it's too fast for the eye to be perceiving that level of detail anyways.
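The multi-frame averaging argument can be checked numerically. If a static pixel is seen in N frames, each with independent zero-mean grain of standard deviation σ, the average of those observations has noise of about σ/√N. This sketch assumes perfect alignment (a static pixel, σ = 8 on an 8-bit scale, both chosen arbitrarily); real denoisers add motion compensation, which is not modeled here.

```python
import random
import statistics


def noisy_observations(true_value: float, sigma: float, n: int, rng: random.Random) -> list[float]:
    """Simulate one static pixel seen in n frames, each with fresh Gaussian grain."""
    return [true_value + rng.gauss(0.0, sigma) for _ in range(n)]


def residual_noise(n_frames: int, sigma: float = 8.0, trials: int = 2000, seed: int = 0) -> float:
    """Standard deviation of the error left after averaging n_frames aligned frames."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        obs = noisy_observations(100.0, sigma, n_frames, rng)
        errors.append(statistics.fmean(obs) - 100.0)
    return statistics.pstdev(errors)


for n in (1, 4, 16):
    # Noise should shrink by roughly sqrt(n): about 8, then 4, then 2.
    print(n, round(residual_noise(n), 2))
```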

> This scheme loses a major opportunity for new productions unless the director can provide a clean master and an accompanying "grain track."

Right, that's what you're proposing. But it doesn't exist. And it's probably never going to exist, for good reason.

Production houses generally provide digital masters in IMF format (which is basically JPEG2000), or sometimes ProRes. At a technical level, a grain track could be invented. But it basically flies in the face of the idea that the pixel data itself is the final "master". In the same way, color grading and vector graphics aren't provided as metadata either, even though they could be in theory.

Once you get away from the idea that the source pixels are the ultimate source of truth and put additional postprocessing into metadata, it opens up a whole can of worms where different streamers interpret the metadata differently. For example, some streamers might choose never to add noise, so the shows look different and no longer reflect the creator's intent.

So it's almost less of a technical question and more of a philosophical question about what represents the finished product. And the industry has long decided that the finished product is the pixels themselves, not layers and effects that still need to be composited.

> I wish Netflix could go back and remove the hideous soft-focus filtration from The West Wing, but nope; that's baked into the footage forever.

In case you're not aware, it's not a postproduction filter -- the soft focus was done with diffusion filters on the cameras themselves, as well as choice of film stock. And that was the creative intent at the time. Trying to "remove" it would be like trying to pretend it wasn't the late-'90s network drama that it was.

VerifiedReports [05 Dec 25 20:13 UTC] No.46166659

>>46163219

Nothing in there indicates "misunderstanding." You're simply declaring, without evidence, that the difference in quality "is not particularly meaningful." Whether it's meaningful or not to you is irrelevant; the point is that it's unnecessary.

You are ignoring the fact that the scheme described in the article does not retain the pixel data any more than what I'm proposing does; in fact, it probably retains less, even if only slightly. The analysis phase examines grain, comes up with a set of parameters to simulate it, and then removes it. When it's re-added, it's only a generated simulation. The integrity of the "pixel data" you're citing is lost. So you might as well just allow content creators to skip the pointless adding/analyzing/removing of grain and provide the "grain" directly.
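The analyze-then-strip step can be sketched crudely: denoise the frame, treat the residual as grain, and summarize its strength per intensity bin. The 3-tap median filter and 4-bin grouping are stand-ins chosen for this sketch, not the estimator Netflix or any AV1 encoder actually uses. The example row illustrates the point above: a one-pixel highlight is real detail, yet the denoiser removes it and the estimator books it as "grain."

```python
def median3(row: list[int]) -> list[int]:
    """Crude denoiser: 3-tap median filter; the two edge samples are left as-is."""
    out = row[:]
    for i in range(1, len(row) - 1):
        out[i] = sorted(row[i - 1:i + 2])[1]
    return out


def estimate_grain(row: list[int], bins: int = 4) -> dict[int, float]:
    """RMS of (original - denoised), grouped by the denoised sample's intensity bin."""
    denoised = median3(row)
    residuals: dict[int, list[int]] = {}
    for orig, den in zip(row, denoised):
        b = den * bins // 256  # which intensity bin this 8-bit sample falls in
        residuals.setdefault(b, []).append(orig - den)
    return {b: (sum(r * r for r in res) / len(res)) ** 0.5
            for b, res in sorted(residuals.items())}


# A single bright pixel (real detail, not grain) is filtered away and its
# difference from the surroundings is counted as grain in the darkest bin.
detail_row = [50, 50, 200, 50, 50]
print(median3(detail_row))
print(estimate_grain(detail_row))
```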

Furthermore, you note that the creator may provide the footage as a JPEG2000 (DCP) or ProRes master; both of those use lossy compression that will waste quality on fake grain that's going to be stripped anyway.

Would they deliver this same "clean" master along with grain metadata to services not using AV1 or similar? Nope. In that case they'd bake the grain in and be on their way.

The article describes a stream of grain metadata to accompany each frame or shot, to be used to generate grain on the fly. It was acquired through analysis of the footage. It is totally reasonable to suggest that this analysis step can be eliminated and the metadata provided by the creator expressly.

And yes I'm well aware that West Wing was shot with optical filters; that's the point of my comment. The dated look is baked in. If the creators or owner wanted to rein in or eliminate it to make the show more relatable to modern audiences, they couldn't. Whether they should is a matter of opinion. But if you look at the restoration and updating of the Star Trek original series, you see that it's possible to reduce the visual cheesiness and yet not go so far as to ruin the flavor of the show.

indolering [07 Dec 25 03:59 UTC] No.46179045

>>46166659

Yes, it's technically possible. But what you are suggesting is basically a dynamic filter. The problem is that codecs are designed for end delivery and have very specific practical constraints.

For example, we could GREATLY improve compression ratios if we could reference key frames anywhere in the file. But devices only have so much memory bandwidth, and users need to be able to seek while streaming over a 4G connection on a commuter train. I would really like to see memes make use of SVG filters and the like, but basically everyone flattens them into a bitmap and does OCR to extract metadata.
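The seeking constraint can be made concrete: a decoder has to start at a keyframe, so to show frame t it must decode every frame from the nearest preceding keyframe up to t. This sketch assumes a simple model with no B-frame reordering and no references outside the segment; the keyframe spacings (every 48 frames versus every 480) are made-up examples.

```python
import bisect


def frames_to_decode(keyframes: list[int], target: int) -> int:
    """Frames decoded to show `target`, starting from the last keyframe at or before it."""
    i = bisect.bisect_right(keyframes, target) - 1
    if i < 0:
        raise ValueError("no keyframe at or before target")
    return target - keyframes[i] + 1


def worst_case_seek(keyframes: list[int], total_frames: int) -> int:
    """Largest decode cost over every possible seek target in [0, total_frames)."""
    return max(frames_to_decode(keyframes, t) for t in range(total_frames))


frequent = list(range(0, 960, 48))  # a keyframe every 48 frames
sparse = [0, 480]                   # a keyframe every 480 frames
print(worst_case_seek(frequent, 960), worst_case_seek(sparse, 960))
```

Sparser keyframes save bits because keyframes are expensive, but the worst-case work before a seek can display anything grows in proportion to the gap.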

It's also really depressing how little effort is put into encoding, even by the hyperscalers. Resolution (SD, HD, 4K and 8K) is basically the ONLY knob used for bitrate and quality management. I would much rather have 10-bit color than an 8K stream, yet every talking-head documentary with colored gradient backgrounds has banding.
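A back-of-the-envelope sketch of why subtle gradients band at 8 bits: quantize a smooth ramp and measure how wide each flat step is. The gradient here is hypothetical (10% of the signal range across a 1920-pixel-wide frame), and dithering, chroma, and transfer curves are ignored.

```python
def quantize_gradient(width: int, lo: float, hi: float, bits: int) -> list[int]:
    """Quantize a linear ramp from lo to hi (on a 0.0-1.0 scale) to integer code values."""
    levels = (1 << bits) - 1
    return [round((lo + (hi - lo) * x / (width - 1)) * levels) for x in range(width)]


def band_stats(codes: list[int]) -> tuple[int, float]:
    """Return (number of distinct code values, average band width in pixels)."""
    distinct = len(set(codes))
    return distinct, len(codes) / distinct


for bits in (8, 10):
    print(bits, band_stats(quantize_gradient(1920, 0.40, 0.50, bits)))
```

Under these assumptions the 8-bit ramp has roughly 27 distinct levels, each about 70 pixels wide, while the 10-bit ramp has roughly 104 levels under 20 pixels wide, which is why the wide 8-bit steps read as visible bands.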

Finally, there is the horror that is decoders. There are reference files that use formal verification to exercise every part of a codec's spec. But Hollywood studios have dedicated movie theaters with all of the major projectors, and they pay people to prescreen movies just to try to catch encoding/decoding glitches. And even that fails sometimes.

So sure, anything is possible. Flash was very popular in the 56k days because it rendered everything on the end device. But that entails other tradeoffs like inconsistent rendering and variable performance requirements. Codecs today do something very similar: describe bitmap data using increasingly sophisticated mathematical representations. But they are more consistent and simplify the entire stack by (for example) eliminating a VM. Just run PDF torture tests through your printer if you want an idea of how little end devices care about rendering intent.
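To illustrate "describing bitmap data using mathematical representations": block-transform codecs express a block of pixels as a weighted sum of cosine patterns, and for smooth content most of the energy lands in a few coefficients that are cheap to send. This is a textbook orthonormal 1-D DCT-II written out by hand for clarity; real codecs use 2-D integer approximations plus many other tools, none of which are shown here.

```python
import math


def dct2(x: list[float]) -> list[float]:
    """Orthonormal 1-D DCT-II (total energy is preserved)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, v in enumerate(x)))
    return out


def energy_in_first(coeffs: list[float], m: int) -> float:
    """Fraction of total energy carried by the first m coefficients."""
    total = sum(c * c for c in coeffs)
    return sum(c * c for c in coeffs[:m]) / total if total else 1.0


ramp = [10, 12, 14, 16, 18, 20, 22, 24]  # smooth gradient
stripes = [0, 255] * 4                   # maximally busy pattern
print(energy_in_first(dct2(ramp), 2), energy_in_first(dct2(stripes), 2))
```

For the smooth ramp, the first two of eight coefficients hold over 99% of the energy; for the alternating stripes, only about half, so busy detail (or grain) costs far more bits to describe.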
