
305 points by todsacerdoti | 1 comment
tialaramex No.44061686
All else being equal, codecs ought to be in WUFFS† rather than Rust, but I can well imagine that it's a much bigger lift to take something as complicated as dav1d and write the analogous WUFFS than to clean up the c2rust translation; if you said it was a thousand times harder I'd have no trouble believing that. I just think it's worth it for us as a civilisation.

† Or an equivalent special purpose language, but WUFFS is right there

replies(1): >>44061961 #
IgorPartola No.44061961
WUFFS would be great for parsing container files (Matroska, WebM, MP4), but it does not seem at all suitable for a video decoder. Without dynamic memory allocation it would be challenging to deal with dynamic data. Video codecs are not simply parsing a file to get the data; they require quite a bit of very dynamic state to be managed.
replies(1): >>44062041 #
lubesGordi No.44062041
That codecs require dynamic state is not obvious to me. At the end of the day you have a fixed number of pixels on the screen. If every single pixel changes from frame to frame, that should constitute the most work your codec has to do, no? I'm not a codec writer, but that's my intuition based on the assumption that codecs are basically designed to minimize the amount of 'work' being done from frame to frame.
replies(5): >>44062055 #>>44062122 #>>44062124 #>>44062182 #>>44063139 #
dylan604 No.44062122{3}
Maybe you're not familiar with how long-GOP encoding works with I-, P-, and B-frames? If all frames were I-frames, what you're thinking might work: everything you need to describe every single pixel is in the one frame. Once you start using P-frames, you have to hold on to data from the I-frame to decode the P-frame. With B-frames, you might need data from frames that come later in display order (and are therefore decoded out of order), as they are bi-directional references.
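The dependency structure described above can be sketched as a toy decoder (illustrative Python, not any real codec's API; frame "pixels" are collapsed to single integers):

```python
# Illustrative toy: why a long-GOP decoder must retain reference frames.
# Each frame is modeled as one integer; prediction is arithmetic on references.

def decode_stream(coded_frames):
    """coded_frames arrive in decode order as (type, payload, ref_indices)."""
    decoded = []  # every decoded frame must stay available as a potential reference
    for ftype, payload, refs in coded_frames:
        if ftype == "I":
            frame = payload                        # self-contained intra frame
        elif ftype == "P":
            frame = decoded[refs[0]] + payload     # delta against one past reference
        elif ftype == "B":
            past, future = decoded[refs[0]], decoded[refs[1]]
            frame = (past + future) // 2 + payload  # bi-directional prediction
        decoded.append(frame)
    return decoded

# Decode order I, P, B: the B-frame references the P-frame, which is later in
# display order, so the P-frame must be transmitted and decoded first.
stream = [("I", 100, []), ("P", 4, [0]), ("B", 1, [0, 1])]
print(decode_stream(stream))  # -> [100, 104, 103]
```

The point is the `decoded` list: the I- and P-frames cannot be freed until every B-frame that references them has been decoded.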
replies(1): >>44063338 #
lubesGordi No.44063338{4}
Still, you don't necessarily need dynamic memory allocations if the number of deltas is bounded. In some codecs I could definitely see those varying in size depending on how much change is going on in the scene.

I'm not a codec developer; I'm only coming at this from an outside, intuitive perspective. Generally, performance-concerned parties want to minimize heap allocations, so I'm interested in how that applies to codec architecture. Codecs seem so complex to me, with so much inscrutable shit going on, but then heap allocations aren't optimized out? There has to be a very good reason for this.
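The preallocation intuition can be sketched: if the stream's parameters bound resolution and reference-frame count up front, a decoder can grab one arena at stream open and never touch the heap per frame (hypothetical sizing, not any real codec's memory layout):

```python
# Sketch: worst-case preallocation at stream open, assuming the bitstream's
# parameter set bounds resolution and reference count (hypothetical layout,
# not how dav1d or any real decoder actually arranges memory).

def preallocate(width, height, max_ref_frames, bytes_per_pixel=1.5):
    """One flat arena sized for the worst case (4:2:0 is ~1.5 bytes/pixel)."""
    frame_bytes = int(width * height * bytes_per_pixel)
    n_buffers = max_ref_frames + 1   # held references + the frame being decoded
    return bytearray(frame_bytes * n_buffers)

arena = preallocate(1920, 1080, max_ref_frames=4)
print(len(arena))  # -> 15552000 bytes, allocated once for the whole stream
```

Per-frame work then becomes index arithmetic into `arena` rather than allocation, which is exactly the shape a fixed-memory hardware decoder needs.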

replies(2): >>44067703 #>>44067947 #
izacus No.44067947{5}
You're actually right about allocation: most video codecs are designed with hardware decoders in mind, which have fixed memory sizes. This is why their profiles and levels hard-limit the memory needed for decode: resolution, number of reference frames, etc.

That's not quite the case for encoding; that's where things get murky, since you have much more freedom in what you can do to compress better.