If you are doing something like a GIF or an MJPEG, sure. If you are doing forward and backward keyframes with a variable number of delta frames in between, with motion estimation, with grain generation, you start having a very dynamic amount of state. Granted, some of that complexity lives in the encoder rather than the decoder. But you might still need to decode anywhere from 1 to N frames to get the frame you want, and you don't know how much memory the result will consume unless you decode into raw bitmaps (a 4K frame is over 8 MB even at one byte per pixel, and roughly 33 MB as RGBA, so memory runs out very quickly if you want any sort of frame buffer present).
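To make the memory figures concrete, here is a back-of-the-envelope sketch (my own illustration, not from any particular decoder) of what uncompressed 4K frames cost in a few common pixel formats, and what a modest reference/reorder buffer adds up to:

```python
def frame_bytes(width: int, height: int, bytes_per_pixel: float) -> int:
    """Size in bytes of one uncompressed bitmap frame."""
    return int(width * height * bytes_per_pixel)

# 4K UHD is 3840 x 2160 = ~8.3 million pixels.
W, H = 3840, 2160

rgba = frame_bytes(W, H, 4)       # 8 bits per channel, 4 channels
gray = frame_bytes(W, H, 1)       # 1 byte per pixel
yuv420 = frame_bytes(W, H, 1.5)   # planar YUV 4:2:0, common decoder output

print(f"RGBA:     {rgba / 1e6:.1f} MB")    # ~33.2 MB per frame
print(f"Gray:     {gray / 1e6:.1f} MB")    # ~8.3 MB per frame
print(f"YUV 4:2:0: {yuv420 / 1e6:.1f} MB") # ~12.4 MB per frame

# Even a small 16-frame reorder buffer at RGBA:
print(f"16-frame RGBA buffer: {16 * rgba / 1e6:.0f} MB")  # ~531 MB
```

Which is why decoders typically work in planar YUV and keep the reference frame count bounded by the codec's level limits rather than buffering bitmaps freely.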
I suspect the future of video compression will also include frame generation, like what is currently being done for video games. Essentially you have, say, 12 fps video, and your video card fills in the intermediate frames via what is basically generative AI, so you get 120 fps output with smooth motion. I imagine that will never be something WUFFS is well suited for.
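As a toy illustration of the data flow involved (my own sketch; real frame generation such as DLSS Frame Generation uses motion vectors and neural networks, not a plain blend), interpolation takes N decoded frames in and emits k·N frames out, synthesizing the frames in between:

```python
def blend(frame_a: list[int], frame_b: list[int], t: float) -> list[int]:
    """Naive per-pixel linear interpolation between two frames, t in [0, 1].

    Real interpolators estimate motion instead of blending in place,
    which is what avoids ghosting on moving objects.
    """
    return [round(a * (1 - t) + b * t) for a, b in zip(frame_a, frame_b)]

def interpolate(frames: list[list[int]], factor: int) -> list[list[int]]:
    """Insert factor - 1 synthesized frames between each source pair,
    e.g. 12 fps -> 120 fps with factor=10."""
    out: list[list[int]] = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        for i in range(1, factor):
            out.append(blend(a, b, i / factor))
    out.append(frames[-1])
    return out

# Two tiny 4-pixel "frames": black, then mid-gray.
src = [[0, 0, 0, 0], [100, 100, 100, 100]]
result = interpolate(src, 10)
print(len(result))   # 11 frames: the original 2 plus 9 in between
print(result[5])     # the halfway blend: [50, 50, 50, 50]
```

Note the relevance to the state argument above: even this naive version must hold at least two decoded frames in memory at once, and motion-based or generative approaches hold more, so the dynamic-state problem only gets worse.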