
305 points todsacerdoti | 1 comment
tialaramex ◴[] No.44061686[source]
All else being equal, codecs ought to be in WUFFS† rather than Rust, but I can well imagine that taking something as complicated as dav1d and writing the analogous WUFFS is a much bigger lift than cleaning up the c2rust translation; if you said it was a thousand times harder I'd have no trouble believing you. I just think it's worth it for us as a civilisation.

† Or an equivalent special purpose language, but WUFFS is right there

replies(1): >>44061961 #
IgorPartola ◴[] No.44061961[source]
WUFFS would be great for parsing container files (Matroska, WebM, MP4), but it does not seem at all suitable for a video decoder. Without dynamic memory allocation it would be challenging to deal with dynamic data. Video codecs are not simply parsing a file to get the data; they require quite a bit of very dynamic state to be managed.
replies(1): >>44062041 #
lubesGordi ◴[] No.44062041[source]
That codecs require dynamic state isn't obvious to me. At the end of the day you have a fixed number of pixels on the screen. If every single pixel changes from frame to frame, that should constitute the most work your codec has to do, no? I'm not a codec writer, but that's my intuition based on the assumption that codecs are basically designed to minimize the amount of 'work' being done from frame to frame.
replies(5): >>44062055 #>>44062122 #>>44062124 #>>44062182 #>>44063139 #
IgorPartola ◴[] No.44062182{3}[source]
If you are doing something like a GIF or an MJPEG, sure. But if you are doing forward and backward keyframes with a variable number of deltas in between, with motion estimation, with grain generation, you start having a very dynamic amount of state. Granted, encoders are more complex than decoders in some of this. But you still might need to decode between 1 and N frames to get the frame you want, and you don't know how much memory that will consume unless you decode into raw bitmaps (at 4K that's roughly 12 MB per frame for 8-bit 4:2:0, which very quickly runs out of memory if you want any sort of frame buffer).
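To make the memory argument concrete, here is a hypothetical sketch (not dav1d's actual design; the names and the 8-reference-frame figure are illustrative assumptions) of why a decoder's buffer sizes can't be fixed up front: they depend on resolution, bit depth, and reference-frame count parsed out of the bitstream at runtime.

```python
# Hypothetical sketch of a decoder's reference-frame pool. Buffer sizes
# depend on values parsed from the bitstream (resolution, bit depth,
# chroma subsampling, number of reference frames), so they cannot be
# fixed at compile time -- the core of the dynamic-allocation problem.

def frame_size_bytes(width, height, bit_depth=8, chroma="4:2:0"):
    """Bytes needed to hold one decoded frame in planar YUV."""
    bytes_per_sample = 1 if bit_depth == 8 else 2
    luma = width * height * bytes_per_sample
    # Chroma planes relative to luma: 4:2:0 adds half, 4:4:4 doubles.
    chroma_factor = {"4:2:0": 0.5, "4:2:2": 1.0, "4:4:4": 2.0}[chroma]
    return int(luma * (1 + chroma_factor))

class FramePool:
    """Reference frames the decoder must keep alive for prediction."""
    def __init__(self, width, height, num_refs, bit_depth=8):
        self.size = frame_size_bytes(width, height, bit_depth)
        self.frames = [bytearray(self.size) for _ in range(num_refs)]

# 4K, 8-bit 4:2:0: ~12.4 MB per frame; a pool of 8 references is ~100 MB.
pool = FramePool(3840, 2160, num_refs=8)
print(pool.size)  # 12441600 bytes per frame
```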

I suspect the future of video compression will also include frame generation, like what is currently being done for video games. Essentially you have, let's say, 12 fps video, but your video card fills in the intermediate frames via what is basically generative AI, so you get 120 fps output with smooth motion. I imagine that will never be something WUFFS is best suited for.

replies(3): >>44062920 #>>44063296 #>>44063827 #
GuB-42 ◴[] No.44063827{4}[source]
> I suspect the future of video compression will also include frame generation

That's how most video codecs work already. They try to "guess" what the next frame will be, based on past frames (for P-frames) and future frames (for B-frames). The difference is that the codec also encodes metadata to guide the prediction, plus the residual: the difference between the predicted frame and the real frame.
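The predict-then-correct idea above can be shown in a toy sketch (this is not any real codec's algorithm; block size, motion vector, and values are made up for illustration): the bitstream carries a motion vector plus a residual, and the decoder reconstructs the block exactly by repeating the same prediction.

```python
# Toy illustration of P-frame motion-compensated prediction: the
# decoder "guesses" a block by copying a motion-shifted block from a
# reference frame, then adds the transmitted residual (actual minus
# predicted) to reconstruct the real block exactly.
import numpy as np

def predict_block(reference, x, y, mv, size=4):
    """Fetch the block the motion vector points at in the reference frame."""
    dx, dy = mv
    return reference[y + dy : y + dy + size, x + dx : x + dx + size]

rng = np.random.default_rng(0)
reference = rng.integers(0, 256, (16, 16), dtype=np.int16)

# Pretend the "real" current block is the reference block shifted by
# (2, 1), plus a small change the prediction cannot capture.
mv = (2, 1)
actual = predict_block(reference, 4, 4, mv) + 3

# Encoder side: transmit mv and the residual (small values, cheap to code).
predicted = predict_block(reference, 4, 4, mv)
residual = actual - predicted

# Decoder side: same prediction + residual gives exact reconstruction.
reconstructed = predict_block(reference, 4, 4, mv) + residual
assert np.array_equal(reconstructed, actual)
```

Real codecs then transform, quantize, and entropy-code that residual, which is where the lossiness and most of the compression come from.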

As for using AI techniques to improve prediction, that is not new at all. Many algorithms optimized for compression ratio use neural nets, but these tend to be too computationally expensive for general use. In fact, the Hutter Prize treats text compression as an AI/AGI problem.