
548 points CharlesW | 3 comments
crazygringo ◴[] No.46155545[source]
Wow. To me, the big news here is that ~30% of devices now support AV1 hardware decoding. The article lists a bunch of examples of devices that have gained it in the past few years. I had no idea it was getting that popular -- fantastic news!

So now that h.264, h.265, and AV1 seem to be the three major codecs with hardware support, I wonder what will be the next one?

replies(10): >>46155569 #>>46155586 #>>46155655 #>>46155703 #>>46155917 #>>46156224 #>>46157698 #>>46158084 #>>46159331 #>>46176563 #
dehrmann ◴[] No.46155655[source]
Not trolling, but I'd bet something that's augmented with generative AI. Not to the level of describing scenes with words, but context-aware interpolation.
replies(5): >>46155704 #>>46155769 #>>46158043 #>>46159342 #>>46161542 #
mort96 ◴[] No.46159342[source]
I don't want my video decoder inventing details which aren't there. I'd much rather have obvious compression artifacts than a codec whose "compression artifacts" look like perfectly realistic, high-quality hallucinated details.
replies(2): >>46160068 #>>46169643 #
cubefox ◴[] No.46160068[source]
In the case of many textures (grass, sand, hair, skin, etc.) it makes little difference whether the high-frequency details are reproduced exactly or hallucinated. E.g. it doesn't matter whether the 1262nd blade of grass from the left side is bending to the left or to the right.
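As a toy illustration (just numpy, not any real codec): keep a texture's low-frequency structure exact, and replace its high-frequency band with detail that has the right magnitude spectrum but invented phase. Pixel for pixel the fine detail is "wrong", yet statistically it's the same grass.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 256

    # Radial spatial-frequency magnitude for an n x n FFT grid.
    f = np.fft.fftfreq(n)
    fx, fy = np.meshgrid(f, f)
    r = np.hypot(fx, fy)

    # Toy "grass-like" texture: 1/f-ish noise standing in for a real photo.
    spec = np.fft.fft2(rng.standard_normal((n, n))) / np.maximum(r, 1.0 / n)
    texture = np.real(np.fft.ifft2(spec))

    # Split into exact low frequencies and "detail" high frequencies.
    cutoff = 0.05
    F = np.fft.fft2(texture)
    low = np.where(r <= cutoff, F, 0)
    high = np.where(r > cutoff, F, 0)

    # Hallucinate the detail: keep the high band's magnitudes, but borrow the
    # phase from an unrelated noise field, so the fine structure is invented.
    noise_spec = np.fft.fft2(rng.standard_normal((n, n)))
    fake_high = np.abs(high) * noise_spec / np.maximum(np.abs(noise_spec), 1e-12)
    hallucinated = np.real(np.fft.ifft2(low + fake_high))

    print("max per-pixel difference:", np.abs(texture - hallucinated).max())
    print("std original vs hallucinated:", texture.std(), hallucinated.std())

The per-pixel difference is large, but the spectra match, which is the sense in which hallucinated texture detail "makes little difference".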
replies(1): >>46160775 #
mort96 ◴[] No.46160775[source]
And in the case of many others, it makes a very significant difference. And a codec doesn't have enough information to know.

Imagine a criminal investigation. A witness happened to take a video as the perpetrator committed the crime. In the video, you can clearly see a recognizable detail on the perpetrator's body in high quality, a birthmark perhaps. This rules out the main suspect -- but can we trust that the birthmark actually exists and isn't hallucinated? Would a non-AI codec have just shown a clearly compression-artifact-looking blob of pixels which can't be determined one way or the other? Or would a non-AI codec have contained actual image data of the birthmark in sufficient detail?

Using AI to introduce realistic-looking details where there were none before (which is what your proposed AI codec inherently does) should never happen automatically.

replies(4): >>46161601 #>>46161613 #>>46162169 #>>46162464 #
1. mapt ◴[] No.46162169{3}[source]
> a codec doesn't have enough information to know.

The material belief is that modern trained neural network methods, which improve on ten generations of variations of the discrete cosine transform and wavelets, can bring a codec from "1% of knowing" to "5% of knowing". This is broadly useful. The level of abstraction does not need to be "The AI told the decoder to put a finger here"; it may be "The AI told the decoder how to terminate the wrinkle on a finger here". An AI detail overlay. As we go from 1080p to 4K to 8K and beyond, we care less and less about individual small-scale details being 100% correct, and there are representative elements that existing techniques are just really bad at squeezing into higher compression ratios.

I don't claim that it's ideal, and the initial results left a lot to be desired in gaming (where latency and prediction are a Hard Problem), but AI upscaling is already routinely used for scene rips of older videos (from the VHS Age or the DVD Age), and it's clearly going to happen inside a codec sooner or later.
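One way to read "AI detail overlay" (a hypothetical PyTorch sketch, not how any shipping codec works): the bitstream carries a conventionally coded base layer, and the decoder runs a small network that adds predicted high-frequency detail on top of the upsampled base.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DetailOverlay(nn.Module):
        """Toy 'AI detail overlay': a conventional upsample of the base layer,
        plus a small network that predicts extra detail to add on top.
        Illustrative only -- untrained weights, no bitstream, no guidance bits."""

        def __init__(self, channels=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, 3, 3, padding=1),
            )

        def forward(self, base_frame, scale=2):
            # Conventional part: upsample the base layer decoded by a normal codec.
            up = F.interpolate(base_frame, scale_factor=scale, mode="bilinear",
                               align_corners=False)
            # "AI part": add predicted (i.e. plausible, not transmitted) detail.
            return up + self.net(up)

    # A small toy frame standing in for one decoded base-layer picture.
    base = torch.rand(1, 3, 270, 480)
    with torch.no_grad():
        enhanced = DetailOverlay()(base)
    print(enhanced.shape)  # torch.Size([1, 3, 540, 960])

Everything interesting is in that final addition: the detail is predicted, not transmitted.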

replies(1): >>46162335 #
2. mort96 ◴[] No.46162335[source]
I'm not saying it's not going to happen. I'm saying it's a terrible idea.

AI upscaling built into video players isn't a problem, as long as you can view the source data by disabling AI upscaling. The human is in control.

AI upscaling and detail hallucination built into video codecs is a problem.

replies(1): >>46162459 #
3. mapt ◴[] No.46162459[source]
The entire job of a codec is lossy compression that is subjectively authentic. AI is our best and in some ways easiest method of lossy compression. All lossy compression produces artifacts; JPEG macroblocks are effectively a hallucination, albeit one that is immediately identifiable because it fails to simulate anything else we're familiar with.
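To make the JPEG comparison concrete, a minimal sketch of the lossy step inside one 8x8 block (uniform quantization instead of JPEG's actual tables, and no entropy coding):

    import numpy as np
    from scipy.fft import dctn, idctn

    rng = np.random.default_rng(1)

    # One 8x8 block of "image" data (0..255), standing in for a real photo.
    block = rng.integers(0, 256, size=(8, 8)).astype(float)

    # Per-block type-II DCT, roughly as in JPEG (level-shifted, orthonormal).
    coeffs = dctn(block - 128, norm="ortho")

    # Aggressive uniform quantization (real JPEG uses a per-frequency table).
    q = 64.0
    quantized = np.round(coeffs / q) * q

    # The decoder never sees the original pixels; it reconstructs these instead.
    reconstructed = idctn(quantized, norm="ortho") + 128

    print("nonzero coefficients kept:", int(np.count_nonzero(quantized)), "of 64")
    print("max per-pixel error:", np.abs(block - reconstructed).max())

The reconstructed pixels are just as "made up" as anything a neural decoder would output; the difference is that the DCT's mistakes look like blocking and ringing rather than like plausible photographs.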

AI compression doesn't have to operate at the level of compression that exists in image-generation prompts, though. A SORA prompt might be 500 bits (~1 bit per character of natural English), while a decompressed 4K frame that you're trying to bring to a 16K level of simulated detail starts out at 199 million bits. It can be a much finer level of compression.
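For scale, the arithmetic behind those numbers (assuming 24-bit RGB, "4K" = 3840x2160 and "16K" = 15360x8640):

    prompt_bits = 500                       # ~500 chars at ~1 bit per character
    frame_4k_bits = 3840 * 2160 * 24        # 199_065_600, i.e. ~199 million bits
    frame_16k_bits = 15360 * 8640 * 24      # ~3.19 billion bits

    print(frame_4k_bits // prompt_bits)     # ~398,000x -- the prompt-level regime
    print(frame_16k_bits // frame_4k_bits)  # 16x -- the "detail overlay" regime

Generating a frame from a prompt is a roughly 400,000:1 proposition; filling in 16K detail on top of a good 4K base is only a 16:1 gap, a much finer level of compression.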