
72 points | indulona | 9 comments

I am working on a website that has video hosting capability. Users can upload video files, and I generate multiple versions at different qualities, plus audio-only versions, thumbnails, and things like that.

I have chosen the mp4 container because of how widely supported it is. To prevent users from having to fetch whole files, I use the faststart option, which writes the container's metadata at the beginning of the file instead of at the end.
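If it helps, one way I know of to sanity-check that the moov atom really did end up at the front is to grep ffprobe's trace output (a small sketch; it assumes GNU grep, and the file name is a placeholder):

ffprobe -v trace input.mp4 2>&1 | grep -m 2 -o -e "type:'moov'" -e "type:'mdat'"

If 'moov' is printed before 'mdat', faststart worked; if it comes after, players have to read to the end of the file before playback can start.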

Next, I have picked the H.264 codec because of how widely supported it is. VP8/VP9/AV1/H.265/H.266 are certainly better, but H.264 software encoding often beats hardware encoding thanks to highly optimized, time-proven code, and H.264 playback is supported on virtually all hardware. Besides, the uploaded videos are already compressed; users won't be uploading 8K raw videos where the most advanced codecs would be useful for preserving "quality".
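For what it's worth, you can check which of those encoders a given ffmpeg build actually ships before committing to one (a quick sketch; the grep pattern is just illustrative):

ffmpeg -hide_banner -encoders | grep -Ei 'h264|hevc|av1|vp9'

Names like libx264, libsvtav1 or h264_nvenc in the output tell you whether software and hardware encoders are available in that build.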

For audio, I have picked the Opus codec. It seems like good value over the others. Not much else to add.

I run ffmpeg to convert the video with a command like this:

ffmpeg -hide_banner -loglevel error -i input.mp4 -g 52 -c:v h264 -maxrate:v vbr -bufsize vbr -s WxH -c:a libopus -af "aformat=channel_layouts=7.1|5.1|stereo" -b:a abr -ar 48000 -ac 2 -f mp4 -movflags faststart -map 0:v:0 -map 0:a:0 output.mp4

where vbr is the video bitrate (e.g. 1024k, i.e. 1 Mbps), abr is the audio bitrate (e.g. 190k), and WxH is the output width x height when resizing.
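For concreteness, here's the same command with illustrative values plugged in (720p at 1024k video / 190k audio). I've also added an explicit -b:v target - that addition is my assumption, not part of the original command, since -maxrate/-bufsize alone leave libx264 at its default CRF:

ffmpeg -hide_banner -loglevel error -i input.mp4 \
  -g 52 -c:v h264 -b:v 1024k -maxrate:v 1024k -bufsize 2048k -s 1280x720 \
  -c:a libopus -af "aformat=channel_layouts=7.1|5.1|stereo" -b:a 190k -ar 48000 -ac 2 \
  -f mp4 -movflags faststart -map 0:v:0 -map 0:a:0 output.mp4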

I wonder: how do other folks handle their video encoding pipelines and generate their videos?

How did you pick your settings, and what issues have you encountered? Any tips you can share are certainly appreciated.

It's quite a niche area once you're on the operations side rather than merely a consumer/customer.

visualblind No.41056046
Video codec transcoding is very CPU resource expensive. If you do a lot of it, you should be looking into doing hardware-accelerated transcoding. https://trac.ffmpeg.org/wiki/HWAccelIntro
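For example, a minimal NVENC sketch (assuming an NVIDIA GPU and an ffmpeg build with h264_nvenc; the bitrate and preset are illustrative):

ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
  -c:v h264_nvenc -preset p5 -b:v 2M -maxrate 2M -bufsize 4M \
  -c:a copy output.mp4

Decoding and encoding both stay on the GPU here; if you need to resize, add a scale_cuda filter or drop -hwaccel_output_format cuda so the frames come back to system memory.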

My ffmpeg how-to/examples/scratchfile can be viewed here: https://paste.travisflix.com/?ce12a91f222cc3d7#BQPKtw6sEs9cE...

1. izacus No.41056080
Hardware video encoders all - even in 2024 - produce significantly worse quality at the same filesize.

They're made to be realtime, but for any kind of delayed playback where there's time to encode, software encoders win without any kind of effort. For web delivery especially, hw encoders have no business being used because quality per expended bandwidth is paramount and costs money.

2. visualblind No.41056115
I mostly agree with your statement. However, I think that for 1080p displays, hardware-accelerated transcoding produces acceptable video quality for non-4K viewing. That's just my 2 cents.
3. rahimnathwani No.41056142
IIRC both x264 and nvenc have multiple profiles for the tradeoff between quality and computing power.
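To make that concrete, here's a hedged sketch of those two knobs in ffmpeg terms (the preset names are real; the quality values are illustrative, and the two numbers aren't directly comparable between encoders):

# software: slower x264 presets spend more CPU for better compression
ffmpeg -i input.mp4 -c:v libx264 -preset veryslow -crf 23 -c:a copy sw.mp4

# hardware: p1 (fastest) .. p7 (slowest/best) on recent NVENC builds
ffmpeg -i input.mp4 -c:v h264_nvenc -preset p7 -cq 23 -c:a copy hw.mp4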

For your comparison, are you assuming that the objective is the best quality, e.g. that you'd accept 10x the computation even if it gave only a 2% quality improvement?

(I can see how this could make sense, if you're encoding a file once and it will be viewed many times. But I could imagine other situations, e.g. where most files are viewed once or never, and only a few files are very popular.)

4. izacus No.41056215
Having profiles doesn't really change the fact that even an Ampere-generation encoding block at its slowest profile won't come close to the visual quality of x264's slow+ profiles at the same output bitrate (and we're not even touching on H.265/AV1 here).

> For your comparison, are you assuming that the objective is the best quality, e.g. that you'd accept 10x the computation even if it gave only a 2% quality improvement?

The difference is more like 150% encoding time for half the file size at the same SSIM - depending on configuration and video type, of course. And that ignores the fact that a server machine with a 64-core Threadripper or equivalent can handle parallel encoding of many more videos at a massively lower dollar cost than using NVENC - especially at current GPU prices and GPU power consumption.
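If anyone wants to reproduce that kind of comparison, ffmpeg's built-in ssim filter will score an encode against its source (a sketch; the file names are placeholders, and both inputs must have the same resolution):

ffmpeg -i encoded.mp4 -i original.mp4 -lavfi "[0:v][1:v]ssim" -f null -

The average SSIM is printed to the log at the end of the run; the filter's stats_file option gives per-frame numbers if you need them.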

There's a reason why all online services encode in software (usually with x264 & co.) for the mainstream, most-used profiles (that is, SD/HD, and many also for 4K).

It just doesn't make sense from a product quality, user experience, or financial perspective. It only makes sense if you never check the results of your production.

5. izacus No.41056269
Can you explain your point there, though? It doesn't change the fact that you could deliver the same visual quality at a significantly reduced bandwidth cost FOR YOU and for the user. You're making the user experience worse (e.g. on less stable 4G/5G links and limited connections) and paying more for your egress bandwidth - for what? Using an expensive GPU block (with most of its cores idling) to encode slightly faster than a cheaper CPU core would?

It just doesn't make sense to save a few minutes of encoding time (a one time operation) when that costs you and users money every single time that file is streamed.

6. GordonS No.41056295
I found this recently too - encoding my video using either AMD or Nvidia hardware encoding resulted in poor quality. But what's the reason for this?
7. visualblind No.41056378
> Using an expensive GPU block (with most of its cores idling) to encode slightly faster than a cheaper CPU core will do?

Slightly faster? Sir, I do think you are mistaken.

There are other variables that go into it, such as choosing between constant or variable bitrate, variable or constant frame rate, and profiles, obviously. There's a crapload of other directives you can tweak as well.
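To make those variables concrete, here's a rough libx264 sketch of the common rate-control setups (the numbers are illustrative only):

# constant quality (CRF): quality stays roughly constant, file size varies
ffmpeg -i in.mp4 -c:v libx264 -crf 22 -c:a copy crf.mp4

# capped VBR: CRF quality, but never above the bitrate ceiling (handy for streaming)
ffmpeg -i in.mp4 -c:v libx264 -crf 22 -maxrate 2M -bufsize 4M -c:a copy capped.mp4

# average-bitrate encode; -r 30 also forces a constant output frame rate
ffmpeg -i in.mp4 -c:v libx264 -b:v 2M -r 30 -c:a copy abr.mp4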

I see what you mean though, and you're right: if you have a CPU with a lot of cores/threads/horsepower, then you should use that instead of hardware-accelerated transcoding. I don't.

8. amelius No.41056389
Probably because they're optimized for low-latency.

Anyway, by hardware encoders people typically mean dedicated hardware.

9. izacus No.41059846
In short - on the CPU the encoders spend more time looking for similarities between frames and have access to more frames of history to find repetitions. Finding similarities is where all the compression comes from.

The hardware encoders are designed to be fast (at least realtime), small (in component count) and power efficient so they usually don't have the ability to do extensive searches for similarities in frames to really squeeze out the best compression. Pretty much all of them are also largely fixed in supported operations during encode so you can't really implement better algorithms with software updates on them.
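As a concrete illustration of that "extensive search", these are the libx264 knobs that control how many reference frames and how wide a motion search the encoder gets to use - a hedged sketch, with values roughly in the spirit of the slower presets:

# ref/merange/subme are exactly the kind of limits fixed-function encoders bake into silicon
ffmpeg -i input.mp4 -c:v libx264 -crf 22 \
  -x264-params "ref=8:bframes=8:me=umh:merange=24:subme=10" \
  -c:a copy out.mp4

Because those limits are fixed in hardware, a driver or firmware update generally can't buy the same gains.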

Making them better would mean growing their (silicon) size, which would make the graphics cards more expensive to produce and take space away from actual GPU cores. Most customers don't buy a GeForce for its video encoder. And once you'd grown them enough, you'd get something similar to a CPU anyway.

And this is how we get back full circle - even enterprise "encoder-in-a-box" providers have lately gone from boxes with ASICs in them to essentially selling proprietary servers with normal CPUs and a proprietary OS on them. With the current prices of 64-core+ CPUs, it just doesn't make much sense to design ASICs and HW encoding blocks for these types of encodes.

Realtime encoding is, of course, another game.