WebSockets cost us $1M on our AWS bill

1. cosmotic [06 Nov 24 19:25 UTC] No.42067844

>>42067275 (OP)

Why decode it only to turn around and re-encode it?

2. ketzo [06 Nov 24 19:36 UTC] No.42068029

>>42067844 (TP)

I had the same question, but I imagine the "media pipeline" box, with a line going directly from "compositor" to "encoder", is probably hiding quite a lot of complexity.

Recall's offering allows you to get "audio, video, transcripts, and metadata" from video calls -- again, total conjecture, but I imagine they do need to decode into raw format in order to split out all these end-products (and then re-encode for a video recording specifically.)
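
As a rough illustration of why a compositor needs decoded frames: placing each participant into a shared output canvas happens on raw pixels, so every incoming stream has to be decoded first. Below is a minimal sketch of one such step, a gallery-style grid layout, assuming equal-sized tiles for every participant. The function name `grid_layout` and the layout policy are hypothetical and are not taken from Recall's pipeline.

```python
import math
from typing import List, Tuple

# (x, y, width, height) of one participant's tile in the output frame, in pixels
Rect = Tuple[int, int, int, int]


def grid_layout(n_participants: int, out_w: int, out_h: int) -> List[Rect]:
    """Hypothetical gallery layout: a near-square grid of equal tiles, filled row by row."""
    if n_participants <= 0:
        return []
    cols = math.ceil(math.sqrt(n_participants))
    rows = math.ceil(n_participants / cols)
    tile_w, tile_h = out_w // cols, out_h // rows
    tiles = []
    for i in range(n_participants):
        row, col = divmod(i, cols)
        tiles.append((col * tile_w, row * tile_h, tile_w, tile_h))
    return tiles


# Five participants on a 1920x1080 canvas -> a 3x2 grid of 640x540 tiles
for rect in grid_layout(5, 1920, 1080):
    print(rect)
```

A real compositor would also need to scale each decoded frame into its tile, preserve aspect ratios, and redo the layout whenever someone joins, leaves, or starts screen sharing.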

3. pavlov [06 Nov 24 19:42 UTC] No.42068118

>>42067844 (TP)

Reading their product page, it seems like Recall captures meetings on whatever platform their customers are using: Zoom, Teams, Google Meet, etc.

Since they don't have API access to all these platforms, the best they can do to capture the A/V streams is simply to join the meeting in a headless browser on a server, then capture the browser's output and re-encode it.

4. Szpadel [06 Nov 24 19:45 UTC] No.42068185

>>42067844 (TP)

My guess is either that the video they get uses some proprietary encoding format (JS might do some magic on the feed), or that it's a latency-optimized stream that consumes a lot of bandwidth.
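
For a sense of scale on the bandwidth point: once video is decoded, moving the uncompressed frames around costs width × height × bytes per pixel × frame rate. Below is a back-of-the-envelope sketch assuming a 1080p stream at 30 fps. The resolution, frame rate, and pixel formats are illustrative assumptions, not figures from the thread.

```python
def raw_video_bytes_per_second(width: int, height: int, fps: float, bytes_per_pixel: float) -> float:
    """Data rate of uncompressed frames: width * height * bytes_per_pixel * fps."""
    return width * height * bytes_per_pixel * fps


# Assumed pixel formats: YUV 4:2:0 averages 1.5 bytes per pixel, RGBA uses 4.
yuv420 = raw_video_bytes_per_second(1920, 1080, 30, 1.5)
rgba = raw_video_bytes_per_second(1920, 1080, 30, 4)

print(f"1080p30 YUV 4:2:0: {yuv420 / 1e6:.1f} MB/s")  # about 93.3 MB/s
print(f"1080p30 RGBA: {rgba / 1e6:.1f} MB/s")  # about 248.8 MB/s
```

Either figure is far above the bitrate of a typical compressed video stream. That gap is why moving raw frames around, whether over a network or between processes, gets expensive quickly.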

5. MrBuddyCasino [06 Nov 24 20:18 UTC] No.42068689

>>42068118

They’re already hacking Chromium. If the compressed video data is unavailable in JS, they could change that instead.

6. moogly [06 Nov 24 20:28 UTC] No.42068828

>>42068689

They did what every other startup does: put the PoC in production.

7. pavlov [06 Nov 24 21:17 UTC] No.42069540

>>42068689

If you want to support every meeting platform, you can’t really make any assumptions about the data format.

To my knowledge, Zoom’s web client uses a custom codec delivered inside a WASM blob. How would you capture that video data to forward it to your recording system? How do you decode it later?

Even if the incoming streams are in a standard format, compositing the meeting as a post-processing operation from raw recorded tracks isn’t simple. Video call participants have gaps, network issues, and layer changes; you can’t assume much of anything about the samples the way you could with typical video files. (Coincidentally, this is exactly what I’m working on right now at my job.)
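
One simple way to cope with gaps like these is to resample every participant onto a fixed output clock before compositing: hold each participant's last frame for a short while, then switch to a placeholder. Below is a minimal sketch of that sample-and-hold idea, assuming timestamps are already aligned to a shared clock. `frames_at`, `max_hold`, and the use of `None` to mark a gap are hypothetical choices for illustration, not what pavlov's or Recall's systems actually do.

```python
from bisect import bisect_right
from typing import Dict, List, Optional


def frames_at(
    tracks: Dict[str, List[float]],
    duration: float,
    fps: float,
    max_hold: float,
) -> List[Dict[str, Optional[float]]]:
    """Resample irregular per-participant tracks onto a fixed output clock.

    `tracks` maps a participant to the sorted timestamps (seconds) of the frames
    actually received. For each output tick we pick that participant's most
    recent frame; a frame is repeated for at most `max_hold` seconds, after
    which the participant is treated as being in a gap (None) so the compositor
    can draw a placeholder instead of a frozen image.
    """
    n_ticks = int(duration * fps)
    output = []
    for i in range(n_ticks):
        t = i / fps
        row: Dict[str, Optional[float]] = {}
        for name, stamps in tracks.items():
            idx = bisect_right(stamps, t) - 1  # index of last frame at or before t
            if idx >= 0 and t - stamps[idx] <= max_hold:
                row[name] = stamps[idx]
            else:
                row[name] = None
        output.append(row)
    return output


# Alice's track stops early; Bob starts late and has a long gap in the middle.
tracks = {"alice": [0.0, 0.1, 0.2, 0.3], "bob": [0.05, 0.6]}
for i, row in enumerate(frames_at(tracks, duration=1.0, fps=10, max_hold=0.22)):
    print(f"tick {i}: {row}")
```

This deliberately ignores the harder parts mentioned in the comment, such as layer (resolution) changes and clock drift between participants.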

8. cosmotic [07 Nov 24 03:04 UTC] No.42072925

>>42069540

At some point, I'd hope the output of Zoom's code quickly becomes something that can be hardware-decoded. Otherwise CPU usage, battery consumption, and energy usage are going to be through the roof.

9. pavlov [07 Nov 24 08:51 UTC] No.42074896

>>42072925

The most common video conferencing codec on WebRTC is VP8, which is not hardware-decoded almost anywhere either. Zoom’s own codec must be an efficiency improvement over VP8, which is best described as patent-free leftovers from the back of the fridge.

Hardware decoding works best when you have a single, stable, high-bitrate stream with predictable keyframes: something like a 4K video player.

Video meetings are not like that. You may have a dozen participant streams, and most of them are suffering from packet loss. Lots of decoder context switching and messy samples are not ideal for typical hardware decoders.
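
To put a rough number on the context-switching point: if a single decoder had to handle every participant's frames in arrival order, each change of stream would mean swapping decoder state. Below is a toy count, assuming 12 participants at 30 fps, evenly interleaved arrival, and no packet loss. The model and the function names are hypothetical, and real clients may run a separate decoder instance per stream instead.

```python
from typing import List, Tuple


def arrival_order(n_streams: int, fps: float, seconds: float) -> List[Tuple[float, int]]:
    """(timestamp, stream_id) for every frame, merged in the order one decoder would see them.

    Each stream runs at the same frame rate; streams are staggered evenly within
    one frame interval, and no packets are lost.
    """
    frames = []
    for s in range(n_streams):
        offset = s / (fps * n_streams)  # stagger so arrivals interleave
        for i in range(int(seconds * fps)):
            frames.append((i / fps + offset, s))
    frames.sort()
    return frames


def context_switches(frames: List[Tuple[float, int]]) -> int:
    """Count how often consecutive frames belong to different streams."""
    return sum(1 for a, b in zip(frames, frames[1:]) if a[1] != b[1])


print(context_switches(arrival_order(1, 30, 1.0)))   # 0: one stream, no switching
print(context_switches(arrival_order(12, 30, 1.0)))  # 359: every frame changes stream
```

Packet loss would make this worse in practice, because damaged streams also need recovery, for example waiting for or requesting a new keyframe. This toy model leaves that out.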

10. MrBuddyCasino [07 Nov 24 14:55 UTC] No.42077155

>>42074896

This makes sense. I find it curious that a WASM codec could be competitive with something that is presumably decoded natively. I know Teams is a CPU hog, but I don't remember Zoom being one.