362 points tosh | 6 comments
1. h4ck_th3_pl4n3t ◴[] No.42073457[source]
The problem is not at the network level.

The problem is that the developers behind this way of streaming video data seem to have no idea how video codecs work.

If they are in control of the headless Chromium instances, the video streams, and the receiving backend for that video stream... why not simply use RDP or a similar video streaming protocol that was made for exactly this purpose?

This whole post reads like an article from a web dev who is in way over their head, trying to implement something they didn't take the time to think through. It worries about TCP fragmentation when that is not even an issue, and it reaches for a TCP stream when that is literally the worst thing you can do in this situation because of round-trip costs.

But I guess there is no JS API for that, so it's outside the development scope? I can't imagine any reason not to use a much more efficient video codec here, other than this running in node.js and therefore potentially missing offscreen canvas/buffer APIs and the C encoding libraries you could otherwise use for this.
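For what it's worth, recent Chromium does expose an encoder to JS via WebCodecs. A minimal sketch, assuming the capture side already draws frames to a canvas and has a websocket open (the canvas and ws variables below are hypothetical names):

    // Encode canvas frames with WebCodecs before they ever hit the wire.
    const encoder = new VideoEncoder({
      output: (chunk) => {
        // chunk is a compressed EncodedVideoChunk, not raw pixels
        const buf = new ArrayBuffer(chunk.byteLength);
        chunk.copyTo(buf);
        ws.send(buf);
      },
      error: (e) => console.error(e),
    });
    encoder.configure({ codec: 'vp8', width: canvas.width, height: canvas.height, bitrate: 1_000_000 });

    let ts = 0;
    setInterval(() => {
      const frame = new VideoFrame(canvas, { timestamp: ts });
      encoder.encode(frame);
      frame.close();
      ts += 33_000; // timestamps are in microseconds; ~30 fps
    }, 33);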

I would not want to work at this company if this is how they develop software. It must be horribly rushed prototype code, everywhere.

replies(2): >>42073690 #>>42074152 #
2. doctorpangloss ◴[] No.42073690[source]
It’s alright.

It is difficult to say; I've never used the product. They don't describe what it is they are trying to do.

If you want to pipe a Zoom call to a Python process, it's complicated.

For everything else that uses WebRTC, I suppose the Python process should generate the ICE candidates, and the fake browser client hands over the Python process's candidates instead of its own. It could use the most basic bindings to libwebrtc.

If the bulk of their app is JavaScript, they ought to inject a web worker and use encoded transforms.
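A rough sketch of that idea, assuming Chromium's RTCRtpScriptTransform API is available and that pc is the injected script's handle on the page's peer connection (hypothetical name):

    // main script: route the receiver's encoded frames through a worker
    const worker = new Worker('tap-worker.js');
    const receiver = pc.getReceivers().find((r) => r.track.kind === 'video');
    receiver.transform = new RTCRtpScriptTransform(worker, { name: 'tap' });

    // tap-worker.js: frames arrive already compressed (VP8/H.264 payloads)
    onrtctransform = (event) => {
      const { readable, writable } = event.transformer;
      readable
        .pipeThrough(new TransformStream({
          transform(frame, controller) {
            // frame.data is the encoded payload; copy it out here if needed
            controller.enqueue(frame); // pass through so playback keeps working
          },
        }))
        .pipeTo(writable);
    };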

But I don't know.

3. dmazzoni ◴[] No.42074152[source]
Their business is joining meetings from 7 different platforms (Zoom, Meet, WebEx, etc.) and capturing the video.

They don't have control of the incoming video format.

They don't even have access to the incoming video data, because they're not using an API. They're joining the meeting using a real browser, and capturing the video.

Is it an ugly hack? Maybe. But it's also a pretty robust one, because they're not dependent on an API that might break or reverse-engineering a protocol that might change. They're a bit dependent on the frontend, but that changes rarely and it's super easy to adapt when it does change.

replies(2): >>42074757 #>>42074913 #
4. lostmsu ◴[] No.42074757[source]
Even in this case it is nonsensical. Dunno about Linux, but on Windows you'd just feed the window's GPU surface into a hardware video encoder via a shared texture, with basically zero data transfer, and get a compressed stream out.
5. h4ck_th3_pl4n3t ◴[] No.42074913[source]
I'm not sure you understood what I meant.

They are in control of the bot server that joins with the headless Chrome client. They can use CDP's screencast API to write the recorded video stream to disk on that server, and then they can literally just run ffmpeg on that file and stream it somewhere else.
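Something along these lines could work (a sketch only: it assumes puppeteer, the meeting URL is a placeholder, and real code would pace frames using the timestamps in the screencast metadata):

    // Pull CDP screencast frames and pipe them straight into ffmpeg.
    import puppeteer from 'puppeteer';
    import { spawn } from 'node:child_process';

    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://example.com/meeting'); // placeholder URL

    // ffmpeg decodes JPEG frames from stdin and writes an H.264 file
    const ffmpeg = spawn('ffmpeg',
      ['-f', 'image2pipe', '-c:v', 'mjpeg', '-i', 'pipe:0', '-c:v', 'libx264', 'out.mp4']);

    const client = await page.createCDPSession();
    client.on('Page.screencastFrame', async ({ data, sessionId }) => {
      ffmpeg.stdin.write(Buffer.from(data, 'base64')); // frames arrive base64-encoded
      await client.send('Page.screencastFrameAck', { sessionId }); // ack or the stream stalls
    });
    await client.send('Page.startScreencast', { format: 'jpeg', quality: 80 });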

But instead they decided to use websockets to send it from that bot client to their own backend API, transmitting the raw pixels of every frame as either a raw blob or base64-encoded data, with no video encoding at all. And that is where the huge waste of bandwidth comes from.
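To put rough numbers on it (assuming 1080p): one RGBA frame is 1920 x 1080 x 4 bytes, about 8.3 MB, so 30 fps comes to roughly 249 MB/s raw, and base64 inflates that by another 4/3 to about 332 MB/s. A VP8 or H.264 stream of the same content would typically run in the low single-digit Mbit/s.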

(The article hints at this in a lot of places.)

replies(1): >>42081711 #
6. yencabulator ◴[] No.42081711{3}[source]
They are doing things like live transcription as the stream happens, not writing a file and batch processing it later.