
261 points | tosh | 1 comment
cogman10 · No.42068537
This is such a weird way to do things.

Here they have a nicely compressed stream of video data, so they take that stream and... decode it. But instead of processing the decoded data where it's decoded, they forward it, uncompressed(!!), to a different location for processing. Surprise, surprise: moving uncompressed video data from one location to another turns out to be expensive. So they compress it again later (don't worry, using a GPU!).
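
For a rough sense of scale, assuming nothing fancier than 1080p at 30 fps in 24-bit RGB:

    1920 × 1080 × 3 bytes ≈ 6.2 MB per frame
    6.2 MB × 30 fps ≈ 186 MB/s ≈ 1.5 Gbit/s per stream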

At so many levels this is just WTF. Why not forward the compressed video stream? Why not decompress it where you're processing it instead of in the browser? Why write it out without any attempt at compression? Even if you want lossless compression, there are well-known, fast codecs like FFV1 for exactly that purpose.
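
Even a naive "compress before shipping" sketch like the following would cut the volume enormously. This is just an illustration, not their code: it assumes a Node-side process that already has raw RGB frames in hand, and the resolution, pixel format, and output path are placeholders.

    // Sketch: pipe raw RGB frames through ffmpeg's lossless FFV1 encoder
    // instead of forwarding raw bitmaps. Frame geometry here is an assumption.
    import { spawn } from "node:child_process";

    const ff = spawn("ffmpeg", [
      "-f", "rawvideo",               // input is headerless raw frames
      "-pix_fmt", "rgb24",
      "-video_size", "1920x1080",
      "-framerate", "30",
      "-i", "pipe:0",                 // read frames from stdin
      "-c:v", "ffv1", "-level", "3",  // lossless FFV1 (version 3)
      "out.mkv",
    ]);

    function writeFrame(frame: Buffer): void {
      ff.stdin?.write(frame);         // one 1920*1080*3-byte buffer per frame
    }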

Just weird.

replies(4): >>42068622, >>42068813, >>42069256, >>42070259
bri3d · No.42070259
I think the issue with compression is that they're scraping the online meeting services rather than actually reverse engineering them, so the compressed video stream is hidden inside some kind of black box.

I'm pretty sure that feeding the browser an emulated hardware decoder (i.e., writing a VAAPI module that just copies the compressed frame data out for you) would be a good semi-universal solution here, since I don't think most video chat services use DRM like Widevine. But it's not as universal as dumping the framebuffer output off a browser session.

They could also, of course, reverse engineer each meeting service one by one to get at the backing stream.

What's odd to me is that, even with this framebuffer approach, why would you not just recompress the video at the edge? You could even do it in JavaScript with WebCodecs if that was the layer you were living at. Even semi-expensive compression on a modern CPU is going to be way cheaper than copying raw video frames, even just in terms of CPU instruction throughput vs. memory bandwidth with shared memory.
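
For what it's worth, a minimal WebCodecs sketch of "recompress at the edge" looks something like this. It's not their code: the codec choice, dimensions, and where `rgbaBuffer` comes from (presumably the dumped framebuffer, in their setup) are all assumptions.

    // Sketch: re-encode raw frames with WebCodecs instead of shipping bitmaps.
    const encoder = new VideoEncoder({
      output: (chunk: EncodedVideoChunk) => {
        // ship `chunk` (a few KB) instead of a ~8 MB raw RGBA frame
      },
      error: (e) => console.error(e),
    });

    encoder.configure({
      codec: "vp8",            // cheap and widely supported; could also be AVC/AV1
      width: 1920,
      height: 1080,
      bitrate: 2_000_000,
      framerate: 30,
    });

    function onRawFrame(rgbaBuffer: ArrayBuffer, index: number, tsMicros: number) {
      const frame = new VideoFrame(rgbaBuffer, {
        format: "RGBA",
        codedWidth: 1920,
        codedHeight: 1080,
        timestamp: tsMicros,
      });
      encoder.encode(frame, { keyFrame: index % 150 === 0 });
      frame.close();           // release the raw buffer promptly
    }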

It's easy to cast stones, but this is a weird architecture and making this blog post about the "solution" is even stranger to me.

replies(1): >>42071222
cogman10 · No.42071222
> I think the issue with compression is that they're scraping the online meeting services rather than actually reverse engineering them, so the compressed video stream is hidden inside some kind of black box.

I mean, I would presume the entire reason they forked Chrome was to crowbar open the black box and get at the goodies. Maybe they only did it to get a framebuffer output stream they could redirect? Seems a bit much.

Their current approach is what I'd expect as a temporary solution while they reverse engineer the streams (or even strike partnerships with the likes of MS and others; MS in particular would likely jump at an opportunity to AI something).

> What's odd to me is that, even with this framebuffer approach, why would you not just recompress the video at the edge? You could even do it in JavaScript with WebCodecs if that was the layer you were living at. Even semi-expensive compression on a modern CPU is going to be way cheaper than copying raw video frames, even just in terms of CPU instruction throughput vs. memory bandwidth with shared memory.

Yeah, that was my last point. Even if I grant that this really is the best way to do things, I can't for the life of me understand why they wouldn't immediately recompress. Video takes such a huge amount of bandwidth that it's just silly to send around raw bitmaps.
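
(A bog-standard meeting-quality stream at, say, 3–5 Mbit/s is something like 300–500× smaller than the ~1.5 Gbit/s that a raw 1080p30 feed works out to.)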

> It's easy to cast stones, but this is a weird architecture and making this blog post about the "solution" is even stranger to me.

Agreed. Sounds like a company with multiple million-dollar savings just lying around.