
362 points by tosh | 1 comment
sfink ◴[] No.42069879[source]
...and this is why I will never start a successful business.

The initial approach was shipping raw video over a WebSocket. I could not imagine putting something like that together and selling it. When your first computer came with 64KB in your entire machine, some of which you can't use at all and some you can't use without bank switching tricks, it's really really hard to even conceive of that architecture as a possibility. It's a testament to the power of today's hardware that it worked at all.

And yet, it did work, and it served as the basis for a successful product. They presumably made money from it. The inefficiency sounds like it didn't get in the way of developing and iterating on the rest of the product.

I can't do it. Premature optimization may be the root of all evil, but I can't work without having some sense for how much data is involved and how much moving or copying is happening to it. That sense would make me immediately reject that approach. I'd go off over-architecting something else before launching, and somebody would get impatient and want their money back.

replies(2): >>42071574 #>>42074178 #
dmazzoni ◴[] No.42074178[source]
The initial approach was shipping raw video over a WebSocket...between two processes running on the same machine.

That doesn't sound like a ridiculous idea to me. How else would you get video data out of a sandboxed Chromium process?

replies(1): >>42080703 #
sfink ◴[] No.42080703[source]
Short answer: raw video is big.

With my mindset, you have a gigantic chunk of data. Especially if you're recording multiple streams per machine. The immediate thought is that you want to avoid copying as much as possible. If you really, really have to, you can copy it once. Maybe even twice, though before moving from 1 to 2 copies you should spend some time thinking about whether it's possible to move from 1 to 0, or never materializing the full data at all (i.e., keep it compressed, which could apply here but only as an optimization for certain video applications and so is irrelevant to the bootstrapping phase).
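To make "raw video is big" concrete, here is a back-of-envelope rate for one uncompressed stream. The resolution, pixel format, and frame rate are illustrative assumptions, not numbers from the thread:

```javascript
// Rough data rate of one raw video stream; all parameters are assumptions.
function rawVideoBytesPerSecond(width, height, bytesPerPixel, fps) {
  return width * height * bytesPerPixel * fps;
}

const frameBytes = 1920 * 1080 * 4;                        // one RGBA 1080p frame
const perSecond = rawVideoBytesPerSecond(1920, 1080, 4, 30);
console.log(frameBytes);   // 8294400  (~8.3 MB per frame)
console.log(perSecond);    // 248832000 (~249 MB/s per stream)
```

At that rate, every extra copy costs another ~249 MB/s of memory traffic per stream, which is why each copy is treated as precious here.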

WebSockets take your giant chunk of data and squeeze it through a straw. How many times does each byte get copied in the process? I don't know, but probably more than twice. Even worse, it's going to process it in chunks, so you're going to have per-chunk overhead (maybe including a context switch?) that is O(number of chunks in a giant data set).
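A quick count shows how that per-chunk overhead scales. The 64 KiB chunk size is an assumed value for illustration; real WebSocket stacks fragment messages differently:

```javascript
// How many chunk-sized units of overhead one second of raw video incurs
// (chunk size is an assumption; actual fragmentation varies by implementation).
const frameBytes = 1920 * 1080 * 4;                  // ~8.3 MB RGBA 1080p frame
const chunkBytes = 64 * 1024;                        // assumed 64 KiB chunks
const chunksPerFrame = Math.ceil(frameBytes / chunkBytes);
const chunksPerSecond = chunksPerFrame * 30;         // at 30 fps
console.log(chunksPerFrame);    // 127
console.log(chunksPerSecond);   // 3810
```

So even a modest per-chunk cost is paid thousands of times per second per stream.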

But the application fundamentally requires squishing that giant data back down again, which immediately implies moving the computation to the data. I would want to experiment with a wasm-compiled video compressor (remember, we already have the no GPU constraint, so it's ok to light the CPU on fire), and then get compressed video out of the sandbox. WebSockets don't seem unreasonable for that -- they probably cost a factor of 2-4 over the raw data size, but once you've gained an order of magnitude from the compression, that's in the land of engineering tradeoffs. The bigger concern is dropping frames by combining the frame generation and reading with the compression, though I think you could probably use a Web Worker and SharedArrayBuffers to put those on different cores.

But I'm wrong. The data isn't so large that the brute force approach wouldn't work at all. My version would take longer to get up and running, which means they couldn't move on to the rest of the system.