257 points by tosh | 4 comments

turtlebits No.42068395
Is this really an AWS issue? Sounds like you were just burning CPU cycles, which is not AWS-related. The mention of WebSockets makes it sound like it was a data transfer or API Gateway cost.
replies(2): >>42068522, >>42068890
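
One way to settle where such a bill actually lands (compute, data transfer, or API Gateway) is to group spend by service in Cost Explorer. A minimal sketch using boto3; the dates are placeholders and Cost Explorer must be enabled on the account:

  import boto3

  # Group one month's spend by AWS service; dates below are placeholders.
  ce = boto3.client("ce")
  resp = ce.get_cost_and_usage(
      TimePeriod={"Start": "2024-10-01", "End": "2024-11-01"},
      Granularity="MONTHLY",
      Metrics=["UnblendedCost"],
      GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
  )
  for group in resp["ResultsByTime"][0]["Groups"]:
      cost = float(group["Metrics"]["UnblendedCost"]["Amount"])
      print(f'{group["Keys"][0]}: ${cost:,.2f}')

If the top line item is "Amazon API Gateway" rather than EC2 compute, that answers the question above.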
VWWHFSfQ No.42068522
> Is this really an AWS issue?

I doubt they would have even noticed this outrageous cost if they were running on bare-metal Xeons or Ryzen colo'd servers. You can rent real 44-core Xeon servers for like, $250/month.

So yes, it's an AWS issue.

replies(1): >>42068676
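
For scale, a back-of-the-envelope comparison, assuming roughly $2/hour on-demand for a 48-vCPU instance (an illustrative figure, not a quote from any current price list):

  # Rough monthly cost comparison; the hourly rate is an assumption.
  AWS_HOURLY = 2.04        # ballpark 48-vCPU on-demand rate, USD
  HOURS_PER_MONTH = 730
  COLO_MONTHLY = 250       # the 44-core figure quoted above

  aws_monthly = AWS_HOURLY * HOURS_PER_MONTH
  print(f"AWS ~${aws_monthly:,.0f}/mo vs colo ${COLO_MONTHLY}/mo "
        f"({aws_monthly / COLO_MONTHLY:.1f}x)")

Under those assumptions the gap is roughly 6x, before reserved-instance or spot discounts.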
JackSlateur No.42068676

  You can rent real 44-core Xeon servers for like, $250/month.

Where, for instance?
replies(3): >>42068729, >>42068739, >>42068788
Faaak No.42068729
Hetzner, for example. An EPYC 48c (96t) goes for 230 euros.
replies(1): >>42068782
dilyevsky No.42068782
Hetzner's network is a complete dog. They also sell you machines that should long since have been EOL'ed. No serious business should be using them.
replies(3): >>42068965, >>42069178, >>42069210
dijit No.42068965
What CPU do you think your workload is using on AWS?

GCP exposes their CPU models, and they have some Haswell and Broadwell parts in service.

That's a 10+ year-old microarchitecture, for those paying attention.

replies(2): >>42069283, >>42069684
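
If you're curious what silicon your own instance actually exposes, a minimal sketch that reads /proc/cpuinfo on Linux (standard kernel field layout assumed):

  # Print the CPU model string the hypervisor exposes to the guest (Linux only).
  with open("/proc/cpuinfo") as f:
      for line in f:
          if line.startswith("model name"):
              print(line.split(":", 1)[1].strip())
              break

Note that some hypervisors mask or genericize this string, so treat it as a starting point rather than ground truth.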
dilyevsky No.42069283
Most GCP instances and some AWS instances will migrate to another node when the host is faulty. Also, the disk is virtual. None of this applies to bare-metal Hetzner.
replies(1): >>42069319
dijit No.42069319
Why is that relevant to what I said?
replies(1): >>42069607
dilyevsky No.42069607
Only relevant if you care about reliability.
replies(1): >>42069669
dijit No.42069669
AWS was working “fine” for about 10 years without live migration, and I've had several individual machines running without a reboot or outage for quite literally half a decade. Long enough to hit bugs like this: https://support.hpe.com/hpesc/public/docDisplay?docId=a00092...

Anyway, depending on individual nodes always being up is incredibly foolhardy if you care about reliability. Things happen; the cloud isn't magic. I've had instances become unrecoverable, though it is rare.

So I still don't understand the point; it wasn't exactly relevant to what I said.