
797 points burnerbob | 1 comments
throwawaaarrgh ◴[] No.36813314[source]
There's a lot of bullshit in this HN thread, but here's the important takeaway:

- it seems their staff were working on the issue before customers noticed it.

- once paid support was emailed, it took many hours for them to respond.

- it took about 20 hours for an update from them on the downed host.

- they weren't updating affected users about the downed host or ways to recover.

- the status page was bullshit - just said everything was green even though they told customers in their own dashboard they had emergency maintenance going on.

I get that due to the nature of their plans and architecture, downtime like this is guaranteed and normal. But communication this poor is going to lose you customers. Be like other providers, who spam me with emails whenever a host I'm on even feels ticklish. Then at least I can go do something for my own apps immediately.

replies(6): >>36814300 #>>36814376 #>>36814608 #>>36814689 #>>36816612 #>>36817532 #
seti0Cha ◴[] No.36814300[source]
Not a great summary from my perspective. Here's what I got out of it:

- Their free tier support depended on noticing message board activity and they didn't.

- Those experiencing outages were seeing the result of deploying in a non-HA configuration. Opinions differ as to whether they were properly aware that they were in that state.

- They had an unusually long outage for one particular server.

- Those points combined resulted in many people experiencing an unexplained prolonged outage.

- Their dashboard shows only regional and service outages, not individual servers being down. People did not realize this and so assumed it was a lie.

- Some silliness with Discourse tags caused people to think they were trying to hide the problems.

In short, bad luck, some bad procedures from a customer management POV, and possibly some bad documentation resulted in a lot of smoke but not a lot of fire.

replies(2): >>36814431 #>>36814471 #
CSSer ◴[] No.36814471[source]
I'm surprised by your risk tolerance. If I had any cloud service at this level in my stack go down for three days, I'd start shopping for an alternative. This exceeds the level of acceptability for me for even non-HA requirements. After all, if I can't trust them for this, why would I ever consider giving them my HA business? Just based on napkin math for us, this could've been a potential loss of nearly half a million dollars. Up until this point, I've looked at Fly.io's approach to PR and their business as unconventional but endearing. Now I'm beginning to look at them as unserious. I'm sorry if that sounds harsh. It's the cold truth.
replies(2): >>36815583 #>>36815910 #
tinco ◴[] No.36815910[source]
I think you're not exposed enough to the reality of hardware. There was no need for the host to come back online at all. I think it was a mistake of Fly.io to even attempt to do it. Just tell the customer the host was lost and offer them a new one (with a freshly zeroed volume attached). You rent a machine, it breaks, you get a new one.

If they're sad that they lost their data, it's their fault for running on a single host with no backup. By actually performing an (apparently) difficult recovery, Fly.io reinforced their customers' erroneous expectation that Fly.io is somehow responsible for the integrity of the data on any single host.

replies(3): >>36817834 #>>36820919 #>>36824212 #
1. CSSer ◴[] No.36820919[source]
In hindsight I wish I could edit, because my above comment was pretty trigger-happy and overly focused on the amount of downtime. It was colored by some existing preconceptions I had about Fly, and I'm honestly surprised it continues to be upvoted. When I made that comment I hadn't yet learned some of the bits you mentioned here at the end from another thread. Anyway, I tend to agree overall. I actually suggested Fly even reconsider offering this configuration given that they refer to it as a "single-node cluster", which is an oxymoron.