Fly.io Postgres cluster down for 3 days, no word from them about it

(webcache.googleusercontent.com)

797 points burnerbob | 3 comments | 20 Jul 23 23:42 UTC

spiderice ◴[21 Jul 23 03:25 UTC] No.36809650[source]▶

There is now a response to the support thread from Fly[1]:

> Hi Folks,

> Just wanted to provide some more details on what happened here, both with the thread and the host issue.

> The radio silence in this thread wasn’t intentional, and I’m sorry if it seemed that way. While we check the forum regularly, sometimes topics get missed. Unfortunately this one slipped by us until today, when someone saw it and flagged it internally. If we’d seen it earlier, we’d have offered more details then.

> More on what happened: We had a single host in the syd region go down, hard, with multiple issues. In short, the host required a restart, then refused to come back online cleanly. Once back online, it refused to connect with our service discovery system. Ultimately it required a significant amount of manual work to recover.

> Apps running multiple instances would have seen the instance on this host go unreachable, but other instances would have remained up and new instances could be added. Single instance apps on this host were unreachable for the duration of the outage. We strongly recommend running multiple instances to mitigate the impact of single-host failures like this.

> The main status page (status.fly.io) is used for global and regional outages. For single host issues like this one we post alerts on the status tab in the dashboard (the emergency maintenance message @south-paw posted). This was an abnormally long single-host failure and we’re reassessing how these longer-lasting single-host outages are communicated.

> It sucks to feel ignored when you’re having issues, even when it’s not intentional. Sorry we didn’t catch this thread sooner.

[1] https://community.fly.io/t/service-interruption-cant-destroy...

replies(10): >>36809693 #>>36809725 #>>36809824 #>>36809928 #>>36810269 #>>36810740 #>>36811025 #>>36812597 #>>36812956 #>>36813681 #

mrcwinn ◴[21 Jul 23 03:39 UTC] No.36809725[source]▶

>>36809650 #

For what it’s worth, I left Fly because of this crap. At first my Fly machine web app had intermittent connection issues to a new production PG machine. Then my PG machine died. Hard. I lost all data. A restart didn’t work - it could not recover. I restored an older backup over at RDS and couldn’t be happier I left.

replies(5): >>36809880 #>>36810018 #>>36810039 #>>36810724 #>>36814012 #

steve_adams_86 ◴[21 Jul 23 04:07 UTC] No.36809880[source]▶

>>36809725 #

I left digitalocean for fly because some of their tooling was excellent. I was pretty excited.

I’m back on digitalocean now. I’m not unhappy about it, they’re very solid. I don’t love some things about their services, but overall I’d highly recommend them to other developers.

I gave up on fly because I’d spontaneously be unable to automate deployments due to limited resources. Or I’d have previously happy deployments go missing with no automatic recovery. I didn’t realize this was happening to a number of my services until I started monitoring with 3rd party tools, and it became evident that I really couldn’t rely on them.

It’s a shame because I do like a lot of other things about them. Even for hobby work it didn’t seem worth the trouble. With digitalocean, everything “just works”. There’s no free tier, but the lower end of pricing means I can run several Go apps off of the same droplet for less than the price of a latte. It’s worth the sanity.

replies(4): >>36810127 #>>36810379 #>>36813660 #>>36813890 #

NicoJuicy ◴[21 Jul 23 05:44 UTC] No.36810379[source]▶

>>36809880 #

I moved from DO to Hetzner (cheaper), and I am happy about it.

replies(7): >>36810595 #>>36810697 #>>36810760 #>>36810809 #>>36810954 #>>36812172 #>>36813077 #

YetAnotherNick ◴[21 Jul 23 06:50 UTC] No.36810760[source]▶

>>36810379 #

Does anyone know how Hetzner’s pricing is half of DO’s yet Hetzner is profitable, while DO is loss-making with a 6% operating margin?

replies(6): >>36810793 #>>36811511 #>>36811571 #>>36811650 #>>36812116 #>>36812917 #

devjab ◴[21 Jul 23 09:04 UTC] No.36811650[source]▶

>>36810760 #

They run their own data centres and have for a while. There is a pretty big industry for that sort of thing as an alternative to “the cloud” here in Europe.

We used to use nianet to house our hardware in Denmark. Basically these companies do hardware renting, and they also do hardware renting with more steps, which is where you rent rack space but own the hardware. They provide the place for the hardware and they also have multiple locations so that you have both backup and redundancy. While it doesn’t scale globally, in 20 years I’ve literally never worked on anything that needed to, beyond having some buffer caches for clients logging in on their vacations or something like that.

What Hetzner seems to be doing with the DO styled hosting, and this is just a guess, is that they are one of the many EU companies preparing for the big EU exodus from the non-EU cloud. Which is frankly a solid bet these days, when both AWS and Azure are increasing prices and are becoming more and more unusable because of EU legislation. Part of this is privacy, which Microsoft and Amazon are great with in terms of compliance, but part of it is also national security. I work in an investment bank that builds solar plants. Since finance and energy are both critical sectors, we risk being told that half of the finance/energy companies in the world can’t use Microsoft because the EU deems it a single point of failure if our entire energy sector relies on Azure. Which is sort of reasonable, right? But what this means for us is that we can’t vendor lock-in, not really, because we need to have up-to-date exit strategies for how we plan on being fully operational a month after leaving Azure. Which is easy when you just containerise everything and run it in VMs or similar, and really annoying if you go all in on things like AKS. Which doesn’t help our Azure costs.

Anyway, right now we are planning on leaving Azure because of cost. Not today, not next week, but sometime in the next 5-10 years, and a lot of these EU cloud alternatives that actually operate the hardware instead of renting it are likely going to be a very realistic alternative. And that is the private sector. I spend time in the EU public sector, which is a massive amount of money, and I’m guessing it’ll leave both AWS and Azure by 2050. Some of these EU cloud initiatives are going to explode when that happens, and right now, Hetzner is one of the best bets.

To get back to your question, DO rents server space. I have no idea where they’d rent it in Germany but they could potentially be renting it from Hetzner.

replies(3): >>36811770 #>>36811838 #>>36813024 #

1. abwizz ◴[21 Jul 23 09:25 UTC] No.36811770[source]▶

>>36811650 #

commendable to plan a few years ahead, but betting on the state of cloud business 26 years from now seems a bit over the top

replies(2): >>36812142 #>>36815807 #

2. detourdog ◴[21 Jul 23 10:26 UTC] No.36812142[source]▶

>>36811770 (TP) #

I think the multinational energy sector should be working toward these goals even without the regulations. The more prep done before the change, the smoother the transition.

3. devjab ◴[21 Jul 23 16:26 UTC] No.36815807[source]▶

>>36811770 (TP) #

I think you might misunderstand me. The 2050 is a guesstimate and it's just my opinion on the matter. As far as planning ahead goes, you plan for 5-10 years when you try to figure out where to "iron" your enterprise IT. This is because that's how long your hardware will last if you go the route of renting rack space with your own hardware. I think we tend to plan for 8 years, with some space for "unintended" early failures on things like controllers after 4 years. So while you can contract big-cloud vendors for shorter, I think ours is on 3 year contracts right now, you still sort of do the business case for much longer. Maybe not every 3 years, but at least every 6 years.

You do the same on the other side of the table. Companies like Hetzner know that EU cloud solutions are likely to see growth, so it's only natural that they invest in the tech to put themselves in a prime position to jump on the opportunity. Selling a good product while you do so is the way I would do it personally, but you also have EU cloud initiatives backed by VC money going straight for the endgame.
