797 points burnerbob | 155 comments
1. spiderice ◴[] No.36809650[source]
There is now a response to the support thread from Fly[1]:

> Hi Folks,

> Just wanted to provide some more details on what happened here, both with the thread and the host issue.

> The radio silence in this thread wasn’t intentional, and I’m sorry if it seemed that way. While we check the forum regularly, sometimes topics get missed. Unfortunately this one slipped by us until today, when someone saw it and flagged it internally. If we’d seen it earlier, we’d have offered more details then.

> More on what happened: We had a single host in the syd region go down, hard, with multiple issues. In short, the host required a restart, then refused to come back online cleanly. Once back online, it refused to connect with our service discovery system. Ultimately it required a significant amount of manual work to recover.

> Apps running multiple instances would have seen the instance on this host go unreachable, but other instances would have remained up and new instances could be added. Single instance apps on this host were unreachable for the duration of the outage. We strongly recommend running multiple instances to mitigate the impact of single-host failures like this.

> The main status page (status.fly.io) is used for global and regional outages. For single host issues like this one we post alerts on the status tab in the dashboard (the emergency maintenance message @south-paw posted). This was an abnormally long single-host failure and we’re reassessing how these longer-lasting single-host outages are communicated.

> It sucks to feel ignored when you’re having issues, even when it’s not intentional. Sorry we didn’t catch this thread sooner.

[1] https://community.fly.io/t/service-interruption-cant-destroy...
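To make the multi-instance recommendation in the quoted response concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not Fly's scheduler: the host names, app names, and the `reachable_apps` helper are all hypothetical. It only illustrates that an app survives a single-host failure when at least one of its instances runs on another host.

```python
# Toy model of a single-host failure (hypothetical names; not Fly's actual scheduler).
from typing import Dict, List, Set


def reachable_apps(placements: Dict[str, List[str]], failed_hosts: Set[str]) -> Dict[str, bool]:
    """Map each app to True if at least one of its instances runs on a healthy host."""
    return {
        app: any(host not in failed_hosts for host in hosts)
        for app, hosts in placements.items()
    }


# Each app maps to the hosts that run one of its instances (made-up data).
placements = {
    "single-instance-app": ["syd-host-1"],
    "multi-instance-app": ["syd-host-1", "syd-host-2"],
}

status = reachable_apps(placements, failed_hosts={"syd-host-1"})
for app, up in status.items():
    print(f"{app}: {'reachable' if up else 'unreachable'}")
```

In this model the single-instance app is unreachable for as long as the host stays down, which matches what the response describes.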

replies(10): >>36809693 #>>36809725 #>>36809824 #>>36809928 #>>36810269 #>>36810740 #>>36811025 #>>36812597 #>>36812956 #>>36813681 #
2. gowthamgts12 ◴[] No.36809693[source]
> While we check the forum regularly, sometimes topics get missed. Unfortunately this one slipped by us until today, when someone saw it and flagged it internally.

If it really got missed, then I don't understand how the thread was made private to only logged-in users?

replies(3): >>36810248 #>>36810251 #>>36810285 #
3. mrcwinn ◴[] No.36809725[source]
For what it’s worth, I left Fly because of this crap. At first my Fly machine web app had intermittent connection issues to a new production PG machine. Then my PG machine died. Hard. I lost all data. A restart didn’t work - it could not recover. I restored an older backup over at RDS and couldn’t be happier I left.
replies(5): >>36809880 #>>36810018 #>>36810039 #>>36810724 #>>36814012 #
4. bongobingo1 ◴[] No.36809824[source]
Seems like the OP should have made a HN thread in the first place instead of posting to community.stri^H^H^H^Hfly.io
replies(2): >>36811005 #>>36811202 #
5. steve_adams_86 ◴[] No.36809880[source]
I left digitalocean for fly because some of their tooling was excellent. I was pretty excited.

I’m back on digitalocean now. I’m not unhappy about it, they’re very solid. I don’t love some things about their services, but overall I’d highly recommend them to other developers.

I gave up on fly because I’d spontaneously be unable to automate deployments due to limited resources. Or I’d have previously happy deployments go missing with no automatic recovery. I didn’t realize this was happening to a number of my services until I started monitoring with 3rd party tools, and it became evident that I really couldn’t rely on them.

It’s a shame because I do like a lot of other things about them. Even for hobby work it didn’t seem worth the trouble. With digitalocean, everything “just works”. There’s no free tier, but the lower end of pricing means I can run several Go apps off of the same droplet for less than the price of a latte. It’s worth the sanity.

replies(4): >>36810127 #>>36810379 #>>36813660 #>>36813890 #
6. emmelaich ◴[] No.36809928[source]
The irony or perhaps the tragedy of building a low friction service is that you have to have experts on the lower level high friction stuff.

I would hope that after a couple of hours downtime, they'd bring up a fresh machine with Ansible or whatever. Hardware or AWS/GCP VM.

replies(1): >>36810229 #
7. ◴[] No.36810018[source]
8. pier25 ◴[] No.36810039[source]
So you didn't have a HA setup with multiple machines and volumes?
replies(1): >>36810268 #
9. danielvaughn ◴[] No.36810127{3}[source]
I adore DO. They’re seriously underrated. I love how they’ll just give you a server and say here, have at it. No abstractions, no fancy crap, just get out of my way and let me do my thing.
replies(10): >>36810554 #>>36810628 #>>36810638 #>>36812302 #>>36813142 #>>36813668 #>>36814283 #>>36823458 #>>36827607 #>>36834710 #
10. ps ◴[] No.36810229[source]
> I would hope that after a couple of hours downtime, they'd bring up a fresh machine with Ansible or whatever.

It is not just about a fresh machine which hopefully sits in each datacenter. I can imagine they needed the clone of the system due to the design of the fly.io service and that's where the "fun" begins.

11. ◴[] No.36810248[source]
12. p-e-w ◴[] No.36810251[source]
Whoa, what? That's a much bigger red flag than the downtime itself.
replies(1): >>36810657 #
13. nerpderp82 ◴[] No.36810268{3}[source]
Is that even possible on Fly?
replies(3): >>36810569 #>>36810639 #>>36810987 #
14. yla92 ◴[] No.36810269[source]
Is it me or the page is now gone?

"Oops! That page doesn’t exist or is private."

Edit: Ok, I can see after sign up / log in.

15. teraflop ◴[] No.36810285[source]
It looks like all 166 threads with the "App not working" tag are invisible when not logged in. So I'm guessing somebody applied that tag retroactively.

https://community.fly.io/c/questions-and-help/app-not-workin...

EDIT: it now appears that the "app-not-working" tag itself has been deleted, and no longer shows up even when logged in.

replies(2): >>36810603 #>>36810620 #
16. NicoJuicy ◴[] No.36810379{3}[source]
I moved from DO to Hetzner ( cheaper), I am happy about it.
replies(7): >>36810595 #>>36810697 #>>36810760 #>>36810809 #>>36810954 #>>36812172 #>>36813077 #
17. justsid ◴[] No.36810554{4}[source]
I wish I could say the same. My ISP and DO have absolutely terrible peering, unfortunately a lot of our internal stuff is hosted there. It’s always fun to git push/pull with 40kb/s on a gigabit connection.
replies(2): >>36810619 #>>36811718 #
18. fmajid ◴[] No.36810569{4}[source]
He may have been talking about Fly themselves. Certainly having only a single machine to serve a wealthy metropolis of 8 million people seems like amateur hour.
replies(5): >>36810634 #>>36810688 #>>36810993 #>>36811724 #>>36814488 #
19. x86hacker1010 ◴[] No.36810595{4}[source]
Same here
20. buro9 ◴[] No.36810603{3}[source]
This is why companies should not run their own forums. It's cheap support and marketing, it's not really community.
replies(1): >>36810656 #
21. masklinn ◴[] No.36810619{5}[source]
Maybe you could VPN to or proxy through a box with good peering to you and DO?
replies(1): >>36810741 #
22. kipple ◴[] No.36810620{3}[source]
In another comment here, they're saying they just deleted that tag to avoid this access issue — https://news.ycombinator.com/item?id=36810393
replies(1): >>36810752 #
23. JanSt ◴[] No.36810628{4}[source]
I'm using Digital Ocean App platform, which does pretty much everything for me. It's very simple to use. I can run my app as a single developer without caring about infrastructure for 99% of the time.
replies(2): >>36811443 #>>36823542 #
24. ◴[] No.36810634{5}[source]
25. vasco ◴[] No.36810638{4}[source]
Same! I've had my first server there for 10 years now. They added a lot of stuff in the meantime, they have AWS-like things you can do. But in terms of launching a VM that just works, they are a great choice.
replies(1): >>36813291 #
26. ◴[] No.36810639{4}[source]
27. vasco ◴[] No.36810656{4}[source]
I never thought to make friends with people whose only common thing with me is that they shop at the same place. Companies creating a "community" is exactly as you described.
replies(1): >>36811396 #
28. throwawayfly ◴[] No.36810657{3}[source]
Ok as long as we’re getting conspiratorial, something similar I observed has bugged me.

About a year ago fly awarded a few people in the forums, I think it was 3, the “aeronaut” badge. Basically just pointless bling for a “routinely very helpful” person or somesuch. Still, I can imagine it was cool to get it. No, it wasn’t me.

One person I saw with it absolutely deserved it: this person is, to this day, always hopping in and helping people; linking to docs; raising their own issues with a big dose of “fellow builder” understanding and empathy; that sort of person. My own queries typically led me to a thread that this person has answered. In short - the kind of helpful, proactive, high knowledge volunteer early adopter that every community needs - and a handful are blessed to find.

Then one day I saw this same person had offered — to one random newbie with build problems in one of the many HALP threads — a reply like, “maybe Fly isn’t the best option for you. here are some other places that can host an app”.

The thread was left alone and faded, like many when a lost newbie is involved. But 1 day later, I noticed this tireless early adopter no longer had their “aeronaut” badge.

I still refuse to believe my own eyes about something that petty.

replies(2): >>36810885 #>>36810935 #
29. nerpderp82 ◴[] No.36810688{5}[source]
Fly sounds like they need some Conway's Law. A front end that designs the nice api and works on developer affordances and the backend that keeps it running and reliable.
30. brylie ◴[] No.36810697{4}[source]
I'm enjoying the DO App Platform (Heroku alternative). Do you know if Hetzner has a similar service that I could compare?
replies(2): >>36810730 #>>36857753 #
31. quickthrower2 ◴[] No.36810724[source]
Fly is in my “try later book” from a year or two ago. I remember it was hard to deploy anything due to downtime so gave up. Sad that stuff like this still happens.

You shouldn’t need to multi region a postgres yourself - they should have at least 2 data centre redundancy for the region and it just works.

Hope they get some magic sauce to become better at this.

replies(1): >>36811584 #
32. realusername ◴[] No.36810730{5}[source]
Personally I just install Dokku onto the machine, it replaced all my Heroku (and competitors) uses.

Additionally, you still keep the full ssh access to the machine if you ever need it.

33. quickthrower2 ◴[] No.36810740[source]
> We strongly recommend running multiple instances to mitigate the impact of single-host failures like this.

Make it impossible not to do so, and make it frictionless then.

replies(1): >>36810776 #
34. aidos ◴[] No.36810741{6}[source]
When I’ve run into this in the past Cloudflare Warp has been a bit of a saviour. It’s a hassle free way to flick a switch and follow a different path over the network.
35. swyx ◴[] No.36810752{4}[source]
good call out - please as an internet mob let us not ascribe to malice what can be attributed to sheer unintentional impacts of complex software
36. YetAnotherNick ◴[] No.36810760{4}[source]
Does anyone know how Hetzner pricing is half of DO yet is profitable, while DO is loss making with 6% operating margin?
replies(6): >>36810793 #>>36811511 #>>36811571 #>>36811650 #>>36812116 #>>36812917 #
37. remus ◴[] No.36810776[source]
That would presumably cost more money which is not a trade off every user would want to make.
replies(1): >>36812975 #
38. stevefan1999 ◴[] No.36810793{5}[source]
Simple, Hetzner mainly operates in Germany, the people are mostly Germans, and they automate the stuff to a point a small team could manage it well even if not remotely, so they have less cost on human resources.
replies(3): >>36811063 #>>36812572 #>>36813431 #
39. mythz ◴[] No.36810809{4}[source]
Same, been enjoying Hetzner's great value for 10 years, and now Hetzner Cloud for 2 years.
40. michaeldwan ◴[] No.36810885{4}[source]
Get out of here with this nonsense. We tell people when we’re a bad option all the time. Do you really think we have a desire (or time) to punish somebody for doing the same?

Also, here’s the long forgotten badge, still with 3 people… https://community.fly.io/badges/107/aeronaut

replies(4): >>36811286 #>>36812821 #>>36812879 #>>36813245 #
41. logeist ◴[] No.36810935{4}[source]
Conspiratorial or not that's enough for me to never use it. God forbid someone recommends another platform that handles your clear shortcomings.
replies(1): >>36811424 #
42. tacker2000 ◴[] No.36810954{4}[source]
I use both and am very satisfied, especially by Hetzner.
replies(4): >>36811042 #>>36811227 #>>36811651 #>>36857655 #
43. sanswork ◴[] No.36810987{4}[source]
That's like the main selling point of Fly.
44. sanswork ◴[] No.36810993{5}[source]
They certainly don't only have a single machine in SYD since I have a bunch of machines running in SYD that weren't impacted by this one.
45. revskill ◴[] No.36811005[source]
But HN is not a customer service forum ?
replies(2): >>36812234 #>>36812539 #
46. benjaminwootton ◴[] No.36811025[source]
Should losing a single host machine be a big deal nowadays? Instance failure is a fact of life.

Even if customers are only running one instance, I would expect the whole thing to rebalance in an automated way especially with fly.io being so container centric.

It also sounds like this is some managed Postgres service rather than users running only one instance of their container, so it’s even more reasonable to expect resilience to host failure?

replies(3): >>36811755 #>>36811788 #>>36813069 #
47. candiddevmike ◴[] No.36811042{5}[source]
Only complaint with Hetzner is they don't have some kind of OAuth setup for machines or scoped API tokens, just read/write. I'd like to use the former for doing Vault authentication from instances, and the latter for writing a dynamic Vault secret provider.
replies(1): >>36812679 #
48. rahkiin ◴[] No.36811063{6}[source]
They also build their own servers in their own datacenters
replies(1): >>36811239 #
49. 5e92cb50239222b ◴[] No.36811202[source]
> ^H^H^H^H

alt+backspace will wipe that substring in most shells in one go.

replies(4): >>36811395 #>>36811438 #>>36811442 #>>36811780 #
50. throw382642 ◴[] No.36811227{5}[source]
I remember someone complaining they had to send Hetzner a passport or some other type of ID to cancel their services.

Does anyone know if that's still the case?

replies(2): >>36811419 #>>36811723 #
51. raybb ◴[] No.36811239{7}[source]
Does digital ocean not do this?
replies(3): >>36811553 #>>36811969 #>>36812259 #
52. sho ◴[] No.36811286{5}[source]
> Do you really think we have a desire (or time) to punish somebody for doing the same?

idk man, there's these awfully convenient disappearing forum threads too. The benefit of the doubt is starting to expire.

I see you're a co-founder, so presumably you have some sway on priorities and skin in the game. I think you should take the reputational damage you're accruing here much more seriously than you apparently are. A few more incidents like this and it won't just be you telling people you're a bad option.

* edited to tone down the forum thread disappearance angle. FWIW I do believe that it likely wasn't deliberate. My main point was that these things add up and "of course we wouldn't do that!" starts to ring a little hollow the 10th time you hear it...

replies(1): >>36811385 #
53. p-e-w ◴[] No.36811385{6}[source]
> you've just been caught hiding inconvenient forum threads too

FWIW, I do believe them when they say this wasn't intentional. Considering how the Internet operates, they would be incredibly stupid to do something like that on purpose.

That being said, the way the entire affair was handled certainly leaves a lot to be desired.

replies(1): >>36811496 #
54. mewmew07 ◴[] No.36811395{3}[source]
it would lose the comic appeal though
55. toyg ◴[] No.36811396{5}[source]
I am an interested party in the process space, and I think that's ungenerous. When you work with a complex tool every day, and you have to find solutions for this or that issue, develop strategies for this or that business case, etc etc, you're not really shopping - it's more like you're in the trenches. At that point, finding people who have the same issues and talking shop with them, can be great for both knowledge exchange and camaraderie. Linux wouldn't be what it is today without the LUGs era, for example.
replies(1): >>36811870 #
56. selectnull ◴[] No.36811419{6}[source]
They require passport or some sort of ID on registration, and it is weird when compared to others. I was not happy with that part, but I am happy customer since (almost a decade now).

As far as I know, they do not require any ID when canceling the service.

57. robertlagrant ◴[] No.36811424{5}[source]
> Conspiratorial or not that's enough for me to never use it

Well if it's not true then that would be a silly reason to pick to not use them.

58. alias_neo ◴[] No.36811438{3}[source]
Thank you for that little nugget. I learned something today :)
59. camgunz ◴[] No.36811442{3}[source]
ctrl-w my friend. Don't even have to put down your drink.
60. fauigerzigerk ◴[] No.36811443{5}[source]
Do they offer authentication/authorization?

This is the one thing I need in every app and don't want to do myself.

replies(4): >>36811797 #>>36813386 #>>36813744 #>>36814722 #
61. sho ◴[] No.36811496{7}[source]
I actually believe them on that too, FWIW. This time. It's just too dumb. I hope, for their sake, it's the truth.

I was really just trying to point out that this kind of good faith benefit-of-the-doubt has a limit, and fear of reaching that limit should be keeping people at fly up at night a lot more than it apparently is. I don't know how many colossal public fuckups a company can endure before its reputation is permanently ruined, but it's definitely not infinite.

62. ushakov ◴[] No.36811511{5}[source]
Me and my partner have paid a visit to their datacenter in Nuremberg. The answer is efficiency. They get more processing power than the other providers for the energy they have to put in
replies(1): >>36811622 #
63. re-thc ◴[] No.36811553{8}[source]
They don’t.
replies(1): >>36814363 #
64. ushakov ◴[] No.36811571{5}[source]
Efficiency. They get much more processing power per kWh of energy than everybody else
65. throwawaymaths ◴[] No.36811584{3}[source]
> Hope they get some magic sauce to become better at this.

When I saw them describe their multiregion SQL replication architecture I thought "what crazy person thought this wouldn't eventually open up a spider's nest of distributed systems errors?"

replies(2): >>36813480 #>>36814468 #
66. arrowsmith ◴[] No.36811622{6}[source]
What do they do that makes them more efficient?
replies(1): >>36811798 #
67. devjab ◴[] No.36811650{5}[source]
They run their own data centres and have for a while. There is a pretty big industry for that sort of thing as an alternative to “the cloud” here in Europe.

We used to use nianet to house our hardware in Denmark. Basically these companies do hardware renting and they also do hardware renting with more steps which is where you rent rack space but own the hardware. They provide the place for the hardware and they also have multiple locations so that you have both backup and redundancy, and while it doesn’t scale globally, in 20 years I’ve literally never worked on anything that needed to beyond having some buffer caches for clients logging in on their vacations or something like that.

What Hetzner seems to be doing with the DO styled hosting, and this is just a guess, is that they are one of the many EU companies preparing for the big EU exodus from the non-EU cloud. Which is frankly a solid bet these days where both AWS and Azure are increasing prices and are becoming more and more unusable because of EU legislation. Part of this is privacy which Microsoft and Amazon are great with in terms of compliance, but part of it is also national security. I work in an investment bank that builds solar plants, since finance and energy are both critical sectors we risk being told that half of the finance/energy companies in the world can’t use Microsoft because the EU sees it as a single point of failure if our entire energy sector relies on Azure. Which is sort of reasonable right? But what this means for us is that we can’t vendor lock-in, not really, because we need to have up-to-date exit strategies for how we plan on being fully operational a month after leaving Azure. Which is easy when you just containerise everything and run it in VMs or similar, and really annoying if you go full in on things like AKS. Which doesn’t help our Azure costs.

Anyway, right now we are planning on leaving Azure because of cost. Not today, not next week but sometime in the next 5-10 years and a lot of these EU cloud alternatives that actually operate the hardware instead of renting it are likely going to be a very realistic alternative. And that is the private sector, I spend time in the EU public sector which is a massive amount of money and I’m guessing it’ll leave both AWS and Azure by 2050. Some of these EU cloud initiatives are going to explode when that happens, and right now, Hetzner is one of the best bets.

To get back to your question, DO rents server space. I have no idea where they’d rent it in Germany but they could potentially be renting it from Hetzner.

replies(3): >>36811770 #>>36811838 #>>36813024 #
68. kristiandupont ◴[] No.36811651{5}[source]
Do they have Terraform providers? And managed Postgres? Aside from the ability to just host a Docker container, that is all I need.
replies(1): >>36811955 #
69. abwizz ◴[] No.36811718{5}[source]
wow! sub mbps indicates that there is indeed no peering at all (political issues?) but just a transit connection via an overloaded carryall.

collect some evidence, maybe someone wants to do something about it.

70. fx1994 ◴[] No.36811723{6}[source]
Well I would appreciate that, since I was victim of Russian hackers and they had access to all my servers and stuff on Hetzner, they even changed passwords and mail on Robot but I restored everything...
71. bongobingo1 ◴[] No.36811724{5}[source]
> machine to serve a wealthy metropolis of 8 million

It's actually the only region to serve the entire AU and NZ population with any reasonable latency. (Ok, Singapore can do in a pinch for at least sub 200ms.)

You'd wanna hope its more than one machine!

72. DoubleFree ◴[] No.36811755[source]
Fly postgres is not managed postgres, it's cli sugar over a normal fly app, which the [docs](https://fly.io/docs/postgres/) make quite clear. Their docs also make clear that if you run postgres in a single-instance configuration, if the hardware it's running on has problems, your database will go down.

I believe the underlying reason that precludes failing over to a different host machine, is that fly volumes are slices of host-attached nvme drives. If the host goes down, these can't be migrated. I _think_ instances without attached volumes will fail-over to a different host.

Of course, that's not ideal, and maybe their CLI should also warn about this loudly when creating the cluster.
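The constraint described above can be sketched in a few lines of Python. This is only an illustration under the comment's own assumption that a volume lives on one host's local disk; the `Instance` type, the `plan_failover` helper, and the host names are hypothetical and not part of any Fly tooling.

```python
# Toy failover planner (all names hypothetical; assumes a volume is pinned to one host's local disk).
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Instance:
    name: str
    host: str
    volume_host: Optional[str] = None  # host whose local NVMe holds the volume, if any


def plan_failover(instances: List[Instance], failed_host: str, spare_host: str) -> Dict[str, Optional[str]]:
    """For each instance on the failed host, return where it can restart, or None if it is stuck."""
    plan: Dict[str, Optional[str]] = {}
    for inst in instances:
        if inst.host != failed_host:
            continue  # not affected by this failure
        # A volume on the failed host's local disk cannot follow the instance elsewhere.
        pinned = inst.volume_host == failed_host
        plan[inst.name] = None if pinned else spare_host
    return plan


instances = [
    Instance("web", host="host-a"),
    Instance("postgres", host="host-a", volume_host="host-a"),
    Instance("worker", host="host-b"),
]
plan = plan_failover(instances, failed_host="host-a", spare_host="host-c")
print(plan)
```

In this model the stateless `web` instance can move to the spare host, while `postgres` has to wait for its original host (or a restore from backup) because its data never left that machine.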

73. abwizz ◴[] No.36811770{6}[source]
commendable to plan a few years ahead, but betting on the state of cloud business 26 years from now seems a bit over the top
replies(2): >>36812142 #>>36815807 #
74. layer8 ◴[] No.36811780{3}[source]
The ^H^H^H^H above was for human readers though.
75. smallerfish ◴[] No.36811788[source]
If you lose a single instance on RDS and you don't have replication set up, you'll also have downtime. (Maybe not with Aurora?)

And +1 to the sibling comment; Fly makes it very clear that single instance postgres isn't HA, and talks about what you need to do architecturally to maintain uptime.

replies(3): >>36812045 #>>36812724 #>>36813726 #
76. spacebanana7 ◴[] No.36811797{6}[source]
I've been using Supabase for authentication/authorization in my recent side project.

The main app is node/express running on Digital Ocean and it connects directly to the Supabase hosted Postgres for most operations, but then uses the Supabase auth API for auth related stuff.

Saves a lot of time sending password reset emails etc and the entire project costs less than $5/mo in hosting costs.

77. abwizz ◴[] No.36811798{7}[source]
i'll guess they pick optimized components for it.

like the longtime workhorse was a high performance skylake desktop cpu w/o ecc ram

replies(1): >>36813464 #
78. dpeckett ◴[] No.36811838{6}[source]
Couldn't agree more, I think Hetzner is probably Europe's best bet on a hyperscaler. One of the more telling indicators IMO is their growing market share outside of the EU/DACH.

To add on to the comments about Hetzner building their own custom hardware, they also custom built their own software stack. They rejected the hype that was OpenStack and worked diligently on their own hypervisor platform (that they are incredibly secretive about) and that appears to be paying off in spades for them. Most sovereign cloud plays end up being suffocated by the complexity, and incoherence, of the OpenStack ecosystem. It just becomes impossible to ship.

For a fascinatingly different take on how to build a datacenter: https://www.youtube.com/watch?v=5eo8nz_niiM

* Edit: remove speculation about Kubernetes and Hetzner, that was based on hazy memory.

replies(2): >>36812139 #>>36812449 #
79. vasco ◴[] No.36811870{6}[source]
We're talking about private companies running forum software instead of providing support. We're not talking about the power of IRC or mailing list communities for open source projects and the like.

If I pay for something I want the person I pay money to help me fix problems I get.

80. d_k_f ◴[] No.36811955{6}[source]
Yes and (unfortunately) no. Terraform providers are here [1] with the official documentation at [2]. Managed databases are not available, though. I think they have some sort of database offering if you select their web hosting options, but you can't just get a managed Postgres instance yourself.

[1] https://registry.terraform.io/providers/hetznercloud/hcloud/...

[2] https://community.hetzner.com/tutorials/howto-hcloud-terrafo...

EDIT: For what it's worth, I have had good experiences with app servers hosted on Hetzner Cloud and managed Postgres provided by ElephantSQL (https://www.elephantsql.com/) for Germany-based apps.

replies(1): >>36812314 #
81. ◴[] No.36811969{8}[source]
82. marcinzm ◴[] No.36812045{3}[source]
Downtime but limited downtime since the data is stored redundantly across multiple machines in the same AZ. So unless the AZ goes down (which is a different failure than what happened here) you can restart the DB on a different instance pretty quickly and I'm guessing AWS will do it automatically for you.

edit: Remove triple as not certain about level of redundancy

replies(1): >>36812543 #
83. fxtentacle ◴[] No.36812116{5}[source]
I've been with them for a long time and my guesses would be:

1. Strict rules and strict customer verification. Crypto mining that wastes SSDs is not allowed. Portscans, mass emails, etc. are not allowed. They also don't offer GPUs to the general public because it has been abused in the past. You usually need to send in ID documents just to open an account. My guess is this allows them to avoid most bad actors and, thereby, waste less money on fraud.

2. Extremely long-term investments. They typically build their own hardware and then use it over 10 years. They have their own flea market where you can rent older server models for a steep discount. That means they will have a long time where the hardware is fully paid off and still generating revenue.

3. Great service. With a mid-sized company, I can call their technicians in the middle of the night. The fact that we could call them in case of a crisis has generated A LOT of good will. But I would be truly surprised if they didn't make a profit off those phone calls, as they charge roughly 4x the salary cost.

4. High-margin managed services. In addition to just the cheap servers, they also offer a managed service where they will do OS and security upgrades for you. It's roughly 2x the price of the server and it appears to be almost fully automated. I know some freelance web designers who will insist on using Hetzner Managed for deployment for their clients, because it is just so convenient. You effectively pass off all recurring maintenance for €300 a month and your client is happy to have an emergency phone number (see #3) in case the box goes down.

84. bigjoes ◴[] No.36812139{7}[source]
Could you please elaborate how and what you know about managed Kubernetes on Hetzner?

I have been asking about this for a while and was told there is no way Hetzner would offer such a service. Certain posts on social media have also never been answered with any kind of indication that they are actually working on it.

Please provide some Details on this.

replies(1): >>36812348 #
85. detourdog ◴[] No.36812142{7}[source]
I think multi-national energy sector should be working toward the goals without the regulations. The more prep done before the change the smoother the transition.
86. kinduff ◴[] No.36812172{4}[source]
Same, tried a bunch before moving completely to Hetzner. I'm super happy with their service.
87. noizejoy ◴[] No.36812234{3}[source]
> But HN is not a customer service forum ?

you must be new here ;-)

88. stevefan1999 ◴[] No.36812259{8}[source]
The competitor of DO, Vultr does this IIRC, yet it is not really cheaper
89. yard2010 ◴[] No.36812302{4}[source]
I love their high value content about dev ops, I have learned most of what I know in this field tinkering with a VPS with their great tutorials on how to set up stuff.
replies(2): >>36812714 #>>36823186 #
90. kristiandupont ◴[] No.36812314{7}[source]
Got it, thanks. I've used ElephantSQL as well and I've been happy with them.
91. dpeckett ◴[] No.36812348{8}[source]
They were in person recruiting at KubeCon EU this year and were advertising a good number of Kubernetes engineering roles. Definitely gave me the impression they were taking Kubernetes seriously but looking back a managed offering was just speculation on my part.

So huge grain of salt, you are totally right. It could be internal platform work only.

92. dpeckett ◴[] No.36812449{7}[source]
For anyone interested in Kubernetes on Hetzner, there's a really interesting CAPI provider being actively developed:

https://github.com/syself/cluster-api-provider-hetzner

93. gtirloni ◴[] No.36812539{3}[source]
It's often used as an escalation point when people can't get support from certain companies (most notably, Google). If an employee lurks in here and sees your post, they might contact the right people to fix your issue.

Smaller companies also do a lot of PR damage control and constantly monitor HN for threads complaining about their services.

You're not wrong but that's how it works.

replies(1): >>36814790 #
94. truetraveller ◴[] No.36812543{4}[source]
I don't believe their RDS / EBS has 3x redundancy. With SSD, that would be super costly for them. But if that's correct, that would be incredible.
replies(1): >>36812731 #
95. KronisLV ◴[] No.36812572{6}[source]
> Simple, Hetzner mainly operates in Germany, the people are mostly Germans, and they automate the stuff to a point a small team could manage it well even if not remotely, so they have less cost on human resources.

I feel like there might be more to it, especially considering the situation with electricity prices in some places in EU recently.

I used (and still use) a Lithuanian platform called Time4VPS which was cheaper than Hetzner previously, yet had to increase their prices somewhat for that reason. Now only some of their plans are competitive with Hetzner, while Hetzner also provides some managed services as well.

Hetzner docs also went into some of the details regarding the pricing: https://docs.hetzner.com/robot/general/pricing/hetzner-prici...

And yet, I can't help but wonder why they don't give in to the desire to maximize profit margins, as happened with, say, Scaleway (good platform, but as expensive as DigitalOcean).

96. oefrha ◴[] No.36812597[source]
I was confused why support for platform failure relies on a forum that employees may or may not check. After checking docs[1], apparently you have to be on a paid plan (at least $29/mo) to access email support, so you may not have it even if you’re paying for resources.

I won’t be using it for side projects where I’m okay with paying $5-10/mo but don’t want to have three day outages.

[1] https://fly.io/docs/about/support/

replies(1): >>36813041 #
97. victor106 ◴[] No.36812679{6}[source]
Can’t you use a third party IAM solution for this? Like Okta or keycloak?
replies(1): >>36846920 #
98. brightball ◴[] No.36812714{5}[source]
They filled the Slicehost vacuum nicely in this area. That's where I got my start in running my own servers about 15 years ago and the tutorials were the driving factor.
99. williamdclt ◴[] No.36812724{3}[source]
> Maybe not with Aurora

If a read replica fails, I'd expect no downtime (possibly a few errors as connections get cut off abruptly). Although there's always the risk that the remaining instances aren't able to handle the additional load.

If the master fails, you'll get a ~2min downtime
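A minimal sketch of how a client could ride out a short failover window like this, assuming the driver surfaces dropped connections as `ConnectionError` (real database drivers raise their own exception types, so that part is an assumption). The helper name `call_with_retry` and its parameters are hypothetical:

```python
import time


def call_with_retry(operation, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run operation(), retrying on ConnectionError with exponential backoff.

    `operation` is any zero-argument callable, e.g. a function that runs a query.
    `sleep` is injectable so the backoff can be tested without waiting.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

With `attempts=8` and the default `base_delay`, the waits add up to 127 seconds, which is roughly the size of a ~2min failover. Only idempotent operations should be retried this way.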

100. marcinzm ◴[] No.36812731{5}[source]
May not be 3x but it is replicated so even a total instance failure would not make you lose data:

>Amazon EBS volumes are designed to be highly available, reliable, and durable. At no additional charge to you, Amazon EBS volume data is replicated across multiple servers in an Availability Zone to prevent the loss of data from the failure of any single component. For more details, see the Amazon EBS Service Level Agreement.

https://aws.amazon.com/ebs/features/#Amazon_EBS_availability...

101. kotaKat ◴[] No.36812821{5}[source]
Why are you acting so hostile? If you don't like that the community is dunking on you, then maybe posting on Hacker News isn't for you.
102. EspressoGPT ◴[] No.36812917{5}[source]
Overstaffed, overinflated and inefficient Silicon Valley startup vs. organically-grown, well-adjusted, efficient German company.
replies(1): >>36813209 #
103. thefreeman ◴[] No.36812956[source]
total shot in the dark, but, was it a transaction id wrap around?
104. marcinzm ◴[] No.36812975{3}[source]
You cannot make every user happy, and it's generally better to not have a user than to have an unhappy user.
105. thejosh ◴[] No.36813024{6}[source]
Hetzner also do some crazy-cool stuff, especially around the 7950X3D, cooling, AM5 etc. (https://www.youtube.com/watch?v=V2P8mjWRqpk). They also do some amazing stuff with ARM (their cloud offering is really solid for this).
106. MuffinFlavored ◴[] No.36813041[source]
Forewarning: I am not being critical of fly.io nor their free support whatsoever when I say this.

From a technical perspective, could they have "been better"? I see their name a lot on HN so I know they are doing really cool + advanced things and this is probably some super small edge case that slipped through the cracks.

Could they have added some message / do we as the HN community feel they needed to be like "we're gonna add some extra logging/monitoring going forward so it won't happen again"?

By all means, they probably don't owe anybody in terms of stability + uptime guarantees when it comes to a free tier. Sh*t happens.

replies(3): >>36813087 #>>36813165 #>>36813910 #
107. capableweb ◴[] No.36813069[source]
> Should losing a single host machine be a big deal nowadays? Instance failure is a fact of life.

Depends on where in your development cycle you are. If you just got started and haven't even figured out what you're actually building (prototyping), you shouldn't really use a hosting provider that randomly loses instances.

If, on the other hand, you have done everything to improve your application's performance, had to resolve throughput issues with a distributed architecture and are now running 10+ instances, then losing one host shouldn't impact you too much. But you really shouldn't start this way; it's doing web services the hard way and introduces a lot of complexity you shouldn't want to deal with when you're still trying to find product-market fit.

replies(1): >>36813159 #
108. dangoodmanUT ◴[] No.36813077{4}[source]
Hetzner has a record for going silent with issues FYI, just hit their reddit to see all the horror stories
109. riwsky ◴[] No.36813087{3}[source]
They broke uptime for the paid tier, not just the free tier.

The relevance of paid/free is that free (and cheap paid) plans don’t get fly support over email

110. dbingham ◴[] No.36813142{4}[source]
I love DO for projects where I don't need control. For my side project, I eventually migrated to AWS after running into a lot of issues with DO.

Things like they don't give you the postgres root user on their managed postgres. And I ran into issues trying to capture the deployments in code. Their terraform providers are pretty good, but still leave something to be desired. For all its many warts, I'm much happier back on AWS. It did end up more expensive, but it's worth it for the fine grained control in my case.

But I spent the last 5 years as a DevOps/SRE, so... uh... I'm picky.

replies(2): >>36813306 #>>36823571 #
111. riwsky ◴[] No.36813159{3}[source]
GP is referring to fly.io architecting for single instance failures, not its customers.
112. azemetre ◴[] No.36813165{3}[source]
They may not owe anyone anything but over time these types of issues can cause a large reputation hit.

If I was just searching online or trying to find out what various communities think about Fly.io and see several threads about major outages with poor communications, do you think I will use their services? It would be an immediate pass.

It takes a long time to build a reputation, and you can lose it instantly.

113. wongarsu ◴[] No.36813209{6}[source]
Not to mention a German company that has price sensitivity in their DNA. Their first servers were just regular consumer tower PCs to drastically cut hardware costs. Now many years later it's a highly optimized mix of consumer, server and inhouse parts (e.g. they use their own racking system instead of 19", and the datacenters are built to make use of convection for a lot of the cooling). They also offer regular Dell servers for those that want them, but at 2x-4x the price of their homegrown boxes.
114. ryanrussell ◴[] No.36813245{5}[source]
Why is anyone on HN "dunking" on Fly.IO of all companies?

Michael - Don't take the bait.

As someone who has zero affiliation with Fly.IO other than a few PRs to their OSS (I don't even know Michael), I greatly appreciate the contributions they have given back to the community.

There are a lot of great hosting companies. Fly.IO stands out due to their revolutionary architecture and contributions back to the OSS community. I wish more companies operated like this.

It's understandable some are upset about an outage. But Fly is doing really interesting and game-changing things, not copying a traditional vmware, cpanel or k8s route.

Just as a reminder to what this company has offered back to everyone.

SQLite: Ben Johnson's OSS work around SQLite stands out. Fly.IO and his work have really made sqlite a contender.

- https://fly.io/blog/all-in-on-sqlite-litestream/
- https://fly.io/blog/introducing-litefs/
- https://github.com/superfly/litefs
- https://github.com/benbjohnson/litestream
- https://fly.io/blog/sqlite-internals-wal/
- https://fly.io/blog/wal-mode-in-litefs/

Who really considered sqlite as a production option before Fly and Ben? Not me.

Firecracker: Firecracker is amazing, but difficult to debug when something bad happens. There aren't a ton of people in devops who would share what they have. If you've ever used Firecracker, you've really been helped a lot by the various guides they have provided back to the community like these:

- https://fly.io/docs/reference/architecture/
- https://fly.io/blog/fly-machines/
- https://fly.io/blog/sandboxing-and-workload-isolation/

Their architecture is beautiful and revolutionary. They're probably the first or second ones to find a lot of the new edge cases as they grow.

It's a lot harder to be the first one over the wall than it is to copy. They've literally given the average developer a blueprint to build scalable businesses that compete with their own.

115. danielvaughn ◴[] No.36813291{5}[source]
Yeah I hadn't seen those newer features until recently, the one-click deployments are super cool.
116. danielvaughn ◴[] No.36813306{5}[source]
That's interesting, because granular control is why I enjoy DO, although I'm thinking about it from the server perspective. They set up a machine, give me root access, and that's literally it. I set up my own ssh keys, firewalls, and there's no additional abstraction that I have to learn. I might just be reminiscing because right now I'm on a team where we're writing terraform/helm/k8s in GCP and it makes me want to cry myself to sleep each night lol.
117. okhuman ◴[] No.36813386{6}[source]
Would you consider a project like https://github.com/authcompanion/authcompanion2 for the authentication side? Missing anything?
replies(1): >>36814350 #
118. api ◴[] No.36813431{6}[source]
I've wondered how they can host this cheap in Germany given their very high electricity prices.

Maybe that's not actually the dominant cost, or they've optimized everything else so well they can just eat the electric bill.

119. ushakov ◴[] No.36813464{8}[source]
The secret is in the cooling system. They have individual cooling systems for each server. Less heat = longer sustained loads
replies(1): >>36824593 #
120. api ◴[] No.36813480{4}[source]
CockroachDB does this, but that's the result of over 10 years of heads down hard-ass engineering and it's still slower than Postgres because distributed sync is not free. That means you have to provision it properly and with enough resources.

Their license would require a company like fly.io to pay them though, so I'm sure this resulted in fly.io instead trying to whip up an improvised infrastructure on the back of stock Postgres. I bet this cost them a whole lot more than paying CockroachDB would have, but devs have been conditioned that you should never ever pay for software even if it's the result of tons of deep engineering and solves massive brutal problems for you. I also bet there's some not-invented-here ego involved.

P.S. I don't work for CDB but I would absolutely consider them and we may end up using them at some point. They let you do a ton for free. They only charge for stuff you need if you get really really huge or if you are running a SaaS reselling DB services like fly.io would have been doing.

121. graypegg ◴[] No.36813660{3}[source]
DO actually does have a free tier! If you use their “app platform” (their equivalent to fly/heroku/render/etc) you can host 3 “static” apps for free. So if you have a Hugo/Jekyll blog or something, it’ll set up a whole little CD system for it for free.
replies(1): >>36823603 #
122. bjord ◴[] No.36813668{4}[source]
historically, I've used Vultr, but I don't see anyone talking about it—I'm curious if anyone else has thoughts on them? (I've been happy, but then again my usage has been exceedingly basic)
replies(2): >>36813883 #>>36814910 #
123. plagiarist ◴[] No.36813681[source]
Why is it my responsibility to move instances from machine to machine to mitigate a cloud host's outages? What is their utility if not performing the bare minimum of cloud host responsibilities keeping my container up?
124. api ◴[] No.36813726{3}[source]
Yeah but you won't lose your data. They have backup infrastructure and EBS is rock solid.

Down time is one thing. Data loss is something else.

125. pc86 ◴[] No.36813744{6}[source]
In addition to Supabase Auth the sibling mentions (which I played with very briefly) I've been using clerk.dev (no affiliation) and it's great. Depending on your definition of doing it yourself it could be just what you want. You have to set some things up, you're not going to get things like row-level permissions you get out of the box w/ Supabase, but if you're looking for a quick implementation where things like password reset etc. are handled for you, it might be a good fit.
126. tedchs ◴[] No.36813883{5}[source]
I've used Vultr for several years (hobby projects) with no issues. My favorite feature is having a BGP session from my VM, which is unusual among cloud providers. I have an AS and am able to advertise my own IPs from multiple Vultr instances (anycast).
replies(1): >>36814660 #
127. no_wizard ◴[] No.36813890{3}[source]
I'm a fan of Linode as well.

I want to like Fly, but the reliability is one of those things where I feel like every time I investigate moving workloads over, I'm disappointed by these stories over and over again.

128. elderlydoofus ◴[] No.36813910{3}[source]
FWIW: I am on the bottom tier of the paid plans ($29/mo) so I could get access to the email support, and even with that their response time is still not great.

I have an ongoing issue with one of my PG clusters where one of the nodes was failing and all my attempts at fixing it are failing (mainly cloning one of the other machines to bring the cluster numbers back to normal).

I emailed my account’s support email mid Friday morning last week and did not hear back until this past Monday night.

Sucks, because like a lot of others in this thread I like what Fly is trying to do and am rooting for them, but IMO they should use a significant chunk of that funding they just received on hiring a ton of SREs and front line customer support.

EDIT: I should add, the past times I have emailed them the response time was good. It's just this most recent time was so egregious (3 days!) to get even that initial response that I bring it up.

129. ◴[] No.36814012[source]
130. eduction ◴[] No.36814283{4}[source]
I went to DO's site due to your comment and I don't see anywhere where I can just get a server. Do you mean a VPS/Droplet? (I'm looking under Products and Solutions.)
replies(2): >>36814600 #>>36814680 #
131. fauigerzigerk ◴[] No.36814350{7}[source]
No I would not.

I don't like self hosting anything that requires its own process. And if I did decide to self host I would choose a more mature project.

This is a very young one man project delegating the heavy lifting to another one man project. And it doesn't appear to support social logins.

replies(1): >>36819868 #
132. cutemonster ◴[] No.36814363{9}[source]
Where does DO get their servers and data centers from? ... Apparently they run on AWS, I'm surprised
replies(1): >>36834390 #
133. tptacek ◴[] No.36814468{4}[source]
Our multiregion SQL replication architecture is the standard Postgres multiregion replication architecture. We do single-write-leader, multiple reader replicas, like everybody else does.
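For readers unfamiliar with the pattern, here is a minimal sketch of single-write-leader routing: writes go to the one leader, reads are spread across replicas. This is illustrative only and not Fly's implementation; the class name `ReplicaRouter`, its methods and the host names are hypothetical, and classifying statements by their first keyword is a deliberate simplification.

```python
import itertools


class ReplicaRouter:
    """Pick a database host per statement: leader for writes, replicas for reads."""

    def __init__(self, leader, replicas):
        self.leader = leader
        # Round-robin over replicas; fall back to the leader if there are none.
        self._replicas = itertools.cycle(replicas or [leader])

    def pick(self, sql):
        words = sql.split()
        first_word = words[0].upper() if words else ""
        # Simplification: only plain SELECTs count as reads. Statements such
        # as SELECT ... FOR UPDATE would need the leader in a real system.
        if first_word == "SELECT":
            return next(self._replicas)
        return self.leader
```

Usage: `ReplicaRouter("leader", ["replica-a", "replica-b"]).pick("INSERT ...")` returns `"leader"`, while successive `SELECT` statements alternate between the two replicas. Replicas can lag behind the leader, so a read issued right after a write may not see it.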
replies(1): >>36823069 #
134. tptacek ◴[] No.36814488{5}[source]
Obviously, we have a bunch of machines, both workers and edge servers, in Sydney. The whole Sydney region didn't go down; one worker did.
135. gregsadetsky ◴[] No.36814600{5}[source]
Not GP, but yes -- Droplets are DigitalOcean's "servers" (virtual, but nonetheless).

You boot one up in less than 30 seconds, and get ssh access to it almost immediately. It's very BS-free.

136. LtdJorge ◴[] No.36814660{6}[source]
How do you get an AS?
137. danielvaughn ◴[] No.36814680{5}[source]
The other commenter was correct - I meant a droplet. Should have been more explicit, apologies. But yeah if you're looking to learn how to work with backends, going through a droplet set up is by far the best way to get started IMO.
138. LtdJorge ◴[] No.36814722{6}[source]
I like https://github.com/goauthentik It has Helm charts and a Terraform provider.
139. tptacek ◴[] No.36814790{4}[source]
That's not what happened here. We're talking about an outage that was resolved days ago, long before this thread went up.
140. creeble ◴[] No.36814910{5}[source]
Have used both DO and Vultr for years. Put simply, DO is better, but Vultr isn’t terrible.

Higher number of outages at Vultr over 5 years, but none longer than a few hours. I can’t remember the last DO outage lasting more than a few minutes.

Experienced a Vultr routing problem that lasted several hours; they communicated about it, but it was still a long time to fix.

DO once did an auto-migration of a server to another cluster with an attendant outage that lasted a few minutes at most. No IP changes, completely transparent.

141. devjab ◴[] No.36815807{7}[source]
I think you might misunderstand me. The 2050 is a guesstimate and it's just my opinion on the matter. As far as planning ahead goes, you plan for 5-10 years when you try to figure out where to "iron" your enterprise IT. This is because that's how long your hardware will last if you go the route of renting rack space with your own hardware. I think we tend to plan for 8 years, with some space for "unintended" early failures on things like controllers after 4 years. So while you can contract big-cloud vendors for shorter, I think ours is on 3 year contracts right now, you still sort of do the business case for much longer. Maybe not every 3 years, but at least every 6 years.

You do the same on the other side of the table. Companies like Hetzner know that EU cloud solutions are likely to see growth, so it's only natural that they invest in the tech to put themselves in a prime position to jump on the opportunity. Selling a good product while you do so is the way I would do it personally, but you also have EU cloud initiatives backed by VC money going straight for the endgame.

142. okhuman ◴[] No.36819868{8}[source]
thanks for the feedback.
143. throwawaymaths ◴[] No.36823069{5}[source]
This is not standard. I see now that it is legacy, but I think it still demonstrates a bit of poor judgement. I believe it was before you were at fly, tptacek

https://fly.io/docs/getting-started/multi-region-databases/

144. xp84 ◴[] No.36823186{5}[source]
Seriously! They have an amazing article I followed one time to set up a k8s cluster to run any container I wanted with full automatic ssl provisioning/management and dns. Make a quick little yml file that includes what subdomain it wants to be and kubectl apply. The cluster was like $100 a month all-in and performed like a beast at huge traffic levels, and all I did was follow a tutorial.

I know that’s probably pretty easy for many, but I was pretty new to k8s and it felt like magic.

145. steve_adams_86 ◴[] No.36823458{4}[source]
I agree. I can either abstract with the app platform or kubernetes, or I can go straight into the box myself and do whatever needs doing. It has been a real pleasure.

I think fly’s tooling feels better than doctl, but the infrastructure is incomparable at the end of the day. doctl has improved over time too, and with added pressure from newcomers I don’t doubt that it’ll continue to improve.

146. steve_adams_86 ◴[] No.36823542{5}[source]
Same, it works really well.

Part of what inspired me to give fly.io a shot was that I didn’t love the monorepo deployment story on the app platform. Fly doesn’t have a solution to that, but I suppose I felt less tied to DO at the time because I wasn’t totally content anyways. I’ve discovered since then that I was actually doing it wrong, so I’m way happier. I’m pretty big on monorepos so their whole system fits my workflow remarkably well now.

I’d like to figure out how to prevent deployments when my code doesn’t change in one app, but does in another. At the moment, pushing anything at all will trigger all apps to rebuild and deploy again. Not a huge deal and several orders of magnitude less painful than not being able to deploy at all, haha.
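One common approach is to work out which apps a push actually touched and deploy only those. A rough sketch, assuming the list of changed files comes from something like `git diff --name-only` and that each app lives in its own directory; the function name `apps_to_deploy` and the directory layout are hypothetical, and this is not a DigitalOcean feature:

```python
from pathlib import PurePosixPath


def apps_to_deploy(changed_files, app_dirs):
    """Return the sorted names of apps whose directory contains a changed file.

    changed_files: iterable of repo-relative paths, e.g. from `git diff --name-only`.
    app_dirs: mapping of app name -> repo-relative directory for that app.
    """
    affected = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        for app, directory in app_dirs.items():
            dir_parts = PurePosixPath(directory).parts
            # Compare whole path components so "apps/api-v2" never matches "apps/api".
            if parts[:len(dir_parts)] == dir_parts:
                affected.add(app)
    return sorted(affected)
```

For example, with `{"api": "apps/api", "web": "apps/web"}`, a push that only changes `apps/api/main.py` and `README.md` yields `["api"]`. Changes to shared code outside the app directories are not handled here and would need their own rule.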

147. steve_adams_86 ◴[] No.36823571{5}[source]
Those are good things to know. I’ve been wondering about their managed databases recently, so I’ll keep that in mind.

I’m nowhere near as picky as you are, but maybe I’ll need to be at some point. As it is I mostly just build stuff and send it to the internet. If it builds and it does what I expected, I’m pretty happy! I don’t often need anything too special.

148. steve_adams_86 ◴[] No.36823603{4}[source]
You’re totally right. I kind of forgot about this, in part because I’m over their free limit. I think their static sites are still dirt cheap once you hit that limit, though. I find their pricing totally reasonable for what I need.
149. abwizz ◴[] No.36824593{9}[source]
pardon my ignorance but i cannot quite see how cooling individual machines vs. the whole rack or row makes a difference in total heat production per machine
150. apple4ever ◴[] No.36827607{4}[source]
I really love DO except for one thing - you can't run your own firewall/router there (like OPNsense). Really hard to link systems together.
151. re-thc ◴[] No.36834390{10}[source]
> Apparently they run on AWS, I'm surprised

They don't run on AWS. Not sure what sort of rumors are running :(

> data centers from?

The major players e.g. Equinix, Coresite, etc. Varies per location. Even AWS don't build most of their data centers.

152. michaelcampbell ◴[] No.36834710{4}[source]
I find myself going to DO docs on various setup things even when I'm not using said thing on DO (although I'm also a DO customer, and love them for the reasons you've stated).
153. mffap ◴[] No.36846920{7}[source]
zitadel supports service users with rbac. maybe give it a look/try: https://github.com/zitadel/zitadel
154. hardwaresofton ◴[] No.36857655{5}[source]
Hey would you be into trying a managed service platform I'm building for Hetzner? It's called Nimbus[0].

I'd love some feedback, specifically:

- Which services do you most want to use/have managed

- What databases do you find yourself using the most

- Concerning caches, do you use memcached or mostly Redis?

[0]: https://nimbusws.com

155. hardwaresofton ◴[] No.36857753{5}[source]
Hey I'm building a managed service platform (not quite an app store!) on top of Hetzner -- would you be interested in trying it out?

Contact is in my profile but I'd love to have some more people kick the tires and tell me what they want built the most.