797 points burnerbob | 1 comment
tptacek No.36810326
Y'all, this is going to be deeply unsatisfying, but it's what I can report personally:

I have no earthly clue why this thread on our community site is unlisted.

We're looking at the admin UI for it right now, and there's like, a little lock next to the story, but the "unlist story" option is still there for us to click. The best I can say is: I'm reasonably sure there wasn't some top-down edict to hide this thread (the site is public, anybody can sign up for an account and see the thread).

Say what you want about us, but hiding out from stuff like this isn't one of our flaws. When I find out more about what happened with this thread, I'll let you know (or Kurt will reply here and tell me I'm wrong).

I don't know enough about what happened with this Sydney server to be helpful to people who had instances running on it. When I know more about it, I'll be helpful, but I'm just learning about this stuff right now, after getting back in from a night out.

Almost immediately afterwards

It looks like... all the posts in the app-not-working category are "private"? Like it's some setting on the category itself? "Private" here means you need to have signed up for a Discourse account to see them?

replies(12): >>36810339 #>>36810345 #>>36810393 #>>36810467 #>>36810497 #>>36810498 #>>36810755 #>>36810983 #>>36812367 #>>36812723 #>>36812856 #>>36834726 #
1. gerhardlazu No.36834726
I really like the work that you're doing, Thomas. This is the right approach. FWIW, https://fly.io/blog/carving-the-scheduler-out-of-our-orchest... is one of my favourite posts on your blog.

For everyone else reading this, we have been running https://changelog.com on Fly.io since April 2022. This is what our architecture currently looks like: https://github.com/thechangelog/changelog.com/blob/master/IN...

After 15 months & more than 100 million requests served by our Phoenix + PostgreSQL app running on Fly.io, I would be hard pressed to find a reason to complain.

- Some deploys failed, and re-running the pipeline fixed it.
- Early July 2023, 9k requests from Frankfurt returned 503s. Issue lasted 10 seconds.
- While experimenting with machines, after many creations & deletions, one volume could not be deleted. Next day, the volume was gone.

That's about it after 15 months of running production workloads on Fly.io.

We often talk about our Fly.io experience in our Kaizen pod episodes, which we publish every ~2 months: https://changelog.com/topic/kaizen. For anyone curious, this is the episode in which we announced the migration: https://changelog.com/shipit/50. There is a detailed PR which goes with it: https://github.com/thechangelog/changelog.com/pull/407. We've been talking about our migration plan from apps v1 (Nomad) to apps v2 (flyd) recently: https://changelog.com/friends/2#transcript-138

I'm sorry to hear that many of you didn't have the best experience. I know that things will continue improving at Fly.io. My hope is that one day, all these hard times will make for great stories. This gives me hope: https://community.fly.io/t/reliability-its-not-great/11253

Keep improving.