797 points burnerbob | 56 comments
1. tptacek ◴[] No.36810326[source]
Y'all, this is going to be deeply unsatisfying, but it's what I can report personally:

I have no earthly clue why this thread on our community site is unlisted.

We're looking at the admin UI for it right now, and there's like, a little lock next to the story, but the "unlist story" option is still there for us to click. The best I can say is: I'm reasonably sure there wasn't some top-down edict to hide this thread (the site is public, anybody can sign up for an account and see the thread).

Say what you want about us, but hiding out from stuff like this isn't one of our flaws. When I find out more about what happened with this thread, I'll let you know (or Kurt will reply here and tell me I'm wrong).

I don't know enough about what happened with this Sydney server to be helpful to people who had instances running on it. When I know more about it, I'll be helpful, but I'm just learning about this stuff right now, after getting back in from a night out.

Almost immediately afterwards

It looks like... all the posts in the app-not-working category are "private"? Like it's some setting on the category itself? "Private" here means you need to have signed up for a Discourse account to see them?

replies(12): >>36810339 #>>36810345 #>>36810393 #>>36810467 #>>36810497 #>>36810498 #>>36810755 #>>36810983 #>>36812367 #>>36812723 #>>36812856 #>>36834726 #
2. sho ◴[] No.36810339[source]
> I have no earthly clue why this thread on our community site is unlisted.

Maybe it's hosted in the SYD region

replies(1): >>36810414 #
3. teraflop ◴[] No.36810345[source]
There's also a lock icon next to the "App not working" category in the header, which I took to mean that that entire category is hidden from logged-out users (which experimentally seems to be the case).
replies(1): >>36810355 #
4. tptacek ◴[] No.36810355[source]
I have the impression from this thread that this thread was public (as in, would work if you just linked to it from something like HN) earlier, and now it isn't?

Obviously, deliberately hiding a negative story on our Discourse is a little like deleting a bad tweet; it's just going to guarantee someone captures and boosts it. We have a lot of flaws! But not knowing how the Internet works probably isn't one of them. No idea what's going on here, still trying to work it out.

replies(1): >>36810381 #
5. teraflop ◴[] No.36810381{3}[source]
Yes, from the Google-cached version, it appears that the thread previously didn't have the app-not-working tag; it was only tagged with "rails".

Not going to try and guess why or when that tag change happened. Personally, I'm less concerned with this particular thread than with the apparent decision to systematically hide all potentially-negative threads from search engines.

replies(1): >>36810486 #
6. michaeldwan ◴[] No.36810393[source]
I don't know why the app-not-working category effectively delists threads, but until we find out, I just removed it so this thread is public again.
replies(1): >>36810968 #
7. tptacek ◴[] No.36810414[source]
It's hosted by Discourse.
replies(1): >>36810432 #
8. NBJack ◴[] No.36810432{3}[source]
Another good reason to avoid that platform like the plague.
replies(2): >>36810466 #>>36810731 #
9. sethherr ◴[] No.36810466{4}[source]
What are the other good reasons? All my experiences with Discourse have been great.
replies(1): >>36811378 #
10. subarctic ◴[] No.36810467[source]
Glad to see you commenting here about this, I literally just posted a comment about how it's really messed up that you guys would do that
11. michaeldwan ◴[] No.36810486{4}[source]
That category was added after one of our support folks replied, likely for tracking. I don't know why it's private. They may not even know this category is private. Hiding negative shit wasn't a deliberate decision... we're aware of google cache and we don't need to give HN another reason to dunk on us.
replies(1): >>36810511 #
12. xupybd ◴[] No.36810497[source]
My takeaway from this is that I could get fired for picking Fly.io for work. Not because there was an outage but because days could pass before getting support.

What assurances could you give the community here that the support would be better next time?

replies(4): >>36810530 #>>36810775 #>>36811112 #>>36812013 #
13. throwaway220033 ◴[] No.36810498[source]
It looks like being authentic is valued over anything else at Fly. I can’t explain how a company responds this immaturely to incidents like these.
replies(4): >>36810505 #>>36810750 #>>36811004 #>>36811644 #
14. tptacek ◴[] No.36810505[source]
We're just people. We don't have the part of the company that keeps us from communicating like people in public. Maybe we'll grow it someday.
replies(2): >>36812086 #>>36812430 #
15. teraflop ◴[] No.36810511{5}[source]
> That category was added after one of our support folks replied

FYI, this doesn't appear to be strictly accurate. The OP commented at 23:52 UTC saying that the thread had been made private, and the reply from "Sam-Fly" was not posted until 02:36 UTC.

replies(1): >>36810582 #
16. tptacek ◴[] No.36810530[source]
This is our public site, for people who don't have support plans with us.

It's difficult for me to say more about what happened here and how you might have handled it, because I don't know what happened with this SYD host, because it's 1AM and the people who worked on it are, I assume, asleep. When I know more, I'll do my best to get you a postmortem.

replies(1): >>36810889 #
17. michaeldwan ◴[] No.36810582{6}[source]
My point was that the app-not-working category is used in conjunction with support/our team getting involved. I assume this is what Sam meant by "flagged it internally", which was followed by investigation, then a post. I don't see how the timestamps uncover something nefarious.
18. ◴[] No.36810731{4}[source]
19. subarctic ◴[] No.36810750[source]
If you're talking about the comment you're replying to, tbh I found it was way more relatable than a more "professional" PR-speak response. Maybe you were talking about something else
replies(1): >>36812031 #
20. ◴[] No.36810755[source]
21. tinco ◴[] No.36810775[source]
Try filing a bug with any of the big three cloud vendors when you're on their free plan. It's really not different, the thing that is going to get you fired is not realizing you're not paying a couple hundred bucks per month for premium service on the infrastructure that is mission critical to your company.
replies(2): >>36810915 #>>36814436 #
22. xupybd ◴[] No.36810889{3}[source]
>This is our public site, for people who don't have support plans with us

To be honest, that's enough for me. Sorry I didn't pick up on that.

23. xupybd ◴[] No.36810915{3}[source]
Funny story, when I started my current role I researched our hosting provider. I couldn't find the matching invoices in the accounting system. So I called the vendor, a local company. They'd not set our account up correctly, billing was not enabled. Since then we've been billed. I'm glad we sorted it but it wasn't a good look to start my role by increasing our spending.
replies(2): >>36811268 #>>36811427 #
24. gowthamgts12 ◴[] No.36810968[source]
Maybe it's to keep search engines from scraping these threads?
replies(1): >>36814292 #
25. tacker2000 ◴[] No.36810983[source]
You might be right, but in light of this whole disaster it doesn’t sound too convincing and doesn’t make your company look good.
26. freilanzer ◴[] No.36811004[source]
I'd rather take this response and see that they're working on it than "Oopsie poopsie, our machine elves have messed up!" or corporate newspeak saying nothing.
27. ajsharp ◴[] No.36811112[source]
Lots of experience with Fly's paid support here. tl;dr Absurdly good.

FAR better wrt both response times and technical expertise than you'll get with any large public cloud provider.

I was dealing with some annoying cert + app migration stuff (migrating most of an app from AWS to Fly), and Kurt (CEO) was personally sending me haproxy configs bc I'm not smart enough to know how to configure low-level tcp stuff in haproxy. Not to put him on the spot here -- I doubt he'll have time to do that level of support going forward -- but that's my experience of the company's dedication to support and technical expertise.

28. CameronNemo ◴[] No.36811268{4}[source]
My neighbor once had a gardener who delivered no bill. For years! Then out of the blue, $4k invoice.

Trust me, you did the business a favor.

29. inferiorhuman ◴[] No.36811378{5}[source]
The interface itself?

For instance one of those things I've noticed is that most Discourse instances have those nag banners if you're not logged in begging you to log in – and that's one of the least objectionable things they do IMO. I discovered recently that Discourse also blacklists all but the most recent browsers (because Discourse is designed for the next ten years!) and serves up a plain text version on anything older… but not without a nag banner of its own admonishing you for not using a supported browser.

The infinite scrolling… ugh. I'm not a huge fan of XenForo, but as a successor to vBulletin it seems to be far more user friendly.

30. mst ◴[] No.36811427{4}[source]
I feel like starting your role by discovering a crucial service wasn't being paid for and therefore was at risk of suddenly going away should be a pretty positive thing.

However 'should' is pretty load bearing there and actual results are probably heavily dependent on management culture and the current state of office politics.

replies(1): >>36811920 #
31. arrowsmith ◴[] No.36811644[source]
Eh, I like it. It's refreshing to see a company representative communicate like an actual human being instead of the usual meaningless corporate robot-speak.
32. tinco ◴[] No.36811920{5}[source]
We had a customer once that our automatic billing system tried to reach for 3 months about failing credit card charges (<$5k/mo). Our system stopped the service.. I'm pretty sure their subsequent outage cost their customers millions. Lessons about what it means to have (and be) enterprise customers were learned. Unfortunately the lady who was ignoring our e-mails in her inbox got fired.
33. throwaway290 ◴[] No.36812013[source]
You can/should get fired for picking any plan without proper support guarantees for something serious, regardless of provider.
replies(1): >>36812443 #
34. heartbreak ◴[] No.36812031{3}[source]
Unfortunately PR-speak exists for a reason.
replies(1): >>36813363 #
35. hug ◴[] No.36812086{3}[source]
Please don’t.
replies(1): >>36812402 #
36. ◴[] No.36812402{4}[source]
37. yard2010 ◴[] No.36812430{3}[source]
Please don't.
38. yard2010 ◴[] No.36812443{3}[source]
People here said they have specifically paid for a higher support tier and got no responses.
replies(1): >>36812663 #
39. throwaway290 ◴[] No.36812663{4}[source]
If you are on paid plan and generally followed proper procedures on picking suppliers then you have no reason to be worried about getting fired.
40. solarkraft ◴[] No.36812723[source]
Thanks for publicly responding to the criticism, that can't be taken for granted. I hope you'll manage to actually address it.
41. marcinzm ◴[] No.36812856[source]
Honest advice, probably to Kurt rather than you, is you need better processes, accountability and (probably) communication in your company. The tone of your reply (and other communications from fly.io) is reflective of the lack of those things given the public sentiment regarding fly.io. At 60+ employees and so many issues that tone goes from humanly endearing to indicative of a non-scaling business. Other replies indicate you don't want the things (process, oversight, etc.) that a growing B2B business needs to really succeed which is not a good sign. Sure there's a cost to that corporate-ness and you want to minimize that cost but it's also a necessary evil for the business you're in at the scale you're at.

If something breaks once it's an accident, if it breaks twice it's bad luck, but if it breaks down three times it's broken processes. Based on the comments here, things break at fly.io a lot more often than three times.

replies(2): >>36814258 #>>36820707 #
42. mewmew07 ◴[] No.36813358[source]
you have no idea wtf you're writing about; it's been a few hours now and it's become clear that someone tagged the post as 'app-not-working', which made the post 'private' and only available to logged-in users. it's also become apparent that the linked post is on a community forum for users without a support plan.

the dramatic tone and accusations in your reply are not warranted anymore

43. arrowsmith ◴[] No.36813363{4}[source]
But is it a good reason?
44. tptacek ◴[] No.36814258[source]
I'm just a person on Hacker News that happens to be at Fly.io; as I've said before, it's probably reasonable to think of me as an HN person first, and a Fly.io person second. My tone is my tone, and has been for the many years I've participated in this community. I got back from an evening out, saw that we were on the front page, poked around a little to find out what the hell was going on, and did my best to add some context. That's all.

If you're reading my comments on HN as some kind of official response from the company, you've misconstrued them.

replies(3): >>36814552 #>>36815698 #>>36816713 #
45. tptacek ◴[] No.36814292{3}[source]
My understanding is that it was causing support problems, because people were Googling for solutions to problems with their apps (because of the Heroku diaspora, we have a lot of first-time Docker users), finding old stale threads on our forum that looked related, and then reviving them.

I think we can just `noindex` the category instead of making it private?
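
For context on the `noindex` idea: it is a standard robots directive that a page can send either in an `X-Robots-Tag` response header or in a `<meta name="robots">` tag, so the page stays publicly readable while asking search engines not to index it. The sketch below is only an illustration of how one might check a page for that directive. The helper name `is_noindexed` is hypothetical, it ignores bot-specific forms such as `googlebot: noindex`, and it says nothing about how Discourse itself exposes the setting.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            content = attr_map.get("content", "")
            self.directives.extend(d.strip().lower() for d in content.split(","))


def is_noindexed(headers, html):
    """Return True if the header or the markup carries a plain noindex directive.

    Hypothetical helper: `headers` is a dict of response headers,
    `html` is the page body as a string.
    """
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]

    parser = _RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives
```

A page that passes this check can still be opened by anyone who has the link, which is the difference from a private category that requires a login.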

replies(1): >>36820120 #
46. ctvo ◴[] No.36814436{3}[source]
> Try filing a bug with any of the big three cloud vendors when you're on their free plan.

A host being down for 3 days isn’t a bug. And you can contact AWS support, even on the free plan, and get a reply. Try it yourself. The great thing about AWS and the other cloud providers? If a host has issues they email all customers with workloads on it so you don’t need to refresh or check a forum.

I understand fly is a community darling. They’re unreliable, with poor support currently. Maybe the dev experience is great and that makes up for it, but pretending like everything else is equally shitty? Not true.

47. Aurornis ◴[] No.36814552{3}[source]
> If you're reading my comments on HN as some kind of official response from the company, you've misconstrued them.

For what it’s worth, this is the reason most companies eventually restrict their employees from making statements about the company: it doesn’t matter if you thought it was clear that it was unofficial, any statement from an employee in a position of power (such as someone with access to the control panel) will be perceived as a communication from the company.

You may have intended it to be a personal remark about your job, but there are a lot of people in this thread looking for any communication they can get about the company.

When you step in to fill that void as a person who appears to have access and power within the company, you are the official communication whether you intend to be or not.

replies(1): >>36814658 #
48. tptacek ◴[] No.36814658{4}[source]
Maybe I'll get restricted someday!
replies(1): >>36814947 #
49. urduntupu ◴[] No.36814947{5}[source]
For the sake of fly.io, you should either restrict yourself and not respond or, if you can't resist, make it crystal clear, that you DO NOT represent fly.io. Your first message can and will be misunderstood and it DOES throw a poor light on fly.io.

I am a paying customer of fly.io, on the Scale plan.

replies(1): >>36816090 #
50. fdsadsaf ◴[] No.36815698{3}[source]
TBH I thought you were replying as the CEO of fly.io since 1) I've seen them post here before, 2) I have no idea how big fly.io's staff is and 3) your post didn't otherwise describe who you were. It doesn't look like I was the only one to be confused.

If you had said "thoughts are my own; I just work there" or something I think it would have been more clear.

51. tptacek ◴[] No.36816090{6}[source]
Please feel free to reach out directly with your concerns. I'll certainly read any email you send me.
52. marcinzm ◴[] No.36816713{3}[source]
It seems you took my comment personally but it was about not just your comments but the overall tone of the fly.io communication (see recent blog post regarding funding) and approach to issues (three days of silence on a dead instance). You view processes and guidelines as chains versus as a ladder to help you climb a cliff. If the processes and communication were good then you'd know when you should self-restrict and when you shouldn't. You'd be empowered to make decisions within a framework that benefits fly.io the most versus being left to guess yourself. You'd understand why you should do that sometimes and why it's a better option for everyone.
replies(1): >>36816754 #
53. tptacek ◴[] No.36816754{4}[source]
I don't, but that's fine: it's not important that we understand each other all that clearly here, since all I'm talking about is how our public forum works.
54. yencabulator ◴[] No.36820120{4}[source]
So the tagged posts were intentionally hidden, then.
55. camgunz ◴[] No.36820707[source]
For an opposing viewpoint: I don't want HN to become the place where corporate comms comes to bullshit us. I want engineers who work there to talk to us as peers, which seems like what's happening here. I get candor and humility (and playfulness, sure) from Fly's tone, which I appreciate.

I get stuff like this is frustrating. But I bet Fly staff are pretty frustrated too.

56. gerhardlazu ◴[] No.36834726[source]
I really like the work that you're doing Thomas, this is the right approach. FWIW, https://fly.io/blog/carving-the-scheduler-out-of-our-orchest... is one of my favourite posts on your blog.

For everyone else reading this, we have been running https://changelog.com on Fly.io since April 2022. This is what our architecture currently looks like: https://github.com/thechangelog/changelog.com/blob/master/IN...

After 15 months & more than 100 million requests served by our Phoenix + PostgreSQL app running on Fly.io, I would be hard pressed to find a reason to complain.
- Some deploys failed, and re-running the pipeline fixed it.
- Early July 2023, 9k requests from Frankfurt returned 503s. Issue lasted 10 seconds.
- While experimenting with machines, after many creations & deletions, one volume could not be deleted. Next day, the volume was gone.

That's about it after 15 months of running production workloads on Fly.io.

We often mention our Fly.io experience in our Kaizen pod episodes, which we publish every ~2 months: https://changelog.com/topic/kaizen. For anyone curious, this is the episode in which we announced the migration: https://changelog.com/shipit/50. There is a detailed PR which goes with it: https://github.com/thechangelog/changelog.com/pull/407. We've been talking about our migration plan from apps v1 (Nomad) to apps v2 (flyd) recently: https://changelog.com/friends/2#transcript-138

I'm sorry to hear that many of you didn't have the best experience. I know that things will continue improving at Fly.io. My hope is that one day, all these hard times will make for great stories. This gives me hope: https://community.fly.io/t/reliability-its-not-great/11253

Keep improving.