Reliability: It’s not great

(community.fly.io)

1226 points bishopsmother | 2 comments | 06 Mar 23 17:47 UTC

throwawaaarrgh ◴[07 Mar 23 04:27 UTC] No.35051550[source]▶

I've been doing reliability stuff for near two decades. The one thing I am sure of is there is no way to just engineer your way to reliability. That is to say, no person, no matter how smart, can just invent some whizbang engineering thing and suddenly you have reliability.

Reliability is a thing that grows, like a plant. You start out with a new system or piece of software. It's fragile, small, weak. It is threatened by competing things and literal bugs and weather and the soil it's grown in and more. It needs constant care. Over time it grows stronger, and can eventually fend for itself pretty well. Sometimes you get lucky and it just grows fine by itself. And sometimes 50 different things conspire to kill it. But you have to be there monitoring it, finding the problems, learning how to prevent them. Every garden is a little different.

It doesn't matter what a company like Fly does technology-wise. It takes time and care and churning. Eventually they will be reliable. But the initial process takes a while. And every new piece of tech they throw in is another plant in the garden.

So the good news is, they can become really reliable. But the bad news is, it doesn't come fast, and the more new plants they put in the ground, the more concerns there are to address before the garden is self-sustaining.

replies(7): >>35051647 #>>35052736 #>>35052993 #>>35053029 #>>35053323 #>>35056046 #>>35056972 #

TheDong ◴[07 Mar 23 09:20 UTC] No.35053323[source]▶

>>35051550 #

> The one thing I am sure of is there is no way to just engineer your way to reliability. That is to say, no person, no matter how smart, can just invent some whizbang engineering thing and suddenly you have reliability.

It seems true for fly's problem space, but in many problem spaces there really are easy engineering solutions to reliability problems.

For a very easy example, I once worked on a rails app that crashed frequently and managed 5 req/s at best. It turns out the app only loaded static data from hardcoded json files on disk and templated that into stuff. In other words, it was a static site. Replacing it with an actual static site + nginx and a cdn instantly fixed all reliability issues for that website forever, and made it easier to maintain the content to boot.

replies(1): >>35053748 #

1. machinawhite ◴[07 Mar 23 10:28 UTC] No.35053748[source]▶

>>35053323 #

I'm actually surprised such a simple app would have such bad performance and crash at all?

replies(1): >>35063977 #

2. TheDong ◴[08 Mar 23 01:28 UTC] No.35063977[source]▶

>>35053748 (TP) #

I don't think the fact that it did effectively:

    data_1 = `cat ./data1.json | grep "city" | awk ....`
    data_2 = `cat ./data2.json | grep "city" | awk ....`

was exactly helping it to perform well. I'm sure rewriting the rails app to load all the data at startup, instead of reading each file via several hundred subshells on each request, would have made it perform well enough.

However, pretty much no matter how well or poorly the rails site is built, a static site will be easier to run reliably.
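
For illustration only, the "load all the data at startup" alternative mentioned above might look roughly like this in plain Ruby. This is a sketch, not the original app's code: `load_static_data` and `cities` are hypothetical names, and it assumes each JSON file holds an array of objects that may carry a "city" key.

```ruby
require "json"

# Hypothetical loader: parse every JSON file in a directory once, at boot,
# and keep the parsed data in memory keyed by file name (without ".json").
def load_static_data(dir)
  data = Dir.glob(File.join(dir, "*.json")).sort.to_h do |path|
    [File.basename(path, ".json"), JSON.parse(File.read(path))]
  end
  data.freeze # shallow freeze: the top-level hash cannot be reassigned per request
end

# Hypothetical per-request lookup: reads from memory, so no file I/O
# and no subshell is involved. Rows without a "city" key are skipped.
# Assumes each file parsed to an array of hashes.
def cities(data, name)
  data.fetch(name).map { |row| row["city"] }.compact
end
```

In a Rails app the loader would typically run once from an initializer, with request handlers calling only the in-memory lookup. That removes the per-request process spawning, though as noted above a static site is still simpler to run.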
