Every time I’ve tried Fly (trust me I’ve wanted to love it), there’s always a rough edge or the service breaks for me.
First time I tried it, the web panel wasn’t even loading. Second time, months later, everything was 500ing and I couldn’t find a way to SFTP into a disk (!!!). Total dealbreaker.
This was easily done in Render.com with an even more magical experience. Deploy from a GitHub repo and I was live in minutes. Upload the files from local and done.
I want to love Fly so much. I align with their mission. I love their first class Elixir support. But so far I’m not impressed.
It looks to me like Render is seriously taking the PaaS crown at the moment, with innovation after innovation, affordable pricing and excellent user experience.
I needed to move the Ghost assets directory for all the posts.
This was a ten minute thing in Render. SOL in Fly.
Until another service has all of this, we’re sticking with Heroku.
It's not really an excuse, just a reason it's taking longer to solve than we'd like.
The errors on the web UI sucked. These have improved drastically in the last three months (because we have smart, dedicated people working on fullstack for us now).
Will keep an eye on the changelogs to see when I can test deploy my apps.
To me, flyctl deploy is perhaps even better, because it is VCS agnostic and integrates with existing pipelines. I think it is fully worth it.
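To give a flavor (my sketch, not from the parent comment): in any CI system, deploying is just a token plus a couple of commands, something like:

  # minimal CI deploy step; assumes the CI secret store injects FLY_API_TOKEN
  # and that fly.toml sits at the repo root
  curl -L https://fly.io/install.sh | sh        # install flyctl on the runner
  export PATH="$HOME/.fly/bin:$PATH"
  flyctl deploy --remote-only                   # build on Fly's builders, then release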
Fly.io seems cutting-edge, but I feel I would not profit from their multi-region, close-to-the-user infrastructure. So what are their tradeoffs? Render.com appears more complete (?) and cheaper. But they don't have the same elegant database backups or the pipeline with review apps.
I hear complaints about chat distractions and see engineers create those distractions. I’m at a loss why we want to do that to ourselves?
Never mind that it's one more pipeline for messages to get lost in. It's needless complexity and configuration too.
edit: As a note, I have not tried Render. I am sure it is fine too. I found Fly first and it satisfies my needs well enough that I don’t feel it is necessary to keep searching, though I wouldn’t mind checking it out just to see what it has to offer.
One thing for me - fly.io is cool, does a lot of cool fancy things. However, the basic PaaS stack from Heroku gets a bit lost / isn't always fully there for me. For a while they didn't really talk about their container deploy story (AWS App Runner / AWS Fargate / AWS ECS). Digging in, it's all doable, but they do so many cool things it sometimes isn't obvious to me how the basics go. That's changed in the last year (at least looking at the docs)? I also had some bobbles in the UI in the past.
I don't need multi-region in my case, but I do want low latency, for example. Easy to find this on aws (https://www.cloudping.info/) - a bit harder to get all regions with a ping endpoint on fly.io. I went looking; they have a demo app that, I think, tells you what they see as your closest region, but I wanted to roll my own cloudping.info-style approach and it wasn't obvious how to do so. I'm getting about 8ms from my home computer to my closest AWS region.
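(For what it's worth, a crude roll-your-own probe is just timing requests against some Fly-hosted endpoint. debug.fly.dev below is my assumption for the demo app's hostname; swap in your own deployed app if that's wrong:)

  # cloudping.info-style latency check against a Fly edge
  for i in 1 2 3; do
    curl -so /dev/null -w 'connect: %{time_connect}s  total: %{time_total}s\n' \
      https://debug.fly.dev/
  done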
The basic story I need is: github commit -> goes to dev for review -> promote to production. I happen to use containers now (even if inefficient) because getting stuff "working" is often easier -> if it's working in the container on my machine, it'll work up in the cloud.
That said, there is definitely I think a crop of AWS / Heroku competitors that are going to start to be competitive (think cloudflare on primitives, fly.io and render etc on PaaS style).
Sounds like mission accomplished; the elusive 1.0.
We (Fly.io) intentionally didn't build a pipeline replacement. We exist for one reason: to run apps close to users. We're just now to the size where we can do that well. It'll be a while before we can get good at a second thing. Heroku shipped them something like 8 years after they launched.
At the same time, GitHub actions and Buildkite are _very_ good. They're less opinionated than Heroku Pipelines, but I don't regret figuring out how to use them for my own projects.
I think there's a chance that emulating Heroku that closely is a path to failure in the 2020s.
If I could sum it up, it would be that the dev ux needs a lot of work, and it seems like they are mostly focused on the fundamentals of the platform first.
Following their guide, you get Postgres not spinning up and linking to your app correctly, and you have to nuke your entire app.
The billing UI is weird and feels cobbled together.
I don't feel secure using Fly right now. But again, they are doing cutting edge shit and are probably focused on the underlying bedrock of the platform. They can always circle back to polish the dev ux.
Right now we're on Render.com and it does absolutely everything we want with wonderful dev ux.
In my mind it's a race: can fly catch up to render's UX before render can catch up to fly's global mesh load balancing? We'll see who wins.
very satisfied so far and would definitely deploy there next time!
the killer feature i like the most is automatic prometheus metrics collection
one thing i don't really like about fly.io is the fact that they charge money for free Let's Encrypt SSL certs
1. Multi-region deployment only works if your database is globally distributed too. However, making your database globally distributed creates a set of new problems, most of which take time away from your core business.
2. File persistence is fine but not typically necessary. S3 works just fine.
It's easy to forget that most companies are a handful of people or just solo devs. At the same time, most money comes from the enterprise, so products that reach sufficient traction tend to shift their focus to serving the needs of these larger clients.
I'm really glad Heroku froze when it did. Markets always demand growth at all costs, and I find it incredibly refreshing that Heroku ended up staying in its lane. IMO it was and remains the best PaaS for indie devs and small teams.
On Slack and other chat solutions, it's possible to set up a #operations style single-pane-of-glass channel with deploy notifications, error alerting, and customer communication platforms all pushing to the same place. If an incident occurs, engineers, product, support, etc. can all collaborate around what's going on in real-time without needing to ask "hey has someone updated the trust site yet?" or click into 10 different tabs.
It's honestly pretty good when it works well and really bad and noisy when it doesn't, but it has a place.
Non-engineering are usually in Slack too, which really helps when support, product, or the field need quick answers to easy questions like "has this commit been deployed yet today?"
I hate services that don't put a price on things like bandwidth (because there's always a price!). So we priced bandwidth and made it transparent. You can put an app on Fly.io and serve petabytes of data every month, if you want. We'll never complain that you're serving the wrong content type.
But the reality is – having an unlimited bandwidth promise is perfect for a fire-and-forget blog site. We're not doing ourselves any favors with scary pricing for that kind of app.
Guess what? fly.io offers a turnkey distributed/replicated Postgres for just this reason. You use an HTTP header to route writes to the region hosting your primary.
https://fly.io/docs/getting-started/multi-region-databases/
You do still need to consider the possibility of read replicas being behind the primary when designing your application. If your design considers that from day 1, I think it takes less away from solving your business problems.
Alternatively, you can also just ignore all the multi-region stuff and deploy to one place, as if it was old-school Heroku :-)
if something goes crazy and you end up using a wild amount of outbound data, it looks like the next jump up is only to $12
[0] - https://docs.celeryq.dev/en/stable/userguide/periodic-tasks....
Doesn't this take away a lot of the benefits of global distribution?
For example if you pay Fly hundreds of dollars a month to distribute your small app in a few datacenters around the globe but your primary DB is in California then everyone from the EU is going to have about 150-200ms round trip latency every time you write to your DB because you can't get around the limitations of the speed of light.
Now we're back to non-distributed latency times every time you want to write to the DB which is quite often in a lot of types of apps. If you want to cache mostly static read-only pages at the CDN level you can do this with a number of services.
Fly has about 20 datacenters; hosting a small'ish web app that's distributed across them will be over $200 / month without counting extra storage or bandwidth, just for the web app portion. Their pg pricing isn't clear, but a fairly small cluster is $33.40 / month for 2GB of memory and 40GB of storage. Based on their pricing page it sounds like that's the cost for 1 datacenter, so if you wanted read-replicas in a bunch of other places it adds up. Before you know it you might be at $500 / month to host something that will have similar latency on DB writes as a $20 / month DigitalOcean server that you self-manage. Fly also charges you $2 / month per Let's Encrypt wildcard cert, whereas that's free from Let's Encrypt directly.
A feature that could help would be giving people the option to set a cost limit, where if their site surpasses that limit in a given month you just pull it offline instead of charging more money. That's what I'd want for my blog site, and I've heard others request such a feature from other cloud providers
You'd still be sending writes to a single region (leader). If the leader is located across the world from the request's origin, there will be significant latency. Not to mention you need to wait for that write to replicate across the world before it becomes generally available.
So, yes, some users will have to pay for certificates, but it seems extremely reasonable to me.
Cloudflare says it won't restrict media (video, images, and audio) under its unlimited bandwidth promise for Workers and R2 (though the ToS doesn't yet reflect that).
https://news.ycombinator.com/item?id=28682885
> But the reality is – having an unlimited bandwidth promise is perfect for a fire-and-forget blog site
I think an auto flyctl pause -a <myapp> when myapp exceeds $x in charges (with an auto-resume when the billing rolls over) may serve as a viable interim solution. Maybe this is already possible with fly.io's graphql api?
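Something like this guard, run on a schedule, is what I'm imagining (a sketch; get_month_spend is a hypothetical helper, e.g. the GraphQL billables query posted elsewhere in this thread):

  #!/usr/bin/env bash
  # hypothetical budget guard: scale the app to zero once month-to-date
  # spend crosses a threshold (re-enabling next month is left as an exercise)
  app="myapp"; limit_usd=10
  spend=$(get_month_spend "$app")               # hypothetical helper
  if [ "$(echo "$spend > $limit_usd" | bc)" -eq 1 ]; then
    flyctl scale count 0 -a "$app"              # stop all instances
  fi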
Things like alerts are fine, professionally, but not for things like running a small app, blog or whatever, where you're not sure where it's heading.
I don't think anything I've built on my own time has ever ended up breaking my bank, but signing up my credit card is a risk I'm never going to take, and I'm fairly certain I'm not alone in that. Of course I have no idea if there are enough of us to make small-scale fixed-price products profitable at scale.
HTTP requests that write to the DB are basically the same speed as "Heroku, but in one place". If you're building infrastructure for all the full stack devs you can target, this is a good way to do it.
Distributing write heavy work loads is an application architecture problem. You can do it with something like CockroachDB, but you have to model your data specifically to solve that problem. We have maybe 5 customers who've made that leap.
In our experience, people get a huge boost from read replicas without needing to change their app (or learn to model data for geo-distribution).
> I think there's a chance that emulating Heroku that closely is a path to failure in the 2020s.
I'm not sure I agree, considering that a different platform emulating this exact setup with ~zero configuration is basically everything we want! GitHub actions is (I agree) really great and very versatile, but I'll take Heroku's UI over digging through actions plugin documentation for hours any day.
Also any app that has global clients and terminates ssl likely benefits from edge compute.
More than a bit.
Simply give people the option to put a charge limit and let the app be offline when that limit gets hit. Don't make it the default, but do allow people to do it.
This would resolve 99% of the fear people have. And most people wouldn't set the limit anyway. However, your knowledgable people might set it, and those are the ones you're most trying to attract.
No one took us up on it. What we found is that the majority of people want their stuff to stay up, and the right UX for "shut it down so you don't get billed" is not obvious.
We ended up implementing prepayment instead. If you sign up and buy $25 in credit, we'll just suspend your apps when the credit runs out.
Bandwidth is weird because we have to pay for it (as does every other provider). We aren't yet in a position where we can just make it free without limits. Maybe next year. :)
And why would you get 20 instances, all around the world right out of the gate? 6-7 probably do the job quite well, but maybe you don’t even need that many. Depending on where most of your customers are, you could get good results with 3-4 for most users.
We've just now grown large enough to have people focus full time on the in browser UX. If you feel like fiddling around again in a few months, let me know and I can hook you up with some credits. :)
The billing UI is definitely cobbled together. This is because I built it over a weekend and it's marginally better than "no billing UI". I have learned that if I'm building features, they probably aren't gonna be very good.
It's understandable if your usage data showed the fee-capping feature just wasn't popular enough to be worth maintaining, though that would surprise me based on this thread (but possibly HN just isn't representative of the whole market)
I think that Render solved "Cron as a Service" beautifully:
Thanks, can you give an example of how that works? Did you write your own fork of Postgres or are you using a third party solution like BDR?
Also do you have a few use cases where you'd want writes being dependent on another write?
> 6-7 probably do the job quite well
You could, let's call it 5.
For a 2gb set up would that be about $50 for the web app, $50 for the background workers, $160ish for postgres and then $50 for Redis? We're still at $300+?
I was thinking maybe 5 background workers wasn't necessary, but frameworks like Rails will put a bunch of things through a background worker where you'd still want low latency, because it's not only things like sending an email where a 2-second delay behind the scenes doesn't matter. It's performing various Hotwire Turbo actions which render templates and modify records, where you'd want to see those things reflected in the web UI as soon as possible.
In order to increase transparency on Hacker News, it would be nice if the title was changed to include the fact that it's backed by Y Combinator
https://www.ycombinator.com/companies/fly-io
--
I personally don't think it's better than Heroku: you have far fewer features, Heroku is much cheaper + they have an unbeatable free tier
* Render.com charges $0.60 per custom domain after the first 25
* Heroku gives you "free" custom certificates once you're on a $7/mo minimum.
Review apps on Render are called Preview Environments: https://render.com/docs/preview-environments
For larger teams, having a well-defined API that delineates applications from infrastructure, without requiring extreme specialist knowledge (it still requires some, but vastly less than direct manipulation of resources via something like Terraform), is a massive productivity boost.
Of course none of that matters if you have 4 developers like OP, but for folks like myself that routinely end up at places with 300+ engineers, it's a huge deal.
I think if fly succeeds, they need to figure out edge IaaS, and not put all their eggs into edge PaaS. And I hope they do! I'm curious what a successful edge IaaS looks like!
I have the same complaint all the way down to simple sysadmin tasks. Ex: MS365 has a lot of churn on features and changes. It’s like they think everyone has a team of admins for it when in reality a lot of small businesses would be satisfied with a simple, email only product they can manage without help.
I thought Cedar was going to fall over years ago but ironically I think people migrating off the platform are helping it stay alive.
I guess that's acceptable because people don't really look at the feedback: why do users add the same thing to the list twice, why does everyone hit the refresh button after adding an item to the list, etc. It's because the bug happens after the user is committed to using your service (contract, cost of switching too high; so you don't see adding the cache layer correspond to "churn"), and because it's annoying but not annoying enough to file a support ticket (so you don't see adding the cache layer correspond to increased support burden).
All I can say is, be careful. I wouldn't annoy my users to save a small amount of money. That the industry as a whole is oblivious to quality doesn't mean that it's okay for you to be oblivious about quality.
(Corollary: relaxing the transactional isolation level on your database to increase performance is very hard to reason about. Do some tests and your eyes will pop out of your head.)
Trouble is that Erlang ran all the important Cedar code (it might still today) and the Erlang engineers didn't particularly like the news that Erlang code was essentially deprecated so they left and nobody knew how to maintain the stack. This definitely wasn't the only problem we had but it was a big one.
What do fellow Herokai think? Was Dogwood a fool's errand? Or did we just not get enough staff to build it properly?
I think this is the first time I've heard somebody say one of the benefits of kubernetes was productivity.
In about 15 minutes I was able to take my site from localhost to a custom domain with SSL with just a little more than a git push. I can't think of many solutions that are simpler than that.
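For reference, the whole flow was roughly this (from memory, so double-check flags against the docs):

  flyctl launch                        # detect the app, write fly.toml, first deploy
  flyctl certs add www.example.com     # Let's Encrypt cert for the custom domain
  flyctl deploy                        # each subsequent release

Plus pointing a CNAME at the app, per the instructions certs add prints.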
Right now your choices are: run a database in one region and:
1. Use the weird HTTP header based cache API with a boring CDN
2. Write a second, JS based app with Workers or Deno Deploy that can do more sophisticated data caching
3. Just put your database close to users. You can use us for this, or you can use something like Cloudflare Workers and their databases.
My hot take is: if something like Fly.io had existed in 1998, most developers wouldn't bother with a CDN.
Weirdly, most Heroku developers already don't bother with a CDN. It's an extra layer that's not always worth it.
Do they just decide to not profit from bandwidth or are they doing something special that allows them to be so cheap?
In either case I think the price still doubles because both your web app and worker need memory for a bunch of common set ups like Rails + Sidekiq, Flask / Django + Celery, etc..
I'm so glad you pointed this out. Cloud-native development is an important factor in newly architected systems. Defaulting to an S3 API for persistent I/O brings loads of benefits over using traditional file I/O, and brings significant new design considerations. Until a majority of software developers learn exactly how and why to use these new designs, we'll be stuck with outmoded platforms catering to old designs.
Hope that helps!
On top of that, most replication systems are brittle and create logistical and administrative headaches. If you can get by with just rsync, do.
Was there a reason for not using something similar to kata containers where you run a microvm but still use containers inside them? It seems like it would make such things easier while getting the isolation of a VM.
On a single server VPS I'd use Docker Compose and up the project to run multiple containers.
On a multi-server set up I'd use Kubernetes and set up a deployment for each long running container.
On Heroku I'd use a Procfile to spin up web / workers as needed.
The Fly docs say if you have 1 Docker image you need to run an init system in the Docker image and manage that in your image, it also suggests not using 2 processes in 1 VM and recommends spinning up 1 VM per process.
I suppose I was looking for an easy solution to run multiple processes in 1 VM (in this case multiple Docker containers). The other 3 solutions are IMO easy because once you learn how they work you depend on the happy path of those tools using the built in mechanisms they support. In the Fly case, not even the docs cover how to do it other than rolling your own init system in Docker.
If I have root, can I run docker-compose up in a Fly VM? Will it respect things like graceful timeouts out of the box? Does it support everything Docker Compose supports in the context of that single VM?
Their app platform's bandwidth pricing is pretty painful at $0.10/GB. With these prices, and considering the app platform lacks functionality like multi-regional droplets or VPC integration, they are a subpar choice even compared to Firebase or Amplify.
Disclaimer: I've never actually used it myself. That's mostly just what I've read and heard from people who use kubernetes.
1. Put servers where bandwidth is cheap (not Sydney, for example)
2. Constrain throughput per server
3. Buy from cheap transit providers like Cogent
Hetzner does all three. Bandwidth in the US/EU is very cheap. They meter total throughput on their services. And they use cheap providers. None of these are bad choices, just different than ours.
Our product has multiple layers, too. When you connect to a Fly app, you hit our edge, then traffic goes to a VM that's probably in another region. When you hit a hetzner server, there are no intermediate hops.
We usually pay that three times as data moves from customer VMs to our edges to end users (out from our edge to worker vm, out from worker vm to our edge, out from our edge to end user). Or 10x, in some cases, if data moves from Virginia to Chennai to an end user.
We pay $0.005/GB in the US and $0.9/GB in Chennai. You can see how this might add up. :)
There's no reason I can see why you couldn't run a VM that itself ran Docker, and have docker-compose run at startup. I wouldn't recommend it? It's kind of a lot of mechanism for a simple problem. I'd just use a process supervisor instead. But you could do it, and maybe I'm wrong and docker-compose is good for this.
What you can't do is use docker-compose to boot up a bunch of different containers in different VMs on Fly.io.
Also, I think if we start doing that to blog titles, people will complain about the opposite sort of shenanigan.
#!/usr/bin/env bash
# minimal init for two processes in one VM: start both in the background,
# then exit as soon as either one dies so the whole VM gets restarted
/app/server &
/app/server -bar &
# block until the first background job terminates (-f -n) and record its id (-p, bash 5.1+)
wait -f -n -p app ; rc=$?
printf "%s: Application '%s' exited: status '%i'\n" "$0" "$app" "$rc"
exit $rc
Most people run workers in their primary region with the writable DB, then distribute their web/DB read replicas.
Currently considering switching from Heroku, but fixed pricing is a must. I'd rather they shut down my apps temporarily in case something is out of control than go broke ;-)
Any other recommendations besides fly.io?
Heroku made it easier to deploy, but now it feels a tad more frictional than other services, including fly.io and these mentioned above. It is probably a bit outdated in that regard.
You have enumerated a lot of alternatives so far (prepaying, waiving attack costs) but you still haven't addressed the number one scenario that everybody has been asking about, and which was the reason why Heroku was such a hit: do you offer a flat fee which, if exceeded, simply shuts down the app until the next billing cycle?
We've switched to Discord (because reasons. Slack is much better for work though) and I rewrote a bunch of Slack hooks to get Discord notifications.
> I hear complaints about chat distractions and see engineers create those distractions.
People will complain no matter what.
So it ended up hardly making sense to deploy anything less than like 80 racks per site, at which point it's basically a small region minus a few small pieces.
Then there's just the risk that the people who wanted whatever special GPU or SSD combination would quit wanting them and they'd just sit there unused indefinitely after that. Or stockouts when demand rose due to a conference or whatever that would tarnish the brand. And of course nobody wanted to pay more than like 10% markup. They were more amenable to long term contracts though. It was just hard to figure out the right use case and make it profitable.
Seemed like what customers really wanted out of them were nearby replacements for pieces of their own datacenter. It was exactly the opposite direction of where I was hoping things would go, which was something between fly and cloudflare workers. Not sure what they're doing now; I left about 18 months ago.
It's a platform as a service. It'll take your docker application (usually a web application) and run it on the internet. It'll also allow you to run it in many locations around the world transparently (using the same IP address / anycast) so that users closer to those locations will get better/lower latency.
It's a CDN for your application.
(There's more to it than this, but as far as elevator pitches go, sufficient?)
I have used multi-region for every production database I've deployed in the last ~8 years, and it took < 10 seconds of extra time. It's a core feature of services like RDS on AWS.
There is a benefit if you're multi-region (but not global) because individual regions go down all the time.
It costs more every month, but if you have a B2B business, it's worth the extra cost.
For a toy app, 10k db rows (across all tables) from Heroku was enough to get the app running and have a public URL to share, and I miss those days.
I'm working on a fresh Rails7 toy app to try out some new features, and my current thinking is to use sqlite, and add an initializer early in the stack to migrate+seed the db if it's missing. If it's ephemeral that's fine, I just want some baseline data to interact with the app at a distance beyond localhost.
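Concretely, I'm thinking of a container entrypoint along these lines (a sketch, assuming Rails 6+ where db:prepare creates, migrates, and seeds a missing database):

  #!/usr/bin/env bash
  set -e
  # if the ephemeral sqlite file is gone, db:prepare recreates, migrates and
  # seeds it; if it already exists, it just runs any pending migrations
  bin/rails db:prepare
  exec bin/rails server -b 0.0.0.0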
I am not an ops guy.
Um, how do I find this out? Preferably historical usage.
And, like I said earlier, I hope to see what a real edge IaaS solution looks like too, if such a thing is even possible. Maybe the IaaS that would allow a build-your-own-CDN.
Multi-region databases with read replicas face the same issue
1. https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_...
Fargate with auto-scaling plus Aurora for the DB is pretty great in terms of a nearly zero-maintenance setup that can handle just about any type of service up to massive scale.
Unfortunately getting all the networking and IAM stuff right is still pretty thorny if you want to do things the correct way and not just run it all in public subnets. And the baseline costs are significantly higher than a paas.
I apologize for being skeptical but the wording in the contract seems to be extremely handwavy and they don't give me any confidence that if my bill goes from $4 to $200 that month because I made it to the front page of reddit, my bandwidth will magically be waived.
Right now, I feel 100% sure that my credit card would get charged.
Obviously, you would need to manage servers, colo, and network and keep it up on your own, or pay for it. And cloud providers offer a lot of value as well. But if you are at the right scale and can DIY it (in-house ops/network team), you can save a LOT of money.
For instance I've seen a PHP -> nodejs transition while moving to microservices, and while the ideas made sense on paper:
- It didn't come from the engineers at large. Most weren't fazed by the prospect, and the main engine for the change was the top architects and the engineering management.
- The target architecture and language were very popular, and easy to hire for. Incidentally, salaries would be cheaper for engineers at the same level of experience.
Predictably, a ton of the existing engineers left and new blood came in, and it was mostly according to plan from what I gathered. Some of the "old folks" stayed at a premium once they were the only ones left with enough knowledge, but they got relegated to side "expert" roles, and the product as a whole saw a big change in philosophy (I think mentally what was seen as the "core" also became "legacy" in everyone's mind as engineers moved away)
And for customers, it's far easier to negotiate billing disputes than to try and recover from an account deletion because of spending caps (and there have been plenty of examples of companies shutting down because of such a mistake here).
Of course, believe what you want. I am only words on a screen. I can't tell you what to think about things.
With Heroku it is stupidly simple to get things going, no credit cards, "fire and forget". Works great for hobby projects, and examples you want to show.
> we'll just suspend your apps when the credit runs out.
This sounds great! I've looked at Fly.io before but didn't realise this was a thing so didn't go past looking. I'll definitely give Fly.io a test run now. :)
I ended up swallowing the bill but will not be using them again since this is plain scary.
Edit: funny thing - to add insult to injury, I've started to hear from their sales people on "my growth plans", even though I've had a support ticket hoping to resolve this to no avail.
- Argent Aspen
- Badious Bamboo
- Celadon Cedar
I can't remember what the color name for the Dogwood stack was meant to be; we mostly only ever referred to them by the tree name and dropped the color. Suffice to say that Dogwood was meant to be the 4th major evolution of what Heroku was.
If I can't have that, then I just can't take the risk of my site getting hammered (by attackers, getting linked on HN, whatever) and racking up some kind of bill I can't pay. I realize the chances of that are small but that's scary.
I realize that, obviously, "stay up at all costs" is what a business needs and that's where the money is. Fly.io won't get rich off of personal projects. But, I do think they serve to get developers onto the platform.
Your bandwidth allocations and rates seem very fair and generous, btw. I do really want to check out your platform.
The only thing I can see missing is automated Redis hosting by the platform.
There have been so many times I wanted some simple key-value store which I do not have to bother about setting up and taking care of. Something like "ambient Redis". It's OK not to have crazy scaling promises. You just enable an API (maybe for a small fee) and use it. If and when you get big enough, you switch to a setup you do bother about setting up and taking care of.
Am I making sense to anyone?
If anything, I'd prefer they moved their PaaS more towards serverless and managed offerings than towards IaaS.
Yes, it’s automation you have to own, but it changed very little once it was done and provided phenomenal value. I wouldn’t let those features hold you back from exploring other options.
While not as "Redis-y", there are some decent KV-ish managed databases out there:
- Firestore
- Cloudflare Workers KV
- DynamoDB
- Bigtable
https://community.fly.io/t/can-we-get-an-update-on-managed-d...
My money is on Supabase building for Fly as a target / default IaaS, and that's as close to managed services as we are going to get, given Fly's insistence that they're not really good at (or don't want to be) building (and maintaining) managed services.
We launched it on Heroku using Nodejs and sent billions of requests a day to it. The thing ran on like a few dozen dynos, but it was responsible for like 20-30% of all of Heroku's bandwidth. Fantastic value for us. Immediate headache for Heroku.
I just noticed I formulated it wrong, my apologies. What I meant is that the replicating regions don’t need to wait for the primary writes to go through before they respond to clients. They will still be read-only Postgres replicas, and info could be shuttled to primary in a fire-and-forget manner, if that’s an option.
Whenever an instance notices that it's not primary but is currently dealing with a critical write, it can refuse to handle the request and return a 409 with the fly-replay header that specifies the primary region. Their infra will replay the original request in the specified region.
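On the wire it looks something like this (my illustration; the app, endpoint, and region are made up):

  # what the read-only region's app returns (the client never sees this):
  #   HTTP/1.1 409 Conflict
  #   fly-replay: region=iad
  # Fly's proxy intercepts that response and replays the original request
  # in iad, where the primary lives, so the client just gets the final
  # response from there
  curl -si -X POST https://myapp.fly.dev/posts -d 'title=hello'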
> Did you write your own fork of Postgres or are you using a third party solution like BDR?
When using fly.io, the best option would probably be to use their postgres cluster service which supports read-only replicas (can take a few seconds for updates to reach replicas): https://fly.io/docs/getting-started/multi-region-databases/
> For a 2gb set up would that be about $50 for the web app, $50 for the background workers, $160ish for postgres and then $50 for Redis? We're still at $300+?
Maybe. A few thoughts:
- Why would you need 5 background workers? Would one running in the primary region not be ideal? If you need that much compute for background work, then that's not fly's fault, I guess.
- Not sure the Postgres read replicas would need to be as powerful as primary
- Crazy idea: Use SQLite (replicated with Litestream) instead of Redis and save 50 bucks
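The Litestream half of that crazy idea is basically two commands at boot (a sketch per the litestream docs; the bucket name is made up):

  # pull the newest snapshot if one exists, then replicate continuously
  litestream restore -if-replica-exists -o /data/app.db s3://my-bucket/app.db
  litestream replicate /data/app.db s3://my-bucket/app.db &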
#!/usr/bin/env bash
set -euo pipefail
# Month-to-date usage for one fly.io app via the GraphQL billables API.
# Usage: pass the app name as $1. Needs flyctl (for the auth token), curl,
# jq, and GNU date (the -d syntax below is GNU-specific).
QUERY=$(cat <<EOF
{
"variables":{
"app":"$1",
"start":"$(date +%Y-%m)",
"end":"$(date -d "$(date +%Y%m01) +1 month -1 day" +%Y-%m-%d)"
},
"query":"query (\$app: String!, \$start: ISO8601DateTime!, \$end: ISO8601DateTime!) { app(name: \$app) { organization { billables(startDate: \$start, endDate: \$end) { edges { node { app { id } category quantity } } } } } } "
}
EOF
)
# POST the query with flyctl's token, then sum the data_out line items
curl https://api.fly.io/graphql \
-H "Authorization: Bearer $(fly auth token)" \
-H 'content-type: application/json' \
--compressed \
-X POST \
--data-raw "$QUERY" \
--silent |
jq '[.data.app.organization.billables.edges[].node | select(.app.id == "'"$1"'" and .category == "data_out") | .quantity] | add'
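Saved as, say, fly-usage.sh (my name for it, not official):

  ./fly-usage.sh myapp    # prints the month-to-date data_out quantity for myapp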
Service providers face the opposite problem. Local infra sucks, the market is full of incumbents, and India is generally protective of those markets.
Very similar dynamics with China as well.
Render and others are interesting, but K8S is still fundamentally better, considering all the DX progress there that makes it pretty easy to get a container running in a cluster.
Of course, this quickly stops working once your small projects grow to have multiple collaborators, a staging environment, etc. - but at that point you're running a proper business
I feel like this setup might make quite a lot of sense if you have a bunch of micro services that are small enough that they can share resources.
So I created a Postgres Key Value Ruby library. https://github.com/berkes/postgres_key_value
I see this over and over again, where the first step of using something supposedly user-friendly is to install a lot of command-line cruft on your snowflake laptop. It only goes downhill from there. Great for amateurs, but usually left as an exercise for the reader to beat some sense into bad practices like deploying things from a laptop.
IMHO it should start from the other end and actually provide you with a sane CI/CD pipeline out of the box. Google cloud run does this right. You just give it a repository url and it sets up the rest. I used that two years ago and I was up and running in about 2 minutes. Every commit to master went live. CI/CD out of the box. No magical incantations on the command-line. Just develop as you do normally. Zero fuss.
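(These days the one-shot version of that is a single command; the every-commit trigger itself is configured through Cloud Build, but as a sketch:)

  # build from source with Cloud Build and deploy to Cloud Run in one step
  gcloud run deploy myapp --source . --region us-central1 --allow-unauthenticated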
But as is often mentioned, as companies and apps grow, so do their requirements and the need for flexibility. I can't think of a more flexible deployment target that handles a lot of the PaaS concerns than Kubernetes, warts and all.
Fly is surprisingly great, volumes + containers is very close to universal for a PaaS, but it certainly can't cover everything that is running in my workplace's kubernetes.
Essentially it _is_ free.
K8S offers the same primitives and more, with progressive complexity as you need it. You can deploy a single container with a 1-line command or an entire PaaS subsystem. It's also more portable than any single platform and you can run it on a few VPS instances or your own bare metal. I've also found it far more reliable than all these PaaS services that have their own issues with no visibility.
K8S experience is also more valuable and useful in future projects and there are plenty of nice DX tooling to make deploys easy if that's the blocker.
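For reference, the 1-line deploy mentioned above is kubectl's imperative mode (image name is just an example):

  kubectl create deployment hello --image=nginx                    # run a container
  kubectl expose deployment hello --port=80 --type=LoadBalancer    # make it reachable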
Anything with production usage and pay-as-you-go pricing means data at rest still costs money - and requires deleting to avoid accruing new charges. Do you want your databases and volumes and object storage deleted when your app stops?
And if this was offered then there would be a whole new class of mistakes leading to lost data. Like I said, billing is easier to negotiate than deleted accounts.
It's common to separate them due to either language limitations or to let you individually scale your workers vs your web apps, since in a lot of cases you might be doing a lot of computationally intensive work in the workers and need more of them vs your web apps. Not just more in number of replicas but potentially a different class of compute resources too. Your web apps might be humming along with consistent memory / CPU usage but your workers might need double or triple the memory and better CPUs.
"I want to cap the amount of data I store to 50GB and the amount of traffic I serve to 1000GB"
Of course there is an obvious problem with this, the pricing structure would become transparent and you don't want that as a cloud provider. You want your customer to just pay his bills and not even know why it costs this much.
Using the rounding error logic, how do you feel about companies adding $1.99 "convenience fees" or "administrative fees"?
To be clear a bandwidth limit would be awesome! And the pricing may not be for everyone. However there is a large amount of leeway as evidenced by the community forum posts.
That is to say, for most of what fly gives you there is no K8s equivalent.
The exodus
I don’t understand why cloud providers will not accommodate this basic “prepay to X and allow me to use the credit” model.
1. Fairly low traffic (requests per minute not requests per second except very occasional bursts)
2. Has somewhat prematurely been split into 6 microservices (used to be 10, but I've managed to rein that back a bit!). Which means despite running on the smallest instances available we are rather over-provisioned. We could likely move up one instances size and run absolutely everything on the one machine rather than having 12 separate instances!
3. Is for the most part only really using queue-tasks to keep request latency low.
Probably what would make most sense for us is to merge back into a monolith, but continue to run web and worker processes separately, I guess. But in general, I think there is maybe a niche for running both together for apps with very small resource requirements.
A lot of people try to get hobby users on a platform as a form of PR -- if you don't have spending caps, you are probably going to scare a lot of them away.
And I've never heard a company shutting down because they exceeded a limit -- I've only heard of people being surprised by unintended extremely high bills.
And storage costs are simple to predict -- as soon as you see the cap would be exceeded, stop accepting new data.
Kubernetes does have federation/multi-cluster abilities, and global routing/load balancing is available from every CDN and cloud. It's more work to setup but not by that much.
need a garbage collected native compiled language?
- ocaml fits the bill and provides better type checking
need a fast as possible system?
- rust is faster and has much better abstraction for type checking. unless you're writing a throwaway or a very short lambda function, rust is almost always a better choice here, as handling errors and maintaining the code is going to be more important over time, and go is just now getting its generics story straight
need a networking application?
- elixir (and erlang) do this so much better. it has 30+ years of high-reliability networking built in and it's about as fast as go. additionally, fault tolerance and error handling are so much better. I have real parallelism out of the box and async primitives that make goroutines look like a joke.
additionally, all 3 (ocaml, rust and elixir) give you proper tools for handling error branches. go downgrades you back to c style which works but means your code is going to evolve into a god damn mess as there's no way to separate your error path from your happy path
Literally the only place I see go making sense is small scripts that need to be written asap and won't need much long-term maintenance. for everything else, go seems woefully inadequate.
They actually do:
> You can configure fly.io apps with a max monthly budget, we'll suspend them when they hit that budget, and then re-enable them at the beginning of the next month.
From their Launch HN: https://news.ycombinator.com/item?id=22616857
It's not ideal due to some frameworks using background jobs to handle pushing events through to your web UI, such as broadcasting changes over websockets with Hotwire Turbo.
The UI would update when that job completes and if you only have 1 worker then it's back to waiting 100-350ms to reach the primary worker to see UI changes based on your location which loses the appeal of global distribution. You might as well consider running everything on 1 DigitalOcean server for 15x less at this point and bypass the idea of global distribution if your goal was to reduce latency for your visitors.
> Crazy idea: Use SQLite (replicated with Litestream) instead of Redis and save 50 bucks
A number of web frameworks let you use Redis as a session, cache and job queue back-end with no alternatives (or having to make pretty big compromises to use a SQL DB as an alternative). Also, Rails depends on Redis for Action Cable, swapping that for SQLite isn't an option.
Out of the box from day one, Go is great at writing HTTP services including proxies, which is a large part of what Heroku needed. Ocaml is harder to use and not a popular choice for such things. Go has easy to follow docs and tonnes of useful contemporary libraries. Go is especially easy to pick up for anyone with older C experience.
I've found Go excellent for the long term: when you come back to something that hasn't been touched for years, it compiles and runs quickly and easily. I wouldn't have thought it was any good until I actually used it for something.
Also, concurrency in Go is braindead easy, there are multiple choices of "worker pool" libraries and queue/messaging choices. You don't even have to know about channels to do work across cores.
EDIT: Having said that, if you already had Erlang and an experienced team, you wouldn't ditch that for Go. Why do companies do this? Is there some famous historical case where keeping and growing a highly experienced team has backfired?
This kind of pacing and billing buffer is an immense amount of complexity at scale for very little benefit (even if an individual user might like it).
SaaS companies manage to pull off ridiculously complicated things, but coming up with a billing scheme that doesn't fuck over the customer is asking too much?
The simple truth is that usage based pricing is designed to be unpredictable, and surprising customers with high bills is probably considered a feature, not a bug.
You don't get promoted until you do the new shiny thing. This seems to be especially prominent at Google and the like. Big companies do not give out bonuses/promotions to those who quietly sit and work with the existing, proven stack, maintaining it and fixing bugs. Shiny new things get noticed.
It might also be caused by staffing issues? Erlang/Elixir are pretty niche stacks to this day. It's much easier to find Go developers (or developers willing to switch to Go) than Erlang developers.
And hard-to-debug deploy-time (and sometimes runtime / uptime) issues are a major sticking point. It is hard to know what or who's at fault without asking for it in the forums, since you can't stackoverflow much.
Their support (granted, it's free) is a bit hit-and-miss, and it isn't clear what really is on their roadmap (or important) and what isn't (even though they are more transparent than other providers I've interacted with).
These are but pains that come with adopting a nascent ecosystem, I suppose. I still persist with Fly because it is still simpler to build certain apps on it than on any other BigCloud (bar Cloudflare).
The billing scheme is very transparent and friendly by being pay-as-you-go. It doesn't "fuck over the customer".
Your entire complaint isn't about the pricing scheme but about an additional feature to stop billing at some point - which, as I've explained, is not easy to calculate precisely because charges accrue over time, and adding even more complexity for calculations and potential for mistakes is not worth it for the many reasons I outlined previously.
> "The simple truth"
You keep repeating that word. There is nothing simple about this. Let's end this here.
How is the pricing structure not currently transparent? What number are you missing exactly?
Since data accrues charges over time, there is no alternative if you want a hard cap. Which is why none of this is very simple at all and requires a tremendous amount of planning and complexity just to implement, let alone all the possible new issues and mistakes it creates for customers.
Again, this is the typical "write some code in a weekend" approach that's missing all context of what it actually takes. And as I mentioned before, it's far easier to just negotiate billing than to deal with the aftereffects of whatever service and data disruption this feature would cause for the tiny fraction of customers that end up with this problem.
Waiving your $5 fee is far easier and cheaper than spending millions on trying to avoid it in the first place, only to have it replaced with potential complaints that a production account was suspended or deleted.
Yes, but it's pretty predictable. Once there's enough data in your bucket that the monthly cost would go over the limit, just stop accepting new data.
Nobody cares if the spending limit is accurate to the cent. What people care about is not being surprised by huge invoices.
> The billing scheme is very transparent and friendly by being pay-as-you-go.
Have you looked at eg. the glacier pricing scheme, or at lambda pricing? It's almost impossible to know how much it's going to cost you ahead of time. The only thing you know is that if you happen to use it differently than anticipated, it's going to be expensive.
It quite literally isn't, otherwise there would be no billing surprises in the first place. Your entire argument about predictability is counter to the problem of unpredictable charges.
> "just stop accepting new data"
This is still effectively data loss and a major problem in production. Customers would rather negotiate a bill than lose data.
> "Nobody cares if the spending limit is accurate to the cent. What people care about is not being surprised by huge invoices."
Then it's a soft-cap, and if that's all you want then you already have billing alarms. Otherwise what's the buffer amount? What overage is acceptable? Is there a real hard cap? What if that's reached? You didn't actually provide any solution here.
> "the glacier pricing scheme, or at lambda pricing ... It's almost impossible to know how much it's going to cost you ahead of time."
How so? AWS is completely transparent about pricing. The calculations for it might be hard, but that's an entirely different issue. There are plenty of tools you can use if you don't want to do it yourself, however this is another logically incongruent point where you claim billing is easy enough to calculate and predict accurately for caps yet simultaneously hard enough that it's "almost impossible".
The biggest problem are mainly exorbitant bandwidth costs, and those are trivial to cap -- just stop serving requests.
Also, billing alarms are not a soft cap. They don't prevent you from waking up in the morning to a 5000€ bill.
> You didn't actually provide any solution here.
I'm commenting on the internet, I don't need to come up with a way for AWS to implement billing caps, especially since they have designed their service pricing in a way that makes estimates really hard.
But for most services, billing caps really aren't that hard, especially since the company we are discussing here (fly.io) apparently already allows billing caps if you prepay (according to other comments here).
You're just repeating this. Predictable is the opposite of surprise.
Even if storage use was very stable, so what? The overall bill is the problem so where the charges come from doesn't matter, only that eventually a limit is crossed. An overage is still an overage and the only way for billing to stop immediately is to delete and drop everything. This is the fundamental issue that you're not considering. It's what happens at the limit, not about how you get there.
> " billing alarms are not a soft cap"
Soft caps that don't actually stop anything are effectively nothing more than billing alarms. What else is their purpose?
> "I don't need to come up with a way for AWS to implement billing caps"
I didn't ask for implementation; I'm inquiring as to what logically is supposed to happen in the scenarios that occur under your proposed "pretty simple" solution. If you can't answer, then it's not so simple, is it? Either you haven't thought it through entirely, or you'd conclude that it's not actually possible to do it that way.
> "designed their service pricing in a way that makes estimates really hard"
How so? You also keep repeating this without evidence. How does providing numbers on exactly what they charge make it difficult? It's as transparent as it gets. They also have a calculator on their site. What more are you expecting?
> "for most services, billing caps really aren't that hard"
The nature of the service changes everything. Fly.io doesn't have billing caps, they just stop the apps when the credits run out and eat the bandwidth cost for now. The economics of scale can change that answer drastically, however even Fly repeats what I've said before: "the majority of people want their stuff to stay up" and "shut it down so you don't get billed" is usually not the preferred solution compared to negotiating a large bill.
Here's the simplest solution: If the limit is reached, stop serving requests, stop accepting new data, but don't delete any data. Allow static storage costs to go over the limit. That is probably what 99% of people who ask for a budget cap want, and it's the most logical thing to do because typically 99% of the charges are for bandwidth/requests/processing and only 1% for storage. If I set a limit at 10€ and amazon ends up charging me 10.2€ I can live with that.
The next simplest solution would be to look at how much is currently stored, multiply that with the storage cost per hour, multiply that with the remaining hours in the month, then subtract that from the monthly budget, and stop serving requests or accepting new data as soon as this lower limit is reached. This will guarantee that you never go over the limit without having to delete data. If data in storage is deleted before the end of the month, you'll end up spending less than the limit.
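With made-up example numbers, that reservation math is just (a sketch):

  # illustrative numbers only: 120 GB stored, ~$0.023 per GB-month prorated
  # hourly, 400 hours left in the month, $10 monthly budget
  stored_gb=120
  price_gb_hour=$(echo "scale=8; 0.023 / 720" | bc)   # hourly per-GB rate
  hours_left=400
  budget=10
  reserved=$(echo "$stored_gb * $price_gb_hour * $hours_left" | bc)
  cutoff=$(echo "$budget - $reserved" | bc)
  echo "stop serving once month-to-date spend reaches \$$cutoff"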
Now if you consider this basic math too complicated for a software engineer making $300000 a year, you could do something even simpler: allow the customer to set a limit on each resource. Eg. let the customer say, I want to limit S3 to 100GB of storage and 5TB of bandwidth and 2 million requests (or whatever). Of course that would be a bit of a hassle for the customer, but it would be very effective at preventing surprise charges.
> the majority of people want their stuff to stay up
At any cost? That's unlikely. I'm pretty sure that every company has some limit where they'd prefer the service to be down rather than pay the bill.
But if you go back up the thread you'll see that this discussion is about hobby users and tinkerers, and people who just sign up to try out the tech. These people absolutely want a hard cap on potential charges, and if you don't offer that you might scare them away.
There was zero need to switch because even though elixir/BEAM allows for a pile of processes, postgres does not.
If I had to guess, the switch was made because someone saw something shiny.