I’m not sure it is for 100% of early stage startups, but I guess it is once you exceed some minimum usage threshold.
That said, definitely appreciate the detailed explanation.
Similar to this post, he commented a week ago:
> In a year we'll either be ahead of those, or not growing anymore due to ongoing capacity issues. I'm hoping for the former.
I am rooting for Fly! Great team. The company reminds me of early HashiCorp.
Maybe they were _too_ ambitious at the start? They have a hard road ahead of them, and competitors like Render.com and Northflank have provided me with solutions to all of my problems: great dev UX, great prices, and predictable solutions. They also keep pushing out very useful features. A third competitor, Railway, has also sprung up! There's certainly blood in the water.
Will they catch up to others before the competition solves the "global mesh" unique value proposition Fly.io currently has? That's the $1MM question.
Not the end of the world, but mildly disappointing. At least they are all in with Postgres and Linux, a great foundation.
I could see them building something RDS-like on their own, but if they're trying to go further than that I wonder if they'll buy or partner with other companies rather than doing it themselves. Neon strikes me as a Postgres-as-a-service that could pair well with Fly.
But the post makes it clear that the issue isn't that they had problems with new services. It was rapid customer growth before they had time to scale up the infrastructure as they had planned to do.
Is it not agnostic about things like Elixir etc. at the tech level, though? They've just got super nice documentation for those tools you mentioned.
> The second problem we have with Postgres was a poor choice on my part. We decided to ship “unmanaged Postgres” to buy ourselves time to wait for the right managed Postgres provider to show up.
> We’re going to solve managed Postgres. It’s going to take a while to get there, but it’s a core component of the infrastructure stack and we can’t afford to pretend otherwise.
+1 to Neon seeming like a good fit, but it's also very much a beta (alpha?) both as a product and company (at least from my impression). I'm not sure that's a bet they'd want to make right now given the context of this post.
I don't use Fly but would consider them in the future even given their recent issues.
I look at this in contrast to Twitter who had/has? an outage today. Their leadership is opaque and doesn't take responsibility for the issues they are causing.
Fly (to my understanding) at its core is about edge compute. That is where they started and what the team are most excited about developing. It's a brilliant idea, they have the skills and expertise. They are going to be successful at it.
However, at the same time the market is looking for a successor to Heroku: a zero-devops PaaS with instant deployment, dirt-simple managed Postgres, a generous free level of service, lower cost as you scale, and a few regions around the world. That isn't what Fly set out to do... exactly, but it's sort of the market they found themselves in when Heroku basically told its low-value customers to go away.
It's that slight misalignment of strategy and market fit that results in decisions perhaps being made that benefit the original vision, but not necessarily the immediate influx of customers.
I don't envy the stress the Fly team are under, but what an exciting set of problems they are trying to solve, I do envy that!
Eh? Unless you are consuming something as a service and it actually advertises it as a feature, nothing is ready for 'global deployment'.
If you have a 'centralized' secret storage, then you have made it tied to a region. Want to have redundancies and lower latency? You'll have to distribute it. Vault has docs about this: https://developer.hashicorp.com/vault/tutorials/day-one-raft...
I’ve heard (on HN) of a dozen different companies vying for the heroku replacement spots and yet Fly seemed to capture the attention. I couldn’t name another one off hand.
What I truly want, and probably lots of other people too, is flyctl (and its workflow) for AWS. The same simplicity to run as Fly, but give me something cheap in Virginia or The Dalles.
If I were them I'd focus as many resources as possible on making the stack rock-solid, and away from acquiring more customers or adding more capabilities.
In fact I'd try to down-scope some features if at all possible, like the example they give of disabling app deploys while they're doing platform updates.
We use fly.io at a small scale and it's worked really well for us, but the money is in customers at a larger scale who must have 100% reliability.
I think reliability is the #1 feature at any stage, because if you're unavailable you're at best useless, and more than likely actively harmful, since your users have an expectation.
However, if you're only unavailable at times when customers don't expect you to be there, then you're not really unavailable in any meaningful sense. That's more likely to hold for an early-stage startup, but you don't typically get to choose or even know when you're expected to be available, nor do you always get to choose when you're unavailable.
This is likely the biggest culprit for a lot of these companies. Too many of us have grown up in a culture of getting hosting and platforms for "free", but at some point the companies providing them still have to pay the bills. There has to be a better pricing model that lets someone deploy their relatively small, low-traffic app for tens of dollars a month, or even $200 - $300 / year, for the basics (e.g. Heroku free-tier-type capabilities). It's not going to save these companies, but it would limit the excessive growth of their own costs from a free tier while still being affordable for 1 - 2 person teams who are trying to get something in front of users.
Google Cloud. It is painfully easy to spin up managed Postgres, and super easy to deploy GCP Cloud Functions or Cloud Run. It isn't expensive either, and it just works.
We've had a number of customers that use us for the database and Fly for the app. We had a user benchmark a number of Heroku alternatives with various database providers, and we actually had better response times than the unmanaged instances on Fly itself, in addition to all the other providers they tested - https://webstack.dancroak.com/
I won't speak for Fly, but we're big fans of them and think we pair quite well together.
Also:
> The Heroku exodus broke our assumptions. Pre-Heroku, most of the apps we were running were spread across regions. And: we were growing about 15% per month. But post-Heroku, we got a huge influx of apps in just a few hot spots — and at 30% per month.
I hadn't before seen anyone with a big picture view confirm a heroku exodus was happening, although a lot of people suspected it or had anecdotes.
But if fly is seeing a pretty enormous number of customers moving from heroku to fly... oh wait, now I'm wondering, is this mainly a result of heroku ending free services, and those are free customers coming to fly for free services?
If so... that's a pretty big burden to take on without revenue to match, it does seem kind of dangerous for fly.
I do wonder, however, if they'd be better off using less l33t tech - doing almost everything on Postgres vs. Consul and Vault, etc. Scaling, failover, consistency, and so on are more well-known problems there, and far more people have run mainstream DBs at tremendous scale than have run the alternatives.
Simplicity is the key to reliability, but this isn't a simple product, so idk.
If it's sorting and sifting and clicking a bunch of stuff in the console, that's not painfully simple. If it's some easy cli commands, I think that's in the ballpark...
I get that growing is super hard. And maybe fly will grow up to be a good platform some day. But that's the future. Today, they're flying by the seat of their pants and I mostly feel sorry for people who were tricked into thinking this platform is ready for production use.
Pardon the ignorance, is this not the Amplify CLI [1]?
> I’m not sure it is for 100% of early stage startups,
I mean, it probably depends on the nature of the startup? Platform-as-service seems particularly sensitive to reliability (whether or not it's "#1 feature"), in a way that might not be true of startups in other spaces.
I might be paranoid, but I just don't feel comfortable when there's so much in play.
For a startup that is hosting other people's production application/data then this is absolutely true. Less than 100% always needs to be addressed.
For a startup that is selling bingo cards then reliability probably isn't nearly as important. I'm guessing there were certain holidays that were more important than others as far as reliability goes though? Maybe patio11 can chime in :)
I take it that it's far more important that the local region know about changes than a remote region, which makes a store mastered in one location as the source of truth problematic.
I also wonder why these companies don't backstop themselves on the public cloud. Failing over into AWS seems better than running out of capacity, and some of its services could be used in circumstances where an open-source technology isn't ready.
> Flyctl for AWS
Have you tried AWS Copilot? I'm having good success with it. Probably not quite as simple as flyctl, but still it's only one command to deploy a container.

I would really like fly.io to overcome these hurdles. I bet they will.
The PG issues hit me two times in the previous weeks but other than that it's been working great for me.
With the move to v2 apps (using their new machines infra) things are actually faster and smoother than ever.
About a year or so ago their CLI was quite buggy but I haven't really hit any bugs in months.
I will remain with Fly for the time being. Hopefully they don't close shop!
I feel like Next.js is in a similar position. While their main vision is SSR, I wonder if they are missing out on a chunk of the market that simply doesn't want to think about infra. We use them because we just don't have to worry about webpack or fiddling with deployment and hosting. We couldn't care less about SSR, and in fact we disabled it app-wide.
From the perspective of a recent founder, it's downright spooky to build around any SaaS, considering how few of them have been around for 10+ years, when that is certainly what our business is aiming for.
I know (and share the feels): devs tend to get excited about the new thing, but if Google Workspace shut down next month, we would be in so much operational trouble. When other people's fancies stand in the way of the entire operation you are responsible for, it really raises the question of how much closed-source SaaS you can allow before it starts to be, quite frankly, irresponsible.
We are not imagining things. SaaS of all sizes shut down all the time, and when you are heavily relying on them and building software around them to run a business the prospect is spooky as hell.
But our team, who has used Heroku for over a decade, got bit multiple times by Heroku having a free tier.
Why were we impacted by other apps? Because Heroku’s load balancers are shared amongst all their apps. That includes all the sketchy apps running on the platform.
If Heroku could somehow isolate us from everyone else? Great - and they offered that for a while with a reasonably priced Add-On supported by them called SSL Endpoint. It cost about $15/month and put us into a pool that was shared with other folks willing to spend that much per month to run their app.
I understand that’s not great for a hobby project. But for those of us trying to run a large product on Heroku and not have to spend multiple extra thousands of dollars every month for a Heroku Private Space, this was a great way of pooling: put a small fee in place for one pool of resources. Not many malware writers or other misbehaving app creators will probably want to spend that much per month.
But they axed that a few years ago. Only a couple of months after we were thrown back into the load balancer pool with all the other free apps, one of the IPs was marked as spam and we had to figure out a kind of janky solution.
Additionally, Heroku seemingly spent a ton of resources on free-tier support, malware fighting, etc. I hoped to see more features on Heroku once they dropped that support… but I haven't seen much evidence of that in the roughly six months since. But we'll see.
The Fly.io of 2023 looks almost nothing like that of 2021 (all for the better), and it's not obvious to our users what's changed. We've been doing a shitty job of communicating, and we're taking our licks for it now.
In March 2021, someone asked a question about carbon emissions of their data centres. They said they hosted on both GCP and AWS, but mentioned they were interested in moving to their own bare metal [1].
In April 2021, I asked a question about egress fees to Google, and they walked back a bit the comment about moving to bare metal [2].
As of March 2022, they're still in AWS/GCP [3].
As of September 2022, workloads for new users deploy into AWS, even in regions that were previously served by GCP [4].
[1]: https://community.render.com/t/does-render-use-green-energy/...
[2]: https://community.render.com/t/is-render-com-hosted-in-googl...
[3]: https://community.render.com/t/are-your-servers-owned-by-you...
[4]: https://community.render.com/t/which-render-regions-map-to-w...
And then I was “Huh, these technical challenges are actually pretty difficult”
And then I was all “crap, these are a bunch of technologies I was about to add to our stack”
Thanks heaps fly.io people; having the humility to honestly talk about the challenges and failures massively helps people such as myself as we navigate new unfamiliar technologies. If more companies were willing to do this, it’d be a lot easier to avoid common pitfalls.
If you offer data volumes, the low water mark is how EBS behaves. If you offer a really simple way to spin up Postgres databases, you are implicitly promising a fully managed experience.
And $deity forbid, if you want global CRUD with read-your-own-writes semantics, the yardstick people measure you against is Google's Spanner.
I wish I shared your enthusiasm for where Heroku could go but I have a few friends at Salesforce I've asked about how they see Heroku internally and it really doesn't seem like it is going to get much love. Hope to be wrong though.
Heroku PostgreSQL is very simple, yes. But once you need non-trivial scale it's expensive and extremely non-performant. Even a medium-sized RDS will outperform Heroku's most expensive database offering by 20x in my experience. My company doesn't even run PG on Heroku anymore. We have a VPC/Private Space connection to AWS Aurora because the cost/performance difference is so extreme.
We often see folks wanting a mix of both. For example, maybe the /about page is static, but the home page is dynamic and personalized based on the visitor. You can do all of this with Next.js. Our future direction is adding even further granularity, enabling this decision at the data fetch level, allowing you to cache results across deployments[3].
[1]: https://beta.nextjs.org/docs/rendering/edge-and-nodejs-runti...
[2]: https://beta.nextjs.org/docs/rendering/static-and-dynamic-re...
Though to be fair, even if render collapsed overnight, I think I’d still be equally satisfied after moving to fly.
I think the community would really love to see a direct Fly+Crunchy integration!
I've been lucky, in the past, but a lot of that, is because I have "overengineered," and the tools/frameworks have advanced to meet the new demand.
I am in the middle of a complete, bottom-to-top rewrite of the app we've been developing for the last couple of years. It's going great, but making this leap was a fraught decision.
It's mainly, so I wouldn't have to write a post like that, in a year or two.
We spent all the time refining it, until we had what we wanted, and it worked great on our small test team.
Then, I loaded up a test server with 10,000 fake users, and tossed the app at that. To be fair, we don't think we'll have even that many users for quite a while. It's a very specialized demographic.
* SOB *
It no do so well.
At that point, I had to decide whether to fix the issues (they were quite fixable), or revisit the architecture.
The main issue with the architecture, was that it was an "accreted" app, with changes gradually being factored in, as we progressed. The main reason for this, is because no one really knew what they wanted, until we ran it up the flagpole (sound familiar?).
The business logic was distributed throughout the app. That was ... ugly.
I envisioned myself, a year or two down the road, sucking on a magnum, because the app had turned into a Cruftosaurus, and was coming for me in my nightmares.
So I decided to rewrite, as we hadn't done any kind of MVP or public beta, so we actually had the runway to do this.
I refined the entire business logic of the app into a single, platform-agnostic SPM module, which took just over a month, and have started to develop the app around that. It's pretty much a rewrite, but I am recycling a lot of the app code. We also brought in our designer, and he's looking at every screen I make. It's working well for him.
Like I said, it's going great. Better than I expected.
I know that I have a huge luxury, and I'm grateful. I can credit a lot of that, to doing some stress-testing before we got to a point where we had a bunch of users to support. I was able to go in, and go all Victor Frankenstein on the model.
The result, so far, is that this thing screams, and you don't really even notice that there's that many users on it. The model has already been proven (that SPM module), and all we're doing, is chrome (which is a ton of work).
Your post implies corporate messaging is bad. And anything posted by a company—or at least I don't know where you draw the line—can be considered corporate messaging. Am I just reading too much into your phrasing?
Sure you don't need it for 99% of usecases, but if it just works using familiar architectures then it is also strictly better for 99% of usecases so you might as well, and people will naturally want it.
That 'familiar architectures' part is the hard bit, though.
If you don't have SLOs and SLAs, then you get what you get, essentially. Even a company with a great reputation can completely reverse course with a single bad incident, and you get nothing in return if there's not a contract.
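For anyone who hasn't internalized what those SLO numbers actually promise, a quick back-of-envelope helps (a sketch only, assuming a 30-day month):

    # Allowed downtime per 30-day month for a few common SLO targets.
    minutes_per_month = 30 * 24 * 60  # 43,200 minutes
    for slo in (0.99, 0.999, 0.9999):
        budget = (1 - slo) * minutes_per_month
        print(f"{slo:.2%} -> {budget:.1f} minutes of downtime budget per month")
    # 99.00% -> 432.0, 99.90% -> 43.2, 99.99% -> 4.3

Without a contract behind those numbers, though, the budget is whatever the provider decides it is that month.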
They are being open and transparent (afaik) even if carefully worded, which I also don’t blame them for.
Long story short, it's completely over-engineered by a bunch of intellectual engineers with no focus, no discipline, and no oversight. It ended up not delivering on any promises it made, and there were a lot of them.
I was warned left and right before presentations and meetings: "this customer hates your product because of ...." I started off every meeting by saying, "we're rearchitecting the product, this is how we're doing it, this is the tech we are using." Immediately there was a sense of relief from customers, followed by questions like, "why can't <current product> deliver <feature> that was promised?" I'm completely honest about the bad decisions that were made and how they impacted each feature. Sure, there is skepticism about what we are doing, and I tell them they should absolutely be skeptical based on our track record. The result has been customers who hated my product now offering to work with us on development.
I've also been completely forthcoming on configuration, security, resource, and setup issues I am finding; many of them are absolutely freakin' insane. I've flat out told customers it's frankly embarrassing and that they should never let us do something like this in the future. The best feedback on this was, "At least you're telling us something. We usually get silence from this team."
God, this is the most depressing job ever.
Edge compute can be helpful for static or highly cacheable content. But that is often handled as well, or nearly as well, by a caching CDN.
So that leaves a few cases where edge compute is useful: where you are globally distributing the data itself (and ideally moving the data around as your users travel or move), which is incredibly rare and expensive to build; and where you need pure computation that makes no request to your backend, in which case, if 50ms of latency matters, most of the time you can just move the computation to the client. In my experience these tend to be rare. I would estimate that edge compute is actually helpful for 1-5% of projects, not 99%.
Fly's been many things over the course of its lifetime [0], but I believe their latest pivot (on what they call "Machines") is pretty darn good. I've been using Machines since Oct last year, and things have gotten better week-over-week. Like with any platform, Fly has its own idiosyncrasies, which don't take much to get the hang of. That said, I am the only person in my tech shop that deals with Fly. Some orgs with larger teams and heavier apps that deploy frequently or run DBs / disks on Fly (I don't) have had a rough few months; so that's there too.
We are launching our paid tier March 15th and will be production ready shortly after. We are running 20K+ databases and measuring reliability and uptime.
Generally, reliability is a function of architecture (we are solid there), good SRE practices, and a long tail of events you live through, fix, and make sure never happen again. The bigger the fleet, the faster the learning.
At Coherence (withcoherence.com) we're focused on a developer experience layer on top of AWS/GCP. You might describe it as flyctl for AWS.
The first releases of EBS weren't very good and took a while to get to where we are. Some places still avoid using EBS due to bad experience back in 2011 when it was first released.
Companies that engage in this kind of candor are careful not to disclose those things that would really hurt their business. Those things are still kept secret. If the CEO accidentally sexually harassed an employee that's not getting disclosed. A mea culpa is only offered for the issues that are already known regarding scaling, downtime, and missing features. Struggles they have because they're choosing to grow so fast.
I selfishly hope Fly put all their focus toward becoming Heroku 2.0. I’m sure some people care about all the edge latency stuff but I don’t know many of them.
1. Security
2. Durability
3. Availability
4. Speed
Similar: https://twitter.com/colmmacc/status/1071088017190711296
Free tier is a GTM motion which makes sense for novel tech products like Fly because: https://en.wikipedia.org/wiki/Technology_adoption_life_cycle
https://news.ycombinator.com/item?id=32955520
If they are a multiplier for a whole portfolio, there's not much reason for any particular branch to purchase them.
(This post seems like some evidence they might actually be building the wrong thing, though.)
It gets significantly more challenging when you grow, either in feature complexity or scale complexity - and then very few services can offer what AWS/GCP/Azure offer - albeit at the increased engineering/monetary cost of using them.
We're building a different kind of approach[0] that aims to absorb the mechanical cost of using public cloud capabilities (that are proven to scale) without hiding it altogether.
I am sympathetic with much of Kurt's post. We spent a long time building solutions to several of the areas highlighted (managed PG, persistent volumes, secret management and service discovery).
Making radical changes to architecture on a live cloud platform is always a challenge.
On the front-end Northflank is a next-gen PaaS built for high DX, speed, and powerful capability (real-time UI, API, CLI, GitOps, IaC).
Our backend is built using Kubernetes as an OS: providing a huge amount of flexibility on service discovery, load-balancing, persistence/volumes and scale.
The benefit of using Kubernetes is a universal API across all major cloud providers. We can scale clusters and regions across EKS, GKE and AKS in seconds, either in our managed PaaS or inside our customer's own cloud account.
Our managed data services (MySQL, Postgres, Redis, Mongo, MinIO) are all built using Kubernetes Operators with a small but mighty team.
From a generous free tier to autoscaling to managed Postgres and other advanced PaaS/DevOps automation workflows, Northflank offers something unique.
I'm very explicit both internally and externally that an acquisition is a failure mode for Render. We're building this for the very long term and plan to keep it that way.
(Timescale -- I think i know, it adds features specifically about storing time series? But I don't think crunchy has additional domain-specific stuff like this? What are the pretty useful features folks find in crunchy that RDS lacks?)
As someone who has started tons of Consul clusters, analyzed tons of Terraform states, developed providers, and written an HCL parser, I must say this:
HashiCorp built a brand of consistent design & docs, security, strict configuration, distributed-algos-made-approachable... but at its core, it's a very fragile ecosystem. The only benefit of HashiCorp headaches is that you will quickly learn Golang while reading some obscure github.com/hashicorp/blah/blah/file.go :)
I let them know they need to demonstrate that to me. They have a roadmap [1], but it seems to have barely anything moving forward, including some really important concepts like http/2 support.
I let some folks at Heroku know this who are product managers, and they are investigating it… but I would be shocked if Heroku gets a big performance improvement anytime in, say, 2023.
20X seems like a lot for RDS, though I’d be curious to learn more! We are switching to Crunchy because of that clear cost/performance difference you mention.
One thing we've noticed, though, is that people do actually want Heroku but close to users. It's not exactly edge compute. In some cases, it's "Heroku in Tokyo". In others it's "Heroku, but running in all the english speaking regions".
I think the thing that ate up most of our energy is also the thing that might actually make this business work. We built on top of our own hardware. That's the thing that made it difficult to build managed Postgres. We put way more energy into the underlying infrastructure than most nü-Heroku companies. The cost was extreme, but I'm like 63% sure this was the right choice.
https://www.oracle.com/cloud/free/
It's still just a free tier so you can't expect good support, but, it's there.
I remember my first few days on the job just being ripped to shreds by our customers who (understandably) were slighted. Don't miss those days at all.
I have only positive things to say about every HashiCorp product I've worked with since I got here.
I want to gently note since I see a lot of misunderstanding around Spanner and global writes: Global writes need at least one round trip to each data center, and so they're still subject to the speed of light.
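To put a very rough number on "subject to the speed of light" (back-of-envelope only; the distance and fiber speed below are ballpark assumptions):

    # Lower bound on a cross-continental round trip, before any queuing or processing.
    distance_km = 10_000          # e.g. US east coast to south Asia, very roughly
    fiber_speed_km_s = 200_000    # light in fiber travels at roughly 2/3 of c
    one_way_ms = distance_km / fiber_speed_km_s * 1000
    print(2 * one_way_ms)         # ~100 ms RTT, and a global commit needs at least one of these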
fly.io charges an outrageous 2 cents/GB. Google is over 4x that.
At fly.io rates, 1Gbps average over a month is $6400/mo. Google is tiered and you’re looking at over $10k/mo.
For comparison, a cheap managed switch that can handle 1Gbps costs about $100, maybe a bit more if you want a nice one. A nice router is more. You can rent an entire rack, including power, cooling, and an unmetered 1Gbps for $300-$1k/mo (with maybe some wiggle room on both ends). You can buy a pretty nice server, amortize the price over a week or two, and still come out ahead.
You certainly get considerable value from a major cloud provider, and a lot of their other services are reasonably priced, but, depending on your workload, the egress prices and the corresponding Hotel California factor may make using a major cloud provider a poor proposition.
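For anyone checking the arithmetic on the 1Gbps figure above (a sketch assuming a 30-day month and the $0.02/GB rate quoted):

    # Sustained 1 Gbps for a 30-day month, priced per GB of egress.
    seconds_per_month = 30 * 24 * 3600          # 2,592,000 s
    gb_per_month = 1 / 8 * seconds_per_month    # 1 Gbps = 0.125 GB/s -> ~324,000 GB
    print(gb_per_month * 0.02)                  # ~$6,480/month at 2 cents/GB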
I'm curious what you'd like to do next. You could probably have a great career doing these sorts of turnarounds repeatedly across companies, maybe even as a consultant, but would you want to?
> I also applied a few months ago while I was in the middle of my job search. For one, I couldn’t really answer their "favorite syscall question" because I’ve never dealt with syscalls :) so maybe I just wasn't a good fit.
Surely, everyone's favourite syscall is exit()
Honesty pays off in the long run, but it's something businesses quickly forget past a certain stage.
> We’ll readily admit our docs still have a Django-shaped hole in them.
https://twitter.com/flydotio/status/1578039196618575874?t=nu...
Also, they basically only use the OSS versions; they could go give HashiCorp some money to solve their Vault problems. They could probably partner with 2ndQuadrant for PG, as two examples. That might not make sense for their business though.
Hard problems are hard no matter the choice.
SQLite: https://github.com/tomwojcik/django-fly-sqlite-template
Postgres: https://github.com/tomwojcik/django-fly-postgres-template
And may I also plug Lunni, a self-hosted Docker Swarm-based PaaS I'm working on right now: https://lunni.dev/
Both work pretty well on $5 servers.
Machines isn't that. From the documentation, it appears as though it's "just" a VM pinned to a single region and none of the "magic" of Fly really applies. If the server your VM is hosted on goes down, Fly won't redeploy your container. It's just downtime. Spinning up in other regions is something you have to think about and actually do. It seems closer to Heroku than it does Fly.
Maybe I am totally misunderstanding Fly Machines and their use-case, maybe they're aiming to close the gap between Machines and Fly apps. It's just a bit of a bummer to see something that looks like walking back the original "promise" of Fly and makes me question whether or not Fly is going to just become like every other PaaS (even if it's a really good one).
If anyone else is looking.
Even without autoscale, spinning up Machine clones in any of the 30+ Fly regions is about as easy an instant scale-out as you'll likely come across on any of the NewCloud platforms.
They can trot out a low level person to stall you with questions, or an AI question generator that maximizes the amount of time you waste on your end, and call that "SLA met".
And even if they DON'T meet the SLA on occasion, you built your stack on AWS. You are laying in the bed you made.
So, what, AWS throws you some free credits (which their 30-40% margin easily absorbs)?
The only big stick in these types of things is having dual-cloud capability, where you can move your service quickly from one cloud to the other. Stateless API servers? Maybe. Database servers? Ouch. Cassandra could reliably span two clouds, but man, AWS would kill you on their ludicrously overpriced network costs.
Has anyone done Postgres replication across providers as a useful production system? I doubt it.
I used to use Dokku but I personally liked the GUI from Coolify so I've been using that. Nice to see that you have a GUI as well, makes configuring apps a lot easier.
It's a delicate balance based on the locations that rows are being read from and written to. In the case where a row is repeatedly written from only one location and not being read from a different location, the writes can be significantly faster than would be naively expected.
If Salesforce kept investing in heroku, it might not. But there is a huge loss of confidence in heroku's future going on among heroku's customers right now, which is part of what you're seeing, as I'm sure you know. (Also I think to some extent you are being political/kind towards heroku... if heroku's owners were still investing in heroku for real, adding 'edge' functionality like fly.io is focusing on is what one would probably expect...)
And frankly... your tool seems more mature and... not to be rude to competitors but seems to have more of that certain `je ne sais quoi` of Developer Experience Happiness that heroku _used_ to have and other potential heroku competitors don't really quite seem to have yet. Does what you expect/need in a polished and consistent way.
I think the work you put into the underlying infrastructure definitely shows there, and was the right choice. Tidy infrastructure helps with a tidy, consistent developer experience.
So I understand why people are looking to you as a heroku replacement. I am too! (And I don't really need the edge compute stuff; although I could potentially see using it in the future, and it shows you folks are on top of things).
And while I kept reading Fly staff saying in HN comments that you didn't want to be a Heroku replacement, and so were unconcerned with the few places people mentioned where you still fell short of it -- when I saw your investment in Rails documentation and tools (and contribs back to Rails), I thought, aha, I think they've realized this is a market looking for them, one they are only a couple of steps from, and it would make sense to meet it.
When you mention in the OP a "Heroku exodus" to you... I'm curious if that was all people who left when Heroku ended its free tier, and they've all come to you for your free tier stuff... because that does seem dangerous, such a giant spike in users who are not paying and don't bring revenue with them! I don't personally use very much Heroku free tier stuff. I hope that if that's a challenge, it's one you can get over. I don't think you are under any obligation to offer free stuff that can be used for real production workloads indefinitely -- although, as I'm sure you know, free stuff is huge for allowing people to try _before_ they buy, and whatever limits you put on it to try to prevent indefinite production use get in the way of someone's "try before you buy" too... and at this point, _reducing_ your free offerings is a lot harder PR-wise than having started out with less in the first place. :(
An OSPF router uses those updates to build a forwarding table with a single-source shortest-path-first routine, but there's nothing to say that you couldn't instead use the same notion of publishing weighted advertisements of connectivity to, for instance, build a table that maps incoming HTTP requests to backends that can field them.
The point is, if you're going to do distributed consensus, you've got a dilemma: either you're going to have the Ents moot in a single forest, close together, and round trip updates from across the globe in and out of that forest (painfully slow to get things in and out of the cluster), or you're going to try to have them moot long distance (painfully slow to have the cluster converge). The other thing you can do, though, is just sidestep this: we really don't have the Raft problem at all, in that different hosts on our network do not disagree with each other about whether they're running particular apps; if worker-sfu-ord-1934 says it's running an instance of app-4839, I pretty much don't give a shit if worker-sfu-maa-382a says otherwise; I can just take ORD's word for it.
That's the intuition behind why you'd want to do something like SWIM update propagation rather than Raft for a global state propagation scheme.
But if you're just doing service discovery for a well-bounded set of applications (like you would be if you were running engineering for a single large company and their internal apps), Raft gives you some handy tools you might reasonably take advantage of --- a key-value store, for instance. You're mostly in a single data center anyways, so you don't have the long-distance-Entmoot problem. And HashiCorp's tools will federate out across multiple data centers; the constraints you inherit by doing that federation mostly don't matter for a single company's engineering, but they're extremely painful if you're servicing an unbounded set of customer applications and providing each of them a single global picture of their deployments.
Or we're just holding it wrong. Also a possibility.
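A toy sketch of that "each host is authoritative for its own entries" idea, for intuition only. This isn't Fly's Corrosion or real SWIM (which add failure detection, piggybacked membership updates, and so on); it just shows the merge rule that lets you skip consensus entirely:

    import random

    class Node:
        def __init__(self, name):
            self.name = name
            self.version = 0
            self.state = {}  # origin host -> (version, set of apps it claims to run)

        def set_local(self, apps):
            # Only this node ever writes its own entry, so a plain counter is enough.
            self.version += 1
            self.state[self.name] = (self.version, set(apps))

        def gossip_to(self, other):
            # Push a few random entries; the receiver keeps whichever copy is newer.
            # No quorum, no leader: each origin's latest word is simply taken as truth.
            k = min(3, len(self.state))
            for origin, (ver, apps) in random.sample(list(self.state.items()), k):
                if ver > other.state.get(origin, (-1, None))[0]:
                    other.state[origin] = (ver, apps)

    ord_node = Node("worker-ord-1934")
    maa_node = Node("worker-maa-382a")
    ord_node.set_local({"app-4839"})
    ord_node.gossip_to(maa_node)  # MAA now trusts ORD's claim, no cluster-wide vote needed
    print(maa_node.state)

Because each host only ever disagrees with stale copies of itself, "newest version per origin wins" converges without any of the Entmoot pain described above.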
We're inherently faster than other "serverless" platforms due to the scale and homogeneous design of our network, and that network has presence in nearly 50% more cities than it did just 3 years ago. We were plenty fast enough then and we're even faster now.
Other things that customers (still) really care about: developer experience, ease of use, and cost. Nobody likes paying the AWS tax to move data around—they just want to use the best solution from the best cloud provider. Workers and the associated storage primitives allow them to pick and choose from the best that AWS, Azure, Cloudflare, GCP, et al. have to offer.
(Disclaimer: I'm a long time Cloudflare employee focused on App Sec, and I speak to customers regularly who look to Workers largely for compliance reasons, but I don't work on the Developer Platform business. Am sure my Dev Platform peers will chime in with more nuanced answers!)
I guess, you and GP are in agreement for the strategic part of the argument at least, if not the genuine part of it.
As someone who's been active on Fly's community forums for close to 18 months now, I think Fly employs some of the most genuine and helpful engs you'll see, so I'll give them the benefit of the doubt.
1. They have vastly different ongoing capital and cashflow requirements than you do.
2. They have all the leverage when it comes to the question of your continued operations on their cloud.
I'm also curious if they have already offered to just buy you out since you're clearly succeeding where they seem to just be treading water. (But not expecting you to answer this question. :) )
In my experience people who ran Postgres distributed across a WAN tended to use obscure third-party plugins at best, more often a pile of dodgy Perl scripts. Using something designed from the ground up to be clustered seems to have a much better chance of working out than trying to make something that's been built as a single-instance system for decades work across the internet.
Is the actual "production" workflow still pasting a Docker Compose file in? I would much rather have an automated deployment process that doesn't require human input, that way it can be scripted as part of CI/CD, etc.
Personally, I fell in love with `git push production` (naming a git remote `production`) to trigger a deploy. Ironically I didn't like this back when I first tried Heroku, but it's grown on me since. As of now, I have a custom git receive hook on my server (building a NAS from "scratch" using IaC on my home server) that triggers a redeployment using Docker Compose.
Also, you mention Swarm... what does Lunni bring with Swarm as opposed to simple Docker Compose? Does it distribute across multiple systems?
The problem with running your own servers in data centers as a startup is that elasticity is genuinely a difficult problem to solve if you don't have a large budget for unused compute, storage, and so on. As we are seeing in Fly.io's case.
Ultimately, my bet is that both startups end up as Heroku-like acquisitions for some large cloud company or another. I think that render will sell for a lot more because the value it provides is agnostic to the underlying cloud infrastructure.
Sure, no doubt. My point wasn't really about the particularities. It was around the mistaken idea that I see sometimes where people believe that TrueTime allows for synchronized global writes without any need for consensus.
If the market is big enough to support AWS/GCP/Azure as $N00B businesses each, it’s not a leap to imagine a future where both Fly and Render are incredibly successful, loved, and independent businesses spanning decades. Let's keep at it.
https://github.com/nathants/libaws
companies like fly are fantastic.
they provide a good service, and they put market pressure on aws.
a free tier isn’t important anymore. with usage based pricing for lambda/dynamo/s3, an app with usage approaching zero has no cost.
[0]: https://github.blog/2022-04-15-security-alert-stolen-oauth-u...
(Bridged to Matrix: https://matrix.to/#/#lunni:matrix.org)
Lunni is actually pretty young in terms of community (just me and a few friends now :-), so just a room in Telegram was sufficient so far, but I think it's a good time to start something more official.
Would it help to replace Corrosion with a simpler "Here's my local known state" blob that is POST'd to blob storage (for example) on a major cloud provider, and have another service read that at intervals? Just to make it really simple.
There will be a better way than that, but my thought is if you can make it simpler (known state is always just pushed, so missing updates auto-recovers and avoids corruption) then you can be building on top of a more stable service discovery system.
Centralized secret storage: can you keep the US instance read/write, but replicate read-only copies (a side-car tool that copies the database to other regions at various intervals?) so each region can fetch secrets locally?
Or perhaps both can be solved with a general "Copy local state to other regions" service that is pretty simple but gives each region its own copy of other region's information (secrets, provisioning states, ...).
I've needed to do similar things for some of the apps I've built, where a service needed another (simpler) service in front of it to bear the traffic load but was operationally simple (deferred the smarts to the system it was using as the source of truth) and automatically recovered from failure due to its simplicity.
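A minimal sketch of that "push local state to blob storage, pull it elsewhere" idea, assuming an S3-compatible bucket via boto3 (the bucket and key names here are made up):

    import json, time
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "region-state-sync"  # hypothetical bucket

    def publish_local_state(region, state):
        # Each region only ever overwrites its own key, so a missed update is
        # healed by the next full push; there are no deltas to reconcile.
        body = json.dumps({"region": region, "ts": time.time(), "state": state})
        s3.put_object(Bucket=BUCKET, Key=f"state/{region}.json", Body=body)

    def read_remote_states():
        # Readers poll every region's latest blob at whatever interval they like.
        listing = s3.list_objects_v2(Bucket=BUCKET, Prefix="state/")
        states = {}
        for obj in listing.get("Contents", []):
            doc = json.loads(s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read())
            states[doc["region"]] = doc
        return states

The trade-off is staleness: a reader only sees state as of the last push, which is exactly the "missed updates auto-recover" property described above.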
I wonder why they didn't try to use Serf[1] for this, since they were so into HashiCorp tools. It also uses the gossip protocol.
Swift Package Manager.
https://littlegreenviper.com/series/spm/
The only dumb question, is the one I don't ask.
https://lunni.dev/docs/deploy/from-git/#github vs https://docs.coollabs.io/coolify/sources
> There's no support for a single dedicated IP address for your application. With Heroku, your application's CPU resources are mostly located in one datacenter. Heroku doesn't support HTTP2 or Brotli compression and it doesn't do Edge TLS termination. And it doesn't run your applications on dedicated MicroVMs. These are all things that Fly's Global Application Platform does.
As I understand it though, the reason for the explosion in growth is what the other comment mentions: Heroku dropping its low-cost plans.
Whenever a US law and a foreign law conflict, the US law always wins when you are in the United States. Complying with US laws is also a perfectly valid defense if a European citizen or state ends up bringing action against you in a US court.
Not Anurag, but as ex-Stripe himself, he may appreciate AWS Flexible Payment Service vs. Stripe parallels here: https://news.ycombinator.com/item?id=34513430
There is truly no compression algorithm for experience.
One of the hardest lessons every business needs to learn is how to say no to the users they don't need.
Unless somebody natively implements clustering in Postgres I don't see that happening anytime soon, all existing tools require way too many moving parts.
I don't know why they'd even want Postgres for the type of service they are offering, KV or maybe SQLite seem like a better fit.
Unlike Kubernetes or Nomad though, it uses mostly the same concepts Docker Compose does, to the point that your development docker-compose.yml file will likely just work there (with some minimal tweaks). I love this website that talks more about it: https://dockerswarm.rocks/
Edit: As opposed to `docker compose up`, when running on a single server: not much. It will restart on server reboot by default, and allow you to run multiple replicas of a service (deprecated in Docker Compose), but that's it. Most important though, it would allow you to add more nodes later on, and it will then scale your services across the whole swarm – so you can start with just one server and scale to hundreds if needed.
> I would much rather have an automated deployment process that doesn't require human input, that way it can be scripted as part of CI/CD, etc.
This is almost doable with Lunni. This guide will walk through setting up a CI for a typical webapp that packages it in a Docker image and pushes to a registry: https://lunni.dev/docs/deploy/from-git/ (currently for GitLab CI and GitHub Actions only)
As for the continuous delivery, we're gonna have a webhook that you can call when your CI pipeline is finished. It's not exposed in the UI yet but I'll try to prioritize it (now that I remember I wanted to do it :')
`git push production` feels a bit easier, but I'm a bit concerned about bloat: for this to work, we'll have to bundle some sort of CI and container registry with Lunni itself. I think sticking with third-party CI is a more elegant approach here. What do you think?
I moved one app successfully from heroku to fly and attempted to move a few others. These are my experiences (both good and bad):
Great:
- The load time on the pages is insanely faster on fly than heroku. Sometimes I thought I was on the localhost version of the app, it was that snappy.
- Love that it uses a Dockerfile
- Love paying for what I use (compared to Heroku's rigid minimum of $16/month for hobby dyno w/ postgres for baby apps, or $34/month just to get a second web dyno for toddler apps). The same apps are <$5/month each on fly.
Not great:
- I find the fly.toml file hard to understand and use, and the cycle time slow to fix or tinker with it. It's partly (entirely?) a 'me' problem because I haven't spent a huge amount of time reading the documentation.
- I found scheduling a rake task in a rails app time consuming (~days) the first time, but very easy (15 minutes) the second and subsequent times, once I knew a way that worked (cron didn't work; had to use a tool I hadn't used before 'supercronic').
- Deploys sometimes time out with `Error failed to fetch an image or build from source: error rendering push status stream: EOF`. Most layers copied, but randomly, some layers wouldn't. All I could do was keep trying until it worked, which it did, 2 hours later. Not the end of the world, but an annoying complication when you're already trying to solve complex problems.
- I followed a youtube video on how to move a rails app from heroku to fly, and it worked on a modern app, but I couldn't quite get fly happy when moving the older app - something to do with postgres versions, and I didn't want to spend all day figuring it out. I'm not hugely experienced with docker, it could have been an easy fix for someone more experienced.
On reflection, 3 of the 4 negatives above are solvable by me reading the docs more thoroughly and getting more proficient with docker.
I look forward to continuing using and exploring fly, and can't be happier with the directness, transparency and care from fly staff. A platform with huge potential.
What I mean is that these scaling problems don't have much to do with app logic, but having a nice core is good, so :thumbsup:
By GitHub app I meant that Coolify makes you install its own custom GitHub app that then allows you to git push to deploy.
Are there any examples where the capitalist bottom line is ignored and a company keeps growing with extremely generous premium acquisition offers on the table? I can't think of any, but there could be a few. However, I expect it's pretty rare.
For companies with such tremendous growth, the venture capitalist firms are primarily looking to make their <big-multiplier> return and push priorities accordingly (understandably).
The only constant in life is change, it's best to focus on what you can do right now, today, and only put out promises or commitments that you have the necessary influence to follow through on. Some things are bigger than each of us.
Best wishes and godspeed to you and fly.io!
Re: business model: both Coolify and Dokku are open source, so even if their development stops, you can continue to use them no matter what. (You do have to pay for your own servers though :-) So it's not a PaaS in the traditional sense (like Fly.io or Heroku), but more like “build your own PaaS” thing.
Many times I've had to read all the docs then use a system for several months before the epiphany hits me.
The issues you ran into with older versions of Rails was probably because the Dockerfile that `fly launch` generated was for new versions of Rails. We switched to https://github.com/rubys/dockerfile-rails to streamline Dockerfile generation and support older versions of Rails.
If you try it again and run into issues you can open an issue at https://github.com/rubys/dockerfile-rails/issues or post in https://community.fly.io and somebody will help get that sorted out.
The more versions of Rails we can deploy the better!
The goal should be to make the backend as simple as possible, but no simpler. Complexity here leads to operational burden and toil. But that's why you hire good SREs and treat them well. What's more important is frontend complexity, aka how difficult it is for customers to use. Backend and frontend complexity aren't necessarily linked, which, imo, fly.io achieves, downtime aside.
I don't really have a business model. I do take donations from Open Collective (and Github Sponsors, which funnels to OC) and there is Dokku Pro, but those don't collect anywhere near the funds I'd need to stop my dayjob (at least now. Maybe someday?).
My business model is that code releasing is something I'm pretty passionate about. Dokku isn't even originally my project (Jeff Lindsay started it, I just took it over), but I've been working on it for almost a decade. It's open source and fairly simple, so even if something happened to me, others could theoretically continue the project on as desired (or build on top of it if need be).
I'd be interested in hearing any of your other concerns though :)
For proxying requests, Dokku currently supports:
- nginx on the host (default)
- traefik (via docker labels)
- caddy (via docker labels)
- haproxy (via docker labels)
We'll also soon support nginx via docker labels, which will work around issues where Docker sometimes assigns random IP addresses (and unlock TCP/UDP proxying as well).

I can't say anything else about Coolify since I haven't used it in a while, but I'd be curious as to what other parts are more modern about Coolify than Dokku.
I fundamentally don't understand why people are in such a big hurry to get 'famous'. I've worked a couple of places where the marketing side was working as hard as they could to make sure that our heads were on fire at all possible moments. At one job I had a (very, very junior) manager come up to me and say great news we landed <big customer> and my immediate reply was, "fuck me". We were already running to stay upright and now we're about to have twice as much scrutiny. Wonderful.
If you push hard enough, eventually everyone looks like an idiot. The number of humans for whom that is not true could fit into a book. Both alive and deceased. They most definitely do not work for the companies I've described, at least not enough of them so you'd notice.
Dokku doesn't have an _official open source_ UI. There are a few unofficial OSS ones (Ledokku is the latest) that I'm aware of.
There is a commercial offering in Dokku Pro (https://pro.dokku.com). It's paid (one-time lifetime license), but only so that I can at least partially cover my development time on it. The project is enough work on top of Dokku that I feel it is justified, especially as there is nothing stopping others from doing the same, OSS or otherwise.
I encountered a few issues. One was definitely something to do with older postgres. From memory, I tried downgrading it using apt, but then other things played up and I put the project aside.
Another rails 6 app I tried to move into fly encountered this: https://community.fly.io/t/rails-app-problem-with-node-modul...
I followed Sam's suggestions to regenerate the Dockerfile with dockerfile-rails (btw, thanks for your work on dockerfile-rails, super excited about it) and solved a couple more issues, but I again ran out of steam when new issues kept coming and coming. I'm sure when I'm more comfortable with docker these will become trivial to solve.
These were not super determined attempts by me, more playing around. I look forward to more serious attempts when I'm more capable with docker.
About the modern part: that was my opinion based on the way I recall Dokku and Coolify, and a quick scroll through the docs of both, so I might be really wrong here! I definitely need to check out both Dokku and Coolify again sometime.
Usually this results in me jumping on new platforms and then abandoning them once they add too much complexity.
I took a quick look and couldn't find them. Do they have any documented service limits?
A google search turned up [0] which does not inspire optimism.
> ...there isn’t a limit to number of apps from a billing standpoint...
[0] https://community.fly.io/t/free-tier-limits-and-quota-needs-...
to be pithy about it, going full-bore gossip protocol is like going full-bore blockchain: solves a problem, introduces a lot of much more painful problems, and would've been solved much more neatly with a little bit of centralization.
One of the main features of Dokku is its extensibility. You can cut one part out and replace it with another quite easily, and proxying is an example of that. I think that flexibility allows folks to use it in more situations than they otherwise would, though at the cost of being more difficult to maintain (and harder to have cohesion between parts of the system at times).
I’m curious why you think it isn’t? On a long enough timescale all good things seem to be acquired by large megacorps for a fuckton of money.
Slack, Linode, Minecraft, the list goes on. Eventually they all make the thing less than it was before under the founders’ vision. At least from my perspective.
It won’t stop me from cheering them on, but I’m still very skeptical of them not being bought out in 10 years.
I’m trying to build more of an intuition around distributed systems. I’ve read DDIA and worked professionally on some very large systems, but I’m wondering what resources are good for getting more on the pulse of what the latest best practices and cutting edge technologies are. Your comment sounds like you have that context so any advice for folks like me?
We will have an option to not scale all the way to 0 to support this scenario.
I use caddy as the proxy, since I found the traefik configuration absolutely incomprehensible. Now I use only 2 labels to proxy instead of 15.
The thing that worries me about these incidents is they haven't been, like, full service outages. A small subset of users talking about issues in forums. This makes me just feel like Fly has an immense amount of issues.
At least if like 50% of fly goes down then it feels like a config fat finger. When it's a bunch of tiny issues now all my ops debugging has to start with going to the fly forums (and it's _always been issues on fly's side_).
The price is "right" (though like with all PaaS the gaslighting about running multiple processes in one container makes me feel bad about the state of cloud computing). And I really like the CLI stuff mostly! But I extremely don't care about edge computing so for me fly is just heroku and I would love to feel more confident on that end.
(EDIT: the nice thing is I get email support with a bit of cash. This is a thing that will go away when they get bigger but it's here while things are still breaking often)
I used to work for a company that built deployment platforms for law firms. All our deployments were on-prem and we had the same complexity with Kubernetes. We had a similar setup with Vault and Stolon for HA PG. The more moving parts you have in infra, the more permutations and combinations of failure modes you have.
What these guys are building is something I have seen many orgs try to do internally and fail at. PaaS is a hard problem if you want to solve it reliably.
Can't you make the argument that Heroku got out of this market on purpose? I know they were bought out by "ye old corporate greedy meanie overlord" or whatever but... I'm sure there is data that showed it made sense from a "make money business" perspective to not be in that market.
Something like linkerd on Kubernetes would be stronger, I suspect. But I don't know the exact nature of your problems.
This kind of frank and human communication is vulnerable, but it’s good for establishing credibility… with me at least!
Think the technical work of potentially moving your solution off your $CLOUD vendor is bad? Wait until you turn around and realize you have at least one full-time hire whose entire role is "$BIGCLOUD Certified Architect" (or whatever), and your entire dev staff was also at least partially selected for experience with the preferred cloud vendor. At any kind of scale you have massive amounts of tooling, technical debt, and institutional knowledge built around the cloud provider of choice.
Then there’s all of the legal, actually understanding billing (pretty much impossible but you’re probably close by now), etc elsewhere in the org. At this point you’ve probably utilized an outside service/consultant or two from the entire cottage industry that has sprung up to plug holes in/augment your cloud provider of choice.
After realizing their cloud spend has ballooned well beyond what they ever anticipated plenty of orgs get far enough to investigate leaving before they realize all of this. Most decide to suck it up and keep paying, or try to somehow negotiate or optimize spend down further.
Cloud platforms are a true masterclass in customer stickiness and retention - to the Oracle and Microsoft level (who also operate clouds).
It’s interesting here on HN because while MS and Oracle are bashed for these practices AWS and GCP (for the most part) are pretty beloved for what are really the same practices.
The app (and the demographic that it serves) are very security- and privacy-conscious, so security and privacy are the main considerations. Most folks would be disappointed in how few features the app boasts, as each feature is a potential compromise. I'm glad to have low usage, as a result.
I wrote the backends, as well as the frontend, and have avoided third-party dependencies, all around.
It seems one of the first casualties of fast scaling is security.
I really didn't want to take any chances. An overly-complex architecture is begging for security compromises.
As a future representation of past me, I can tell you:
1. Everything it’s making you feel is valid.
1b. If you’re feeling burnt out, please listen to it. It gets worse if you let it.
2. While I can’t hire you now, I can already tell you’re eminently hireable. If you have any cautious inclination to move, you will probably be better served by greener pastures.
3. Just take care of yourself.
4. When 3 contradicts 2, favor 3.
Goodbye Heroku. :(
Sure, both are examples of "self-serving corporate communication" - but it's clear that the way Fly communicate here is more valuable and trustworthy than so many other examples of this kind of thing handled poorly.
I like Amplify and use it often. However, it isn't well integrated with "normal" backends, so if you want to keep a backend and frontend deployed together you either have to use their Amplify backend API or work out your own deployment.
Have you, in all honesty and with first-hand experience, deployed to and supported Swarm in prod across hundreds of servers?
Of course, OSPF has topology and aggregation, too.
At any rate: I didn't design the system we're talking about.
It's hinted by the C-level that if I can pull this off, it would be nothing short of a miracle. I'm pretty sure I can negotiate salary, education, bonus, and what not if I can pull this off.
As far as next, I've thought about that. It would be funny to call myself a turnaround specialist. This would be quite a remarkable feat, but I really don't know if I would have taken this job if I knew what a mess this was...
- Machines seem like a waste of time
- Access directly to VMs is being removed (and doesn't support TCP over IPv4, or UDP over IPv6)
- The CDN is nice but should support private networking too.
- Volume management is deficient: it should be possible to access and fix volumes outside the context of the app instance they're attached to.
- Egress traffic should be free between apps over private networking, at least in the same DC.
https://github.com/google-github-actions/deploy-cloud-functi...
First off, it helps that I've spent 15+ years as an engineer, 5 as an engineering manager, and throughout have community contributions in the field on my resume. I instantly spotted the problems when I was given an architecture diagram on day 1 and discussed what I would do differently. All that gives credibility.
If it's internal audience, I am brutally honest. The organization needs to know this wasn't happenstance and bad luck that put us where we are now. It was a deliberate series of bad decisions based on a poor engineering and product culture. Now, for better or for worse, we are tasked with paying the debt.
There's a certain class of customers that are sister companies under the same parent. I'm honest with them, too, but go on the offensive. They have abused my team and our company in the past, and unfortunately, we have let them. I am more than happy to fire back and go toe to toe with bad behavior, and at the same time working to fix critical support issues.
For external customers, I've had remarkably good response in listening to their complaints. I am honest in discussing, in deep engineering detail, how the new product will address their problems, where issues might still be, and development timeline. I like to think the credibility portion comes into play here. In the past, customers were just told, "We'll look at it" and "We'll fix it" but nothing was ever planned.
Like, it cannot possibly run on a single machine by definition because the product they're building is running customer code on edge compute nodes distributed across the globe.
The main selling point of their service is that it's _not_ a single machine.
Reliability is a thing that grows, like a plant. You start out with a new system or piece of software. It's fragile, small, weak. It is threatened by competing things and literal bugs and weather and the soil it's grown in and more. It needs constant care. Over time it grows stronger, and can eventually fend for itself pretty well. Sometimes you get lucky and it just grows fine by itself. And sometimes 50 different things conspire to kill it. But you have to be there monitoring it, finding the problems, learning how to prevent them. Every garden is a little different.
It doesn't matter what a company like Fly does technology wise. It takes time and care and churning. Eventually they will be reliable. But the initial process takes a while. And every new piece of tech they throw in is another plant in the garden.
So the good news is, they can become really reliable. But the bad news is, it doesn't come fast, and the more new plants they put in the ground, the more concerns there are to address before the garden is self sustaining.
I checked the repo and yeah, it checks out, Dokku _is_ pretty manageable with a decently small codebase. Having a low bus factor is really important for me. I'll check it out soon, and hopefully leave a donation to help you keep the project going too :)
there are some benefits to static stability and grey failure, but sure, whatever. the important bit is to have clear paths of aggregation and dissemination in your system.
that being said
> it doesn't matter what some server in SJC says it's hosting
it kind of does matter doesn't it? assuming that server in SJC is your forwarding proxy that does your global loadbalancing, what that server is aware of is highly relevant to what global actions you can take safely.
That said, I'd imagine with large enough scale, these sorts of features break anyways.
The cost aside, I'm wondering how fly or heroku support their customers when they grow to microservices ecosystems.
The problem shifts from deploying easily to deploying reliably, meaning one release of a service should not break the other services. Other problems appear too, like service discovery, peer authentication, gateways, test and staging environments where there are downstream dependencies, etc.
Are customers supposed to leave when they grow to this level? Or are there solutions for these?
There's a lot of Nomad at , just won't get any publicity but that's different.
My boss is supportive, but he's also under heavy fire. Like I mentioned, my peers are rightfully skeptical. My team are a bunch of sharp, good guys, but they haven't had any good guidance or mentorship in years, if not decades. They're all different, but what they have in common is that they've been screwed and judged unfairly thanks to past incompetence. That just pisses me off.
There's hope from those around me, but it's a pretty darn lonely job. You just gave me the fuel to not feel already beat up when I walk in the door tomorrow.
I'm not that familiar with setting up global network infrastructure but I imagine there are similar choices that can vastly affect initial reliability.
A master in bash will build a more reliable API (in bash, no less!) than a beginner in Rust, simply because of experience and knowing their way around the tools they're using. Newer/different technologies won't simply solve a problem unless the person has some sort of domain knowledge of said problem.
I have learned, though, that the first step to reliability is removing as many HashiCorp products from your stack as possible. It appears I am not the only one.
Seems like they have a good understanding of what the problems are, so they will most likely be solved sooner or later.
Good work, and keep being as open and honest as you have been so far :)
What you want to know is the probability of a small, independent, high quality provider remaining independent, high quality and not bankrupt.
It does seem to be rare in the tech space, especially in the US. Becoming one of the largest public corporations on earth is one way to do it, as you suggested, but the odds of that happening are miniscule.
i do enjoy programming, working with other devs, but as soon as i stepped into a product management role it's a hellishly different set of problems. you're in the middle of the tech, the developers, the problems, and the customers. lots of lessons in there. tiring, but worthwhile.
It seems true for fly's problem space, but in many problem spaces there really are easy engineering solutions to reliability problems.
For a very easy example, I once worked on a rails app that crashed frequently and managed 5 req/s at best. It turns out the app only loaded static data from hardcoded json files on disk and templated that into stuff. In other words, it was a static site. Replacing it with an actual static site + nginx and a cdn instantly fixed all reliability issues for that website forever, and made it easier to maintain the content to boot.
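For what it's worth, that end state can be absurdly small; something like the following (image name, port, and paths are assumptions, not what that team actually ran), with a CDN in front:

# serve the pre-rendered files with a stock nginx container
docker run -d --name static-site -p 80:80 \
  -v "$PWD/public:/usr/share/nginx/html:ro" \
  nginx:alpine

At that point there is essentially nothing left to crash.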
Solving hard problems like this seems interesting.
On the other hand it could be a giant shit show of micromanagement and toxicity, who knows really.
At the moment they aren't hiring though so that's that.
Do this up front. Do it as soon as you possibly can. You will lose a huge amount of negotiating leverage if you "wait until you show them". I cannot stress this enough.
I initially fought to get SSR working, fixing hydration errors and making sure our code was isomorphic.
I later realized that I can just use the parts of Next.js we need and turned off SSR. It wasn't a big value add for our particular product.
But doing this wasn't straightforward. I hadn't even realized it was a possibility until I stumbled across a blog post.
I had to copy a NoSSR implementation off the internet. It wasn't just some flag I could toggle for a page.
I've also found myself recommending Next to folks, saying "Use Next.js, but btw you don't need to use SSR. Make sure the trade-offs make sense."
I'm curious if I'm in the minority of Next.js users. What percentage of them don't need SSR but value everything else?
docker buildx build --push ...
curl -X POST https://lunni.example.com/api/webhooks/c8aaa9b8-1bda-4a99-820c-36a75d31f8a7
That will rebuild a Docker image, then trigger the redeploy.

(I didn't find disabling SSR straightforward.)
I wonder if Vercel is underestimating the size of the market that just wants a "Heroku for React".
I know that it is possible to outgrow Swarm – I think that's a nice problem to have actually. We might include some tools for “graduating” from Lunni to something more serious like Kubernetes at some point.
Kinda like when crypto exchanges tweet "Yo we're definitely not blocking withdrawals, we're perfectly healthy".
You know what would make me consider a company? The fact that they don't have a bad reputation to begin with, and don't need to make posts like that to try to save their reputation.
Yeah, this is how you get companies to have the "well, it SEEMED they were having lots of issues, now it's clear that they indeed did, moving off of them is priority #1".
That said, the Biden administration's latest proposal might pass muster if the proposed redress mechanism were truly independent as part of the Judicial Branch of the United States as opposed to the current proposal which is still part of the Executive and thus conflicted in ruling against surveillance decisions of the Executive Branch and its agencies:
https://www.whitehouse.gov/briefing-room/statements-releases...
https://noyb.eu/en/open-letter-future-eu-us-data-transfers
That said, even US citizens don't enjoy meaningful protection against warrantless wiretapping that clearly violates the Fourth Amendment due to the deference the judiciary has given to the executive, so I am not optimistic.
Also, brand new software in general is like a new hybrid plant. How does it behave in this environment compared to other plants? Does it attract more bugs? Does it need different care? We don't know yet; it's new.
And even for an old well known plant, if the gardener hasn't gotten to know it yet, it's easy to make a mistake with its care. But a well known plant with a gardener who's grown it before is the most likely to work without issue.
That is not the same as using US-based products.
You can use Google Search and be 100% compliant, because Google doesn't see any customer data. Google Chrome isn't even a service; I can't imagine how you'd manage to stick customer data in there.
And if you think there are no companies without AWS and Microsoft 365, you need to expand your horizon. I work for one such company, and so do many of my peers.
Where does this sentiment come from? Cost of compliance for Facebook is many orders of magnitude higher than cost of compliance for a website for your hairdresser or a restaurant.
In my startup, GDPR was barely a blip on our radar. We had to delete website logs and that's about that. You have to keep record of customers/payment information for laws that supersede GDPR, and that's it if you run a legitimate business not reliant on stealing.
Google Search will see PII go by if your marketing team is researching leads on LinkedIn for example.
> And if you think there are no companies without AWS and Microsoft 365, you need to expand your horizon. I work for one such company, and so do many of my peers.
And that's great.
What is the services stack your company is implementing?
What kind of alternatives do you use for your email, browser, centralised data storage, etc. ?
Partially so I could learn from mistakes and partially since I’m a sucker for post-mortems :)
When working at AWS, a large part of the convincing for an MS shop would be around showing that we can offer a lower price than the 'discounting' that MS provides. Oracle was all about contract expiry.
While there's some complexity around migrating a workload, regardless of where it's at, many places are going into cloud migrations hoping to remain relatively platform agnostic. I've seen many successful migrations to and from different vendors, and often at an SMB or ME scale, in weeks not years.
You just won't know until you fall off the cliff. The armchair quarterback can opine that you should have just hired experts in XYZ domains from the start to design robust systems that can scale to arbitrary sizes, but most orgs don't need to scale to arbitrary sizes so this is highly likely to be wasted effort.
Besides, I’m not even necessarily talking about hiring here - even consulting would have been sufficient to avoid this catastrophe.
This becomes very relevant for things like archiving data. If you generate data outside of a major cloud, you can pay a major cloud a very reasonable fee to archive it for you. But if you ever download your archive, it will cost you about half the price of buying an external disk to store it on.
(To be fair, object storage is rather more reliable than a single crappy external drive. But if you access the data more than once, maybe you should have a colo or on-prem copy too.)
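A quick way to sanity-check the egress side for your own archive (the per-GB price below is a placeholder; archive retrieval and egress pricing vary a lot by provider and tier):

# e.g. a 4 TB one-time restore at a hypothetical $0.05/GB egress
echo "4 * 1024 * 0.05" | bc
204.80

Compare that against what a disk of the same size costs you, plus the hassle of keeping a second copy alive.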
I noticed you mention Vault lives in the US, I'm sure you've already heard of this pattern, but Vault (Enterprise) supports [multi-region clusters for performance and DR](https://developer.hashicorp.com/vault/tutorials/day-one-raft...)
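For anyone curious, the rough shape of that (Enterprise-only, and simplified from the HashiCorp docs, so treat it as a sketch rather than a recipe) is to enable a primary, mint a secondary activation token, and enable the secondary with it:

# on the primary cluster
vault write -f sys/replication/performance/primary/enable
vault write sys/replication/performance/primary/secondary-token id=eu-secondary

# on the secondary cluster, using the wrapped token returned above
vault write sys/replication/performance/secondary/enable token=<wrapped_token>

DR replication follows the same pattern under sys/replication/dr/.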
But, yes TrueTime will not magically allow data to propagate at faster-than-light speeds.
Of course, IANAL, do your own research, etc.
I've only used Fly.io for a personal app but I think it's a great option so I hope they keep growing.
Portainer is also cool – we're using it internally as an API, actually! I've been using it before starting Lunni and my only objection is the UI. Portainer is kinda like a Swiss army knife for containers, but with this power comes the complexity, too.
For example, to see service logs, you have to pick an environment, then go to Stacks, find your stack, find the service you need, open it and then you'll see the service logs button. In Lunni, you open your stack right from the dashboard and click logs button right beside your service name: https://u.ale.sh/lunni-screenshot-logs-button.png
Decent analogy. And of course if you have people of vaguely similar skill levels then the brick house is going to be way more robust. Which was my point.
If I had a bad day and didn't get to complete something within my estimate, I'll tell my boss I had a bad day and ask for more time. Does that mean I have some ulterior motives? No, I just had a shit day and needed some compassion.
They have been going through a rough patch recently with their scalability problems. And they realised they might not address it as easily or as quickly as they'd like. So they just wanted to buy time. I think that's better than "bunkering" and not letting your customers know what's up.
They do have the benefit that their audience is tech savvy as they are, so they can go into more details (and be less formal, I suppose) to get some understanding from their customers. As in, most devs have struggled at some point with a problem that exceeded the initial scope/time estimate. It sucks, and we know it sucks. So, why not give them the benefit of the doubt here?
Like, I think I understand what you mean: their goal was to buy more time, and they achieved that. But even though it was corporate messaging, I still think it was genuine. I assume they felt a bit like "ok shit, we gotta talk to our customers, they deserve to know what's going on".
I guess they wouldn't air most of their internal issues, since those aren't felt by the customers. So there's no need to apologise and explain themselves.
That's just to take a database log, put it into JSON, and zip it up.
That's all for just one step in the data pipeline. The others are slightly less hairy, but 75% of this pipeline is literally just moving data around. It goes from JSON to MySQL to Postgres to Parquet. There is no data enrichment at all during these steps. It literally just unpacks from one format, packs to another, and repeats.
The whole fucking thing is just one big masturbation circlejerk for a bunch of engineers that have thankfully been RIFed/forced out...
To be fair, I don't think code like

data_1 = `cat ./data1.json | grep "city" | awk ....`
data_2 = `cat ./data2.json | grep "city" | awk ....`

was exactly helping it to perform well. I'm sure rewriting the rails app to load all the data at startup, rather than reading each file via several hundred subshells on each request, would have made it perform well enough. However, pretty much no matter how well or poorly the rails site is built, a static site will be easier to run reliably.
If the destination bank account is outside the EU, they can't touch it without cooperation from the defendant countries courts - which requires you to file in the defendants venue. If an EU country unilaterally seized intra-bank remittance they would be cut off from the international banking system without hesitation.
You seem to really be grasping at straws here, but the EU is not some all powerful entity that can enforce its laws outside its jurisdiction.
> What kind of alternatives do you use for your email, browser, centralised data storage, etc. ?
There are plenty of browser alternatives (firefox, safari, vivaldi, even chromium).
There are dozens if not hundreds of email providers, and you can even provide your own.
You can 'centralize data storage' on disks on hardware you own, on premises or colocated. You could even use one of the dozens to hundreds of managed service and cloud providers.
One trick you might try: write future press releases. This helps you look beyond the immediate problems and focus on the destination. For example:
“Q3 2023: ACME CO released version Z today, which dramatically simplifies our engine to focus on core user needs. ‘It does the thing I want and doesn’t crash anymore,’ says Key Buyer #1”
By writing this down, you can put the vision in front of everyone. Then check it against actual progress to see how you’re doing.
Of course you can: you simply reach for assets within the borders of said member country or the EU. As I mentioned in my previous comment, you can, for example, take the funds from outgoing payments by customers of said company. You can also freeze accounts and prevent ownership of, or investment in, the company by any citizen of that country.
> If the destination bank account is outside the EU, they can't touch it without cooperation from the defendant countries courts - which requires you to file in the defendants venue. If an EU country unilaterally seized intra-bank remittance they would be cut off from the international banking system without hesitation.
There is nothing unilateral about a country seizing money as payment of a fine from a company. This is a standard tool that every country's IRS-equivalent agency has in its tool belt.
> You seem to really be grasping at straws here, but the EU is not some all powerful entity that can enforce its laws outside its jurisdiction.
I never said that EU is all powerful, however, if business is done within the EU, EU countries have the power to access any and all funds going to the US for companies that do not comply.
They can also decide to block said service as a punitive measure.
I meant both clouds and managed email / storage services.
> safari
Don't both Firefox and Safari have telemetry and various ping back services?
> There are dozens if not hundreds of email providers, and you can even provide your own.
> You can 'centralize data storage' on disks on hardware you own, on premises or colocated. You could even use one of the dozens to hundreds of managed service and cloud providers.
Sure you can, I'm just saying that it is rarely if ever done in medium to large companies.
Basically this is an argument about so-called premature optimization. It's good to have issues now, while the customers are mostly enthusiasts. Guessing that this bump will be forgotten in five years? And it's not like AWS et al. don't have occasional outages that they learn from.
I am not sure how much you could negotiate, but you can have something like that and make it metric-based: X% of customers happy, X% rating change, X% of customers retained when they were close to leaving. Then you do the math on the revenue and profit, and it's hard to say no.
But when you're working on distributed systems that span the planet (say multi-master setups where ~every region can read and even write with low latency), you start thinking of the distance between your datacenters not in miles or kilometers but in milliseconds. The east coast and west coast of the US are at least 14 milliseconds apart:
% units "2680 miles" "c ms"
2680 miles = 14.386759 c ms
and that's not counting non-optimal routing, switching delays, or the speed of light in fiber (only 70% of c). Half of the circumference of the earth (~12500 miles) is likewise 67 milliseconds away absolute best case (unless you can somehow make fiber go through the earth).
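Plugging the 70% figure into the same tool (just the number above divided by 0.7) gives the rough one-way floor over ideal fiber:

% units "2680 miles / (0.7 c)" "ms"
2680 miles / (0.7 c) = 20.552513 ms

So coast to coast starts at roughly 20 ms one way, 40+ ms round trip, before any switching or indirect routing.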
Which is exactly what I said. If the US company has an EU subsidiary, you sue in that venue, which can grant you relief. There are US tax implications to holding foreign assets, so the 1% of US companies with overseas interests create a foreign subsidiary; the other 99% have absolutely nothing within the reach of the EU.

> There is nothing unilateral about a country seizing money as payment of a fine from a company.
Funds in transit belong to the sender until they arrive in the destination account. The EU would be seizing the funds of an innocent third party (the customer), and the target company would just shrug and say "your payment didn't arrive, send it again." The EU cannot seize a transaction in flight and also compel the target company to honor it against their books.
> if business is done within the EU, EU countries have the power to access any and all funds going to the US for companies that do not comply.
See above. Taking money from random EU customers I guess is something they could do, but I imagine their citizenry would be none too pleased about it.
Let me try to simplify it for you: the EU cannot take what is not in EU jurisdiction without the cooperation of the foreign court. If a company says they were complying with their domestic law which violated EU law, they would likely not receive the cooperation of domestic courts to grant relief.
If, say, Google were to not follow the GDPR, even if they didn't have any European subsidiaries, the EU or a member country would simply make all Google customers pay their subscription fees to it instead of to Google, as payment for the fine. Customers would see no service disruption.
Feel free to call up your credit card or power company and ask them what happens if you send them a payment but it gets seized by the government along the way. Their answer will be that you still owe them money.
In your example the EU customers would be out the money, not Google. With no EU nexus (in your hypothetical) they cannot compel Google to provide services they were not paid for.
Because they would have been notified by a court beforehand and the fine would constitute an outstanding debt linked to a lost lawsuit.
Once that happens, the national collection agencies would take over and use the tools at their disposal, like collecting from customers directly, which is the equivalent of garnishing wages but for companies.
They would then receive regular updates about the remaining debt and what was already paid and by whom.
> Feel free to call up your credit card or power company and ask them what happens if you send them a payment but it gets seized by the government along the way. Their answer will be that you still owe them money.
If Google then refused service to the customers whose payments were redirected to that country's collection agencies, then additional punitive measures would be taken by the country.
Some of the punitive measure could be:
- growing interest on the outstanding debt
- blocking the service within the country or EU
- advertise to financial institutions that Google is delinquent and is refusing to pay its debt
- prevent banks and financial institutions from loaning money or investing in Google
- impose an embargo on imports and exports involving Google
- extradition requests for the C-suite, or adding them to Interpol and Europol wanted lists
- etc.
> In your example the EU customers would be out the money, not Google. With no EU nexus (in your hypothetical) they cannot compel Google to provide services they were not paid for.
They can't force Google to provide services, but Google will also lose that market (for the EU, that's 450M people) and face increasing punitive measures.
Also, Google refusing to pay would probably discourage financial institutions anywhere from servicing Google in the future, and discourage other countries from authorising Google in their national markets.