1226 points bishopsmother | 7 comments
pyentropy ◴[] No.35048126[source]
Almost half of the issues are caused by their use of HashiCorp products.

As someone who has started tons of Consul clusters, analyzed tons of Terraform states, developed providers and written an HCL parser, I must say this:

HashiCorp built a brand of consistent design & docs, security, strict configuration, distributed-algos-made-approachable... but at its core, it's a very fragile ecosystem. The only benefit of HashiCorp headaches is that you will quickly learn Golang while reading some obscure github.com/hashicorp/blah/blah/file.go :)

replies(2): >>35048318 #>>35049109 #
tptacek ◴[] No.35048318[source]
We are asking HashiCorp products to do things they were not designed to do, in configurations that they don't expect to be deployed in. Take a step back, and the idea of a single global namespace bound up with Raft consistency for a fleet deployed in dozens of regions, providing near-real-time state propagation, is just not at all reasonable. Our state propagation needs are much closer to those of a routing protocol than a distributed key-value database.

I have only positive things to say about every HashiCorp product I've worked with since I got here.

replies(3): >>35048609 #>>35049327 #>>35050286 #
otterley ◴[] No.35049327[source]
Well, why did you do that? If you’d asked them whether this was a supported configuration or intended purpose, they’d have said no; and anyone who had experience deploying Consul at large scale would have told you the same.

There is truly no compression algorithm for experience.

replies(2): >>35049708 #>>35055005 #
1. mixmastamyk ◴[] No.35049708[source]
I don't think he personally designed the first implementation. But in any case, understanding of complex topics comes in waves.

Many times I've had to read all the docs then use a system for several months before the epiphany hits me.

replies(2): >>35052309 #>>35055742 #
2. otterley ◴[] No.35052309[source]
I also think there’s this tendency in the industry to want to solve problems on your own without help from outsiders, even if they know the problem space better than you do, and even if they’d gladly help (often for free) if asked. It’s especially worrisome when it’s powering a key workload that is essential to the functioning of your business. Sometimes it’s because you might not know whom to consult or recruit, but in this case, the vendor was known.
3. JeremyNT ◴[] No.35055742[source]
This is especially true for scaling. A solution that works great for your current deployment may be completely unworkable for 2x your current deployment.

You just won't know until you fall off the cliff. The armchair quarterback can opine that you should have just hired experts in XYZ domains from the start to design robust systems that can scale to arbitrary sizes, but most orgs don't need to scale to arbitrary sizes so this is highly likely to be wasted effort.

replies(1): >>35055829 #
4. otterley ◴[] No.35055829[source]
While I largely agree with you, this isn’t one of those cases. If Fly wasn’t supposed to scale in due course to this size, it probably wouldn’t have been funded. If your business model is predicated on you scaling, yes, you should hire appropriately in anticipation of that.

Besides, I’m not even necessarily talking about hiring here - even consulting would have been sufficient to avoid this catastrophe.

replies(1): >>35073725 #
5. mixmastamyk ◴[] No.35073725{3}[source]
Yes, although it's rarely possible to know which bottlenecks will hurt the most up front. Unless you've done the same thing before, which is not the case with anyone pushing boundaries.

Basically this is an argument around so-called premature optimization. Good to have issues now while it is mostly enthusiasts that are the customers. Guessing that this bump will be forgotten in five years? And not like AWS et al don't have outages occasionally that they learn from.

replies(1): >>35110742 #
6. otterley ◴[] No.35110742{4}[source]
Consul has been around for close to 9 years now, and people have in fact tried to use Consul in the very same way Fly did, in many different businesses and industries, with similarly failing outcomes. HashiCorp knows this and almost certainly would have counseled against it if asked.
replies(1): >>35139705 #
7. mixmastamyk ◴[] No.35139705{5}[source]
Insert Donald Rumsfeld quote about un/known un/knowns.