
492 points storf45 | 4 comments
shermantanktop No.42160502
Every time a big company screws up, there are two highly informed sets of people who are guaranteed to be lurking, but rarely post, in a thread like this:

1) those directly involved with the incident, or employees of the same company. They have too much to lose by circumventing the PR machine.

2) people at similar companies who operate similar systems with similar scale and risks. Those people know how hard this is and aren’t likely to publicly flog someone doing their same job based on uninformed speculation. They know their own systems are Byzantine and don’t look like what random onlookers think it would look like.

So that leaves the rest, who offer insights based on how stuff works at a small scale, or better yet, pronouncements rooted in “first principles.”

karaterobot No.42160579
The only time I worked on a project that had a live television launch, it absolutely tipped over within like 2 minutes, and people on HN and Reddit were making fun of it. And I know how hard everyone worked, and how competent they were, so I sympathize with the people in these cases. While the internet was teeing off with easy jokes, engineers were swarming on a problem that was just not resolving, PMs were pacing up and down the hallway, people were getting yelled at by leadership, etc. It's like taking all the stress and complexity of a product launch and multiplying it by 100. And the thing I'm talking about was just a website, not even a live video stream.
1. adamredwoods No.42161710
Some breaks are just too difficult to predict. For example, I work in ecommerce and we had a page break because the content team pushed too many items into an array, which caused a back-end service to throw errors. Since we were the middle service, taking content from the CMS and making the request to the back-end, I'm not sure how we could have seen that issue coming in advance (and no one knew there was a limit).
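(For the sake of discussion, here's a rough sketch of the defensive version people always suggest in hindsight. Every name in it — CmsItem, BackendClient, MAX_ITEMS_PER_REQUEST — is made up rather than taken from our actual service: chunk the payload before forwarding it, and surface the failure instead of passing one oversized array straight through.)

    type CmsItem = { id: string; payload: unknown };

    interface BackendClient {
      submitItems(items: CmsItem[]): Promise<void>;
    }

    // Assumed ceiling; the real limit was undocumented, which was the whole problem.
    const MAX_ITEMS_PER_REQUEST = 100;

    function chunk<T>(items: T[], size: number): T[][] {
      const batches: T[][] = [];
      for (let i = 0; i < items.length; i += size) {
        batches.push(items.slice(i, i + size));
      }
      return batches;
    }

    async function forwardCmsContent(items: CmsItem[], backend: BackendClient): Promise<void> {
      // Split oversized payloads so a large CMS push degrades into several
      // requests rather than one opaque back-end failure.
      for (const batch of chunk(items, MAX_ITEMS_PER_REQUEST)) {
        try {
          await backend.submitItems(batch);
        } catch (err) {
          // Surface which batch failed; swallowing this is how pages quietly break.
          throw new Error(`back-end rejected a batch of ${batch.length} items: ${err}`);
        }
      }
    }

Of course, picking 100 is exactly the problem: you're guessing at a limit nobody documented.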
2. tuukkah No.42161948
I'm not saying it's easy, but start by assuming that there's a limit and that any request can throw errors? (Proceed accordingly.)
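Something like this, as a minimal sketch (the function names and the URL are illustrative, not anyone's real system): treat every downstream call as fallible and push it through a bounded retry with backoff.

    async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseDelayMs = 200): Promise<T> {
      let lastErr: unknown;
      for (let attempt = 0; attempt < attempts; attempt++) {
        try {
          return await fn();
        } catch (err) {
          lastErr = err;
          // Exponential backoff between attempts; rethrow after the last one.
          await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
        }
      }
      throw lastErr;
    }

    // Usage: the URL is made up for the example.
    async function fetchCatalog(): Promise<unknown> {
      return withRetry(async () => {
        const res = await fetch("https://backend.example.com/catalog");
        if (!res.ok) throw new Error(`back-end returned ${res.status}`);
        return res.json();
      });
    }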
3. steve_adams_86 No.42162423
> Some breaks are just too difficult to predict.

Absolutely. I think a great filter for developers is determining how well they understand this. Over-simplification of problems and certainty about one's ability to build reliable services at scale are a massive red flag to me.

I have to say some of the hardest challenges I’ve encountered were in e-commerce, too.

It’s a lot harder and more interesting than I think many people realize. I learned so much working on those projects.

In one case, the system relied on SQLite and god damn did things go sideways as the company grew its customer base. That was the fastest database migration project I’ve ever been on, haha.

I often think it could have worked today. SQLite has made huge leaps in the areas we were struggling. I’m not sure it would have been a forever solution (the company is massive now), but it would have bought us some much-needed time. It’s funny how that stuff changes. A lot of my takeaways about SQLite 10 years ago don’t apply quite the same anymore. I use it for things now that I never would have back then.
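For anyone curious what those leaps look like in practice, here's a guess — assuming the struggle was the usual single-writer contention, and using the better-sqlite3 package purely as an example, not what the company actually ran:

    import Database from "better-sqlite3";

    const db = new Database("shop.db");

    // WAL lets readers proceed while a write is in flight, which is where older
    // rollback-journal setups tended to fall over as traffic grew.
    db.pragma("journal_mode = WAL");

    // Wait briefly for a lock instead of failing immediately with SQLITE_BUSY.
    db.pragma("busy_timeout = 5000");

    // A common durability/throughput trade-off once WAL is on.
    db.pragma("synchronous = NORMAL");

WAL is the big one: readers stop blocking on the single writer.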

4. adamredwoods No.42162857
All requests are expected to throw errors at some point. How a developer handles them... well...

And for limit checking, how often do you write array limit handlers? Especially when the BE contract doesn't specify one? Additionally, it will need a regression unit test, because who knows when the next developer will remove that limit check.
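As a sketch of what that ends up looking like (the limit value and the names are invented, precisely because the contract doesn't specify one): the guard itself is trivial, and the test exists so the next developer can't quietly delete it.

    import { test } from "node:test";
    import assert from "node:assert/strict";

    const MAX_ITEMS = 100; // assumed, since the BE contract doesn't specify one

    export function validateItemCount(items: unknown[]): void {
      if (items.length > MAX_ITEMS) {
        throw new RangeError(`got ${items.length} items, limit is ${MAX_ITEMS}`);
      }
    }

    test("rejects payloads over the assumed back-end limit", () => {
      const oversized = new Array(MAX_ITEMS + 1).fill({});
      assert.throws(() => validateItemCount(oversized), RangeError);
    });

    test("accepts payloads at the limit", () => {
      const atLimit = new Array(MAX_ITEMS).fill({});
      assert.doesNotThrow(() => validateItemCount(atLimit));
    });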