
492 points storf45 | 2 comments
shermantanktop ◴[] No.42160502[source]
Every time a big company screws up, there are two highly informed sets of people who are guaranteed to be lurking, but rarely post, in a thread like this:

1) those directly involved with the incident, or employees of the same company. They have too much to lose by circumventing the PR machine.

2) people at similar companies who operate similar systems with similar scale and risks. Those people know how hard this is and aren’t likely to publicly flog someone doing the same job based on uninformed speculation. They know their own systems are Byzantine and don’t look like what random onlookers think they would look like.

So that leaves the rest, who offer insights based on how stuff works at a small scale, or better yet, pronouncements rooted in “first principles.”

replies(15): >>42160568 #>>42160576 #>>42160579 #>>42160888 #>>42160913 #>>42161148 #>>42161164 #>>42161399 #>>42161529 #>>42161703 #>>42161724 #>>42161889 #>>42165352 #>>42166894 #>>42167814 #
karaterobot ◴[] No.42160579[source]
The only time I worked on a project that had a live television launch, it absolutely tipped over within like 2 minutes, and people on HN and Reddit were making fun of it. And I know how hard everyone worked, and how competent they were, so I sympathize with the people in these cases. While the internet was teeing off with easy jokes, engineers were swarming on a problem that was just not resolving, PMs were pacing up and down the hallway, people were getting yelled at by leadership, etc. It's like taking all the stress and complexity of a product launch and multiplying it by 100. And the thing I'm talking about was just a website, not even a live video stream.
replies(6): >>42160663 #>>42160778 #>>42161112 #>>42161381 #>>42161710 #>>42189210 #
adamredwoods ◴[] No.42161710[source]
Some breaks are just too difficult to predict. For example, I work in ecommerce and we had a page break because the content team pushed too many items into an array, which caused a back-end service to throw errors. Because we were the middle service, taking from the CMS and making the request to the back end, I'm not sure how we could have seen that issue coming in advance (and no one knew there was a limit).
replies(2): >>42161948 #>>42162423 #
1. tuukkah ◴[] No.42161948[source]
I'm not saying it's easy, but start by assuming that there's a limit and that any request can throw errors? (Proceed accordingly.)
replies(1): >>42162857 #
2. adamredwoods ◴[] No.42162857[source]
All requests expect errors. How a developer handles them... well...

And for limit checking, how often do you write array limit handlers? And if the BE contract doesn't specify one? Additionally, it will need a regression unit test, because who knows when the next developer will remove that limit check.
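
For illustration only, here is a minimal Python sketch of what "assume there's a limit and that any request can throw" plus a regression test might look like in a middle service. Everything named here is hypothetical: the thread never states the real limit, so `ASSUMED_MAX_ITEMS` is a guess, and `forward_items`, `chunked`, and the `send` callback stand in for whatever the actual CMS-to-back-end code does.

```python
from typing import Callable, Iterator, List, Sequence

# Hypothetical limit: the back-end contract in the thread did not state one,
# so this value is an assumption chosen purely for illustration.
ASSUMED_MAX_ITEMS = 50


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def forward_items(
    items: Sequence[str],
    send: Callable[[Sequence[str]], None],
    max_items: int = ASSUMED_MAX_ITEMS,
) -> List[str]:
    """Send items to the back end in bounded batches.

    `send` is a hypothetical callback that performs one back-end request.
    Returns the items from batches that failed, so the caller can decide
    whether to retry, log, or degrade the page.
    """
    failed: List[str] = []
    for batch in chunked(items, max_items):
        try:
            send(batch)
        except Exception:
            # Any request can throw; record the failure and keep going
            # so one bad batch does not break the whole page.
            failed.extend(batch)
    return failed


def test_batches_never_exceed_limit() -> None:
    """Regression test: fails if someone later removes the limit check."""
    sizes: List[int] = []
    forward_items(
        [str(i) for i in range(120)],
        lambda batch: sizes.append(len(batch)),
        max_items=50,
    )
    assert sizes == [50, 50, 20]
```

Batching is only one possible response to an unknown limit. Truncating the array or rejecting the CMS payload with a clear error are equally defensible, depending on what the page can tolerate. The test is the part that addresses the "next developer" concern: it pins the batch sizes, so removing the check makes the suite fail.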