
669 points danso | 1 comments
_bxg1 No.23260967
This is the latest in a string of incidents where critical software systems, facing new pressure due to the pandemic, are catastrophically failing their users. I think what's happened in the past is that most public-facing software systems either a) were not really critical (because people had the alternative of doing things in-person), or b) (as in the case of all the ancient COBOL systems underpinning the US gov) had been made reliable over the years through sheer brute force as opposed to principled engineering. But in the latter case, as we saw with New Jersey's unemployment system, that "reliability" was fragile and contingent on the current state of affairs, and had no hope of withstanding a sudden shift in usage patterns.

Now we have various organizations - governmental and otherwise - hastily setting up online versions of essential services and it seems like every single one of them breaks on arrival.

We need some sort of standard for software engineering quality. I don't think this is an academic question anymore. Real people's lives are being impacted every day now by shoddy software, and with the current crisis they often have no alternative. This is software that you or I could probably have executed better, but that the people who were hired to build it either a) couldn't, or b) didn't bother to. It's nearly impossible for non-technical decision makers in these orgs to evaluate the quality of the systems they've hired people to build. We need quality assurance at an institutional level.

If not governmental, maybe an organization around this could be made by developers themselves. Not the "certified for $technology" certifications we have now, but a certification of fundamental software engineering skills and principles. A certification you can lose if you do something colossally irresponsible. At the end of the day, this dilution of quality is having a negative impact on our job field, so it concerns all of us. It leads to technical debt, micro-management, excessively rigid deadlines and requirements, which we all have to deal with. All of these are either symptoms of or coping mechanisms for management's inability to evaluate engineering quality.

1. majormajor No.23261666
> But in the latter case, as we saw with New Jersey's unemployment system, that "reliability" was fragile and contingent on the current state of affairs, and had no hope of withstanding a sudden shift in usage patterns.

"Reliable" and "Can survive a sudden shift in usage patterns" are extremely different things.

I think you have the causality backward. Engineering is about trade-offs. No quality guild will be able to wave those away. As long as the primary pressure is "get something that is functional enough at minimum time and cost" you're gonna have this.

(Software is particularly complicated because engineers, not just managers, have a poor understanding of system quality and of the quality of each other's contributions. There's a combination of "it's not that complicated" complexity-blindness toward business requirements, and trade-offs that have to be traced through deep call stacks and across networks. We build things like Chaos Monkey - to prove resilience by seeing how hard it is to break the thing - because we don't have cost-effective techniques for actually understanding the system well enough short of operating it.)
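
To make the "see how hard it is to break" idea concrete, here is a toy sketch of a chaos-style experiment. It is not how Chaos Monkey itself is implemented; `Replica`, `Service`, `chaos_experiment`, and `max_tolerated_failures` are hypothetical names invented for illustration, and the failover model (try each replica in order) is an assumption.

```python
import random


class Replica:
    """Hypothetical backend instance that can be switched off."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class Service:
    """Hypothetical service that fails over across its replicas in order."""

    def __init__(self, replica_count):
        self.replicas = [Replica(f"replica-{i}") for i in range(replica_count)]

    def call(self, request):
        for replica in self.replicas:
            try:
                return replica.handle(request)
            except ConnectionError:
                continue  # try the next replica
        raise RuntimeError("no healthy replica left")


def chaos_experiment(service, kills, rng):
    """Kill `kills` randomly chosen replicas, then check a request still succeeds."""
    for victim in rng.sample(service.replicas, kills):
        victim.alive = False
    try:
        service.call("ping")
        return True
    except RuntimeError:
        return False


def max_tolerated_failures(replica_count, seed=0):
    """Raise the number of killed replicas until the service stops answering."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    tolerated = 0
    for kills in range(1, replica_count + 1):
        service = Service(replica_count)  # fresh system for each experiment
        if not chaos_experiment(service, kills, rng):
            break
        tolerated = kills
    return tolerated
```

The point of the sketch is that the answer comes from operating the system under injected failures rather than from reading its design, which is the trade-off described above.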