
669 points danso | 3 comments
_bxg1 ◴[] No.23260967[source]
This is the latest in a string of incidents where critical software systems, facing new pressure due to the pandemic, are catastrophically failing their users. I think what's happened in the past is that most public-facing software systems either a) were not really critical (because people had the alternative of doing things in-person), or b) (as in the case of all the ancient COBOL systems underpinning the US gov) had been made reliable over the years through sheer brute force as opposed to principled engineering. But in the latter case, as we saw with New Jersey's unemployment system, that "reliability" was fragile and contingent on the current state of affairs, and had no hope of withstanding a sudden shift in usage patterns.

Now we have various organizations - governmental and otherwise - hastily setting up online versions of essential services and it seems like every single one of them breaks on arrival.

We need some sort of standard for software engineering quality. I don't think this is an academic question anymore. Real people's lives are being impacted every day now by shoddy software, and with the current crisis they often have no alternative. Software that you or I could probably have executed better, but that the people who were hired to do it either a) couldn't, or b) didn't bother. It's nearly impossible for non-technical decision makers in these orgs to evaluate the quality of the systems they've hired people to build. We need quality assurance at an institutional level.

If not governmental, maybe an organization around this could be made by developers themselves. Not the "certified for $technology" certifications we have now, but a certification of fundamental software engineering skills and principles. A certification you can lose if you do something colossally irresponsible. At the end of the day, this dilution of quality is having a negative impact on our job field, so it concerns all of us. It leads to technical debt, micro-management, excessively rigid deadlines and requirements, which we all have to deal with. All of these are either symptoms of or coping mechanisms for management's inability to evaluate engineering quality.

wmf ◴[] No.23261414[source]
18F released a pretty good guide about these topics but I can't shake the feeling that many organizations aren't willing to learn these lessons. https://github.com/18F/technology-budgeting/blob/master/hand...
_bxg1 ◴[] No.23261660[source]
Guidelines are well and good, but they aren't really helpful when the people who care about them can't enforce them and vice-versa. What we need is accountability when it comes to the engineers who work on systems that are critical to large swaths of society.
cybwraith ◴[] No.23261986[source]
You think this was an engineering decision? These failing systems were probably contracted to a politically connected company that subcontracted to lowest bidder. Not only that, but the fact that these systems were usually written in COBOL means they were likely created a very long time ago and minimally updated as laws/requirements changed to stay compliant, but that's it.

That's not the fault of the engineer(s). A surge in traffic in the 80s, or whenever the system was initially created, may very well have been handled as designed, and its normal traffic in modern pre-COVID times was the equivalent of a constant "surge" relative to what it was initially designed for. It was already on life support and needed a rewrite 10 years ago. Some software engineering certification/quality board wouldn't account for 30-year-old systems design and population. Those are political and budget/prioritization issues. It would be the near equivalent of a bridge that was built, then ignored for 50 years, collapsing when a modern 18-wheeler drives over it.

All the new systems getting spun up ASAP are just quick hacks to try and get some way of addressing the problem. They are bound to be full of failures by the nature of the rapid development cycle and current crisis. In a situation like this, a quality board like the one proposed would be granting exceptions left and right because, theoretically, something is better than nothing.

1. _bxg1 ◴[] No.23262052[source]
> These failing systems were probably contracted to a politically connected company that subcontracted to lowest bidder.

And what if that government body established a policy that all contractors had to be certified engineers who hadn't lost their certification due to past negligence? Suddenly there's a much higher floor for "lowest bidder".

2. cybwraith ◴[] No.23263554[source]
If software engineers had the legal authority that a current professional engineer certification provides to tell the project manager 'no', that might work. It's still less about engineering capability and more about leverage and protection against retaliation for pushing back on bad ideas/timelines. Even in traditional engineering disciplines, not everyone working on the project is a certified professional engineer; in fact, they are usually the minority.
3. astura ◴[] No.23268511[source]
Good engineering can't undo bad management/process. Project management is what we really should work on to improve software quality.