
449 points lemper | 15 comments
benrutter ◴[] No.45036836[source]
> software quality doesn't appear because you have good developers. It's the end result of a process, and that process informs both your software development practices, but also your testing. Your management. Even your sales and servicing.

If you only take one thing away from this article, it should be this one! The Therac-25 incident is a horrifying and important part of software history. It's really easy to think type systems, unit testing and defensive coding can solve all software problems. They definitely can help a lot, but the real failure in the story of the Therac-25, from my understanding, is that it took far too long for incidents to be reported, investigated and fixed.

There was a great Cautionary Tales podcast about the device recently[0]. One thing mentioned was that, even aside from the catastrophic accidents, Therac-25 machines were routinely seen by users to show unexplained errors, but these issues never made it to the desk of someone who might fix them.

[0] https://timharford.com/2025/07/cautionary-tales-captain-kirk...

replies(13): >>45036898 #>>45037054 #>>45037090 #>>45037874 #>>45038109 #>>45038360 #>>45038467 #>>45038827 #>>45043421 #>>45044645 #>>45046867 #>>45046969 #>>45047517 #
1. 0xDEAFBEAD ◴[] No.45038360[source]
Honestly I wish instead of the Therac-25, we were discussing a system which made use of unit testing and defensive coding, yet still failed. That would be more educational. It's too easy to look at the Therac-25 and think "I would never write a mess like that".
replies(5): >>45038635 #>>45038899 #>>45042566 #>>45044920 #>>45046431 #
2. wat10000 ◴[] No.45038635[source]
The lesson is not to write a mess like that. It might seem obvious, but it has to be learned.
replies(1): >>45044742 #
3. roeles ◴[] No.45038899[source]
One instance that crosses my mind often is the Airbus A320 incident at Hamburg in 2008. Everything was done right there, but the requirements were wrong.

Despite all the procedures and tests, the software still managed to endanger the lives of the passengers.

replies(3): >>45039042 #>>45045268 #>>45052435 #
4. 0xDEAFBEAD ◴[] No.45039042[source]
Interesting, do you happen to have a case study?
replies(2): >>45039341 #>>45039438 #
5. ◴[] No.45039341{3}[source]
6. shmeeed ◴[] No.45039438{3}[source]
https://skybrary.aero/sites/default/files/bookshelf/1258.pdf
7. hinkley ◴[] No.45042566[source]
I bring up Knight Capital every time people start acting like feature toggles will solve every problem we have with feature rollout.

KC lost over $400 million in less than an hour due to an old feature toggle and a problem with their deployment process.

8. kccqzy ◴[] No.45044742[source]
Software engineering has advanced so much in the past few decades that the kind of code considered a "mess" has expanded.
replies(1): >>45047468 #
9. jopsen ◴[] No.45044920[source]
I'd agree, it's super easy to think such errors wouldn't have happened had they just used a fairly safe language and sane architecture. Or unit tests, race detectors, etc.

I suspect that few organizations that do all that have a process/culture of ignoring bugs in the wild -- and those that do have such complicated domains that explaining the error is hard.

Software best practices today would probably also involve sending metrics, logs, error reports, etc.

That said, it's still extremely easy to embrace a culture where unexplainable errors are ignored. Especially in a cloud environment.

10. I_dream_of_Geni ◴[] No.45045268[source]
Speaking of Airbus, they 'lost' 3-4 different aircraft (from 1988 to 2015) which crashed during development or, spectacularly, during their first airshow. It never slowed down their customers at ALL, and to this day, Boeing has never lost one new commercial airliner in those same circumstances. Yet Boeing gets all the hate. smh
11. jldugger ◴[] No.45046431[source]
Perhaps this is why the cover of my software correctness book in undergrad used a series of stills from the Ariane 5 disaster[1].

[1]: https://en.wikipedia.org/wiki/Ariane_5#Notable_launches

12. wat10000 ◴[] No.45047468{3}[source]
We’ve invented entirely new ways to write bad code.
replies(1): >>45050181 #
13. 0xDEAFBEAD ◴[] No.45050181{4}[source]
Examples?
replies(1): >>45052610 #
14. Izkata ◴[] No.45052435[source]
The Boeing 737 MAX had an additional safety feature that was causing crashes due to bad input from the sensors, and pilots didn't know about it, so they couldn't override it. This was 2018 and 2019. After the first crash, the manuals and training were updated to explain what was going on and how to override it.

https://en.wikipedia.org/wiki/Maneuvering_Characteristics_Au...

15. wat10000 ◴[] No.45052610{5}[source]
Dependency managers, Agile, Electron, AI, Java.