449 points lemper | 1 comments
benrutter ◴[] No.45036836[source]
> software quality doesn't appear because you have good developers. It's the end result of a process, and that process informs both your software development practices, but also your testing. Your management. Even your sales and servicing.

If you only take one thing away from this article, it should be this one! The Therac-25 incident is a horrifying and important part of software history. It's really easy to think type systems, unit testing and defensive coding can solve all software problems. They definitely can help a lot, but the real failure in the story of the Therac-25, from my understanding, is that it took far too long for incidents to be reported, investigated and fixed.

There was a great Cautionary Tales podcast about the device recently[0]. One thing mentioned was that, even aside from the catastrophic accidents, Therac-25 machines were routinely seen by users to show unexplained errors, but these issues never made it to the desk of someone who might fix them.

[0] https://timharford.com/2025/07/cautionary-tales-captain-kirk...

replies(13): >>45036898 #>>45037054 #>>45037090 #>>45037874 #>>45038109 #>>45038360 #>>45038467 #>>45038827 #>>45043421 #>>45044645 #>>45046867 #>>45046969 #>>45047517 #
WalterBright ◴[] No.45044645[source]
I'm going to disagree.

I have years of experience at Boeing designing aircraft parts. The guiding principle is that no single failure should cause an accident.

The way to accomplish this is not "write quality software", nor is it "test the software thoroughly". The idea is "assume the software does the worst possible thing. Then make sure that there's an independent system that will prevent that worst case."

For the Therac-25, that means a detector of the amount of radiation being generated, which will cut it off if it exceeds a safe value. I'd also add that the radiation generator be physically incapable of generating excessive radiation.
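
A minimal sketch of that idea, not anything from the actual Therac-25 design: a monitor that knows nothing about the control software, only sums its own dose readings, and latches off once a limit is crossed. The class name, the `limit_rads` parameter and the numbers in the usage note are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DoseMonitor:
    """Independent cutoff: trusts only its own detector readings."""

    limit_rads: float               # hypothetical safe limit, set per treatment
    accumulated_rads: float = 0.0   # running total of measured dose
    tripped: bool = False           # latched: once True, stays True

    def record(self, delta_rads: float) -> bool:
        """Add one detector reading; return True if the beam may stay on."""
        if self.tripped:
            return False
        self.accumulated_rads += delta_rads
        if self.accumulated_rads > self.limit_rads:
            # Latch the trip so the beam cannot be re-enabled by later calls.
            self.tripped = True
        return not self.tripped
```

With an assumed limit of 200, `DoseMonitor(limit_rads=200.0)` allows a first reading of 150 and refuses everything after a second reading of 100. The point of the latch is that the main software cannot talk the monitor back into a "safe" state.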

replies(9): >>45045090 #>>45045473 #>>45046078 #>>45046192 #>>45047920 #>>45048437 #>>45048717 #>>45049878 #>>45049910 #
vjvjvjvjghv ◴[] No.45045090[source]
In general I agree, but there is a bit more complexity. I work in medical devices and there are plenty of situations where a certain output is OK in some circumstances but deadly in others. That makes a stopgap a little more tricky.

I agree with the previous poster that the feedback from the field is lacking a lot. A lot of doctors don’t report problems back because they are used to bad interfaces. And then the feedback gets filtered through several layers of sales reps and product management. So a lot of info gets lost and fixes that could be simple won’t get done.

In general when you work in medical you are so overwhelmed by documentation and regulation that there isn’t much time left to do proper engineering. The FDA mostly looks at documentation done right and less at product done right.

replies(2): >>45045452 #>>45045655 #
darepublic ◴[] No.45045655[source]
16,000 to 25,000 rads, right? Not safe under any circumstance?
replies(1): >>45046950 #
sgerenser ◴[] No.45046950[source]
Completely safe as long as the block of metal was in place. So you couldn’t just prevent the machine from putting out that much energy, you had to prevent it from doing that without the block in place.
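
As a rough sketch of why a flat output cap is not enough, the check has to depend on two things at once: the beam mode and whether the block is in position. The names below (`BeamMode`, `beam_permitted`) are made up for illustration, and treating the low-energy mode as always permitted is a simplifying assumption.

```python
from enum import Enum


class BeamMode(Enum):
    LOW_ENERGY = "low"
    HIGH_ENERGY = "high"


def beam_permitted(mode: BeamMode, block_in_place: bool) -> bool:
    """Conditional interlock: high output is allowed only with the block in place."""
    if mode is BeamMode.HIGH_ENERGY:
        return block_in_place
    # Assumption for this sketch: low-energy output needs no block.
    return True
```

Here `beam_permitted(BeamMode.HIGH_ENERGY, block_in_place=False)` is the one combination that gets refused. In a real machine `block_in_place` would have to come from a position sensor that is independent of the software commanding the move.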
replies(1): >>45048706 #
Gud ◴[] No.45048706[source]
So there should have been an interlocking system.
replies(1): >>45049829 #
1. tech2 ◴[] No.45049829[source]
The earlier model that the 25 replaced was all mechanically interlocked. The belief was that software provided that same level of assurance. They performed manual testing, but what they weren't able to do was reach the level of speed and fluency with the system needed to trigger the failure modes that caused the issues. Lower hardware costs equal higher profit...