IMO the ability of a NN to compensate for bugs and unfounded assumptions in the model isn't a Good Thing in the slightest. Building latent-space diagnostics that can determine whether a network is wasting time working around bugs sounds like a worthwhile research topic in itself (and probably already is).
The only scary thing is the hype: it will lead people to sloppily apply deep learning architectures to problems that don't need that level of expressive power, and because deep learning is challenging and not theoretically well understood, there will be few or no attempts to ensure safe operation or quality assurance of the implemented solution.
It's been a few years since I worked on any program using boost asio, but at least back then, if you straced it you'd find it constantly attempting to malloc hundreds of TB of RAM, failing harmlessly, and then continuing on with its life. (I bet that will be fun when someone tries to run it on a system that actually supports that much virtual address space.)
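Purely as illustration, here's a minimal, hypothetical sketch of that "ask for an absurd amount, fail harmlessly, carry on" pattern. It is not boost asio's actual allocation code; the size, the fallback, and the whole scenario are made up for the example (and assume a 64-bit build).

    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>

    int main() {
        // Ask the allocator for roughly 200 TB of virtual memory. On a typical
        // x86_64 box with a 48-bit (128 TiB) user address space, this fails and
        // malloc just returns nullptr instead of crashing anything.
        const std::size_t huge = 200ull * 1024 * 1024 * 1024 * 1024;
        void* p = std::malloc(huge);
        if (p == nullptr) {
            // The failure is harmless: the program notes it and carries on,
            // which is what shows up in strace as a failed mmap returning ENOMEM.
            std::puts("huge allocation failed, falling back to a small buffer");
            p = std::malloc(4096);
        }
        std::free(p);
        return 0;
    }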
Similarly, anything with any kind of feedback correction has this property. PID controllers, codecs that code residuals: you can get things horribly wrong and the later steps will paper it over.
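As a toy, hypothetical sketch of that effect: a PI controller driving a simulated first-order plant whose gain is twice what the tuning assumed. The numbers (plant gain, kp, ki, time step) are invented for the example; the point is that the integral term silently absorbs the 2x modeling error and the output still settles on the setpoint, so the closed loop gives you no hint that the model is wrong.

    #include <cstdio>

    int main() {
        // Hypothetical first-order plant: y' = -y + k*u, with true gain k = 2.0.
        // The controller was tuned assuming k = 1.0, i.e. a 2x modeling error.
        const double true_gain = 2.0;
        const double dt = 0.01;
        const double setpoint = 1.0;

        // PI gains chosen for the (wrong) assumed plant.
        const double kp = 1.0, ki = 0.5;

        double y = 0.0, integral = 0.0;
        for (int step = 0; step <= 2000; ++step) {
            double error = setpoint - y;
            integral += error * dt;
            double u = kp * error + ki * integral;   // feedback papers over the bad model
            y += dt * (-y + true_gain * u);          // simulate the real plant (Euler step)
            if (step % 500 == 0)
                std::printf("t=%5.2fs  y=%.4f\n", step * dt, y);
        }
        // y converges to ~1.0 anyway: the integral term has quietly supplied the
        // correction for the 2x error, so the bug never shows up in the output.
        return 0;
    }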
Taking a step back, you could even say that common software development practices (a kind of meta-program) have the same issue: a drunk squirrel sends you a patch full of errors, and your test suite flags some of them, which you fix. Then you ship all the bugs it didn't catch, because the test suite caused you to fix some issues but didn't change the fact that you were accepting code from a dubious source.
So I would say the ML world is only special in that its systems consist almost entirely of self-correcting mechanisms, and that inconsistent performance is broadly expected to a vastly greater degree, so when errors leak through you still may not react. If a calculator app told you that 2+2=5 you'd immediately know something was actually broken, while if some LLM does it, it could just be an expected limitation (or even just sampling bad luck).