> I don't think "AI safety" is the right abstraction because it came from the idea that AI would start off as an imaginary agent living in a computer that we'd teach stuff to. Whereas what we actually have is a giant pretrained blob that (unreliably) emits text when you run other text through it.
I disagree; that's simply the behaviour of one of the best consumer-facing AIs, the one getting all the air-time at the moment. (Weirdly, loads of people even here talk about AI as if it's just LLMs, even though diffusion-based image generators are also making significant progress and being targeted with lawsuits.)
AI is automation: the point is to do stuff we don't want to do for whatever reason (including expense), except it does that stuff a bit wrong. People have already died from automation that was carefully engineered and still had mistakes in it; machine learning is all about letting a system engineer itself. Even if you end up making a checkpoint where it's "good enough", shipping that, and telling people they don't need to train it any more… they often will keep training it anyway, because that's not actually hard.
We've also got plenty of agentic AI (though since that's a buzzword, bleh, there are lots of scammers there too), independently of the fact that it's very easy to use even an LLM (which is absolutely not designed or intended for this) as a general agent, just by putting it in a loop and telling it what it's supposed to be agentic about.
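Just to make the "loop" point concrete, here's a minimal sketch of that pattern; `call_llm` and the toy tools are hypothetical stand-ins, not any particular vendor's API:

```python
# Sketch of "LLM as agent" by brute force: a loop, a goal, and some tools.
# `call_llm` is a placeholder for whatever chat-completion API you're using.
import json

def call_llm(messages):
    """Placeholder: send the conversation to some chat model, return its reply text."""
    raise NotImplementedError

TOOLS = {
    "search": lambda query: f"(pretend search results for {query!r})",
}

def run_agent(goal, max_steps=10):
    messages = [
        {"role": "system", "content":
            'You are an agent. Reply only with JSON like '
            '{"tool": "search", "argument": "..."} or {"tool": "done", "argument": "<answer>"}.'},
        {"role": "user", "content": goal},
    ]
    for _ in range(max_steps):
        reply = call_llm(messages)
        messages.append({"role": "assistant", "content": reply})
        try:
            action = json.loads(reply)
            tool, arg = action["tool"], action["argument"]
        except (json.JSONDecodeError, KeyError, TypeError):
            messages.append({"role": "user", "content": "That wasn't valid JSON; try again."})
            continue
        if tool == "done":
            return arg                      # the model decided it's finished
        observation = TOOLS.get(tool, lambda a: "unknown tool")(arg)
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return None                             # gave up after max_steps
```

That's the whole mechanism: the "agency" lives in the loop and the prompt, not in anything the model was designed for.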
Even with constrained decoding, so far as I can tell the promises are mostly advertising, while the reality is that these things are only "pretty good": https://community.openai.com/t/how-to-get-100-valid-json-ans...
(But of course, this is a fast-moving area, so I may just be out of date even though that was only from a few months ago).
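For what it's worth, the idea behind constrained decoding itself is sound: at every generation step you mask out any token that would break the target grammar, so the output can't help but parse. Here's a toy illustration of the mechanism with a fake model and a "grammar" that only allows digits; real JSON-schema guidance tracks a proper grammar state machine, but the masking step is the same idea:

```python
# Toy constrained decoding: mask tokens that would violate the constraint,
# then sample from whatever probability mass is left.
import numpy as np

VOCAB = list("0123456789abcdefghij")   # pretend token vocabulary
ALLOWED = set("0123456789")            # the "grammar": digits only

def fake_model_logits(prefix):
    """Placeholder for a real model; returns arbitrary logits over VOCAB."""
    return np.random.default_rng(len(prefix)).normal(size=len(VOCAB))

def constrained_sample(prefix):
    logits = fake_model_logits(prefix)
    for i, tok in enumerate(VOCAB):
        if tok not in ALLOWED:
            logits[i] = -np.inf            # forbid grammar-breaking tokens
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return np.random.default_rng().choice(VOCAB, p=probs)

out = ""
for _ in range(8):
    out += constrained_sample(out)
print(out)   # always digits, whatever the model "wanted" to say
```

The guarantee is only syntactic, though: the JSON will parse, but nothing stops the contents from being nonsense, and per that thread even the syntactic side hasn't always matched the marketing.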
However, the "it's only pretty good" becomes "this isn't even possible" in certain domains; this is why, for example, ChatGPT has a disclaimer on the front about not trusting it — there's no way to know, in general, if it's just plain wrong. Which is fine when writing a newspaper column because the Gell-Mann amnesia effect says it was already like that… but not when it's being tasked with anything critical.
Hopefully nobody will use ChatGPT to plan an economy, but the point of automation is to do things for us, so some future AI will almost certainly get used that way. Just as a toy model (because it's late here and I'm tired), imagine if that future AI decides to drop everything and invest only in rice and tulips 0.001% of the time. After all, if it's just as smart as a human, and humans made that mistake…
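Even at that failure rate the arithmetic gets ugly fast once you automate at scale; a back-of-the-envelope check (the decision volume is a number I'm making up purely for illustration):

```python
# How often does a 0.001% failure rate bite, at automation scale?
failure_rate = 0.001 / 100         # 0.001% as a probability (1e-5)
decisions_per_day = 1_000_000      # made-up volume for an economy-scale planner
print(failure_rate * decisions_per_day)   # 10.0 -> ten "rice and tulips" moments per day
```

A blunder a human planner might make once in a career becomes a routine event once the same judgement gets exercised a million times a day.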
But on the "what about humans" perspective, you can also look at the environment. I'd say there's no evil moustache twirling villains who like polluting the world, but of course there are genuinely people who do that "to own the libs"; but these are not the main source of pollution in the world, mostly it's people making decisions that seem sensible to them and yet which collectively damage the commons. Plenty of reason to expect an AI to do something that "seems sensible" to its owner, which damages the commons, even if the human is paying attention, which they're probably not doing for the same reason M3 shareholders probably weren't looking very closely to what M3 was doing — "these people are maximising my dividend payments… why is my blood full of microplastics?"