
371 points | ulrischa | 1 comment
t_mann No.43235506
Hallucinations themselves are not even the greatest risk posed by LLMs. A much greater risk, in simple terms of probability times severity, is that chatbots can talk humans into harming themselves or others; both have already happened, btw [0,1]. I'm still not sure I'd call that the greatest overall risk, but my ideas for what could be even more dangerous I don't even want to share here.

[0] https://www.qut.edu.au/news/realfocus/deaths-linked-to-chatb...

[1] https://www.theguardian.com/uk-news/2023/jul/06/ai-chatbot-e...

replies(4): >>43235623 #>>43236225 #>>43238379 #>>43238746 #
hexaga No.43236225
More generally - AI that is good at convincing people is very powerful, and powerful things are dangerous.

I'm increasingly coming around to the notion that AI tooling should have safety features aimed at not directly exposing humans to ever-increasing levels of 'convincingness' in generated output. Something like a weaker model used as a buffer, roughly as sketched below.
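To make that a bit more concrete, here's a rough sketch of what such a buffer could look like in tooling. This is purely hypothetical; buffered_generate, strong_model, and weak_model are stand-in names I'm inventing for illustration, not any real API:

    # Hypothetical sketch: the human never reads the strong model's raw output.
    # A weaker "buffer" model restates the content in plainer, less persuasive terms.
    from typing import Callable

    def buffered_generate(
        prompt: str,
        strong_model: Callable[[str], str],  # powerful, potentially very persuasive
        weak_model: Callable[[str], str],    # weaker model acting as the buffer
    ) -> str:
        raw = strong_model(prompt)
        summary_prompt = (
            "Restate the factual content of the following text plainly, "
            "dropping any rhetorical or emotional framing:\n\n" + raw
        )
        return weak_model(summary_prompt)

    # Trivial stand-ins, just to show the call shape:
    if __name__ == "__main__":
        strong = lambda p: f"[very persuasive answer to: {p}]"
        weak = lambda p: f"[plain restatement of: {p[:60]}...]"
        print(buffered_generate("Should I buy this thing?", strong, weak))

The point isn't the specific prompt, it's the indirection: the human only ever sees the weaker model's rendering of the output.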

Projecting out to 5-10 years: what happens when LLMs are still producing hallucinatory semi-sense, but merely comprehending it makes the machine temporarily own you? A bit like getting hair caught in an angle grinder, that.

Like most safety regulations, it'll take blood for the inking. Exposing mass numbers of people to these models strikes me as wildly negligent if we expect continued improvement along this axis.

replies(2): >>43236968 #>>43238275 #
southernplaces7 No.43238275
>Projecting out to 5-10 years: what happens when LLMs are still producing hallucinatory semi-sense, but merely comprehending it makes the machine temporarily own you? A bit like getting hair caught in an angle grinder, that.

Seriously? Do you suppose that it will pull this trick off through some sort of hypnotizing magic perhaps? I have a hard time imagining any sort of overly verbose, clause and condition-ridden chatbot convincing anyone of sound mind to seriously harm themselves or do some egregiously stupid/violent thing.

The kinds of people who would be swayed by such "dangers" are likely to be mentally unstable or suggestible enough that they could in any case be convinced by any number of human beings anyhow.

Aside from demonstrating the persistent AI woo that permeates many comments on this site, the logic above reminds me of the harping nonsense, in years past, around the supposed dangers of video games or certain violent movies "making kids do bad things". The prohibitionist nanny tendencies behind such fears are more dangerous than any silly chatbot AI.

replies(2): >>43241236 #>>43242612 #
hexaga No.43241236{3}
If you believe current models exist at the limit of possible persuasiveness, there obviously isn't any cause for concern.

For various reasons, I don't believe that, which is why my argument is predicated on them improving over time. Obviously current models aren't overly hazardous in the sense I posit - it's a concern for future models that are stronger, or explicitly trained to be more engaging and/or convincing.

The load-bearing element is the answer to "are models becoming more convincing over time?", not "are they very convincing now?"

> [..] I have a hard time imagining any sort of overly verbose, clause and condition-ridden chatbot [..]

Then you're not engaging with the premise at all, and are attacking a point I haven't made. The tautological assurance that non-convincing AI is not convincing is not relevant to a concern predicated on the eventual existence of highly convincing AI: that sufficiently convincing AI is hazardous due to induced loss of control, and that as capabilities increase the loss of control becomes more difficult to resist.

replies(2): >>43247874 #>>43253374 #
southernplaces7 No.43253374{4}
You completely misunderstand my argument by nitpicking a specific sarcastic description I made of the current communicative state of most AI chat systems.

In reality, even if they improve to be completely indistinguishable from the sharpest and most persuasive of human minds our society has ever known, I'd still make exactly the same arguments as above. I'd make them for the same reason I'd argue that no regulatory body or self-appointed filter of moral arbiters should be able to restrict the specific arguments and forms of expression currently available to persuasive human beings, or to people of any kind.

Just as we shouldn't prohibit literature, film, blog posts, opinion pieces in the media, or any other means by which people communicate their opinions and information to others, on the argument that such opinions might be "harmful", I wouldn't regulate AI sources of information and chatbots.

One can make an easy case for regulating and punishing the acts people try to perform based on information they obtain from AI, in terms of the measurable harm these acts would cause to others, but banning a source of information based on a hypothetical, ambiguous danger of its potential for corrupting minds is little different from the idiocy of restricting free expression because it might morally corrupt supposedly fragile minds.

replies(1): >>43260669 #
hexaga No.43260669{5}
If your argument must rest on a caricature (weak persuasiveness attempting to persuade someone of something extremely disadvantageous) to show how implausible hazardous persuasion is, something is wrong. Nevertheless:

First, you argued the implausibility of strong persuasion. Your rhetoric was effectively "look how silly this whole notion of a machine persuading someone of something is, because how dumb would you need to be for this silly thing to convince you to do this very bad thing?"

That is then used to fuel an argument that I am merely propagating AI woo, consumed by magical thinking, and clearly am just afraid of something equivalent to violent video games and/or movies. The level of inferential contortion is difficult to wrap my head around.

Now, you seem to be arguing along an entirely different track: that AI models should have the inalienable right to self-expression, for the same reason humans should have that right (I find it deeply ironic that this is the direction you'd choose after accusations of AI woo, but I digress). Or, equivalently, that humans should have the inalienable right to access and use such models.

This is no longer an argument about the plausibility of AI being persuasive, or that persuasion can be hazardous, but that we should permit it in spite of any of that because freedom of expression is generally a good thing.

(This is strange to me, because I never argued that the models should be banned or prohibited, merely that tooling should try to avoid direct human-to-model-output contact, as such contact (when model output is sufficiently persuasive) is hazardous. Much like how angle grinders or power tools are generally not banned, but have safety features preventing egregious bodily harms.)

> In reality, even if they improve to be completely indistinguishable from the sharpest and most persuasive of human minds our society has ever known, I'd still make exactly the same arguments as above.

While my true concern is systems of higher persuasiveness than humans have ever been exposed to, let's see:

> I have a hard time imagining [the most persuasive of human minds our society has ever known] convincing anyone of sound mind to seriously harm themselves or do some egregiously stupid/violent thing.

This is immediately falsified by the myriad examples of exactly this occurring, via a much lower bar than 'most persuasive person ever'. Hmm. Small wonder it takes a sarcastic caricature for this not to immediately read as a nonsense argument.

Considering my entire position is simply that exposure to persuasion can be hazardous, I don't see what you're trying to prove now. It's certainly not in opposition to something I've said.

Since you seem to have shifted perspective from the mechanistic to the moral, and to have conceded that persuasion carries nontrivial hazard (even if we should entertain that hazard for the sake of our freedoms), are we now determining how much risk is acceptable to preserve those freedoms? I'm not interested in having that discussion, as I don't purport to restrict said freedoms in any case.

Going back to the power tool analogy, you are of course free to disable safety precautions on your own personal angle grinder. At work, some sort of regulatory agency (OSHA, etc.) will toil to stop your employer from doing so. I, personally, want a future of AI tooling akin to this. If AI is persuasive enough to be hazardous, I don't want to be forced by my employer to directly consume ultra-high-valence generated output. I want such high-valence content to be treated like the light of an arc welder: something you're required to wear protection to witness, or risk intervention by independent agencies that everybody grumbles about but enjoys the fruits of (namely, a distinct lack of exotic skin cancers and blindness among welders).

My point was originally and remains the bare observation that any of this will cost in blood, and whatever regulations are made will be inked in it.

I do understand the deeper motivation of your arguments: the desire to avoid (and/or fear of) gleeful overreach at the hands of AI labs who want nothing more than to wholly control all use of such models. That lies orthogonal to my basis of reasoning, and it does not adequately contend with the realities of what to do when persuasiveness approaches sufficient levels. Is the truth now something to be avoided because it would serve the agenda of somebody in particular? Should we distort our understanding so as not to encroach on ideas that will be misappropriated by those with something to gain?

Setting aside any exposition on whether it is plausible, whether it caps out at human, subhuman, or superhuman levels, and all the chaff about freedom of expression or misappropriation by motivated actors: if we do manage to build such a thing as I describe (and the inherent hazard is plainly obvious if the construction is not weakened, and still present even if it is), what do we do? How many millions will be exposed to these systems? How can they be made into something that retains utility yet is not a horror beyond reckoning?

There is a great deal more to say on the subject; unfortunately, I don't have the time to explore it in any real depth here.