
371 points | ulrischa | 8 comments
t_mann ◴[] No.43235506[source]
Hallucinations themselves are not even the greatest risk posed by LLMs. A much greater risk, in simple terms of probability times severity, I'd say is that chatbots can talk humans into harming themselves or others, both of which have already happened, btw [0,1]. Still not sure if I'd call that the greatest overall risk, but my ideas for what could be even more dangerous I don't even want to share here.

[0] https://www.qut.edu.au/news/realfocus/deaths-linked-to-chatb...

[1] https://www.theguardian.com/uk-news/2023/jul/06/ai-chatbot-e...

replies(4): >>43235623 #>>43236225 #>>43238379 #>>43238746 #
hexaga ◴[] No.43236225[source]
More generally - AI that is good at convincing people is very powerful, and powerful things are dangerous.

I'm increasingly coming around to the notion that AI tooling should have safety features concerned with not directly exposing humans to asymptotically increasing levels of 'convincingness' in generated output. Something like a weaker model used as a buffer.
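To make that concrete, here's a rough sketch of the kind of buffering I mean, with hypothetical strong_model / weak_model callables standing in for whatever generation backends you'd actually use; the only point is the routing, where no human ever reads the strong model's output directly:

    from typing import Callable

    def buffered_generate(
        user_prompt: str,
        strong_model: Callable[[str], str],  # capable, potentially very persuasive
        weak_model: Callable[[str], str],    # weaker model used purely as a buffer
    ) -> str:
        # The strong model never talks to the human directly.
        raw = strong_model(user_prompt)
        # A weaker model restates the output, degrading whatever rhetorical
        # polish the strong model produced while keeping the useful content.
        paraphrase_prompt = (
            "Restate the following text plainly and neutrally, keeping the "
            "factual content and dropping any rhetorical or emotional framing:\n\n"
            + raw
        )
        return weak_model(paraphrase_prompt)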

Projecting out to 5-10 years: what happens when LLMs are still producing hallucinatory semi-sense, but merely comprehending it makes the machine temporarily own you? A bit like getting hair caught in an angle grinder, that.

Like most safety regulations, it'll take blood for the inking. Exposing mass numbers of people to these models strikes me as wildly negligent if we expect continued improvement along this axis.

replies(2): >>43236968 #>>43238275 #
1. southernplaces7 ◴[] No.43238275[source]
>Projecting out to 5-10 years: what happens when LLMs are still producing hallucinatory semi-sense, but merely comprehending it makes the machine temporarily own you? A bit like getting hair caught in an angle grinder, that.

Seriously? Do you suppose that it will pull this trick off through some sort of hypnotizing magic, perhaps? I have a hard time imagining any sort of overly verbose, clause and condition-ridden chatbot convincing anyone of sound mind to seriously harm themselves or do some egregiously stupid/violent thing.

The kinds of people who would be convinced by such "dangers" are likely to be mentally unstable or suggestible enough that any number of human beings could convince them of the same anyway.

Aside from demonstrating the persistent AI woo that permeates many comments on this site, the logic above reminds me of the harping nonsense around the supposed dangers of video games or certain violent movies "making kids do bad things", in years past. The prohibitionist nanny tendencies behind such fears are more dangerous than any silly chatbot AI.

replies(2): >>43241236 #>>43242612 #
2. hexaga ◴[] No.43241236[source]
If you believe current models exist at the limit of possible persuasiveness, there obviously isn't any cause for concern.

For various reasons, I don't believe that, which is why my argument is predicated on them improving over time. Obviously current models aren't overly hazardous in the sense I posit - it's a concern for future models that are stronger, or explicitly trained to be more engaging and/or convincing.

The load bearing element is the answer to: "are models becoming more convincing over time?" not "are they very convincing now?"

> [..] I have a hard time imagining any sort of overly verbose, clause and condition-ridden chatbot [..]

Then you're not engaging with the premise at all, and are attacking a point I haven't made. The tautological assurance that non-convincing AI is not convincing is not relevant to a concern predicated on the eventual existence of highly convincing AI: that sufficiently convincing AI is hazardous due to induced loss of control, and that as capabilities increase the loss of control becomes more difficult to resist.

replies(2): >>43247874 #>>43253374 #
3. aaronbaugher ◴[] No.43242612[source]
I've seen people talk about using ChatGPT as a free therapist, so yes, I do think there's a good chance that they could be talked into self-destructive behavior by a chat bot that latched onto something they said and is "trying" to tell them what they want to hear. Maybe not killing themselves, but blowing up good relationships or quitting good jobs, absolutely.

These are people who have jobs and apartments and are able to post online about their problems in complete sentences. If they're not "of sound mind," we have a lot more mentally unstable people running around than we like to think we do.

replies(1): >>43253324 #
4. OkayPhysicist ◴[] No.43247874[source]
You're describing a phase change in persuasiveness which we have no evidence for. If humans were capable of being immediately compelled to do something based on reading some text, advertisers would have taken advantage of that a looooong time ago.

Persuasion is mostly about establishing that doing or believing what you're telling them is in their best interest. If all my friends start telling me a piece of information, believing that information has real value to me, as it would help strengthen social bonds. If I have a consciously weakly held belief in something, then a compelling argument would consist of providing enough evidence for a viewpoint that I could confidently hold that view and not worry I'll appear misinformed when speaking on it.

Convincing me to do something involves establishing that either I'll face negative consequences for not doing it, or positive rewards for doing it. AI has an extremely difficult time establishing that kind of credibility.

To argue that an AI could become persuasive to the point of mind control is to assert that one can compel a belief in another without the ability to take real-world action.

The absolute worst-case scenario for a rogue AI is it leveraging people's belief in it to compel actions in others through some combination of blackmail, rewards, and threats of convincing yet others to commit violence on its behalf.

We already live in a world with such artificial intelligences: we call them governments and corporations.

replies(1): >>43261846 #
5. southernplaces7 ◴[] No.43253324[source]
>we have a lot more mentally unstable people running around than we like to think we do.

So what do you believe should be the case? That AI in any flexible communicative form be limited to a select number of people who can prove they're of sound enough mind to use it unfiltered?

You see how similar this is to historical nonsense about restricting the loaning or sale of books on certain subjects only to people of a certain supposed caliber or authority? Or banning the production and distribution of movies that were claimed to be capable of corrupting minds into committing harmful and immoral acts. How stupid do these historical restrictions look today in any modern society? That's how stupid this harping about the dangers of AI chatbots will look down the road.

Limiting AI because it may or may not cause some people to do irrational things not only smacks of the persistent AI woo on this site, which drastically overstates the power of these stochastic-parrot systems, but also forgets that we live in a world in which all kinds of information could trigger someone into making stupid choices. That includes books, movies, and all kinds of other content produced far more effectively, and with greater emotional impact, by completely human authors.

By claiming a need to regulate the supposed information and discourse dangers of AI chat systems, you're not only serving the cynically fear-mongering arguments of major AI companies, who would love such a regulatory moat around their overvalued pet projects; you're also tacitly claiming that literature, speech and other forms of written, spoken or digitally produced expression should be restricted unless they stick to the banally harmless, by some very vague definition of what exactly harmful content even is.

In sum, fuck that and the entire chain of implicit long-used censorship, moralizing nannyism, potential for speech restriction and legal over-reach that it so bloody obviously entails.

6. southernplaces7 ◴[] No.43253374[source]
You completely misunderstand my argument with your nitpicking on a specific sarcastic description I made about the current communicative state of most AI chat systems.

In reality, even if they improve to be completely indistinguishable from the sharpest and most persuasive of human minds our society has ever known, I'd still make exactly the same arguments as above. I'd make them for the same reason that I'd argue that no regulatory body or self-appointed filter of moral arbiters should be able to restrict the specific arguments and forms of expression currently available to persuasive human beings, or people of any kind.

Just as we shouldn't prohibit literature, film, internet blog posts, opinion pieces in media and any other sources by which people communicate their opinions and information to others under the argument that such opinions might be "harmful", I wouldn't regulate AI sources of information and chatbots.

One can make an easy case for regulating and punishing the acts people try to perform based on information they obtain from AI, in terms of the measurable harm these acts would cause to others, but banning a source of information based on a hypothetical, ambiguous danger of its potential for corrupting minds is little different from the idiocy of restricting free expression because it might morally corrupt supposedly fragile minds.

replies(1): >>43260669 #
7. hexaga ◴[] No.43260669{3}[source]
If your argument must rest on a caricature of weak persuasiveness attempting to persuade someone of something extremely disadvantageous to show how impossible hazardous persuasion is, there is something wrong. Nevertheless:

First, you argued the implausibility of strong persuasion. Your rhetoric was effectively "look how silly this whole notion of a machine persuading someone of something is, because how dumb would you need to be for this silly thing to convince you to do this very bad thing?"

That is then used to fuel an argument that I am merely propagating AI woo, consumed by magical thinking, and clearly am just afraid of something equivalent to violent video games and/or movies. The level of inferential contortion is difficult to wrap my head around.

Now, you seem to be arguing along an entirely different track: that AI models should have the inalienable right to self expression, for the same reason humans should have that right (I find it deeply ironic that this is the direction you'd choose after accusations of AI woo, but I digress). Or, equivalently, that humans should have the inalienable right to access and use such models.

This is no longer an argument about the plausibility of AI being persuasive, or that persuasion can be hazardous, but that we should permit it in spite of any of that because freedom of expression is generally a good thing.

(This is strange to me, because I never argued that the models should be banned or prohibited, merely that tooling should try to avoid direct human-to-model-output contact, as such contact (when model output is sufficiently persuasive) is hazardous. Much like how angle grinders or power tools are generally not banned, but have safety features preventing egregious bodily harms.)

> In reality, even if they improve to be completely indistinguishable from the sharpest and most persuasive of human minds our society has ever known, I'd still make exactly the same arguments as above.

While my true concern is systems of higher persuasiveness than humans have ever been exposed to, let's see:

> I have a hard time imagining [the most persuasive of human minds our society has ever known] convincing anyone of sound mind to seriously harm themselves or do some egregiously stupid/violent thing.

This is immediately falsified by the myriad examples of exactly this occurring, via a much lower bar than 'most persuasive person ever'. Hmm. Small wonder that it takes a sarcastic caricature to keep it from immediately seeming like a nonsense argument.

Considering my entire position is simply that exposure to persuasion can be hazardous, I don't see what you're trying to prove now. It's certainly not in opposition to something I've said.

As it does seem you have shifted perspectives to the moral rather than the mechanistic, and that you've conceded that persuasion carries with it nontrivial hazard (even if we should entertain that hazard for the sake of our freedoms), are we now determining how much risk is acceptable to maintain freedoms? I'm not interested in having that discussion, as I don't purport to restrict said freedoms in any case.

Going back to the power tool analogy, you are of course free to disable safety precautions on your own personal angle grinder. At work, some sort of regulatory agency (OSHA, etc.) will toil to stop your employer from doing so. I, personally, want a future of AI tooling akin to this. If AI models are persuasive enough to be hazardous, I don't want to be forced by my employer to directly consume ultra-high-valence generated output. I want such high-valence content to be treated like the light of an arc welder: something you're required to wear protection to witness, on pain of intervention by independent agencies that everybody grumbles about but whose fruits everyone enjoys (namely, a distinct lack of exotic skin cancers and blindness among welders).

My point was originally and remains the bare observation that any of this will cost in blood, and whatever regulations are made will be inked in it.

I do understand the deeper motivations of your arguments, the desire to avoid (and/or fear of) gleeful overreach by the hands of AI labs who want nothing more than to wholly control all use of such models. That lies orthogonal to my basis of reasoning. It does not adequately contend with the realities of what to do when persuasiveness approaches sufficient levels. Is the truth now something to be avoided because it would serve the agenda of somebody in particular? Should we distort our understanding to not encroach on ideas that will be misappropriated by those with something to gain?

Ignoring any exposition on whether it is plausible or whether it caps out at human or subhuman or superhuman levels or any of the chaff about freedom of expression or misappropriation by motivated actors: if we do manage to build such a thing as I describe (and the hazard inherent is plainly obvious if the construction is not weakened, but resident still even if weakened), what do we do? How many millions will be exposed to these systems? How can it be made into something that retains utility yet is not a horror beyond reckoning?

There is a great deal more to say on the subject, I unfortunately don't have the time to explore it in any real depth here.

8. hexaga ◴[] No.43261846{3}[source]
> You're describing a phase change in persuasiveness which we have no evidence for.

That's reasonable, and I really do hope this keeps on being the case. However, I would nit that I see this as a continuum rather than a phase change. That is, I think hazard smoothly increases with persuasiveness. I can point to some far-off region and say, "oh, that seems quite concerning," but the hazard doesn't only begin there.

Persuasiveness below the threshold of 'instant mind control' is still a hazard. Hanging out with salesmen on the job is likely to loosen your wallet, even if it isn't guaranteed.

> If humans were capable of being immediately compelled to do something based on reading some text, advertisers would have taken advantage of that a looooong time ago.

I'd base my counter on the notion that the problem of persuasion is harder when you have less information about whom you're trying to convince.

To expand on the intuition behind that: advertisement-persuasion is hard in a way that conversational-persuasion is not. Shilling in conversational contexts (word of mouth) is more effective than generic advertisement.

A message that will convince one specific person is easier to generate than a message that will convince any random 10 people.

This proceeds to the idea that information about a person-under-persuasion is akin to power over them. Knowing not only what you believe, but why you believe it, what else you believe adjacent to it, and what you want, is a force multiplier in this regard.

And so we get to AI models, which gather specific information about the mind of each person they interact with. The message is tailored to you and you alone; it is not a wide-spectrum net cast to catch the largest possible number. Advertisements are qualitatively different; they do not 'pick your brain' nearly so much as the model does.

> Convincing me to do something involves establishing that either I'll face negative consequences for not doing it, or positive rewards for doing it. AI has an extremely difficult time establishing that kind of credibility.

> To argue that an AI could become persuasive to the point of mind control is to assert that one can compell a belief in another without the ability to take real-world action.

I don't agree with this because I don't agree with the premise that you must use a 'principled' approach to convince someone as you've described. People use heuristics to decide what to believe.

By dint of the bitter lesson, I think superhuman persuasion will involve stupid tricks of no particular principled basis that take advantage of 'invisible' vulnerabilities in human cognition.

That is, I don't think those 'reasons to believe the belief' matter. A child will believe the voice of their parents; it doesn't necessarily register that it's in their best interest or it will be bad for them if they don't. Bootstrapping children involves exploiting vulnerabilities in their psyche via implicit trust. Will the AI speak in the voice of my father, as I might hear it in prelingual childhood? Are all such mechanisms gone by adulthood? Is there anything like a generalized follow-the-leader-with-leader-detection pattern?

How hard is it for gradient descent to fit a solution to the boundaries of such heuristics?

This is however, getting into the weeds of exact mechanisms which I'm not too concerned with. I believe (but can't prove) that exploits of that nature exist (or that similarly effective means exist), and that they can be found via brute force search. I think the dominant methodology of continuously training chat models on conversational data those same models participate in is among the likeliest of ways to get to that point.

Ultimately, so long as there's no directed pressure to force people into contact with very convincing model output (see your rogue AI scenario), it doesn't seem that hard to make it safe: limit direct contact and/or require that tooling limits contact by default. Avoid multi-turn refinement and conversational history (amplification of persuasive power via mechanism described above). Treat it like a spinning blade and be it on your own head if you want to break yourself.
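As a rough sketch of what 'limits contact by default' could look like in tooling (assuming a hypothetical generate(prompt) callable, not any real API): every call is independent, no conversational history ever accumulates, and sessions are capped, so the model never gets many turns to refine its persuasion against one particular person.

    from typing import Callable

    class SingleTurnGateway:
        def __init__(self, generate: Callable[[str], str], max_turns_per_session: int = 3):
            self.generate = generate         # hypothetical single-prompt generation backend
            self.max_turns = max_turns_per_session
            self.turns_used = 0

        def ask(self, prompt: str) -> str:
            if self.turns_used >= self.max_turns:
                raise RuntimeError("Session limit reached; start a fresh session.")
            self.turns_used += 1
            # Stateless by construction: no prior turns are fed back to the model,
            # so there is no multi-turn refinement or accumulated history.
            return self.generate(prompt)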

However, as I mentioned in my original comment, it will take blood for the inking. The incentives don't align to guard against this class of hazard from the get-go or even admit it is possible (merely to produce appearances of caring about 'safety' (read: our model won't do scary politically incorrect things!)), so we're going to see what happens when you mindlessly expose millions of people to it.