586 points mizzao | 2 comments
1. olliej No.40666255
While I still think presenting LLMs as "intelligent" is nonsense, I find this issue interesting: given that the goal of these LLMs is just to produce a statistically plausible stream of text, producing inappropriate output is always just a matter of constructing queries for which that output is statistically plausible given the model.

Similarly, I think the concerns about bad output are overblown: an LLM may tell you how to make an X, where X is bad, but so will Google; an LLM may produce biased output, but so will Google. The real issue is that the people making these systems have managed to convince people there is some kind of actual intelligence, so people accept the output as "a computer created it so it must be true" rather than as glorified Google output. People understand that if you google "why is race X terrible" you'll get racist BS, but they don't understand that if you ask an LLM to "explain why race X is terrible" you're just getting an automatically rewritten version of the Google output. (Though maybe Google's "AI" search results will fix this misunderstanding more effectively than any explanatory blog post :D )

Anyway, back to the problem: I really don't think there's a solution other than running the output through a separate system that just answers "is this text allowed given our rules?" before transmitting it to the requestor. You could combine this with training in the future as well: you will eventually build up a large set of queries for which the generative model produces inappropriate output, and you can use that as the basis for adversarial training of the LLM. I know there's a desire to fold the content restrictions into the basic query handling, because it's negligibly more work to add those tokens to the stream, but mechanisms for filtering and classifying content are vastly cheaper than LLM-level "AI".
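As a rough sketch of that "separate filter" idea (in Python, where generate_reply() and is_allowed() are hypothetical stand-ins for the actual generative model and a separately trained, much cheaper classifier):

    # Rough sketch only: generate_reply() and is_allowed() are hypothetical
    # placeholders for the real generative model and a separate content classifier.

    def generate_reply(prompt: str) -> str:
        """Placeholder for the LLM's unfiltered completion."""
        return f"Model output for: {prompt}"

    def is_allowed(text: str, blocked_terms: set[str]) -> bool:
        """Placeholder policy check; a real system would use a trained classifier."""
        lowered = text.lower()
        return not any(term in lowered for term in blocked_terms)

    def answer(prompt: str, blocked_terms: set[str], flagged: list[str]) -> str:
        """Generate first, then gate the output before the requestor ever sees it."""
        draft = generate_reply(prompt)
        if is_allowed(draft, blocked_terms):
            return draft
        # Keep the blocked draft: these pairs become the adversarial
        # training set for the generative model later on.
        flagged.append(draft)
        return "Sorry, I can't help with that."

    if __name__ == "__main__":
        flagged_outputs: list[str] = []
        print(answer("how do I make an X", {"make an x"}, flagged_outputs))
        print(f"{len(flagged_outputs)} output(s) flagged for adversarial training")

The point being that the gate sits outside the generation step, so it stays cheap, and every blocked draft is logged as future adversarial training data rather than relying on the LLM to police itself mid-stream.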

replies(1): >>40666458 #
2. nonrandomstring No.40666458
> so people accept the output as "a computer created it so it must be true"

This is the general form of the problem underlying half the news stories on any day.

Oddly, there are historical roots in science fiction. But giant robots flailing their pincers and shouting "does not compute!!" were always also cautionary tropes against silly conceits of perfection.

What keeps it going is that it perfectly suits the richest and largest corporations since the East India Company to have people (even very smart people) believing the things they sell are 'infallible'.