Never did this before, so I was asking Q in the AWS docs how to do it.
It refused to help, as it didn't answer security-related questions.
Thanks.
One of my questions about a login form also tripped a harassment flag.
I suspect instead that the people training these models have identified areas of questioning where their model is 99% right, but because the 1% it gets wrong is incredibly costly, they dodge the entire question.
Would you want your LLM to give out any legal advice, or medical advice, or can-I-eat-this-mushroom advice, if you knew that, due to imperfections in your training process, it sometimes recommended people put glue in their pizza sauce?
So sure, the LLM occasionally pranks someone, much like random Internet posts do; it is confidently wrong, much like most text on the Internet is confidently wrong, because content marketers don't give a damn about correctness - that's not what the text is there for. As much as this state of things pains me, the general population has mostly adapted.
Meanwhile, people who would appreciate a model that's 99% right on things where the 1% is costly rightfully continue to ignore Gemini and other models by companies too afraid to play in the field for real.
A random person on the Internet often has surrounding context to help discern trustworthiness. A researcher can also query multiple sources to determine how much consensus there is.
You can't do that with LLMs.
I cannot stress strongly enough that direct comparisons between LLMs and experts on the Internet are inappropriate.
In this context, I very much agree. But I'd like to stress that "experts on the Internet" is not what 99% of the users read 99% of the time, because that's not what search engines surface by default. When you make e.g. food or law or health-related queries, what you get back isn't written by experts - it's written by content marketers. Never confuse the two.
> A researcher can also query multiple sources to determine how much consensus there is.
> You can't do that with LLMs.
A person like that will know LLMs hallucinate, and will query multiple sources and/or their own knowledge, and/or even re-query the LLM several times. Such people are not in danger - but they are very much annoyed when perfectly reasonable queries get rejected on the grounds of "safety".
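For what it's worth, the "re-query several times and see how much the answers agree" habit is easy to mechanize. Here's a minimal Python sketch of that idea; consensus_answer, ask_llm, and flaky_model are hypothetical names for illustration, not part of any real LLM SDK - you'd plug in whatever client call you actually use.

    from collections import Counter
    from typing import Callable

    def consensus_answer(ask_llm: Callable[[str], str], question: str, tries: int = 5) -> tuple[str, float]:
        # Re-ask the same question several times and keep the most common
        # answer, plus how often it appeared (a crude agreement score).
        answers = [ask_llm(question).strip().lower() for _ in range(tries)]
        best, count = Counter(answers).most_common(1)[0]
        return best, count / tries

    # Stand-in for a real model call, just to make the sketch runnable:
    import random
    def flaky_model(q: str) -> str:
        return "42" if random.random() < 0.8 else "7"  # "right" ~80% of the time

    answer, agreement = consensus_answer(flaky_model, "What is 6 * 7?")
    print(f"answer={answer}, agreement={agreement:.0%}")

Low agreement doesn't prove the answer is wrong, but it's a cheap signal that it's time to go check another source - which is exactly the kind of workflow the blanket refusals above get in the way of.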