> Modern LLMs are fine-tuned for safety and instruction-following, meaning they are trained to refuse harmful requests.
It's sad that it's now an increasingly accepted idea that information one seeks can be "harmful".
replies(5):
> It's sad that it's now an increasingly accepted idea that information one seeks can be "harmful".
So I'm not sure how much it matters if the LLM masters prevent it from repeating things that are overtly racist, or from quoting how to make thermite out of the Jolly Roger. (I wouldn't trust GPT-4's recipe for thermite even if it gave me one.) At the end of the day, the degradation of truth and of the fidelity of the world's knowledge is the ultimate harm, and it's unavoidable in a technology that is purported to be intelligent but is in fact a black-box autocomplete system spewing endless garbage into our infosphere.