Uncensor any LLM with abliteration

1. okwhateverdude ◴[13 Jun 24 05:01 UTC] No.40666128[source]▶

I gave some of the llama3 ablated models (e.g. https://huggingface.co/cognitivecomputations/Llama-3-8B-Inst...) a try and was pretty disappointed with the results. Could have been problems in the dataset, but overall, the model felt like it had been given a lobotomy. It would frequently fail to produce stop tokens and then start talking to itself.

replies(2): >>40666138 #>>40666399 #

2. Der_Einzige ◴[13 Jun 24 05:04 UTC] No.40666138[source]▶

>>40666128 (TP) #

I have had entirely the opposite experience. Llama3 70b abliterated works perfectly and is willing to tell me how to commit mass genocide, all while maintaining quality outputs.

replies(3): >>40666337 #>>40666433 #>>40667051 #

3. infotainment ◴[13 Jun 24 05:50 UTC] No.40666337[source]▶

>>40666138 #

Same, I installed an implementation of an orthogonalized Llama3 and it seems to work just as well as the base model, sans refusals.

I believe this is the model I had good results with:

https://huggingface.co/wassname/meta-llama-3-8b-instruct-hel...
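
For readers unfamiliar with the term, "orthogonalized" here means that a single direction has been projected out of the weights that write to the model's residual stream. The following is only a minimal sketch of that projection on a toy matrix, not the code behind the linked model. The names `ablate_direction` and `refusal_dir` are hypothetical, and the direction is hard-coded here, whereas in practice it would have to be estimated from the model's activations.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]

def ablate_direction(W, r):
    """Return W' = (I - r r^T) W for a unit vector r.

    W is a list of rows with shape (d_model, d_in), so W maps an input
    of size d_in to an output of size d_model. After the edit, no output
    of W' has any component along r.
    """
    d_in = len(W[0])
    # proj[j] = r^T W[:, j], the component of column j along r
    proj = [sum(r[i] * W[i][j] for i in range(len(W))) for j in range(d_in)]
    return [[W[i][j] - r[i] * proj[j] for j in range(d_in)]
            for i in range(len(W))]

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [dot(row, x) for row in W]

# Toy example: a hypothetical "refusal direction" in a 3-dimensional space.
refusal_dir = normalize([1.0, 1.0, 0.0])
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W_abl = ablate_direction(W, refusal_dir)

x = [0.5, -2.0]
component = dot(matvec(W_abl, x), refusal_dir)  # approximately zero
```

Rows of `W` whose index has a zero entry in `refusal_dir` are left unchanged, which is one way to see that the edit only touches what the matrix writes along that one direction.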

replies(1): >>40668315 #

4. lhl ◴[13 Jun 24 06:02 UTC] No.40666399[source]▶

>>40666128 (TP) #

They might have been doing it wrong; the code can be a bit tricky. I did a recent ablation on Qwen2 (removing Chinese censorship refusals), ran MixEval benchmarks (0.96 correlation w/ ChatArena results), and saw a negligible performance difference (see model card for results): https://huggingface.co/augmxnt/Qwen2-7B-Instruct-deccp

5. m463 ◴[13 Jun 24 06:06 UTC] No.40666433[source]▶

>>40666138 #

> how to commit mass genocide, all while maintaining quality outputs.

sounds like a messed up eugenics filter.

replies(1): >>40667290 #

6. fransje26 ◴[13 Jun 24 07:47 UTC] No.40667051[source]▶

>>40666138 #

> Der_Einzige

> and is willing to tell me how to commit mass genocide, all while maintaining quality outputs

Ah, I see they fine-tuned it to satisfy the demands of the local market. /s

7. ◴[13 Jun 24 08:36 UTC] No.40667290{3}[source]▶

>>40666433 #

8. tarruda ◴[13 Jun 24 11:31 UTC] No.40668315{3}[source]▶

>>40666337 #

The author also says this edited model has increased perplexity (which, as far as I understand, means the quality was lowered).
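
To make the perplexity point concrete: perplexity is the exponential of the average negative log-likelihood per token, so a model that assigns lower probability to the same reference text gets a higher score. The sketch below uses made-up per-token probabilities (`base_probs` and `edited_probs` are hypothetical, not measurements from any model in this thread).

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood), natural log."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# Hypothetical probabilities each model assigns to the same four tokens.
base_probs = [0.5, 0.25, 0.5, 0.25]
edited_probs = [0.25, 0.125, 0.25, 0.125]  # every token half as likely

base_ppl = perplexity([math.log(p) for p in base_probs])
edited_ppl = perplexity([math.log(p) for p in edited_probs])

# Halving every token probability doubles the perplexity.
print(base_ppl, edited_ppl)
```

Higher perplexity on held-out text means the model predicts that text less well. It is a proxy for quality rather than a direct measure, so a small increase does not by itself show that answers got worse.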