
586 points by mizzao | 1 comment
okwhateverdude No.40666128
I gave some of the llama3 ablated models (e.g. https://huggingface.co/cognitivecomputations/Llama-3-8B-Inst...) a try and was pretty disappointed in the results. It could have been a problem with the dataset, but overall the model felt like it had been given a lobotomy: it would frequently fail to produce stop tokens and then start talking to itself.
replies(2): >>40666138 >>40666399
lhl No.40666399
They might have been doing it wrong; the code can be a bit tricky. I did a recent ablation on Qwen2 (removing Chinese censorship refusals) and ran MixEval benchmarks (0.96 correlation with Chatbot Arena results), and saw a negligible performance difference (see the model card for results): https://huggingface.co/augmxnt/Qwen2-7B-Instruct-deccp
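
For reference, the core trick behind these "abliterated" models is directional ablation: estimate a single refusal direction from activation differences, then project it out of the weights that write to the residual stream. Here is a minimal sketch, assuming PyTorch; the function and variable names are illustrative, not the exact code either of these models used:

    import torch

    def estimate_refusal_direction(refusal_acts: torch.Tensor,
                                   harmless_acts: torch.Tensor) -> torch.Tensor:
        # Difference of mean residual-stream activations at a chosen layer,
        # collected by running the model over prompts it refuses vs. prompts
        # it answers normally. Each tensor is (n_prompts, d_model).
        direction = refusal_acts.mean(dim=0) - harmless_acts.mean(dim=0)
        return direction / direction.norm()

    def orthogonalize_weight(weight: torch.Tensor,
                             direction: torch.Tensor) -> torch.Tensor:
        # Project the refusal direction out of a matrix that writes to the
        # residual stream (e.g. an attention output or MLP down projection):
        #   W' = (I - d d^T) W
        # so the layer can no longer emit activations along d.
        # weight is (d_model, d_in); direction is (d_model,).
        d = direction / direction.norm()
        return weight - torch.outer(d, d) @ weight

Most of the trickiness is in choosing the layer and building the contrast datasets: a bad direction estimate removes more than refusals, which would look a lot like the lobotomized, no-stop-token behavior described above.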