(huggingface.co)

586 points mizzao | 3 comments | 13 Jun 24 03:42 UTC

1. Mathnerd314 [13 Jun 24 05:28 UTC] No.40666249

Reminds me of https://vgel.me/posts/representation-engineering/. There, they add a control vector, w' = w + cvec; here, they "ablate" it, w' = w - dot(w, cvec)*cvec (with cvec normalized to unit length). There is an interesting field emerging around learning how to "brain chip" LLMs into doing what you want.
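A minimal sketch of the two operations contrasted above, in plain Python on a single vector. It assumes `w` is one activation (or weight) vector and normalizes `cvec` first, since the projection formula only removes the full component along `cvec` when `cvec` has unit length. The function names `add_control` and `ablate` and the `strength` parameter are hypothetical, chosen for illustration.

```python
import math


def add_control(w, cvec, strength=1.0):
    """Steering: shift w along cvec, i.e. w' = w + strength * cvec."""
    return [wi + strength * ci for wi, ci in zip(w, cvec)]


def ablate(w, cvec):
    """Ablation: remove the component of w along cvec.

    Computes w' = w - dot(w, c) * c, where c is cvec scaled to unit length.
    """
    norm = math.sqrt(sum(c * c for c in cvec))
    unit = [c / norm for c in cvec]
    # Scalar projection of w onto the unit direction
    proj = sum(wi * ui for wi, ui in zip(w, unit))
    return [wi - proj * ui for wi, ui in zip(w, unit)]


w = [3.0, 4.0, 0.0]
cvec = [0.0, 2.0, 0.0]
print(add_control(w, cvec))  # [3.0, 6.0, 0.0]
print(ablate(w, cvec))       # [3.0, 0.0, 0.0]
```

After `ablate`, the result has zero dot product with `cvec`, so ablating a second time changes nothing; `add_control` instead pushes the vector further along `cvec` each time it is applied.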


2. Der_Einzige [13 Jun 24 05:43 UTC] No.40666307

>>40666249

There's so much work just like this coming out simultaneously.

Steering Vectors, Control Vectors, PyReft, PEFT improvements, abliteration. It's a great time to be doing representation engineering.


3. Mathnerd314 [13 Jun 24 15:19 UTC] No.40670760

>>40666307

These differ from fine-tuning with PyReft / PEFT: the approaches here work more on the fly. For example, you can regenerate the control vectors from prompts in a few seconds.
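One simple way to derive such a vector from prompts, sketched under stated assumptions: collect hidden states at one layer for two contrasting sets of prompts (for example, prompts that do and do not trigger a behavior) and subtract their means. This difference-of-means idea is a common choice; other tools use variants. The activation values below are made up, `mean_vector` and `control_vector` are hypothetical helpers, and extracting real hidden states would require a model forward pass, which is omitted here.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def control_vector(pos_acts, neg_acts):
    """Difference of mean activations between two prompt sets."""
    mean_pos = mean_vector(pos_acts)
    mean_neg = mean_vector(neg_acts)
    return [p - q for p, q in zip(mean_pos, mean_neg)]


# Hypothetical hidden states, one vector per prompt, taken at a single layer
pos = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
neg = [[1.0, 0.0, 0.0], [1.0, 0.0, 2.0]]
print(control_vector(pos, neg))  # [1.0, 2.0, -1.0]
```

Because this only needs forward passes over a handful of prompts and no gradient updates, recomputing it is cheap. The resulting vector can be added to activations for steering, or normalized and projected out for ablation.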


Uncensor any LLM with abliteration