
178 points | themgt | 1 comment
matheist No.45777358
Can anyone explain (or link to an explanation of) what they mean by "injection", at a level of detail that covers which layers they're modifying, at which token positions, and when?

Are they modifying the vector that gets passed to the final logit-producing step? Doing that for every output token? Just some output tokens? What are they putting in the KV cache, modified or unmodified?

It's all well and good to pick words like "injection" and "introspection" to describe what you're doing, but it's impossible to get an accurate read on what's actually being done if it's never explained in terms of the actual nuts and bolts.

replies(1): >>45778812
1. wbradley No.45778812
I’m guessing they adjusted the activations in certain hidden layers during the forward pass, adding a vector that resembles the difference in activations between two concepts, so that the "diff" seems to show up magically within the forward pass. Then the test is to see how the output responds to this forced, "injected" thought.
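
For concreteness, here's a minimal sketch of what that kind of activation steering could look like in PyTorch with a Hugging Face GPT-2 model. To be clear, this is an illustrative guess at the technique, not Anthropic's actual code: the layer index, injection strength, and prompts are all arbitrary assumptions.

    # Sketch of activation steering / "concept injection" (illustrative
    # guess, not the paper's method). Layer, scale, prompts are assumptions.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    LAYER = 6     # which hidden layer to modify (assumption)
    ALPHA = 4.0   # injection strength (assumption)

    def mean_hidden(prompt: str) -> torch.Tensor:
        """Mean residual-stream activation at the output of block LAYER."""
        ids = tok(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            hs = model(ids, output_hidden_states=True).hidden_states
        # hidden_states[i + 1] is the output of block i
        return hs[LAYER + 1].mean(dim=1).squeeze(0)  # (hidden_dim,)

    # The "diff" between two concepts becomes the steering vector.
    steer = (mean_hidden("Talk about the ocean.")
             - mean_hidden("Talk about nothing in particular."))

    def inject(module, inputs, output):
        # GPT-2 blocks return a tuple; output[0] is the residual stream
        # of shape (batch, seq, hidden). Add the diff at every position.
        return (output[0] + ALPHA * steer,) + output[1:]

    handle = model.transformer.h[LAYER].register_forward_hook(inject)
    try:
        ids = tok("Do you notice an injected thought?",
                  return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=40, do_sample=False)
        print(tok.decode(out[0], skip_special_tokens=True))
    finally:
        handle.remove()

Note that with this hook the vector gets re-added on every forward pass during generation, so the modified activations are what land in the KV cache; whether the paper injects at every token position or only some is exactly the kind of detail the parent comment is asking about.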