
213 points by Philpax | 1 comment
1024core
I didn't get the sudden leap from "position encodings" to the "QKV" magic.

What is the connection between the two? Where does "Q" come from? What are "K" and "V"? (I know they stand for "Query", "Key", "Value"; but what do they have to do with position embeddings?)

1. flebron
Q, K, and V are all vectors derived from the embedded representations of tokens: each token's embedding is multiplied by learned projection matrices to produce its query, key, and value vectors. In a transformer, you want to compute the inner product between a query (the token that is doing the attending) and a key (the token that is being attended to). An inductive bias we have is that the network will perform better if this inner product depends on the relative distance between the query token's position and the key token's position. We therefore encode each one with positional information in such a way that (for RoPE at least) the inner product depends only on the distance between the two tokens, not on their absolute positions in the input sentence.
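
Here is a minimal NumPy sketch of that property (my own illustration, not something from the thread; the helper name rotate and the single rotation frequency theta are made up for the example). If each 2-D slice of q and k is rotated by an angle proportional to its token's position, the attention score comes out the same for any pair of positions with the same offset.

    # Illustrative only: one 2-D frequency pair of a RoPE-style rotation.
    import numpy as np

    def rotate(x, pos, theta=0.1):
        """Rotate the 2-D vector x by the angle pos * theta."""
        c, s = np.cos(pos * theta), np.sin(pos * theta)
        return np.array([c * x[0] - s * x[1], s * x[0] + c * x[1]])

    q = np.array([1.0, 2.0])    # toy query vector (one 2-D slice of a head)
    k = np.array([0.5, -1.0])   # toy key vector

    # Same relative offset (3 positions apart) at different absolute positions:
    score_near = rotate(q, 5)   @ rotate(k, 2)      # positions 5 and 2
    score_far  = rotate(q, 103) @ rotate(k, 100)    # positions 103 and 100
    print(np.isclose(score_near, score_far))        # True: only the offset matters

The rotations compose, so rotate(q, m) . rotate(k, n) equals q applied against k rotated by (n - m) * theta, which is why only the relative distance survives in the score.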
All of them are vectors of embedded representations of tokens. In a transformer, you want to compute the inner product between a query (the token who is doing the attending) and the key (the token who is being attended to). An inductive bias we have is that the neural network's performance will be better if this inner product depends on the relative distance between the query token's position, and the key token's position. We thus encode each one with positional information, in such a way that (for RoPE at least) the inner product depends only on the distance between these tokens, and not their absolute positions in the input sentence.