
213 points by Philpax | 2 comments
valine | No.42169009
One of the things I really love about RoPE is that it allows for a lot of interesting encoding schemes at inference time without retraining the model. I've had a lot of fun playing with different relative positions. You can elicit a lot of interesting behaviors from the model when you use different rotations for keys vs. queries; they don't always have to match.

For example, exact position doesn't matter too much when tokens are spaced out. Say you use token position 100 for your query: you can shift the keys around position 100, and the further back they are in the context, the more freedom you have to play with their position values.
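
A minimal sketch of the kind of experiment this enables (my own illustration, not valine's actual setup): apply RoPE to the queries at their true positions but to the keys at shifted positions, then compute attention scores as usual. The particular shift used here is arbitrary.

    import torch

    def rope_rotate(x: torch.Tensor, positions: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
        # Apply rotary position embedding to x at the given positions.
        # x: (seq, dim) with dim even; positions: (seq,), may be shifted or fractional.
        dim = x.shape[-1]
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
        angles = positions.float()[:, None] * inv_freq[None, :]   # (seq, dim/2)
        cos, sin = angles.cos(), angles.sin()
        x1, x2 = x[..., 0::2], x[..., 1::2]
        out = torch.empty_like(x)
        # Rotate each (x1, x2) channel pair by its per-position angle.
        out[..., 0::2] = x1 * cos - x2 * sin
        out[..., 1::2] = x1 * sin + x2 * cos
        return out

    seq, dim = 16, 64
    q, k = torch.randn(seq, dim), torch.randn(seq, dim)
    pos = torch.arange(seq)

    # Queries use their true positions; keys use shifted positions
    # (arbitrary example: pull every key two positions toward the end).
    q_rot = rope_rotate(q, pos)
    k_rot = rope_rotate(k, torch.clamp(pos + 2, max=seq - 1))
    scores = (q_rot @ k_rot.T) / dim ** 0.5

Because RoPE only enters through these rotations, swapping in different position vectors for keys and queries requires no change to the model weights.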

1. zackangelo | No.42175164
I'm surprised this is the case! I've been working on a RoPE implementation for my own project (I needed to account for padding in some unusual situations), and even an off-by-one error usually causes the model to produce nonsensical output.
2. valine | No.42175269
You have to be careful to keep the relative positions of adjacent and nearby tokens intact. The relative positions of distant tokens are less brittle.
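
For illustration only, here is one hypothetical way to remap key positions so that nearby relative offsets stay exact while distant ones are compressed. The window size and squash factor are arbitrary choices of mine, not anything stated in the thread.

    import torch

    def remap_key_positions(query_pos: int, key_pos: torch.Tensor,
                            exact_window: int = 32, squash: float = 0.5) -> torch.Tensor:
        # Keep exact relative positions within a local window of the query;
        # compress distances beyond the window (illustrative mapping).
        rel = key_pos.float() - query_pos      # negative for tokens earlier in context
        dist = rel.abs()
        squashed = exact_window + (dist - exact_window) * squash
        new_dist = torch.where(dist <= exact_window, dist, squashed)
        return query_pos + torch.sign(rel) * new_dist

    key_pos = torch.arange(101)                # keys at positions 0..100
    new_key_pos = remap_key_positions(100, key_pos)
    # Keys within 32 tokens of the query keep their exact relative positions;
    # older keys are pulled closer, shrinking the long-range offsets.

The remapped positions would then be fed to the key-side rotation while the query keeps its true position.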